Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - BeautifulSoup find periodically returns None

I am trying to get a value from an element selected by its class.

Sometimes find returns the element I need, and other times it returns None and the lookup fails.

Code:

import requests
from bs4 import BeautifulSoup

url = 'https://beru.ru/catalog/molotyi-kofe/76321/list'

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                         '(KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}

page = requests.get(url, headers=headers)

soup = BeautifulSoup(page.content, 'html.parser')

item_count = (soup.find('div', class_='_2StYqKhlBr')).text.split()[4]

print(item_count)
  asked by Nikita Biryukov, translated from Stack Overflow


1 Answer


You get the value sometimes and not other times because the website is protected by a CAPTCHA.

When a request is blocked by the CAPTCHA, the catalog page is not returned, so the div you are looking for does not exist and find returns None. The blocked request ends up at a URL like the following:

https://beru.ru/showcaptcha?retpath=https://beru.ru/catalog/molotyi-kofe/76321/list?ncrnd=4561_aa1b86c2ca77ae2b0831c4d95b9d85a4&t=0/1575204790/b39289ef083d539e2a4630548592a778&s=7e77bfda14c97f6fad34a8a654d9cd16
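
A minimal sketch of how the block could be detected from the URL alone, assuming the CAPTCHA always shows up as a redirect to the /showcaptcha path seen in the URL above. The helper name is_captcha_url is hypothetical and not part of requests or BeautifulSoup.

```python
from urllib.parse import urlsplit


def is_captcha_url(url):
    """Return True if the URL path is the /showcaptcha page.

    Assumption: the site signals a blocked request by redirecting to
    /showcaptcha, as in the example URL from the answer.
    """
    return urlsplit(url).path.rstrip('/') == '/showcaptcha'


# Check the final URL of a response before parsing it.
print(is_captcha_url('https://beru.ru/showcaptcha?retpath=x'))
print(is_captcha_url('https://beru.ru/catalog/molotyi-kofe/76321/list'))
```

With requests, the final URL after redirects is available as page.url, so calling is_captcha_url(page.url) before building the soup would tell you whether the catalog page was actually returned.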

You can verify this by parsing the response content:

import requests
from bs4 import BeautifulSoup

r = requests.get(
    'https://beru.ru/catalog/molotyi-kofe/76321/list')
soup = BeautifulSoup(r.text, 'html.parser')

# Normal page: the block the question reads item_count from is present.
for item in soup.find_all('div', attrs={'class': '_2StYqKhlBr _1wAXjGKtqe'}):
    print(item)

# Blocked request: the page contains a CAPTCHA image instead.
for item in soup.find_all('div', attrs={'class': 'captcha__image'}):
    for captcha in item.find_all('img'):
        print(captcha.get('src'))

When the request has been blocked, the second loop prints the CAPTCHA image link:

https://beru.ru/captchaimg?aHR0cHM6Ly9leHQuY2FwdGNoYS55YW5kZXgubmV0L2ltYWdlP2tleT0wMEFMQldoTnlaVGh3T21WRmN4NWFJRUdYeWp2TVZrUCZzZXJ2aWNlPW1hcmtldGJsdWU,_0/1575206667/b49556a86deeece9765a88f635c7bef2_df12d7a36f0e2d36bd9c9d94d8d9e3d7
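
Building on that check, here is a hedged sketch of how the original lookup could be guarded so that a CAPTCHA page produces a clear error instead of an AttributeError on None. The function name extract_item_count and the sample HTML strings are hypothetical; the class names and the split()[4] index are taken from the question and the answer, and the real page text is not known here.

```python
from bs4 import BeautifulSoup


def extract_item_count(html):
    """Return the fifth whitespace-separated word of the count block.

    Raises RuntimeError if the HTML looks like the CAPTCHA page, and
    returns None if the block is missing or has fewer than five words.
    """
    soup = BeautifulSoup(html, 'html.parser')

    # The answer's check: a captcha__image div means the request was blocked.
    if soup.find('div', class_='captcha__image') is not None:
        raise RuntimeError('CAPTCHA page returned instead of the catalog')

    block = soup.find('div', class_='_2StYqKhlBr')
    if block is None:
        return None

    words = block.text.split()
    return words[4] if len(words) > 4 else None


# Hypothetical sample markup, for illustration only.
sample = '<div class="_2StYqKhlBr _1wAXjGKtqe">a b c d 42 e</div>'
print(extract_item_count(sample))
```

Called as extract_item_count(page.content), this keeps the question's logic but separates the three cases: blocked by CAPTCHA, element missing, and element found.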
