I am trying to extract a number from the HTML of many pages. The number is different on each page. I expected soup.select('span[class="pull-right"]') to return the spans containing the number, but it only returns tags that do not contain it. I believe this is because the page fills in the values with JavaScript. 180,476 is the value I want in this specific HTML, and I need the same value from many pages:
<div class="legend-block--body">
<div class="linear-legend--counts">
Pageviews:
<span class="pull-right">
180,476
</span>
</div>
<div class="linear-legend--counts">
Daily average:
<span class="pull-right">
8,594
</span>
</div></div>
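To make the target concrete, here is a minimal sketch of the extraction that should work if the rendered markup above were available as a string. The names RENDERED_HTML and extract_counts are hypothetical and only for illustration. It does not solve the problem of obtaining that markup in the first place.

```python
import bs4

# The fragment from the question, as it appears in the rendered page.
RENDERED_HTML = """
<div class="legend-block--body">
<div class="linear-legend--counts">
Pageviews:
<span class="pull-right">
180,476
</span>
</div>
<div class="linear-legend--counts">
Daily average:
<span class="pull-right">
8,594
</span>
</div></div>
"""


def extract_counts(html):
    """Map each legend label (e.g. 'Pageviews') to its integer value."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    counts = {}
    for block in soup.select("div.linear-legend--counts"):
        span = block.select_one("span.pull-right")
        if span is None:
            continue
        # The first non-empty string inside the div is the label, e.g. "Pageviews:".
        label = next(block.stripped_strings).rstrip(":")
        # Drop the thousands separator before converting to int.
        counts[label] = int(span.get_text(strip=True).replace(",", ""))
    return counts


print(extract_counts(RENDERED_HTML)["Pageviews"])  # prints 180476
```

Scoping the selector to div.linear-legend--counts avoids the other pull-right spans on the page, such as the checkbox labels.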
My code (this runs in a loop over many pages):
import bs4
import requests

# wiki_page is the URL of the current page in the loop
res = requests.get(wiki_page, timeout=None)
soup = bs4.BeautifulSoup(res.text, 'html.parser')
ab = soup.select('span[class="pull-right"]')
print(ab)
Output:
[<span class="pull-right">
<label class="logarithmic-scale">
<input class="logarithmic-scale-option" type="checkbox"/>
Logarithmic scale
</label>
</span>, <span class="pull-right">
<label class="begin-at-zero">
<input class="begin-at-zero-option" type="checkbox"/>
Begin at zero
</label>
</span>, <span class="pull-right">
<label class="show-labels">
<input class="show-labels-option" type="checkbox"/>
Show values
</label>
</span>]
Example URL: https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org&platform=all-access&agent=user&range=latest-20&pages=Star_Wars:_The_Last_Jedi
I want the Pageviews value (180,476 in the HTML above).
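One possible route that avoids the JavaScript-rendered page, sketched under an assumption not confirmed from the page itself: that the same numbers are available from the Wikimedia Pageviews REST API (per-article endpoint). The helper names, the User-Agent string, and the dates in the usage note are hypothetical. The total may not match the tool's figure exactly if the date range differs.

```python
from urllib.parse import quote

import requests

# Per-article endpoint of the Wikimedia Pageviews REST API.
API_ROOT = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def build_url(article, start, end, project="en.wikipedia.org"):
    """Build the daily per-article URL; start and end are YYYYMMDD strings."""
    # Percent-encode the title so characters such as ':' are safe in the path.
    title = quote(article, safe="")
    return f"{API_ROOT}/{project}/all-access/user/{title}/daily/{start}/{end}"


def total_views(payload):
    """Sum the 'views' field over all items in a decoded JSON response."""
    return sum(item["views"] for item in payload.get("items", []))


def fetch_pageviews(article, start, end):
    """Request the daily counts and return their total (needs network access)."""
    # Hypothetical User-Agent; replace with something that identifies your script.
    headers = {"User-Agent": "pageviews-example/0.1 (hypothetical contact)"}
    res = requests.get(build_url(article, start, end), headers=headers, timeout=30)
    res.raise_for_status()
    return total_views(res.json())
```

A call such as fetch_pageviews("Star_Wars:_The_Last_Jedi", "20171201", "20171220") would then return the summed daily views for that range; the dates here are placeholders, not taken from the page.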