Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Python - BeautifulSoup find_all() with a series of tags

I want to search a website for specific tags nested within other tags using BeautifulSoup's find_all(). For example, searching for data within:

<li class='x'>
    <small class='y'>

I am currently using the code below, but it returns extra results from elsewhere on the HTML page because I haven't specified that I only want to search within li tags with class x.

labels = [element.text for element in soup.find_all('small', {'class':'label'})]

How do I restrict the search to a specific parent element?

question from:https://stackoverflow.com/questions/65932800/beautifulsoup-find-all-with-series-of-tags

1 Answer

You can narrow the search by selecting the parent element first:

optionA = [element.text for element in soup.find('ul').find_all('small', {'class':'label'})]

This first finds the parent <ul> and then all <small> tags with class label inside it.
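Note that find() returns only the first matching element, and None if nothing matches, so chaining it works best when there is exactly one parent. As a minimal sketch, assuming the page looks like the question's <li class='x'> / <small class='y'> structure (the markup below is made up for illustration), you can loop over every matching parent instead:

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the question's
# <li class='x'> / <small class='y'> structure.
html = '''
<ul>
  <li class="x"><small class="y">Coffee</small></li>
  <li class="x"><small class="y">Tea</small></li>
  <li class="z"><small class="y">Ignored</small></li>
</ul>
<p><small class="y">Also ignored</small></p>
'''

soup = BeautifulSoup(html, 'html.parser')

labels = []
# Visit every <li class="x">, not just the first one.
for li in soup.find_all('li', class_='x'):
    # Search only inside the current <li>.
    for small in li.find_all('small', class_='y'):
        labels.append(small.get_text(strip=True))

print(labels)  # ['Coffee', 'Tea']
```

The <small> tags outside an li.x parent are skipped because each inner find_all() call is scoped to one <li>.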

optionB = [element.text for element in soup.select('ul small.label')]

Alternatively, use CSS selectors via select(), as in optionB above. In my opinion they are much better for chaining tags and classes.
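Applied to the class names from the question, the selector would probably look like 'li.x small.y'. The following is a rough sketch on made-up markup; it also shows the child combinator (>), which matches only direct children rather than any descendant:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the classes x and y come from the question.
html = '''
<ul>
  <li class="x"><small class="y">Coffee</small></li>
  <li class="x"><div><small class="y">Tea</small></div></li>
</ul>
<p><small class="y">Milk</small></p>
'''

soup = BeautifulSoup(html, 'html.parser')

# Descendant combinator (space): any small.y nested anywhere inside li.x.
descendants = [el.get_text(strip=True) for el in soup.select('li.x small.y')]

# Child combinator (>): only small.y that is a direct child of li.x.
direct_children = [el.get_text(strip=True) for el in soup.select('li.x > small.y')]

print(descendants)      # ['Coffee', 'Tea']
print(direct_children)  # ['Coffee']
```

Unlike the chained find() approach, select() simply returns an empty list when nothing matches, so there is no None to guard against.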

Example

from bs4 import BeautifulSoup

html = '''<ul>
  <li><small class="label">Coffee</small></li>
  <li><small class="label">Tea</small></li>
  <li><small class="label">Milk</small></li>
</ul>'''

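# Name the parser explicitly to avoid the "no parser was explicitly specified" warning
soup = BeautifulSoup(html, 'html.parser')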

optionA = [element.text for element in soup.find('ul').find_all('small', {'class':'label'})]
optionB = [element.text for element in soup.select('ul small.label')]

print(optionA)
print(optionB)

Output

['Coffee', 'Tea', 'Milk']
['Coffee', 'Tea', 'Milk']
