regex - Python regular expression for HTML parsing (BeautifulSoup)

Question

Welcome To Ask or Share your Answers For Others

regex - Python regular expression for HTML parsing (BeautifulSoup)

asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

regex - Python regular expression for HTML parsing (BeautifulSoup)

I want to grab the value of a hidden input field in HTML.

<input type="hidden" name="fooId" value="12-3456789-1111111111" />

I want to write a regular expression in Python that will return the value of fooId, given that I know the line in the HTML follows the format

<input type="hidden" name="fooId" value="**[id is here]**" />

Can someone provide an example in Python to parse the HTML for the value?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-17T00:04:21+0000

For this particular case, BeautifulSoup is harder to write than a regex, but it is much more robust... I'm just contributing with the BeautifulSoup example, given that you already know which regexp to use :-)

from BeautifulSoup import BeautifulSoup

#Or retrieve it from the web, etc. 
html_data = open('/yourwebsite/page.html','r').read()

#Create the soup object from the HTML data
soup = BeautifulSoup(html_data)
fooId = soup.find('input',name='fooId',type='hidden') #Find the proper tag
value = fooId.attrs[2][1] #The value of the third attribute of the desired tag 
                          #or index it directly via fooId['value']

Categories

regex - Python regular expression for HTML parsing (BeautifulSoup)

regex - Python regular expression for HTML parsing (BeautifulSoup)

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags