python - BeautifulSoup trying to remove HTML data from list

Question

Welcome To Ask or Share your Answers For Others

python - BeautifulSoup trying to remove HTML data from list

asked Oct 6, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - BeautifulSoup trying to remove HTML data from list

As mentioned above, I am trying to remove HTML from the printed output to just get text and my dividing | and -. I get span information as well as others that I would like to remove. As it is part of the program that is a loop, I cannot search for the individual text information of the page as they change. The page architecture stays the same, which is why printing the items in the list stays the same. Wondering what would be the easiest way to clean the output. Here is the code section:

        infoLink = driver.find_element_by_xpath("//a[contains(@href, '?tmpl=component&detail=true&parcel=')]").click()
        driver.switch_to.window(driver.window_handles[1])
        aInfo = driver.current_url
        data = requests.get(aInfo)
        src = data.text
        soup = BeautifulSoup(src, "html.parser")
        parsed = soup.find_all("td")
        for item in parsed:
            Original = (parsed[21])
            Owner = parsed[13]
            Address = parsed[17]
            print (*Original, "|",*Owner, "-",*Address)

Example output is:

<span class="detail-text">123 Main St</span> | <span class="detail-text">Banner,Bruce</span> - <span class="detail-text">1313 Mockingbird Lane<br>Santa Monica, CA  90405</br></span>

Thank you!

question from:https://stackoverflow.com/questions/66067920/beautifulsoup-trying-to-remove-html-data-from-list

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-06T03:01:25+0000

To get the text between the tags just use get_text() but you should be aware, that there is always text between the tags to avoid errors:

for item in parsed:
    Original = (parsed[21].get_text(strip=True))
    Owner = parsed[13].get_text(strip=True)
    Address = parsed[17].get_text(strip=True)

Categories

python - BeautifulSoup trying to remove HTML data from list

python - BeautifulSoup trying to remove HTML data from list

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags