As mentioned above, I am trying to remove the HTML from the printed output so that I get only the text plus my | and - separators. The output still contains span tags and other markup that I would like to remove. Because this code runs inside a loop over many pages, I cannot search for specific text values, since they change from page to page. The page structure stays the same, which is why the list indices I print stay the same. What would be the easiest way to clean the output? Here is the code section:
infoLink = driver.find_element_by_xpath("//a[contains(@href, '?tmpl=component&detail=true&parcel=')]").click()
driver.switch_to.window(driver.window_handles[1])
aInfo = driver.current_url
data = requests.get(aInfo)
src = data.text
soup = BeautifulSoup(src, "html.parser")
parsed = soup.find_all("td")
for item in parsed:
    Original = parsed[21]
    Owner = parsed[13]
    Address = parsed[17]
print(*Original, "|", *Owner, "-", *Address)
Example output is:
<span class="detail-text">123 Main St</span> | <span class="detail-text">Banner,Bruce</span> - <span class="detail-text">1313 Mockingbird Lane<br>Santa Monica, CA 90405</br></span>
Thank you!
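One possible way to do this, sketched under assumptions: printing `*Original` unpacks the children of the `td` tag, which are `span` tags, so their markup is printed. Calling `get_text()` on each cell returns only the text. The `html` string below is a hypothetical stand-in for the real page, `cell_text` is a helper name made up for this example, and the `", "` separator used to join the `<br>`-split address lines is just one choice.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the fetched page: three <td> cells
# shaped like the example output in the question.
html = """
<table><tr>
<td><span class="detail-text">123 Main St</span></td>
<td><span class="detail-text">Banner,Bruce</span></td>
<td><span class="detail-text">1313 Mockingbird Lane<br>Santa Monica, CA 90405</span></td>
</tr></table>
"""


def cell_text(cell):
    """Return only the text inside a tag; text pieces split by <br> are joined with ', '."""
    return cell.get_text(separator=", ", strip=True)


soup = BeautifulSoup(html, "html.parser")
cells = soup.find_all("td")

# In the real page these would be parsed[21], parsed[13] and parsed[17].
original, owner, address = (cell_text(c) for c in cells)

line = f"{original} | {owner} - {address}"
print(line)
# 123 Main St | Banner,Bruce - 1313 Mockingbird Lane, Santa Monica, CA 90405
```

In the original snippet this would amount to something like `parsed[21].get_text(separator=", ", strip=True)` for each of the three cells, with no need for the `for` loop around the assignments.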
question from:
https://stackoverflow.com/questions/66067920/beautifulsoup-trying-to-remove-html-data-from-list