I've been using BeautifulSoup on and off for a few years, and I still get tripped up from time to time. I put together this code:
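from bs4 import BeautifulSoup
from bs4.dammit import EncodingDetector
import requests

resp = requests.get("https://finance.yahoo.com/gainers")

# Trust the HTTP encoding only if the Content-Type header explicitly declares a charset.
http_encoding = resp.encoding if 'charset' in resp.headers.get('content-type', '').lower() else None
# Prefer an encoding declared inside the HTML itself; fall back to the HTTP one.
html_encoding = EncodingDetector.find_declared_encoding(resp.content, is_html=True)
encoding = html_encoding or http_encoding

# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(resp.content, "html.parser", from_encoding=encoding)
# find_all is the current name for the older findAll alias.
myclass = soup.find_all("a", {"class": "Fw(600) C($linkColor)"})
myclass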
That gives me this:
[<a class="Fw(600) C($linkColor)" data-reactid="79" href="/quote/TSNP?p=TSNP" title="Tesoro Enterprises, Inc.">TSNP</a>,
<a class="Fw(600) C($linkColor)" data-reactid="105" href="/quote/FDVRF?p=FDVRF" title="Facedrive Inc.">FDVRF</a>,
<a class="Fw(600) C($linkColor)" data-reactid="131" href="/quote/SKLZ?p=SKLZ" title="Skillz Inc.">SKLZ</a>,
<a class="Fw(600) C($linkColor)" data-reactid="157" href="/quote/GOOS?p=GOOS" title="Canada Goose Holdings Inc.">GOOS</a>,
<a class="Fw(600) C($linkColor)" data-reactid="183" href="/quote/WMS?p=WMS" title="Advanced Drainage Systems, Inc.">WMS</a>, etc., etc.
What I really want is just the stock symbols: TSNP, FDVRF, SKLZ, GOOS, WMS, and so on.
How can I modify this code to get just the stock symbols? I tried to use regex, but I've never been very proficient with that.
Thanks everyone.
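One way this could be approached, as a minimal sketch and not a definitive answer: each matched `<a>` tag appears to carry the symbol as its text content, so calling `get_text()` on each tag should return it without any regex. The sample markup below is copied from the output shown above instead of being fetched live (the page may have changed since), and the names `html`, `links`, and `symbols` are illustrative.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the output in the question, so no network call is needed.
html = """
<a class="Fw(600) C($linkColor)" data-reactid="79" href="/quote/TSNP?p=TSNP" title="Tesoro Enterprises, Inc.">TSNP</a>
<a class="Fw(600) C($linkColor)" data-reactid="105" href="/quote/FDVRF?p=FDVRF" title="Facedrive Inc.">FDVRF</a>
<a class="Fw(600) C($linkColor)" data-reactid="131" href="/quote/SKLZ?p=SKLZ" title="Skillz Inc.">SKLZ</a>
"""

soup = BeautifulSoup(html, "html.parser")

# Same selector as in the question: match on the exact class attribute string.
links = soup.find_all("a", {"class": "Fw(600) C($linkColor)"})

# The symbol is the text inside each <a> tag; strip=True trims surrounding whitespace.
symbols = [a.get_text(strip=True) for a in links]
print(symbols)  # ['TSNP', 'FDVRF', 'SKLZ']
```

In the original script, the same list comprehension could be applied directly to `myclass`. If the company name is also needed, `a["title"]` reads it from the same tag.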
question from:
https://stackoverflow.com/questions/66056527/trying-to-get-a-small-part-of-an-html-a-class-element