Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

web scraping - How can I use Python to download and scrape a docs.google.com/spreadsheets link found in a website's source code?

Thanks for looking at my question.

While inspecting a page's source, I found a lot of data I want to retrieve. In the browser's developer tools I opened the Network tab and found an XHR/.js request carrying useful data. Its headers show the following information:

Request URL: https://docs.google.com/spreadsheets/d/1GJ6CvZ_mgtjdrUyo3h2dU3YvWOahbYvPHpGLgovyhtI/gviz/tq?usp=sharing&tqx=reqId%3A0
Request Method: GET
Status Code: 200 
Remote Address: 172.217.12.206:443
Referrer Policy: strict-origin-when-cross-origin

Does anyone know a way to download this docs.google.com data, preferably using Python and one of its libraries?

Thank you

question from:https://stackoverflow.com/questions/65925616/how-to-use-python-to-extract-download-and-web-scrape-a-doc-google-com-spreadshee

1 Answer

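import requests

# Request URL copied from the browser's Network tab.
url = 'https://docs.google.com/spreadsheets/d/1GJ6CvZ_mgtjdrUyo3h2dU3YvWOahbYvPHpGLgovyhtI/gviz/tq?usp=sharing&tqx=reqId%3A0'

r = requests.get(url)
r.raise_for_status()  # stop here if the request failed

# Save the raw response body to a file.
with open('google_docs.txt', 'wb') as f:
    f.write(r.content)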
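
The saved file holds the raw response, not a ready-to-use table. As a rough sketch of a next step, the snippet below turns such a response into column headers and rows. It assumes the body is a JSON object wrapped in a JavaScript call such as google.visualization.Query.setResponse(...), with the data under table.cols and table.rows; that layout is an assumption about this endpoint and should be checked against the actual file. The names parse_gviz and SAMPLE are hypothetical, and SAMPLE is made-up data for illustration only.

```python
import json

# Made-up response body in the assumed gviz format (not real data from the sheet).
SAMPLE = (
    '/*O_o*/\n'
    'google.visualization.Query.setResponse({"version":"0.6","reqId":"0",'
    '"status":"ok","table":{"cols":[{"id":"A","label":"Name","type":"string"},'
    '{"id":"B","label":"Score","type":"number"}],'
    '"rows":[{"c":[{"v":"Ann"},{"v":3.0}]},{"c":[{"v":"Bob"},null]}]}});'
)


def parse_gviz(text):
    """Return (headers, rows) from a response body in the assumed gviz format."""
    # Keep only the JSON object between the first "{" and the last "}".
    start = text.index("{")
    end = text.rindex("}") + 1
    payload = json.loads(text[start:end])

    table = payload["table"]
    headers = [col.get("label", "") for col in table["cols"]]

    rows = []
    for row in table["rows"]:
        # An empty cell may be null rather than an object with a "v" key.
        rows.append([cell.get("v") if cell else None for cell in row["c"]])
    return headers, rows


if __name__ == "__main__":
    headers, rows = parse_gviz(SAMPLE)
    print(headers)
    for row in rows:
        print(row)
```

With the response from the answer's code, the call would be parse_gviz(r.text). If the real body turns out to be shaped differently, only the key lookups inside the function should need adjusting.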
