Setting the Host header for redirected URLs with Python requests module

Question

asked Jan 27, 2021 in Technique[技术] by 深蓝 (71.8m points)

I'm working on a web scraping project in Python. I get a daily email from a service that has links in it. A typical link looks like:

http://clicks.serviceprovider.com/track/click/12345/www.serviceprovider.com?p=eyJzI...<snip>...JdfSJ9

In a browser, I can see that the server redirects from http://clicks.serviceprovider.com to https://www.serviceprovider.com?pageId=12345. Naturally, I want to scrape pageId 12345 with my Python code.

If I just call requests.get(url), the server never responds. I suspect, but don't know for sure, that this is because requests isn't including a Host header.

If I set headers={'Host': 'clicks.serviceprovider.com'}, I get an HTTP 403 error. What I think is happening, but cannot demonstrate, is this: requests sends the original http GET request and receives the HTTP 301 redirect, but when it issues the GET for the redirected https:// page, it still sends the Host header for clicks.serviceprovider.com instead of www.serviceprovider.com from the redirected URL.

How can I tell requests to change the Host header with the redirect?
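One way to test that suspicion is to turn off automatic redirect handling and follow each hop by hand, without passing an explicit Host header, so that the header is derived from whichever URL is being requested at that moment. The sketch below is only an illustration under assumptions: follow_redirects, max_hops, and the returned hops list are hypothetical names, not part of requests. It relies on the allow_redirects=False and timeout arguments of Session.get, and the timeout also keeps a silent server from hanging the script.

```python
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def follow_redirects(url, session=None, max_hops=10, timeout=10):
    """Follow redirects one hop at a time (hypothetical helper).

    No Host header is set here, so each request carries the host
    taken from the URL it is actually sent to.
    """
    if session is None:
        # Imported lazily so the helper can also be driven by a stub session.
        import requests
        session = requests.Session()

    hops = []  # (status code, requested URL, host implied by that URL)
    for _ in range(max_hops):
        response = session.get(url, allow_redirects=False, timeout=timeout)
        hops.append((response.status_code, url, urlsplit(url).netloc))

        location = response.headers.get("Location")
        if response.status_code not in REDIRECT_CODES or not location:
            return response, hops

        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)

    raise RuntimeError(f"gave up after {max_hops} redirects")
```

Calling follow_redirects(link) with the tracking link from the email returns the final response together with the list of hops, which shows the status code of each step and the host that each request was addressed to. If the chain completes this way but fails with a fixed headers={'Host': ...}, that would support the idea that the hard-coded header is being reused on the redirected request.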
