I have written a crawler script that sends a POST request to "sci-hub.do", and I have deployed it on Heroku. But when it tries to send a POST or GET request, I mostly get a 403 Forbidden response.
The strange thing is that this only happens when the script runs on Heroku; when I run it on my PC, everything works and I get a 200 status code.
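For context, the request logic is essentially the following minimal sketch (the URL path and payload fields are simplified placeholders, not my exact values):

    import requests

    # Minimal sketch of the crawler's request (payload simplified).
    # Locally this returns 200; from a Heroku dyno it returns 403.
    url = "https://sci-hub.do/"
    payload = {"request": "10.1000/example-doi"}  # example DOI, not the real input

    response = requests.post(url, data=payload)
    print(response.status_code)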
I have tried using a session, but it did not work.
I also checked the website's robots.txt and set the User-Agent header to "Twitterbot/1.0", but it still failed.
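Roughly, that attempt looked like this (a sketch; the header value is the one mentioned above):

    import requests

    # Retry using a session and the Twitterbot User-Agent,
    # since robots.txt appears to allow that agent.
    session = requests.Session()
    session.headers.update({"User-Agent": "Twitterbot/1.0"})

    response = session.get("https://sci-hub.do/")
    print(response.status_code)  # still 403 when run from Heroku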
What am I doing wrong? Why does this only happen when the script runs on Heroku?
I'm fairly sure the web server is detecting my script as a crawler bot and blocking it. But why does it still do so even after I add a proper "User-Agent"?
question from:
https://stackoverflow.com/questions/65925003/403-forbidden-error-when-crawling-a-website-using-python-requests-on-heroku