Web scraping: wait for the page to load before getting data with requests.get in Python 3

Question

asked Oct 17, 2021 in Technique by 深蓝 (71.8m points)

I have a page whose source I need to parse with BS4, but the middle of the page takes about 1 second (maybe less) to load its content, and requests.get captures the page source before that section loads. How can I wait a second before getting the data?

import requests
from bs4 import BeautifulSoup

r = requests.get(URL + self.search, headers=USER_AGENT, timeout=5)
soup = BeautifulSoup(r.content, 'html.parser')
a = soup.find_all('section', 'wrapper')

The page

<section class="wrapper" id="resultado_busca">

1 Answer

深蓝 · Answer 1 · 2021-10-16T23:10:10+0000

This doesn't look like a waiting problem. It looks like the element is created by JavaScript, and requests cannot handle elements that JavaScript generates dynamically, because it only downloads the HTML and never runs scripts. One suggestion is to use Selenium together with PhantomJS to get the page source, and then use BeautifulSoup for the parsing. The code below does exactly that:

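from bs4 import BeautifulSoup
from selenium import webdriver

url = "http://legendas.tv/busca/walking%20dead%20s03e02"

# PhantomJS is a headless browser, so the page's JavaScript actually runs.
# It needs a PhantomJS binary on PATH and an older Selenium release that
# still ships this driver (it was dropped in Selenium 4).
browser = webdriver.PhantomJS()
try:
    browser.get(url)
    html = browser.page_source  # HTML after scripts have modified the DOM
finally:
    browser.quit()  # always shut the browser process down

soup = BeautifulSoup(html, 'lxml')  # the 'lxml' parser needs the lxml package
a = soup.find('section', 'wrapper')  # first <section> with class "wrapper"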
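
Reading page_source gives a snapshot of the DOM at that moment, so content that a script injects slightly later can still be missing. A minimal sketch of one way to handle that, assuming the caller can say what "ready" looks like: poll until a check passes or a timeout expires. The names wait_for_html, get_html and is_ready are hypothetical helpers for illustration, not part of Selenium or requests.

```python
import time


def wait_for_html(get_html, is_ready, timeout=5.0, interval=0.2):
    """Call get_html() repeatedly until is_ready(html) is true, then return html.

    Raises TimeoutError if the check has not passed within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = get_html()              # e.g. lambda: browser.page_source
        if is_ready(html):
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError(f"content not ready after {timeout} seconds")
        time.sleep(interval)           # back off briefly before the next try
```

With a browser session still open, a call might look like wait_for_html(lambda: browser.page_source, my_check), where my_check is a placeholder for a function that looks for something that only appears once the results have loaded. Selenium also ships its own WebDriverWait and expected_conditions helpers for the same purpose, which are likely the better fit when waiting on a specific element.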

Also, there's no need to use .findAll (find_all) if you are looking for only one element; .find returns the first match directly.
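
To check whether the content really is generated by JavaScript rather than just arriving slowly, it can help to inspect the raw HTML that requests receives. The sketch below is one possible check, assuming the target section tag is present in the static HTML but empty until a script fills it. It uses the standard library's html.parser instead of BS4 only to stay dependency-free; section_has_content and _SectionProbe are hypothetical names.

```python
from html.parser import HTMLParser


class _SectionProbe(HTMLParser):
    """Records whether the <section> with a given id contains anything."""

    def __init__(self, section_id):
        super().__init__()
        self.section_id = section_id
        self.depth = 0            # 0 means we are outside the target section
        self.found = False        # the target section was seen
        self.has_content = False  # a child tag or non-blank text was seen

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            self.has_content = True       # any tag inside the target counts
            if tag == "section":
                self.depth += 1           # track nested <section> tags
        elif tag == "section" and dict(attrs).get("id") == self.section_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag == "section":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.has_content = True       # non-whitespace text inside the target


def section_has_content(html, section_id):
    """Return True if the section with this id exists and is not empty."""
    probe = _SectionProbe(section_id)
    probe.feed(html)
    probe.close()
    return probe.found and probe.has_content
```

If section_has_content(r.text, "resultado_busca") is False for the response from requests.get, no amount of waiting on the requests side will help, since the section is filled in by a script that requests never runs.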
