Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Scrape multiple pages using selenium

I'm attempting to webscrape https://stats.nba.com/players/traditional/?sort=PTS&dir=-1 .

I know I can webscrape the first page.

My problem is this: after I click the button for the next list of players, how do I scrape the table again?

This code gives me an error after the first button click.

The link of the page does not change after the button click.

The table changes.

So after the button click, the goal is to scrape the table again for more information.

The line with the button click throws this error:

StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
  (Session info: chrome=78.0.3904.108)
  (Driver info: chromedriver=2.41.578737 (49da6702b16031c40d63e5618de03a32ff6c197e),platform=Windows NT 10.0.18362 x86_64)

import time
from selenium import webdriver

# Assumption: nba_players holds the URL given at the top of the question.
nba_players = "https://stats.nba.com/players/traditional/?sort=PTS&dir=-1"
driver = webdriver.Chrome(executable_path="./chromedriver/windows/chromedriver.exe")
driver.get(nba_players)
# One element per table row on the current page.
player = driver.find_elements_by_xpath('/html/body/main/div[2]/div/div[2]/div/div/nba-stat-table/div[2]/div[1]/table/tbody/tr')
new_split = []
player_stats = []

for i in player:
    player_stats.append(i.text.split('\n'))
    for z in player_stats:
        new_split.append(z[2].split(' '))
#     button = driver.find_element_by_xpath('/html/body/main/div[2]/div/div[2]/div/div/nba-stat-table/div[1]/div/div/a[2]')
#     button.click()
#     time.sleep(120)
  asked by lydol (translated from Stack Overflow)

1 Answer

This will indeed fail: once you click the button and the page content is replaced, the elements you located earlier are no longer attached to the document, so they become stale.

A solution is simply to locate your elements again each time you need them.

Something like this:

for ind in range(len(player)):
    # Look the rows up again on every pass so no stale reference is reused.
    player = driver.find_elements_by_xpath('/html/body/main/div[2]/div/div[2]/div/div/nba-stat-table/div[2]/div[1]/table/tbody/tr')
    i = player[ind]
    player_stats.append(i.text.split('\n'))
    # Split only the row just added; looping over all of player_stats here
    # would append the earlier rows again on every pass.
    new_split.append(player_stats[-1][2].split(' '))
#     button = driver.find_element_by_xpath('/html/body/main/div[2]/div/div[2]/div/div/nba-stat-table/div[1]/div/div/a[2]')
#     button.click()
#     time.sleep(120)
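
To extend the same idea across several pages, here is a minimal sketch rather than a tested scraper for this site. It re-locates the rows after every click and copies their text into plain Python lists straight away, so no element reference is kept across a page change. The helper names read_rows and scrape_pages and the parameters max_pages, pause and timeout are made up for illustration; the two XPaths are the ones from the question, and "xpath" is the string value of Selenium's By.XPATH. It assumes a driver created as in the question.

```python
import time

# XPaths copied from the question.
ROWS = '/html/body/main/div[2]/div/div[2]/div/div/nba-stat-table/div[2]/div[1]/table/tbody/tr'
NEXT = '/html/body/main/div[2]/div/div[2]/div/div/nba-stat-table/div[1]/div/div/a[2]'


def read_rows(driver):
    """Locate the rows afresh and copy their text out immediately."""
    return [row.text.split('\n') for row in driver.find_elements("xpath", ROWS)]


def scrape_pages(driver, max_pages, pause=2.0, timeout=30.0):
    """Collect the row text from up to max_pages pages of the table."""
    all_rows = []
    for page in range(max_pages):
        rows = read_rows(driver)
        all_rows.extend(rows)
        if page == max_pages - 1:
            break
        # Re-locate the button too; the old reference may be stale.
        driver.find_element("xpath", NEXT).click()
        # Poll until the table content differs from the page just read.
        deadline = time.monotonic() + timeout
        while read_rows(driver) == rows:
            if time.monotonic() > deadline:
                # Assumption: an unchanged table means there is no next page.
                return all_rows
            time.sleep(pause)
    return all_rows
```

A call such as scrape_pages(driver, max_pages=3) would return one list of strings per table row. A row can still go stale if the table re-renders while it is being read, so in practice it may be worth catching selenium.common.exceptions.StaleElementReferenceException around read_rows and retrying.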
