Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Selenium Force Entire Page To Load

I'm using Selenium with Python to scrape this page: https://www.vexforum.com/u?period=all
I want to get the data for all 40,000 or so users on this forum, but the page only loads 50 initially, and more members load each time you scroll down. Is there any way to request the entire page up front, with all 40k members? Thanks for any help you can provide!

Question from: https://stackoverflow.com/questions/65838293/selenium-force-entire-page-to-load


1 Answer

You should use requests instead of Selenium (if the site's robots.txt allows that) and page through the JSON endpoint directly:

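import requests

# Query values are passed via params, so the URL carries no query string.
URL = 'https://www.vexforum.com/directory_items'

# Headers copied from a browser request; they do not change between pages.
headers = {
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Cookie': '_ga=GA1.2.439277064.1611329580; _gat=1; _gid=GA1.2.1557861689.1611329580',
    'Referer': 'https://www.vexforum.com/u?period=all',
    'Host': 'www.vexforum.com',
    'Accept-Language': 'it-it',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.1 Safari/605.1.15',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
    'X-CSRF-Token': 'undefined',
    'Discourse-Present': 'true',
    'X-Requested-With': 'XMLHttpRequest',
}

# Assumption: pages are 0-based. The original answer started at 2,
# which would skip the earlier pages.
count = 0
while True:
    params = {
        'order': 'likes_received',
        'page': str(count),
        'period': 'all'
    }
    try:
        r = requests.get(URL, headers=headers, params=params, timeout=30)
        r.raise_for_status()
        data = r.json()
    except (requests.RequestException, ValueError) as exc:
        # Stop on errors instead of silently retrying the same page forever.
        print(f'Stopping at page {count}: {exc}')
        break

    # Assumption: users are listed under 'directory_items'; an empty list means no more pages.
    if not data.get('directory_items'):
        break

    print(data)
    print('\n\n\n')
    print('___________________________________________________')
    print('\n\n\n')
    count += 1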
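
The robots.txt condition mentioned above can be checked with the standard library's urllib.robotparser before any scraping starts. This is a minimal sketch that assumes the robots.txt text has already been downloaded; the function name allowed_by_robots and the sample rules are hypothetical and are not the forum's real rules.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access happens here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules for illustration only.
SAMPLE_RULES = """\
User-agent: *
Disallow: /admin/
"""

print(allowed_by_robots(SAMPLE_RULES, "https://example.com/directory_items?page=0"))
print(allowed_by_robots(SAMPLE_RULES, "https://example.com/admin/users"))
```

For a real run, the text would come from the site's own /robots.txt (for example via requests.get(...).text), and the user agent string should match the one sent in the request headers.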

Now you only have to parse the JSON response and grab the information you want.
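
As a rough illustration of that parsing step, the sketch below pulls a couple of fields out of one page of results. It assumes each entry under 'directory_items' carries a nested 'user' object with a 'username' and a top-level 'likes_received' count; that shape, the helper name extract_users, and the sample payload are assumptions, so inspect a real response and adjust the keys before relying on it.

```python
from typing import Any, Dict, List


def extract_users(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten one page of the JSON response into simple user records."""
    users = []
    # Assumed key names; missing keys yield None instead of raising.
    for item in payload.get("directory_items", []):
        user = item.get("user", {})
        users.append({
            "username": user.get("username"),
            "likes_received": item.get("likes_received"),
        })
    return users


# Made-up payload for illustration; not real forum data.
sample = {
    "directory_items": [
        {"likes_received": 12, "user": {"username": "example_user"}},
    ]
}
print(extract_users(sample))
# prints [{'username': 'example_user', 'likes_received': 12}]
```

Inside the request loop, the records from each page could be appended to one list and written out once the loop ends.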

