Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

How to correctly define in response.css and yield in scrapy

I am new to Scrapy, and there is one thing I have been trying for two days without success. I am practicing extracting information about the football players listed on https://sofifa.com/. I adapted the code sample from https://docs.scrapy.org/ and edited it as shown below. The information I am trying to extract is OVA.

Does anyone know how I should correctly define the "span.something..." element in the code below?

Many thanks, James

import scrapy
class ToScrapeCSSSpider(scrapy.Spider):
    name = "player-css"
    start_urls = [
        'https://sofifa.com/players?type=all&tm%5B0%5D=1&r=210024&set=true',
    ]

    def parse(self, response):
        for playerInfor in response.css("div.card"):
            yield {
                'OVA': playerInfor.css("span.bp3-tag p::bp3-tag p").extract()
            }

        next_page_url = response.css("li.next > a::attr(href)").extract_first()
        if next_page_url is not None:
            yield scrapy.Request(response.urljoin(next_page_url))
Question from: https://stackoverflow.com/questions/65944731/how-to-correctly-define-in-response-css-and-yield-in-scrapy


1 Answer


Use the CSS selector response.css("tbody.list") instead of response.css("div.card").

With response.css("tbody.list") the data is easy to extract, but when I used response.css("div.card") the result contained some empty lists along with the expected output.

for playerInfor in response.css("tbody.list"):
    print(playerInfor.css('td.col.col-oa.col-sort span::text').getall())

Output:

['87', '84', '84', '82', '80', '80', '80', '80', '79', '79', '79', '79', '79', '78', '77', '77', '77', '76', '76', '76', '75', '75', '74', '74', '73', '72', '72', '70', '62', '62', '60', '58', '56']

Another approach:

def parse(self, response):
    mydata = response.css('tbody.list td.col.col-oa.col-sort span::text').extract()
    yield {
        "OVA": mydata
    }

Output of mydata:

['87', '84', '84', '82', '80', '80', '80', '80', '79', '79', '79', '79', '79', '78', '77', '77', '77', '76', '76', '76', '75', '75', '74', '74', '73', '72', '72', '70', '62', '62', '60', '58', '56']

