How to correctly define in response.css and yield in scrapy

Question

asked Oct 7, 2021 in Technique by 深蓝 (71.8m points)

I am new to Scrapy, and there is one thing I have been trying for two days without success. I am practicing by extracting information about the football players listed at https://sofifa.com/. I adapted the code sample from https://docs.scrapy.org/ and edited it as shown below. The field I am trying to extract is OVA.

Does anyone know how I should correctly define the "span.something..." selector in the code below?

Many thanks, James

import scrapy


class ToScrapeCSSSpider(scrapy.Spider):
    name = "player-css"
    start_urls = [
        'https://sofifa.com/players?type=all&tm%5B0%5D=1&r=210024&set=true',
    ]

    def parse(self, response):
        for playerInfor in response.css("div.card"):
            yield {
                'OVA': playerInfor.css("span.bp3-tag p::bp3-tag p").extract()
            }

        next_page_url = response.css("li.next > a::attr(href)").extract_first()
        if next_page_url is not None:
            yield scrapy.Request(response.urljoin(next_page_url))

question from:https://stackoverflow.com/questions/65944731/how-to-correctly-define-in-response-css-and-yield-in-scrapy

1 Answer

深蓝 · Answer 1 · 2021-10-06T18:55:50+0000

Use the CSS selector response.css("tbody.list") instead of response.css("div.card").

With response.css("tbody.list") the data is easy to extract, but when I used response.css("div.card") the result included empty lists alongside the expected output.

for playerInfor in response.css("tbody.list"):
    print(playerInfor.css('td.col.col-oa.col-sort span::text').getall())

Output:

['87', '84', '84', '82', '80', '80', '80', '80', '79', '79', '79', '79', '79', '78', '77', '77', '77', '76', '76', '76', '75', '75', '74', '74', '73', '72', '72', '70', '62', '62', '60', '58', '56']

Another approach:

def parse(self, response):
    mydata = response.css('tbody.list td.col.col-oa.col-sort span::text').extract()
    yield {
        "OVA": mydata
    }

Output of mydata:

['87', '84', '84', '82', '80', '80', '80', '80', '79', '79', '79', '79', '79', '78', '77', '77', '77', '76', '76', '76', '75', '75', '74', '74', '73', '72', '72', '70', '62', '62', '60', '58', '56']
