Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

javascript - rvest html_table() error max(p) returning -Inf

I'm trying to scrape a table from the web (here https://www.cryptoslam.io/nba-top-shot/marketplace).

From my research, the closest I have come is with the rvest library and its html_table() function. In fact, I was able to download the "FIFA World Cup Record" table from https://en.wikipedia.org/wiki/Brazil_national_football_team using this code:

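library(rvest)  # also provides the %>% pipe used below

webpage_url <- "https://en.wikipedia.org/wiki/Brazil_national_football_team"
webpage <- xml2::read_html(webpage_url)

# List every <table> on the page to find the index of the one wanted
tbls <- html_nodes(webpage, "table")
head(tbls)

# The "FIFA World Cup Record" table is the sixth one
tbls_ls <- webpage %>%
  html_nodes("table") %>%
  .[[6]] %>%
  html_table(fill = TRUE)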

Note that I have the libraries loaded with library(xml2) and library(rvest). I am then using essentially the same code here:

webpage_url <- "https://www.cryptoslam.io/nba-top-shot/marketplace"
webpage <- xml2::read_html(webpage_url)
tbls <- html_nodes(webpage, "table")
head(tbls)
tbls_ls <- webpage %>%
  html_nodes("table") %>%
  .[[1]] %>%
  html_table(fill = TRUE)

but I get this error:

Error in matrix(NA_character_, nrow = n, ncol = maxp) : 
  invalid 'ncol' value (too large or NA)
In addition: Warning messages:
1: In max(p) : no non-missing arguments to max; returning -Inf
2: In matrix(NA_character_, nrow = n, ncol = maxp) :
  NAs introduced by coercion to integer range

I have not been able to find any discussion of this error anywhere else. One difference between the two tables is that the second one, the one that fails, contains a thead tag. I have quite limited knowledge of HTML, so I may be missing other important differences between the two table implementations.

Question from: https://stackoverflow.com/questions/66054630/rvest-html-table-error-maxp-returning-inf


1 Answer


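The table on that page is most likely filled in by JavaScript after the page loads, so read_html() only sees an empty table. One approach is to render the page in a real browser with RSelenium: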

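library(RSelenium)
library(rvest)  # requires xml2, no need to load it separately

webpage_url <- "https://www.cryptoslam.io/nba-top-shot/marketplace"

# Start a Selenium-driven Chrome session; chromever must match a locally available driver
driver <- rsDriver(browser = "chrome", port = 4234L, chromever = "87.0.4280.87")
client <- driver[["client"]]

# Load the page in the browser so its JavaScript runs, then grab the rendered HTML
client$navigate(webpage_url)
page_source <- client$getPageSource()[[1]]

# Parse the rendered HTML and keep the first table
result <- read_html(page_source) %>%
  html_nodes("table") %>%
  html_table() %>%
  `[[`(1)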

head(result)
            Listed  Rank                   Crypto        Set                  Team Play Category   SN# Current Price          Owner
1 NA 5 minutes ago 10324     2020-21 Bradley Beal   Base Set    Washington Wizards       Handles 10691   (10.00 USD)        P1BenEe
2 NA 5 minutes ago  1096     2019-20 Kelly Olynyk The Finals            Miami Heat         Layup   360  (180.00 USD) Top_Shot3point
3 NA 5 minutes ago  3138      2019-20 Alex Caruso   Base Set    Los Angeles Lakers         Block   679     67.00 USD CaptainThunder
4 NA 5 minutes ago  3586  2020-21 Kelly Oubre Jr.   Base Set Golden State Warriors          Dunk  3583      5.00 USD       dddd9999
5 NA 5 minutes ago  3318  2020-21 Bismack Biyombo   Base Set     Charlotte Hornets         Layup  3315      7.00 USD       ectoasty
6 NA 5 minutes ago  4940 2020-21 DeMarcus Cousins   Base Set       Houston Rockets     3 Pointer  4937    (3.00 USD) StoneColdBroke
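
To check whether a table is empty in the static HTML before reaching for a browser, one option is to count its tr elements. The sketch below is only an illustration: both HTML strings are made-up stand-ins (not the actual markup of the page in question), and count_table_rows is a hypothetical helper name.

```r
library(rvest)

# Hypothetical markup: a table whose body stays empty until JavaScript fills it
static_html <- "<table><thead></thead><tbody></tbody></table>"

# Hypothetical markup: the same table after a browser has rendered it
rendered_html <- "<table>
  <thead><tr><th>Rank</th><th>Team</th></tr></thead>
  <tbody><tr><td>1096</td><td>Miami Heat</td></tr></tbody>
</table>"

# Count the <tr> elements inside the first <table> of an HTML string
count_table_rows <- function(html) {
  tbl <- html_node(read_html(html), "table")
  length(html_nodes(tbl, "tr"))
}

static_rows <- count_table_rows(static_html)      # no rows to parse
rendered_rows <- count_table_rows(rendered_html)  # header row plus one data row

print(c(static = static_rows, rendered = rendered_rows))
```

If the same count on the document returned by read_html(webpage_url) is 0, the rows are probably added by JavaScript, and html_table() has nothing to work with until the page is rendered in a browser.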
