Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


pandas - Python: web scraping a table into a DataFrame

I'm trying to take this table and put it into a DataFrame, but I can't figure out how to do it. So far I've been combining things I learned in class with an answer that was posted on this forum, but I still can't get it to work. Can anyone help me and explain what they did? My code is below:

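import requests
import pandas as pd
from bs4 import BeautifulSoup

page = requests.get("https://www.sports-reference.com/cbb/schools/duke/2021-schedule.html")
# Parse the HTML body of the response, not the Response object itself
soup = BeautifulSoup(page.content, "html.parser")
# "now_sortable" is probably added by the page's JavaScript, so it may be missing from the
# raw HTML; match on the "stats_table" class alone instead of the full class string
table = soup.find("table", class_="stats_table")
table_rows = table.find_all('tr')

l = []
for tr in table_rows:
    td = tr.find_all('td')
    if not td:
        continue  # header rows contain only <th> cells
    # the first cell (game number) may be a <th>, so collect <th> and <td> cells in order
    row = [cell.text for cell in tr.find_all(['th', 'td'])]
    l.append(row)
#test columns: one letter per cell in the first data row
df = pd.DataFrame(l, columns=[chr(ord('A') + i) for i in range(len(l[0]))])
print(df)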
Question from: https://stackoverflow.com/questions/65895241/python-webscraping-a-table-into-a-dataframe
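For comparison, the BeautifulSoup route from the question can also be made to work. The sketch below runs offline on a small, hypothetical HTML snippet (not the real page). It assumes that the live table's game number sits in a <th> and that "now_sortable" is added by JavaScript rather than served in the HTML; both are assumptions worth checking against the actual page source. It shows why matching the full class string returns None, and how to turn the rows into a DataFrame using the header cells as column names.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup: the table as requests would receive it,
# i.e. before any JavaScript could add the "now_sortable" class.
SAMPLE_HTML = """
<table class="sortable stats_table">
  <thead>
    <tr><th>G</th><th>Date</th><th>Opponent</th></tr>
  </thead>
  <tbody>
    <tr><th>1</th><td>Sat, Nov 28, 2020</td><td>Coppin State</td></tr>
    <tr><th>2</th><td>Tue, Dec 1, 2020</td><td>Michigan State (8)</td></tr>
  </tbody>
</table>
"""


def table_to_dataframe(table):
    """Turn a bs4 <table> Tag into a DataFrame, using the <thead> cells as column names."""
    header = [th.get_text(strip=True) for th in table.find("thead").find_all("th")]
    rows = []
    for tr in table.find("tbody").find_all("tr"):
        # Skip rows without any <td> (e.g. repeated header rows inside <tbody>)
        if not tr.find_all("td"):
            continue
        # find_all with a list of names keeps <th> and <td> cells in document order
        rows.append([cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])])
    return pd.DataFrame(rows, columns=header)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Exact match on the whole class string: None, because "now_sortable" is absent
strict = soup.find("table", attrs={"class": "sortable stats_table now_sortable"})

# Token match: any <table> whose class list contains "stats_table"
table = soup.find("table", class_="stats_table")
df = table_to_dataframe(table)

print(strict)
print(df)
```

Taking the column names from the table's own header row avoids guessing how many columns there are, which is what the hard-coded A to O list in the question has to do.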

Fight dragons for too long and you become a dragon yourself; gaze into the abyss for too long and the abyss gazes back…

1 Answer


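My solution uses pandas.read_html, which parses the tables it finds on the page into a list of DataFrames; the schedule is the second one, index 1. The columns are then renamed to the letters A through O: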

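import pandas as pd

# read_html needs lxml (or BeautifulSoup4 + html5lib) installed and returns a list of DataFrames
df = pd.read_html("https://www.sports-reference.com/cbb/schools/duke/2021-schedule.html")[1]

# Generate a list of the new column names: A, B, ..., O (15 letters)
new_columns = [chr(x) for x in range(ord('A'), ord('O')+1)]
# Map each original column name to its new letter
columns = dict(zip(df.columns, new_columns))
df.rename(columns=columns, inplace=True)
print(df)

Output: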
     A                  B       C    D    E                     F        G    H     I     J   K    L    M    N                                     O
0    1  Sat, Nov 28, 2020   2:00p  REG  NaN          Coppin State     MEAC    W  81.0  71.0 NaN  1.0  0.0  W 1                Cameron Indoor Stadium
1    2   Tue, Dec 1, 2020   7:30p  REG  NaN    Michigan State (8)  Big Ten    L  69.0  75.0 NaN  1.0  1.0  L 1                Cameron Indoor Stadium
2    3   Fri, Dec 4, 2020   7:00p  REG  NaN            Bellarmine    A-Sun    W  76.0  54.0 NaN  2.0  1.0  W 1                Cameron Indoor Stadium
3    4   Tue, Dec 8, 2020   9:30p  REG  NaN          Illinois (6)  Big Ten    L  68.0  83.0 NaN  2.0  2.0  L 1                Cameron Indoor Stadium
4    5  Wed, Dec 16, 2020   9:00p  REG    @            Notre Dame      ACC    W  75.0  65.0 NaN  3.0  2.0  W 1  Purcell Pavilion at the Joyce Center
5    6   Wed, Jan 6, 2021   8:30p  REG  NaN        Boston College      ACC    W  83.0  82.0 NaN  4.0  2.0  W 2                Cameron Indoor Stadium
6    7   Sat, Jan 9, 2021  12:00p  REG  NaN           Wake Forest      ACC    W  79.0  68.0 NaN  5.0  2.0  W 3                Cameron Indoor Stadium
7    8  Tue, Jan 12, 2021   7:00p  REG    @    Virginia Tech (20)      ACC    L  67.0  74.0 NaN  5.0  3.0  L 1                      Cassell Coliseum
8    9  Tue, Jan 19, 2021   9:00p  REG    @            Pittsburgh      ACC    L  73.0  79.0 NaN  5.0  4.0  L 2                Petersen Events Center
9   10  Sat, Jan 23, 2021   4:00p  REG    @            Louisville      ACC    L  65.0  70.0 NaN  5.0  5.0  L 3                       KFC Yum! Center
10  11  Tue, Jan 26, 2021   9:00p  REG  NaN          Georgia Tech      ACC  NaN   NaN   NaN NaN  NaN  NaN  NaN                                   NaN
11  12  Sat, Jan 30, 2021  12:00p  REG  NaN          Clemson (20)      ACC  NaN   NaN   NaN NaN  NaN  NaN  NaN                                   NaN
12  13   Mon, Feb 1, 2021   7:00p  REG    @            Miami (FL)      ACC  NaN   NaN   NaN NaN  NaN  NaN  NaN                                   NaN
13  14   Sat, Feb 6, 2021   6:00p  REG  NaN        North Carolina      ACC  NaN   NaN   NaN NaN  NaN  NaN  NaN                                   NaN
14  15   Tue, Feb 9, 2021   4:00p  REG  NaN            Notre Dame      ACC  NaN   NaN   NaN NaN  NaN  NaN  NaN                                   NaN
15  16  Sat, Feb 13, 2021   4:00p  REG    @  North Carolina State      ACC  NaN   NaN   NaN NaN  NaN  NaN  NaN                                   NaN
16  17  Wed, Feb 17, 2021   8:30p  REG    @           Wake Forest      ACC  NaN   NaN   NaN NaN  NaN  NaN  NaN                                   NaN
17  18  Sat, Feb 20, 2021     NaN  REG  NaN         Virginia (13)      ACC  NaN   NaN   NaN NaN  NaN  NaN  NaN                                   NaN
18  19  Mon, Feb 22, 2021   7:00p  REG  NaN              Syracuse      ACC  NaN   NaN   NaN NaN  NaN  NaN  NaN                                   NaN
19  20  Sat, Feb 27, 2021   6:00p  REG  NaN            Louisville      ACC  NaN   NaN   NaN NaN  NaN  NaN  NaN                                   NaN
20  21   Tue, Mar 2, 2021   7:00p  REG    @          Georgia Tech      ACC  NaN   NaN   NaN NaN  NaN  NaN  NaN                                   NaN
21  22   Sat, Mar 6, 2021   6:00p  REG    @        North Carolina      ACC  NaN   NaN   NaN NaN  NaN  NaN  NaN                                   NaN
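The answer hard-codes 15 letters (A through O) and relies on zip, which silently stops at the shorter sequence, so a table with more than 15 columns would keep some of its original names. A minimal sketch, using a hypothetical helper letter_labels and a small made-up DataFrame standing in for the read_html result, that sizes the labels from the DataFrame itself and assigns them directly:

```python
import string

import pandas as pd


def letter_labels(n):
    """Hypothetical helper: n spreadsheet-style labels A..Z, AA, AB, ..."""
    labels = []
    for i in range(1, n + 1):
        k = i
        label = ""
        while k:
            # Bijective base-26: subtract one first so that 26 maps to "Z", 27 to "AA"
            k, rem = divmod(k - 1, 26)
            label = string.ascii_uppercase[rem] + label
        labels.append(label)
    return labels


# Made-up three-column frame standing in for the table returned by read_html
df = pd.DataFrame({
    "G": [1, 2],
    "Date": ["Sat, Nov 28, 2020", "Tue, Dec 1, 2020"],
    "Opponent": ["Coppin State", "Michigan State (8)"],
})

# Assigning a list raises ValueError if its length differs from the column count
df.columns = letter_labels(len(df.columns))
print(df)
```

Assigning to df.columns instead of calling rename with a partial mapping means a mismatch fails loudly rather than leaving a mix of letters and original names.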
