Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - How to extract values in a JSON file into separate columns in a dataframe row



1 Answer

  • Each record under 'metrics' has a 'values' column that holds a list of dicts
    • To extract 'value', expand the lists with .explode() so that each dict sits on its own row.
    • After exploding, 'values' is a column of dicts, which is converted into a dataframe and joined back.
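As a minimal sketch of those two steps in isolation, the frame below is made-up data shaped like the normalized 'metrics' records (the real file's fields are not shown, so the column contents are an assumption):

```python
import pandas as pd

# Hypothetical miniature of the 'metrics' records: each row holds a list of dicts.
df = pd.DataFrame({
    "type": ["steps", "pace"],
    "values": [
        [{"value": 13.0}, {"value": 11.0}],
        [{"value": 8.65}],
    ],
})

# One dict per row; reset the index so the later join aligns row by row.
dfe = df.explode("values").reset_index(drop=True)

# Turn the column of dicts into its own dataframe and join it back.
expanded = pd.DataFrame(dfe.pop("values").tolist())
dfj = dfe.join(expanded)

print(dfj)
```

Resetting the index matters here: without it, the exploded rows keep their original (repeated) index labels and the join would no longer pair each dict with its own row.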
import pandas as pd
import json
from pathlib import Path

# path to JSON file
p = Path('test.json')

# load the JSON file into a python object
with p.open('r', encoding='utf-8') as f:
    data = json.load(f)

# convert the metrics key into a dataframe
df = pd.json_normalize(data, 'metrics', ['id', 'start_epoch_ms', 'end_epoch_ms'])

# explode the values column
dfe = df.explode('values').reset_index(drop=True)

# convert the column of dicts into a dataframe and join it back to dfe
dfj = dfe.join(pd.DataFrame(dfe.pop('values').values.tolist()), rsuffix='_values')

# groupby the type column and then aggregate the value column into a list
dfg = dfj.groupby('type')['value'].agg(list).reset_index(name='values_list')

# merge the desired list of values back to df
df = df.merge(dfg, on='type').drop(columns=['values'])

# select the final types
desired = df.loc[df['type'].isin(['steps', 'speed', 'pace'])]

# to separate each value in the list to a separate column
final = pd.DataFrame(desired.values_list.to_list(), index=desired.type.to_list())

# display(final.iloc[:, :5])
               0          1         2          3         4        ...
steps  13.000000  11.000000  6.000000  13.000000  5.000000        ...
speed   0.000000   0.000000  0.000000   0.000000  0.000000        ...
pace    8.651985   8.651985  6.542049   6.542049  6.173452        ...

# aggregate calculations
final.agg({'steps': 'sum', 'speed': 'mean', 'pace': 'mean'}, axis=1)

steps    2676.000000
speed       9.657251
pace        5.544723
dtype: float64
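
The groupby, reshape, and aggregation steps can be tried on a small made-up frame. This is only a sketch: the data is hypothetical, and the per-row `.loc[...]` aggregation is a plainer substitute for the `final.agg(..., axis=1)` call, not the answer's exact method.

```python
import pandas as pd

# Hypothetical long-form frame, shaped like dfj after the join (one row per value).
dfj = pd.DataFrame({
    "type": ["steps", "steps", "steps", "speed", "speed", "pace", "pace", "pace"],
    "value": [13.0, 11.0, 6.0, 0.0, 4.0, 8.0, 6.0, 7.0],
})

# Collect the values of each type into a list, as in the groupby step.
dfg = dfj.groupby("type")["value"].agg(list).reset_index(name="values_list")

# Spread each list across columns; shorter lists are padded with NaN.
final = pd.DataFrame(dfg["values_list"].to_list(), index=dfg["type"].to_list())

# Per-type aggregates computed row by row; sum and mean skip NaN by default.
totals = pd.Series({
    "steps": final.loc["steps"].sum(),
    "speed": final.loc["speed"].mean(),
    "pace": final.loc["pace"].mean(),
})
print(totals)
```

Note that when the types have different numbers of values, the wide frame contains NaN padding, so any aggregation that does not skip NaN would need the padding handled first.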

Screenshots of dataframes

  • There's too much data in the dataframe to post text examples, so here are some screenshots to give an idea of the breakdown

Initial df

  • 9 total rows

dfe

  • Exploding the column creates a total of 699 rows
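
A quick way to sanity-check a row count like this is to compare it with the summed list lengths before exploding. The sketch below uses made-up data and assumes no list is empty (an empty list explodes to a single NaN row, which would throw the count off):

```python
import pandas as pd

# Hypothetical frame: two records whose 'values' lists hold 3 and 1 dicts.
df = pd.DataFrame({
    "type": ["steps", "pace"],
    "values": [
        [{"value": 1.0}, {"value": 2.0}, {"value": 3.0}],
        [{"value": 4.0}],
    ],
})

# Expected row count: total number of dicts across all lists.
expected_rows = int(df["values"].map(len).sum())

dfe = df.explode("values").reset_index(drop=True)

print(len(dfe), expected_rows)
```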

dfj

  • Created a dataframe from the 'values' column and joined it to dfe

dfg

  • Creates a list of the desired values

Final df

  • values_list are the desired values

desired

  • Selected only the desired 'types'
