How to split the column in python, Spark json file

Question

asked Oct 7, 2021 in Technique by 深蓝 (71.8m points)

I have a Spark DataFrame (loaded from a JSON file) with many columns (contributors, country name, prices, ...). The prices column holds a list of values.

Example for one country (one row in a table):

{'contributors': '200',
 'name': 'Andorra',
 'prices': [{'min price': '23.0',
             'avg price': '25.5',
             'max price': '32.5',
             'item name': 'meal'},
            {'min price': '17.0',
             'avg price': '20.5',
             'max price': '24.5',
             'item name': 'drinks'}, ....] }

I want to split the prices column so that each item name becomes a column name and the average price becomes the value in that column.

It should look like this:

{'contributors': '200',
'name': 'Andorra',
'meal': '25.5',
'drinks': '20.5',
 .... }

There are 55 items in prices, so the table should have 55 price columns instead of one.
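To pin down the target shape before involving Spark, the same reshaping can be sketched on a single plain Python dict. This is only an illustration of the desired result, not a Spark solution: `flatten_prices` is a hypothetical helper name, and the key names ('item name', 'avg price') are taken from the example record above.

```python
def flatten_prices(record, key_field="item name", value_field="avg price"):
    """Return a copy of `record` where each entry of 'prices' becomes its own key."""
    # Keep every top-level field except the nested list itself.
    flat = {k: v for k, v in record.items() if k != "prices"}
    # Each price entry contributes one key: item name -> average price.
    for entry in record.get("prices", []):
        flat[entry[key_field]] = entry[value_field]
    return flat


record = {
    "contributors": "200",
    "name": "Andorra",
    "prices": [
        {"min price": "23.0", "avg price": "25.5", "max price": "32.5", "item name": "meal"},
        {"min price": "17.0", "avg price": "20.5", "max price": "24.5", "item name": "drinks"},
    ],
}

flat = flatten_prices(record)
print(flat)
```

With 55 distinct item names, each record would gain 55 such keys, which is what the wide table should contain per row.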

question from:https://stackoverflow.com/questions/65849367/how-to-split-the-column-in-python-spark-json-file

1 Answer

深蓝 · Answer 1 · 2021-10-06T19:29:33+0000

You can explode the prices array, and then do a group by and pivot based on the relevant struct values:

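import pyspark.sql.functions as F

# Field names follow the question's example ('item name', 'avg price');
# adjust them if your schema uses different names.
df2 = (df.select('contributors', 'name', F.explode('prices').alias('price'))
         # Pull the two struct fields out under space-free aliases.
         .select('contributors', 'name',
                 F.col('price')['item name'].alias('item_name'),
                 F.col('price')['avg price'].alias('avg_price'))
         .groupBy('contributors', 'name')
         .pivot('item_name')           # one column per item name
         .agg(F.first('avg_price'))    # the average price becomes the cell value
      )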
