I'm analyzing political advertisements from Facebook, a dataset released by ProPublica. There's an entire column of 'targets' that I want to analyze, but it's formatted such that every observation is a list of dicts in string form (e.g. "[{k1: v1}, {k2: v2}]").
import pandas as pd
data = {0: '[{"target": "Age", "segment": "18 and older"}, {"target": "MinAge", "segment": "18"}, {"target": "Segment", "segment": "Multicultural affinity: African American (US)."}, {"target": "Region", "segment": "the United States"}]', 1: '[{"target": "Age", "segment": "45 and older"}, {"target": "MinAge", "segment": "45"}, {"target": "Retargeting", "segment": "people who may be similar to their customers"}, {"target": "Region", "segment": "the United States"}]', 2: '[{"target": "Age", "segment": "18 and older"}, {"target": "MinAge", "segment": "18"}, {"target": "Region", "segment": "Texas"}, {"target": "List"}]', 3: '[]', 4: '[{"target": "Interest", "segment": "The Washington Post"}, {"target": "Gender", "segment": "men"}, {"target": "Age", "segment": "34 to 49"}, {"target": "MinAge", "segment": "34"}, {"target": "MaxAge", "segment": "49"}, {"target": "Region", "segment": "the United States"}]'}
df = pd.DataFrame.from_dict(data, orient='index', columns=['targets'])
# display(df)
targets
0 [{"target": "Age", "segment": "18 and older"}, {"target": "MinAge", "segment": "18"}, {"target": "Segment", "segment": "Multicultural affinity: African American (US)."}, {"target": "Region", "segment": "the United States"}]
1 [{"target": "Age", "segment": "45 and older"}, {"target": "MinAge", "segment": "45"}, {"target": "Retargeting", "segment": "people who may be similar to their customers"}, {"target": "Region", "segment": "the United States"}]
2 [{"target": "Age", "segment": "18 and older"}, {"target": "MinAge", "segment": "18"}, {"target": "Region", "segment": "Texas"}, {"target": "List"}]
3 []
4 [{"target": "Interest", "segment": "The Washington Post"}, {"target": "Gender", "segment": "men"}, {"target": "Age", "segment": "34 to 49"}, {"target": "MinAge", "segment": "34"}, {"target": "MaxAge", "segment": "49"}, {"target": "Region", "segment": "the United States"}]
I need each "target" value to become a column header, with the corresponding "segment" value as the value within that column.
Or is the solution to write a function that reads each dictionary key within each row and counts how often each one occurs?
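If only the frequency of each target is needed, one possible sketch is to parse each string with `json.loads` and tally the "target" values with `collections.Counter`. The names `cells` and `target_counts` are illustrative, and the strings are abbreviated from the sample rows above.

```python
import json
from collections import Counter

# Abbreviated cells in the same string-encoded format as the 'targets' column.
cells = [
    '[{"target": "Age", "segment": "18 and older"}, {"target": "Region", "segment": "Texas"}, {"target": "List"}]',
    '[]',
    '[{"target": "Age", "segment": "34 to 49"}, {"target": "Gender", "segment": "men"}]',
]

# Parse every cell and count how many times each target name appears overall.
target_counts = Counter(d["target"] for cell in cells for d in json.loads(cell))

print(target_counts.most_common())
```

With the real data, `cells` would presumably be replaced by `df['targets']`.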
This is what it's supposed to look like as the output:
           NAge  MinAge                                   Retargeting             Region  ...                          Interest  Location Granularity            Country  Gender
0  21 and older      21  people who may be similar to their customers  the United States  ...                               NaN                   NaN                NaN     NaN
1  18 and older      18                                           NaN                NaN  ...  Republican Party (United States)               country  the United States     NaN
2  18 and older      18                                           NaN                NaN  ...                               NaN               country  the United States   women
Someone on Reddit posted this solution:
import json
for id,row in enumerate(df.targets):
    for d in json.loads(row):
        df.loc[id,d['target']] = d['segment']

df = df.drop(columns=['targets'])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-53-339ae1670258> in <module>
      2 for id,row in enumerate(df.targets):
      3     for d in json.loads(row):
----> 4         df.loc[id,d['target']] = d['segment']
      5
      6 df = df.drop(columns=['targets'])

KeyError: 'segment'
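A likely cause of the error: in the sample data, row 2 contains `{"target": "List"}`, which has no "segment" key, so `d['segment']` raises `KeyError`. Below is a minimal sketch of one way around it, assuming a missing segment should simply become a missing value. It uses `dict.get` and builds the wide frame in one step instead of assigning through `df.loc` in a loop. The names `raw`, `to_record`, and `wide` are illustrative, and the data is abbreviated from the sample above.

```python
import json
import pandas as pd

# Abbreviated sample; row 1 includes a dict that has no "segment" key.
raw = {
    0: '[{"target": "Age", "segment": "45 and older"}, {"target": "MinAge", "segment": "45"}]',
    1: '[{"target": "Region", "segment": "Texas"}, {"target": "List"}]',
    2: '[]',
}
df = pd.DataFrame.from_dict(raw, orient='index', columns=['targets'])

def to_record(cell):
    """Turn one string-encoded list of dicts into a {target: segment} mapping."""
    # d.get('segment') returns None instead of raising when the key is absent.
    return {d['target']: d.get('segment') for d in json.loads(cell)}

# One record per row; targets absent from a row end up as missing values.
wide = pd.DataFrame([to_record(cell) for cell in df['targets']], index=df.index)

print(wide)
```

One caveat with this sketch: if the same target appears more than once in a single row, only the last segment is kept.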
question from:
https://stackoverflow.com/questions/65623631/how-to-take-a-column-of-lists-of-dictionary-values-and-create-new-columns-using