Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
137 views
in Technique[技术] by (71.8m points)

python - How to link data frame and calculate

Currently have 2 data frames, one stored visit record, while the other stored purchase record.

The objective is to count the number of consumer for each bag brand. In Visit record, the visit id is unique. But in purchase record, as one purchase could include more than one brand of bags, the same visit id could have multiple bag id.

Please note that in consumer record, there's a Date field. As the aim is to calculate how many customers brought certain brand of bag regardless of the time. eg if the customer Amy brought the same brand of bag like Prada in both 2019 and 2020. It would still be counted as 1 towards the customer count for Prada. But for now, the count show 2 instead of 1. Any idea? Thanks

Visit record [df1]

[
   {
      "Date":"2019",
      "Visit id":"12435",
      "Consumer":[
         {
            "Reference id":"R0009",
            "Consumer name":"Sally"
         }
      ]
   },
   {
      "Date":"2020",
      "Visit id":"23453",
      "Consumer":[
         {
            "Reference id":"R0009",
            "Consumer name":"Sally"
         }
      ]
   },
   {
      "Date":"2019",
      "Visit id":"56674",
      "Consumer":[
         {
            "Reference id":"R0010",
            "Consumer name":"Amy"
         }
      ]
   },
   {
      "Date":"2020",
      "Purchase id":"65432",
      "Consumer":[
         {
            "Reference id":"R0010",
            "Consumer name":"Amy"
         }
      ]
   }
]

Purchase record [df2]

[
   {
      "Visit id":"12435",
      "Product":[
         {
            "Bag id":"BAG001",
            "Bag name":"Prada"
         },
         {
            "Bag id":"BAG004",
            "Bag name":"Gucci"
         }
      ]
   },
   {
      "Visit id":"23453",
      "Product":[
         {
            "Bag id":"BAG006",
            "Bag name":"Chanel"
         },
         {
            "Bag id":"BAG007",
            "Bag name":"Marc"
         }
      ]
   },
   {
      "Visit id":"56674",
      "Product":[
         {
            "Bag id":"BAG001",
            "Bag name":"Prada"
         },
         {
            "Bag id":"BAG006",
            "Bag name":"Chanel"
         }
      ]
   },
   {
      "Purchase id":"65432",
      "Product":[
         {
            "Bag id":"BAG001",
            "Bag name":"Prada"
         },
         {
            "Bag id":"BAG007",
            "Bag name":"Marc"
         }
      ]
   }
]

Code

df1=pd.json_normalize(data=json.loads(json1), record_path='Consumer', meta=['Date','Visit id'])
df2=pd.json_normalize(data=json.loads(json2), record_path='Purchase', meta=['Visit id'])


df3=df1.merge(df2, on='Visit id', how='left').dropna()
df3=df3.groupby(['Bag id', 'Bag name']).agg({'Consumer name': [list, 'count']})
df3

Output

enter image description here


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)
等待大神答复

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...