Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python 3.x - Get combinations of column values from dataframe

I have a DataFrame like this, with 70581 rows:

    id    created_at ... resource_id
230789    2017-01-19 ...         490
230722    2017-01-19 ...         514
   ...           ... ...         ...
312341    2017-08-27 ...         551

I want to get all possible pairs of values in the resource_id column. Each time a pair is repeated, I want to increment that pair's counter by 1. The result might look something like: (490,514) count 5.

I've tried to use list(itertools.combinations(df['resource_id'],2)) to get the pairs, but got a MemoryError instead. How can I get what I want?

question from:https://stackoverflow.com/questions/65651802/get-combinations-of-column-values-from-dataframe

1 Answer

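You cannot count the elements of an iterator without consuming it: their number is unknown until you iterate over them. Calling list() on itertools.combinations forces every pair to be created and stored at once, and with 70581 rows that is roughly 2.5 billion tuples, which is what raises the MemoryError.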
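To put the memory problem in numbers, here is a small sketch that uses math.comb (Python 3.8+) to count the pairs without creating them, and shows that the combinations iterator itself is lazy. The row count 70581 is taken from the question; range() stands in for the real column values, which is an assumption made only for illustration.

```python
import itertools
import math

n_rows = 70581  # row count from the question

# Number of unordered row pairs that combinations(..., 2) would yield
n_pairs = math.comb(n_rows, 2)
print(n_pairs)  # 2490803490

# combinations() does not build all pairs up front; it yields them one at a time.
# range() is only a placeholder for the real resource_id values.
pairs = itertools.combinations(range(n_rows), 2)
print(next(pairs))  # (0, 1)
```

Iterating over the pairs one by one would avoid the MemoryError, but it would still take billions of steps, which is why counting values first is the better route.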

In addition, it is more efficient to first count how many times each distinct value occurs and only then build the pairs. This saves a lot of time, because the number of times a pair (a, b) occurs is simply the number of occurrences of a multiplied by the number of occurrences of b, so the individual row pairs never have to be enumerated.
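The product rule can be checked on a small list by comparing it with brute-force enumeration. This sketch assumes, like the shortcut, that pairs of equal values (for example 1 with 1) are ignored and that (a, b) and (b, a) count as the same pair; both of those are illustrative assumptions, not something the question specifies.

```python
from collections import Counter
from itertools import combinations

values = [1, 1, 1, 2, 3, 4, 4]  # same sample list as in the answer

# Brute force: every pair of positions, order normalised, equal values skipped
brute = Counter(
    tuple(sorted(pair))
    for pair in combinations(values, 2)
    if pair[0] != pair[1]
)

# Shortcut: multiply the frequencies of each pair of distinct values
freq = Counter(values)
fast = {(a, b): freq[a] * freq[b] for a, b in combinations(sorted(freq), 2)}

print(dict(brute) == fast)  # True
```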

You can try something like this:

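from collections import Counter
import itertools

MyList = [1, 1, 1, 2, 3, 4, 4]  # or your column as a list: df['resource_id'].tolist()
a = Counter(MyList)  # value -> number of occurrences

# Pair up distinct values only; each pair's count is the product of their frequencies
combinations = itertools.combinations(a.keys(), 2)
for i in combinations:
    print(i, a[i[0]] * a[i[1]])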

The output is:

 (1, 2) 3
 (1, 3) 3
 (1, 4) 6
 (2, 3) 1
 (2, 4) 2
 (3, 4) 2
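If you want the counts stored rather than printed, for example to list the most frequent pairs first, the same idea can be wrapped in a small function. The name pair_counts and the sample values below are hypothetical. Because Counter just iterates over its argument, a pandas Series such as df['resource_id'] could be passed in directly.

```python
from collections import Counter
from itertools import combinations


def pair_counts(values):
    """Hypothetical helper: map each unordered pair of distinct values to the
    number of row pairs in which it occurs.

    `values` may be any iterable of mutually comparable values,
    e.g. a list or a pandas Series such as df['resource_id'].
    """
    freq = Counter(values)  # value -> number of occurrences
    # Sorted keys give every pair a canonical (smaller, larger) order
    return {(a, b): freq[a] * freq[b] for a, b in combinations(sorted(freq), 2)}


sample = [490, 514, 490, 551, 514, 490]  # made-up resource_id values
counts = pair_counts(sample)

# Most frequent pairs first
for pair, n in sorted(counts.items(), key=lambda item: item[1], reverse=True):
    print(pair, n)
# (490, 514) 6
# (490, 551) 3
# (514, 551) 2
```

The work now grows with the number of distinct values rather than the number of rows. The sorted() call requires values that can be compared with each other; for mixed types you could iterate over freq directly and accept whatever key order it gives.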
