Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Comparing values in different pairs of columns in Pandas

I would like to count how many times column A has the same value as B and as C. Similarly, I would like to count how many times A2 has the same value as B2 and as C2.

I have this dataframe:

,A,B,C,A2,B2,C2
2018-12-01,7,0,8,17,17,17
2018-12-02,0,0,8,20,18,18
2018-12-03,9,8,8,17,17,18
2018-12-04,8,8,8,17,17,18
2018-12-05,8,8,8,17,17,17
2018-12-06,9,8,8,15,17,17
2018-12-07,8,9,9,17,17,16
2018-12-08,0,0,0,17,17,17
2018-12-09,8,0,0,17,20,18
2018-12-10,8,8,8,17,17,17
2018-12-11,8,8,9,17,17,17
2018-12-12,8,8,8,17,17,17
2018-12-13,8,8,8,17,17,17
2018-12-14,8,8,8,17,17,17
2018-12-15,9,9,9,17,17,17
2018-12-16,12,0,0,17,19,17
2018-12-17,11,9,9,17,17,17
2018-12-18,8,9,9,17,17,17
2018-12-19,8,9,8,17,17,17
2018-12-20,9,8,8,17,17,17
2018-12-21,9,9,9,17,17,17
2018-12-22,10,9,0,17,17,17
2018-12-23,10,11,10,17,17,17
2018-12-24,10,10,8,17,19,17
2018-12-25,7,10,10,17,17,18
2018-12-26,10,0,10,17,19,17
2018-12-27,9,10,8,18,17,17
2018-12-28,9,9,9,17,17,17
2018-12-29,10,10,12,18,17,17
2018-12-30,10,0,10,16,19,17
2018-12-31,11,8,8,19,17,16

I expect the following values:

A with B = 14
A with C = 14
A2 with B2 = 14
A2 with C2 = 14

I have done this:

ia = 0
for i in range(len(dfr_h_max1)):
    # .iloc gives positional access; the index holds dates, not integers
    if dfr_h_max1['A'].iloc[i] == dfr_h_max1['B'].iloc[i]:
        ia += 1

ib = 0
for i in range(len(dfr_h_max1)):
    if dfr_h_max1['A'].iloc[i] == dfr_h_max1['C'].iloc[i]:
        ib += 1

In order to take advantage of pandas, this is one possible solution:

import numpy as np
dfr_h_max1['que'] = np.where((dfr_h_max1['A'] == dfr_h_max1['B']), 1, 0)

After that I could sum all the elements in the new column 'que'. Another possibility could be related to some sort of boolean variable. Unfortunately, I still do not have enough knowledge about that.

Any other more efficient or elegant solutions?

question from:https://stackoverflow.com/questions/65837380/comparing-values-in-different-pairs-of-columns-in-pandas


1 Answer


The core calculation here is, for example, dfr_h_max1['A'] == dfr_h_max1['B'], as you've done in your edit. That gives you a boolean Series with one True/False value per row, indicating whether the two columns are equal in that row. Since True counts as 1 and False as 0 when summed, calling .sum() on that Series gives the number of True values, i.e. the number of matches.
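
As a minimal sketch of that idea, here is the comparison on just the first five rows of columns A and B from the question. The names `df`, `matches` and `count` are placeholders chosen for this illustration.

```python
import pandas as pd

# First five rows of the question's data (columns A and B only).
df = pd.DataFrame(
    {"A": [7, 0, 9, 8, 8], "B": [0, 0, 8, 8, 8]},
    index=pd.to_datetime(
        ["2018-12-01", "2018-12-02", "2018-12-03", "2018-12-04", "2018-12-05"]
    ),
)

# Element-wise comparison: one True/False per row.
matches = df["A"] == df["B"]

# Summing a boolean Series counts the True values.
count = int(matches.sum())

print(matches.tolist())  # [False, True, False, True, True]
print(count)             # 3
```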

Put that in a loop and add the required "text" for the output you want:

mains = ('A', 'A2')  # the main columns
comps = (['B', 'C'], ['B2', 'C2'])  # columns to compare each main with

for main, pair in zip(mains, comps):
    for col in pair:
        print(f'{main} with {col} = {(dfr_h_max1[main] == dfr_h_max1[col]).sum()}')
        # or without f-strings, do:
        # print(main, 'with', col, '=', (dfr_h_max1[main] == dfr_h_max1[col]).sum())

Output:

A with B = 14
A with C = 14
A2 with B2 = 21
A2 with C2 = 20

Note that, for the sample data in the question, the A2 counts come out as 21 and 20 rather than the 14 given there as the expected values.

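Btw, (df[main] == df[col]).sum(), which uses Series.sum(), can also be written as sum(df[main] == df[col]) with Python's builtin sum(). The Series method is the usual choice, since the builtin iterates over the values one by one in Python.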
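
If you would rather avoid the inner loop, one possible alternative (not part of the approach above) is DataFrame.eq with axis=0, which compares several columns against one Series in a single call. The sketch below uses only the first five rows of the question's data, and `df` stands in for dfr_h_max1.

```python
import pandas as pd

# First five rows of the question's data; `df` stands in for dfr_h_max1.
df = pd.DataFrame({
    "A":  [7, 0, 9, 8, 8],
    "B":  [0, 0, 8, 8, 8],
    "C":  [8, 8, 8, 8, 8],
    "A2": [17, 20, 17, 17, 17],
    "B2": [17, 18, 17, 17, 17],
    "C2": [17, 18, 18, 18, 17],
})

# eq(..., axis=0) lines the Series up with the rows, so each of B and C
# is compared with A row by row; .sum() then counts matches per column.
counts_a = df[["B", "C"]].eq(df["A"], axis=0).sum()
counts_a2 = df[["B2", "C2"]].eq(df["A2"], axis=0).sum()

for col, n in counts_a.items():
    print(f"A with {col} = {n}")
for col, n in counts_a2.items():
    print(f"A2 with {col} = {n}")
```

On these five rows this prints 3 and 2 for A, and 4 and 2 for A2.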


In case you have more than two "triplets" of columns (not just A & A2), change the mains and comps to this, so that it works on all triplets:

mains = dfr_h_max1.columns[::3]  # main columns (A's), in steps of 3
comps = zip(dfr_h_max1.columns[1::3],  # offset by 1 column (B's),
            dfr_h_max1.columns[2::3])  # offset by 2 columns (C's),
                                       # in steps of 3

(Alternatively, build the groups from the column names themselves, for example from their starting letter.)
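
A rough sketch of the name-based variant, assuming the naming scheme seen in the question: each main column is 'A' plus an optional suffix, and its comparison columns are 'B' and 'C' plus the same suffix. That scheme, the `results` dict and the five-row `df` (standing in for dfr_h_max1) are assumptions made for this illustration.

```python
import pandas as pd

# First five rows of the question's data; `df` stands in for dfr_h_max1.
df = pd.DataFrame({
    "A":  [7, 0, 9, 8, 8],
    "B":  [0, 0, 8, 8, 8],
    "C":  [8, 8, 8, 8, 8],
    "A2": [17, 20, 17, 17, 17],
    "B2": [17, 18, 17, 17, 17],
    "C2": [17, 18, 18, 18, 17],
})

results = {}
for main in df.columns:
    if not main.startswith("A"):
        continue  # only columns starting with 'A' are treated as mains
    suffix = main[1:]  # '' for 'A', '2' for 'A2'
    for letter in ("B", "C"):
        col = letter + suffix
        if col in df.columns:  # skip comparison columns that are missing
            results[(main, col)] = int((df[main] == df[col]).sum())

for (main, col), n in results.items():
    print(f"{main} with {col} = {n}")
```

Unlike the [::3] slicing, this does not depend on the order of the columns, only on their names.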

