python - Remove duplicate rows from Pandas dataframe where only some columns have the same value

Question

asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

I have a pandas dataframe as follows:

   A  B  C
0  1  2  x
1  1  2  y
2  3  4  z
3  3  5  x

I want to keep only one row from each group of rows that share the same values in specific columns. In the example above, those are columns A and B. In other words, if a combination of values in columns A and B occurs more than once in the dataframe, only one of those rows should remain (it does not matter which one).

FWIW: there are at most 2 such duplicate rows (that is, rows where columns A and B are the same) per combination.

The result should look like this:

   A  B  C
0  1  2  x
2  3  4  z
3  3  5  x

or

   A  B  C
1  1  2  y
2  3  4  z
3  3  5  x

1 Answer

深蓝 · Answer 1 · 2021-10-16T22:32:22+0000

Use drop_duplicates with the subset parameter to compare only columns A and B. By default the first row of each duplicated group is kept; to keep the last row instead, add keep='last':

# keep the first row of each (A, B) group (the default)
df1 = df.drop_duplicates(subset=['A', 'B'])
# same as:
# df1 = df.drop_duplicates(subset=['A', 'B'], keep='first')
print(df1)
   A  B  C
0  1  2  x
2  3  4  z
3  3  5  x

# keep the last row of each (A, B) group
df2 = df.drop_duplicates(subset=['A', 'B'], keep='last')
print(df2)
   A  B  C
1  1  2  y
2  3  4  z
3  3  5  x
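
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The input dataframe is an assumption: it is reconstructed from the printed results above, so the real data may differ. The variable names first_kept, last_kept and dup_mask are only illustrative.

```python
import pandas as pd

# Assumed input, reconstructed from the printed results in the answer.
df = pd.DataFrame({
    "A": [1, 1, 3, 3],
    "B": [2, 2, 4, 5],
    "C": ["x", "y", "z", "x"],
})

# Keep the first row of each (A, B) group (the default, keep='first').
first_kept = df.drop_duplicates(subset=["A", "B"])

# Keep the last row of each (A, B) group instead.
last_kept = df.drop_duplicates(subset=["A", "B"], keep="last")

# Boolean mask marking rows whose (A, B) pair already appeared earlier.
dup_mask = df.duplicated(subset=["A", "B"])

print(first_kept)
print(last_kept)
print(dup_mask.tolist())
```

Note that the original index labels are preserved in both results. If a fresh 0..n-1 index is preferred, chaining .reset_index(drop=True) onto the result should do it.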
