Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
336 views
in Technique[技术] by (71.8m points)

python - Multi index dataframe delete row with maximum value per group

I have a multi index dataframe like this:

PID    Fid    x      y

  A      1    2      3
         2    6      1
         3    4      6
  B      1    3      5
         2    2      4
         3    5      7

I would like to delete the rows with the highest x-value per patient (PID). I need to get a new dataframe with the remaining rows and all columns to continue my analysis on these data, for example the mean of the remaining y-values. The dataframe should look like this:

PID    Fid    x      y

  A      1    2      3
         3    4      6
  B      1    3      5
         2    2      4

I used the code from Python Multiindex Dataframe remove maximum

idx = (df.reset_index('Fid')
                   .groupby('PID')['x']
                   .max()
                   .reset_index()
                   .values.tolist())
df_s = df.loc[df.index.difference(idx)]

I can get idx, but not remove them from the dataframe. It says TypeError: unhashable type: 'list'

What do I do wrong?


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

You can try this:

idx = df.groupby(level=0)['x'].idxmax()
df[~df.index.isin(idx)]

         x  y
PID Fid      
A   1    2  3
    3    4  6
B   1    3  5
    2    2  4

Or

You can use pd.Index.difference here.

df.loc[df.index.difference(df['x'].groupby(level=0).idxmax())] #Use level=0 if index is unnamed
                                         #('PID').idxmax())] 
         x  y
PID Fid      
A   1    2  3
    3    4  6
B   1    3  5
    2    2  4

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...