Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - pandas category that includes the closest greater value

I have the following dataframe:

import pandas as pd
df = pd.DataFrame({'id': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c'], 'cumsum': [1, 3, 6, 9, 10, 4, 9, 11, 13, 5, 8, 19]})


   id   cumsum
0   a   1
1   a   3
2   a   6
3   a   9
4   a   10
5   b   4
6   b   9
7   b   11
8   b   13
9   c   5
10  c   8
11  c   19

I would like to add a new column, category, defined per id for a given input value: every row up to and including the closest cumsum value that is greater than or equal to the input should be in the first category (0), and every row after it should be in category 1.

For example:

input = 8

desired output:

    id  cumsum  category
0   a   1   0
1   a   3   0
2   a   6   0
3   a   9   0
4   a   10  1
5   b   4   0
6   b   9   0
7   b   11  1
8   b   13  1
9   c   5   0
10  c   8   0
11  c   19  1
Question from: https://stackoverflow.com/questions/65881093/pandas-category-that-includes-the-closest-greater-value


1 Answer


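You can get the first value greater than or equal to the input for each id by filtering with Series.ge and taking GroupBy.first (this relies on cumsum being sorted ascending within each id). Then map those per-id values back onto the id column with Series.map, compare with Series.gt, and finally convert the boolean mask to integers: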

val = 8

s = df[df['cumsum'].ge(val)].groupby('id')['cumsum'].first()

df['category'] = df['cumsum'].gt(df['id'].map(s)).astype(int)
print(df)
   id  cumsum  category
0   a       1         0
1   a       3         0
2   a       6         0
3   a       9         0
4   a      10         1
5   b       4         0
6   b       9         0
7   b      11         1
8   b      13         1
9   c       5         0
10  c       8         0
11  c      19         1

Another idea is to use Series.where with GroupBy.transform:

val = 8

s1 = df['cumsum'].where(df['cumsum'].ge(val)).groupby(df['id']).transform('min')
# alternative: 'first' gives the same result when cumsum is sorted ascending within each id
s1 = df['cumsum'].where(df['cumsum'].ge(val)).groupby(df['id']).transform('first')

df['category'] = df['cumsum'].gt(s1).astype(int)
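
One case the question does not cover is an id where no value reaches the input. The sketch below uses made-up data (not from the question) to show what the Series.where approach does then: the per-id threshold is NaN, every comparison against NaN is False, and the whole id stays in category 0.

```python
import pandas as pd

# Made-up data: no cumsum value for id 'b' reaches the input.
df = pd.DataFrame({
    "id": ["a", "a", "b", "b"],
    "cumsum": [5, 12, 1, 2],
})
val = 8

# Values below val become NaN, so the per-id minimum is the threshold
# (NaN for 'b', which has no value >= val).
s1 = df["cumsum"].where(df["cumsum"].ge(val)).groupby(df["id"]).transform("min")

# Comparisons against NaN are False, so 'b' ends up entirely in category 0.
df["category"] = df["cumsum"].gt(s1).astype(int)
print(df)
```

If such ids should be handled differently, s1.isna() identifies the affected rows.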
