Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Apply function to grouped dataframe and save multiple outputs into the dataframe

I have a dataframe that looks like this:

          X    Z  participantNum  obsScenario  startPos  targetPos
16000 -16.0 -5.0         6950203            2         2          3
16001 -16.0 -5.0         6950203            2         2          3
16002 -16.0 -5.0         6950203            2         2          3
16003 -16.0 -5.0         6950203            2         2          3
16004 -16.0 -5.0         6950203            2         2          3
16005 -16.0 -5.0         6950203            2         2          3
16006 -16.0 -5.0         6950203            2         2          3
16007 -16.0 -5.0         6950203            2         2          3
16008 -16.0 -5.0         6950203            2         2          3
16009 -16.0 -5.0         6950203            2         2          3

I am trying to apply a function that returns 3 outputs to the 'X' and 'Z' columns, and I want to save these outputs into the dataframe. The function needs to be applied to the grouped dataframe.

I've tried several ways, using something like this:

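import numpy as np
import scipy.stats

def mean_confidence_interval(data, confidence=0.95):
    # Return the mean and the upper/lower bounds of its t-based confidence interval.
    a = 1.0 * np.array(data)
    n = len(a)
    m, se = np.mean(a), scipy.stats.sem(a)
    # Half-width of the interval: standard error times the Student t quantile (n-1 degrees of freedom).
    h = se * scipy.stats.t.ppf((1 + confidence) / 2., n - 1)
    return m, m + h, m - h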

allDataF['mean_ci'] = allDataF.groupby(['obsScenario', 'startPos', 'targetPos'])['X', 'Z'].apply(mean_confidence_interval)

But I get an error: TypeError: incompatible index of inserted column with frame index

question from:https://stackoverflow.com/questions/66056803/apply-function-to-grouped-dataframe-and-save-multiple-outputs-into-the-dataframe

He who fights dragons for too long becomes a dragon himself; gaze too long into the abyss, and the abyss will gaze back…

1 Answer


You can use:

# df is the question's allDataF
mean_ci = df.groupby(['obsScenario', 'startPos', 'targetPos'])['X'].apply(mean_confidence_interval)
# join returns a new dataframe, so assign the result
df = df.join(mean_ci.rename('mean_ci'),
             on=['obsScenario', 'startPos', 'targetPos'])

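Because mean_ci is a Series indexed by ['obsScenario', 'startPos', 'targetPos'], you cannot assign it directly to the original allDataF, whose index is 16000 to 16009. Joining on the grouping columns aligns each group's result with the rows that belong to that group.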
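
If you want the three outputs in separate columns, and for both 'X' and 'Z', one possible extension is sketched below. It is only a sketch under some assumptions: the toy data, the column names such as X_mean and X_ci_upper, and the variable names keys, per_group and result are made up for illustration and are not part of the question. It relies on the grouped apply returning one tuple per group, which is then expanded into a small dataframe before joining.

```python
import numpy as np
import pandas as pd
import scipy.stats


def mean_confidence_interval(data, confidence=0.95):
    # Same calculation as in the question: mean plus t-based interval bounds.
    a = 1.0 * np.array(data)
    n = len(a)
    m, se = np.mean(a), scipy.stats.sem(a)
    h = se * scipy.stats.t.ppf((1 + confidence) / 2., n - 1)
    return m, m + h, m - h


# Toy data (invented for illustration), shaped like the question's dataframe.
allDataF = pd.DataFrame(
    {
        "X": [-16.0, -15.0, -14.0, 2.0, 4.0, 6.0],
        "Z": [-5.0, -4.0, -3.0, 1.0, 1.5, 2.0],
        "obsScenario": [2, 2, 2, 1, 1, 1],
        "startPos": [2, 2, 2, 1, 1, 1],
        "targetPos": [3, 3, 3, 2, 2, 2],
    },
    index=range(16000, 16006),
)

keys = ["obsScenario", "startPos", "targetPos"]
result = allDataF
for col in ["X", "Z"]:
    # One tuple (mean, upper, lower) per group, indexed by the group keys.
    per_group = allDataF.groupby(keys)[col].apply(mean_confidence_interval)
    # Expand the tuples into three named columns (hypothetical names).
    stats = pd.DataFrame(
        per_group.tolist(),
        index=per_group.index,
        columns=[col + "_mean", col + "_ci_upper", col + "_ci_lower"],
    )
    # Joining on the grouping columns keeps the original row index.
    result = result.join(stats, on=keys)

print(result)
```

Every row then carries the statistics of its own group, so the values repeat within a group. If you only need one row per group, the intermediate stats dataframe is already in that shape.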

