Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Pandas groupby.sum() not working properly?

I gamble on a site called PredictIt. I've downloaded a CSV file with my profit and loss for every market I've ever traded in. I called it dat.

The CSV has the profit or loss on every trade ('ProfitLoss') and the name of the market in which I traded ('MarketName'). After converting the financial data to floats (the $ had to be removed, and the () had to be removed for negative numbers), I tried to use groupby to get profit and loss for each market rather than for each trade. But market_groups does not include a column for profit/loss. It sums the other numerical columns, but not the one I modified.

for index, row in dat.iterrows():
    if dat['ProfitLoss'][index][0] == '(':
        length = len(dat['ProfitLoss'][index])
        dat['ProfitLoss'][index] = float(dat['ProfitLoss'][index][2:length-1]) * -1
    else:
        length = len(dat['ProfitLoss'][index])
        dat['ProfitLoss'][index] = float(dat['ProfitLoss'][index][1:length-1])
    print(type(dat['ProfitLoss'][index]))

market_groups = dat.groupby(['MarketName']).sum()
Question from: https://stackoverflow.com/questions/66054667/pandas-groupby-sum-not-working-properly


1 Answer


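The problem is that the dtype of the "ProfitLoss" Series was inferred from the original data, i.e. string (object). Assigning floats to it one element at a time does not change that dtype, so sum() still treats the column as non-numeric and leaves it out of the result. You need to either convert the Series to float, or set numeric_only=False in sum().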
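A small sketch of why this happens. The rows below and the "Shares" column are made up for illustration and are not taken from the real PredictIt export. The column is created with dtype=object explicitly to mimic a string column read from a CSV.

```python
import pandas as pd

# Hypothetical sample data; the real CSV will have different rows and columns.
dat = pd.DataFrame({
    "MarketName": ["A", "A", "B"],
    "ProfitLoss": pd.Series(["$1.50", "($0.25)", "$2.00"], dtype=object),
    "Shares": [10, 5, 3],
})

# Replace each string with a float, one cell at a time.
for index in dat.index:
    raw = dat.at[index, "ProfitLoss"]
    negative = raw.startswith("(")
    number = float(raw.strip("()").lstrip("$"))
    dat.at[index, "ProfitLoss"] = -number if negative else number

# The cells now hold floats, but the column dtype is unchanged.
print(dat["ProfitLoss"].dtype)

# An explicit conversion is what actually changes the dtype.
converted = dat["ProfitLoss"].astype(float)
print(converted.dtype)
```

Checking dat.dtypes right before the groupby is a quick way to confirm whether a column will be treated as numeric.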

That is, either change the dtype of "ProfitLoss":

dat = dat.astype({'ProfitLoss': 'float'})
market_groups = dat.groupby(['MarketName']).sum()

Or set numeric_only to False:

market_groups = dat.groupby(['MarketName']).sum(numeric_only=False)

Either one should work.
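Both options can be tried on a toy frame. This is only a sketch with invented values, assuming "ProfitLoss" already holds floats inside an object-dtype column, which is the state the question's loop leaves it in.

```python
import pandas as pd

# Hypothetical data: float values stored in an object-dtype column.
dat = pd.DataFrame({
    "MarketName": ["A", "A", "B"],
    "ProfitLoss": pd.Series([1.5, -0.25, 2.0], dtype=object),
})

# Option 1: convert the column to float first, then aggregate.
as_float = dat.astype({"ProfitLoss": "float"})
groups_float = as_float.groupby(["MarketName"]).sum()

# Option 2: keep the object column and ask sum() not to skip it.
groups_object = dat.groupby(["MarketName"]).sum(numeric_only=False)

print(groups_float)
print(groups_object)
```

Option 1 is probably the safer default, since later numeric operations on the column then work on a real float dtype.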

Also, the -1 in dat['ProfitLoss'][index] = float(dat['ProfitLoss'][index][1:length-1]) probably should not be there. I think you meant it to remove the closing bracket ), but there is no bracket in the else clause, so the slice cuts off the last character of the number instead. Slicing with [1:] would be enough there.
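As an alternative to slicing by position, the whole column can be cleaned in one pass, which avoids the off-by-one issue altogether. The helper name parse_money and the sample rows are hypothetical, and the sketch assumes values look like '$1.50' or '($0.25)', with parentheses marking a loss.

```python
import pandas as pd

def parse_money(column: pd.Series) -> pd.Series:
    """Convert strings like '$1.50' and '($0.25)' to floats; parentheses mean negative."""
    text = column.astype(str).str.strip()
    negative = text.str.startswith("(")
    # Drop '$', ',' and parentheses, leaving only digits and the decimal point.
    digits = text.str.replace(r"[$,()]", "", regex=True)
    values = digits.astype(float)
    # Keep positive values as they are and flip the sign of parenthesized ones.
    return values.where(~negative, -values)

# Hypothetical sample data standing in for the real CSV.
dat = pd.DataFrame({
    "MarketName": ["A", "A", "B"],
    "ProfitLoss": ["$1.50", "($0.25)", "$2.00"],
})

dat["ProfitLoss"] = parse_money(dat["ProfitLoss"])
market_groups = dat.groupby(["MarketName"]).sum()
print(market_groups)
```

Because the column is replaced as a whole, it ends up with a float dtype and the plain sum() call includes it.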

