Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
343 views
in Technique[技术] by (71.8m points)

python - Random Seed Chose Different Rows

I was applying .sample with random_state set to a constant and after using set_index it started selecting different rows. A member dropped that was previously included in the subset. I'm unsure how seeding selects rows. Does it make sense or did something go wrong?

Here is what was done:

df.set_index('id',inplace=True, verify_integrity=True)

df_small_F = df.loc[df['gender']=='F'].apply(lambda x: x.sample(n=30000, random_state=47))

df_small_M = df.loc[df['gender']=='M'].apply(lambda x: x.sample(n=30000, random_state=46))

df_small=pd.concat([df_small_F,df_small_M],verify_integrity=True)

When I sort df_small by index and print, it produces different results.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Applying .sort_index() after reading in the data and before performing .sample() corrected the issue. As long as the data remains the same, this will produce the same sample everytime.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...