Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

R and/or Pandas-Python Function to Create a New Variable Based on Conditions of a Different Variable

I've had trouble finding the right method to solve a problem. I have an NBA dataset in which one of the columns is the player's position: for example, C for Center, SG for Shooting Guard, SG-SF for Shooting Guard/Small Forward. My goal is to create 5 new variables, one for each position in basketball (PG, SG, SF, PF, C), where a player has a value of 1 in each new column whose position is listed for that player in the original dataset.

For example, Tyson Chandler would have a 1 in the new C column but a 0 in PG, SG, SF, and PF.

I've looked at dplyr's mutate and similar methods, but they seem geared towards editing a column based on a condition on the data already in that column rather than checking a condition in a different column.

The temporary workaround I've found is to split the dataframe into smaller ones, add the appropriate values for each position group, and then recombine the sub-dataframes into the full dataframe. However, I'm hoping to find a more elegant solution.

Thanks.

question from:https://stackoverflow.com/questions/65886359/r-and-or-pandas-python-function-to-create-a-new-variable-based-on-conditions-if


1 Answer


Using pandas, there is a short iterative solution to the problem.

First, let us construct a test dataframe called df:

import pandas as pd

df = pd.DataFrame(
    [
        ['Player1', 'PG'],
        ['Player2', 'SG'],
        ['Player3', 'SF'],
        ['Player4', 'PF'],
        ['Player5', 'C'],
        ['Player6', 'PG'],
        ['Player7', 'SF'],
        ['Player8', 'PF'],
        ['Player9', 'C'],
    ],
    columns=['name', 'position']
)

df

Out[1]: 
      name position
0  Player1       PG
1  Player2       SG
2  Player3       SF
3  Player4       PF
4  Player5        C
5  Player6       PG
6  Player7       SF
7  Player8       PF
8  Player9        C

Then, we use the transform method to create new columns (looping over the positions). For each column, we use a lambda function returning 1 if the position is the one matching the column, and 0 otherwise:

for pos in ['PG', 'SG', 'SF', 'PF', 'C']:
    df[pos] = df['position'].transform(lambda p: 1 if p == pos else 0)

df

Out[2]: 
      name position  PG  SG  SF  PF  C
0  Player1       PG   1   0   0   0  0
1  Player2       SG   0   1   0   0  0
2  Player3       SF   0   0   1   0  0
3  Player4       PF   0   0   0   1  0
4  Player5        C   0   0   0   0  1
5  Player6       PG   1   0   0   0  0
6  Player7       SF   0   0   1   0  0
7  Player8       PF   0   0   0   1  0
8  Player9        C   0   0   0   0  1
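
The loop above matches each position string exactly, so a combined entry such as SG-SF (mentioned in the question) would get 0 in every column. One possible way to cover that case is pandas' Series.str.get_dummies, which splits each string on a separator and builds one indicator column per token. The following is only a sketch under assumptions: the sample rows and the names players, positions, indicators and result are hypothetical, and it assumes combined positions are always joined with a hyphen.

```python
import pandas as pd

# Hypothetical sample data; 'SG-SF' marks a player listed at two positions.
players = pd.DataFrame(
    [
        ['Player1', 'PG'],
        ['Player2', 'SG-SF'],
        ['Player3', 'C'],
    ],
    columns=['name', 'position'],
)

# The five target columns, in the order requested in the question.
positions = ['PG', 'SG', 'SF', 'PF', 'C']

# Split each position string on '-' and build one 0/1 column per distinct token.
indicators = players['position'].str.get_dummies(sep='-')

# Add any position missing from the data as an all-zero column
# and put the columns in the requested order.
indicators = indicators.reindex(columns=positions, fill_value=0)

# Attach the indicator columns to the original rows (aligned on the index).
result = players.join(indicators)
print(result)
```

With this approach Player2 should end up with a 1 in both SG and SF, and PF stays an all-zero column because no sample row contains it. If the real dataset uses a different separator, the sep argument would need to change accordingly.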
