Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python 3.x - Comparing categorical variables between columns in pandas.DataFrame

How do I compare the two size columns using the order defined by the categorical dtype (S < M < L < XL) instead of lexicographic (alphabetical) order?

Given the dataset:

import pandas as pd
from pandas.api.types import CategoricalDtype

df = pd.DataFrame({
    'NUMBER': [12, 26, 16, 34, 38, 1, 26, 8],
    'SHIRT_SIZE': ['S', 'M', 'XL', 'L', 'S', 'M', 'L', 'XL'],
    'SHIRT_SIZE2': ['M', 'S', 'L', 'XL', 'M', 'L', 'XL', 'S']
})

# Ordered categorical dtype: S < M < L < XL
c_dtype = CategoricalDtype(categories=["S", "M", "L", "XL"], ordered=True)
df['SHIRT_SIZE'] = df['SHIRT_SIZE'].astype(c_dtype)
df['SHIRT_SIZE2'] = df['SHIRT_SIZE2'].astype(c_dtype)
   NUMBER SHIRT_SIZE SHIRT_SIZE2
0      12          S           M
1      26          M           S
2      16         XL           L
3      34          L          XL
4      38          S           M
5       1          M           L
6      26          L          XL
7       8         XL           S
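
To illustrate why the ordered dtype matters, here is a minimal sketch (not part of the original question) that contrasts plain string comparison with ordered categorical comparison. The series names `sizes` and `other` are made up for the illustration; the category order is the one defined above.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical sample values, chosen only to show the difference.
sizes = pd.Series(["S", "M", "XL", "L"])
other = pd.Series(["M", "S", "L", "XL"])

# Plain strings compare lexicographically: "L" < "M" < "S" < "XL".
lexical = sizes.gt(other)

# With a shared ordered categorical dtype, the comparison follows
# the declared category order: S < M < L < XL.
c_dtype = CategoricalDtype(categories=["S", "M", "L", "XL"], ordered=True)
ordered = sizes.astype(c_dtype).gt(other.astype(c_dtype))

print(lexical.tolist())  # [True, False, True, False]
print(ordered.tolist())  # [False, True, True, False]
```

The first row shows the difference: as strings, "S" sorts after "M", but as shirt sizes S is smaller than M.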
Question from: https://stackoverflow.com/questions/65836986/comparing-categorical-variables-between-columns-in-pandas-dataframe


1 Answer


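Use numpy.select to compare the values and generate the new column. Because both columns share the same ordered categorical dtype, .gt and .lt compare by category order rather than alphabetically: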

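import numpy as np

# Conditions are checked in order; rows matching neither fall back to "SAME".
condlist = [df.SHIRT_SIZE.gt(df.SHIRT_SIZE2), df.SHIRT_SIZE.lt(df.SHIRT_SIZE2)]
result_list = ["LARGER", "SMALLER"]
compare_size = np.select(condlist, result_list, "SAME")
# assign returns a new DataFrame; reassign to df to keep the column.
df.assign(compare_size=compare_size)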


   NUMBER SHIRT_SIZE SHIRT_SIZE2 compare_size
0      12          S           M      SMALLER
1      26          M           S       LARGER
2      16         XL           L       LARGER
3      34          L          XL      SMALLER
4      38          S           M      SMALLER
5       1          M           L      SMALLER
6      26          L          XL      SMALLER
7       8         XL           S       LARGER
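
As an alternative sketch (an assumption on my part, not the method from the answer above), the same labels can be derived from the integer category codes exposed by `.cat.codes`. The names `diff`, `labels`, and `result` are illustrative. Note that missing values get code -1, so this version is only appropriate when neither column contains NaN.

```python
import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype

c_dtype = CategoricalDtype(categories=["S", "M", "L", "XL"], ordered=True)
df = pd.DataFrame({
    "NUMBER": [12, 26, 16, 34, 38, 1, 26, 8],
    "SHIRT_SIZE": ["S", "M", "XL", "L", "S", "M", "L", "XL"],
    "SHIRT_SIZE2": ["M", "S", "L", "XL", "M", "L", "XL", "S"],
}).astype({"SHIRT_SIZE": c_dtype, "SHIRT_SIZE2": c_dtype})

# Category codes follow the declared order: S=0, M=1, L=2, XL=3.
# Cast to int64 so the subtraction and lookup use ordinary integers.
diff = (df["SHIRT_SIZE"].cat.codes.astype("int64")
        - df["SHIRT_SIZE2"].cat.codes.astype("int64"))

# The sign of the difference (-1, 0, or 1) selects the label.
labels = {-1: "SMALLER", 0: "SAME", 1: "LARGER"}
result = df.assign(compare_size=np.sign(diff).map(labels))

print(result)
```

For this dataset the resulting compare_size column matches the np.select output shown above. The np.select version is arguably easier to read; the code-based version can be handy when the size of the gap (for example, two sizes apart) is also of interest.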
