Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


scikit learn - Random Forest "Feature Importance"

I am currently working with a Random Forest classifier. One of the parameters of RandomForestClassifier is criterion, which has two options: "gini" or "entropy". A low value of Gini is preferred and a high value of entropy is preferred. By default, "gini" is the criterion for RandomForestClassifier.

sklearn provides an attribute called feature_importances_, which gives an importance value for each of the attributes/features supplied. Using it, we can keep some features and eliminate others with a threshold and SelectFromModel.
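To make the selection step described above concrete, here is a minimal sketch. The synthetic dataset, the number of trees, and the threshold="mean" setting are assumptions chosen only for illustration; they are not from the question.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic data (an assumption for illustration): 10 features, 3 informative.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=3, n_redundant=0,
    random_state=0,
)

forest = RandomForestClassifier(n_estimators=100, random_state=0)

# Keep only the features whose importance is at least the mean importance.
selector = SelectFromModel(forest, threshold="mean")
X_selected = selector.fit_transform(X, y)

# The fitted forest is available as selector.estimator_.
importances = selector.estimator_.feature_importances_
kept = selector.get_support()  # boolean mask over the original columns

print(importances.round(3))
print("kept columns:", kept.nonzero()[0])
print(X.shape, "->", X_selected.shape)
```

Note that the selection keeps features with high importance values, which is the behaviour the question is asking about.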

My question is: on what basis are these feature_importances_ calculated? Assume the default criterion, "gini", is used. If I assume that feature_importances_ are "Gini importances", then low values should be preferred, but for feature importances, high values are preferred.

Question from: https://stackoverflow.com/questions/66059092/random-forest-feature-importance


1 Answer


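feature_importances_ always reports the importance of each feature: the bigger the value, the more important the feature. This holds whether you choose the gini or the entropy criterion. The criterion is used to build the model: both are impurity measures, and each split is chosen to lower the impurity as much as possible. The importance of a feature is the (normalized) total reduction of the criterion brought by the splits on that feature, which is why it is also called the "Gini importance". So a low impurity is good, and a feature that removes a lot of impurity gets a high importance; there is no contradiction. Feature importance is read off after the model is trained: you only "analyze" and observe which features have been most relevant in your trained model.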
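As a rough illustration of the point above, the sketch below fits one forest per criterion on made-up data in which only the first column determines the label. The data, the seed, and the variable names are assumptions for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Made-up data: column 0 decides the label, columns 1-4 are pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] > 0).astype(int)

results = {}
for criterion in ("gini", "entropy"):
    forest = RandomForestClassifier(
        n_estimators=100, criterion=criterion, random_state=0
    )
    forest.fit(X, y)
    results[criterion] = forest.feature_importances_
    print(criterion, results[criterion].round(3))

# Index of the most important feature under each criterion.
top_gini = int(results["gini"].argmax())
top_entropy = int(results["entropy"].argmax())
print("top feature:", top_gini, top_entropy)
```

With either criterion the informative column should receive the largest importance, even though the exact numbers differ between the two runs.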

Moreover, you will see that the values in feature_importances_ sum to 1, so each importance can also be read as a percentage of the total.

Since a random forest is made up of several trees, the feature importances are averaged over all the trees.
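Both properties can be checked directly. The following is a small sketch using the iris dataset that ships with scikit-learn; the dataset and the number of trees are arbitrary choices for the example. As far as I know, scikit-learn skips trees that consist of a single node when averaging, which should not occur with this data.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

forest_importances = forest.feature_importances_

# Each fitted tree exposes its own normalized importances.
per_tree = np.array(
    [tree.feature_importances_ for tree in forest.estimators_]
)
manual_average = per_tree.mean(axis=0)

print("sum:", forest_importances.sum())
print(
    "matches per-tree average:",
    np.allclose(forest_importances, manual_average),
)
```

forest.estimators_ is the list of fitted decision trees, so the comparison shows that the forest-level values are the mean of the per-tree values.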

