
machine learning - Is this an overfitting problem? Or what is going on?

This is the data:

square  AP-00  AP-01  AP-02   AP-03  AP-04  AP-05  AP-06   AP-07  AP-08    
s-01     -30   -28    -40     -44    -62    -60    -78     -60    -62   
s-01     -30   -52    -38     -44    -62    -60    -78     -60    -68   
s-01     -30   -17    -36     -40    -62    -58    -66     -60    -68   
s-01     -28   -19    -36     -36    -62    -56    -36     -52    -68   
s-01     -28   -17    -36     -40    -54    -56    -36     -52    -64 
...      ...   ...    ...     ...    ...    ...    ...     ...    ...   

-Shape of data: 15071 rows × 10 columns
-The target (y) is the square column
-The features (X) are AP-00 through AP-08

The X values are RSSI readings; based on the collected RSSI values, the model should classify each row into the correct square. The square column is multiclass (s-01, s-02, s-03).
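
For context, here is a minimal sketch of how the features and target could be split out, assuming the table above lives in a pandas DataFrame loaded from a hypothetical rssi.csv (the file name is illustrative):

import pandas as pd

# hypothetical file name; the layout matches the table above
df = pd.read_csv('rssi.csv')
X = df.drop(columns=['square'])  # the nine AP-xx RSSI columns
y = df['square']                 # multiclass target: s-01, s-02, s-03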

I fit it with a RandomForestClassifier:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# 70/30 split with a fixed seed for reproducibility
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
clf = RandomForestClassifier()
clf.fit(x_train, y_train)
y_hat = clf.predict(x_test)
accuracy_score(y_test, y_hat)  # argument order: (y_true, y_pred)

0.9838746309334545
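
Since the target is multiclass, a per-class breakdown can confirm the accuracy is not carried by a single class; a sketch using sklearn's classification_report with y_hat from above:

from sklearn.metrics import classification_report, confusion_matrix

# per-class precision/recall/F1, plus the raw confusion matrix
print(classification_report(y_test, y_hat))
print(confusion_matrix(y_test, y_hat))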

-NOTE: The data is balanced, so I suspected the high accuracy meant overfitting.
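
The balance claim itself is easy to verify, assuming y is the square column as above:

# class proportions; roughly equal shares indicate a balanced target
print(y.value_counts(normalize=True))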

I decided to run cross-validation, first on x_train and y_train:

from sklearn.model_selection import cross_val_score

model = RandomForestClassifier()
scores1 = cross_val_score(model, x_train, y_train, cv=5)  # 5-fold CV on the training set
print(scores1)
array([0.98199513, 0.98199513, 0.98442822, 0.97955209, 0.98344693])

Then again on x_test and y_test:

scores2 = cross_val_score(model, x_test, y_test, cv=10)  # 10-fold CV on the held-out set
print(scores2)
array([0.98637911, 0.97048808, 0.97616345, 0.975     , 0.96931818])
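
A related check is to compare train and validation scores fold by fold; sklearn's cross_validate with return_train_score=True reports both in one call (a minimal sketch using the model above):

from sklearn.model_selection import cross_validate

# a large gap between train_score and test_score would indicate overfitting
cv_results = cross_validate(model, x_train, y_train, cv=5, return_train_score=True)
print(cv_results['train_score'])
print(cv_results['test_score'])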

So, does this mean my model doesn't overfit? Can someone explain what is going on? And can it really give this accuracy without any hyperparameter tuning?!

question from: https://stackoverflow.com/questions/66046665/is-that-overfitting-problem-or-what-is-going-on

1 Answer

You are not overfitting, so your model is fine.

As you said, if you obtain similar accuracy values on train and test data, you are not overfitting. Your problem is probably just quite easy to solve with these features (lucky, hahah).
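
Concretely, that comparison can be read straight off the fitted model; a sketch assuming clf and the split from the question:

# similar numbers here are the signature of a model that generalizes
print('train accuracy:', clf.score(x_train, y_train))
print('test accuracy: ', clf.score(x_test, y_test))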

I recommend plotting which features are the most important for your model; this will help you understand a little better which features let you achieve this huge accuracy:

import pandas as pd

# impurity-based importances from the fitted forest, largest first
feat_importances = pd.Series(clf.feature_importances_, index=X.columns)
feat_importances.nlargest(10).plot(kind='barh')

Moreover, yes, you can obtain such good accuracy without tuning hyperparameters. The default hyperparameters usually work pretty well; obviously, you can increase the accuracy a little by changing some fields. Tuning hyperparameters is super useful to avoid overfitting, but that's not your case.
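
If you do want to squeeze out a bit more, here is a hedged sketch of a small grid search over a few common RandomForest fields (the parameter values are illustrative, not tuned for your data):

from sklearn.model_selection import GridSearchCV

# a small illustrative grid; widen or narrow it as your time budget allows
param_grid = {
    'n_estimators': [100, 300],
    'max_depth': [None, 10, 20],
    'min_samples_leaf': [1, 5],
}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
search.fit(x_train, y_train)
print(search.best_params_, search.best_score_)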

