machine learning - Is this an overfitting problem, or what is going on?

This is the data:

square  AP-00  AP-01  AP-02   AP-03  AP-04  AP-05  AP-06   AP-07  AP-08    
s-01     -30   -28    -40     -44    -62    -60    -78     -60    -62   
s-01     -30   -52    -38     -44    -62    -60    -78     -60    -68   
s-01     -30   -17    -36     -40    -62    -58    -66     -60    -68   
s-01     -28   -19    -36     -36    -62    -56    -36     -52    -68   
s-01     -28   -17    -36     -40    -54    -56    -36     -52    -64 
...      ...   ...    ...     ...    ...    ...    ...     ...    ...   

-Shape of data: 15071 rows × 10 columns
-The target (y) is the square column
-The features (X) are AP-00 through AP-08
-The feature values are RSSI readings; based on the collected RSSI values, the model should classify each sample into the correct square. The square column is multiclass (s-01, s-02, s-03).

I fit it with a RandomForestClassifier:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

clf = RandomForestClassifier()
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
clf.fit(x_train, y_train)
y_hat = clf.predict(x_test)
accuracy_score(y_test, y_hat)  # accuracy_score(y_true, y_pred)

0.9838746309334545

-NOTE: The data is balanced, so I thought it was overfitting.
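
Since the data is balanced, a per-class breakdown is a quick way to confirm that no single square carries the score; a minimal sketch, reusing y_test and y_hat from above:

from sklearn.metrics import classification_report

# Precision, recall and F1 for each square class (s-01, s-02, s-03)
print(classification_report(y_test, y_hat))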

I decided to run cross-validation, first on x_train and y_train:

from sklearn.model_selection import cross_val_score

model = RandomForestClassifier()
scores1 = cross_val_score(model, x_train, y_train, cv=5)
scores1

array([0.98199513, 0.98199513, 0.98442822, 0.97955209, 0.98344693])

Then again on x_test and y_test:

scores2 = cross_val_score(model, x_test, y_test, cv=10)
scores2

array([0.98637911, 0.97048808, 0.97616345, 0.975     , 0.96931818])

So, does this mean my model doesn't overfit? Can someone explain what is going on? And can it really give this accuracy without any hyperparameter tuning?!
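
For reference, the same kind of estimate can also come from a single cross-validation over the full dataset instead of separate runs on each split; a minimal sketch, reusing X and y from above:

from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

# Five-fold CV over all 15071 rows; each fold is held out once
scores = cross_val_score(RandomForestClassifier(), X, y, cv=5)
print(scores.mean(), scores.std())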

Question from: https://stackoverflow.com/questions/66046665/is-that-overfitting-problem-or-what-is-going-on


1 Answer


You are not overfitting, so your model is fine.

As you said, if you obtain similar accuracy values on the training and test data, you are not overfitting. Your problem is probably just easy to solve with these features (lucky you!).
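
A minimal sketch of that comparison, reusing clf and the splits from the question:

# Accuracy on the data the forest was fit on vs the held-out data;
# a large gap between the two is the classic overfitting signature.
train_acc = clf.score(x_train, y_train)
test_acc = clf.score(x_test, y_test)
print(f"train: {train_acc:.4f}  test: {test_acc:.4f}")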

I recommend plotting which features are most important for your model; this will help you understand a bit better which features drive such high accuracy:

import pandas as pd

# Impurity-based importance of each AP's RSSI feature, largest first
feat_importances = pd.Series(clf.feature_importances_, index=X.columns)
feat_importances.nlargest(10).plot(kind='barh')

Moreover, yes, you can obtain this kind of accuracy without tuning hyperparameters. The defaults usually work pretty well, though you can often gain a little accuracy by adjusting some of them. Hyperparameter tuning is also super useful for avoiding overfitting, but that's not your case.
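
If you do want to try tuning, here is a minimal sketch with GridSearchCV (the grid values below are illustrative assumptions, not recommendations for this dataset):

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

# Small illustrative grid; every combination is scored with 5-fold CV
param_grid = {
    'n_estimators': [100, 300],
    'max_depth': [None, 10, 20],
}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
search.fit(x_train, y_train)
print(search.best_params_, search.best_score_)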

