This is the data:
square AP-00 AP-01 AP-02 AP-03 AP-04 AP-05 AP-06 AP-07 AP-08
s-01 -30 -28 -40 -44 -62 -60 -78 -60 -62
s-01 -30 -52 -38 -44 -62 -60 -78 -60 -68
s-01 -30 -17 -36 -40 -62 -58 -66 -60 -68
s-01 -28 -19 -36 -36 -62 -56 -36 -52 -68
s-01 -28 -17 -36 -40 -54 -56 -36 -52 -64
... ... ... ... ... ... ... ... ... ...
- Shape of data: 15071 rows × 10 columns
- The target (y) is the square column
- The features (X) are AP-00 AP-01 AP-02 AP-03 AP-04 AP-05 AP-06 AP-07 AP-08
The X values are RSSI readings; based on the collected RSSI values, the model should classify each row into the correct square.
The square column is multiclass (s-01, s-02, s-03).
I fit it with a RandomForestClassifier:
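A minimal sketch of how X and y can be separated, assuming the data sits in a pandas DataFrame (the name `df` is hypothetical) and using only the five sample rows shown above:

```python
import pandas as pd

# The five sample rows shown above (all labelled s-01); the full dataset has 15071 rows.
columns = ["square"] + [f"AP-{i:02d}" for i in range(9)]
rows = [
    ["s-01", -30, -28, -40, -44, -62, -60, -78, -60, -62],
    ["s-01", -30, -52, -38, -44, -62, -60, -78, -60, -68],
    ["s-01", -30, -17, -36, -40, -62, -58, -66, -60, -68],
    ["s-01", -28, -19, -36, -36, -62, -56, -36, -52, -68],
    ["s-01", -28, -17, -36, -40, -54, -56, -36, -52, -64],
]
df = pd.DataFrame(rows, columns=columns)

# Target: the square label; features: the nine RSSI columns AP-00 .. AP-08.
y = df["square"]
X = df.drop(columns="square")

print(X.shape)     # (5, 9)
print(y.unique())  # ['s-01']
```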
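from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

clf = RandomForestClassifier()
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
clf.fit(x_train, y_train)
y_hat = clf.predict(x_test)
accuracy_score(y_test, y_hat)  # signature is accuracy_score(y_true, y_pred)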
0.9838746309334545
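A high test accuracy on its own does not indicate overfitting; overfitting shows up as a large gap between training and test accuracy. A sketch of that comparison, using scikit-learn's `make_classification` as a hypothetical stand-in for the RSSI data (9 features, 3 balanced classes), since the real dataset is not available here:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the RSSI data: 9 features, 3 roughly balanced classes.
X, y = make_classification(n_samples=3000, n_features=9, n_informative=6,
                           n_classes=3, random_state=42)

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y)

clf = RandomForestClassifier(random_state=42)
clf.fit(x_train, y_train)

# accuracy_score takes (y_true, y_pred).
train_acc = accuracy_score(y_train, clf.predict(x_train))
test_acc = accuracy_score(y_test, clf.predict(x_test))

# A small gap means the model generalizes about as well as it fits the training data.
print(f"train={train_acc:.3f} test={test_acc:.3f} gap={train_acc - test_acc:.3f}")
```

Random forests typically score close to 1.0 on their own training data, so the test score (or a cross-validated score) is the number that matters.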
- NOTE: The data is balanced, so I suspected the high accuracy meant the model was overfitting.
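Balanced classes rule out one explanation for a high score (a model that mostly predicts the majority class), but they say nothing about overfitting by themselves. A sketch comparing the forest against a chance-level baseline, using scikit-learn's `DummyClassifier` and hypothetical `make_classification` data in place of the real RSSI table:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical balanced 3-class data standing in for the RSSI squares.
X, y = make_classification(n_samples=3000, n_features=9, n_informative=6,
                           n_classes=3, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y)

# With balanced classes, always predicting one class scores roughly 1 / n_classes.
baseline = DummyClassifier(strategy="most_frequent").fit(x_train, y_train)
forest = RandomForestClassifier(random_state=42).fit(x_train, y_train)

baseline_acc = baseline.score(x_test, y_test)
forest_acc = forest.score(x_test, y_test)
print(f"baseline={baseline_acc:.3f} forest={forest_acc:.3f}")
```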
I decided to run cross-validation on x_train and y_train:
from sklearn.model_selection import cross_val_score

model = RandomForestClassifier()
scores1 = cross_val_score(model, x_train, y_train, cv=5)
print(scores1)
array([0.98199513, 0.98199513, 0.98442822, 0.97955209, 0.98344693])
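Each of the five scores comes from a separate forest fitted on 4/5 of x_train and scored on the remaining 1/5, so consistent scores with little spread suggest the ~0.98 accuracy is not a lucky split. A sketch that also reports the per-fold training score, assuming scikit-learn's `cross_validate` with `return_train_score=True` and a shuffled `StratifiedKFold`; the data is again a hypothetical `make_classification` stand-in:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Hypothetical stand-in for x_train / y_train.
x_train, y_train = make_classification(n_samples=2000, n_features=9, n_informative=6,
                                       n_classes=3, random_state=42)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
results = cross_validate(RandomForestClassifier(random_state=42), x_train, y_train,
                         cv=cv, return_train_score=True)

# Each fold: fit on 4/5 of x_train, score on the held-out 1/5.
for fold, (tr, va) in enumerate(zip(results["train_score"], results["test_score"]), 1):
    print(f"fold {fold}: train={tr:.3f} validation={va:.3f}")

print("mean validation accuracy:", results["test_score"].mean())
```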
Then again on x_test and y_test:
scores2 = cross_val_score(model,x_test,y_test, cv = 10)
print(scores2)
array([0.98637911, 0.97048808, 0.97616345, 0.975 , 0.96931818])
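Note that `cross_val_score` fits fresh clones of the estimator it receives, so running it on x_test trains new forests on subsets of the test data rather than evaluating the model fitted on x_train. A sketch illustrating this and the usual alternative (fit once on the training split, score once on the test split), with hypothetical `make_classification` data:

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Hypothetical stand-in data.
X, y = make_classification(n_samples=2000, n_features=9, n_informative=6,
                           n_classes=3, random_state=42)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

model = RandomForestClassifier(random_state=42)

# cross_val_score clones `model` for every fold, so these forests are trained on
# pieces of x_test; the `model` object passed in is never fitted.
scores_on_test = cross_val_score(model, x_test, y_test, cv=5)
print(hasattr(model, "estimators_"))  # False

# Usual workflow: fit once on the training split, score once on the held-out split.
final_model = clone(model).fit(x_train, y_train)
test_accuracy = final_model.score(x_test, y_test)
print(f"held-out test accuracy: {test_accuracy:.3f}")
```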
So, does that mean my model doesn't overfit? Or can someone explain what is going on? And can it really reach this accuracy without any hyperparameter tuning?
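Random forests often perform well with scikit-learn's defaults (100 fully grown trees), so a high untuned score is not unusual. If tuning is wanted, a sketch using `GridSearchCV` on the training split only; the grid values are illustrative assumptions, not recommendations, and the data is a hypothetical `make_classification` stand-in:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical stand-in data.
X, y = make_classification(n_samples=1500, n_features=9, n_informative=6,
                           n_classes=3, random_state=42)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

# A small illustrative grid (8 combinations x 5 folds = 40 fits).
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 10],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
search.fit(x_train, y_train)  # tuning sees only the training split

print("best params:", search.best_params_)
print(f"best CV accuracy: {search.best_score_:.3f}")
# With refit=True (the default) the best configuration is retrained on all of x_train.
print(f"test accuracy: {search.score(x_test, y_test):.3f}")
```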
question from:
https://stackoverflow.com/questions/66046665/is-that-overfitting-problem-or-what-is-going-on