In theory, the training MSE for k = 1 should be zero. However, the following script shows otherwise. I first generate some toy data: x represents hours of sleep and y represents happiness. Then I fit the model and predict the outcome. Finally, I calculate the MSE on the training data via two methods. Can anyone tell me what is going wrong?
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# toy data: x = hours slept, y = happiness
x = np.array([7,8,6,7,5.7,6.8,8.6,6.5,7.8,5.7,9.8,7.7,8.8,6.2,7.1,5.7]).reshape(16,1)
y = np.array([5,7,4,5,6,9,7,6.8,8,7.6,9.3,8.2,7,6.2,3.8,6]).reshape(16,1)

model = KNeighborsRegressor(n_neighbors=1)
model = model.fit(x,y)

for hours_slept in range(1,11):
    # predict returns a (1, 1) array here; take the scalar for formatting
    happiness = model.predict([[hours_slept]])[0, 0]
    print("if you sleep %.0f hours, you will be %.1f happy!" %(hours_slept, happiness))

# calculate MSE
# fast method
def model_mse(model,x,y):
    predictions = model.predict(x)
    return np.mean(np.power(y-predictions,2))

print(model_mse(model,x,y))
The output:
if you sleep 1 hours, you will be 6.0 happy!
if you sleep 2 hours, you will be 6.0 happy!
if you sleep 3 hours, you will be 6.0 happy!
if you sleep 4 hours, you will be 6.0 happy!
if you sleep 5 hours, you will be 6.0 happy!
if you sleep 6 hours, you will be 4.0 happy!
if you sleep 7 hours, you will be 5.0 happy!
if you sleep 8 hours, you will be 7.0 happy!
if you sleep 9 hours, you will be 7.0 happy!
if you sleep 10 hours, you will be 9.3 happy!
0.15999999999999992 #strictly larger than 0!
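
A check that may be relevant here: the expectation of zero training error for k = 1 holds only when every x value is distinct, since a single x cannot be mapped to two different y values at once. The following is a minimal sketch, using only NumPy, that looks for repeated x values paired with different y values in the toy data above. The helper name `conflicting_duplicates` is hypothetical and not part of scikit-learn.

```python
import numpy as np

# Same toy data as in the question, kept 1-D for easy masking.
x = np.array([7, 8, 6, 7, 5.7, 6.8, 8.6, 6.5, 7.8, 5.7, 9.8, 7.7, 8.8, 6.2, 7.1, 5.7])
y = np.array([5, 7, 4, 5, 6, 9, 7, 6.8, 8, 7.6, 9.3, 8.2, 7, 6.2, 3.8, 6])

def conflicting_duplicates(x, y):
    """Hypothetical helper: map each x value that occurs with more than
    one distinct y value to the sorted list of those y values."""
    conflicts = {}
    for value in np.unique(x):
        targets = np.unique(y[x == value])  # distinct targets for this x
        if targets.size > 1:
            conflicts[float(value)] = targets.tolist()
    return conflicts

conflicts = conflicting_duplicates(x, y)
print(conflicts)
```

On this data the only conflict is x = 5.7, which appears with y = 6 and y = 7.6. If the model returns 6 for every row with x = 5.7, the single miss contributes (7.6 - 6)^2 / 16 = 0.16 to the mean, which matches the printed value up to floating-point rounding.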
question from:
https://stackoverflow.com/questions/66064161/k-nn-training-mse-with-k-1-not-equal-to-0