
machine learning - K-NN: training MSE with K=1 not equal to 0

In theory, the training MSE for k = 1 should be zero. However, the following script shows otherwise. I first generate some toy data: x represents hours of sleep and y represents happiness. Then I fit the model and predict on the training data. Finally, I compute the MSE on the training data. Can anyone tell me what is going wrong?

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# toy data: x = hours slept, y = happiness
x = np.array([7,8,6,7,5.7,6.8,8.6,6.5,7.8,5.7,9.8,7.7,8.8,6.2,7.1,5.7]).reshape(16,1)
y = np.array([5,7,4,5,6,9,7,6.8,8,7.6,9.3,8.2,7,6.2,3.8,6]).reshape(16,1)

model = KNeighborsRegressor(n_neighbors=1)
model.fit(x, y)

for hours_slept in range(1, 11):
    # predict() returns a (1, 1) array; extract the scalar before formatting
    happiness = model.predict([[hours_slept]])[0, 0]
    print("if you sleep %.0f hours, you will be %.1f happy!" % (hours_slept, happiness))


# calculate the training MSE by hand
def model_mse(model, x, y):
    predictions = model.predict(x)               # shape (16, 1), same as y
    return np.mean(np.power(y - predictions, 2))

print(model_mse(model, x, y))
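(As a sanity check, not part of my original script, the same number should come out of scikit-learn's built-in metric, assuming the model and data defined above:)

from sklearn.metrics import mean_squared_error
print(mean_squared_error(y, model.predict(x)))  # should match model_mse above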

The output:

if you sleep 1 hours, you will be 6.0 happy!
if you sleep 2 hours, you will be 6.0 happy!
if you sleep 3 hours, you will be 6.0 happy!
if you sleep 4 hours, you will be 6.0 happy!
if you sleep 5 hours, you will be 6.0 happy!
if you sleep 6 hours, you will be 4.0 happy!
if you sleep 7 hours, you will be 5.0 happy!
if you sleep 8 hours, you will be 7.0 happy!
if you sleep 9 hours, you will be 7.0 happy!
if you sleep 10 hours, you will be 9.3 happy!
0.15999999999999992  # strictly larger than 0!
Question from: https://stackoverflow.com/questions/66064161/k-nn-training-mse-with-k-1-not-equal-to-0


1 Answer


In theory, the training MSE for k = 1 should be zero

An implicit assumption here is that there are no duplicate samples x, or, more precisely, that identical features x always carry identical values y. Is that the case here? Let's check:

pred = model.predict(x)

np.where(pred != y)[0]
# array([9])

So, there is a single value where y and pred are indeed different:

y[9]
# array([7.6])

pred[9]
# array([6.])

where

x[9]
# array([5.7])

How many samples x have a value of 5.7, and what are the corresponding y's?

ind = np.where(x==5.7)[0]
ind
# array([ 4,  9, 15])

y[ind]
# result:
array([[6. ],
       [7.6],
       [6. ]])

pred[ind]
# result:
array([[6.],
       [6.],
       [6.]])

So, what is actually happening here is that, for x = 5.7, the algorithm unsurprisingly cannot decide unambiguously which sample is the single closest neighbor: all three samples at x = 5.7 are at distance zero, with y values of 6, 7.6, and 6. Here it has chosen a neighbor with y = 6, which does not coincide with the true y = 7.6 at index 9, leading to a non-zero MSE.
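The reported value is consistent with exactly one mismatched sample out of sixteen; a quick back-of-the-envelope check:

# a single wrong prediction (7.6 predicted as 6.0), averaged over 16 samples
print((7.6 - 6.0) ** 2 / 16)  # 0.16, modulo floating-point rounding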

I guess that, digging into the k-NN source code, one would be able to work out exactly how such ties are handled internally, but I'm leaving this as an exercise.
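One way to peek without reading the source is the estimator's kneighbors method, which reports the distance to and the training index of the selected neighbor. A minimal sketch, assuming the model and data above; which of the tied indices (4, 9, or 15) is returned may depend on the scikit-learn version:

# which training sample is treated as the nearest neighbor of x[9] (= 5.7)?
dist, ind = model.kneighbors(x[9].reshape(1, -1), n_neighbors=1)
print(dist, ind)  # distance 0.0; the index reveals which tied sample won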

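And as a minimal check of the implicit assumption above: if the duplicated x values are made to carry consistent y values, the k = 1 training MSE does drop to zero. This sketch reuses model_mse from the question:

# make the three samples at x = 5.7 agree on y, then refit
y_fixed = y.copy()
y_fixed[9] = 6.0
model_fixed = KNeighborsRegressor(n_neighbors=1).fit(x, y_fixed)
print(model_mse(model_fixed, x, y_fixed))  # 0.0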
