I am confused as to which metric GridsearchCV is using in its parameter search. My understanding is that my model object feeds it a metric and this is what is used to determine the "best_params". But this doesn't appear to be the case. I thought that score=None is the default and as a result the first metric given in the metrics option of model.compile() was used. So in my case the the scoring function used should be the mean_squred_error. My explanation for this issue is described next.
Here is what I am doing. I simulated some regression data using sklearn with 10 features on 100,000 observations. I am playing around with keras because I typically used pytorch in the past and never really dabbled with keras until now. I am noticing a discrepancy in the loss function output from my GridsearchCV call vs the model.fit() call after I have my optimal set of parameters. Now I know I can just refit=True and not re-fit the model again, but I am trying to get a feel for the output of the keras and sklearn GridsearchCV functions.
To be explicit about the discrepancy here is what I am seeing. I simulated some data using sklearn as follows:
# Setting some data basics
N = 10000
feats = 10
# generate regression dataset
X, y = make_regression(n_samples=N, n_features=feats, n_informative=2, noise=3)
# training data and testing data #
X_train = X[:int(N * 0.8)]
y_train = y[:int(N * 0.8)]
X_test = X[int(N * 0.8):]
y_test = y[int(N * 0.8):]
I have created a "create_model" function that is looking to tune which activation function I am using (again this is a simple example for a proof of concept).
def create_model(activation_fn):
# create model
model = Sequential()
model.add(Dense(30, input_dim=feats, activation=activation_fn,
kernel_initializer='normal'))
model.add(Dropout(0.2))
model.add(Dense(10, activation=activation_fn))
model.add(Dropout(0.2))
model.add(Dense(1, activation='linear'))
# Compile model
model.compile(loss='mean_squared_error',
optimizer='adam',
metrics=['mean_squared_error','mae'])
return model
Performing the grid search I get the following output
model = KerasRegressor(build_fn=create_model, epochs=50, batch_size=200, verbose=0)
activations = ['linear','relu']
param_grid = dict(activation_fn = activations)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1, cv=3)
grid_result = grid.fit(X_train, y_train, verbose=1)
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
Best: -21.163454 using {'activation_fn': 'linear'}
Ok, so the best metric is the mean squared error of 21.16 (I understand they flip the sign to create a maximization problem). So, when I fit the model using the activation_fn = 'linear' the MSE I get is totally different.
best_model = create_model('linear')
history = best_model.fit(X_train, y_train, epochs=50, batch_size=200, verbose=1)
.....
.....
Epoch 49/50
8000/8000 [==============================] - 0s 48us/step - loss: 344.1636 - mean_squared_error: 344.1636 - mean_absolute_error: 12.2109
Epoch 50/50
8000/8000 [==============================] - 0s 48us/step - loss: 326.4524 - mean_squared_error: 326.4524 - mean_absolute_error: 11.9250
history.history['mean_squared_error']
Out[723]:
[10053.778002929688,
9826.66806640625,
......
......
344.16363830566405,
326.45237121582034]
The difference is in 326.45 vs. 21.16. Any insight as to what I am misunderstanding would be greatly appreciated. I would be more comfortable if they were within a reasonable neighborhood of each other, given one is the error from one fold vs the entire training data set. But 21 is nowhere near 326. Thanks!
The entire code is seen here.
import pandas as pd
import numpy as np
from keras import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasClassifier, KerasRegressor
from keras.constraints import maxnorm
from sklearn import preprocessing
from sklearn.preprocessing import scale
from sklearn.datasets import make_regression
from matplotlib import pyplot as plt
# Setting some data basics
N = 10000
feats = 10
# generate regression dataset
X, y = make_regression(n_samples=N, n_features=feats, n_informative=2, noise=3)
# training data and testing data #
X_train = X[:int(N * 0.8)]
y_train = y[:int(N * 0.8)]
X_test = X[int(N * 0.8):]
y_test = y[int(N * 0.8):]
def create_model(activation_fn):
# create model
model = Sequential()
model.add(Dense(30, input_dim=feats, activation=activation_fn,
kernel_initializer='normal'))
model.add(Dropout(0.2))
model.add(Dense(10, activation=activation_fn))
model.add(Dropout(0.2))
model.add(Dense(1, activation='linear'))
# Compile model
model.compile(loss='mean_squared_error',
optimizer='adam',
metrics=['mean_squared_error','mae'])
return model
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)
# create model
model = KerasRegressor(build_fn=create_model, epochs=50, batch_size=200, verbose=0)
# define the grid search parameters
activations = ['linear','relu']
param_grid = dict(activation_fn = activations)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1, cv=3)
grid_result = grid.fit(X_train, y_train, verbose=1)
best_model = create_model('linear')
history = best_model.fit(X_train, y_train, epochs=50, batch_size=200, verbose=1)
history.history.keys()
plt.plot(history.history['mean_absolute_error'])
# summarize results
grid_result.cv_results_
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))