python - val_loss did not improve from inf + loss:nan Error while training

I have a problem that occurs when I start training my model: the log says val_loss did not improve from inf and loss: nan. At first I thought it was caused by the learning rate, but now I'm not sure what it is, because I've tried several different learning rates and none of them worked for me. I hope someone can help me.

My settings:

  • Optimizer = Adam, learning rate = 0.01 (I've already tried a bunch of different learning rates, for example 0.0005, 0.001, 0.00146, 0.005, 0.5, 0.6, 0.7 and 0.8, but none of them worked for me.)
  • EarlyStopping = enabled (training stops at epoch 3 because there is no improvement; I've also disabled EarlyStopping after the model stopped at epoch 3 and let it run for 100 epochs without it.)
  • ReduceLROnPlateau = disabled

I'm training this model on my GPU (EVGA RTX 3080 FTW3 ULTRA).

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Activation, BatchNormalization, Dropout, Flatten, Dense

model = Sequential()


model.add(Conv2D(32,(3,3),padding='same',kernel_initializer='he_normal',input_shape=(img_rows, img_cols,1)))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(Conv2D(32,(3,3),padding='same',kernel_initializer='he_normal',input_shape=(img_rows,img_cols,1)))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))


model.add(Conv2D(64,(3,3),padding='same',kernel_initializer='he_normal'))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(Conv2D(64,(3,3),padding='same',kernel_initializer='he_normal'))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))


model.add(Conv2D(128,(3,3),padding='same',kernel_initializer='he_normal'))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(Conv2D(128,(3,3),padding='same',kernel_initializer='he_normal'))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))


model.add(Conv2D(256,(3,3),padding='same',kernel_initializer='he_normal'))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(Conv2D(256,(3,3),padding='same',kernel_initializer='he_normal'))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))


model.add(Flatten())
model.add(Dense(64,kernel_initializer='he_normal'))
model.add(BatchNormalization())
model.add(Dropout(0.5))


model.add(Dense(64,kernel_initializer='he_normal'))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))


model.add(Dense(num_classes,kernel_initializer='he_normal'))
model.add(Activation('softmax'))


print(model.summary())


from keras.optimizers import RMSprop,SGD,Adam
from keras.callbacks import ModelCheckpoint,EarlyStopping,ReduceLROnPlateau 


checkpoint = ModelCheckpoint('Wave.h5',
                             monitor='val_loss',
                             mode='min',
                             save_best_only=True,
                             verbose=1)


earlystop = EarlyStopping(monitor='val_loss',
                              min_delta=0,
                              patience=3,
                              verbose=1,
                              restore_best_weights=True)


'''reduce_lr = ReduceLROnPlateau(monitor='val_loss',
                              factor=0.2,
                              patience=3,
                              verbose=1,
                              min_delta=0.0001)'''


callbacks = [earlystop,checkpoint] #reduce_lr


model.compile(loss='categorical_crossentropy',
              optimizer= Adam(lr=0.01),   
              metrics=['accuracy'])
Question from: https://stackoverflow.com/questions/65545918/val-loss-did-not-improve-from-inf-lossnan-error-while-training


1 Answer


A few comments...

In this kind of situation, the most practical approach is trial and error. It looks like your parameters diverged during training, and there are many possible causes. You are also regularizing your network quite heavily (dropout, BatchNorm, etc.).

Suggestions:

  • Normalize your input data before feeding it into the network.
  • Comment out or remove all the dropout layers (regularization), the kernel_initializer arguments (use the default initialization), EarlyStopping, etc. that you're currently using, and let the network be a plain CNN with just conv, pooling, batchnorm, and dense layers. If you see improvements, add them back one by one and you'll find what was causing the problem (see the sketch after this list).
  • Try using more units in the dense layer, for example 1000, since the dense layer has to combine all the features the CNN layers have compressed.
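
For illustration, here is a minimal sketch of the first two suggestions combined. It is an assumption-laden example, not your exact pipeline: X_train, y_train, X_val, y_val stand in for your image arrays and one-hot labels, and img_rows, img_cols, num_classes are the same variables used in your code above.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, BatchNormalization, Flatten, Dense
from keras.optimizers import Adam

# 1) Normalize pixel values to [0, 1] before they reach the network
#    (X_train / X_val are hypothetical names for your image arrays)
X_train = X_train.astype('float32') / 255.0
X_val = X_val.astype('float32') / 255.0

# 2) Plain baseline: conv -> batchnorm -> pool -> dense, no dropout,
#    default kernel initializer, default Adam learning rate (0.001)
baseline = Sequential()
baseline.add(Conv2D(32, (3, 3), padding='same', activation='elu',
                    input_shape=(img_rows, img_cols, 1)))
baseline.add(BatchNormalization())
baseline.add(MaxPooling2D(pool_size=(2, 2)))

baseline.add(Conv2D(64, (3, 3), padding='same', activation='elu'))
baseline.add(BatchNormalization())
baseline.add(MaxPooling2D(pool_size=(2, 2)))

baseline.add(Flatten())
baseline.add(Dense(1000, activation='elu'))   # larger dense layer, as suggested
baseline.add(Dense(num_classes, activation='softmax'))

baseline.compile(loss='categorical_crossentropy',
                 optimizer=Adam(),            # defaults to lr=0.001
                 metrics=['accuracy'])

baseline.fit(X_train, y_train,
             validation_data=(X_val, y_val),
             epochs=20, batch_size=32)

If the loss stays finite and decreases with this plain setup, reintroduce the dropouts, the he_normal initializer, and the callbacks one at a time to see which change brings the NaNs back.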
