I am trying to implement a model that takes a (q, a) pair as input, where q is a question and a is an answer, and both q and a are positionally encoded. The output is how real the answer is given the question. So this boils down to a binary classification task where the output is between 0 (fake) and 1 (real).
My model looks like this:
I take in two inputs, concatenate them, pass the result through an RNN, and then apply a sigmoid to get the probability.
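In code, the architecture is roughly the following (the sizes seq_len, d_model and rnn_units are placeholders here, not my exact values):

import tensorflow as tf

seq_len, d_model, rnn_units = 20, 64, 128  # placeholder sizes

# the two positionally encoded inputs
q_in = tf.keras.Input(shape=(seq_len, d_model))  # question
a_in = tf.keras.Input(shape=(seq_len, d_model))  # answer

# concatenate along the time axis and run an RNN over the joint sequence
x = tf.keras.layers.Concatenate(axis=1)([q_in, a_in])
x = tf.keras.layers.SimpleRNN(rnn_units)(x)

# sigmoid output: probability that the (q, a) pair is real
out = tf.keras.layers.Dense(1, activation="sigmoid")(x)

model = tf.keras.Model(inputs=[q_in, a_in], outputs=out)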
I have defined each train step as:
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam(1e-2)

@tf.function
def train_step(ip, tg, label):
    with tf.GradientTape() as tape:
        out = model([ip, tg])
        loss = cross_entropy(label, out)
        print(label, out)
    # gradients of the loss w.r.t. all trainable weights
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss
and I call the step for each batch using:
for epoch in range(epochs):
    print("Epoch: %s" % (epoch + 1))
    batch_loss = 0.0
    for batch, ((ip, tg), label) in enumerate(concat_dataset.take(steps_per_epoch)):
        loss = train_step(ip, tg, label)
        batch_loss += loss
where (ip, tg) is the (q, a) pair and the label 0 or 1 indicates a fake or real (q, a) sample.
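For context, concat_dataset has the element structure shown below. This is only a sketch with random placeholder data; my real pipeline encodes actual question/answer text:

import numpy as np
import tensorflow as tf

n, seq_len, d_model = 1000, 20, 64  # placeholder sizes
q = np.random.rand(n, seq_len, d_model).astype("float32")
a = np.random.rand(n, seq_len, d_model).astype("float32")
labels = np.random.randint(0, 2, size=(n, 1)).astype("float32")

# each element is ((ip, tg), label), matching the training loop above
concat_dataset = tf.data.Dataset.from_tensor_slices(((q, a), labels)).batch(64)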
When I train the model, I keep getting NaNs, or a loss as small as 1e-20.
I cannot figure out what is wrong here. I thought it was either exploding or vanishing gradients, so I tried lowering and raising the learning rate of Adam; I also used SGD, but got the same results.
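To check the exploding-gradient theory, a debugging variant of train_step like the one below (reusing model, cross_entropy and optimizer from above; train_step_debug is just a name for this sketch) should show the actual gradient magnitudes. Note that tf.print is needed here because a plain print inside a @tf.function only fires once, at trace time:

@tf.function
def train_step_debug(ip, tg, label):
    with tf.GradientTape() as tape:
        out = model([ip, tg])
        loss = cross_entropy(label, out)
    gradients = tape.gradient(loss, model.trainable_variables)
    # tf.print runs at every step inside the compiled graph
    tf.print("loss:", loss, "global grad norm:", tf.linalg.global_norm(gradients))
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss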
question from:
https://stackoverflow.com/questions/66055728/tensorflow-nan-or-close-to-zero-loss-while-training-a-discriminator-model-usin