I am trying to implement a model that takes a (q, a) pair as input, where q is a question and a is an answer, and both q and a are positionally encoded. The output is how real the answer is given the question. So this boils down to a binary classification task where the output is between 0 (fake) and 1 (real).
My model looks like this:
I take in two inputs, concatenate them, pass the result through an RNN, and then apply a sigmoid to get the probability.
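In code, the architecture is roughly the following (the sizes seq_len, d_model and rnn_units are placeholders here, not my exact values):

import tensorflow as tf

seq_len, d_model, rnn_units = 20, 64, 128  # placeholder sizes

# the two positionally encoded inputs
q_in = tf.keras.Input(shape=(seq_len, d_model))  # question
a_in = tf.keras.Input(shape=(seq_len, d_model))  # answer

# concatenate along the time axis and run an RNN over the joint sequence
x = tf.keras.layers.Concatenate(axis=1)([q_in, a_in])
x = tf.keras.layers.SimpleRNN(rnn_units)(x)

# sigmoid output: probability that the (q, a) pair is real
out = tf.keras.layers.Dense(1, activation="sigmoid")(x)

model = tf.keras.Model(inputs=[q_in, a_in], outputs=out)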
I have defined each train step as:
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam(1e-2)

@tf.function
def train_step(ip, tg, label):
    with tf.GradientTape() as tape:
        out = model([ip, tg])
        loss = cross_entropy(label, out)
        print(label, out)
    # gradients of the loss w.r.t. all trainable weights
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss
and I call the step for each batch using:
for epoch in range(epochs):
    print("Epoch: %s" % (epoch + 1))
    batch_loss = 0.0
    for batch, ((ip, tg), label) in enumerate(concat_dataset.take(steps_per_epoch)):
        loss = train_step(ip, tg, label)
        batch_loss += loss
where (ip, tg) is the (q, a) pair and the label 0 or 1 indicates a fake or real (q, a) sample.
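For context, concat_dataset has the element structure shown below. This is only a sketch with random placeholder data; my real pipeline encodes actual question/answer text:

import numpy as np
import tensorflow as tf

n, seq_len, d_model = 1000, 20, 64  # placeholder sizes
q = np.random.rand(n, seq_len, d_model).astype("float32")
a = np.random.rand(n, seq_len, d_model).astype("float32")
labels = np.random.randint(0, 2, size=(n, 1)).astype("float32")

# each element is ((ip, tg), label), matching the training loop above
concat_dataset = tf.data.Dataset.from_tensor_slices(((q, a), labels)).batch(64)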
When I train the model, I keep getting NaNs, or a loss as small as 1e-20.
I cannot figure out what is wrong here. I thought it was either exploding or vanishing gradients, so I tried lowering and raising the learning rate of Adam; I also used SGD, but got the same results.
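To check the exploding-gradient theory, a debugging variant of train_step like the one below (reusing model, cross_entropy and optimizer from above; train_step_debug is just a name for this sketch) should show the actual gradient magnitudes. Note that tf.print is needed here because a plain print inside a @tf.function only fires once, at trace time:

@tf.function
def train_step_debug(ip, tg, label):
    with tf.GradientTape() as tape:
        out = model([ip, tg])
        loss = cross_entropy(label, out)
    gradients = tape.gradient(loss, model.trainable_variables)
    # tf.print runs at every step inside the compiled graph
    tf.print("loss:", loss, "global grad norm:", tf.linalg.global_norm(gradients))
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss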
question from:
https://stackoverflow.com/questions/66055728/tensorflow-nan-or-close-to-zero-loss-while-training-a-discriminator-model-usin