
0 votes
339 views
in Technique by (71.8m points)

python - What is the standard weight decay used when a specific weight decay is not declared?

I want to implement an autoencoder. The paper says it used the Adam optimizer with an initial learning rate of... and a weight decay set to...

I have read that there are several types of weight decay. Which one would be the standard when the exact type of weight decay is not given?

I have also seen (https://www.pyimagesearch.com/2019/07/22/keras-learning-rate-schedules-and-decay/) that there is a standard decay schedule implemented in Keras, but I could not find it in the Keras documentation. Can this weight decay schedule be used?

The standard decay schedule shown there is used like this:

from keras.optimizers import SGD

opt = SGD(lr=1e-2, decay=1e-2 / epochs)  # epochs = total number of training epochs
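
As far as I understand (assuming the legacy Keras 2.x optimizers), that decay argument applies a time-based decay to the learning rate after every batch update, roughly:

# approximate rule the legacy Keras optimizers use for the `decay` argument
lr_t = lr * (1.0 / (1.0 + decay * iterations))  # iterations = batch updates so far

so it decays the learning rate over time, not the weights.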


1 Answer

0 votes
by (71.8m points)

I think you may have confused Learning Rate Decay with Weight Decay; they are different terms. The link you've shared is about Learning Rate Decay, not Weight Decay.

Weight decay is an additional term added to the gradient descent update that regularizes the weights of the network and causes them to decay exponentially towards zero (which helps prevent overfitting). If you go through the literature, you'll hear terms like L1 regularizer / L2 regularizer; these are the weight decays we're talking about.
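
If it helps, here is a minimal sketch of L2 weight decay in Keras, added through a kernel regularizer; the layer sizes and the 1e-5 coefficient are only illustrative, not values from your paper:

from tensorflow.keras import layers, models, regularizers

wd = regularizers.l2(1e-5)  # L2 penalty (weight decay) coefficient, illustrative value

model = models.Sequential([
    layers.Dense(64, activation="relu", input_shape=(784,), kernel_regularizer=wd),
    layers.Dense(784, activation="sigmoid", kernel_regularizer=wd),
])
model.compile(optimizer="adam", loss="mse")  # the penalty is added to the loss during training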

On the other hand, the learning rate is a constant term (alpha) in the optimization formula (gradient descent) that decides how big a step we take along the gradient. It largely determines the convergence rate of the algorithm: if the step (learning rate) is too big, we may diverge from the optimal solution, and if it is too small, it may take a very long time to reach the goal, or we may never reach it at all. Hence, a great amount of research has gone into finding the "best" learning rate. Learning Rate Decay is a product of such research: it is a method where the learning rate keeps getting decayed after a certain period, which helps the network converge faster.
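
And here is a minimal sketch of learning rate decay in Keras using the LearningRateScheduler callback; the schedule (halving every 10 epochs) is just an illustrative choice:

from tensorflow.keras.callbacks import LearningRateScheduler

def step_decay(epoch, lr):
    # halve the learning rate every 10 epochs (illustrative step schedule)
    if epoch > 0 and epoch % 10 == 0:
        return lr * 0.5
    return lr

lr_callback = LearningRateScheduler(step_decay)
# model.fit(x_train, x_train, epochs=50, callbacks=[lr_callback])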

If you're wondering how to use them in Keras, go through these:

Using Learning Rate Schedules for Deep Learning Models in Python with Keras

How to Use Weight Decay to Reduce Overfitting of Neural Network in Keras

