Initializing neural networks

Published September 4, 2022

High-level libraries are usually enough to solve complex problems without worrying about low-level details.

But Gabriella didn’t have that luxury. Her research took her deep into the inner workings of neural networks.

As part of her project, she wants to write a blog post about how different initialization schemes influence the way neural networks learn. She also wants to cover the problems that can arise when we use the wrong initialization.

Which of the following problems could we face if we don’t initialize the weights correctly?

  1. The network might suffer from vanishing gradients, which could slow down training.

  2. The network might suffer from exploding gradients, preventing it from learning correctly.

  3. Every network neuron might learn the same features during training.

  4. Every neuron of the network might learn different features during training.

1, 2, 3

Correctly initializing a neural network can have a significant impact on convergence.

We may suffer from the vanishing gradient problem if we initialize the network’s weights with values that are too small, and from the exploding gradient problem if we use values that are too large. During backpropagation, the gradients become progressively smaller or larger as we move from the output layer toward the input layer. “Initializing neural networks” illustrates a specific example with a 9-layer neural network that uses the identity function as the activation on every layer.
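To make this concrete, here is a minimal NumPy sketch of the setup the article describes: a deep linear network (identity activations), where backpropagating the gradient reduces to a chain of matrix products. The depth, width, and the scales 0.5 and 1.5 are illustrative choices, not values taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
depth, width = 9, 100

def gradient_norms(scale):
    # With identity activations, each backward step is a plain matrix
    # product, so the gradient norm is multiplied by roughly `scale`
    # at every layer.
    weights = [scale / np.sqrt(width) * rng.standard_normal((width, width))
               for _ in range(depth)]
    grad = np.ones(width)  # gradient arriving from the loss
    norms = []
    for W in reversed(weights):
        grad = W.T @ grad
        norms.append(float(np.linalg.norm(grad)))
    return norms

print("scale=0.5:", [f"{n:.2e}" for n in gradient_norms(0.5)])  # shrinks toward 0
print("scale=1.5:", [f"{n:.2e}" for n in gradient_norms(1.5)])  # blows up
```

With the small scale, the gradient norm roughly halves at every layer; with the large one, it grows by roughly 50% per layer. Over many layers those factors compound exponentially, which is exactly the vanishing and exploding behavior described above.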

The third choice is correct: if we initialize the network with zeros, every neuron will learn the same features. Here is an excerpt from “Initializing neural networks” explaining the consequences of using the same weight values:

Thus, both hidden units will have identical influence on the cost, which will lead to identical gradients. Thus, both neurons will evolve symmetrically throughout training, effectively preventing different neurons from learning different things.
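We can see this symmetry directly. Below is a minimal sketch of a hypothetical 2-2-1 network with a sigmoid hidden layer and a squared-error loss (none of these details come from the article), where every weight starts at the same constant, as a zero or any other constant initialization would:

```python
import numpy as np

x = np.array([1.0, 2.0])   # one training example
y = 1.0                    # its target

W1 = np.full((2, 2), 0.5)  # both hidden neurons start with identical weights
W2 = np.full(2, 0.5)

# Forward pass.
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
h = sigmoid(W1 @ x)        # both hidden activations are identical
y_hat = W2 @ h

# Backward pass for the squared-error loss (y_hat - y) ** 2 / 2.
d_out = y_hat - y
dW2 = d_out * h
dh = d_out * W2
dW1 = np.outer(dh * h * (1 - h), x)

print(dW1)  # both rows (one per hidden neuron) are identical
```

Because both rows of `dW1` are identical, gradient descent applies the same update to both hidden neurons, so they remain interchangeable for the rest of training.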

The final choice, however, is not a problem: we want every neuron to learn different features.

Recommended reading