Convergence rate of gradient descent

These are notes from a talk I presented at the seminar on June 22nd. All this material is drawn from Chapter 7 of Bishop’s Neural Networks for Pattern Recognition, 1995.

In these notes we study the rate of convergence of gradient descent in the neighbourhood of a local minimum. The eigenvalues of the Hessian at the local minimum determine both the maximum learning rate and the rate of convergence along each of the axes defined by the Hessian's orthonormal eigenvectors.
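As a quick numerical illustration of this claim, here is a minimal sketch that assumes the error is exactly quadratic near the minimum. The Hessian `H`, the minimum `w_star`, the starting point `w0` and the learning rate `eta` are made-up values, not taken from Bishop. Under that assumption, each gradient descent step should scale the error component along the i-th eigenvector by `(1 - eta * lambda_i)`, so the iteration should converge only when `eta < 2 / lambda_max`.

```python
import numpy as np

# Hypothetical quadratic error E(w) = 0.5 * (w - w_star)^T H (w - w_star).
# H and w_star are illustrative values only.
H = np.array([[3.0, 1.0],
              [1.0, 2.0]])
w_star = np.array([1.0, -1.0])

# H is real and symmetric, so eigh returns real eigenvalues (ascending)
# and orthonormal eigenvectors as the columns of U.
eigvals, U = np.linalg.eigh(H)


def gradient_descent(eta, steps, w0):
    """Run plain gradient descent on E and return the final weights."""
    w = w0.copy()
    for _ in range(steps):
        w = w - eta * H @ (w - w_star)  # the gradient of E is H (w - w_star)
    return w


w0 = np.array([4.0, 3.0])
eta = 0.1
steps = 20
w_final = gradient_descent(eta, steps, w0)

# Coordinates of the error vector in the eigenvector basis.
alpha0 = U.T @ (w0 - w_star)
alpha_final = U.T @ (w_final - w_star)

# Predicted coordinates: each one shrinks by (1 - eta * lambda_i) per step.
alpha_predicted = (1.0 - eta * eigvals) ** steps * alpha0

# Largest learning rate for which every factor satisfies |1 - eta * lambda_i| < 1.
eta_max = 2.0 / eigvals.max()

print("measured :", alpha_final)
print("predicted:", alpha_predicted)
print("eta_max  :", eta_max)
```

Rerunning `gradient_descent` with a learning rate slightly above `eta_max` (say `1.05 * eta_max`) should make the error grow along the eigenvector with the largest eigenvalue, while the other component still shrinks.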

See the eigendecomposition of real, symmetric matrices for the linear algebra preliminaries.