Re-parameterising for non-negativity yields multiplicative updates

Suppose you have a model that depends on real-valued parameters, and that you would like to constrain these parameters to be non-negative. For simplicity, suppose the model has a single parameter a \in \mathbb R. Let E denote the error function. To constrain a to be non-negative, parameterise a as the square of a real-valued parameter \alpha \in \mathbb R:

    \[a = \alpha^2, \quad \alpha \in \mathbb R.\]
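As a concrete illustration, here is a minimal Python sketch; the quadratic error E(a) = (a - 2)^2 is a hypothetical stand-in chosen only so that the example runs:

    def E(a):
        # Hypothetical error function, used only for illustration;
        # over a >= 0 it is minimised at a = 2.
        return (a - 2.0) ** 2

    # Constrained view: minimise E(a) subject to a >= 0.
    # Re-parameterised view: a = alpha**2 is non-negative for every real alpha,
    # so minimising E(alpha**2) over alpha requires no constraint at all.
    def E_reparam(alpha):
        return E(alpha ** 2)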

We can now minimise E over \alpha without any constraints, e.g. by gradient descent. Let \lambda > 0 be the learning rate. We have

    \begin{eqnarray*}
    \alpha^{\text{new}} &=& \alpha - \lambda \frac{\partial E}{\partial \alpha} \\
    &=& \alpha - \lambda \frac{\partial E}{\partial a} \frac{\partial a}{\partial \alpha} \\
    &=& \alpha - 2 \lambda \alpha \frac{\partial E}{\partial a} \\
    &=& \alpha \cdot \left( 1 - 2 \lambda \frac{\partial E}{\partial a} \right)
    \end{eqnarray*}

by the chain rule. Thus

    \begin{eqnarray*}
    a^{\text{new}} &=& (\alpha^{\text{new}})^2 \\
    &=& \alpha^2 \left( 1 - 2 \lambda \frac{\partial E}{\partial a} \right)^2 \\
    &=& a \cdot \left( 1 - 2 \lambda \frac{\partial E}{\partial a} \right)^2.
    \end{eqnarray*}

We've thus obtained a multiplicative update rule for a that is expressed in terms of a alone; in particular, we don't need \alpha anymore! Note also that the multiplicative factor is a square, hence non-negative, so the update automatically preserves the non-negativity of a.
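The equivalence of the two updates is easy to check numerically. Below is a minimal sketch, reusing the hypothetical E(a) = (a - 2)^2 from above: one gradient-descent step on \alpha, followed by squaring, gives the same value as one multiplicative step applied to a directly.

    def E(a):
        return (a - 2.0) ** 2      # hypothetical error, minimised at a = 2

    def dE_da(a):
        return 2.0 * (a - 2.0)     # its derivative with respect to a

    lam = 0.05                     # learning rate lambda
    alpha = 0.5                    # current unconstrained parameter
    a = alpha ** 2                 # current non-negative parameter

    # Route 1: gradient descent on alpha (chain rule: dE/dalpha = 2 alpha dE/da),
    # then square to recover a.
    alpha_new = alpha - lam * 2.0 * alpha * dE_da(a)
    a_via_alpha = alpha_new ** 2

    # Route 2: the multiplicative update applied to a directly.
    a_multiplicative = a * (1.0 - 2.0 * lam * dE_da(a)) ** 2

    assert abs(a_via_alpha - a_multiplicative) < 1e-12
    print(a_via_alpha)             # 0.455625 for these values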
