Re-parameterising for non-negativity yields multiplicative updates

Suppose you have a model that depends on real-valued parameters, and that you would like to constrain these parameters to be non-negative. For simplicity, suppose the model has a single parameter $a \in \mathbb{R}$. Let $E$ denote the error function. To constrain $a$ to be non-negative, parameterise $a$ as the square of a real-valued parameter $\alpha \in \mathbb{R}$:

$a = \alpha^{2}, \quad \alpha \in \mathbb{R}.$
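As a quick sanity check of this parameterisation, here is a minimal Python sketch. It checks that every real $\alpha$ gives a non-negative $a$, that any target $a \ge 0$ is reachable (e.g. via $\alpha = \sqrt{a}$), and that the derivative $\partial a / \partial \alpha = 2\alpha$ agrees with a finite-difference estimate. The function names and sample values are illustrative only.

```python
import math

def a_of_alpha(alpha: float) -> float:
    """Map an unconstrained real alpha to a = alpha**2, which is always >= 0."""
    return alpha * alpha

def da_dalpha(alpha: float) -> float:
    """Analytic derivative of a = alpha**2 with respect to alpha."""
    return 2.0 * alpha

def da_dalpha_numeric(alpha: float, h: float = 1e-6) -> float:
    """Central finite-difference estimate of da/dalpha, used only as a check."""
    return (a_of_alpha(alpha + h) - a_of_alpha(alpha - h)) / (2.0 * h)

for alpha in [-3.0, -0.5, 0.0, 0.25, 2.0]:
    print(alpha, a_of_alpha(alpha), da_dalpha(alpha), da_dalpha_numeric(alpha))

# Any non-negative target a is reachable, e.g. via alpha = sqrt(a) (or -sqrt(a)).
print(a_of_alpha(math.sqrt(1.7)))
```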

We can now minimise $E$ over $\alpha$ without any constraints, e.g. by gradient descent. Let $\lambda > 0$ be the learning rate. We have

$\begin{array}{rcl} \alpha^{new} & = & \alpha - \lambda \frac{\partial E}{\partial \alpha} \\ & = & \alpha - \lambda \frac{\partial E}{\partial a} \frac{\partial a}{\partial \alpha} \\ & = & \alpha - 2 \lambda \alpha \frac{\partial E}{\partial a} \\ & = & \alpha \left(1 - 2 \lambda \frac{\partial E}{\partial a}\right) \end{array}$

where the second line uses the chain rule and the third uses $\partial a / \partial \alpha = 2\alpha$. Thus

$\begin{array}{rcl} a^{new} & = & (\alpha^{new})^{2} \\ & = & \alpha^{2} \left(1 - 2 \lambda \frac{\partial E}{\partial a}\right)^{2} \\ & = & a \left(1 - 2 \lambda \frac{\partial E}{\partial a}\right)^{2}. \end{array}$
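The following Python sketch checks numerically that the two routes agree. The first takes one gradient-descent step on $\alpha$ and then squares the result. The second applies the multiplicative update to $a$ directly. It assumes a hypothetical toy error $E(a) = (a - C)^2$ with $C = 3$, so $\partial E / \partial a = 2(a - C)$, and `lr` plays the role of $\lambda$.

```python
C = 3.0  # hypothetical target for the toy error E(a) = (a - C)**2

def dE_da(a: float) -> float:
    """Gradient of the toy error with respect to a."""
    return 2.0 * (a - C)

def step_alpha(alpha: float, lr: float) -> float:
    """One gradient step on alpha, using dE/dalpha = 2 * alpha * dE/da."""
    a = alpha * alpha
    return alpha - lr * 2.0 * alpha * dE_da(a)

def step_a(a: float, lr: float) -> float:
    """The same step written purely in terms of a (multiplicative form)."""
    return a * (1.0 - 2.0 * lr * dE_da(a)) ** 2

alpha, lr = -0.8, 0.05
via_alpha = step_alpha(alpha, lr) ** 2  # step in alpha, then square
via_a = step_a(alpha * alpha, lr)       # multiplicative update on a
print(via_alpha, via_a)
```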

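Thus we’ve obtained a multiplicative update rule for $a$ that is expressed in terms of $a$ only. In particular, we don’t need $\alpha$ anymore! Because $a$ is multiplied by a square, $a^{new} \ge 0$ whenever $a \ge 0$, so the non-negativity constraint is maintained automatically.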
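Here is a minimal Python sketch that iterates the $\alpha$-free update $a \leftarrow a\,(1 - 2\lambda\,\partial E/\partial a)^2$. It uses two hypothetical toy errors of the form $E(a) = (a - c)^2$. With $c = 3$, the unconstrained minimiser is feasible, so $a$ should approach 3. With $c = -1$, it is not feasible, so $a$ should be driven towards the boundary 0 without ever going negative. The function name, learning rate and step count are illustrative choices, not part of the derivation.

```python
from typing import Callable

def minimise_nonneg(dE_da: Callable[[float], float], a0: float, lr: float, steps: int) -> float:
    """Iterate a <- a * (1 - 2*lr*dE/da)**2 without ever introducing alpha."""
    a = a0
    for _ in range(steps):
        a = a * (1.0 - 2.0 * lr * dE_da(a)) ** 2
        assert a >= 0.0  # the squared factor keeps a non-negative
    return a

# Toy errors E(a) = (a - c)**2, so dE/da = 2*(a - c); lr plays the role of lambda.
a_pos = minimise_nonneg(lambda a: 2.0 * (a - 3.0), a0=1.0, lr=0.01, steps=2000)
a_neg = minimise_nonneg(lambda a: 2.0 * (a + 1.0), a0=1.0, lr=0.01, steps=2000)
print(a_pos)  # approaches 3.0, the unconstrained minimiser
print(a_neg)  # approaches 0.0, since the unconstrained minimiser -1 is infeasible
```

One consequence of the multiplicative form is that a starting value of exactly $a = 0$ never moves, so the initial value should be strictly positive.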
