
Re-parameterising for non-negativity yields multiplicative updates

Suppose you have a model that depends on real-valued parameters, and that you would like to constrain these parameters to be non-negative. For simplicity, suppose the model has a single parameter $a \in \mathbb{R}$. Let $E$ denote the error function. To constrain $a$ to be non-negative, parameterise $a$ as the square of a real-valued parameter $\alpha \in \mathbb{R}$:

$$a = \alpha^2, \qquad \alpha \in \mathbb{R}.$$

We can now minimise $E$ by choosing $\alpha$ without constraints, e.g. by using gradient descent. Let $\lambda > 0$ be the learning rate. We have

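$$\alpha^{\text{new}} = \alpha - \lambda \frac{\partial E}{\partial \alpha} = \alpha - \lambda \frac{\partial E}{\partial a} \frac{\partial a}{\partial \alpha} = \alpha - \lambda \, 2\alpha \frac{\partial E}{\partial a} = \alpha \left(1 - 2\lambda \frac{\partial E}{\partial a}\right)$$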

by the chain rule. Thus

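$$a^{\text{new}} = (\alpha^{\text{new}})^2 = \alpha^2 \left(1 - 2\lambda \frac{\partial E}{\partial a}\right)^2 = a \left(1 - 2\lambda \frac{\partial E}{\partial a}\right)^2.$$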

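Thus we’ve obtained a multiplicative update rule for $a$ that is expressed in terms of $a$ only. In particular, we don’t need $\alpha$ anymore! Since $a$ is multiplied by a square at every step, it remains non-negative as long as it starts out non-negative.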
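To see the equivalence numerically, here is a minimal sketch that runs gradient descent on $\alpha$ next to the multiplicative update on $a$. The error function $E(a) = \tfrac{1}{2}(a - 3)^2$, the starting value and the learning rate are assumptions chosen purely for illustration, and the function names are hypothetical.

```python
def grad_E(a, target=3.0):
    # dE/da for the assumed error E(a) = 0.5 * (a - target)**2
    return a - target


def step_alpha(alpha, lr):
    # Plain gradient descent on alpha, with a = alpha**2 and
    # dE/dalpha = 2 * alpha * dE/da by the chain rule.
    a = alpha ** 2
    return alpha - lr * 2.0 * alpha * grad_E(a)


def step_a(a, lr):
    # Multiplicative update on a, which never refers to alpha.
    return a * (1.0 - 2.0 * lr * grad_E(a)) ** 2


lr = 0.05        # learning rate (lambda in the text), assumed value
alpha = 0.5      # arbitrary starting point
a = alpha ** 2   # matching starting point for the multiplicative rule

for _ in range(200):
    alpha = step_alpha(alpha, lr)
    a = step_a(a, lr)

print("alpha**2 after descent on alpha:", alpha ** 2)
print("a after multiplicative updates: ", a)
```

Both printed values should agree up to floating-point rounding. Note that the multiplicative rule can never move $a$ away from $0$ once it lands there, so a strictly positive starting value is the safer choice.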
