author     Andreas Köpf <andreas.koepf@xamla.com>  2016-06-09 00:14:18 +0300
committer  Andreas Köpf <andreas.koepf@xamla.com>  2016-06-09 00:14:18 +0300
commit     9c08fde975c5998cc25d5ebf265754486dd5c160 (patch)
tree       616b93ed411a9b5e96237f95c18b152180c26e8e
parent     6759dc8a210b1f93184a23bda9c4ca5eb8c2b71a (diff)
Init rmsprop mean square state 'm' with 1 instead of 0
With alpha near 1 (e.g. the default value 0.99), the gradient was
effectively scaled up during the first few iterations, because it
was divided by a number smaller than 1.
With the original implementation, the learning rate had to be set
to a much smaller value with rmsprop than with plain-vanilla SGD
in order not to diverge.
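The effect described above can be sketched numerically. The following is a minimal, hypothetical Python model of a single RMSProp update (not the Torch implementation itself), comparing the first step when the mean-square state starts at 0 versus 1:

```python
def rmsprop_step(grad, m, alpha=0.99, lr=1.0, eps=1e-8):
    # m tracks the exponential running mean of squared gradients
    m = alpha * m + (1 - alpha) * grad ** 2
    # the update divides the gradient by sqrt(m); if m < 1,
    # this *amplifies* the gradient instead of damping it
    step = lr * grad / (m ** 0.5 + eps)
    return step, m

# First iteration with grad = 1.0:
step_zero, _ = rmsprop_step(1.0, m=0.0)  # state initialized to 0
step_one, _  = rmsprop_step(1.0, m=1.0)  # state initialized to 1
```

With `m = 0` and `alpha = 0.99`, the first step divides by `sqrt(0.01) = 0.1`, so the effective step is roughly 10x the gradient; with `m = 1`, the divisor is 1 and the first step matches plain SGD.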
-rw-r--r--  rmsprop.lua  2
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/rmsprop.lua b/rmsprop.lua
index 8947b18..038af21 100644
--- a/rmsprop.lua
+++ b/rmsprop.lua
@@ -40,7 +40,7 @@ function optim.rmsprop(opfunc, x, config, state)
    -- (3) initialize mean square values and square gradient storage
    if not state.m then
-      state.m = torch.Tensor():typeAs(x):resizeAs(dfdx):zero()
+      state.m = torch.Tensor():typeAs(x):resizeAs(dfdx):fill(1)
       state.tmp = torch.Tensor():typeAs(x):resizeAs(dfdx)
    end