author     Andreas Köpf <andreas.koepf@xamla.com>  2016-06-09 00:14:18 +0300
committer  Andreas Köpf <andreas.koepf@xamla.com>  2016-06-09 00:14:18 +0300
commit     9c08fde975c5998cc25d5ebf265754486dd5c160 (patch)
tree       616b93ed411a9b5e96237f95c18b152180c26e8e
parent     6759dc8a210b1f93184a23bda9c4ca5eb8c2b71a (diff)
Init rmsprop mean square state 'm' with 1 instead of 0
With alpha near 1 (e.g. the default value 0.99), the gradient was
effectively scaled up during the first few iterations, because it
was divided by a number smaller than 1.
With the original implementation, the learning rate had to be set
to a much smaller value with rmsprop than with plain-vanilla SGD
in order not to diverge.
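The effect described above can be sketched numerically. The following is a minimal, hypothetical Python model of a single RMSProp update (not the Torch implementation itself), comparing the first step when the mean-square state starts at 0 versus 1:

```python
def rmsprop_step(grad, m, alpha=0.99, lr=1.0, eps=1e-8):
    # m tracks the exponential running mean of squared gradients
    m = alpha * m + (1 - alpha) * grad ** 2
    # the update divides the gradient by sqrt(m); if m < 1,
    # this *amplifies* the gradient instead of damping it
    step = lr * grad / (m ** 0.5 + eps)
    return step, m

# First iteration with grad = 1.0:
step_zero, _ = rmsprop_step(1.0, m=0.0)  # state initialized to 0
step_one, _  = rmsprop_step(1.0, m=1.0)  # state initialized to 1
```

With `m = 0` and `alpha = 0.99`, the first step divides by `sqrt(0.01) = 0.1`, so the effective step is roughly 10x the gradient; with `m = 1`, the divisor is 1 and the first step matches plain SGD.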
-rw-r--r--  rmsprop.lua  2
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/rmsprop.lua b/rmsprop.lua
index 8947b18..038af21 100644
--- a/rmsprop.lua
+++ b/rmsprop.lua
@@ -40,7 +40,7 @@ function optim.rmsprop(opfunc, x, config, state)
    -- (3) initialize mean square values and square gradient storage
    if not state.m then
-      state.m = torch.Tensor():typeAs(x):resizeAs(dfdx):zero()
+      state.m = torch.Tensor():typeAs(x):resizeAs(dfdx):fill(1)
       state.tmp = torch.Tensor():typeAs(x):resizeAs(dfdx)
    end