author     Alfredo Canziani <alfredo.canziani@gmail.com>   2016-06-29 07:09:27 +0300
committer  Alfredo Canziani <alfredo.canziani@gmail.com>   2016-06-30 05:51:21 +0300
commit     c0c4bbfcc14fad7bc484358821563fddd0b9031e (patch)
tree       5cbac231bfa6bc4e961e075e80604a1b0fd52cba
parent     8755acb1fc6e91afaa9c7973f9efd4239e295d1a (diff)
Fix improper state/config documentation
-rw-r--r--   doc/algos.md | 110
-rw-r--r--   doc/intro.md |  11
2 files changed, 64 insertions, 57 deletions
diff --git a/doc/algos.md b/doc/algos.md
index b69cca7..a671420 100644
--- a/doc/algos.md
+++ b/doc/algos.md
@@ -25,12 +25,12 @@ Some of these algorithms support a line search, which can be passed as a functio
 General interface:
 
 ```lua
-x*, {f}, ... = optim.method(opfunc, x, state)
+x*, {f}, ... = optim.method(opfunc, x[, config][, state])
 ```
 
 <a name='optim.sgd'></a>
-## sgd(opfunc, x, state)
+## sgd(opfunc, x[, config][, state])
 
 An implementation of *Stochastic Gradient Descent* (*SGD*).
 
@@ -56,7 +56,7 @@ Returns:
 
 <a name='optim.asgd'></a>
-## asgd(opfunc, x, state)
+## asgd(opfunc, x[, config][, state])
 
 An implementation of *Averaged Stochastic Gradient Descent* (*ASGD*):
 
@@ -72,11 +72,11 @@ Arguments:
 
 * `opfunc`: a function that takes a single input `X`, the point of evaluation, and returns `f(X)` and `df/dX`
 * `x`: the initial point
- * `state`: a table describing the state of the optimizer; after each call the state is modified
- * `state.eta0`: learning rate
- * `state.lambda`: decay term
- * `state.alpha`: power for eta update
- * `state.t0`: point at which to start averaging
+ * `config`: a table with configuration parameters for the optimizer
+ * `config.eta0`: learning rate
+ * `config.lambda`: decay term
+ * `config.alpha`: power for eta update
+ * `config.t0`: point at which to start averaging
 
 Returns:
 
@@ -86,7 +86,7 @@ Returns:
 
 <a name='optim.lbfgs'></a>
-## lbfgs(opfunc, x, state)
+## lbfgs(opfunc, x[, config][, state])
 
 An implementation of *L-BFGS* that relies on a user-provided line search function (`state.lineSearch`). If this function is not provided, then a simple learning rate is used to produce fixed size steps.
 
@@ -100,13 +100,13 @@ Arguments:
 
 * `opfunc`: a function that takes a single input `X`, the point of evaluation, and returns `f(X)` and `df/dX`
 * `x`: the initial point
- * `state`: a table describing the state of the optimizer; after each call the state is modified
- * `state.maxIter`: Maximum number of iterations allowed
- * `state.maxEval`: Maximum number of function evaluations
- * `state.tolFun`: Termination tolerance on the first-order optimality
- * `state.tolX`: Termination tol on progress in terms of func/param changes
- * `state.lineSearch`: A line search function
- * `state.learningRate`: If no line search provided, then a fixed step size is used
+ * `config`: a table with configuration parameters for the optimizer
+ * `config.maxIter`: Maximum number of iterations allowed
+ * `config.maxEval`: Maximum number of function evaluations
+ * `config.tolFun`: Termination tolerance on the first-order optimality
+ * `config.tolX`: Termination tol on progress in terms of func/param changes
+ * `config.lineSearch`: A line search function
+ * `config.learningRate`: If no line search provided, then a fixed step size is used
 
 Returns:
 * `x*`: the new `x` vector, at the optimal point
@@ -116,7 +116,7 @@ Returns:
 
 <a name='optim.cg'></a>
-## cg(opfunc, x, state)
+## cg(opfunc, x[, config][, state])
 
 An implementation of the *Conjugate Gradient* method which is a rewrite of `minimize.m` written by Carl E. Rasmussen. It is supposed to produce exactly same results (give or take numerical accuracy due to some changed order of operations).
 
@@ -132,11 +132,12 @@ Arguments:
 
 * `opfunc`: a function that takes a single input, the point of evaluation.
 * `x`: the initial point
+ * `config`: a table with configuration parameters for the optimizer
+ * `config.maxEval`: max number of function evaluations
+ * `config.maxIter`: max number of iterations
 * `state`: a table of parameters and temporary allocations.
- * `state.maxEval`: max number of function evaluations
- * `state.maxIter`: max number of iterations
- * `state.df[0, gc, gc, gc]`: if you pass torch.Tensor they will be used for temp storage
- * `state.[s, gc0]`: if you pass torch.Tensor they will be used for temp storage
+ * `state.df[0, 1, 2, 3]`: if you pass `Tensor` they will be used for temp storage
+ * `state.[s, x0]`: if you pass `Tensor` they will be used for temp storage
 
 Returns:
 
@@ -147,9 +148,9 @@ Returns:
 
 <a name='optim.adadelta'></a>
-## adadelta(opfunc, x, config, state)
+## adadelta(opfunc, x[, config][, state])
 
-*AdaDelta* implementation for *SGD* http://arxiv.org/abs/1212.5701
+*AdaDelta* implementation for *SGD* http://arxiv.org/abs/1212.5701.
 
 Arguments:
 
@@ -169,16 +170,17 @@ Returns:
 
 <a name='optim.adagrad'></a>
-## adagrad(opfunc, x, config, state)
+## adagrad(opfunc, x[, config][, state])
 
-*AdaGrad* implementation for *SGD*
+*AdaGrad* implementation for *SGD*.
 
 Arguments:
 
-* `opfunc`: a function that takes a single input `X`, the point of evaluation, and returns `f(X)` and `df/dX`
-* `x`: the initial point
-* `state`: a table describing the state of the optimizer; after each call the state is modified
-* `state.learningRate`: learning rate
+ * `opfunc`: a function that takes a single input `X`, the point of evaluation, and returns `f(X)` and `df/dX`
+ * `x`: the initial point
+ * `config`: a table with configuration parameters for the optimizer
+ * `config.learningRate`: learning rate
+ * `state`: a table describing the state of the optimizer; after each call the state is modified
 * `state.paramVariance`: vector of temporal variances of parameters
 
 Returns:
 
@@ -188,7 +190,7 @@ Returns:
 
 <a name='optim.adam'></a>
-## adam(opfunc, x, config, state)
+## adam(opfunc, x[, config][, state])
 
 An implementation of *Adam* from http://arxiv.org/pdf/1412.6980.pdf.
 
@@ -210,9 +212,9 @@ Returns:
 
 <a name='optim.adamax'></a>
-## adamax(opfunc, x, config, state)
+## adamax(opfunc, x[, config][, state])
 
-An implementation of *AdaMax* http://arxiv.org/pdf/1412.6980.pdf
+An implementation of *AdaMax* http://arxiv.org/pdf/1412.6980.pdf.
 
 Arguments:
 
@@ -223,7 +225,7 @@ Arguments:
 * `config.beta1`: first moment coefficient
 * `config.beta2`: second moment coefficient
 * `config.epsilon`: for numerical stability
- * `state`: a table describing the state of the optimizer; after each call the state is modified.
+ * `state`: a table describing the state of the optimizer; after each call the state is modified
 
 Returns:
 
@@ -232,7 +234,7 @@ Returns:
 
 <a name='optim.FistaLS'></a>
-## FistaLS(f, g, pl, xinit, params)
+## FistaLS(f, g, pl, xinit[, params])
 
 *Fista* with backtracking *Line Search*:
 
@@ -264,7 +266,7 @@ Algorithm is published in http://epubs.siam.org/doi/abs/10.1137/080716542
 
 <a name='optim.nag'></a>
-## nag(opfunc, x, config, state)
+## nag(opfunc, x[, config][, state])
 
 An implementation of *SGD* adapted with features of *Nesterov's Accelerated Gradient method*, based on the paper "On the Importance of Initialization and Momentum in Deep Learning" (Sutsveker et. al., ICML 2013) http://www.cs.toronto.edu/~fritz/absps/momentum.pdf.
 
@@ -272,12 +274,12 @@ Arguments:
 
 * `opfunc`: a function that takes a single input `X`, the point of evaluation, and returns `f(X)` and `df/dX`
 * `x`: the initial point
- * `state`: a table describing the state of the optimizer; after each call the state is modified
- * `state.learningRate`: learning rate
- * `state.learningRateDecay`: learning rate decay
- * `astate.weightDecay`: weight decay
- * `state.momentum`: momentum
- * `state.learningRates`: vector of individual learning rates
+ * `config`: a table with configuration parameters for the optimizer
+ * `config.learningRate`: learning rate
+ * `config.learningRateDecay`: learning rate decay
+ * `config.weightDecay`: weight decay
+ * `config.momentum`: momentum
+ * `config.learningRates`: vector of individual learning rates
 
 Returns:
 
@@ -286,7 +288,7 @@ Returns:
 
 <a name='optim.rmsprop'></a>
-## rmsprop(opfunc, x, config, state)
+## rmsprop(opfunc, x[, config][, state])
 
 An implementation of *RMSprop*.
 
@@ -309,21 +311,21 @@ Returns:
 
 <a name='optim.rprop'></a>
-## rprop(opfunc, x, config, state)
+## rprop(opfunc, x[, config][, state])
 
-A plain implementation of *Rprop* (Martin Riedmiller, Koray Kavukcuoglu 2013)
+A plain implementation of *Rprop* (Martin Riedmiller, Koray Kavukcuoglu 2013).
 
 Arguments:
 
 * `opfunc`: a function that takes a single input `X`, the point of evaluation, and returns `f(X)` and `df/dX`
 * `x`: the initial point
- * `state`: a table describing the state of the optimizer; after each call the state is modified
- * `state.stepsize`: initial step size, common to all components
- * `state.etaplus`: multiplicative increase factor, > 1 (default 1.2)
- * `state.etaminus`: multiplicative decrease factor, < 1 (default 0.5)
- * `state.stepsizemax`: maximum stepsize allowed (default 50)
- * `state.stepsizemin`: minimum stepsize allowed (default 1e-6)
- * `state.niter`: number of iterations (default 1)
+ * `config`: a table with configuration parameters for the optimizer
+ * `config.stepsize`: initial step size, common to all components
+ * `config.etaplus`: multiplicative increase factor, > 1 (default 1.2)
+ * `config.etaminus`: multiplicative decrease factor, < 1 (default 0.5)
+ * `config.stepsizemax`: maximum stepsize allowed (default 50)
+ * `config.stepsizemin`: minimum stepsize allowed (default 1e-6)
+ * `config.niter`: number of iterations (default 1)
 
 Returns:
 
@@ -332,7 +334,7 @@ Returns:
 
 <a name='optim.cmaes'></a>
-## cmaes(opfunc, x, config, state)
+## cmaes(opfunc, x[, config][, state])
 
 An implementation of *CMAES* (*Covariance Matrix Adaptation Evolution Strategy*), ported from https://www.lri.fr/~hansen/barecmaes2.html.
 
@@ -341,8 +343,12 @@ Note that this method will on average take much more function evaluations to con
 Arguments:
 
+If `state` is specified, then `config` is not used at all.
+Otherwise `state` is `config`.
+
 * `opfunc`: a function that takes a single input `X`, the point of evaluation, and returns `f(X)` and `df/dX`.
 Note that `df/dX` is not used and can be left 0
 * `x`: the initial point
+ * `state`: a table describing the state of the optimizer; after each call the state is modified
 * `state.sigma`: float, initial step-size (standard deviation in each coordinate)
 * `state.maxEval`: int, maximal number of function evaluations
 * `state.ftarget`: float, target function value
diff --git a/doc/intro.md b/doc/intro.md
index 54d167f..b387235 100644
--- a/doc/intro.md
+++ b/doc/intro.md
@@ -1,17 +1,18 @@
 <a name='optim.overview'></a>
-## Overview
+# Overview
 
 Most optimization algorithms have the following interface:
 
 ```lua
-x*, {f}, ... = optim.method(opfunc, x, state)
+x*, {f}, ... = optim.method(opfunc, x[, config][, state])
 ```
 
 where:
 
 * `opfunc`: a user-defined closure that respects this API: `f, df/dx = func(x)`
 * `x`: the current parameter vector (a 1D `Tensor`)
-* `state`: a table of parameters, and state variables, dependent upon the algorithm
+* `config`: a table of parameters, dependent upon the algorithm
+* `state`: a table of state variables, if `nil`, `config` will contain the state
 * `x*`: the new parameter vector that minimizes `f, x* = argmin_x f(x)`
 * `{f}`: a table of all `f` values, in the order they've been evaluated (for some simple algorithms, like SGD, `#f == 1`)
@@ -24,7 +25,7 @@ It's usually initialized once, by the user, and then passed to the optim functio
 Example:
 
 ```lua
-state = {
+config = {
    learningRate = 1e-3,
    momentum = 0.5
 }
@@ -34,7 +35,7 @@ for i, sample in ipairs(training_samples) do
     -- define eval function
     return f, df_dx
   end
-  optim.sgd(func, x, state)
+  optim.sgd(func, x, config)
 end
 ```
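To make the convention this patch documents concrete, here is a minimal sketch (not part of the commit) of calling `optim.sgd` with separate `config` and `state` tables. Only the `optim.method(opfunc, x[, config][, state])` interface and the `learningRate`/`momentum` fields come from the documentation above; the toy objective `feval`, the tensor size, and the hyper-parameter values are illustrative assumptions.

```lua
require 'torch'
require 'optim'

local x = torch.randn(10)            -- parameter vector (a 1D Tensor)

-- user-defined closure respecting the documented API: f, df/dx = feval(x)
local function feval(x)
   local f = 0.5 * x:dot(x)          -- toy objective: 1/2 * ||x||^2 (an assumption)
   local df_dx = x:clone()           -- its gradient w.r.t. x
   return f, df_dx
end

-- `config` holds the optimizer's configuration parameters...
local config = { learningRate = 1e-3, momentum = 0.5 }
-- ...while `state` is a separate table the optimizer fills with its internal
-- variables (e.g. the momentum buffer) and modifies after each call.
local state = {}

for i = 1, 100 do
   optim.sgd(feval, x, config, state)
end

-- Passing a single table also works: if `state` is nil, `config` doubles as
-- the state table, which is the behaviour described in doc/intro.md.
```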