====== Neural Network Package ======
{{anchor:nn.dok}}

This package provides an easy way to build and train simple or complex
neural networks.

A network is composed of [[#nn.Modules|Modules]], and there are several
sub-classes of ''Module'' available: container classes like
[[#nn.Sequential|Sequential]], [[#nn.Parallel|Parallel]] and
[[#nn.Concat|Concat]], which can contain simple layers like
[[#nn.Linear|Linear]], [[#nn.Mean|Mean]], [[#nn.Max|Max]] and
[[#nn.Reshape|Reshape]], as well as convolutional layers and transfer
functions like [[#nn.Tanh|Tanh]].

Loss functions are implemented as sub-classes of
[[#nn.Criterions|Criterion]]. They are helpful to train neural networks on
classical tasks. Common criterions are the Mean Squared Error
criterion implemented in [[#nn.MSECriterion|MSECriterion]] and the
cross-entropy criterion implemented in
[[#nn.ClassNLLCriterion|ClassNLLCriterion]].

Finally, the [[#nn.StochasticGradient|StochasticGradient]] class provides a
high-level way to train the neural network of choice, even though it is
easy to [[#nn.DoItYourself|train a neural network yourself]] with a simple for loop.

For those who want to implement their own modules, we suggest using
the ''nn.Jacobian'' class for testing the derivatives of their class,
together with the [[..:torch:tester|torch.Tester]] class. The sources
of the ''nn'' package contain plenty of examples of such tests.


====== Detailed Overview of the Neural Network Package ======
{{anchor:nn.overview.dok}}

**Module**

A neural network is called a [[#nn.Module|Module]] (or simply
//module// in this documentation) in Torch. ''Module'' is an abstract
class which defines four main methods:
  * [[#nn.Module.forward|forward(input)]], which computes the output of the module given the ''input'' [[..:torch:tensor|Tensor]].
  * [[#nn.Module.backward|backward(input, gradOutput)]], which computes the gradients of the module with respect to its own parameters and its own inputs.
  * [[#nn.Module.zeroGradParameters|zeroGradParameters()]], which zeroes the gradients with respect to the parameters of the module.
  * [[#nn.Module.updateParameters|updateParameters(learningRate)]], which updates the parameters after one has computed the gradients with ''backward()''.

It also declares two members:
  * [[#nn.Module.output|output]], which is the output returned by ''forward()''.
  * [[#nn.Module.gradInput|gradInput]], which contains the gradients with respect to the input of the module, computed in a ''backward()''.

Two other perhaps less used but handy methods are also defined:
  * [[#nn.Module.share|share(mlp,s1,s2,...,sn)]], which makes this module share the parameters s1,...,sn of the module ''mlp''. This is useful if you want to have modules that share the same weights.
  * [[#nn.Module.clone|clone(...)]], which produces a deep copy of (i.e. not just a pointer to) this Module, including the current state of its parameters (if any).

Some important remarks:
  * ''output'' contains only valid values after a [[#nn.Module.forward|forward(input)]].
  * ''gradInput'' contains only valid values after a [[#nn.Module.backward|backward(input, gradOutput)]].
  * [[#nn.Module.backward|backward(input, gradOutput)]] reuses certain computations obtained during [[#nn.Module.forward|forward(input)]]. You //must// call ''forward()'' before calling ''backward()'', on the //same// ''input'', or your gradients are going to be incorrect!
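
To make this contract concrete, here is a minimal sketch of the usage pattern (the module and ''lab'' calls are the ones used throughout this documentation):
<file lua>
-- forward first, then backward on the *same* input
m = nn.Linear(10, 10)
x = lab.randn(10)
out = m:forward(x)
gradOut = lab.randn(10)          -- stands in for a gradient coming from a criterion
gradIn = m:backward(x, gradOut)  -- valid, because forward(x) was called just before
</file>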

**Plug and play**

Building a simple neural network can be achieved by constructing an available layer.
A linear neural network (perceptron!) is built in only one line:
<file lua>
mlp = nn.Linear(10,1) -- perceptron with 10 inputs
</file>

More complex neural networks are easily built using the container classes
[[#nn.Sequential|Sequential]] and [[#nn.Concat|Concat]]. ''Sequential'' plugs
layers together in a feed-forward fully connected manner. ''Concat'' concatenates
several modules into one layer: they take the same inputs, and their outputs are
concatenated.

Creating a one hidden-layer multi-layer perceptron is thus just as easy as:
<file lua>
mlp = nn.Sequential()
mlp:add( nn.Linear(10, 25) ) -- 10 inputs, 25 hidden units
mlp:add( nn.Tanh() ) -- some hyperbolic tangent transfer function
mlp:add( nn.Linear(25, 1) ) -- 1 output
</file>

Of course, ''Sequential'' and ''Concat'' can contain other
''Sequential'' or ''Concat'' modules, allowing you to try the craziest neural
networks you ever dreamt of! See the [[#nn.Modules|complete list of
available modules]].

**Training a neural network**

Once you have built your neural network, you have to choose a particular
[[#nn.Criterions|Criterion]] to train it. A criterion is a class which
describes the cost to be minimized during training.

You can then train the neural network by using the
[[#nn.StochasticGradient|StochasticGradient]] class.

<file lua>
criterion = nn.MSECriterion() -- Mean Squared Error criterion
trainer = nn.StochasticGradient(mlp, criterion)
trainer:train(dataset) -- train using some examples
</file>

''StochasticGradient'' expects as a ''dataset'' an object which implements
the operator ''dataset[index]'' and the method
''dataset:size()''. The ''size()'' method returns the number of
examples and ''dataset[i]'' has to return the i-th example.

An ''example'' has to be an object which implements the operator
''example[field]'', where ''field'' can take the value ''1'' (input
features) or ''2'' (the corresponding label which will be given to the
criterion). The input is usually a Tensor (except if you use special
kinds of modules, like [[#nn.TableLayers|table layers]]). The
label type depends on the criterion. For example, the
[[#nn.MSECriterion|MSECriterion]] expects a Tensor, but the
[[#nn.ClassNLLCriterion|ClassNLLCriterion]] expects an integer (the
class).

Such a dataset is easily constructed by using Lua tables, but it could be
any ''C'' object, for example, as long as the required operators/methods
are implemented. [[#nn.DoItStochasticGradient|See an example]].
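
For instance, a dataset satisfying this contract can be sketched with a plain Lua table (the values below are a placeholder for real training data):
<file lua>
dataset = {}
function dataset:size() return 100 end -- 100 examples
for i = 1, dataset:size() do
   local input = lab.randn(10)    -- a random 10-dimensional input
   local target = torch.Tensor(1)
   target[1] = input[1]           -- a toy regression target
   dataset[i] = {input, target}   -- example[1] = input, example[2] = label
end
</file>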

''StochasticGradient'' being written in ''Lua'', it is extremely easy
to cut-and-paste it and create a variant adapted to your needs
(if the constraints of ''StochasticGradient'' do not satisfy you).

**Low Level Training of a Neural Network**

If you want to program the ''StochasticGradient'' by hand, you
essentially need to control the use of forwards and backwards through
the network yourself. For example, here is the code fragment one
would need to make a gradient step given an input ''x'', a desired
output ''y'', a network ''mlp'', a given criterion ''criterion''
and a learning rate ''learningRate'':

<file lua>
function gradUpdate(mlp, x, y, criterion, learningRate)
   local pred = mlp:forward(x)
   local err = criterion:forward(pred, y)
   local gradCriterion = criterion:backward(pred, y)
   mlp:zeroGradParameters()
   mlp:backward(x, gradCriterion)
   mlp:updateParameters(learningRate)
end
</file>
For example, if you wish to use your own criterion you can simply replace
''gradCriterion'' with the gradient vector of your criterion of choice.


====== Modules ======
{{anchor:nn.Modules}}

Modules are bricks to build neural networks. A [[#nn.Module|Module]] is a neural network
by itself, but it can be combined with other networks using [[#nn.Containers|container classes]] to create
complex neural networks.

===== Module =====
{{anchor:nn.Module}}

''Module'' is an abstract class which defines the fundamental methods necessary
for training a neural network. Modules are [[..:torch:file#torch.file.serialization|serializable]].

Modules contain two state variables: [[#nn.Module.output|output]] and
[[#nn.Module.gradInput|gradInput]].

==== [output] forward(input) ====
{{anchor:nn.Module.forward}}

Takes an ''input'' object, and computes the corresponding ''output'' of the
module. In general ''input'' and ''output'' are
[[..:torch:tensor|Tensors]]. However, some special sub-classes
like [[#nn.TableLayers|table layers]] might expect something else. Please
refer to each module specification for further information.

After a ''forward()'', the [[#nn.Module.output|output]] state variable should
have been updated to the new value.

It is not advised to override this function. Instead, one should
implement the [[#nn.Module.updateOutput|updateOutput(input)]]
function. The ''forward'' method in the abstract parent class
[[#nn.Module|Module]] will call ''updateOutput(input)''.

==== [gradInput] backward(input, gradOutput) ====
{{anchor:nn.Module.backward}}

Performs a //backpropagation step// through the module, with respect to the
given ''input''. In general this method makes the assumption that
[[#nn.Module.forward|forward(input)]] has been called before, //with the same input//.
This is necessary for optimization reasons. If you do not respect
this rule, ''backward()'' will compute incorrect gradients.

In general ''input'', ''gradOutput'' and ''gradInput'' are
[[..:torch:tensor|Tensors]]. However, some special sub-classes
like [[#nn.TableLayers|table layers]] might expect something else. Please
refer to each module specification for further information.

A //backpropagation step// consists in computing two kinds of gradients
at ''input'' given ''gradOutput'' (the gradients with respect to the
output of the module). This function simply performs this task using
two function calls:

  - A function call to [[#nn.Module.updateGradInput|updateGradInput(input, gradOutput)]].
  - A function call to [[#nn.Module.accGradParameters|accGradParameters(input, gradOutput)]].

It is not advised to override this function call in custom classes. It
is better to override the
[[#nn.Module.updateGradInput|updateGradInput(input, gradOutput)]] and
[[#nn.Module.accGradParameters|accGradParameters(input, gradOutput)]]
functions.
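
As an illustration, here is a minimal sketch of a custom parameter-free module that doubles its input, overriding only ''updateOutput'' and ''updateGradInput''. It assumes the usual ''torch.class'' idiom for deriving from ''nn.Module''; a real module should also be checked with ''nn.Jacobian'':
<file lua>
-- hypothetical module, for illustration only
local Double = torch.class('nn.Double', 'nn.Module')

function Double:updateOutput(input)
   self.output:resizeAs(input):copy(input):mul(2)
   return self.output
end

function Double:updateGradInput(input, gradOutput)
   -- d(2x)/dx = 2, so just scale the incoming gradients
   self.gradInput:resizeAs(gradOutput):copy(gradOutput):mul(2)
   return self.gradInput
end
</file>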

==== updateOutput(input) ====
{{anchor:nn.Module.updateOutput}}

Computes the output using the current parameter set of the class and
''input''. This function returns the result, which is stored in the
[[#nn.Module.output|output]] field.

==== updateGradInput(input, gradOutput) ====
{{anchor:nn.Module.updateGradInput}}

Computes the gradient of the module with respect to its own
input. This is returned in ''gradInput''. Also, the
[[#nn.Module.gradInput|gradInput]] state variable is updated
accordingly.

==== accGradParameters(input, gradOutput) ====
{{anchor:nn.Module.accGradParameters}}

Computes the gradient of the module with respect to its
own parameters. Many modules do not perform this step as they do not
have any parameters. The state variable name for the parameters is
module dependent. The module is expected to //accumulate// the
gradients with respect to the parameters in some variable.

Zeroing this accumulation is achieved with
[[#nn.Module.zeroGradParameters|zeroGradParameters()]] and updating
the parameters according to this accumulation is done with
[[#nn.Module.updateParameters|updateParameters()]].

==== zeroGradParameters() ====
{{anchor:nn.Module.zeroGradParameters}}

If the module has parameters, this will zero the accumulation of the
gradients with respect to these parameters, accumulated through
[[#nn.Module.accGradParameters|accGradParameters(input, gradOutput)]]
calls. Otherwise, it does nothing.

==== updateParameters(learningRate) ====
{{anchor:nn.Module.updateParameters}}

If the module has parameters, this will update these parameters according
to the accumulation of the gradients with respect to these parameters,
accumulated through [[#nn.Module.backward|backward()]] calls.

The update is basically:
<file lua>
parameters = parameters - learningRate * gradients_wrt_parameters
</file>
If the module does not have parameters, it does nothing.

==== accUpdateGradParameters(input, gradOutput, learningRate) ====
{{anchor:nn.Module.accUpdateGradParameters}}

This is a convenience method that performs two operations at
once: it calculates the gradients with respect to the weights and
accumulates them directly into the weights, after multiplying with the
negative of the learning rate ''learningRate''. Performing these two
operations at once is more efficient, which might be advantageous in
certain situations.

Keep in mind that this function uses a simple trick to achieve its
goal and it might not be valid for a custom module.

<file lua>
function Module:accUpdateGradParameters(input, gradOutput, lr)
   local gradWeight = self.gradWeight
   local gradBias = self.gradBias
   self.gradWeight = self.weight
   self.gradBias = self.bias
   self:accGradParameters(input, gradOutput, -lr)
   self.gradWeight = gradWeight
   self.gradBias = gradBias
end
</file>

As can be seen, the gradients are accumulated directly into the
weights. This assumption may not be true for a module that computes a
nonlinear operation.

==== share(mlp,s1,s2,...,sn) ====
{{anchor:nn.Module.share}}

This function modifies the parameters of the module named
''s1'',...,''sn'' (if they exist) so that they are shared with (pointers
to) the parameters with the same names in the given module ''mlp''.

The parameters have to be Tensors. This function is typically used if
you want to have modules that share the same weights or biases.

Note that this function, if called on a [[#nn.Containers|Container]]
module, will share the same parameters for all the contained modules as
well.

Example:
<file lua>
-- make an mlp
mlp1=nn.Sequential();
mlp1:add(nn.Linear(100,10));

-- make a second mlp
mlp2=nn.Sequential();
mlp2:add(nn.Linear(100,10));

-- the second mlp shares the bias of the first
mlp2:share(mlp1,'bias');

-- we change the bias of the first
mlp1:get(1).bias[1]=99;

-- and see that the second one's bias has also changed..
print(mlp2:get(1).bias[1])
</file>


==== clone(...) ====
{{anchor:nn.Module.clone}}

Creates a deep copy of (i.e. not just a pointer to) the module,
including the current state of its parameters (e.g. weights, biases
etc., if any).

If arguments are provided to the ''clone(...)'' function, it also calls
[[#nn.Module.share|share(...)]] with those arguments on the cloned
module after creating it, hence making a deep copy of this module with
some shared parameters.

Example:
<file lua>
-- make an mlp
mlp1=nn.Sequential();
mlp1:add(nn.Linear(100,10));

-- make a copy that shares the weights and biases
mlp2=mlp1:clone('weight','bias');

-- we change the bias of the first mlp
mlp1:get(1).bias[1]=99;

-- and see that the second one's bias has also changed..
print(mlp2:get(1).bias[1])
</file>

==== type(type) ====
{{anchor:nn.Module.type}}

This function converts all the parameters of a module to the given
''type''. The ''type'' can be one of the types defined for
[[..:torch:tensor|torch.Tensor]].

==== float() ====
{{anchor:nn.Module.float}}

Convenience method for calling [[#nn.Module.type|module:type('torch.FloatTensor')]].

==== double() ====
{{anchor:nn.Module.double}}

Convenience method for calling [[#nn.Module.type|module:type('torch.DoubleTensor')]].

==== cuda() ====
{{anchor:nn.Module.cuda}}

Convenience method for calling [[#nn.Module.type|module:type('torch.CudaTensor')]].

==== State Variables ====
{{anchor:nn.statevars.dok}}

These state variables are useful objects if one wants to check the guts of
a ''Module''. The object pointer is //never// supposed to change. However, its
contents (including its size if it is a Tensor) are supposed to change.

In general state variables are
[[..:torch:tensor|Tensors]]. However, some special sub-classes
like [[#nn.TableLayers|table layers]] contain something else. Please
refer to each module specification for further information.

=== output ===
{{anchor:nn.Module.output}}

This contains the output of the module, computed with the last call of
[[#nn.Module.forward|forward(input)]].

=== gradInput ===
{{anchor:nn.Module.gradInput}}

This contains the gradients with respect to the inputs of the module, computed with the last call of
[[#nn.Module.updateGradInput|updateGradInput(input, gradOutput)]].

==== Parameters and gradients w.r.t parameters ====

Some modules contain parameters (the ones that we actually want to
train!). The names of these parameters, and of the gradients w.r.t these
parameters, are module dependent.

==== [{weights}, {gradWeights}] parameters() ====
{{anchor:nn.Module.parameters}}

This function should return two tables: one for the learnable
parameters ''{weights}'' and another for the gradients of the energy
wrt the learnable parameters ''{gradWeights}''.

For custom modules, it is a good idea to also override this
function. By default none of the built-in functions/modules use this
function call, but it is especially useful when one wants to obtain a
global view of the whole network.
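
For example, on a ''Linear'' layer one would expect something like this (a sketch):
<file lua>
m = nn.Linear(10, 5)
weights, gradWeights = m:parameters()
print(weights[1]:size()) -- the 5x10 weight matrix
print(weights[2]:size()) -- the 5-dimensional bias vector
print(#gradWeights)      -- the matching gradient tensors
</file>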

===== Containers =====
{{anchor:nn.Containers}}

==== Concat ====
{{anchor:nn.Concat}}

<file lua>
module = nn.Concat(dim)
</file>
Concat concatenates the output of one layer of "parallel" modules along the
provided dimension ''dim'': they take the same inputs, and their output is
concatenated.
<file lua>
mlp=nn.Concat(1);
mlp:add(nn.Linear(5,3))
mlp:add(nn.Linear(5,7))
require "lab"
print(mlp:forward(lab.randn(5)))
</file>
which gives the output:
<file lua>
 0.7486
 0.1349
 0.7924
-0.0371
-0.4794
 0.3044
-0.0835
-0.7928
 0.7856
-0.1815
[torch.Tensor of dimension 10]
</file>


==== Sequential ====
{{anchor:nn.Sequential}}

Sequential provides a means to plug layers together
in a feed-forward fully connected manner.

E.g. creating a one hidden-layer multi-layer perceptron is thus just as easy as:
<file lua>
mlp = nn.Sequential()
mlp:add( nn.Linear(10, 25) ) -- 10 inputs, 25 hidden units
mlp:add( nn.Tanh() ) -- some hyperbolic tangent transfer function
mlp:add( nn.Linear(25, 1) ) -- 1 output

require "lab"
print(mlp:forward(lab.randn(10)))
</file>
which gives the output:
<file lua>
-0.1815
[torch.Tensor of dimension 1]
</file>

==== Parallel ====
{{anchor:nn.Parallel}}

''module'' = ''Parallel(inputDimension,outputDimension)''

Creates a container module that applies its ''ith'' child module to the ''ith'' slice of the input Tensor by using [[..:torch:tensor#torch.tensor.select|select]]
on dimension ''inputDimension''. It concatenates the results of its contained modules together along dimension ''outputDimension''.

Example:
<file lua>
require "lab"
mlp=nn.Parallel(2,1);     -- iterate over dimension 2 of input
mlp:add(nn.Linear(10,3)); -- apply to first slice
mlp:add(nn.Linear(10,2))  -- apply to second slice
print(mlp:forward(lab.randn(10,2)))
</file>
gives the output:
<file lua>
-0.5300
-1.1015
 0.7764
 0.2819
-0.6026
[torch.Tensor of dimension 5]
</file>

A more complicated example:
<file lua>
require "lab"

mlp=nn.Sequential();
c=nn.Parallel(1,2)
for i=1,10 do
   local t=nn.Sequential()
   t:add(nn.Linear(3,2))
   t:add(nn.Reshape(2,1))
   c:add(t)
end
mlp:add(c)

pred=mlp:forward(lab.randn(10,3))
print(pred)

for i=1,10000 do -- Train for a few iterations
   x=lab.randn(10,3);
   y=lab.ones(2,10);
   pred=mlp:forward(x)

   criterion= nn.MSECriterion()
   local err=criterion:forward(pred,y)
   local gradCriterion = criterion:backward(pred,y);
   mlp:zeroGradParameters();
   mlp:backward(x, gradCriterion);
   mlp:updateParameters(0.01);
   print(err)
end
</file>

===== Simple layers =====
{{anchor:nn.simplelayers.dok}}

==== Linear ====
{{anchor:nn.Linear}}

''module'' = ''Linear(inputDimension,outputDimension)''

Applies a linear transformation to the incoming data, i.e. //y =
Ax + b//. The ''input'' tensor given in ''forward(input)'' must be
either a vector (1D tensor) or a matrix (2D tensor). If the input is a
matrix, then each row is assumed to be an input sample of the given batch.

You can create a layer in the following way:
<file lua>
module = nn.Linear(10,5) -- 10 inputs, 5 outputs
</file>
Usually this would be added to a network of some kind, e.g.:
<file lua>
mlp = nn.Sequential();
mlp:add(module)
</file>
The weights and biases (//A// and //b//) can be viewed with:
<file lua>
print(module.weight)
print(module.bias)
</file>
The gradients for these weights can be seen with:
<file lua>
print(module.gradWeight)
print(module.gradBias)
</file>
As usual with ''nn'' modules, applying the linear transformation is
performed with:
<file lua>
x=torch.Tensor(10) -- 10 inputs
y=module:forward(x)
</file>
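
Since the input may also be a matrix of samples, a whole batch can be transformed in one call (a quick sketch):
<file lua>
x = lab.randn(4, 10)  -- a batch of 4 samples with 10 features each
y = module:forward(x) -- y is a 4x5 matrix: one output row per sample
</file>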

==== SparseLinear ====
{{anchor:nn.SparseLinear}}

''module'' = ''SparseLinear(inputDimension,outputDimension)''

Applies a linear transformation to the incoming sparse data, i.e.
//y = Ax + b//. The ''input'' tensor given in ''forward(input)'' must
be a sparse vector represented as a 2D tensor of the form
torch.Tensor(N, 2), where the pairs represent indices and values.
The SparseLinear layer is useful when the number of input
dimensions is very large and the input data is sparse.

You can create a sparse linear layer in the following way:

<file lua>
module= nn.SparseLinear(10000,2) -- 10000 inputs, 2 outputs
</file>
The sparse linear module may be used as part of a larger network,
and apart from the form of the input,
[[#nn.SparseLinear|SparseLinear]]
operates in exactly the same way as the [[#nn.Linear|Linear]] layer.

A sparse input vector may be created like so:
<file lua>
x=lab.new({1, 0.1},{2, 0.3},{10, 0.3},{31, 0.2})

print(x)

  1.0000   0.1000
  2.0000   0.3000
 10.0000   0.3000
 31.0000   0.2000
[torch.Tensor of dimension 4x2]
</file>

The first column contains indices, the second column contains
values in a vector where all other elements are zeros. The
indices should not exceed the stated dimensions of the input to the
layer (10000 in the example).

==== Abs ====
{{anchor:nn.Abs}}

''module'' = ''Abs()''

''output = abs(input)''.

<file lua>
m=nn.Abs()
ii=lab.linspace(-5,5)
oo=m:forward(ii)
go=lab.ones(100)
gi=m:backward(ii,go)
gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'})
gnuplot.grid(true)
</file>

{{abs.png?400}}

==== Add ====
{{anchor:nn.Add}}

''module'' = ''Add(inputDimension,scalar)''

Applies a bias term to the incoming data, i.e.
''y_i = x_i + b_i'', or if ''scalar = true'' then uses a single bias term,
''y_i = x_i + b''.

Example:
<file lua>
y=torch.Tensor(5);
mlp=nn.Sequential()
mlp:add(nn.Add(5))

function gradUpdate(mlp, x, y, criterion, learningRate)
   local pred = mlp:forward(x)
   local err = criterion:forward(pred, y)
   local gradCriterion = criterion:backward(pred, y)
   mlp:zeroGradParameters()
   mlp:backward(x, gradCriterion)
   mlp:updateParameters(learningRate)
   return err
end

for i=1,10000 do
   x=lab.rand(5)
   y:copy(x);
   for i=1,5 do y[i]=y[i]+i; end
   err=gradUpdate(mlp,x,y,nn.MSECriterion(),0.01)
end
print(mlp:get(1).bias)
</file>
gives the output:
<file lua>
 1.0000
 2.0000
 3.0000
 4.0000
 5.0000
[torch.Tensor of dimension 5]
</file>
i.e. the network successfully learns that the input //x// has been shifted
to produce the output //y//.


==== Mul ====
{{anchor:nn.Mul}}

''module'' = ''Mul(inputDimension)''

Applies a //single// scaling factor to the incoming data, i.e.
''y = w x'', where ''w'' is a scalar.

Example:
<file lua>
y=torch.Tensor(5);
mlp=nn.Sequential()
mlp:add(nn.Mul(5))

function gradUpdate(mlp, x, y, criterion, learningRate)
   local pred = mlp:forward(x)
   local err = criterion:forward(pred,y)
   local gradCriterion = criterion:backward(pred,y);
   mlp:zeroGradParameters();
   mlp:backward(x, gradCriterion);
   mlp:updateParameters(learningRate);
   return err
end

for i=1,10000 do
   x=lab.rand(5)
   y:copy(x); y:mul(math.pi);
   err=gradUpdate(mlp,x,y,nn.MSECriterion(),0.01)
end
print(mlp:get(1).weight)
</file>
gives the output:
<file lua>
 3.1416
[torch.Tensor of dimension 1]
</file>
i.e. the network successfully learns that the input ''x'' has been scaled by
pi.

==== CMul ====
{{anchor:nn.CMul}}

''module'' = ''CMul(inputDimension)''

Applies a component-wise multiplication to the incoming data, i.e.
''y_i = w_i * x_i''.

Example:
<file lua>
mlp=nn.Sequential()
mlp:add(nn.CMul(5))

y=torch.Tensor(5);
sc=torch.Tensor(5); for i=1,5 do sc[i]=i; end -- scale input with this

function gradUpdate(mlp,x,y,criterion,learningRate)
   local pred = mlp:forward(x)
   local err = criterion:forward(pred,y)
   local gradCriterion = criterion:backward(pred,y);
   mlp:zeroGradParameters();
   mlp:backward(x, gradCriterion);
   mlp:updateParameters(learningRate);
   return err
end

for i=1,10000 do
   x=lab.rand(5)
   y:copy(x); y:cmul(sc);
   err=gradUpdate(mlp,x,y,nn.MSECriterion(),0.01)
end
print(mlp:get(1).weight)
</file>
gives the output:
<file lua>
 1.0000
 2.0000
 3.0000
 4.0000
 5.0000
[torch.Tensor of dimension 5]
</file>
i.e. the network successfully learns that the input //x// has been scaled by
those scaling factors to produce the output //y//.


==== Max ====
{{anchor:nn.Max}}

''module'' = ''Max(dimension)''

Applies a max operation over dimension ''dimension''.
Hence, if an ''nxpxq'' Tensor was given as input, and ''dimension'' = ''2'',
then an ''nxq'' matrix would be output.


==== Min ====
{{anchor:nn.Min}}

''module'' = ''Min(dimension)''

Applies a min operation over dimension ''dimension''.
Hence, if an ''nxpxq'' Tensor was given as input, and ''dimension'' = ''2'',
then an ''nxq'' matrix would be output.


==== Mean ====
{{anchor:nn.Mean}}

''module'' = ''Mean(dimension)''

Applies a mean operation over dimension ''dimension''.
Hence, if an ''nxpxq'' Tensor was given as input, and ''dimension'' = ''2'',
then an ''nxq'' matrix would be output.

==== Sum ====
{{anchor:nn.Sum}}

''module'' = ''Sum(dimension)''

Applies a sum operation over dimension ''dimension''.
Hence, if an ''nxpxq'' Tensor was given as input, and ''dimension'' = ''2'',
then an ''nxq'' matrix would be output.
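
A quick sketch of these reduction modules on a small matrix:
<file lua>
x = lab.randn(3, 4)
print(nn.Max(2):forward(x))  -- 3-element vector: the max of each row
print(nn.Mean(1):forward(x)) -- 4-element vector: the mean of each column
print(nn.Sum(2):forward(x))  -- 3-element vector: the sum of each row
</file>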

==== Euclidean ====
{{anchor:nn.Euclidean}}

''module'' = ''Euclidean(inputDimension,outputDimension)''

Outputs the Euclidean distance of the input to ''outputDimension'' centers,
i.e. this layer has the weights ''c_j'', ''j'' = ''1'',..,''outputDimension'', where
''c_j'' are vectors of dimension ''inputDimension''. Output dimension ''j'' is
''|| c_j - x ||^2'', where ''x'' is the input.

==== WeightedEuclidean ====
{{anchor:nn.WeightedEuclidean}}

''module'' = ''WeightedEuclidean(inputDimension,outputDimension)''

This module is similar to [[#nn.Euclidean|Euclidean]], but
additionally learns a separate diagonal covariance matrix across the
features of the input space for each center.


==== Copy ====
{{anchor:nn.Copy}}

''module'' = ''Copy(inputType,outputType)''

This layer copies the input to the output with type casting from
''inputType'' to ''outputType''.


==== Narrow ====
{{anchor:nn.Narrow}}

''module'' = ''Narrow(dimension, offset, length)''

Narrow is the application of the
[[..:torch:tensor#torch.Tensor.narrow|narrow]] operation in a
module.
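
For instance (a quick sketch):
<file lua>
x = lab.randn(5, 3)
m = nn.Narrow(1, 2, 3) -- along dimension 1, keep 3 rows starting at row 2
print(m:forward(x))    -- rows 2..4 of x, a 3x3 matrix
</file>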

==== Replicate ====
{{anchor:nn.Replicate}}

''module'' = ''Replicate(nFeature)''

This class creates an output where the input is replicated
''nFeature'' times along a new first dimension. There is no memory
allocation or memory copy in this module. It sets the
[[..:torch:tensor#torch.Tensor.stride|stride]] along the first
dimension to zero.

<file lua>
torch> x=lab.linspace(1,5,5)
torch> =x
 1
 2
 3
 4
 5
[torch.DoubleTensor of dimension 5]

torch> m=nn.Replicate(3)
torch> o=m:forward(x)
torch> =o
 1  2  3  4  5
 1  2  3  4  5
 1  2  3  4  5
[torch.DoubleTensor of dimension 3x5]

torch> x:fill(13)
torch> =x
 13
 13
 13
 13
 13
[torch.DoubleTensor of dimension 5]

torch> =o
 13  13  13  13  13
 13  13  13  13  13
 13  13  13  13  13
[torch.DoubleTensor of dimension 3x5]
</file>


==== Reshape ====
{{anchor:nn.Reshape}}

''module'' = ''Reshape(dimension1, dimension2, ...)''

Reshapes an ''nxpxqx..'' Tensor into a ''dimension1xdimension2x...'' Tensor,
taking the elements column-wise.

Example:
<file lua>
> x=torch.Tensor(4,4)
> for i=1,4 do
>   for j=1,4 do
>     x[i][j]=(i-1)*4+j;
>   end
> end
> print(x)

  1   2   3   4
  5   6   7   8
  9  10  11  12
 13  14  15  16
[torch.Tensor of dimension 4x4]

> print(nn.Reshape(2,8):forward(x))

  1   9   2  10   3  11   4  12
  5  13   6  14   7  15   8  16
[torch.Tensor of dimension 2x8]

> print(nn.Reshape(8,2):forward(x))

  1   3
  5   7
  9  11
 13  15
  2   4
  6   8
 10  12
 14  16
[torch.Tensor of dimension 8x2]

> print(nn.Reshape(16):forward(x))

  1
  5
  9
 13
  2
  6
 10
 14
  3
  7
 11
 15
  4
  8
 12
 16
[torch.Tensor of dimension 16]
</file>


==== Select ====
{{anchor:nn.Select}}

''module'' = ''Select(dimension, index)''

Selects a dimension and index of an ''nxpxqx..'' Tensor.

Example:
<file lua>
mlp=nn.Sequential();
mlp:add(nn.Select(1,3))

require "lab"
x=lab.randn(10,5)
print(x)
print(mlp:forward(x))
</file>
gives the output:
<file lua>
 0.9720 -0.0836  0.0831 -0.2059 -0.0871
 0.8750 -2.0432 -0.1295 -2.3932  0.8168
 0.0369  1.1633  0.6483  1.2862  0.6596
 0.1667 -0.5704 -0.7303  0.3697 -2.2941
 0.4794  2.0636  0.3502  0.3560 -0.5500
-0.1898 -1.1547  0.1145 -1.1399  0.1711
-1.5130  1.4445  0.2356 -0.5393 -0.6222
-0.6587  0.4314  1.1916 -1.4509  1.9400
 0.2733  1.0911  0.7667  0.4002  0.1646
 0.5804 -0.5333  1.1621  1.5683 -0.1978
[torch.Tensor of dimension 10x5]

 0.0369
 1.1633
 0.6483
 1.2862
 0.6596
[torch.Tensor of dimension 5]
</file>

This can be used in conjunction with [[#nn.Concat|Concat]]
to emulate the behavior
of [[#nn.Parallel|Parallel]], or to select various parts of an input Tensor to
perform operations on. Here is a fairly complicated example:
<file lua>
require "lab"

mlp=nn.Sequential();
c=nn.Concat(2)
for i=1,10 do
   local t=nn.Sequential()
   t:add(nn.Select(1,i))
   t:add(nn.Linear(3,2))
   t:add(nn.Reshape(2,1))
   c:add(t)
end
mlp:add(c)

pred=mlp:forward(lab.randn(10,3))
print(pred)

for i=1,10000 do -- Train for a few iterations
   x=lab.randn(10,3);
   y=lab.ones(2,10);
   pred=mlp:forward(x)

   criterion= nn.MSECriterion()
   err=criterion:forward(pred,y)
   gradCriterion = criterion:backward(pred,y);
   mlp:zeroGradParameters();
   mlp:backward(x, gradCriterion);
   mlp:updateParameters(0.01);
   print(err)
end
</file>

==== Exp ====
{{anchor:nn.Exp}}

Applies the ''exp'' function element-wise to the input Tensor,
thus outputting a Tensor of the same dimension.
<file lua>
ii=lab.linspace(-2,2)
m=nn.Exp()
oo=m:forward(ii)
go=lab.ones(100)
gi=m:backward(ii,go)
gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'})
gnuplot.grid(true)
</file>
{{exp.png?400}}


==== Square ====
{{anchor:nn.Square}}

Takes the square of each element.

<file lua>
ii=lab.linspace(-5,5)
m=nn.Square()
oo=m:forward(ii)
go=lab.ones(100)
gi=m:backward(ii,go)
gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'})
gnuplot.grid(true)
</file>
{{square.png?400}}

==== Sqrt ====
{{anchor:nn.Sqrt}}

Takes the square root of each element.

<file lua>
ii=lab.linspace(0,5)
m=nn.Sqrt()
oo=m:forward(ii)
go=lab.ones(100)
gi=m:backward(ii,go)
gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'})
gnuplot.grid(true)
</file>
{{sqrt.png?400}}

==== Power ====
{{anchor:nn.Power}}

''module'' = ''Power(p)''

Raises each element to its ''p''-th power.

<file lua>
ii=lab.linspace(0,2)
m=nn.Power(1.25)
oo=m:forward(ii)
go=lab.ones(100)
gi=m:backward(ii,go)
gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'})
gnuplot.grid(true)
</file>
{{power.png?400}}

===== Transfer Function Layers =====
{{anchor:nn.transfer.dok}}

==== HardTanh ====
{{anchor:nn.HardTanh}}

Applies the ''HardTanh'' function element-wise to the input Tensor,
thus outputting a Tensor of the same dimension.

''HardTanh'' is defined as:

  * ''f(x) = 1, if x > 1''
  * ''f(x) = -1, if x < -1''
  * ''f(x) = x, otherwise''

<file lua>
ii=lab.linspace(-2,2)
m=nn.HardTanh()
oo=m:forward(ii)
go=lab.ones(100)
gi=m:backward(ii,go)
gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'})
gnuplot.grid(true)
</file>
{{htanh.png?400}}


==== HardShrink ====
{{anchor:nn.HardShrink}}

''module = nn.HardShrink(lambda)''

Applies the hard shrinkage function element-wise to the input
[[..:torch:tensor|Tensor]]. The output is the same size as the input.

The ''HardShrinkage'' operator is defined as:

  * ''f(x) = x, if x > lambda''
  * ''f(x) = x, if x < -lambda''
  * ''f(x) = 0, otherwise''

<file lua>
ii=lab.linspace(-2,2)
m=nn.HardShrink(0.85)
oo=m:forward(ii)
go=lab.ones(100)
gi=m:backward(ii,go)
gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'})
gnuplot.grid(true)
</file>
{{hshrink.png?400}}

==== SoftShrink ====
{{anchor:nn.SoftShrink}}

''module = nn.SoftShrink(lambda)''

Applies the soft shrinkage function element-wise to the input
[[..:torch:tensor|Tensor]]. The output is the same size as the input.

The ''SoftShrinkage'' operator is defined as:

  * ''f(x) = x - lambda, if x > lambda''
  * ''f(x) = x + lambda, if x < -lambda''
  * ''f(x) = 0, otherwise''

<file lua>
ii=lab.linspace(-2,2)
m=nn.SoftShrink(0.85)
oo=m:forward(ii)
go=lab.ones(100)
gi=m:backward(ii,go)
gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'})
gnuplot.grid(true)
</file>
{{sshrink.png?400}}


==== SoftMax ====
{{anchor:nn.SoftMax}}

Applies the ''Softmax'' function to an n-dimensional input Tensor,
rescaling it so that the elements of the n-dimensional output Tensor
lie in the range (0,1) and sum to 1.

''Softmax'' is defined as ''f_i(x)'' = ''exp(x_i-shift) / sum_j exp(x_j-shift)'',
where ''shift'' = ''max_i x_i''.

<file lua>
ii=lab.exp(lab.abs(lab.randn(10)))
m=nn.SoftMax()
oo=m:forward(ii)
gnuplot.plot({'Input',ii,'+-'},{'Output',oo,'+-'})
gnuplot.grid(true)
</file>
{{softmax.png?400}}

==== SoftMin ====
{{anchor:nn.SoftMin}}

Applies the ''Softmin'' function to an n-dimensional input Tensor,
rescaling it so that the elements of the n-dimensional output Tensor
lie in the range (0,1) and sum to 1.

''Softmin'' is defined as ''f_i(x)'' = ''exp(-x_i-shift) / sum_j exp(-x_j-shift)'',
where ''shift'' = ''max_i (-x_i)''.

<file lua>
ii=lab.exp(lab.abs(lab.randn(10)))
m=nn.SoftMin()
oo=m:forward(ii)
gnuplot.plot({'Input',ii,'+-'},{'Output',oo,'+-'})
gnuplot.grid(true)
</file>
{{softmin.png?400}}

==== SoftPlus ====
{{anchor:nn.SoftPlus}}

Applies the ''SoftPlus'' function to an n-dimensional input Tensor.
Can be used to constrain the output of a machine to always be positive.

''SoftPlus'' is defined as ''f_i(x)'' = ''log(1 + exp(x_i))''.

<file lua>
ii=lab.randn(10)
m=nn.SoftPlus()
oo=m:forward(ii)
go=lab.ones(10)
gi=m:backward(ii,go)
gnuplot.plot({'Input',ii,'+-'},{'Output',oo,'+-'},{'gradInput',gi,'+-'})
gnuplot.grid(true)
</file>
{{softplus.png?400}}

==== SoftSign ====
{{anchor:nn.SoftSign}}

Applies the ''SoftSign'' function to an n-dimensional input Tensor.

''SoftSign'' is defined as ''f_i(x) = x_i / (1+|x_i|)''.

<file lua>
ii=lab.linspace(-5,5)
m=nn.SoftSign()
oo=m:forward(ii)
go=lab.ones(100)
gi=m:backward(ii,go)
gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'})
gnuplot.grid(true)
</file>
{{softsign.png?400}}

==== LogSigmoid ====
{{anchor:nn.LogSigmoid}}

Applies the ''LogSigmoid'' function to an n-dimensional input Tensor.

''LogSigmoid'' is defined as ''f_i(x)'' = ''log(1/(1+exp(-x_i)))''.

<file lua>
ii=lab.randn(10)
m=nn.LogSigmoid()
oo=m:forward(ii)
go=lab.ones(10)
gi=m:backward(ii,go)
gnuplot.plot({'Input',ii,'+-'},{'Output',oo,'+-'},{'gradInput',gi,'+-'})
gnuplot.grid(true)
</file>
{{logsigmoid.png?400}}


==== LogSoftMax ====
{{anchor:nn.LogSoftMax}}

Applies the ''LogSoftmax'' function to an n-dimensional input Tensor.

''LogSoftmax'' is defined as ''f_i(x)'' = ''log(1/a exp(x_i))'',
where ''a'' = ''sum_j exp(x_j)''.

<file lua>
ii=lab.randn(10)
m=nn.LogSoftMax()
oo=m:forward(ii)
go=lab.ones(10)
gi=m:backward(ii,go)
gnuplot.plot({'Input',ii,'+-'},{'Output',oo,'+-'},{'gradInput',gi,'+-'})
gnuplot.grid(true)
</file>
{{logsoftmax.png?400}}

==== Sigmoid ====
{{anchor:nn.Sigmoid}}

Applies the ''Sigmoid'' function element-wise to the input Tensor,
thus outputting a Tensor of the same dimension.

''Sigmoid'' is defined as ''f(x)'' = ''1/(1+exp(-x))''.

<file lua>
ii=lab.linspace(-5,5)
m=nn.Sigmoid()
oo=m:forward(ii)
go=lab.ones(100)
gi=m:backward(ii,go)
gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'})
gnuplot.grid(true)
</file>
{{sigmoid.png?400}}

==== Tanh ====
{{anchor:nn.Tanh}}

Applies the ''Tanh'' function element-wise to the input Tensor,
thus outputting a Tensor of the same dimension.

<file lua>
ii=lab.linspace(-3,3)
m=nn.Tanh()
oo=m:forward(ii)
go=lab.ones(100)
gi=m:backward(ii,go)
gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'})
gnuplot.grid(true)
</file>
{{tanh.png?400}}

===== Convolutional layers =====
{{anchor:nn.convlayers.dok}}

SpatialConvolution and SpatialSubSampling apply to inputs with
two-dimensional relationships (e.g. images). TemporalConvolution and
TemporalSubSampling apply to sequences with a one-dimensional
relationship (e.g. strings of some kind).

For spatial convolutional layers, the input is supposed to be 3D. The
first dimension is the number of features, the last two dimensions
are spatial.

==== SpatialConvolution ====
{{anchor:nn.SpatialConvolution}}

<file lua>
module = nn.SpatialConvolution(nInputPlane, nOutputPlane, kW, kH, [dW], [dH])
</file>

Applies a 2D convolution over an input image composed of several input planes. The ''input'' tensor in
''forward(input)'' is expected to be a 3D tensor (''nInputPlane x width x height'').

The parameters are the following:
  * ''nInputPlane'': The number of expected input planes in the image given into ''forward()''.
  * ''nOutputPlane'': The number of output planes the convolution layer will produce.
  * ''kW'': The kernel width of the convolution
  * ''kH'': The kernel height of the convolution
  * ''dW'': The step of the convolution in the width dimension. Default is ''1''.
  * ''dH'': The step of the convolution in the height dimension. Default is ''1''.

Note that depending on the size of your kernel, several (of the last)
columns or rows of the input image might be lost. It is up to the user to
add proper padding in images.

If the input image is a 3D tensor ''nInputPlane x width x height'', the output image size
will be ''nOutputPlane x owidth x oheight'' where
<file lua>
owidth = (width - kW) / dW + 1
oheight = (height - kH) / dH + 1
</file>

The parameters of the convolution can be found in ''self.weight'' (Tensor of
size ''nOutputPlane x nInputPlane x kH x kW'') and ''self.bias'' (Tensor of
size ''nOutputPlane''). The corresponding gradients can be found in
''self.gradWeight'' and ''self.gradBias''.

The output value of the layer can be precisely described as:
<file lua>
output[i][j][k] = bias[k]
  + sum_l sum_{s=1}^kW sum_{t=1}^kH weight[s][t][l][k]
                                    * input[dW*(i-1)+s][dH*(j-1)+t][l]
</file>
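
A quick sketch of the resulting sizes (using the ''nInputPlane x width x height'' layout described above):
<file lua>
m = nn.SpatialConvolution(3, 16, 5, 5) -- 3 input planes, 16 output planes, 5x5 kernels
res = m:forward(lab.randn(3, 32, 32))
print(res:size())                      -- 16 x 28 x 28, since (32-5)/1 + 1 = 28
</file>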

==== SpatialConvolutionMap ====
{{anchor:nn.SpatialConvolutionMap}}

<file lua>
module = nn.SpatialConvolutionMap(connectionMatrix, kW, kH, [dW], [dH])
</file>

This class is a generalization of
[[#nn.SpatialConvolution|nn.SpatialConvolution]]. It uses a generic
connection table between input and output features. The
[[#nn.SpatialConvolution|nn.SpatialConvolution]] is equivalent to
using a [[#nn.tables.full|full connection table]]. One can specify
different types of connection tables.

=== Full Connection Table ===
{{anchor:nn.tables.full}}

''table = nn.tables.full(nin,nout)''

This is a precomputed table that specifies connections between every
input and output node.

=== One to One Connection Table ===
{{anchor:nn.tables.onetoone}}

''table = nn.tables.oneToOne(n)''

This is a precomputed table that specifies a single connection to each
output node from the corresponding input node.

=== Random Connection Table ===
{{anchor:nn.tables.random}}

''table = nn.tables.random(nin,nout,nto)''

This table is randomly populated such that each output unit has
''nto'' incoming connections. The algorithm tries to assign a uniform
number of outgoing connections to each input node, if possible.

==== SpatialLPPooling ====
{{anchor:nn.SpatialLPPooling}}

<file lua>
module = nn.SpatialLPPooling(nInputPlane, pnorm, kW, kH, [dW], [dH])
</file>

Computes the ''p'' norm in a convolutional manner on a set of 2D input planes.

==== SpatialMaxPooling ====
{{anchor:nn.SpatialMaxPooling}}

<file lua>
module = nn.SpatialMaxPooling(kW, kH [, dW, dH])
</file>

Applies a 2D max-pooling operation in ''kWxkH'' regions by step size
''dWxdH''. The number of output features is equal to the number of
input planes.
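
For instance (a quick sketch):
<file lua>
m = nn.SpatialMaxPooling(2, 2, 2, 2)        -- 2x2 pooling regions, stepping 2 in each direction
print(m:forward(lab.randn(1, 8, 8)):size()) -- 1 x 4 x 4: each plane shrinks by half
</file>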

==== SpatialSubSampling ====
{{anchor:nn.SpatialSubSampling}}

<file lua>
module = nn.SpatialSubSampling(nInputPlane, kW, kH, [dW], [dH])
</file>

Applies a 2D sub-sampling over an input image composed of several input planes. The ''input'' tensor in
''forward(input)'' is expected to be a 3D tensor (''nInputPlane x width x height''). The number of output
planes will be the same as ''nInputPlane''.

The parameters are the following:
  * ''nInputPlane'': The number of expected input planes in the image given into ''forward()''.
  * ''kW'': The kernel width of the sub-sampling
  * ''kH'': The kernel height of the sub-sampling
  * ''dW'': The step of the sub-sampling in the width dimension. Default is ''1''.
  * ''dH'': The step of the sub-sampling in the height dimension. Default is ''1''.

Note that depending on the size of your kernel, several (of the last)
columns or rows of the input image might be lost. It is up to the user to
add proper padding in images.

If the input image is a 3D tensor ''nInputPlane x width x height'', the output image size
will be ''nInputPlane x owidth x oheight'' where
<file lua>
owidth = (width - kW) / dW + 1
oheight = (height - kH) / dH + 1
</file>

The parameters of the sub-sampling can be found in ''self.weight'' (Tensor of
size ''nInputPlane'') and ''self.bias'' (Tensor of size ''nInputPlane''). The
corresponding gradients can be found in ''self.gradWeight'' and
''self.gradBias''.

The output value of the layer can be precisely described as:
<file lua>
output[i][j][k] = bias[k]
  + weight[k] * sum_{s=1}^kW sum_{t=1}^kH input[dW*(i-1)+s][dH*(j-1)+t][k]
</file>

==== SpatialZeroPadding ====
{{anchor:nn.SpatialZeroPadding}}

<file lua>
module = nn.SpatialZeroPadding(padLeft, padRight, padTop, padBottom)
</file>

Each feature map of a given input is padded with the specified number of
zeros. If the padding values are negative, then the input is cropped.

==== SpatialSubtractiveNormalization ====
{{anchor:nn.SpatialSubtractiveNormalization}}

<file lua>
module = nn.SpatialSubtractiveNormalization(ninputplane, kernel)
</file>

Applies a spatial subtraction operation on a series of 2D inputs using
''kernel'' for computing the weighted average in a neighborhood. The
neighborhood is defined for a local spatial region that is the same size as
the kernel, and across all features. For an input image with only one
feature, the region is only spatial. For an RGB image, the
weighted average is taken over RGB channels and a spatial region.

If the ''kernel'' is 1D, then it will be used for constructing a separable
2D kernel. The operations will be much more efficient in this case.

The kernel is generally chosen as a Gaussian when it is believed that
the correlation of two pixel locations decreases with increasing
distance. On the feature dimension, a uniform average is used since
the weighting across features is not known.

For this example we use an external package,
[[http://www.github.com/clementfarabet/lua---image/|image]]:

<file lua>
require 'image'
require 'nn'
lena = image.rgb2y(image.lena())
ker = lab.ones(11)
m=nn.SpatialSubtractiveNormalization(1,ker)
processed = m:forward(lena)
w1=image.display(lena)
w2=image.display(processed)
</file>
{{lena.jpg?300}}{{lenap.jpg?300}}

==== TemporalConvolution ====
{{anchor:nn.TemporalConvolution}}

<file lua>
module = nn.TemporalConvolution(inputFrameSize, outputFrameSize, kW, [dW])
</file>

Applies a 1D convolution over an input sequence composed of ''nInputFrame'' frames. The ''input'' tensor in
''forward(input)'' is expected to be a 2D tensor (''nInputFrame x inputFrameSize'').

The parameters are the following:
  * ''inputFrameSize'': The input frame size expected in sequences given into ''forward()''.
  * ''outputFrameSize'': The output frame size the convolution layer will produce.
  * ''kW'': The kernel width of the convolution
  * ''dW'': The step of the convolution. Default is ''1''.

Note that depending on the size of your kernel, several (of the last)
frames of the sequence might be lost. It is up to the user to add proper padding frames in the input
sequences.

If the input sequence is a 2D tensor ''nInputFrame x inputFrameSize'', the output sequence will be
''nOutputFrame x outputFrameSize'' where
<file lua>
nOutputFrame = (nInputFrame - kW) / dW + 1
</file>

The parameters of the convolution can be found in ''self.weight'' (Tensor of
size ''outputFrameSize x (inputFrameSize x kW)'') and ''self.bias'' (Tensor of
size ''outputFrameSize''). The corresponding gradients can be found in
''self.gradWeight'' and ''self.gradBias''.

The output value of the layer can be precisely described as:
<file lua>
output[i][t] = bias[i]
  + sum_j sum_{k=1}^kW weight[j][k][i] * input[j][dW*(t-1)+k]
</file>

Here is a simple example:

<file lua>
inp=5;  -- dimensionality of one sequence element
outp=1; -- number of derived features for one sequence element
kw=1;   -- kernel only operates on one sequence element at once
dw=1;   -- we step once and go on to the next sequence element

mlp=nn.TemporalConvolution(inp,outp,kw,dw)

require "lab"
x=lab.rand(7,inp) -- a sequence of 7 elements
print(mlp:forward(x))
</file>
which gives:
<file lua>
-0.9109
-0.9872
-0.6808
-0.9403
-0.9680
-0.6901
-0.6387
[torch.Tensor of dimension 7x1]
</file>

This is equivalent to:
<file lua>
weights=lab.reshape(mlp.weight,inp) -- weights applied to all
bias= mlp.bias[1];
for i=1,x:size(1) do -- for each sequence element
   element= x[i]; -- features of ith sequence element
   print(element:dot(weights) + bias)
end
</file>
which gives:
<file lua>
-0.91094998687717
-0.98721705771773
-0.68075004276185
-0.94030132495887
-0.96798754116609
-0.69008470895581
-0.63871422284166
</file>


==== TemporalSubSampling ====
{{anchor:nn.TemporalSubSampling}}

<file lua>
module = nn.TemporalSubSampling(inputFrameSize, kW, [dW])
</file>

Applies a 1D sub-sampling over an input sequence composed of ''nInputFrame'' frames. The ''input'' tensor in
''forward(input)'' is expected to be a 2D tensor (''nInputFrame x inputFrameSize''). The output frame size
will be the same as the input one (''inputFrameSize'').

The parameters are the following:
  * ''inputFrameSize'': The input frame size expected in sequences given into ''forward()''.
  * ''kW'': The kernel width of the sub-sampling
  * ''dW'': The step of the sub-sampling. Default is ''1''.

Note that depending on the size of your kernel, several (of the last)
frames of the sequence might be lost. It is up to the user to add proper padding frames in the input
sequences.

If the input sequence is a 2D tensor ''nInputFrame x inputFrameSize'', the output sequence will be
''nOutputFrame x inputFrameSize'' where
<file lua>
nOutputFrame = (nInputFrame - kW) / dW + 1
</file>

The parameters of the sub-sampling can be found in ''self.weight'' (Tensor of
size ''inputFrameSize'') and ''self.bias'' (Tensor of
size ''inputFrameSize''). The corresponding gradients can be found in
''self.gradWeight'' and ''self.gradBias''.

The output value of the layer can be precisely described as:
<file lua>
output[i][t] = bias[i] + weight[i] * sum_{k=1}^kW input[i][dW*(t-1)+k]
</file>
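
For instance (a quick sketch):
<file lua>
m = nn.TemporalSubSampling(5, 2, 2)     -- inputFrameSize=5, kW=2, dW=2
print(m:forward(lab.rand(8, 5)):size()) -- (8-2)/2 + 1 = 4 frames of size 5
</file>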

==== LookupTable ====
{{anchor:nn.LookupTable}}

<file lua>
module = nn.LookupTable(nIndex, sizes)
</file>
or
<file lua>
module = nn.LookupTable(nIndex, size1, [size2], [size3], ...)
</file>

This layer is a particular case of a convolution, where the width of the convolution would be ''1''.
When calling ''forward(input)'', it assumes ''input'' is a 1D tensor filled with indices. Indices start
at ''1'' and can go up to ''nIndex''. For each index, it outputs a corresponding ''Tensor'' of size
specified by ''sizes'' (a ''LongStorage'') or ''size1 x size2 x ...''.

The output tensors are concatenated, generating a ''size1 x size2 x ... x sizeN x n'' tensor, where ''n''
is the size of the ''input'' tensor.

When only ''size1'' is provided, this is equivalent to doing the following matrix-matrix multiplication
in an efficient manner:
<file lua>
M P
</file>
where ''M'' is a 2D matrix ''size1 x nIndex'' containing the parameters of the lookup-table and
''P'' is a 2D matrix, where each column vector ''i'' is a zero vector except at index ''input[i]'' where it is ''1''.

Example:
<file lua>
-- a lookup table containing 10 tensors of size 3
module = nn.LookupTable(10, 3)

input = torch.Tensor(4)
input[1] = 1; input[2] = 2; input[3] = 1; input[4] = 10;
print(module:forward(input))
</file>

Outputs something like:
<file lua>
-0.1784  2.2045 -0.1784 -0.2475
-1.0120  0.0537 -1.0120 -0.2148
-1.2840  0.8685 -1.2840 -0.2792
[torch.Tensor of dimension 3x4]
</file>
Note that the first column vector is the same as the third one!

===== Layers for manipulating tables =====
{{anchor:nn.TableLayers}}

This set of modules allows the manipulation of tables
through the layers of a neural network.
This allows one to build very rich architectures.

Table-based modules work by supporting forward and backward methods that can accept
tables as inputs. It turns out that the usual [[#nn.Sequential|Sequential]] module can do this, so all that is needed is other child modules that take advantage of such tables.
<file lua>
mlp = nn.Sequential();
t={x,y,z}
pred=mlp:forward(t)
pred=mlp:forward{x,y,z} -- This is equivalent to the line before
</file>

==== ConcatTable ====
{{anchor:nn.ConcatTable}}

ConcatTable is a container module that applies each member module to
the same input Tensor.

Example:
<file lua>
mlp= nn.ConcatTable()
mlp:add(nn.Linear(5,2))
mlp:add(nn.Linear(5,3))

require "lab"
pred=mlp:forward(lab.randn(5));
for i,k in pairs(pred) do print(i,k); end
</file>
which gives the output:
<file lua>
1
-0.4073
 0.0110
[torch.Tensor of dimension 2]

2
 0.0027
-0.0598
-0.1189
[torch.Tensor of dimension 3]
</file>

==== ParallelTable ====
{{anchor:nn.ParallelTable}}

ParallelTable is a container module that, in its ''forward'' method, applies the ''ith'' member module to the ''ith'' input, and outputs a table of the set of outputs.

Example:
<file lua>
mlp= nn.ParallelTable()
mlp:add(nn.Linear(10,2))
mlp:add(nn.Linear(5,3))

require "lab"
x=lab.randn(10)
y=lab.rand(5)

pred=mlp:forward{x,y}
for i,k in pairs(pred) do print(i,k); end
</file>
which gives the output:
<file lua>
1
 0.0331
 0.7003
[torch.Tensor of dimension 2]

2
 0.0677
-0.1657
-0.7383
[torch.Tensor of dimension 3]
</file>

==== SplitTable ====
{{anchor:nn.SplitTable}}

''module'' = ''SplitTable(dimension)''

Creates a module that takes a Tensor as input and outputs a table of Tensors, splitting the Tensor along dimension ''dimension''.

Example 1:
<file lua>
require "lab"
mlp=nn.SplitTable(2)
x=lab.randn(4,3)
pred=mlp:forward(x)
for i,k in pairs(pred) do print(i,k); end
</file>
gives the output:
<file lua>
1
 1.3885
 1.3295
 0.4281
-1.0171
[torch.Tensor of dimension 4]

2
-1.1565
-0.8556
-1.0717
-0.8316
[torch.Tensor of dimension 4]

3
-1.3678
-0.1709
-0.0191
-2.5871
[torch.Tensor of dimension 4]
</file>

Example 2:
<file lua>
require "lab"
mlp=nn.SplitTable(1)
pred=mlp:forward(lab.randn(10,3))
for i,k in pairs(pred) do print(i,k); end
</file>
gives the output (only the first four of the ten Tensors are shown here):
<file lua>
1
 1.6114
 0.9038
 0.8419
[torch.Tensor of dimension 3]

2
 2.4742
 0.2208
 1.6043
[torch.Tensor of dimension 3]

3
 1.3415
 0.2984
 0.2260
[torch.Tensor of dimension 3]

4
 2.0889
 1.2309
 0.0983
[torch.Tensor of dimension 3]
</file>

A more complicated example:
<file lua>
require "lab"

mlp=nn.Sequential();     -- Create a network that takes a Tensor as input
mlp:add(nn.SplitTable(2))
c=nn.ParallelTable()     -- The two Tensors go through two different Linear
c:add(nn.Linear(10,3))   -- layers in parallel
c:add(nn.Linear(10,7))
mlp:add(c)               -- outputting a table with 2 elements
p=nn.ParallelTable()     -- These tables go through two more linear layers
p:add(nn.Linear(3,2))    -- separately.
p:add(nn.Linear(7,1))
mlp:add(p)
mlp:add(nn.JoinTable(1)) -- Finally, the tables are joined together and output.

pred=mlp:forward(lab.randn(10,2))
print(pred)

for i=1,100 do           -- A few steps of training such a network..
   x=lab.ones(10,2);
   y=torch.Tensor(3); y:copy(x:select(2,1):narrow(1,1,3))
   pred=mlp:forward(x)

   criterion= nn.MSECriterion()
   local err=criterion:forward(pred,y)
   local gradCriterion = criterion:backward(pred,y);
   mlp:zeroGradParameters();
   mlp:backward(x, gradCriterion);
   mlp:updateParameters(0.05);

   print(err)
end
</file>

==== JoinTable ====
{{anchor:nn.JoinTable}}

''module'' = ''JoinTable(dimension)''

Creates a module that takes a list of Tensors as input and outputs a Tensor by joining them together along dimension ''dimension''.

Example:
<file lua>
require "lab"
x=lab.randn(5,1)
y=lab.randn(5,1)
z=lab.randn(2,1)

print(nn.JoinTable(1):forward{x,y})
print(nn.JoinTable(2):forward{x,y})
print(nn.JoinTable(1):forward{x,z})
</file>
gives the output:
<file lua>
 1.3965
 0.5146
-1.5244
-0.9540
 0.4256
 0.1575
 0.4491
 0.6580
 0.1784
-1.7362
[torch.Tensor of dimension 10x1]

 1.3965  0.1575
 0.5146  0.4491
-1.5244  0.6580
-0.9540  0.1784
 0.4256 -1.7362
[torch.Tensor of dimension 5x2]

 1.3965
 0.5146
-1.5244
-0.9540
 0.4256
-1.2660
 1.0869
[torch.Tensor of dimension 7x1]
</file>

A more complicated example:
<file lua>
require "lab"

mlp=nn.Sequential();     -- Create a network that takes a Tensor as input
c=nn.ConcatTable()       -- The same Tensor goes through two different Linear
c:add(nn.Linear(10,3))   -- layers in parallel
c:add(nn.Linear(10,7))
mlp:add(c)               -- outputting a table with 2 elements
p=nn.ParallelTable()     -- These tables go through two more linear layers
p:add(nn.Linear(3,2))    -- separately.
p:add(nn.Linear(7,1))
mlp:add(p)
mlp:add(nn.JoinTable(1)) -- Finally, the tables are joined together and output.

pred=mlp:forward(lab.randn(10))
print(pred)

for i=1,100 do           -- A few steps of training such a network..
   x=lab.ones(10);
   y=torch.Tensor(3); y:copy(x:narrow(1,1,3))
   pred=mlp:forward(x)

   criterion= nn.MSECriterion()
   local err=criterion:forward(pred,y)
   local gradCriterion = criterion:backward(pred,y);
   mlp:zeroGradParameters();
   mlp:backward(x, gradCriterion);
   mlp:updateParameters(0.05);

   print(err)
end
</file>

==== Identity ====
{{anchor:nn.Identity}}

''module'' = ''Identity()''

Creates a module that returns whatever is input to it as output.
This is useful when combined with the module
[[#nn.ParallelTable|ParallelTable]]
in case you do not wish to do anything to one of the input Tensors.
Example:
<file lua>
require "lab"
mlp=nn.Identity()
print(mlp:forward(lab.ones(5,2)))
</file>
gives the output:
<file lua>
 1  1
 1  1
 1  1
 1  1
 1  1
[torch.Tensor of dimension 5x2]
</file>

Here is a more useful example, where one can implement a network which also computes a Criterion using this module:
<file lua>
pred_mlp=nn.Sequential();  -- A network that makes predictions given x.
pred_mlp:add(nn.Linear(5,4))
pred_mlp:add(nn.Linear(4,3))

xy_mlp=nn.ParallelTable(); -- A network for predictions and for keeping the
xy_mlp:add(pred_mlp)       -- true label for comparison with a criterion
xy_mlp:add(nn.Identity())  -- by forwarding both x and y through the network.

mlp=nn.Sequential();       -- The main network that takes both x and y.
mlp:add(xy_mlp)            -- It feeds x and y to parallel networks;
cr=nn.MSECriterion();
cr_wrap=nn.CriterionTable(cr)
mlp:add(cr_wrap)           -- and then applies the criterion.

for i=1,100 do             -- Do a few training iterations
   x=lab.ones(5);          -- Make input features.
   y=torch.Tensor(3);
   y:copy(x:narrow(1,1,3)) -- Make output label.
   err=mlp:forward{x,y}    -- Forward both input and output.
   print(err)              -- Print error from criterion.

   mlp:zeroGradParameters();  -- Do backprop...
   mlp:backward({x, y});
   mlp:updateParameters(0.05);
end
</file>

==== PairwiseDistance ====
{{anchor:nn.PairwiseDistance}}

''module'' = ''PairwiseDistance(p)'' creates a module that takes a table of two vectors as input and outputs the distance between them using the ''p''-norm.

Example:
<file lua>
mlp_l1=nn.PairwiseDistance(1)
mlp_l2=nn.PairwiseDistance(2)
x=lab.new(1,2,3)
y=lab.new(4,5,6)
print(mlp_l1:forward({x,y}))
print(mlp_l2:forward({x,y}))
</file>
gives the output:
<file lua>
 9
[torch.Tensor of dimension 1]

 5.1962
[torch.Tensor of dimension 1]
</file>

A more complicated example:
<file lua>
-- imagine we have one network we are interested in, it is called "p1_mlp"
p1_mlp= nn.Sequential(); p1_mlp:add(nn.Linear(5,2))

-- But we want to push examples towards or away from each other,
-- so we make another copy of it called p2_mlp.
-- This *shares* the same weights via the set command, but has its own set
-- of temporary gradient storage; that's why we create it again (so that
-- the gradients of the pair don't wipe each other)
p2_mlp= nn.Sequential(); p2_mlp:add(nn.Linear(5,2))
p2_mlp:get(1).weight:set(p1_mlp:get(1).weight)
p2_mlp:get(1).bias:set(p1_mlp:get(1).bias)

-- we make a parallel table that takes a pair of examples as input;
+prl = nn.ParallelTable()
+prl:add(p1_mlp)
+prl:add(p2_mlp)
+
+-- now we define our top level network that takes this parallel table and
+-- computes the pairwise distance between the pair of outputs
+mlp= nn.Sequential()
+mlp:add(prl)
+mlp:add(nn.PairwiseDistance(1))
+
+-- and a criterion for pushing together or pulling apart pairs
+crit=nn.HingeEmbeddingCriterion(1)
+
+-- let's make two example vectors
+x=lab.rand(5)
+y=lab.rand(5)
+
+-- Use a typical generic gradient update function
+function gradUpdate(mlp, x, y, criterion, learningRate)
+ local pred = mlp:forward(x)
+ local err = criterion:forward(pred, y)
+ local gradCriterion = criterion:backward(pred, y)
+ mlp:zeroGradParameters()
+ mlp:backward(x, gradCriterion)
+ mlp:updateParameters(learningRate)
+end
+
+-- push the pair x and y together; notice how the distance between them,
+-- given by print(mlp:forward({x,y})[1]), gets smaller
+for i=1,10 do
+ gradUpdate(mlp,{x,y},1,crit,0.01)
+ print(mlp:forward({x,y})[1])
+end
+
+-- pull apart the pair x and y; notice how the distance between them,
+-- given by print(mlp:forward({x,y})[1]), gets larger
+for i=1,10 do
+ gradUpdate(mlp,{x,y},-1,crit,0.01)
+ print(mlp:forward({x,y})[1])
+end
+</file>
+
+==== DotProduct ====
+{{anchor:nn.DotProduct}}
+
+''module'' = ''DotProduct()'' creates a module that takes a table of two vectors as input and outputs the dot product between them.
+
+Example:
+<file lua>
+require "lab"
+mlp=nn.DotProduct()
+x=lab.new(1,2,3)
+y=lab.new(4,5,6)
+print(mlp:forward({x,y}))
+</file>
+gives the output:
+<file lua>
+ 32
+[torch.Tensor of dimension 1]
+</file>
+
+A more complicated example:
+<file lua>
+-- Train a ranking function so that mlp:forward({x,y},{x,z}) returns a number
+-- which indicates whether x is better matched with y or with z
+-- (larger score = better match).
+
+mlp1=nn.Linear(5,10)
+mlp2=mlp1:clone('weight','bias')
+
+prl=nn.ParallelTable();
+prl:add(mlp1); prl:add(mlp2)
+
+mlp1=nn.Sequential()
+mlp1:add(prl)
+mlp1:add(nn.DotProduct())
+
+mlp2=mlp1:clone('weight','bias')
+
+mlp=nn.Sequential()
+prla=nn.ParallelTable()
+prla:add(mlp1)
+prla:add(mlp2)
+mlp:add(prla)
+
+x=lab.rand(5);
+y=lab.rand(5)
+z=lab.rand(5)
+
+print(mlp1:forward{x,x})
+print(mlp1:forward{x,y})
+print(mlp1:forward{y,y})
+
+crit=nn.MarginRankingCriterion(1);
+
+-- Use a typical generic gradient update function
+function gradUpdate(mlp, x, y, criterion, learningRate)
+   local pred = mlp:forward(x)
+   local err = criterion:forward(pred, y)
+   local gradCriterion = criterion:backward(pred, y)
+   mlp:zeroGradParameters()
+   mlp:backward(x, gradCriterion)
+   mlp:updateParameters(learningRate)
+end
+
+inp={{x,y},{x,z}}
+
+math.randomseed(1)
+
+-- make the pair x and y have a larger dot product than x and z
+for i=1,100 do
+   gradUpdate(mlp,inp,1,crit,0.05)
+   o1=mlp1:forward{x,y}[1];
+   o2=mlp2:forward{x,z}[1];
+   o=crit:forward(mlp:forward{{x,y},{x,z}},1)
+   print(o1,o2,o)
+end
+
+print "******************"
+
+-- make the pair x and z have a larger dot product than x and y
+for i=1,100 do
+   gradUpdate(mlp,inp,-1,crit,0.05)
+   o1=mlp1:forward{x,y}[1];
+   o2=mlp2:forward{x,z}[1];
+   o=crit:forward(mlp:forward{{x,y},{x,z}},-1)
+   print(o1,o2,o)
+end
+</file>
+
+==== CosineDistance ====
+{{anchor:nn.CosineDistance}}
+
+''module'' = ''CosineDistance()'' creates a module that takes a table of two vectors as input and outputs the cosine of the angle between them. (Despite the name, a larger output therefore means the vectors are more similar.)
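+
+Concretely, the value returned is the usual cosine similarity between the two vectors:
+<verbatim>
+cos(x1,x2) = forward({x1,x2}) = (x1 . x2) / (||x1|| * ||x2||)
+</verbatim>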
+
+Example:
+<file lua>
+require "lab"
+mlp=nn.CosineDistance()
+x=lab.new(1,2,3)
+y=lab.new(4,5,6)
+print(mlp:forward({x,y}))
+</file>
+gives the output:
+<file lua>
+ 0.9746
+[torch.Tensor of dimension 1]
+</file>
+
+A more complicated example:
+<file lua>
+-- imagine we have one network we are interested in, it is called "p1_mlp"
+p1_mlp= nn.Sequential(); p1_mlp:add(nn.Linear(5,2))
+
+-- But we want to push examples towards or away from each other,
+-- so we make another copy of it called p2_mlp.
+-- This *clones* the module, sharing the same weights but with its own set of
+-- temporary gradient storage; that's why we create it again
+-- (so that the gradients of the pair don't wipe each other out)
+p2_mlp= p1_mlp:clone('weight','bias')
+
+-- we make a parallel table that takes a pair of examples as input. they both go through the same (cloned) mlp
+prl = nn.ParallelTable()
+prl:add(p1_mlp)
+prl:add(p2_mlp)
+
+-- now we define our top level network that takes this parallel table and
+-- computes the cosine distance between the pair of outputs
+mlp= nn.Sequential()
+mlp:add(prl)
+mlp:add(nn.CosineDistance())
+
+-- let's make two example vectors
+x=lab.rand(5)
+y=lab.rand(5)
+
+-- Grad update function..
+function gradUpdate(mlp, x, y, learningRate)
+ local pred = mlp:forward(x)
+ if pred[1]*y < 1 then
+  gradCriterion=lab.new(-y)
+  mlp:zeroGradParameters()
+  mlp:backward(x, gradCriterion)
+  mlp:updateParameters(learningRate)
+ end
+end
+
+-- push the pair x and y together; the cosine output should get larger..
+for i=1,1000 do
+ gradUpdate(mlp,{x,y},1,0.1)
+ if ((i%100)==0) then print(mlp:forward({x,y})[1]);end
+end
+
+-- pull apart the pair x and y; the cosine output should get smaller..
+for i=1,1000 do
+ gradUpdate(mlp,{x,y},-1,0.1)
+ if ((i%100)==0) then print(mlp:forward({x,y})[1]);end
+end
+</file>
+
+==== CriterionTable ====
+{{anchor:nn.CriterionTable}}
+
+''module'' = ''CriterionTable(criterion)''
+
+Creates a module that wraps a Criterion so that it can accept a table of inputs. Typically the table would contain two elements: the input ''x'' and the target ''y'' that the Criterion compares.
+
+Example:
+<file lua>
+require "lab"
+mlp = nn.CriterionTable(nn.MSECriterion())
+x=lab.randn(5)
+y=lab.randn(5)
+print(mlp:forward{x,x})
+print(mlp:forward{x,y})
+</file>
+gives the output:
+<file lua>
+0
+1.9028918413199
+</file>
+
+Here is a more complex example of embedding the criterion into a network:
+<file lua>
+require "lab"
+
+mlp=nn.Sequential();        -- Create an mlp that takes input
+ main_mlp=nn.Sequential();  -- and output using ParallelTable
+ main_mlp:add(nn.Linear(5,4))
+ main_mlp:add(nn.Linear(4,3))
+ cmlp=nn.ParallelTable();
+ cmlp:add(main_mlp)
+ cmlp:add(nn.Identity())
+mlp:add(cmlp)
+mlp:add(nn.CriterionTable(nn.MSECriterion())) -- Apply the Criterion
+
+for i=1,20 do               -- Train for a few iterations
+ x=lab.ones(5);
+ y=torch.Tensor(3); y:copy(x:narrow(1,1,3))
+ err=mlp:forward{x,y}       -- Pass in both input and output
+ print(err)
+
+ mlp:zeroGradParameters();
+ mlp:backward({x, y});
+ mlp:updateParameters(0.05);
+end
+</file>
+
+==== CAddTable ====
+{{anchor:nn.CAddTable}}
+
+Takes a table of Tensors and outputs the sum of all of them.
+
+<file lua>
+ii = {lab.ones(5),lab.ones(5)*2,lab.ones(5)*3}
+=ii[1]
+ 1
+ 1
+ 1
+ 1
+ 1
+[torch.DoubleTensor of dimension 5]
+
+=ii[2]
+ 2
+ 2
+ 2
+ 2
+ 2
+[torch.DoubleTensor of dimension 5]
+
+=ii[3]
+ 3
+ 3
+ 3
+ 3
+ 3
+[torch.DoubleTensor of dimension 5]
+
+m=nn.CAddTable()
+=m:forward(ii)
+ 6
+ 6
+ 6
+ 6
+ 6
+[torch.DoubleTensor of dimension 5]
+</file>
+
+==== CSubTable ====
+{{anchor:nn.CSubTable}}
+
+Takes a table with two Tensors and returns the component-wise
+subtraction between them.
+
+<file lua>
+m=nn.CSubTable()
+=m:forward({lab.ones(5)*2.2,lab.ones(5)})
+ 1.2000
+ 1.2000
+ 1.2000
+ 1.2000
+ 1.2000
+[torch.DoubleTensor of dimension 5]
+</file>
+
+==== CMulTable ====
+{{anchor:nn.CMulTable}}
+
+Takes a table of Tensors and outputs the component-wise multiplication of all of them.
+
+<file lua>
+ii = {lab.ones(5)*2,lab.ones(5)*3,lab.ones(5)*4}
+m=nn.CMulTable()
+=m:forward(ii)
+ 24
+ 24
+ 24
+ 24
+ 24
+[torch.DoubleTensor of dimension 5]
+</file>
+
+==== CDivTable ====
+{{anchor:nn.CDivTable}}
+
+Takes a table with two Tensors and returns the component-wise
+division between them.
+
+<file lua>
+m=nn.CDivTable()
+=m:forward({lab.ones(5)*2.2,lab.ones(5)*4.4})
+ 0.5000
+ 0.5000
+ 0.5000
+ 0.5000
+ 0.5000
+[torch.DoubleTensor of dimension 5]
+</file>
+
+====== Criterions ======
+{{anchor:nn.Criterions}}
+
+Criterions are helpful to train a neural network. Given an input and a
+target, they compute a gradient according to a given loss
+function. [[#nn.AbsCriterion|AbsCriterion]] and
+[[#nn.MSECriterion|MSECriterion]] are perfect for regression problems, while
+[[#nn.ClassNLLCriterion|ClassNLLCriterion]] is the criterion of choice when
+dealing with classification.
+
+Criterions are [[..:torch:file#torch.file.serialization|serializable]].
+
+===== Criterion =====
+{{anchor:nn.Criterion}}
+
+This is an abstract class which declares methods defined in all criterions.
+This class is [[..:torch:file#torch.file.serialization|serializable]].
+
+==== [output] forward(input, target) ====
+{{anchor:nn.Criterion.forward}}
+
+Given an ''input'' and a ''target'', compute the loss function associated to the criterion and return the
+result. In general ''input'' and ''target'' are [[..:torch:tensor|tensors]], but some specific criterions
+might require some other type of object.
+
+The ''output'' returned should in general be a scalar.
+
+The state variable [[#nn.Criterion.output|self.output]] should be updated after a call to ''forward()''.
+
+==== [gradInput] backward(input, target) ====
+{{anchor:nn.Criterion.backward}}
+
+Given an ''input'' and a ''target'', compute the gradients of the loss function associated to the criterion and
+return the result. In general ''input'', ''target'' and ''gradInput'' are [[..:torch:tensor|tensors]], but some specific criterions
+might require some other type of object.
+
+The state variable [[#nn.Criterion.gradInput|self.gradInput]] should be updated after a call to ''backward()''.
+
+==== State variable: output ====
+{{anchor:nn.Criterion.output}}
+
+State variable which contains the result of the last [[#nn.Criterion.forward|forward(input, target)]] call.
+
+==== State variable: gradInput ====
+{{anchor:nn.Criterion.gradInput}}
+
+State variable which contains the result of the last [[#nn.Criterion.backward|backward(input, target)]] call.
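+
+The following short sketch illustrates this interface with the [[#nn.MSECriterion|MSECriterion]] described below (the exact values depend on the random inputs):
+<file lua>
+require "lab"
+criterion = nn.MSECriterion()
+input  = lab.randn(3)                     -- a prediction
+target = lab.randn(3)                     -- the desired output
+err  = criterion:forward(input, target)   -- scalar loss, also stored in criterion.output
+grad = criterion:backward(input, target)  -- gradient w.r.t. input, also stored in criterion.gradInput
+print(err)
+print(grad)
+</file>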
+
+===== AbsCriterion =====
+{{anchor:nn.AbsCriterion}}
+
+<file lua>
+criterion = nn.AbsCriterion()
+</file>
+
+Creates a criterion that
+measures the mean absolute difference between the ''n'' elements of the input ''x''
+and target ''y'':
+
+''loss(x,y)'' = ''1/n \sum |x_i-y_i|''.
+
+If ''x'' and ''y'' are ''d''-dimensional Tensors with a total of ''n'' elements,
+the sum operation still operates over all the elements, and divides by ''n''.
+
+The division by ''n'' can be avoided if one sets the internal variable ''sizeAverage'' to ''false'':
+<file lua>
+criterion = nn.AbsCriterion()
+criterion.sizeAverage = false
+</file>
+
+===== ClassNLLCriterion =====
+{{anchor:nn.ClassNLLCriterion}}
+
+<file lua>
+criterion = nn.ClassNLLCriterion()
+</file>
+
+The negative log likelihood criterion. It is useful for training a classification
+problem with ''n'' classes. The ''input'' given through a ''forward()'' is
+expected to contain //log-probabilities// of each class: ''input'' has to be a
+1D tensor of size ''n''. Obtaining log-probabilities in a neural network is
+easily achieved by adding a [[#nn.LogSoftMax|LogSoftMax]] layer as the last
+layer of your neural network.
+
+This criterion expects a class index (1 to the number of classes) as ''target''
+when calling [[#nn.Criterion.forward|forward(input, target)]] and
+[[#nn.Criterion.backward|backward(input, target)]].
+
+The loss can be described as:
+<file lua>
+loss(x, class) = forward(x, class) = -x[class]
+</file>
+
+The following is a code fragment showing how to make a gradient step
+given an input ''x'', a desired output ''y'' (an integer ''1'' to ''n'',
+in this case ''n'' = ''2'' classes),
+a network ''mlp'' and a learning rate ''learningRate'':
+<file lua>
+function gradUpdate(mlp, x, y, learningRate)
+   local criterion = nn.ClassNLLCriterion()
+   local pred = mlp:forward(x)
+   local err = criterion:forward(pred, y)
+   mlp:zeroGradParameters()
+   local t = criterion:backward(pred, y)
+   mlp:backward(x, t)
+   mlp:updateParameters(learningRate)
+end
+</file>
+
+===== MarginCriterion =====
+{{anchor:nn.MarginCriterion}}
+
+<file lua>
+criterion = nn.MarginCriterion()
+</file>
+
+Creates a criterion that optimizes a two-class classification hinge loss (margin-based loss) between input ''x'' (a Tensor of dimension 1) and output ''y'' (which is a scalar, either 1 or -1):
+
+<file lua>
+loss(x,y) = forward(x,y) = max(0, m - y*x)
+</file>
+
+''m'' is the margin, which is 1 by default.
+
+<file lua>
+criterion = nn.MarginCriterion(marginValue)
+</file>
+
+sets a different value of ''m''.
+
+Example:
+<file lua>
+require "nn"
+require "lab"
+
+function gradUpdate(mlp, x, y, criterion, learningRate)
+   local pred = mlp:forward(x)
+   local err = criterion:forward(pred, y)
+   local gradCriterion = criterion:backward(pred, y)
+   mlp:zeroGradParameters()
+   mlp:backward(x, gradCriterion)
+   mlp:updateParameters(learningRate)
+end
+
+mlp=nn.Sequential()
+mlp:add(nn.Linear(5,1))
+
+x1=lab.rand(5)
+x2=lab.rand(5)
+criterion=nn.MarginCriterion(1)
+
+for i=1,1000 do
+   gradUpdate(mlp,x1,1,criterion,0.01)
+   gradUpdate(mlp,x2,-1,criterion,0.01)
+end
+
+print(mlp:forward(x1))
+print(mlp:forward(x2))
+
+print(criterion:forward(mlp:forward(x1),1))
+print(criterion:forward(mlp:forward(x2),-1))
+</file>
+gives the output:
+<file lua>
+ 1.0043
+[torch.Tensor of dimension 1]
+
+-1.0061
+[torch.Tensor of dimension 1]
+
+0
+0
+</file>
+i.e. the mlp successfully separates the two data points, such that they both have a margin of 1 and hence a loss of 0.
+
+===== MSECriterion =====
+{{anchor:nn.MSECriterion}}
+
+<file lua>
+criterion = nn.MSECriterion()
+</file>
+
+Creates a criterion that measures the mean squared error between the ''n'' elements of the input ''x''
+and target ''y'':
+
+<file lua>
+loss(x,y) = forward(x,y) = 1/n \sum (x_i - y_i)^2
+</file>
+
+If ''x'' and ''y'' are ''d''-dimensional Tensors with a total of ''n'' elements,
+the sum operation still operates over all the elements, and divides by ''n''. The two Tensors must
+have the same number of elements (but their sizes might differ).
+
+The division by ''n'' can be avoided if one sets the internal variable ''sizeAverage'' to ''false'':
+<file lua>
+criterion = nn.MSECriterion()
+criterion.sizeAverage = false
+</file>
+
+===== MultiCriterion =====
+{{anchor:nn.MultiCriterion}}
+
+<file lua>
+criterion = nn.MultiCriterion()
+</file>
+
+This returns a Criterion which is a weighted sum of other Criterions.
+Criterions are added using the method:
+
+''criterion:add(singleCriterion, weight)''
+
+where ''weight'' is a scalar.
+
+===== HingeEmbeddingCriterion =====
+{{anchor:nn.HingeEmbeddingCriterion}}
+
+<file lua>
+criterion = nn.HingeEmbeddingCriterion()
+</file>
+
+Creates a criterion that measures the loss given an input
+''x'' which is a 1-dimensional vector and a label ''y'' (1 or -1).
+This is usually used for measuring whether two inputs are similar
+or dissimilar, e.g. using the L1 pairwise distance,
+and is typically used for
+learning nonlinear embeddings or semi-supervised learning.
+
+<verbatim>
+loss(x,y) = forward(x,y) = x,                  if y ==  1
+                         = max(0, margin - x), if y == -1
+</verbatim>
+
+The ''margin'' has a default value of 1, or can be set in the constructor:
+<file lua>
+criterion = nn.HingeEmbeddingCriterion(marginValue)
+</file>
+
+Example use:
+<file lua>
+-- imagine we have one network we are interested in, it is called "p1_mlp"
+p1_mlp= nn.Sequential(); p1_mlp:add(nn.Linear(5,2))
+
+-- But we want to push examples towards or away from each other,
+-- so we make another copy of it called p2_mlp.
+-- This *shares* the same weights via the set command, but has its own set of
+-- temporary gradient storage; that's why we create it again
+-- (so that the gradients of the pair don't wipe each other out)
+p2_mlp= nn.Sequential(); p2_mlp:add(nn.Linear(5,2))
+p2_mlp:get(1).weight:set(p1_mlp:get(1).weight)
+p2_mlp:get(1).bias:set(p1_mlp:get(1).bias)
+
+-- we make a parallel table that takes a pair of examples as input. they both go through the same (cloned) mlp
+prl = nn.ParallelTable()
+prl:add(p1_mlp)
+prl:add(p2_mlp)
+
+-- now we define our top level network that takes this parallel table and
+-- computes the pairwise distance between the pair of outputs
+mlp= nn.Sequential()
+mlp:add(prl)
+mlp:add(nn.PairwiseDistance(1))
+
+-- and a criterion for pushing together or pulling apart pairs
+crit=nn.HingeEmbeddingCriterion(1)
+
+-- let's make two example vectors
+x=lab.rand(5)
+y=lab.rand(5)
+
+-- Use a typical generic gradient update function
+function gradUpdate(mlp, x, y, criterion, learningRate)
+ local pred = mlp:forward(x)
+ local err = criterion:forward(pred, y)
+ local gradCriterion = criterion:backward(pred, y)
+ mlp:zeroGradParameters()
+ mlp:backward(x, gradCriterion)
+ mlp:updateParameters(learningRate)
+end
+
+-- push the pair x and y together; notice how the distance between them,
+-- given by print(mlp:forward({x,y})[1]), gets smaller
+for i=1,10 do
+ gradUpdate(mlp,{x,y},1,crit,0.01)
+ print(mlp:forward({x,y})[1])
+end
+
+-- pull apart the pair x and y; notice how the distance between them,
+-- given by print(mlp:forward({x,y})[1]), gets larger
+for i=1,10 do
+ gradUpdate(mlp,{x,y},-1,crit,0.01)
+ print(mlp:forward({x,y})[1])
+end
+</file>
+
+===== L1HingeEmbeddingCriterion =====
+{{anchor:nn.L1HingeEmbeddingCriterion}}
+
+<file lua>
+criterion = nn.L1HingeEmbeddingCriterion(margin)
+</file>
+
+Creates a criterion that measures the loss given an input
+''x'' = ''{x1,x2}'', a table of two Tensors, and a label ''y'' (1 or -1).
+This is used for measuring whether two inputs are similar
+or dissimilar, using the L1 distance, and is typically used for
+learning nonlinear embeddings or semi-supervised learning.
+
+<verbatim>
+loss(x,y) = forward(x,y) = ||x1-x2||_1,                  if y ==  1
+                         = max(0, margin - ||x1-x2||_1), if y == -1
+</verbatim>
+
+The ''margin'' has a default value of 1, or can be set in the constructor:
+<file lua>
+criterion = nn.L1HingeEmbeddingCriterion(marginValue)
+</file>
+
+===== CosineEmbeddingCriterion =====
+{{anchor:nn.CosineEmbeddingCriterion}}
+
+<file lua>
+criterion = nn.CosineEmbeddingCriterion(margin)
+</file>
+
+Creates a criterion that measures the loss given an input
+''x'' = ''{x1,x2}'', a table of two Tensors, and a label ''y'' (1 or -1).
+This is used for measuring whether two inputs are similar
+or dissimilar, using the cosine distance, and is typically used for
+learning nonlinear embeddings or semi-supervised learning.
+
+''margin'' should be a number from -1 to 1; 0 to 0.5 is suggested.
+Forward and Backward have to be used alternately. If ''margin'' is missing, the default value is 0.
+
+The loss function is:
+<verbatim>
+loss(x,y) = forward(x,y) = 1 - cos(x1, x2),              if y ==  1
+                         = max(0, cos(x1, x2) - margin), if y == -1
+</verbatim>
+
+===== MarginRankingCriterion =====
+{{anchor:nn.MarginRankingCriterion}}
+
+<file lua>
+criterion = nn.MarginRankingCriterion(margin)
+</file>
+
+Creates a criterion that measures the loss given an input
+''x'' = ''{x1,x2}'', a table of two Tensors of size 1 (they contain only scalars),
+and a label ''y'' (1 or -1).
+
+If ''y'' == ''1'' then it is assumed the first input should be ranked higher (have a larger value)
+than the second input, and vice-versa for ''y'' == ''-1''.
+
+The loss function is:
+<verbatim>
+loss(x,y) = forward(x,y) = max(0, -y*(x[1]-x[2]) + margin)
+</verbatim>
+
+Example:
+<file lua>
+require "lab"
+
+p1_mlp= nn.Linear(5,2)
+p2_mlp= p1_mlp:clone('weight','bias')
+
+prl=nn.ParallelTable()
+prl:add(p1_mlp)
+prl:add(p2_mlp)
+
+mlp1=nn.Sequential()
+mlp1:add(prl)
+mlp1:add(nn.DotProduct())
+
+mlp2=mlp1:clone('weight','bias')
+
+mlpa=nn.Sequential()
+prla=nn.ParallelTable()
+prla:add(mlp1)
+prla:add(mlp2)
+mlpa:add(prla)
+
+crit=nn.MarginRankingCriterion(0.1)
+
+x=lab.randn(5)
+y=lab.randn(5)
+z=lab.randn(5)
+
+-- Use a typical generic gradient update function
+function gradUpdate(mlp, x, y, criterion, learningRate)
+   local pred = mlp:forward(x)
+   local err = criterion:forward(pred, y)
+   local gradCriterion = criterion:backward(pred, y)
+   mlp:zeroGradParameters()
+   mlp:backward(x, gradCriterion)
+   mlp:updateParameters(learningRate)
+end
+
+-- make the pair {x,y} rank higher than the pair {x,z}
+for i=1,100 do
+   gradUpdate(mlpa,{{x,y},{x,z}},1,crit,0.01)
+   o1=mlp1:forward{x,y}[1];
+   o2=mlp2:forward{x,z}[1];
+   o=crit:forward(mlpa:forward{{x,y},{x,z}},1)
+   print(o1,o2,o)
+end
+
+print "--"
+
+-- now make the pair {x,z} rank higher than the pair {x,y}
+for i=1,100 do
+   gradUpdate(mlpa,{{x,y},{x,z}},-1,crit,0.01)
+   o1=mlp1:forward{x,y}[1];
+   o2=mlp2:forward{x,z}[1];
+   o=crit:forward(mlpa:forward{{x,y},{x,z}},-1)
+   print(o1,o2,o)
+end
+</file>
+
+====== Training a neural network ======
+{{anchor:nn.traningneuralnet.dok}}
+
+Training a neural network is easy with a [[#nn.DoItYourself|simple ''for'' loop]].
+While doing your own loop provides great flexibility, you might
+sometimes want a quick way of training neural
+networks. [[#nn.StochasticGradient|StochasticGradient]], a simple class
+which does the job for you, is provided as standard.
+
+===== StochasticGradient =====
+{{anchor:nn.StochasticGradient.dok}}
+
+''StochasticGradient'' is a high-level class for training [[#nn.Module|neural networks]], using a stochastic gradient
+algorithm. This class is [[..:torch:file#torch.file.serialization|serializable]].
+
+==== StochasticGradient(module, criterion) ====
+{{anchor:nn.StochasticGradient}}
+
+Create a ''StochasticGradient'' class, using the given [[#nn.Module|Module]] and [[#nn.Criterion|Criterion]].
+The class contains [[#nn.StochasticGradientParameters|several parameters]] you might want to set after initialization.
+
+==== train(dataset) ====
+{{anchor:nn.StochasticGradientTrain}}
+
+Train the module and criterion given in the
+[[#nn.StochasticGradient|constructor]] over ''dataset'', using the
+internal [[#nn.StochasticGradientParameters|parameters]].
+
+StochasticGradient expects as a ''dataset'' an object which implements the operator
+''dataset[index]'' and the method ''dataset:size()''. The ''size()'' method
+returns the number of examples and ''dataset[i]'' has to return the i-th example.
+
+An ''example'' has to be an object which implements the operator
+''example[field]'', where ''field'' might take the value ''1'' (input features)
+or ''2'' (corresponding label which will be given to the criterion).
+The input is usually a Tensor (except if you use special kinds of modules,
+like [[#nn.TableLayers|table layers]]). The label type depends on the criterion.
+For example, the [[#nn.MSECriterion|MSECriterion]] expects a Tensor, while the
+[[#nn.ClassNLLCriterion|ClassNLLCriterion]] expects an integer number (the class).
+
+Such a dataset is easily constructed by using Lua tables, but it could be any ''C'' object,
+for example, as long as the required operators/methods are implemented.
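+
+For instance, a bare-bones dataset of random pairs satisfying this interface could be sketched as follows (the XOR example below builds a meaningful one):
+<file lua>
+require "lab"
+dataset = {}
+function dataset:size() return 10 end        -- 10 examples
+for i=1,dataset:size() do
+  dataset[i] = {lab.randn(2), lab.randn(1)}  -- {input, label} pairs
+end
+</file>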
+[[#nn.DoItStochasticGradient|See an example]].
+
+==== Parameters ====
+{{anchor:nn.StochasticGradientParameters}}
+
+''StochasticGradient'' has several fields which have an impact on a call to [[#nn.StochasticGradientTrain|train()]].
+
+ * ''learningRate'': This is the learning rate used during training. The update of the parameters will be ''parameters = parameters - learningRate * parameters_gradient''. Default value is ''0.01''.
+ * ''learningRateDecay'': The learning rate decay. If non-zero, the learning rate (note: the field ''learningRate'' will not change its value) will be computed after each iteration (pass over the dataset) with: ''current_learning_rate = learningRate / (1 + iteration * learningRateDecay)''
+ * ''maxIteration'': The maximum number of iterations (passes over the dataset). Default is ''25''.
+ * ''shuffleIndices'': Boolean which says if the examples will be randomly sampled or not. Default is ''true''. If ''false'', the examples will be taken in the order of the dataset.
+ * ''hookExample'': A possible hook function which will be called (if non-nil) during training after each example has been forwarded and backwarded through the network. The function takes ''(self, example)'' as parameters. Default is ''nil''.
+ * ''hookIteration'': A possible hook function which will be called (if non-nil) during training after a complete pass over the dataset. The function takes ''(self, iteration)'' as parameters. Default is ''nil''.
+
+===== Example of training using StochasticGradient =====
+{{anchor:nn.DoItStochasticGradient}}
+
+We show an example here on a classical XOR problem.
+
+**Dataset**
+
+We first need to create a dataset, following the conventions described in
+[[#nn.StochasticGradientTrain|train(dataset)]].
+<file lua>
+require "lab"
+dataset={};
+function dataset:size() return 100 end -- 100 examples
+for i=1,dataset:size() do
+  local input = lab.randn(2);     -- normally distributed example in 2d
+  local output = torch.Tensor(1);
+  if input[1]*input[2] > 0 then   -- calculate label for XOR function
+    output[1] = -1;
+  else
+    output[1] = 1
+  end
+  dataset[i] = {input, output}
+end
+</file>
+
+**Neural Network**
+
+We create a simple neural network with one hidden layer.
+<file lua>
+require "nn"
+mlp = nn.Sequential();             -- make a multi-layer perceptron
+inputs = 2; outputs = 1; HUs = 20; -- parameters
+mlp:add(nn.Linear(inputs, HUs))
+mlp:add(nn.Tanh())
+mlp:add(nn.Linear(HUs, outputs))
+</file>
+
+**Training**
+
+We choose the Mean Squared Error criterion and train the beast.
+<file lua>
+criterion = nn.MSECriterion()
+trainer = nn.StochasticGradient(mlp, criterion)
+trainer.learningRate = 0.01
+trainer:train(dataset)
+</file>
+
+**Test the network**
+
+<file lua>
+x = torch.Tensor(2)
+x[1] =  0.5; x[2] =  0.5; print(mlp:forward(x))
+x[1] =  0.5; x[2] = -0.5; print(mlp:forward(x))
+x[1] = -0.5; x[2] =  0.5; print(mlp:forward(x))
+x[1] = -0.5; x[2] = -0.5; print(mlp:forward(x))
+</file>
+
+You should see something like:
+<file lua>
+> x = torch.Tensor(2)
+> x[1] = 0.5; x[2] = 0.5; print(mlp:forward(x))
+
+-0.3490
+[torch.Tensor of dimension 1]
+
+> x[1] = 0.5; x[2] = -0.5; print(mlp:forward(x))
+
+ 1.0561
+[torch.Tensor of dimension 1]
+
+> x[1] = -0.5; x[2] = 0.5; print(mlp:forward(x))
+
+ 0.8640
+[torch.Tensor of dimension 1]
+
+> x[1] = -0.5; x[2] = -0.5; print(mlp:forward(x))
+
+-0.2941
+[torch.Tensor of dimension 1]
+</file>
+
+===== Example of manual training of a neural network =====
+{{anchor:nn.DoItYourself}}
+
+We show an example here on the same classical XOR problem.
+
+**Neural Network**
+
+We create a simple neural network with one hidden layer.
+<file lua>
+require "nn"
+mlp = nn.Sequential();             -- make a multi-layer perceptron
+inputs = 2; outputs = 1; HUs = 20; -- parameters
+mlp:add(nn.Linear(inputs, HUs))
+mlp:add(nn.Tanh())
+mlp:add(nn.Linear(HUs, outputs))
+</file>
+
+**Loss function**
+
+We choose the Mean Squared Error criterion.
+<file lua>
+criterion = nn.MSECriterion()
+</file>
+
+**Training**
+
+We create data //on the fly// and feed it to the neural network.
+
+<file lua>
+require "lab"
+for i = 1,2500 do
+  -- random sample
+  local input = lab.randn(2);     -- normally distributed example in 2d
+  local output = torch.Tensor(1);
+  if input[1]*input[2] > 0 then   -- calculate label for XOR function
+    output[1] = -1
+  else
+    output[1] = 1
+  end
+
+  -- feed it to the neural network and the criterion
+  criterion:forward(mlp:forward(input), output)
+
+  -- train over this example in 3 steps
+  -- (1) zero the accumulation of the gradients
+  mlp:zeroGradParameters()
+  -- (2) accumulate gradients
+  mlp:backward(input, criterion:backward(mlp.output, output))
+  -- (3) update parameters with a 0.01 learning rate
+  mlp:updateParameters(0.01)
+end
+</file>
+
+**Test the network**
+
+<file lua>
+x = torch.Tensor(2)
+x[1] =  0.5; x[2] =  0.5; print(mlp:forward(x))
+x[1] =  0.5; x[2] = -0.5; print(mlp:forward(x))
+x[1] = -0.5; x[2] =  0.5; print(mlp:forward(x))
+x[1] = -0.5; x[2] = -0.5; print(mlp:forward(x))
+</file>
+
+You should see something like:
+<file lua>
+> x = torch.Tensor(2)
+> x[1] = 0.5; x[2] = 0.5; print(mlp:forward(x))
+
+-0.6140
+[torch.Tensor of dimension 1]
+
+> x[1] = 0.5; x[2] = -0.5; print(mlp:forward(x))
+
+ 0.8878
+[torch.Tensor of dimension 1]
+
+> x[1] = -0.5; x[2] = 0.5; print(mlp:forward(x))
+
+ 0.8548
+[torch.Tensor of dimension 1]
+
+> x[1] = -0.5; x[2] = -0.5; print(mlp:forward(x))
+
+-0.5498
+[torch.Tensor of dimension 1]
+</file>
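+
+Since modules are [[..:torch:file#torch.file.serialization|serializable]], the trained network can also be written to disk and read back later. A minimal sketch using ''torch.DiskFile'' (the file name ''mlp.bin'' is arbitrary):
+<file lua>
+-- save the trained network
+file = torch.DiskFile("mlp.bin", "w")
+file:writeObject(mlp)
+file:close()
+
+-- ... later, read it back and use it as before
+file = torch.DiskFile("mlp.bin", "r")
+mlp2 = file:readObject()
+file:close()
+print(mlp2:forward(x))
+</file>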