author | Ronan Collobert <ronan@collobert.com> | 2014-02-07 19:33:58 +0400 |
---|---|---|
committer | Ronan Collobert <ronan@collobert.com> | 2014-02-07 19:33:58 +0400 |
commit | 2425c587886bf1519846d999983f57611efbafb6 (patch) | |
tree | 4b3048fcab943deca04ba765eed8de8ad965ecf8 /README.md | |
parent | 512626efb2d926d58f8ab5a57fb88d7823d167b5 (diff) |
README.md into main directory
Diffstat (limited to 'README.md')
-rw-r--r-- | README.md | 3110 |
1 files changed, 3110 insertions, 0 deletions
diff --git a/README.md b/README.md new file mode 100644 index 0000000..7af3ed6 --- /dev/null +++ b/README.md @@ -0,0 +1,3110 @@ +<a name="nn.dok"/> +# Neural Network Package # + +This package provides an easy way to build and train simple or complex +neural networks. + +Each module of a network is composed of [Modules](#nn.Modules) and there +are several sub-classes of `Module` available: container classes like +[Sequential](#nn.Sequential), [Parallel](#nn.Parallel) and +[Concat](#nn.Concat) , which can contain simple layers like +[Linear](#nn.Linear), [Mean](#nn.Mean), [Max](#nn.Max) and +[Reshape](#nn.Reshape), as well as convolutional layers, and transfer +functions like [Tanh](#nn.Tanh). + +Loss functions are implemented as sub-classes of +[Criterion](#nn.Criterions). They are helpful to train neural network on +classical tasks. Common criterions are the Mean Squared Error +criterion implemented in [MSECriterion](#nn.MSECriterion) and the +cross-entropy criterion implemented in +[ClassNLLCriterion](#nn.ClassNLLCriterion). + +Finally, the [StochasticGradient](#nn.StochasticGradient) class provides a +high level way to train the neural network of choice, even though it is +easy with a simple for loop to [train a neural network yourself](#nn.DoItYourself). + +For those who want to implement their own modules, we suggest using +the `nn.Jacobian` class for testing the derivatives of their class, +together with the [torch.Tester](..:torch:tester) class. The sources +of `nn` package contains sufficiently many examples of such tests. + + +<a name="nn.overview.dok"/> +# Detailed Overview of the Neural Network Package # + +__Module__ + +A neural network is called a [Module](#nn.Module) (or simply +_module_ in this documentation) in Torch. `Module` is an abstract +class which defines four main methods: + * [forward(input)](#nn.Module.forward) which computes the output of the module given the `input` [Tensor](..:torch:tensor). + * [backward(input, gradOutput)](#nn.Module.backward) which computes the gradients of the module with respect to its own parameters, and its own inputs. + * [zeroGradParameters()](#nn.Module.zeroGradParameters) which zeroes the gradient with respect to the parameters of the module. + * [updateParameters(learningRate)](#nn.Module.updateParameters) which updates the parameters after one has computed the gradients with `backward()` + +It also declares two members: + * [output](#nn.Module.output) which is the output returned by `forward()`. + * [gradInput](#nn.Module.gradInput) which contains the gradients with respect to the input of the module, computed in a `backward()`. + +Two other perhaps less used but handy methods are also defined: + * [share(mlp,s1,s2,...,sn)](#nn.Module.share) which makes this module share the parameters s1,..sn of the module `mlp`. This is useful if you want to have modules that share the same weights. + * [clone(...)](#nn.Module.clone) which produces a deep copy of (i.e. not just a pointer to) this Module, including the current state of its parameters (if any). + +Some important remarks: + * `output` contains only valid values after a [forward(input)](#nn.Module.forward). + * `gradInput` contains only valid values after a [backward(input, gradOutput)](#nn.Module.backward). + * [backward(input, gradOutput)](#nn.Module.backward) uses certain computations obtained during [forward(input)](#nn.Module.forward). You _must_ call `forward()` before calling a `backward()`, on the _same_ `input`, or your gradients are going to be incorrect! 
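To make the last remark concrete, here is a minimal sketch (using `nn.Linear` as a stand-in for any module, with random Tensors) of the call order a module expects:
```lua
require 'nn'

module = nn.Linear(10, 2)
input = torch.randn(10)
gradOutput = torch.randn(2)       -- stands in for the gradient coming from the next layer

output = module:forward(input)                   -- fills module.output
gradInput = module:backward(input, gradOutput)   -- fills module.gradInput; same input as the forward() call
```
Calling `backward()` with a different `input` than the one used in the preceding `forward()` would silently produce wrong gradients.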
+ + +__Plug and play__ + +Building a simple neural network can be achieved by constructing an available layer. +A linear neural network (perceptron!) is built only in one line: +```lua +mlp = nn.Linear(10,1) -- perceptron with 10 inputs +``` + +More complex neural networks are easily built using container classes +[Sequential](#nn.Sequential) and [Concat](#nn.Concat). `Sequential` plugs +layer in a feed-forward fully connected manner. `Concat` concatenates in +one layer several modules: they take the same inputs, and their output is +concatenated. + +Creating a one hidden-layer multi-layer perceptron is thus just as easy as: +```lua +mlp = nn.Sequential() +mlp:add( nn.Linear(10, 25) ) -- 10 input, 25 hidden units +mlp:add( nn.Tanh() ) -- some hyperbolic tangent transfer function +mlp:add( nn.Linear(25, 1) ) -- 1 output +``` + +Of course, `Sequential` and `Concat` can contains other +`Sequential` or `Concat`, allowing you to try the craziest neural +networks you ever dreamt of! See the [[#nn.Modules|complete list of +available modules]]. + +__Training a neural network__ + +Once you built your neural network, you have to choose a particular +[Criterion](#nn.Criterions) to train it. A criterion is a class which +describes the cost to be minimized during training. + +You can then train the neural network by using the +[StochasticGradient](#nn.StochasticGradient) class. + +```lua + criterion = nn.MSECriterion() -- Mean Squared Error criterion + trainer = nn.StochasticGradient(mlp, criterion) + trainer:train(dataset) -- train using some examples +``` + +StochasticGradient expect as a `dataset` an object which implements +the operator `dataset[index]` and implements the method +`dataset:size()`. The `size()` methods returns the number of +examples and `dataset[i]` has to return the i-th example. + +An `example` has to be an object which implements the operator +`example[field]`, where `field` might take the value `1` (input +features) or `2` (corresponding label which will be given to the +criterion). The input is usually a Tensor (except if you use special +kind of gradient modules, like [table layers](#nn.TableLayers)). The +label type depends of the criterion. For example, the +[MSECriterion](#nn.MSECriterion) expect a Tensor, but the +[ClassNLLCriterion](#nn.ClassNLLCriterion) except a integer number (the +class). + +Such a dataset is easily constructed by using Lua tables, but it could +any `C` object for example, as long as required operators/methods +are implemented. [See an example](#nn.DoItStochasticGradient). + +`StochasticGradient` being written in `Lua`, it is extremely easy +to cut-and-paste it and create a variant to it adapted to your needs +(if the constraints of `StochasticGradient` do not satisfy you). + +__Low Level Training Of a Neural Network__ + +If you want to program the `StochasticGradient` by hand, you +essentially need to control the use of forwards and backwards through +the network yourself. 
For example, here is the code fragment one +would need to make a gradient step given an input `x`, a desired +output `y`, a network `mlp` and a given criterion `criterion` +and learning rate `learningRate`: + +```lua +function gradUpdate(mlp, x, y, criterion, learningRate) + local pred = mlp:forward(x) + local err = criterion:forward(pred, y) + local gradCriterion = criterion:backward(pred, y) + mlp:zeroGradParameters() + mlp:backward(x, gradCriterion) + mlp:updateParameters(learningRate) +end +``` +For example, if you wish to use your own criterion you can simple replace +`gradCriterion` with the gradient vector of your criterion of choice. + + +<a name="nn.Modules"/> +# Modules # + +Modules are bricks to build neural networks. A [Module](#nn.Module) is a neural network +by itself, but it can be combined with other networks using [container classes](#nn.Containers) to create +complex neural networks. + +<a name="nn.Module"/> +## Module ## + +`Module` is an abstract class which defines fundamental methods necessary +for a training a neural network. Modules are [serializable](..:torch:file#torch.file.serialization). + +Modules contain two states variables: [output](#nn.ModuleOutput) and +[gradInput](#nn.ModuleGradInput). + +<a name="nn.Module.forward"/> +### [output] forward(input) ### + +Takes an `input` object, and computes the corresponding `output` of the +module. In general `input` and `output` are +[Tensors](..:torch:tensor). However, some special sub-classes +like [table layers](#nn.TableLayers) might expect something else. Please, +refer to each module specification for further information. + +After a `forward()`, the [ouput](#nn.ModuleOutput) state variable should +have been updated to the new value. + +It is not advised to override this function. Instead, one should +implement [updateOutput(input)](#nn.Module.updateOutput) +function. The forward module in the abstract parent class +[Module](#nn.Module) will call `updateOutput(input)`. + +<a name="nn.Module.backward"/> +### [gradInput] backward(input, gradOutput) ### + +Performs a _backpropagation step_ through the module, with respect to the +given `input`. In general this method makes the assumption +[forward(input)](#nn.Module.forward) has been called before, _with the same input_. +This is necessary for optimization reasons. If you do not respect +this rule, `backward()` will compute incorrect gradients. + +In general `input` and `gradOutput` and `gradInput` are +[Tensors](..:torch:tensor). However, some special sub-classes +like [table layers](#nn.TableLayers) might expect something else. Please, +refer to each module specification for further information. + +A _backpropagation step_ consist in computing two kind of gradients +at `input` given `gradOutput` (gradients with respect to the +output of the module). This function simply performs this task using +two function calls: + + - A function call to [updateGradInput(input, gradOutput)](#nn.Module.updateGradInput). + - A function call to [accGradParameters(input,gradOutput)](#nn.Module.accGradParameters). + +It is not advised to override this function call in custom classes. It +is better to override +[updateGradInput(input, gradOutput)](#nn.Module.updateGradInput) and +[accGradParameters(input, gradOutput)](#nn.Module.accGradParameters) +functions. + +<a name="nn.Module.updateOutput"/> +### updateOutput(input) ### + +Computes the output using the current parameter set of the class and +input. 
This function returns the result which is stored in the +[output](#nn.Module.output) field. + +<a name="nn.Module.updateGradInput"/> +### updateGradInput(input, gradOutput) ### + +Computing the gradient of the module with respect to its own +input. This is returned in `gradInput`. Also, the +[gradInput](#nn.Module.gradInput) state variable is updated +accordingly. + +<a name="nn.Module.accGradParameters"/> +### accGradParameters(input, gradOutput) ### + +Computing the gradient of the module with respect to its +ownparameters. Many modules do not perform this step as they do not +have any parameters. The state variable name for the parameters is +module dependent. The module is expected to _accumulate_ the +gradients with respect to the parameters in some variable. + +Zeroing this accumulation is achieved with +[zeroGradParameters()](#nn.Module.zeroGradParameters) and updating +the parameters according to this accumulation is done with +[updateParameters()](#nn.Module.updateParameters). + +<a name="nn.Module.zeroGradParameters"/> +### zeroGradParameters() ### + +If the module has parameters, this will zero the accumulation of the +gradients with respect to these parameters, accumulated through +[accGradParameters(input, gradOutput)](#nn.Module.accGradParameters) +calls. Otherwise, it does nothing. + +<a name="nn.Module.updateParameters"/> +### updateParameters(learningRate) ### + +If the module has parameters, this will update these parameters, according +to the accumulation of the gradients with respect to these parameters, +accumulated through [backward()](#nn.Module.backward) calls. + +The update is basically: +```lua +parameters = parameters - learningRate * gradients_wrt_parameters +``` +If the module does not have parameters, it does nothing. + +<a name="nn.Module.accUpdateGradParameters"/> +### accUpdateGradParameters(input, gradOutput, learningRate) ### + +This is a convenience module that performs two functions at +once. Calculates and accumulates the gradients with respect to the +weights after mutltiplying with negative of the learning rate +`learningRate`. Performing these two operations at once is more +performance efficient and it might be advantageous in certain +situations. + +Keep in mind that, this function uses a simple trick to achieve its +goal and it might not be valid for a custom module. + +Also note that compared to accGradParameters(), the gradients are not retained +for future use. + +```lua +function Module:accUpdateGradParameters(input, gradOutput, lr) + local gradWeight = self.gradWeight + local gradBias = self.gradBias + self.gradWeight = self.weight + self.gradBias = self.bias + self:accGradParameters(input, gradOutput, -lr) + self.gradWeight = gradWeight + self.gradBias = gradBias +end +``` + +As it can be seen, the gradients are accumulated directly into +weights. This assumption may not be true for a module that computes a +nonlinear operation. + +<a name="nn.Module.share"/> +### share(mlp,s1,s2,...,sn) ### + +This function modifies the parameters of the module named +`s1`,..`sn` (if they exist) so that they are shared with (pointers +to) the parameters with the same names in the given module `mlp`. + +The parameters have to be Tensors. This function is typically used if +you want to have modules that share the same weights or biases. + +Note that this function if called on a [Container](#nn.Containers) +module will share the same parameters for all the contained modules as +well. 
+ +Example: +```lua + +-- make an mlp +mlp1=nn.Sequential(); +mlp1:add(nn.Linear(100,10)); + +-- make a second mlp +mlp2=nn.Sequential(); +mlp2:add(nn.Linear(100,10)); + +-- the second mlp shares the bias of the first +mlp2:share(mlp1,'bias'); + +-- we change the bias of the first +mlp1:get(1).bias[1]=99; + +-- and see that the second one's bias has also changed.. +print(mlp2:get(1).bias[1]) + +``` + + +<a name="nn.Module.clone"/> +### clone(mlp,...) ### + +Creates a deep copy of (i.e. not just a pointer to) the module, +including the current state of its parameters (e.g. weight, biases +etc., if any). + +If arguments are provided to the `clone(...)` function it also calls +[share(...)](#nn.Module.share) with those arguments on the cloned +module after creating it, hence making a deep copy of this module with +some shared parameters. + +Example: +```lua +-- make an mlp +mlp1=nn.Sequential(); +mlp1:add(nn.Linear(100,10)); + +-- make a copy that shares the weights and biases +mlp2=mlp1:clone('weight','bias'); + +-- we change the bias of the first mlp +mlp1:get(1).bias[1]=99; + +-- and see that the second one's bias has also changed.. +print(mlp2:get(1).bias[1]) + +``` + +<a name="nn.Module.type"/> +### type(type) ### + +This function converts all the parameters of a module to the given +`type`. The `type` can be one of the types defined for +[torch.Tensor](..:torch:tensor). + +<a name="nn.Module.float"/> +### float() ### + +Convenience method for calling [module:type('torch.FloatTensor')](#nn.Module.type) + +<a name="nn.Module.double"/> +### double() ### + +Convenience method for calling [module:type('torch.DoubleTensor')](#nn.Module.type) + +<a name="nn.Module.cuda"/> +### cuda() ### + +Convenience method for calling [module:type('torch.CudaTensor')](#nn.Module.type) + +<a name="nn.statevars.dok"/> +### State Variables ### + +These state variables are useful objects if one wants to check the guts of +a `Module`. The object pointer is _never_ supposed to change. However, its +contents (including its size if it is a Tensor) are supposed to change. + +In general state variables are +[Tensors](..:torch:tensor). However, some special sub-classes +like [table layers](#nn.TableLayers) contain something else. Please, +refer to each module specification for further information. + +<a name="nn.Module.output"/> +#### output #### + +This contains the output of the module, computed with the last call of +[forward(input)](#nn.Module.forward). + +<a name="nn.Module.gradInput"/> +#### gradInput #### + +This contains the gradients with respect to the inputs of the module, computed with the last call of +[updateGradInput(input, gradOutput)](#nn.Module.updateGradInput). + +### Parameters and gradients w.r.t parameters ### + +Some modules contain parameters (the ones that we actually want to +train!). The name of these parameters, and gradients w.r.t these parameters +are module dependent. + +<a name="nn.Module.parameters"/> +### [{weights}, {gradWeights}] parameters() ### + +This function should returns two tables. One for the learnable +parameters `{weights}` and another for the gradients of the energy +wrt to the learnable parameters `{gradWeights}`. + +Custom modules should override this function if they use learnable +parameters that are stored in tensors. + +<a name="nn.Module.getParameters"/> +### [flatParameters, flatGradParameters] getParameters() ### + +This function returns two tensors. 
One for the flattened learnable +parameters `flatParameters` and another for the gradients of the energy +wrt to the learnable parameters `flatGradParameters`. + +Custom modules should not override this function. They should instead override [parameters(...)](#nn.Module.parameters) which is, in turn, called by the present function. + +This function will go over all the weights and gradWeights and make them view into a single tensor (one for weights and one for gradWeights). Since the storage of every weight and gradWeight is changed, this function should be called only once on a given network. + +<a name="nn.Containers"/> +## Containers ## + +<a name="nn.Concat"/> +### Concat ### + +```lua +module = nn.Concat(dim) +``` +Concat concatenates the output of one layer of "parallel" modules along the +provided dimension `dim`: they take the same inputs, and their output is +concatenated. +```lua +mlp=nn.Concat(1); +mlp:add(nn.Linear(5,3)) +mlp:add(nn.Linear(5,7)) +print(mlp:forward(torch.randn(5))) +``` +which gives the output: +```lua + 0.7486 + 0.1349 + 0.7924 +-0.0371 +-0.4794 + 0.3044 +-0.0835 +-0.7928 + 0.7856 +-0.1815 +[torch.Tensor of dimension 10] +``` + + +<a name="nn.Sequential"/> +### Sequential ### + +Sequential provides a means to plug layers together +in a feed-forward fully connected manner. + +E.g. +creating a one hidden-layer multi-layer perceptron is thus just as easy as: +```lua +mlp = nn.Sequential() +mlp:add( nn.Linear(10, 25) ) -- 10 input, 25 hidden units +mlp:add( nn.Tanh() ) -- some hyperbolic tangent transfer function +mlp:add( nn.Linear(25, 1) ) -- 1 output + +print(mlp:forward(torch.randn(10))) +``` +which gives the output: +```lua +-0.1815 +[torch.Tensor of dimension 1] +``` + +<a name="nn.Parallel"/> +### Parallel ### + +`module` = `Parallel(inputDimension,outputDimension)` + +Creates a container module that applies its `ith` child module to the `ith` slice of the input Tensor by using [select](..:torch:tensor#torch.tensor.select) +on dimension `inputDimension`. It concatenates the results of its contained modules together along dimension `outputDimension`. + +Example: +```lua + mlp=nn.Parallel(2,1); -- iterate over dimension 2 of input + mlp:add(nn.Linear(10,3)); -- apply to first slice + mlp:add(nn.Linear(10,2)) -- apply to first second slice + print(mlp:forward(torch.randn(10,2))) +``` +gives the output: +```lua +-0.5300 +-1.1015 + 0.7764 + 0.2819 +-0.6026 +[torch.Tensor of dimension 5] +``` + +A more complicated example: +```lua + +mlp=nn.Sequential(); +c=nn.Parallel(1,2) +for i=1,10 do + local t=nn.Sequential() + t:add(nn.Linear(3,2)) + t:add(nn.Reshape(2,1)) + c:add(t) +end +mlp:add(c) + +pred=mlp:forward(torch.randn(10,3)) +print(pred) + +for i=1,10000 do -- Train for a few iterations + x=torch.randn(10,3); + y=torch.ones(2,10); + pred=mlp:forward(x) + + criterion= nn.MSECriterion() + local err=criterion:forward(pred,y) + local gradCriterion = criterion:backward(pred,y); + mlp:zeroGradParameters(); + mlp:backward(x, gradCriterion); + mlp:updateParameters(0.01); + print(err) +end +``` +<a name="nn.simplelayers.dok"/> +## Simple layers ## + +<a name="nn.Linear"/> +### Linear ### + +`module` = `Linear(inputDimension,outputDimension)` + +Applies a linear transformation to the incoming data, i.e. //y= +Ax+b//. The `input` tensor given in `forward(input)` must be +either a vector (1D tensor) or matrix (2D tensor). If the input is a +matrix, then each row is assumed to be an input sample of given batch. 
+ +You can create a layer in the following way: +```lua + module= nn.Linear(10,5) -- 10 inputs, 5 outputs +``` +Usually this would be added to a network of some kind, e.g.: +```lua + mlp = nn.Sequential(); + mlp:add(module) +``` +The weights and biases (_A_ and _b_) can be viewed with: +```lua + print(module.weight) + print(module.bias) +``` +The gradients for these weights can be seen with: +```lua + print(module.gradWeight) + print(module.gradBias) +``` +As usual with `nn` modules, +applying the linear transformation is performed with: +```lua + x=torch.Tensor(10) -- 10 inputs + y=module:forward(x) +``` + +<a name="nn.SparseLinear"/> +### SparseLinear ### + +`module` = `SparseLinear(inputDimension,outputDimension)` + +Applies a linear transformation to the incoming sparse data, i.e. +_y= Ax+b_. The `input` tensor given in `forward(input)` must +be a sparse vector represented as 2D tensor of the form +torch.Tensor(N, 2) where the pairs represent indices and values. +The SparseLinear layer is useful when the number of input +dimensions is very large and the input data is sparse. + +You can create a sparse linear layer in the following way: + +```lua + module= nn.SparseLinear(10000,2) -- 10000 inputs, 2 outputs +``` +The sparse linear module may be used as part of a larger network, +and apart from the form of the input, +[SparseLinear](#nn.SparseLinear) +operates in exactly the same way as the [Linear](#nn.Linear) layer. + +A sparse input vector may be created as so.. +```lua + + x=torch.Tensor({{1, 0.1},{2, 0.3},{10, 0.3},{31, 0.2}}) + + print(x) + + 1.0000 0.1000 + 2.0000 0.3000 + 10.0000 0.3000 + 31.0000 0.2000 +[torch.Tensor of dimension 4x2] + +``` + +The first column contains indices, the second column contains +values in a a vector where all other elements are zeros. The +indices should not exceed the stated dimesions of the input to the +layer (10000 in the example). + +<a name="nn.Abs"/> +### Abs ### + +`module` = `Abs()` + +`output = abs(input)`. + +```lua +m=nn.Abs() +ii=torch.linspace(-5,5) +oo=m:forward(ii) +go=torch.ones(100) +gi=m:backward(ii,go) +gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'}) +gnuplot.grid(true) +``` + +![](doc/abs.png) + +### Add ### +![](anchor:nn.Add) + +`module` = `Add(inputDimension,scalar)` + +Applies a bias term to the incoming data, i.e. +_y_i= x_i + b_i, or if _scalar=true_ then uses a single bias term, +_y_i= x_i + b. + +Example: +```lua +y=torch.Tensor(5); +mlp=nn.Sequential() +mlp:add(nn.Add(5)) + +function gradUpdate(mlp, x, y, criterion, learningRate) + local pred = mlp:forward(x) + local err = criterion:forward(pred, y) + local gradCriterion = criterion:backward(pred, y) + mlp:zeroGradParameters() + mlp:backward(x, gradCriterion) + mlp:updateParameters(learningRate) + return err +end + +for i=1,10000 do + x=torch.rand(5) + y:copy(x); + for i=1,5 do y[i]=y[i]+i; end + err=gradUpdate(mlp,x,y,nn.MSECriterion(),0.01) +end +print(mlp:get(1).bias) +``` +gives the output: +```lua + 1.0000 + 2.0000 + 3.0000 + 4.0000 + 5.0000 +[torch.Tensor of dimension 5] +``` +i.e. the network successfully learns the input _x_ has been shifted +to produce the output _y_. + + +<a name="nn.Mul"/> +### Mul ### + +`module` = `Mul(inputDimension)` + +Applies a _single_ scaling factor to the incoming data, i.e. +_y= w x_, where _w_ is a scalar. 
+ +Example: +```lua +y=torch.Tensor(5); +mlp=nn.Sequential() +mlp:add(nn.Mul(5)) + +function gradUpdate(mlp, x, y, criterion, learningRate) + local pred = mlp:forward(x) + local err = criterion:forward(pred,y) + local gradCriterion = criterion:backward(pred,y); + mlp:zeroGradParameters(); + mlp:backward(x, gradCriterion); + mlp:updateParameters(learningRate); + return err +end + + +for i=1,10000 do + x=torch.rand(5) + y:copy(x); y:mul(math.pi); + err=gradUpdate(mlp,x,y,nn.MSECriterion(),0.01) +end +print(mlp:get(1).weight) +``` +gives the output: +```lua + 3.1416 +[torch.Tensor of dimension 1] +``` +i.e. the network successfully learns the input `x` has been scaled by +pi. + +### CMul ### +![](anchor:nn.CMul) + +`module` = `CMul(inputDimension)` + +Applies a component-wise multiplication to the incoming data, i.e. +`y_i` = `w_i` =x_i=. + +Example: +```lua +mlp=nn.Sequential() +mlp:add(nn.CMul(5)) + +y=torch.Tensor(5); +sc=torch.Tensor(5); for i=1,5 do sc[i]=i; end -- scale input with this + +function gradUpdate(mlp,x,y,criterion,learningRate) + local pred = mlp:forward(x) + local err = criterion:forward(pred,y) + local gradCriterion = criterion:backward(pred,y); + mlp:zeroGradParameters(); + mlp:backward(x, gradCriterion); + mlp:updateParameters(learningRate); + return err +end + +for i=1,10000 do + x=torch.rand(5) + y:copy(x); y:cmul(sc); + err=gradUpdate(mlp,x,y,nn.MSECriterion(),0.01) +end +print(mlp:get(1).weight) +``` +gives the output: +```lua + 1.0000 + 2.0000 + 3.0000 + 4.0000 + 5.0000 +[torch.Tensor of dimension 5] +``` +i.e. the network successfully learns the input _x_ has been scaled by +those scaling factors to produce the output _y_. + + +<a name="nn.Max"/> +### Max ### + +`module` = `Max(dimension)` + +Applies a max operation over dimension `dimension`. +Hence, if an `nxpxq` Tensor was given as input, and `dimension` = `2` +then an `nxq` matrix would be output. + + +<a name="nn.Min"/> +### Min ### + +`module` = `Min(dimension)` + +Applies a min operation over dimension `dimension`. +Hence, if an `nxpxq` Tensor was given as input, and `dimension` = `2` +then an `nxq` matrix would be output. + + +<a name="nn.Mean"/> +### Mean ### + +`module` = `Mean(dimension)` + +Applies a mean operation over dimension `dimension`. +Hence, if an `nxpxq` Tensor was given as input, and `dimension` = `2` +then an `nxq` matrix would be output. + +<a name="nn.Sum"/> +### Sum ### + +`module` = `Sum(dimension)` + +Applies a sum operation over dimension `dimension`. +Hence, if an `nxpxq` Tensor was given as input, and `dimension` = `2` +then an `nxq` matrix would be output. + + +<a name="nn.Euclidean"/> +### Euclidean ### + +`module` = `Euclidean(inputDimension,outputDimension)` + +Outputs the Euclidean distance of the input to `outputDimension` centers, +i.e. this layer has the weights `c_i`, `i` = `1`,..,`outputDimension`, where +`c_i` are vectors of dimension `inputDimension`. Output dimension `j` is +`|| c_j - x ||`, where `x` is the input. + +<a name="nn.WeightedEuclidean"/> +### WeightedEuclidean ### + +`module` = `WeightedEuclidean(inputDimension,outputDimension)` + +This module is similar to [Euclidian](#nn.Euclidian), but +additionally learns a separate diagonal covariance matrix across the +features of the input space for each center. + + +<a name="nn.Copy"/> +### Copy ### + +`module` = `Copy(inputType,outputType)` + +This layer copies the input to output with type casting from input +type from `inputType` to `outputType`. 
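For instance, a minimal sketch (the type names are simply the usual Tensor types) that casts a float Tensor to double on its way through a network:
```lua
m = nn.Copy('torch.FloatTensor', 'torch.DoubleTensor')
x = torch.FloatTensor(5):fill(1)
print(torch.typename(m:forward(x)))   -- expected: torch.DoubleTensor
```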
+ + +<a name="nn.Narrow"/> +### Narrow ### + +`module` = `Narrow(dimension, offset, length)` + +Narrow is application of +[narrow](..:torch:tensor:#torch.Tensor.narrow) operation in a +module. + +<a name="nn.Replicate"/> +### Replicate ### + +`module` = `Replicate(nFeature)` + +This class creates an output where the input is replicated +`nFeature` times along its first dimension. There is no memory +allocation or memory copy in this module. It sets the +[stride](..:torch:tensor#torch.Tensor.stride) along the first +dimension to zero. + +```lua +torch> x=torch.linspace(1,5,5) +torch> =x + 1 + 2 + 3 + 4 + 5 +[torch.DoubleTensor of dimension 5] + +torch> m=nn.Replicate(3) +torch> o=m:forward(x) +torch> =o + 1 2 3 4 5 + 1 2 3 4 5 + 1 2 3 4 5 +[torch.DoubleTensor of dimension 3x5] + +torch> x:fill(13) +torch> =x + 13 + 13 + 13 + 13 + 13 +[torch.DoubleTensor of dimension 5] + +torch> =o + 13 13 13 13 13 + 13 13 13 13 13 + 13 13 13 13 13 +[torch.DoubleTensor of dimension 3x5] + +``` + + +<a name="nn.Reshape"/> +### Reshape ### + +`module` = `Reshape(dimension1, dimension2, ..)` + +Reshapes an `nxpxqx..` Tensor into a `dimension1xdimension2x...` Tensor, +taking the elements column-wise. + +Example: +```lua +> x=torch.Tensor(4,4) +> for i=1,4 do +> for j=1,4 do +> x[i][j]=(i-1)*4+j; +> end +> end +> print(x) + + 1 2 3 4 + 5 6 7 8 + 9 10 11 12 + 13 14 15 16 +[torch.Tensor of dimension 4x4] + +> print(nn.Reshape(2,8):forward(x)) + + 1 9 2 10 3 11 4 12 + 5 13 6 14 7 15 8 16 +[torch.Tensor of dimension 2x8] + +> print(nn.Reshape(8,2):forward(x)) + + 1 3 + 5 7 + 9 11 + 13 15 + 2 4 + 6 8 + 10 12 + 14 16 +[torch.Tensor of dimension 8x2] + +> print(nn.Reshape(16):forward(x)) + + 1 + 5 + 9 + 13 + 2 + 6 + 10 + 14 + 3 + 7 + 11 + 15 + 4 + 8 + 12 + 16 +[torch.Tensor of dimension 16] + + +``` + + +<a name="nn.Select"/> +### Select ### + +Selects a dimension and index of a `nxpxqx..` Tensor. + +Example: +```lua +mlp=nn.Sequential(); +mlp:add(nn.Select(1,3)) + +x=torch.randn(10,5) +print(x) +print(mlp:forward(x)) +``` +gives the output: +```lua + 0.9720 -0.0836 0.0831 -0.2059 -0.0871 + 0.8750 -2.0432 -0.1295 -2.3932 0.8168 + 0.0369 1.1633 0.6483 1.2862 0.6596 + 0.1667 -0.5704 -0.7303 0.3697 -2.2941 + 0.4794 2.0636 0.3502 0.3560 -0.5500 +-0.1898 -1.1547 0.1145 -1.1399 0.1711 +-1.5130 1.4445 0.2356 -0.5393 -0.6222 +-0.6587 0.4314 1.1916 -1.4509 1.9400 + 0.2733 1.0911 0.7667 0.4002 0.1646 + 0.5804 -0.5333 1.1621 1.5683 -0.1978 +[torch.Tensor of dimension 10x5] + + 0.0369 + 1.1633 + 0.6483 + 1.2862 + 0.6596 +[torch.Tensor of dimension 5] +``` + +This can be used in conjunction with [Concat](#nn.Concat) +to emulate the behavior +of [Parallel](#nn.Parallel), or to select various parts of an input Tensor to +perform operations on. Here is a fairly complicated example: +```lua + +mlp=nn.Sequential(); +c=nn.Concat(2) +for i=1,10 do + local t=nn.Sequential() + t:add(nn.Select(1,i)) + t:add(nn.Linear(3,2)) + t:add(nn.Reshape(2,1)) + c:add(t) +end +mlp:add(c) + +pred=mlp:forward(torch.randn(10,3)) +print(pred) + +for i=1,10000 do -- Train for a few iterations + x=torch.randn(10,3); + y=torch.ones(2,10); + pred=mlp:forward(x) + + criterion= nn.MSECriterion() + err=criterion:forward(pred,y) + gradCriterion = criterion:backward(pred,y); + mlp:zeroGradParameters(); + mlp:backward(x, gradCriterion); + mlp:updateParameters(0.01); + print(err) +end +``` + +<a name="nn.Exp"/> +### Exp ### + +Applies the `exp` function element-wise to the input Tensor, +thus outputting a Tensor of the same dimension. 
+```lua +ii=torch.linspace(-2,2) +m=nn.Exp() +oo=m:forward(ii) +go=torch.ones(100) +gi=m:backward(ii,go) +gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'}) +gnuplot.grid(true) +``` +![](doc/exp.png) + + +<a name="nn.Square"/> +### Square ### + +Takes the square of each element. + +```lua +ii=torch.linspace(-5,5) +m=nn.Square() +oo=m:forward(ii) +go=torch.ones(100) +gi=m:backward(ii,go) +gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'}) +gnuplot.grid(true) +``` +![](doc/square.png) + +<a name="nn.Sqrt"/> +### Sqrt ### + +Takes the square root of each element. + +```lua +ii=torch.linspace(0,5) +m=nn.Sqrt() +oo=m:forward(ii) +go=torch.ones(100) +gi=m:backward(ii,go) +gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'}) +gnuplot.grid(true) +``` +![](doc/sqrt.png) + +<a name="nn.Power"/> +### Power ### + +`module` = `Power(p)` + +Raises each element to its `pth` power. + +```lua +ii=torch.linspace(0,2) +m=nn.Power(1.25) +oo=m:forward(ii) +go=torch.ones(100) +gi=m:backward(ii,go) +gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'}) +gnuplot.grid(true) +``` +![](doc/power.png) + +<a name="nn.transfer.dok"/> +## Transfer Function Layers ## + +<a name="nn.HardTanh"/> +### HardTanh ### + +Applies the `HardTanh` function element-wise to the input Tensor, +thus outputting a Tensor of the same dimension. + +`HardTanh` is defined as: + + * `f(x)` = `1, if x >` `1,` + * `f(x)` = `-1, if x <` `-1,` + * `f(x)` = `x,` `otherwise.` + +```lua +ii=torch.linspace(-2,2) +m=nn.HardTanh() +oo=m:forward(ii) +go=torch.ones(100) +gi=m:backward(ii,go) +gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'}) +gnuplot.grid(true) +``` +![](doc/htanh.png) + + +<a name="nn.HardShrink"/> +### HardShrink ### + +`module = nn.HardShrink(lambda)` + +Applies the hard shrinkage function element-wise to the input +[Tensor](..:torch:Tensor). The output is the same size as the input. + +`HardShrinkage` operator is defined as: + + * `f(x) = x, if x > lambda` + * `f(x) = -x, if < -lambda` + * `f(x) = 0, otherwise` + +```lua +ii=torch.linspace(-2,2) +m=nn.HardShrink(0.85) +oo=m:forward(ii) +go=torch.ones(100) +gi=m:backward(ii,go) +gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'}) +gnuplot.grid(true) +``` +![](doc/hshrink.png) + +<a name="nn.SoftShrink"/> +### SoftShrink ### + +`module = nn.SoftShrink(lambda)` + +Applies the hard shrinkage function element-wise to the input +[Tensor](..:torch:Tensor). The output is the same size as the input. + +`HardShrinkage` operator is defined as: + + * `f(x) = x-lambda, if x > lambda` + * `f(x) = -x+lambda, if < -lambda` + * `f(x) = 0, otherwise` + +```lua +ii=torch.linspace(-2,2) +m=nn.SoftShrink(0.85) +oo=m:forward(ii) +go=torch.ones(100) +gi=m:backward(ii,go) +gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'}) +gnuplot.grid(true) +``` +![](doc/sshrink.png) + + +<a name="nn.SoftMax"/> +### SoftMax ### + +Applies the `Softmax` function to an n-dimensional input Tensor, +rescaling them so that the elements of the n-dimensional output Tensor +lie in the range (0,1) and sum to 1. + +`Softmax` is defined as `f_i(x)` = `exp(x_i-shift) / sum_j exp(x_j-shift)`, +where `shift` = `max_i x_i`. 
+ + +```lua +ii=torch.exp(torch.abs(torch.randn(10))) +m=nn.SoftMax() +oo=m:forward(ii) +gnuplot.plot({'Input',ii,'+-'},{'Output',oo,'+-'}) +gnuplot.grid(true) +``` +![](doc/softmax.png) + +<a name="nn.SoftMin"/> +### SoftMin ### + +Applies the `Softmin` function to an n-dimensional input Tensor, +rescaling them so that the elements of the n-dimensional output Tensor +lie in the range (0,1) and sum to 1. + +`Softmin` is defined as `f_i(x)` = `exp(-x_i-shift) / sum_j exp(-x_j-shift)`, +where `shift` = `max_i x_i`. + + +```lua +ii=torch.exp(torch.abs(torch.randn(10))) +m=nn.SoftMin() +oo=m:forward(ii) +gnuplot.plot({'Input',ii,'+-'},{'Output',oo,'+-'}) +gnuplot.grid(true) +``` +![](doc/softmin.png) + +<a name="nn.SoftPlus"/> +### SoftPlus ### + +Applies the `SoftPlus` function to an n-dimensioanl input Tensor. +Can be used to constrain the output of a machine to always be positive. + +`SoftPlus` is defined as `f_i(x)` = `log(1 + exp(x_i)))`. + +```lua +ii=torch.randn(10) +m=nn.SoftPlus() +oo=m:forward(ii) +go=torch.ones(10) +gi=m:backward(ii,go) +gnuplot.plot({'Input',ii,'+-'},{'Output',oo,'+-'},{'gradInput',gi,'+-'}) +gnuplot.grid(true) +``` +![](doc/softplus.png) + +<a name="nn.SoftSign"/> +### SoftSign ### + +Applies the `SoftSign` function to an n-dimensioanl input Tensor. + +`SoftSign` is defined as `f_i(x) = x_i / (1+|x_i|)` + +```lua +ii=torch.linspace(-5,5) +m=nn.SoftSign() +oo=m:forward(ii) +go=torch.ones(100) +gi=m:backward(ii,go) +gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'}) +gnuplot.grid(true) +``` +![](doc/softsign.png) + +<a name="nn.LogSigmoid"/> +### LogSigmoid ### + +Applies the `LogSigmoid` function to an n-dimensional input Tensor. + +`LogSigmoid` is defined as `f_i(x)` = `log(1/(1+ exp(-x_i)))`. + + +```lua +ii=torch.randn(10) +m=nn.LogSigmoid() +oo=m:forward(ii) +go=torch.ones(10) +gi=m:backward(ii,go) +gnuplot.plot({'Input',ii,'+-'},{'Output',oo,'+-'},{'gradInput',gi,'+-'}) +gnuplot.grid(true) +``` +![](doc/logsigmoid.png) + + +<a name="nn.LogSoftMax"/> +### LogSoftMax ### + +Applies the `LogSoftmax` function to an n-dimensional input Tensor. + +`LogSoftmax` is defined as `f_i(x)` = `log(1/a exp(x_i))`, +where `a` = `sum_j exp(x_j)`. + +```lua +ii=torch.randn(10) +m=nn.LogSoftMax() +oo=m:forward(ii) +go=torch.ones(10) +gi=m:backward(ii,go) +gnuplot.plot({'Input',ii,'+-'},{'Output',oo,'+-'},{'gradInput',gi,'+-'}) +gnuplot.grid(true) +``` +![](doc/logsoftmax.png) + +<a name="nn.Sigmoid"/> +### Sigmoid ### + +Applies the `Sigmoid` function element-wise to the input Tensor, +thus outputting a Tensor of the same dimension. + +`Sigmoid` is defined as `f(x)` = `1/(1+exp(-x))`. + +```lua +ii=torch.linspace(-5,5) +m=nn.Sigmoid() +oo=m:forward(ii) +go=torch.ones(100) +gi=m:backward(ii,go) +gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'}) +gnuplot.grid(true) +``` +![](doc/sigmoid.png) + +<a name="nn.Tanh"/> +### Tanh ### + +Applies the `Tanh` function element-wise to the input Tensor, +thus outputting a Tensor of the same dimension. + +```lua +ii=torch.linspace(-3,3) +m=nn.Tanh() +oo=m:forward(ii) +go=torch.ones(100) +gi=m:backward(ii,go) +gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'}) +gnuplot.grid(true) +``` +![](doc/tanh.png) + +<a name="nn.convlayers.dok"/> +## Convolutional layers ## + +SpatialConvolution and SpatialSubsampling apply to inputs with +two-dimensional relationships (e.g. images). TemporalConvolution and +TemporalSubsampling apply to sequences with a one-dimensional +relationship (e.g. strings of some kind). 
+ +For spatial convolutional layers, the input is supposed to be 3D. The +first dimension is the number of features, the last two dimenstions +are spatial. + +<a name="nn.SpatialConvolution"/> +### SpatialConvolution ### + +```lua +module = nn.SpatialConvolution(nInputPlane, nOutputPlane, kW, kH, [dW], [dH]) +``` + +Applies a 2D convolution over an input image composed of several input planes. The `input` tensor in +`forward(input)` is expected to be a 3D tensor (`nInputPlane x height x width`). + +The parameters are the following: + * `nInputPlane`: The number of expected input planes in the image given into `forward()`. + * `nOutputPlane`: The number of output planes the convolution layer will produce. + * `kW`: The kernel width of the convolution + * `kH`: The kernel height of the convolution + * `dW`: The step of the convolution in the width dimension. Default is `1`. + * `dH`: The step of the convolution in the height dimension. Default is `1`. + +Note that depending of the size of your kernel, several (of the last) +columns or rows of the input image might be lost. It is up to the user to +add proper padding in images. + +If the input image is a 3D tensor `nInputPlane x height x width`, the output image size +will be `nOutputPlane x owidth x oheight` where +```lua +owidth = (width - kW) / dW + 1 +oheight = (height - kH) / dH + 1 . +``` + +The parameters of the convolution can be found in `self.weight` (Tensor of +size `nOutputPlane x nInputPlane x kH x kW`) and `self.bias` (Tensor of +size `nOutputPlane`). The corresponding gradients can be found in +`self.gradWeight` and `self.gradBias`. + +The output value of the layer can be precisely described as: +```lua +output[i][j][k] = bias[k] + + sum_l sum_{s=1}^kW sum_{t=1}^kH weight[s][t][l][k] + * input[dW*(i-1)+s)][dH*(j-1)+t][l] +``` + +<a name="nn.VolumetricConvolution"/> +### VolumetricConvolution ### + +```lua +module = nn.VolumetricConvolution(nInputPlane, nOutputPlane, kT, kW, kH [, dT, dW, dH]) +``` + +Applies a 3D convolution over an input image composed of several input planes. The `input` tensor in +`forward(input)` is expected to be a 4D tensor (`nInputPlane x time x height x width`). + +The parameters are the following: + * `nInputPlane`: The number of expected input planes in the image given into `forward()`. + * `nOutputPlane`: The number of output planes the convolution layer will produce. + * `kT`: The kernel size of the convolution in time + * `kW`: The kernel width of the convolution + * `kH`: The kernel height of the convolution + * `dT`: The step of the convolution in the time dimension. Default is `1`. + * `dW`: The step of the convolution in the width dimension. Default is `1`. + * `dH`: The step of the convolution in the height dimension. Default is `1`. + +Note that depending of the size of your kernel, several (of the last) +columns or rows of the input image might be lost. It is up to the user to +add proper padding in images. + +If the input image is a 4D tensor `nInputPlane x time x height x width`, the output image size +will be `nOutputPlane x otime x owidth x oheight` where +```lua +otime = (time - kT) / dT + 1 +owidth = (width - kW) / dW + 1 +oheight = (height - kH) / dH + 1 . +``` + +The parameters of the convolution can be found in `self.weight` (Tensor of +size `nOutputPlane x nInputPlane x kT x kH x kW`) and `self.bias` (Tensor of +size `nOutputPlane`). The corresponding gradients can be found in +`self.gradWeight` and `self.gradBias`. 
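As a small illustration of the output-size formulas above (a sketch only; the values in the output Tensor depend on the randomly initialized weights, but its sizes do not):
```lua
-- 2 input planes, 4 output planes, 3x3x3 kernels, default steps of 1
m = nn.VolumetricConvolution(2, 4, 3, 3, 3)
input = torch.randn(2, 10, 16, 16)    -- nInputPlane x time x height x width
output = m:forward(input)
print(output:size())                  -- 4 x 8 x 14 x 14, following the otime/owidth/oheight formulas
```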
+ +<a name="nn.SpatialConvolutionMap"/> +### SpatialConvolutionMap ### + +```lua +module = nn.SpatialConvolutionMap(connectionMatrix, kW, kH, [dW], [dH]) +``` + +This class is a generalization of +[nn.SpatialConvolution](#nn.SpatialConvolution). It uses a geenric +connection table between input and output features. The +[nn.SpatialConvolution](#nn.SpatialConvolution) is equivalent to +using a [full connection table](#nn.tables.full). One can specify +different types of connection tables. + +<a name="nn.tables.full"/> +#### Full Connection Table #### + +`table = nn.tables.full(nin,nout)` + +This is a precomputed table that specifies connections between every +input and output node. + +<a name="nn.tables.onetoone"/> +#### One to One Connection Table #### + +`table = nn.tables.oneToOne(n)` + +This is a precomputed table that specifies a single connection to each +output node from corresponding input node. + +<a name="nn.tables.random"/> +#### Random Connection Table #### + +`table = nn.tables.random(nin,nout, nto)` + +This table is randomly populated such that each output unit has +`nto` incoming connections. The algorihtm tries to assign uniform +number of outgoing connections to each input node if possible. + +<a name="nn.SpatialLPPooling"/> +### SpatialLPPooling ### + +```lua +module = nn.SpatialLPPooling(nInputPlane, pnorm, kW, kH, [dW], [dH]) +``` + +Computes the `p` norm in a convolutional manner on a set of 2D input planes. + +<a name="nn.SpatialMaxPooling"/> +### SpatialMaxPooling ### + +```lua +module = nn.SpatialMaxPooling(kW, kH [, dW, dH]) +``` + +Applies 2D max-pooling operation in `kWxkH` regions by step size +`dWxdH` steps. The number of output features is equal to the number of +input planes. + +<a name="nn.VolumetricMaxPooling"/> +### VolumetricMaxPooling ### + +```lua +module = nn.VolumetricMaxPooling(kT, kW, kH [, dT, dW, dH]) +``` + +Applies 3D max-pooling operation in `kTxkWxkH` regions by step size +`dTxdWxdH` steps. The number of output features is equal to the number of +input planes. + +<a name="nn.SpatialSubSampling"/> +### SpatialSubSampling ### + +```lua +module = nn.SpatialSubSampling(nInputPlane, kW, kH, [dW], [dH]) +``` + +Applies a 2D sub-sampling over an input image composed of several input planes. The `input` tensor in +`forward(input)` is expected to be a 3D tensor (`nInputPlane x height x width`). The number of output +planes will be the same as `nInputPlane`. + +The parameters are the following: + * `nInputPlane`: The number of expected input planes in the image given into `forward()`. + * `kW`: The kernel width of the sub-sampling + * `kH`: The kernel height of the sub-sampling + * `dW`: The step of the sub-sampling in the width dimension. Default is `1`. + * `dH`: The step of the sub-sampling in the height dimension. Default is `1`. + +Note that depending of the size of your kernel, several (of the last) +columns or rows of the input image might be lost. It is up to the user to +add proper padding in images. + +If the input image is a 3D tensor `nInputPlane x height x width`, the output image size +will be `nInputPlane x oheight x owidth` where +```lua +owidth = (width - kW) / dW + 1 +oheight = (height - kH) / dH + 1 . +``` + +The parameters of the sub-sampling can be found in `self.weight` (Tensor of +size `nInputPlane`) and `self.bias` (Tensor of size `nInputPlane`). The +corresponding gradients can be found in `self.gradWeight` and +`self.gradBias`. 
+ +The output value of the layer can be precisely described as: +```lua +output[i][j][k] = bias[k] + + weight[k] sum_{s=1}^kW sum_{t=1}^kH input[dW*(i-1)+s)][dH*(j-1)+t][k] +``` + +<a name="nn.SpatialZeroPadding"/> +### SpatialZeroPadding ### + +```lua +module = nn.SpatialZeroPadding(padLeft, padRight, padTop, padBottom) +``` + +Each feature map of a given input is padded with specified number of +zeros. If padding values are negative, then input is cropped. + +<a name="nn.SpatialSubtractiveNormalization"/> +### SpatialSubtractiveNormalization ### + +```lua +module = nn.SpatialSubtractiveNormalization(ninputplane, kernel) +``` + +Applies a spatial subtraction operation on a series of 2D inputs using +`kernel` for computing the weighted average in a neighborhood. The +neighborhood is defined for a local spatial region that is the size as +kernel and across all features. For a an input image, since there is +only one feature, the region is only spatial. For an RGB image, the +weighted anerage is taken over RGB channels and a spatial region. + +If the `kernel` is 1D, then it will be used for constructing and seperable +2D kernel. The operations will be much more efficient in this case. + +The kernel is generally chosen as a gaussian when it is believed that +the correlation of two pixel locations decrease with increasing +distance. On the feature dimension, a uniform average is used since +the weighting across features is not known. + +For this example we use an external package +[image](http://www.github.com/clementfarabet/lua---image/) + +```lua +require 'image' +require 'nn' +lena = image.rgb2y(image.lena()) +ker = torch.ones(11) +m=nn.SpatialSubtractiveNormalization(1,ker) +processed = m:forward(lena) +w1=image.display(lena) +w2=image.display(processed) +``` +![](lena.jpg)![](lenap.jpg) + +<a name="nn.TemporalConvolution"/> +### TemporalConvolution ### + +```lua +module = nn.TemporalConvolution(inputFrameSize, outputFrameSize, kW, [dW]) +``` + +Applies a 1D convolution over an input sequence composed of `nInputFrame` frames. The `input` tensor in +`forward(input)` is expected to be a 2D tensor (`nInputFrame x inputFrameSize`). + +The parameters are the following: + * `inputFrameSize`: The input frame size expected in sequences given into `forward()`. + * `outputFrameSize`: The output frame size the convolution layer will produce. + * `kW`: The kernel width of the convolution + * `dW`: The step of the convolution. Default is `1`. + +Note that depending of the size of your kernel, several (of the last) +frames of the sequence might be lost. It is up to the user to add proper padding frames in the input +sequences. + +If the input sequence is a 2D tensor `inputFrameSize x nInputFrame`, the output sequence will be +`nOutputFrame x outputFrameSize` where +```lua +nOutputFrame = (nInputFrame - kW) / dW + 1 +``` + +The parameters of the convolution can be found in `self.weight` (Tensor of +size `outputFrameSize x (inputFrameSize x kW) `) and `self.bias` (Tensor of +size `outputFrameSize`). The corresponding gradients can be found in +`self.gradWeight` and `self.gradBias`. 
+ +The output value of the layer can be precisely described as: +```lua +output[i][t] = bias[i] + + sum_j sum_{k=1}^kW weight[j][k][i] + * input[j][dW*(t-1)+k)] +``` + +Here is a simple example: + +```lua +inp=5; -- dimensionality of one sequence element +outp=1; -- number of derived features for one sequence element +kw=1; -- kernel only operates on one sequence element at once +dw=1; -- we step once and go on to the next sequence element + +mlp=nn.TemporalConvolution(inp,outp,kw,dw) + +x=torch.rand(7,inp) -- a sequence of 7 elements +print(mlp:forward(x)) +``` +which gives: +```lua +-0.9109 +-0.9872 +-0.6808 +-0.9403 +-0.9680 +-0.6901 +-0.6387 +[torch.Tensor of dimension 7x1] +``` + +This is equivalent to: +```lua +weights=torch.reshape(mlp.weight,inp) -- weights applied to all +bias= mlp.bias[1]; +for i=1,x:size(1) do -- for each sequence element + element= x[i]; -- features of ith sequence element + print(element:dot(weights) + bias) +end +``` +which gives: +```lua +-0.91094998687717 +-0.98721705771773 +-0.68075004276185 +-0.94030132495887 +-0.96798754116609 +-0.69008470895581 +-0.63871422284166 +``` + + +<a name="nn.TemporalSubSampling"/> +### TemporalSubSampling ### + +```lua +module = nn.TemporalSubSampling(inputFrameSize, kW, [dW]) +``` + +Applies a 1D sub-sampling over an input sequence composed of `nInputFrame` frames. The `input` tensor in +`forward(input)` is expected to be a 2D tensor (`nInputFrame x inputFrameSize`). The output frame size +will be the same as the input one (`inputFrameSize`). + +The parameters are the following: + * `inputFrameSize`: The input frame size expected in sequences given into `forward()`. + * `kW`: The kernel width of the sub-sampling + * `dW`: The step of the sub-sampling. Default is `1`. + +Note that depending of the size of your kernel, several (of the last) +frames of the sequence might be lost. It is up to the user to add proper padding frames in the input +sequences. + +If the input sequence is a 2D tensor `nInputFrame x inputFrameSize`, the output sequence will be +`inputFrameSize x nOutputFrame` where +```lua +nOutputFrame = (nInputFrame - kW) / dW + 1 +``` + +The parameters of the sub-sampling can be found in `self.weight` (Tensor of +size `inputFrameSize`) and `self.bias` (Tensor of +size `inputFrameSize`). The corresponding gradients can be found in +`self.gradWeight` and `self.gradBias`. + +The output value of the layer can be precisely described as: +```lua +output[i][t] = bias[i] + weight[i] * sum_{k=1}^kW input[i][dW*(t-1)+k)] +``` + +<a name="nn.LookupTable"/> +### LookupTable ### + +```lua +module = nn.LookupTable(nIndex, sizes) +``` +or +```lua +module = nn.LookupTable(nIndex, size1, [size2], [size3], ...) +``` + +This layer is a particular case of a convolution, where the width of the convolution would be `1`. +When calling `forward(input)`, it assumes `input` is a 1D tensor filled with indices. Indices start +at `1` and can go up to `nIndex`. For each index, it outputs a corresponding `Tensor` of size +specified by `sizes` (an `LongStorage`) or `size1 x size2 x...`. + +The output tensors are concatenated, generating a `size1 x size2 x ... x sizeN x n` tensor, where `n` +is the size of the `input` tensor. 
+ +When only `size1` is provided, this is equivalent to do the following matrix-matrix multiplication +in an efficient manner: +```lua +M P +``` +where `M` is a 2D matrix `size1 x nIndex` containing the parameters of the lookup-table and +`P` is a 2D matrix, where each column vector `i` is a zero vector except at index `input[i]` where it is `1`. + +Example: +```lua + -- a lookup table containing 10 tensors of size 3 + module = nn.LookupTable(10, 3) + + input = torch.Tensor(4) + input[1] = 1; input[2] = 2; input[3] = 1; input[4] = 10; + print(module:forward(input)) +``` + +Outputs something like: +```lua +-0.1784 2.2045 -0.1784 -0.2475 +-1.0120 0.0537 -1.0120 -0.2148 +-1.2840 0.8685 -1.2840 -0.2792 +[torch.Tensor of dimension 3x4] +``` +Note that the first column vector is the same than the 3rd one! + +<a name="nn.TableLayers"/> +## Layers for manipulating tables ## + +This set of modules allows the manipulation of Tables +through the layers of a neural network. +This allows one to build very rich architectures. + +Table-based modules work by supporting forward and backward methods that can accept +tables as inputs. It turns out that the usual [Sequential](#nn.Sequential) module can do this, so all that is needed is other child modules that take advantage of such tables. +```lua +mlp = nn.Sequential(); +t={x,y,z} +pred=mlp:forward(t) +pred=mlp:forward{x,y,z} -- This is equivalent to the line before +``` + +<a name="nn.ConcatTable"/> +### ConcatTable ### + +ConcatTable is a container module that applies each member module to +the same input Tensor. + +Example: +```lua +mlp= nn.ConcatTable() +mlp:add(nn.Linear(5,2)) +mlp:add(nn.Linear(5,3)) + +pred=mlp:forward(torch.randn(5)); +for i,k in pairs(pred) do print(i,k); end +``` +which gives the output: +```lua +1 +-0.4073 + 0.0110 +[torch.Tensor of dimension 2] + +2 + 0.0027 +-0.0598 +-0.1189 +[torch.Tensor of dimension 3] +``` + +<a name="nn.ParallelTable"/> +### ParallelTable ### + +ParallelTable is a container module that, in its `forward` method, applies the `ith` member module to the `ith` input, and outputs a table of the set of outputs. + +Example: +```lua +mlp= nn.ParallelTable() +mlp:add(nn.Linear(10,2)) +mlp:add(nn.Linear(5,3)) + +x=torch.randn(10) +y=torch.rand(5) + +pred=mlp:forward{x,y} +for i,k in pairs(pred) do print(i,k); end +``` +which gives the output: +```lua +1 + 0.0331 + 0.7003 +[torch.Tensor of dimension 2] + +2 + 0.0677 +-0.1657 +-0.7383 +[torch.Tensor of dimension 3] +``` + +<a name="nn.SplitTable"/> +### SplitTable ### + +`module` = `SplitTable(dimension)` + +Creates a module that takes a Tensor as input and outputs several tables, splitting the Tensor along dimension `dimension`. 
+ +Example 1: +```lua +mlp=nn.SplitTable(2) +x=torch.randn(4,3) +pred=mlp:forward(x) +for i,k in pairs(pred) do print(i,k); end +``` +gives the output: +```lua +1 + 1.3885 + 1.3295 + 0.4281 +-1.0171 +[torch.Tensor of dimension 4] + +2 +-1.1565 +-0.8556 +-1.0717 +-0.8316 +[torch.Tensor of dimension 4] + +3 +-1.3678 +-0.1709 +-0.0191 +-2.5871 +[torch.Tensor of dimension 4] +``` + +Example 2: +```lua +mlp=nn.SplitTable(1) +pred=mlp:forward(torch.randn(10,3)) +for i,k in pairs(pred) do print(i,k); end +``` +gives the output: +```lua +1 + 1.6114 + 0.9038 + 0.8419 +[torch.Tensor of dimension 3] + +2 + 2.4742 + 0.2208 + 1.6043 +[torch.Tensor of dimension 3] + +3 + 1.3415 + 0.2984 + 0.2260 +[torch.Tensor of dimension 3] + +4 + 2.0889 + 1.2309 + 0.0983 +[torch.Tensor of dimension 3] +``` + +A more complicated example: +```lua + +mlp=nn.Sequential(); --Create a network that takes a Tensor as input +mlp:add(nn.SplitTable(2)) + c=nn.ParallelTable() --The two Tensors go through two different Linear + c:add(nn.Linear(10,3)) --Layers in Parallel + c:add(nn.Linear(10,7)) +mlp:add(c) --Outputing a table with 2 elements + p=nn.ParallelTable() --These tables go through two more linear layers + p:add(nn.Linear(3,2)) -- separately. + p:add(nn.Linear(7,1)) +mlp:add(p) +mlp:add(nn.JoinTable(1)) --Finally, the tables are joined together and output. + +pred=mlp:forward(torch.randn(10,2)) +print(pred) + +for i=1,100 do -- A few steps of training such a network.. + x=torch.ones(10,2); + y=torch.Tensor(3); y:copy(x:select(2,1,1):narrow(1,1,3)) + pred=mlp:forward(x) + + criterion= nn.MSECriterion() + local err=criterion:forward(pred,y) + local gradCriterion = criterion:backward(pred,y); + mlp:zeroGradParameters(); + mlp:backward(x, gradCriterion); + mlp:updateParameters(0.05); + + print(err) +end +``` + +<a name="nn.JoinTable"/> +### JoinTable ### + +`module` = `JoinTable(dimension)` + +Creates a module that takes a list of Tensors as input and outputs a Tensor by joining them together along dimension `dimension`. + +Example: +```lua +x=torch.randn(5,1) +y=torch.randn(5,1) +z=torch.randn(2,1) + +print(nn.JoinTable(1):forward{x,y}) +print(nn.JoinTable(2):forward{x,y}) +print(nn.JoinTable(1):forward{x,z}) +``` +gives the output: +```lua +1.3965 + 0.5146 +-1.5244 +-0.9540 + 0.4256 + 0.1575 + 0.4491 + 0.6580 + 0.1784 +-1.7362 + + 1.3965 0.1575 + 0.5146 0.4491 +-1.5244 0.6580 +-0.9540 0.1784 + 0.4256 -1.7362 + + 1.3965 + 0.5146 +-1.5244 +-0.9540 + 0.4256 +-1.2660 + 1.0869 +[torch.Tensor of dimension 7x1] +``` + +A more complicated example: +```lua + +mlp=nn.Sequential(); --Create a network that takes a Tensor as input + c=nn.ConcatTable() --The same Tensor goes through two different Linear + c:add(nn.Linear(10,3)) --Layers in Parallel + c:add(nn.Linear(10,7)) +mlp:add(c) --Outputing a table with 2 elements + p=nn.ParallelTable() --These tables go through two more linear layers + p:add(nn.Linear(3,2)) -- separately. + p:add(nn.Linear(7,1)) +mlp:add(p) +mlp:add(nn.JoinTable(1)) --Finally, the tables are joined together and output. + +pred=mlp:forward(torch.randn(10)) +print(pred) + +for i=1,100 do -- A few steps of training such a network.. 
+ x=torch.ones(10); + y=torch.Tensor(3); y:copy(x:narrow(1,1,3)) + pred=mlp:forward(x) + + criterion= nn.MSECriterion() + local err=criterion:forward(pred,y) + local gradCriterion = criterion:backward(pred,y); + mlp:zeroGradParameters(); + mlp:backward(x, gradCriterion); + mlp:updateParameters(0.05); + + print(err) +end +``` + +<a name="nn.Identity"/> +### Identity ### + +`module` = `Identity()` + +Creates a module that returns whatever is input to it as output. +This is useful when combined with the module +[ParallelTable](#nn.ParallelTable) +in case you do not wish to do anything to one of the input Tensors. +Example: +```lua +mlp=nn.Identity() +print(mlp:forward(torch.ones(5,2))) +``` +gives the output: +```lua + 1 1 + 1 1 + 1 1 + 1 1 + 1 1 +[torch.Tensor of dimension 5x2] +``` + +Here is a more useful example, where one can implement a network which also computes a Criterion using this module: +```lua +pred_mlp=nn.Sequential(); -- A network that makes predictions given x. +pred_mlp:add(nn.Linear(5,4)) +pred_mlp:add(nn.Linear(4,3)) + +xy_mlp=nn.ParallelTable();-- A network for predictions and for keeping the +xy_mlp:add(pred_mlp) -- true label for comparison with a criterion +xy_mlp:add(nn.Identity()) -- by forwarding both x and y through the network. + +mlp=nn.Sequential(); -- The main network that takes both x and y. +mlp:add(xy_mlp) -- It feeds x and y to parallel networks; +cr=nn.MSECriterion(); +cr_wrap=nn.CriterionTable(cr) +mlp:add(cr_wrap) -- and then applies the criterion. + +for i=1,100 do -- Do a few training iterations + x=torch.ones(5); -- Make input features. + y=torch.Tensor(3); + y:copy(x:narrow(1,1,3)) -- Make output label. + err=mlp:forward{x,y} -- Forward both input and output. + print(err) -- Print error from criterion. + + mlp:zeroGradParameters(); -- Do backprop... + mlp:backward({x, y} ); + mlp:updateParameters(0.05); +end +``` + +<a name="nn.PairwiseDistance"/> +### PairwiseDistance ### + +`module` = `PairwiseDistance(p)` creates a module that takes a table of two vectors as input and outputs the distance between them using the `p`-norm. + +Example: +```lua +mlp_l1=nn.PairwiseDistance(1) +mlp_l2=nn.PairwiseDistance(2) +x=torch.Tensor(1,2,3) +y=torch.Tensor(4,5,6) +print(mlp_l1:forward({x,y})) +print(mlp_l2:forward({x,y})) +``` +gives the output: +```lua + 9 +[torch.Tensor of dimension 1] + + 5.1962 +[torch.Tensor of dimension 1] +``` + +A more complicated example: +```lua +-- imagine we have one network we are interested in, it is called "p1_mlp" +p1_mlp= nn.Sequential(); p1_mlp:add(nn.Linear(5,2)) + +-- But we want to push examples towards or away from each other +-- so we make another copy of it called p2_mlp +-- this *shares* the same weights via the set command, but has its own set of temporary gradient storage +-- that's why we create it again (so that the gradients of the pair don't wipe each other) +p2_mlp= nn.Sequential(); p2_mlp:add(nn.Linear(5,2)) +p2_mlp:get(1).weight:set(p1_mlp:get(1).weight) +p2_mlp:get(1).bias:set(p1_mlp:get(1).bias) + +-- we make a parallel table that takes a pair of examples as input. 
they both go through the same (cloned) mlp +prl = nn.ParallelTable() +prl:add(p1_mlp) +prl:add(p2_mlp) + +-- now we define our top level network that takes this parallel table and computes the pairwise distance betweem +-- the pair of outputs +mlp= nn.Sequential() +mlp:add(prl) +mlp:add(nn.PairwiseDistance(1)) + +-- and a criterion for pushing together or pulling apart pairs +crit=nn.HingeEmbeddingCriterion(1) + +-- lets make two example vectors +x=torch.rand(5) +y=torch.rand(5) + + +-- Use a typical generic gradient update function +function gradUpdate(mlp, x, y, criterion, learningRate) +local pred = mlp:forward(x) +local err = criterion:forward(pred, y) +local gradCriterion = criterion:backward(pred, y) +mlp:zeroGradParameters() +mlp:backward(x, gradCriterion) +mlp:updateParameters(learningRate) +end + +-- push the pair x and y together, notice how then the distance between them given +-- by print(mlp:forward({x,y})[1]) gets smaller +for i=1,10 do +gradUpdate(mlp,{x,y},1,crit,0.01) +print(mlp:forward({x,y})[1]) +end + + +-- pull apart the pair x and y, notice how then the distance between them given +-- by print(mlp:forward({x,y})[1]) gets larger + +for i=1,10 do +gradUpdate(mlp,{x,y},-1,crit,0.01) +print(mlp:forward({x,y})[1]) +end + +``` + +<a name="nn.DotProduct"/> +### DotProduct ### + +`module` = `DotProduct()` creates a module that takes a table of two vectors as input and outputs the dot product between them. + +Example: +```lua +mlp=nn.DotProduct() +x=torch.Tensor(1,2,3) +y=torch.Tensor(4,5,6) +print(mlp:forward({x,y})) +``` +gives the output: +```lua + 32 +[torch.Tensor of dimension 1] +``` + + +A more complicated example: +```lua + +-- Train a ranking function so that mlp:forward({x,y},{x,z}) returns a number +-- which indicates whether x is better matched with y or z (larger score = better match), or vice versa. + +mlp1=nn.Linear(5,10) +mlp2=mlp1:clone('weight','bias') + +prl=nn.ParallelTable(); +prl:add(mlp1); prl:add(mlp2) + +mlp1=nn.Sequential() +mlp1:add(prl) +mlp1:add(nn.DotProduct()) + +mlp2=mlp1:clone('weight','bias') + +mlp=nn.Sequential() +prla=nn.ParallelTable() +prla:add(mlp1) +prla:add(mlp2) +mlp:add(prla) + +x=torch.rand(5); +y=torch.rand(5) +z=torch.rand(5) + + +print(mlp1:forward{x,x}) +print(mlp1:forward{x,y}) +print(mlp1:forward{y,y}) + + +crit=nn.MarginRankingCriterion(1); + +-- Use a typical generic gradient update function +function gradUpdate(mlp, x, y, criterion, learningRate) + local pred = mlp:forward(x) + local err = criterion:forward(pred, y) + local gradCriterion = criterion:backward(pred, y) + mlp:zeroGradParameters() + mlp:backward(x, gradCriterion) + mlp:updateParameters(learningRate) +end + +inp={{x,y},{x,z}} + +math.randomseed(1) + +-- make the pair x and y have a larger dot product than x and z + +for i=1,100 do + gradUpdate(mlp,inp,1,crit,0.05) + o1=mlp1:forward{x,y}[1]; + o2=mlp2:forward{x,z}[1]; + o=crit:forward(mlp:forward{{x,y},{x,z}},1) + print(o1,o2,o) +end + +print "________________**" + +-- make the pair x and z have a larger dot product than x and y + +for i=1,100 do + gradUpdate(mlp,inp,-1,crit,0.05) + o1=mlp1:forward{x,y}[1]; + o2=mlp2:forward{x,z}[1]; + o=crit:forward(mlp:forward{{x,y},{x,z}},-1) + print(o1,o2,o) +end +``` + + +<a name="nn.CosineDistance"/> +### CosineDistance ### + +`module` = `CosineDistance()` creates a module that takes a table of two vectors as input and outputs the cosine distance between them. 
+ +Example: +```lua +mlp=nn.CosineDistance() +x=torch.Tensor(1,2,3) +y=torch.Tensor(4,5,6) +print(mlp:forward({x,y})) +``` +gives the output: +```lua + 0.9746 +[torch.Tensor of dimension 1] +``` + +A more complicated example: +```lua + +-- imagine we have one network we are interested in, it is called "p1_mlp" +p1_mlp= nn.Sequential(); p1_mlp:add(nn.Linear(5,2)) + +-- But we want to push examples towards or away from each other +-- so we make another copy of it called p2_mlp +-- this *shares* the same weights via the set command, but has its own set of temporary gradient storage +-- that's why we create it again (so that the gradients of the pair don't wipe each other) +p2_mlp= p1_mlp:clone('weight','bias') + +-- we make a parallel table that takes a pair of examples as input. they both go through the same (cloned) mlp +prl = nn.ParallelTable() +prl:add(p1_mlp) +prl:add(p2_mlp) + +-- now we define our top level network that takes this parallel table and computes the cosine distance betweem +-- the pair of outputs +mlp= nn.Sequential() +mlp:add(prl) +mlp:add(nn.CosineDistance()) + + +-- lets make two example vectors +x=torch.rand(5) +y=torch.rand(5) + +-- Grad update function.. +function gradUpdate(mlp, x, y, learningRate) +local pred = mlp:forward(x) +if pred[1]*y < 1 then + gradCriterion=torch.Tensor(-y) + mlp:zeroGradParameters() + mlp:backward(x, gradCriterion) + mlp:updateParameters(learningRate) +end +end + +-- push the pair x and y together, the distance should get larger.. +for i=1,1000 do + gradUpdate(mlp,{x,y},1,0.1) + if ((i%100)==0) then print(mlp:forward({x,y})[1]);end +end + + +-- pull apart the pair x and y, the distance should get smaller.. + +for i=1,1000 do + gradUpdate(mlp,{x,y},-1,0.1) + if ((i%100)==0) then print(mlp:forward({x,y})[1]);end +end +``` + + + +<a name="nn.CriterionTable"/> +### CriterionTable ### + +`module` = `CriterionTable(criterion)` + +Creates a module that wraps a Criterion module so that it can accept a Table of inputs. Typically the table would contain two elements: the input and output `x` and `y` that the Criterion compares. + +Example: +```lua +mlp = nn.CriterionTable(nn.MSECriterion()) +x=torch.randn(5) +y=torch.randn(5) +print(mlp:forward{x,x}) +print(mlp:forward{x,y}) +``` +gives the output: +```lua +0 +1.9028918413199 +``` + +Here is a more complex example of embedding the criterion into a network: +```lua + +function table.print(t) + for i,k in pairs(t) do print(i,k); end +end + +mlp=nn.Sequential(); -- Create an mlp that takes input + main_mlp=nn.Sequential(); -- and output using ParallelTable + main_mlp:add(nn.Linear(5,4)) + main_mlp:add(nn.Linear(4,3)) + cmlp=nn.ParallelTable(); + cmlp:add(main_mlp) + cmlp:add(nn.Identity()) +mlp:add(cmlp) +mlp:add(nn.CriterionTable(nn.MSECriterion())) -- Apply the Criterion + +for i=1,20 do -- Train for a few iterations + x=torch.ones(5); + y=torch.Tensor(3); y:copy(x:narrow(1,1,3)) + err=mlp:forward{x,y} -- Pass in both input and output + print(err) + + mlp:zeroGradParameters(); + mlp:backward({x, y} ); + mlp:updateParameters(0.05); +end +``` + +<a name="nn.CAddTable"/> +### CAddTable ### + +Takes a table of tensors and outputs summation of all tensors. 
+
+```lua
+ii = {torch.ones(5),torch.ones(5)*2,torch.ones(5)*3}
+=ii[1]
+ 1
+ 1
+ 1
+ 1
+ 1
+[torch.DoubleTensor of dimension 5]
+
+=ii[2]
+ 2
+ 2
+ 2
+ 2
+ 2
+[torch.DoubleTensor of dimension 5]
+
+=ii[3]
+ 3
+ 3
+ 3
+ 3
+ 3
+[torch.DoubleTensor of dimension 5]
+
+m=nn.CAddTable()
+=m:forward(ii)
+ 6
+ 6
+ 6
+ 6
+ 6
+[torch.DoubleTensor of dimension 5]
+```
+
+<a name="nn.CSubTable"/>
+### CSubTable ###
+
+Takes a table with two tensors and returns the component-wise
+subtraction between them.
+
+```lua
+m=nn.CSubTable()
+=m:forward({torch.ones(5)*2.2,torch.ones(5)})
+ 1.2000
+ 1.2000
+ 1.2000
+ 1.2000
+ 1.2000
+[torch.DoubleTensor of dimension 5]
+```
+
+<a name="nn.CMulTable"/>
+### CMulTable ###
+
+Takes a table of tensors and outputs the component-wise multiplication of all of them.
+
+```lua
+ii = {torch.ones(5)*2,torch.ones(5)*3,torch.ones(5)*4}
+m=nn.CMulTable()
+=m:forward(ii)
+ 24
+ 24
+ 24
+ 24
+ 24
+[torch.DoubleTensor of dimension 5]
+```
+
+<a name="nn.CDivTable"/>
+### CDivTable ###
+
+Takes a table with two tensors and returns the component-wise
+division between them.
+
+```lua
+m=nn.CDivTable()
+=m:forward({torch.ones(5)*2.2,torch.ones(5)*4.4})
+ 0.5000
+ 0.5000
+ 0.5000
+ 0.5000
+ 0.5000
+[torch.DoubleTensor of dimension 5]
+```
+
+<a name="nn.Criterions"/>
+# Criterions #
+
+Criterions are helpful to train a neural network. Given an input and a
+target, they compute a gradient according to a given loss
+function. [AbsCriterion](#nn.AbsCriterion) and
+[MSECriterion](#nn.MSECriterion) are perfect for regression problems, while
+[ClassNLLCriterion](#nn.ClassNLLCriterion) is the criterion of choice when
+dealing with classification.
+
+Criterions are [serializable](..:torch:file#torch.file.serialization).
+
+<a name="nn.Criterion"/>
+## Criterion ##
+
+This is an abstract class which declares methods defined in all criterions.
+This class is [serializable](..:torch:file#torch.file.serialization).
+
+<a name="nn.Criterion.forward"/>
+### [output] forward(input, target) ###
+
+Given an `input` and a `target`, compute the loss function associated with the criterion and return the
+result. In general `input` and `target` are [tensors](..:torch:tensor), but some specific criterions
+might require some other type of object.
+
+The `output` returned should in general be a scalar.
+
+The state variable [self.output](#nn.Criterion.output) should be updated after a call to `forward()`.
+
+<a name="nn.Criterion.backward"/>
+### [gradInput] backward(input, target) ###
+
+Given an `input` and a `target`, compute the gradients of the loss function associated with the criterion and
+return the result. In general `input`, `target` and `gradInput` are [tensors](..:torch:tensor), but some specific criterions
+might require some other type of object.
+
+The state variable [self.gradInput](#nn.Criterion.gradInput) should be updated after a call to `backward()`.
+
+<a name="nn.Criterion.output"/>
+### State variable: output ###
+
+State variable which contains the result of the last [forward(input, target)](#nn.Criterion.forward) call.
+
+<a name="nn.Criterion.gradInput"/>
+### State variable: gradInput ###
+
+State variable which contains the result of the last [backward(input, target)](#nn.Criterion.backward) call.
+
+<a name="nn.AbsCriterion"/>
+## AbsCriterion ##
+
+```lua
+criterion = AbsCriterion()
+```
+
+Creates a criterion that
+measures the mean absolute error between the `n` elements of the input `x`
+and the target `y`:
+
+`loss(x,y)` = `1/n \sum |x_i-y_i|`. 
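+
+As a quick sanity check of this formula, here is a minimal sketch (the input and target
+values below are assumed purely for illustration):
+```lua
+criterion = nn.AbsCriterion()
+x = torch.Tensor{1, 2, 3}        -- input
+y = torch.Tensor{1, 1, 5}        -- target
+print(criterion:forward(x, y))   -- (|1-1| + |2-1| + |3-5|) / 3 = 1
+```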
+
+If `x` and `y` are `d`-dimensional Tensors with a total of `n` elements,
+the sum operation still operates over all the elements, and divides by `n`.
+
+The division by `n` can be avoided if one sets the internal variable `sizeAverage` to `false`:
+```lua
+criterion = nn.AbsCriterion()
+criterion.sizeAverage = false
+```
+
+<a name="nn.ClassNLLCriterion"/>
+## ClassNLLCriterion ##
+
+```lua
+criterion = ClassNLLCriterion()
+```
+
+The negative log likelihood criterion. It is useful to train a classification
+problem with `n` classes. The `input` given through a `forward()` is
+expected to contain _log-probabilities_ of each class: `input` has to be a
+1D tensor of size `n`. Obtaining log-probabilities in a neural network is
+easily achieved by adding a [LogSoftMax](#nn.LogSoftMax) layer as the last
+layer of your neural network.
+
+This criterion expects a class index (1 to the number of classes) as `target`
+when calling [forward(input, target)](#nn.Criterion.forward) and
+[backward(input, target)](#nn.Criterion.backward).
+
+The loss can be described as:
+```lua
+loss(x, class) = forward(x, class) = -x[class]
+```
+
+The following is a code fragment showing how to make a gradient step
+given an input `x`, a desired output `y` (an integer `1` to `n`,
+in this case `n` = `2` classes),
+a network `mlp` and a learning rate `learningRate`:
+```lua
+function gradUpdate(mlp, x, y, learningRate)
+   local criterion = nn.ClassNLLCriterion()
+   local pred = mlp:forward(x)
+   local err = criterion:forward(pred, y)
+   mlp:zeroGradParameters()
+   local t = criterion:backward(pred, y)
+   mlp:backward(x, t)
+   mlp:updateParameters(learningRate)
+end
+```
+
+<a name="nn.MarginCriterion"/>
+## MarginCriterion ##
+
+```lua
+criterion = MarginCriterion()
+```
+
+Creates a criterion that optimizes a two-class classification hinge loss (margin-based loss) between input `x` (a Tensor of dimension 1) and output `y` (which is a scalar, either 1 or -1):
+
+```lua
+loss(x,y) = forward(x,y) = max(0, m - y*x)
+```
+
+`m` is the margin, which is 1 by default.
+
+```lua
+criterion = MarginCriterion(marginValue)
+```
+
+sets a different value of `m`.
+
+Example:
+```lua
+require "nn"
+
+function gradUpdate(mlp, x, y, criterion, learningRate)
+   local pred = mlp:forward(x)
+   local err = criterion:forward(pred, y)
+   local gradCriterion = criterion:backward(pred, y)
+   mlp:zeroGradParameters()
+   mlp:backward(x, gradCriterion)
+   mlp:updateParameters(learningRate)
+end
+
+mlp=nn.Sequential()
+mlp:add(nn.Linear(5,1))
+
+x1=torch.rand(5)
+x2=torch.rand(5)
+criterion=nn.MarginCriterion(1)
+
+for i=1,1000 do
+   gradUpdate(mlp,x1,1,criterion,0.01)
+   gradUpdate(mlp,x2,-1,criterion,0.01)
+end
+
+print(mlp:forward(x1))
+print(mlp:forward(x2))
+
+print(criterion:forward(mlp:forward(x1),1))
+print(criterion:forward(mlp:forward(x2),-1))
+```
+gives the output:
+```lua
+ 1.0043
+[torch.Tensor of dimension 1]
+
+
+-1.0061
+[torch.Tensor of dimension 1]
+
+0
+0
+```
+i.e. the mlp successfully separates the two data points so that they both have a margin of at least 1, and hence a loss of 0. 
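+
+The printed loss values follow directly from the formula above. As a small sketch (the
+output value `0.6` below is assumed purely for illustration):
+```lua
+criterion = nn.MarginCriterion(1)
+x = torch.Tensor{0.6}            -- a network output
+print(criterion:forward(x,  1))  -- max(0, 1 - 1*0.6) = 0.4
+print(criterion:forward(x, -1))  -- max(0, 1 + 1*0.6) = 1.6
+```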
+
+<a name="nn.MultiMarginCriterion"/>
+## MultiMarginCriterion ##
+
+```lua
+criterion = MultiMarginCriterion()
+```
+
+Creates a criterion that optimizes a multi-class classification hinge loss (margin-based loss) between input `x` (a Tensor of dimension 1) and output `y` (which is a target class index, 1 <= y <= x:size(1)):
+
+```lua
+loss(x,y) = forward(x,y) = sum_i(max(0, 1 - (x[y] - x[i]))) / x:size(1)
+```
+where `i` ranges from 1 to `x:size(1)`, with `i ~= y`.
+
+<a name="nn.MSECriterion"/>
+## MSECriterion ##
+
+```lua
+criterion = MSECriterion()
+```
+
+Creates a criterion that measures the mean squared error between the `n` elements of the input `x`
+and the target `y`:
+
+```lua
+loss(x,y) = forward(x,y) = 1/n \sum |x_i-y_i|^2
+```
+
+If `x` and `y` are `d`-dimensional Tensors with a total of `n` elements,
+the sum operation still operates over all the elements, and divides by `n`. The two tensors must
+have the same number of elements (but their sizes might be different).
+
+The division by `n` can be avoided if one sets the internal variable `sizeAverage` to `false`:
+```lua
+criterion = nn.MSECriterion()
+criterion.sizeAverage = false
+```
+
+<a name="nn.MultiCriterion"/>
+## MultiCriterion ##
+
+```lua
+criterion = MultiCriterion()
+```
+
+This returns a Criterion which is a weighted sum of other Criterions.
+Criterions are added using the method:
+
+`criterion:add(singleCriterion, weight)`
+
+where `weight` is a scalar.
+
+<a name="nn.HingeEmbeddingCriterion"/>
+## HingeEmbeddingCriterion ##
+
+```lua
+criterion = HingeEmbeddingCriterion()
+```
+
+Creates a criterion that measures the loss given an input
+`x` which is a 1-dimensional vector and a label `y` (1 or -1).
+This is usually used for measuring whether two inputs are similar
+or dissimilar, e.g. using the L1 pairwise distance, and is
+typically applied to learning nonlinear embeddings or semi-supervised learning.
+
+```lua
+loss(x,y) = forward(x,y) = x,                  if y =  1
+                         = max(0, margin - x), if y = -1
+```
+
+The `margin` has a default value of 1, or can be set in the constructor:
+```lua
+criterion = HingeEmbeddingCriterion(marginValue)
+```
+
+Example use:
+```lua
+-- imagine we have one network we are interested in, it is called "p1_mlp"
+p1_mlp= nn.Sequential(); p1_mlp:add(nn.Linear(5,2))
+
+-- But we want to push examples towards or away from each other
+-- so we make another copy of it called p2_mlp
+-- this *shares* the same weights via the set command, but has its own set of temporary gradient storage
+-- that's why we create it again (so that the gradients of the pair don't wipe each other)
+p2_mlp= nn.Sequential(); p2_mlp:add(nn.Linear(5,2))
+p2_mlp:get(1).weight:set(p1_mlp:get(1).weight)
+p2_mlp:get(1).bias:set(p1_mlp:get(1).bias)
+
+-- we make a parallel table that takes a pair of examples as input. 
they both go through the same (cloned) mlp
+prl = nn.ParallelTable()
+prl:add(p1_mlp)
+prl:add(p2_mlp)
+
+-- now we define our top level network that takes this parallel table and computes the pairwise distance between
+-- the pair of outputs
+mlp= nn.Sequential()
+mlp:add(prl)
+mlp:add(nn.PairwiseDistance(1))
+
+-- and a criterion for pushing together or pulling apart pairs
+crit=nn.HingeEmbeddingCriterion(1)
+
+-- let's make two example vectors
+x=torch.rand(5)
+y=torch.rand(5)
+
+-- Use a typical generic gradient update function
+function gradUpdate(mlp, x, y, criterion, learningRate)
+   local pred = mlp:forward(x)
+   local err = criterion:forward(pred, y)
+   local gradCriterion = criterion:backward(pred, y)
+   mlp:zeroGradParameters()
+   mlp:backward(x, gradCriterion)
+   mlp:updateParameters(learningRate)
+end
+
+-- push the pair x and y together; notice how the distance between them,
+-- given by print(mlp:forward({x,y})[1]), gets smaller
+for i=1,10 do
+   gradUpdate(mlp,{x,y},1,crit,0.01)
+   print(mlp:forward({x,y})[1])
+end
+
+-- pull apart the pair x and y; notice how the distance between them,
+-- given by print(mlp:forward({x,y})[1]), gets larger
+for i=1,10 do
+   gradUpdate(mlp,{x,y},-1,crit,0.01)
+   print(mlp:forward({x,y})[1])
+end
+```
+
+<a name="nn.L1HingeEmbeddingCriterion"/>
+## L1HingeEmbeddingCriterion ##
+
+```lua
+criterion = L1HingeEmbeddingCriterion(margin)
+```
+
+Creates a criterion that measures the loss given an input
+`x` = `{x1,x2}`, a table of two tensors, and a label `y` (1 or -1).
+This is used for measuring whether two inputs are similar
+or dissimilar, using the L1 distance, and is typically used for
+learning nonlinear embeddings or semi-supervised learning.
+
+```lua
+loss(x,y) = forward(x,y) = ||x1-x2||_1,                  if y =  1
+                         = max(0, margin - ||x1-x2||_1), if y = -1
+```
+
+The `margin` has a default value of 1, or can be set in the constructor:
+```lua
+criterion = L1HingeEmbeddingCriterion(marginValue)
+```
+
+<a name="nn.CosineEmbeddingCriterion"/>
+## CosineEmbeddingCriterion ##
+
+```lua
+criterion = nn.CosineEmbeddingCriterion(margin)
+```
+
+Creates a criterion that measures the loss given an input
+`x` = `{x1,x2}`, a table of two tensors, and a label `y` (1 or -1).
+This is used for measuring whether two inputs are similar
+or dissimilar, using the cosine distance, and is typically used for
+learning nonlinear embeddings or semi-supervised learning.
+
+`margin` should be a number from -1 to 1; 0 to 0.5 is suggested.
+Forward and Backward have to be used alternately. If `margin` is missing, the default value is 0.
+
+The loss function is:
+```lua
+loss(x,y) = forward(x,y) = 1 - cos(x1, x2),              if y =  1
+                         = max(0, cos(x1, x2) - margin), if y = -1
+```
+
+<a name="nn.MarginRankingCriterion"/>
+## MarginRankingCriterion ##
+
+```lua
+criterion = nn.MarginRankingCriterion(margin)
+```
+
+Creates a criterion that measures the loss given an input
+`x` = `{x1,x2}`, a table of two Tensors of size 1 (they contain only scalars),
+and a label `y` (1 or -1).
+
+If `y` = `1` then it is assumed that the first input should be ranked higher (have a larger value)
+than the second input, and vice versa for `y` = `-1`. 
+
+The loss function is:
+```lua
+loss(x,y) = forward(x,y) = max(0, -y*(x[1] - x[2]) + margin)
+```
+
+Example:
+```lua
+
+p1_mlp= nn.Linear(5,2)
+p2_mlp= p1_mlp:clone('weight','bias')
+
+prl=nn.ParallelTable()
+prl:add(p1_mlp)
+prl:add(p2_mlp)
+
+mlp1=nn.Sequential()
+mlp1:add(prl)
+mlp1:add(nn.DotProduct())
+
+mlp2=mlp1:clone('weight','bias')
+
+mlpa=nn.Sequential()
+prla=nn.ParallelTable()
+prla:add(mlp1)
+prla:add(mlp2)
+mlpa:add(prla)
+
+crit=nn.MarginRankingCriterion(0.1)
+
+x=torch.randn(5)
+y=torch.randn(5)
+z=torch.randn(5)
+
+-- Use a typical generic gradient update function
+function gradUpdate(mlp, x, y, criterion, learningRate)
+   local pred = mlp:forward(x)
+   local err = criterion:forward(pred, y)
+   local gradCriterion = criterion:backward(pred, y)
+   mlp:zeroGradParameters()
+   mlp:backward(x, gradCriterion)
+   mlp:updateParameters(learningRate)
+end
+
+for i=1,100 do
+   gradUpdate(mlpa,{{x,y},{x,z}},1,crit,0.01)
+   if true then
+      o1=mlp1:forward{x,y}[1];
+      o2=mlp2:forward{x,z}[1];
+      o=crit:forward(mlpa:forward{{x,y},{x,z}},1)
+      print(o1,o2,o)
+   end
+end
+
+print "--"
+
+for i=1,100 do
+   gradUpdate(mlpa,{{x,y},{x,z}},-1,crit,0.01)
+   if true then
+      o1=mlp1:forward{x,y}[1];
+      o2=mlp2:forward{x,z}[1];
+      o=crit:forward(mlpa:forward{{x,y},{x,z}},-1)
+      print(o1,o2,o)
+   end
+end
+```
+
+<a name="nn.traningneuralnet.dok"/>
+# Training a neural network #
+
+Training a neural network is easy with a [simple `for` loop](#nn.DoItYourself).
+While doing your own loop provides great flexibility, you might
+sometimes want a quick way of training neural
+networks. [StochasticGradient](#nn.StochasticGradient), a simple class
+which does the job for you, is provided as standard.
+
+<a name="nn.StochasticGradient.dok"/>
+## StochasticGradient ##
+
+`StochasticGradient` is a high-level class for training [neural networks](#nn.Module), using a stochastic gradient
+algorithm. This class is [serializable](..:torch:file#torch.file.serialization).
+
+<a name="nn.StochasticGradient"/>
+### StochasticGradient(module, criterion) ###
+
+Creates a `StochasticGradient` instance, using the given [Module](#nn.Module) and [Criterion](#nn.Criterion).
+The class contains [several parameters](#nn.StochasticGradientParameters) you might want to set after initialization.
+
+<a name="nn.StochasticGradientTrain"/>
+### train(dataset) ###
+
+Train the module and criterion given in the
+[constructor](#nn.StochasticGradient) over `dataset`, using the
+internal [parameters](#nn.StochasticGradientParameters).
+
+StochasticGradient expects as a `dataset` an object which implements the operator
+`dataset[index]` and implements the method `dataset:size()`. The `size()` method
+returns the number of examples and `dataset[i]` has to return the i-th example.
+
+An `example` has to be an object which implements the operator
+`example[field]`, where `field` might take the value `1` (input features)
+or `2` (corresponding label which will be given to the criterion).
+The input is usually a Tensor (except if you use special kinds of modules,
+like [table layers](#nn.TableLayers)). The label type depends on the criterion.
+For example, the [MSECriterion](#nn.MSECriterion) expects a Tensor, but the
+[ClassNLLCriterion](#nn.ClassNLLCriterion) expects an integer number (the class).
+
+Such a dataset is easily constructed by using Lua tables, but it could be any `C` object
+for example, as long as the required operators/methods are implemented.
+[See an example](#nn.DoItStochasticGradient). 
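+
+For instance, here is a minimal sketch of such a dataset built from a Lua table (the toy
+data below is assumed purely for illustration):
+```lua
+dataset = {}
+function dataset:size() return 10 end   -- number of examples
+for i = 1, dataset:size() do
+   local input = torch.randn(2)         -- example[1]: the input features
+   local target = torch.Tensor(1)       -- example[2]: the label given to the criterion
+   if input[1] * input[2] > 0 then target[1] = -1 else target[1] = 1 end
+   dataset[i] = {input, target}
+end
+```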
+ +<a name="nn.StochasticGradientParameters"/> +### Parameters ### + +`StochasticGradient` has several field which have an impact on a call to [train()](#nn.StochasticGradientTrain). + + * `learningRate`: This is the learning rate used during training. The update of the parameters will be `parameters = parameters - learningRate * parameters_gradient`. Default value is `0.01`. + * `learningRateDecay`: The learning rate decay. If non-zero, the learning rate (note: the field learningRate will not change value) will be computed after each iteration (pass over the dataset) with: `current_learning_rate =learningRate / (1 + iteration * learningRateDecay)` + * `maxIteration`: The maximum number of iteration (passes over the dataset). Default is `25`. + * `shuffleIndices`: Boolean which says if the examples will be randomly sampled or not. Default is `true`. If `false`, the examples will be taken in the order of the dataset. + * `hookExample`: A possible hook function which will be called (if non-nil) during training after each example forwarded and backwarded through the network. The function takes `(self, example)` as parameters. Default is `nil`. + * `hookIteration`: A possible hook function which will be called (if non-nil) during training after a complete pass over the dataset. The function takes `(self, iteration)` as parameters. Default is `nil`. + +<a name="nn.DoItStochasticGradient"/> +## Example of training using StochasticGradient ## + +We show an example here on a classical XOR problem. + +__Dataset__ + +We first need to create a dataset, following the conventions described in +[StochasticGradient](#nn.StochasticGradientTrain). +```lua +dataset={}; +function dataset:size() return 100 end -- 100 examples +for i=1,dataset:size() do + local input = torch.randn(2); -- normally distributed example in 2d + local output = torch.Tensor(1); + if input[1]*input[2]>0 then -- calculate label for XOR function + output[1] = -1; + else + output[1] = 1 + end + dataset[i] = {input, output} +end +``` + +__Neural Network__ + +We create a simple neural network with one hidden layer. +```lua +require "nn" +mlp = nn.Sequential(); -- make a multi-layer perceptron +inputs = 2; outputs = 1; HUs = 20; -- parameters +mlp:add(nn.Linear(inputs, HUs)) +mlp:add(nn.Tanh()) +mlp:add(nn.Linear(HUs, outputs)) +``` + +__Training__ + +We choose the Mean Squared Error criterion and train the beast. +```lua +criterion = nn.MSECriterion() +trainer = nn.StochasticGradient(mlp, criterion) +trainer.learningRate = 0.01 +trainer:train(dataset) +``` + +__Test the network__ + +```lua +x = torch.Tensor(2) +x[1] = 0.5; x[2] = 0.5; print(mlp:forward(x)) +x[1] = 0.5; x[2] = -0.5; print(mlp:forward(x)) +x[1] = -0.5; x[2] = 0.5; print(mlp:forward(x)) +x[1] = -0.5; x[2] = -0.5; print(mlp:forward(x)) +``` + +You should see something like: +```lua +> x = torch.Tensor(2) +> x[1] = 0.5; x[2] = 0.5; print(mlp:forward(x)) + +-0.3490 +[torch.Tensor of dimension 1] + +> x[1] = 0.5; x[2] = -0.5; print(mlp:forward(x)) + + 1.0561 +[torch.Tensor of dimension 1] + +> x[1] = -0.5; x[2] = 0.5; print(mlp:forward(x)) + + 0.8640 +[torch.Tensor of dimension 1] + +> x[1] = -0.5; x[2] = -0.5; print(mlp:forward(x)) + +-0.2941 +[torch.Tensor of dimension 1] +``` + +<a name="nn.DoItYourself"/> +## Example of manual training of a neural network ## + +We show an example here on a classical XOR problem. + +__Neural Network__ + +We create a simple neural network with one hidden layer. 
+```lua +require "nn" +mlp = nn.Sequential(); -- make a multi-layer perceptron +inputs = 2; outputs = 1; HUs = 20; -- parameters +mlp:add(nn.Linear(inputs, HUs)) +mlp:add(nn.Tanh()) +mlp:add(nn.Linear(HUs, outputs)) +``` + +__Loss function__ + +We choose the Mean Squared Error criterion. +```lua +criterion = nn.MSECriterion() +``` + +__Training__ + +We create data _on the fly_ and feed it to the neural network. + +```lua +for i = 1,2500 do + -- random sample + local input= torch.randn(2); -- normally distributed example in 2d + local output= torch.Tensor(1); + if input[1]*input[2] > 0 then -- calculate label for XOR function + output[1] = -1 + else + output[1] = 1 + end + + -- feed it to the neural network and the criterion + criterion:forward(mlp:forward(input), output) + + -- train over this example in 3 steps + -- (1) zero the accumulation of the gradients + mlp:zeroGradParameters() + -- (2) accumulate gradients + mlp:backward(input, criterion:backward(mlp.output, output)) + -- (3) update parameters with a 0.01 learning rate + mlp:updateParameters(0.01) +end +``` + +__Test the network__ + +```lua +x = torch.Tensor(2) +x[1] = 0.5; x[2] = 0.5; print(mlp:forward(x)) +x[1] = 0.5; x[2] = -0.5; print(mlp:forward(x)) +x[1] = -0.5; x[2] = 0.5; print(mlp:forward(x)) +x[1] = -0.5; x[2] = -0.5; print(mlp:forward(x)) +``` + +You should see something like: +```lua +> x = torch.Tensor(2) +> x[1] = 0.5; x[2] = 0.5; print(mlp:forward(x)) + +-0.6140 +[torch.Tensor of dimension 1] + +> x[1] = 0.5; x[2] = -0.5; print(mlp:forward(x)) + + 0.8878 +[torch.Tensor of dimension 1] + +> x[1] = -0.5; x[2] = 0.5; print(mlp:forward(x)) + + 0.8548 +[torch.Tensor of dimension 1] + +> x[1] = -0.5; x[2] = -0.5; print(mlp:forward(x)) + +-0.5498 +[torch.Tensor of dimension 1] +``` + |