| author | nicholas-leonard <nick@nikopia.org> | 2014-05-11 21:47:37 +0400 |
|---|---|---|
| committer | nicholas-leonard <nick@nikopia.org> | 2014-05-11 21:47:37 +0400 |
| commit | 151b75e717037458072e88d7d66a6ac51a6893c0 (patch) | |
| tree | 1ad8f515ec5e7a3c5de60a108f1818b77e809f5f /README.md | |
| parent | d389ba76e9676ee2d00ed87dd1934640f9266c63 (diff) | |
initial Commit for refactoring doc
Diffstat (limited to 'README.md')
-rw-r--r-- | README.md | 3013 |
1 file changed, 1 insertion, 3012 deletions
@@ -13,7 +13,7 @@ are several sub-classes of `Module` available: container classes like
functions like [Tanh](#nn.Tanh). Loss functions are implemented as
sub-classes of
-[Criterion](#nn.Criterions). They are helpful to train a neural network on
+[Criterion](doc/criterion#nn.Criterions). They are helpful to train a neural network on
classical tasks. Common criterions are the Mean Squared Error
criterion implemented in
[MSECriterion](#nn.MSECriterion) and the cross-entropy criterion
implemented in
@@ -144,3020 +144,9 @@ For example, if you wish to use your own criterion you can simply
replace `gradCriterion` with the gradient vector of your criterion of choice.
-<a name="nn.Modules"/>
-# Modules #
-Modules are bricks to build neural networks. A [Module](#nn.Module) is a neural network
-by itself, but it can be combined with other networks using [container classes](#nn.Containers) to create
-complex neural networks.
-<a name="nn.Module"/>
-## Module ##
-`Module` is an abstract class which defines the fundamental methods necessary
-for training a neural network. Modules are [serializable](..:torch:file#torch.file.serialization).
-Modules contain two state variables: [output](#nn.ModuleOutput) and
-[gradInput](#nn.ModuleGradInput).
-<a name="nn.Module.forward"/>
-### [output] forward(input) ###
-
-Takes an `input` object, and computes the corresponding `output` of the
-module. In general `input` and `output` are
-[Tensors](..:torch:tensor). However, some special sub-classes
-like [table layers](#nn.TableLayers) might expect something else. Please
-refer to each module specification for further information.
-
-After a `forward()`, the [output](#nn.ModuleOutput) state variable should
-have been updated to the new value.
-
-It is not advised to override this function. Instead, one should
-implement the [updateOutput(input)](#nn.Module.updateOutput)
-function. The `forward` method in the abstract parent class
-[Module](#nn.Module) will call `updateOutput(input)`.
-
-<a name="nn.Module.backward"/>
-### [gradInput] backward(input, gradOutput) ###
-
-Performs a _backpropagation step_ through the module, with respect to the
-given `input`. In general this method makes the assumption that
-[forward(input)](#nn.Module.forward) has been called before, _with the same input_.
-This is necessary for optimization reasons. If you do not respect
-this rule, `backward()` will compute incorrect gradients.
-
-In general `input`, `gradOutput` and `gradInput` are
-[Tensors](..:torch:tensor). However, some special sub-classes
-like [table layers](#nn.TableLayers) might expect something else. Please
-refer to each module specification for further information.
-
-A _backpropagation step_ consists of computing two kinds of gradients
-at `input` given `gradOutput` (gradients with respect to the
-output of the module). This function simply performs this task using
-two function calls:
-
- - A function call to [updateGradInput(input, gradOutput)](#nn.Module.updateGradInput).
- - A function call to [accGradParameters(input, gradOutput)](#nn.Module.accGradParameters).
-
-It is not advised to override this function in custom classes. It
-is better to override the
-[updateGradInput(input, gradOutput)](#nn.Module.updateGradInput) and
-[accGradParameters(input, gradOutput)](#nn.Module.accGradParameters)
-functions.
-
-<a name="nn.Module.updateOutput"/>
-### updateOutput(input) ###
-
-Computes the output using the current parameter set of the class and
-input.
This function returns the result which is stored in the
-[output](#nn.Module.output) field.
-
-<a name="nn.Module.updateGradInput"/>
-### updateGradInput(input, gradOutput) ###
-
-Computes the gradient of the module with respect to its own
-input. This is returned in `gradInput`. Also, the
-[gradInput](#nn.Module.gradInput) state variable is updated
-accordingly.
-
-<a name="nn.Module.accGradParameters"/>
-### accGradParameters(input, gradOutput) ###
-
-Computes the gradient of the module with respect to its
-own parameters. Many modules do not perform this step as they do not
-have any parameters. The state variable name for the parameters is
-module dependent. The module is expected to _accumulate_ the
-gradients with respect to the parameters in some variable.
-
-Zeroing this accumulation is achieved with
-[zeroGradParameters()](#nn.Module.zeroGradParameters) and updating
-the parameters according to this accumulation is done with
-[updateParameters()](#nn.Module.updateParameters).
-
-<a name="nn.Module.zeroGradParameters"/>
-### zeroGradParameters() ###
-
-If the module has parameters, this will zero the accumulation of the
-gradients with respect to these parameters, accumulated through
-[accGradParameters(input, gradOutput)](#nn.Module.accGradParameters)
-calls. Otherwise, it does nothing.
-
-<a name="nn.Module.updateParameters"/>
-### updateParameters(learningRate) ###
-
-If the module has parameters, this will update these parameters, according
-to the accumulation of the gradients with respect to these parameters,
-accumulated through [backward()](#nn.Module.backward) calls.
-
-The update is basically:
-```lua
-parameters = parameters - learningRate * gradients_wrt_parameters
-```
-If the module does not have parameters, it does nothing.
-
-<a name="nn.Module.accUpdateGradParameters"/>
-### accUpdateGradParameters(input, gradOutput, learningRate) ###
-
-This is a convenience function that performs two operations at
-once. It calculates and accumulates the gradients with respect to the
-weights after multiplying by the negative of the learning rate
-`learningRate`. Performing these two operations at once is more
-efficient and it might be advantageous in certain
-situations.
-
-Keep in mind that this function uses a simple trick to achieve its
-goal and it might not be valid for a custom module.
-
-Also note that, compared to `accGradParameters()`, the gradients are not retained
-for future use.
-
-```lua
-function Module:accUpdateGradParameters(input, gradOutput, lr)
-   local gradWeight = self.gradWeight
-   local gradBias = self.gradBias
-   self.gradWeight = self.weight
-   self.gradBias = self.bias
-   self:accGradParameters(input, gradOutput, -lr)
-   self.gradWeight = gradWeight
-   self.gradBias = gradBias
-end
-```
-
-As can be seen, the gradients are accumulated directly into the
-weights. This assumption may not be true for a module that computes a
-nonlinear operation.
-
-<a name="nn.Module.share"/>
-### share(mlp,s1,s2,...,sn) ###
-
-This function modifies the parameters named
-`s1`,..,`sn` of the module (if they exist) so that they are shared with (point
-to) the parameters with the same names in the given module `mlp`.
-
-The parameters have to be Tensors. This function is typically used if
-you want to have modules that share the same weights or biases.
-
-Note that this function, if called on a [Container](#nn.Containers)
-module, will share the same parameters for all the contained modules as
-well.
- -Example: -```lua - --- make an mlp -mlp1=nn.Sequential(); -mlp1:add(nn.Linear(100,10)); - --- make a second mlp -mlp2=nn.Sequential(); -mlp2:add(nn.Linear(100,10)); - --- the second mlp shares the bias of the first -mlp2:share(mlp1,'bias'); - --- we change the bias of the first -mlp1:get(1).bias[1]=99; - --- and see that the second one's bias has also changed.. -print(mlp2:get(1).bias[1]) - -``` - - -<a name="nn.Module.clone"/> -### clone(mlp,...) ### - -Creates a deep copy of (i.e. not just a pointer to) the module, -including the current state of its parameters (e.g. weight, biases -etc., if any). - -If arguments are provided to the `clone(...)` function it also calls -[share(...)](#nn.Module.share) with those arguments on the cloned -module after creating it, hence making a deep copy of this module with -some shared parameters. - -Example: -```lua --- make an mlp -mlp1=nn.Sequential(); -mlp1:add(nn.Linear(100,10)); - --- make a copy that shares the weights and biases -mlp2=mlp1:clone('weight','bias'); - --- we change the bias of the first mlp -mlp1:get(1).bias[1]=99; - --- and see that the second one's bias has also changed.. -print(mlp2:get(1).bias[1]) - -``` - -<a name="nn.Module.type"/> -### type(type) ### - -This function converts all the parameters of a module to the given -`type`. The `type` can be one of the types defined for -[torch.Tensor](..:torch:tensor). - -<a name="nn.Module.float"/> -### float() ### - -Convenience method for calling [module:type('torch.FloatTensor')](#nn.Module.type) - -<a name="nn.Module.double"/> -### double() ### - -Convenience method for calling [module:type('torch.DoubleTensor')](#nn.Module.type) - -<a name="nn.Module.cuda"/> -### cuda() ### - -Convenience method for calling [module:type('torch.CudaTensor')](#nn.Module.type) - -<a name="nn.statevars.dok"/> -### State Variables ### - -These state variables are useful objects if one wants to check the guts of -a `Module`. The object pointer is _never_ supposed to change. However, its -contents (including its size if it is a Tensor) are supposed to change. - -In general state variables are -[Tensors](..:torch:tensor). However, some special sub-classes -like [table layers](#nn.TableLayers) contain something else. Please, -refer to each module specification for further information. - -<a name="nn.Module.output"/> -#### output #### - -This contains the output of the module, computed with the last call of -[forward(input)](#nn.Module.forward). - -<a name="nn.Module.gradInput"/> -#### gradInput #### - -This contains the gradients with respect to the inputs of the module, computed with the last call of -[updateGradInput(input, gradOutput)](#nn.Module.updateGradInput). - -### Parameters and gradients w.r.t parameters ### - -Some modules contain parameters (the ones that we actually want to -train!). The name of these parameters, and gradients w.r.t these parameters -are module dependent. - -<a name="nn.Module.parameters"/> -### [{weights}, {gradWeights}] parameters() ### - -This function should returns two tables. One for the learnable -parameters `{weights}` and another for the gradients of the energy -wrt to the learnable parameters `{gradWeights}`. - -Custom modules should override this function if they use learnable -parameters that are stored in tensors. - -<a name="nn.Module.getParameters"/> -### [flatParameters, flatGradParameters] getParameters() ### - -This function returns two tensors. 
One for the flattened learnable -parameters `flatParameters` and another for the gradients of the energy -wrt to the learnable parameters `flatGradParameters`. - -Custom modules should not override this function. They should instead override [parameters(...)](#nn.Module.parameters) which is, in turn, called by the present function. - -This function will go over all the weights and gradWeights and make them view into a single tensor (one for weights and one for gradWeights). Since the storage of every weight and gradWeight is changed, this function should be called only once on a given network. - -<a name="nn.Containers"/> -## Containers ## - -<a name="nn.Concat"/> -### Concat ### - -```lua -module = nn.Concat(dim) -``` -Concat concatenates the output of one layer of "parallel" modules along the -provided dimension `dim`: they take the same inputs, and their output is -concatenated. -```lua -mlp=nn.Concat(1); -mlp:add(nn.Linear(5,3)) -mlp:add(nn.Linear(5,7)) -print(mlp:forward(torch.randn(5))) -``` -which gives the output: -```lua - 0.7486 - 0.1349 - 0.7924 --0.0371 --0.4794 - 0.3044 --0.0835 --0.7928 - 0.7856 --0.1815 -[torch.Tensor of dimension 10] -``` - - -<a name="nn.Sequential"/> -### Sequential ### - -Sequential provides a means to plug layers together -in a feed-forward fully connected manner. - -E.g. -creating a one hidden-layer multi-layer perceptron is thus just as easy as: -```lua -mlp = nn.Sequential() -mlp:add( nn.Linear(10, 25) ) -- 10 input, 25 hidden units -mlp:add( nn.Tanh() ) -- some hyperbolic tangent transfer function -mlp:add( nn.Linear(25, 1) ) -- 1 output - -print(mlp:forward(torch.randn(10))) -``` -which gives the output: -```lua --0.1815 -[torch.Tensor of dimension 1] -``` - -<a name="nn.Parallel"/> -### Parallel ### - -`module` = `Parallel(inputDimension,outputDimension)` - -Creates a container module that applies its `ith` child module to the `ith` slice of the input Tensor by using [select](..:torch:tensor#torch.tensor.select) -on dimension `inputDimension`. It concatenates the results of its contained modules together along dimension `outputDimension`. - -Example: -```lua - mlp=nn.Parallel(2,1); -- iterate over dimension 2 of input - mlp:add(nn.Linear(10,3)); -- apply to first slice - mlp:add(nn.Linear(10,2)) -- apply to first second slice - print(mlp:forward(torch.randn(10,2))) -``` -gives the output: -```lua --0.5300 --1.1015 - 0.7764 - 0.2819 --0.6026 -[torch.Tensor of dimension 5] -``` - -A more complicated example: -```lua - -mlp=nn.Sequential(); -c=nn.Parallel(1,2) -for i=1,10 do - local t=nn.Sequential() - t:add(nn.Linear(3,2)) - t:add(nn.Reshape(2,1)) - c:add(t) -end -mlp:add(c) - -pred=mlp:forward(torch.randn(10,3)) -print(pred) - -for i=1,10000 do -- Train for a few iterations - x=torch.randn(10,3); - y=torch.ones(2,10); - pred=mlp:forward(x) - - criterion= nn.MSECriterion() - local err=criterion:forward(pred,y) - local gradCriterion = criterion:backward(pred,y); - mlp:zeroGradParameters(); - mlp:backward(x, gradCriterion); - mlp:updateParameters(0.01); - print(err) -end -``` -<a name="nn.simplelayers.dok"/> -## Simple layers ## - -<a name="nn.Linear"/> -### Linear ### - -`module` = `Linear(inputDimension,outputDimension)` - -Applies a linear transformation to the incoming data, i.e. //y= -Ax+b//. The `input` tensor given in `forward(input)` must be -either a vector (1D tensor) or matrix (2D tensor). If the input is a -matrix, then each row is assumed to be an input sample of given batch. 
-
-You can create a layer in the following way:
-```lua
- module= nn.Linear(10,5)  -- 10 inputs, 5 outputs
-```
-Usually this would be added to a network of some kind, e.g.:
-```lua
- mlp = nn.Sequential();
- mlp:add(module)
-```
-The weights and biases (_A_ and _b_) can be viewed with:
-```lua
- print(module.weight)
- print(module.bias)
-```
-The gradients for these weights can be seen with:
-```lua
- print(module.gradWeight)
- print(module.gradBias)
-```
-As usual with `nn` modules,
-applying the linear transformation is performed with:
-```lua
- x=torch.Tensor(10) -- 10 inputs
- y=module:forward(x)
-```
-
-<a name="nn.SparseLinear"/>
-### SparseLinear ###
-
-`module` = `SparseLinear(inputDimension,outputDimension)`
-
-Applies a linear transformation to the incoming sparse data, i.e.
-_y = Ax+b_. The `input` tensor given in `forward(input)` must
-be a sparse vector represented as a 2D tensor of the form
-torch.Tensor(N, 2) where the pairs represent indices and values.
-The SparseLinear layer is useful when the number of input
-dimensions is very large and the input data is sparse.
-
-You can create a sparse linear layer in the following way:
-
-```lua
- module= nn.SparseLinear(10000,2)  -- 10000 inputs, 2 outputs
-```
-The sparse linear module may be used as part of a larger network,
-and apart from the form of the input,
-[SparseLinear](#nn.SparseLinear)
-operates in exactly the same way as the [Linear](#nn.Linear) layer.
-
-A sparse input vector may be created as follows:
-```lua
-
- x=torch.Tensor({{1, 0.1},{2, 0.3},{10, 0.3},{31, 0.2}})
-
- print(x)
-
-  1.0000   0.1000
-  2.0000   0.3000
- 10.0000   0.3000
- 31.0000   0.2000
-[torch.Tensor of dimension 4x2]
-
-```
-
-The first column contains indices, the second column contains
-values in a vector where all other elements are zero. The
-indices should not exceed the stated dimensions of the input to the
-layer (10000 in the example).
-
-<a name="nn.Abs"/>
-### Abs ###
-
-`module` = `Abs()`
-
-`output = abs(input)`.
-
-```lua
-m=nn.Abs()
-ii=torch.linspace(-5,5)
-oo=m:forward(ii)
-go=torch.ones(100)
-gi=m:backward(ii,go)
-gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'})
-gnuplot.grid(true)
-```
-
-![](doc/abs.png)
-
-<a name="nn.Add"/>
-### Add ###
-
-`module` = `Add(inputDimension,scalar)`
-
-Applies a bias term to the incoming data, i.e.
-`y_i = x_i + b_i`, or if `scalar=true` then uses a single bias term,
-`y_i = x_i + b`.
-
-Example:
-```lua
-y=torch.Tensor(5);
-mlp=nn.Sequential()
-mlp:add(nn.Add(5))
-
-function gradUpdate(mlp, x, y, criterion, learningRate)
-   local pred = mlp:forward(x)
-   local err = criterion:forward(pred, y)
-   local gradCriterion = criterion:backward(pred, y)
-   mlp:zeroGradParameters()
-   mlp:backward(x, gradCriterion)
-   mlp:updateParameters(learningRate)
-   return err
-end
-
-for i=1,10000 do
-   x=torch.rand(5)
-   y:copy(x);
-   for i=1,5 do y[i]=y[i]+i; end
-   err=gradUpdate(mlp,x,y,nn.MSECriterion(),0.01)
-end
-print(mlp:get(1).bias)
-```
-gives the output:
-```lua
- 1.0000
- 2.0000
- 3.0000
- 4.0000
- 5.0000
-[torch.Tensor of dimension 5]
-```
-i.e. the network successfully learns that the input _x_ has been shifted
-to produce the output _y_.
-
-
-<a name="nn.Mul"/>
-### Mul ###
-
-`module` = `Mul(inputDimension)`
-
-Applies a _single_ scaling factor to the incoming data, i.e.
-_y = w x_, where _w_ is a scalar.
-
-Example:
-```lua
-y=torch.Tensor(5);
-mlp=nn.Sequential()
-mlp:add(nn.Mul(5))
-
-function gradUpdate(mlp, x, y, criterion, learningRate)
-   local pred = mlp:forward(x)
-   local err = criterion:forward(pred,y)
-   local gradCriterion = criterion:backward(pred,y);
-   mlp:zeroGradParameters();
-   mlp:backward(x, gradCriterion);
-   mlp:updateParameters(learningRate);
-   return err
-end
-
-
-for i=1,10000 do
-   x=torch.rand(5)
-   y:copy(x); y:mul(math.pi);
-   err=gradUpdate(mlp,x,y,nn.MSECriterion(),0.01)
-end
-print(mlp:get(1).weight)
-```
-gives the output:
-```lua
- 3.1416
-[torch.Tensor of dimension 1]
-```
-i.e. the network successfully learns that the input `x` has been scaled by
-pi.
-
-<a name="nn.CMul"/>
-### CMul ###
-
-`module` = `CMul(inputDimension)`
-
-Applies a component-wise multiplication to the incoming data, i.e.
-`y_i = w_i * x_i`.
-
-Example:
-```lua
-mlp=nn.Sequential()
-mlp:add(nn.CMul(5))
-
-y=torch.Tensor(5);
-sc=torch.Tensor(5); for i=1,5 do sc[i]=i; end -- scale input with this
-
-function gradUpdate(mlp,x,y,criterion,learningRate)
-   local pred = mlp:forward(x)
-   local err = criterion:forward(pred,y)
-   local gradCriterion = criterion:backward(pred,y);
-   mlp:zeroGradParameters();
-   mlp:backward(x, gradCriterion);
-   mlp:updateParameters(learningRate);
-   return err
-end
-
-for i=1,10000 do
-   x=torch.rand(5)
-   y:copy(x); y:cmul(sc);
-   err=gradUpdate(mlp,x,y,nn.MSECriterion(),0.01)
-end
-print(mlp:get(1).weight)
-```
-gives the output:
-```lua
- 1.0000
- 2.0000
- 3.0000
- 4.0000
- 5.0000
-[torch.Tensor of dimension 5]
-```
-i.e. the network successfully learns that the input _x_ has been scaled by
-those scaling factors to produce the output _y_.
-
-
-<a name="nn.Max"/>
-### Max ###
-
-`module` = `Max(dimension)`
-
-Applies a max operation over dimension `dimension`.
-Hence, if an `nxpxq` Tensor was given as input, and `dimension` = `2`
-then an `nxq` matrix would be output.
-
-
-<a name="nn.Min"/>
-### Min ###
-
-`module` = `Min(dimension)`
-
-Applies a min operation over dimension `dimension`.
-Hence, if an `nxpxq` Tensor was given as input, and `dimension` = `2`
-then an `nxq` matrix would be output.
-
-
-<a name="nn.Mean"/>
-### Mean ###
-
-`module` = `Mean(dimension)`
-
-Applies a mean operation over dimension `dimension`.
-Hence, if an `nxpxq` Tensor was given as input, and `dimension` = `2`
-then an `nxq` matrix would be output.
-
-<a name="nn.Sum"/>
-### Sum ###
-
-`module` = `Sum(dimension)`
-
-Applies a sum operation over dimension `dimension`.
-Hence, if an `nxpxq` Tensor was given as input, and `dimension` = `2`
-then an `nxq` matrix would be output.
-
-
-<a name="nn.Euclidean"/>
-### Euclidean ###
-
-`module` = `Euclidean(inputDimension,outputDimension)`
-
-Outputs the Euclidean distance of the input to `outputDimension` centers,
-i.e. this layer has the weights `c_i`, `i` = `1`,..,`outputDimension`, where
-`c_i` are vectors of dimension `inputDimension`. Output dimension `j` is
-`|| c_j - x ||`, where `x` is the input.
-
-<a name="nn.WeightedEuclidean"/>
-### WeightedEuclidean ###
-
-`module` = `WeightedEuclidean(inputDimension,outputDimension)`
-
-This module is similar to [Euclidean](#nn.Euclidean), but
-additionally learns a separate diagonal covariance matrix across the
-features of the input space for each center.
-
-
-<a name="nn.Copy"/>
-### Copy ###
-
-`module` = `Copy(inputType,outputType)`
-
-This layer copies the input to the output, casting from type
-`inputType` to type `outputType`.
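
For example, a minimal sketch (the variable names are illustrative) that casts a `DoubleTensor` input to a `FloatTensor` output:
```lua
m = nn.Copy('torch.DoubleTensor', 'torch.FloatTensor')
x = torch.randn(3)         -- a DoubleTensor, the default type
y = m:forward(x)           -- y holds the same values as x
print(torch.typename(y))   -- torch.FloatTensor
```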
- - -<a name="nn.Narrow"/> -### Narrow ### - -`module` = `Narrow(dimension, offset, length)` - -Narrow is application of -[narrow](..:torch:tensor:#torch.Tensor.narrow) operation in a -module. - -<a name="nn.Replicate"/> -### Replicate ### - -`module` = `Replicate(nFeature)` - -This class creates an output where the input is replicated -`nFeature` times along its first dimension. There is no memory -allocation or memory copy in this module. It sets the -[stride](..:torch:tensor#torch.Tensor.stride) along the first -dimension to zero. - -```lua -torch> x=torch.linspace(1,5,5) -torch> =x - 1 - 2 - 3 - 4 - 5 -[torch.DoubleTensor of dimension 5] - -torch> m=nn.Replicate(3) -torch> o=m:forward(x) -torch> =o - 1 2 3 4 5 - 1 2 3 4 5 - 1 2 3 4 5 -[torch.DoubleTensor of dimension 3x5] - -torch> x:fill(13) -torch> =x - 13 - 13 - 13 - 13 - 13 -[torch.DoubleTensor of dimension 5] - -torch> =o - 13 13 13 13 13 - 13 13 13 13 13 - 13 13 13 13 13 -[torch.DoubleTensor of dimension 3x5] - -``` - - -<a name="nn.Reshape"/> -### Reshape ### - -`module` = `Reshape(dimension1, dimension2, ..)` - -Reshapes an `nxpxqx..` Tensor into a `dimension1xdimension2x...` Tensor, -taking the elements column-wise. - -Example: -```lua -> x=torch.Tensor(4,4) -> for i=1,4 do -> for j=1,4 do -> x[i][j]=(i-1)*4+j; -> end -> end -> print(x) - - 1 2 3 4 - 5 6 7 8 - 9 10 11 12 - 13 14 15 16 -[torch.Tensor of dimension 4x4] - -> print(nn.Reshape(2,8):forward(x)) - - 1 9 2 10 3 11 4 12 - 5 13 6 14 7 15 8 16 -[torch.Tensor of dimension 2x8] - -> print(nn.Reshape(8,2):forward(x)) - - 1 3 - 5 7 - 9 11 - 13 15 - 2 4 - 6 8 - 10 12 - 14 16 -[torch.Tensor of dimension 8x2] - -> print(nn.Reshape(16):forward(x)) - - 1 - 5 - 9 - 13 - 2 - 6 - 10 - 14 - 3 - 7 - 11 - 15 - 4 - 8 - 12 - 16 -[torch.Tensor of dimension 16] - - -``` - - -<a name="nn.Select"/> -### Select ### - -Selects a dimension and index of a `nxpxqx..` Tensor. - -Example: -```lua -mlp=nn.Sequential(); -mlp:add(nn.Select(1,3)) - -x=torch.randn(10,5) -print(x) -print(mlp:forward(x)) -``` -gives the output: -```lua - 0.9720 -0.0836 0.0831 -0.2059 -0.0871 - 0.8750 -2.0432 -0.1295 -2.3932 0.8168 - 0.0369 1.1633 0.6483 1.2862 0.6596 - 0.1667 -0.5704 -0.7303 0.3697 -2.2941 - 0.4794 2.0636 0.3502 0.3560 -0.5500 --0.1898 -1.1547 0.1145 -1.1399 0.1711 --1.5130 1.4445 0.2356 -0.5393 -0.6222 --0.6587 0.4314 1.1916 -1.4509 1.9400 - 0.2733 1.0911 0.7667 0.4002 0.1646 - 0.5804 -0.5333 1.1621 1.5683 -0.1978 -[torch.Tensor of dimension 10x5] - - 0.0369 - 1.1633 - 0.6483 - 1.2862 - 0.6596 -[torch.Tensor of dimension 5] -``` - -This can be used in conjunction with [Concat](#nn.Concat) -to emulate the behavior -of [Parallel](#nn.Parallel), or to select various parts of an input Tensor to -perform operations on. Here is a fairly complicated example: -```lua - -mlp=nn.Sequential(); -c=nn.Concat(2) -for i=1,10 do - local t=nn.Sequential() - t:add(nn.Select(1,i)) - t:add(nn.Linear(3,2)) - t:add(nn.Reshape(2,1)) - c:add(t) -end -mlp:add(c) - -pred=mlp:forward(torch.randn(10,3)) -print(pred) - -for i=1,10000 do -- Train for a few iterations - x=torch.randn(10,3); - y=torch.ones(2,10); - pred=mlp:forward(x) - - criterion= nn.MSECriterion() - err=criterion:forward(pred,y) - gradCriterion = criterion:backward(pred,y); - mlp:zeroGradParameters(); - mlp:backward(x, gradCriterion); - mlp:updateParameters(0.01); - print(err) -end -``` - -<a name="nn.Exp"/> -### Exp ### - -Applies the `exp` function element-wise to the input Tensor, -thus outputting a Tensor of the same dimension. 
-```lua
-ii=torch.linspace(-2,2)
-m=nn.Exp()
-oo=m:forward(ii)
-go=torch.ones(100)
-gi=m:backward(ii,go)
-gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'})
-gnuplot.grid(true)
-```
-![](doc/exp.png)
-
-
-<a name="nn.Square"/>
-### Square ###
-
-Takes the square of each element.
-
-```lua
-ii=torch.linspace(-5,5)
-m=nn.Square()
-oo=m:forward(ii)
-go=torch.ones(100)
-gi=m:backward(ii,go)
-gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'})
-gnuplot.grid(true)
-```
-![](doc/square.png)
-
-<a name="nn.Sqrt"/>
-### Sqrt ###
-
-Takes the square root of each element.
-
-```lua
-ii=torch.linspace(0,5)
-m=nn.Sqrt()
-oo=m:forward(ii)
-go=torch.ones(100)
-gi=m:backward(ii,go)
-gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'})
-gnuplot.grid(true)
-```
-![](doc/sqrt.png)
-
-<a name="nn.Power"/>
-### Power ###
-
-`module` = `Power(p)`
-
-Raises each element to its `p`th power.
-
-```lua
-ii=torch.linspace(0,2)
-m=nn.Power(1.25)
-oo=m:forward(ii)
-go=torch.ones(100)
-gi=m:backward(ii,go)
-gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'})
-gnuplot.grid(true)
-```
-![](doc/power.png)
-
-<a name="nn.transfer.dok"/>
-## Transfer Function Layers ##
-
-<a name="nn.HardTanh"/>
-### HardTanh ###
-
-Applies the `HardTanh` function element-wise to the input Tensor,
-thus outputting a Tensor of the same dimension.
-
-`HardTanh` is defined as:
-
- * `f(x) = 1, if x > 1`
- * `f(x) = -1, if x < -1`
- * `f(x) = x, otherwise`
-
-```lua
-ii=torch.linspace(-2,2)
-m=nn.HardTanh()
-oo=m:forward(ii)
-go=torch.ones(100)
-gi=m:backward(ii,go)
-gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'})
-gnuplot.grid(true)
-```
-![](doc/htanh.png)
-
-
-<a name="nn.HardShrink"/>
-### HardShrink ###
-
-`module = nn.HardShrink(lambda)`
-
-Applies the hard shrinkage function element-wise to the input
-[Tensor](..:torch:Tensor). The output is the same size as the input.
-
-The `HardShrinkage` operator is defined as:
-
- * `f(x) = x, if x > lambda`
- * `f(x) = x, if x < -lambda`
- * `f(x) = 0, otherwise`
-
-```lua
-ii=torch.linspace(-2,2)
-m=nn.HardShrink(0.85)
-oo=m:forward(ii)
-go=torch.ones(100)
-gi=m:backward(ii,go)
-gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'})
-gnuplot.grid(true)
-```
-![](doc/hshrink.png)
-
-<a name="nn.SoftShrink"/>
-### SoftShrink ###
-
-`module = nn.SoftShrink(lambda)`
-
-Applies the soft shrinkage function element-wise to the input
-[Tensor](..:torch:Tensor). The output is the same size as the input.
-
-The `SoftShrinkage` operator is defined as:
-
- * `f(x) = x-lambda, if x > lambda`
- * `f(x) = x+lambda, if x < -lambda`
- * `f(x) = 0, otherwise`
-
-```lua
-ii=torch.linspace(-2,2)
-m=nn.SoftShrink(0.85)
-oo=m:forward(ii)
-go=torch.ones(100)
-gi=m:backward(ii,go)
-gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'})
-gnuplot.grid(true)
-```
-![](doc/sshrink.png)
-
-
-<a name="nn.SoftMax"/>
-### SoftMax ###
-
-Applies the `Softmax` function to an n-dimensional input Tensor,
-rescaling it so that the elements of the n-dimensional output Tensor
-lie in the range (0,1) and sum to 1.
-
-`Softmax` is defined as `f_i(x)` = `exp(x_i-shift) / sum_j exp(x_j-shift)`,
-where `shift` = `max_i x_i`.
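
As a quick sanity check, the module can be compared against this formula directly. A minimal sketch (variable names are illustrative):
```lua
x = torch.randn(4)
shift = x:max()
manual = torch.exp(torch.add(x, -shift))  -- exp(x_i - shift)
manual:div(manual:sum())                  -- normalize to sum to 1
print(nn.SoftMax():forward(x))
print(manual)                             -- the two should agree
```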
- - -```lua -ii=torch.exp(torch.abs(torch.randn(10))) -m=nn.SoftMax() -oo=m:forward(ii) -gnuplot.plot({'Input',ii,'+-'},{'Output',oo,'+-'}) -gnuplot.grid(true) -``` -![](doc/softmax.png) - -<a name="nn.SoftMin"/> -### SoftMin ### - -Applies the `Softmin` function to an n-dimensional input Tensor, -rescaling them so that the elements of the n-dimensional output Tensor -lie in the range (0,1) and sum to 1. - -`Softmin` is defined as `f_i(x)` = `exp(-x_i-shift) / sum_j exp(-x_j-shift)`, -where `shift` = `max_i x_i`. - - -```lua -ii=torch.exp(torch.abs(torch.randn(10))) -m=nn.SoftMin() -oo=m:forward(ii) -gnuplot.plot({'Input',ii,'+-'},{'Output',oo,'+-'}) -gnuplot.grid(true) -``` -![](doc/softmin.png) - -<a name="nn.SoftPlus"/> -### SoftPlus ### - -Applies the `SoftPlus` function to an n-dimensioanl input Tensor. -Can be used to constrain the output of a machine to always be positive. - -`SoftPlus` is defined as `f_i(x)` = `log(1 + exp(x_i)))`. - -```lua -ii=torch.randn(10) -m=nn.SoftPlus() -oo=m:forward(ii) -go=torch.ones(10) -gi=m:backward(ii,go) -gnuplot.plot({'Input',ii,'+-'},{'Output',oo,'+-'},{'gradInput',gi,'+-'}) -gnuplot.grid(true) -``` -![](doc/softplus.png) - -<a name="nn.SoftSign"/> -### SoftSign ### - -Applies the `SoftSign` function to an n-dimensioanl input Tensor. - -`SoftSign` is defined as `f_i(x) = x_i / (1+|x_i|)` - -```lua -ii=torch.linspace(-5,5) -m=nn.SoftSign() -oo=m:forward(ii) -go=torch.ones(100) -gi=m:backward(ii,go) -gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'}) -gnuplot.grid(true) -``` -![](doc/softsign.png) - -<a name="nn.LogSigmoid"/> -### LogSigmoid ### - -Applies the `LogSigmoid` function to an n-dimensional input Tensor. - -`LogSigmoid` is defined as `f_i(x)` = `log(1/(1+ exp(-x_i)))`. - - -```lua -ii=torch.randn(10) -m=nn.LogSigmoid() -oo=m:forward(ii) -go=torch.ones(10) -gi=m:backward(ii,go) -gnuplot.plot({'Input',ii,'+-'},{'Output',oo,'+-'},{'gradInput',gi,'+-'}) -gnuplot.grid(true) -``` -![](doc/logsigmoid.png) - - -<a name="nn.LogSoftMax"/> -### LogSoftMax ### - -Applies the `LogSoftmax` function to an n-dimensional input Tensor. - -`LogSoftmax` is defined as `f_i(x)` = `log(1/a exp(x_i))`, -where `a` = `sum_j exp(x_j)`. - -```lua -ii=torch.randn(10) -m=nn.LogSoftMax() -oo=m:forward(ii) -go=torch.ones(10) -gi=m:backward(ii,go) -gnuplot.plot({'Input',ii,'+-'},{'Output',oo,'+-'},{'gradInput',gi,'+-'}) -gnuplot.grid(true) -``` -![](doc/logsoftmax.png) - -<a name="nn.Sigmoid"/> -### Sigmoid ### - -Applies the `Sigmoid` function element-wise to the input Tensor, -thus outputting a Tensor of the same dimension. - -`Sigmoid` is defined as `f(x)` = `1/(1+exp(-x))`. - -```lua -ii=torch.linspace(-5,5) -m=nn.Sigmoid() -oo=m:forward(ii) -go=torch.ones(100) -gi=m:backward(ii,go) -gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'}) -gnuplot.grid(true) -``` -![](doc/sigmoid.png) - -<a name="nn.Tanh"/> -### Tanh ### - -Applies the `Tanh` function element-wise to the input Tensor, -thus outputting a Tensor of the same dimension. - -```lua -ii=torch.linspace(-3,3) -m=nn.Tanh() -oo=m:forward(ii) -go=torch.ones(100) -gi=m:backward(ii,go) -gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'}) -gnuplot.grid(true) -``` -![](doc/tanh.png) - -<a name="nn.convlayers.dok"/> -## Convolutional layers ## - -SpatialConvolution and SpatialSubsampling apply to inputs with -two-dimensional relationships (e.g. images). TemporalConvolution and -TemporalSubsampling apply to sequences with a one-dimensional -relationship (e.g. strings of some kind). 
-
-For spatial convolutional layers, the input is supposed to be 3D. The
-first dimension is the number of features, the last two dimensions
-are spatial.
-
-<a name="nn.SpatialConvolution"/>
-### SpatialConvolution ###
-
-```lua
-module = nn.SpatialConvolution(nInputPlane, nOutputPlane, kW, kH, [dW], [dH])
-```
-
-Applies a 2D convolution over an input image composed of several input planes. The `input` tensor in
-`forward(input)` is expected to be a 3D tensor (`nInputPlane x height x width`).
-
-The parameters are the following:
- * `nInputPlane`: The number of expected input planes in the image given into `forward()`.
- * `nOutputPlane`: The number of output planes the convolution layer will produce.
- * `kW`: The kernel width of the convolution
- * `kH`: The kernel height of the convolution
- * `dW`: The step of the convolution in the width dimension. Default is `1`.
- * `dH`: The step of the convolution in the height dimension. Default is `1`.
-
-Note that depending on the size of your kernel, several (of the last)
-columns or rows of the input image might be lost. It is up to the user to
-add proper padding in images.
-
-If the input image is a 3D tensor `nInputPlane x height x width`, the output image size
-will be `nOutputPlane x oheight x owidth` where
-```lua
-owidth  = (width  - kW) / dW + 1
-oheight = (height - kH) / dH + 1 .
-```
-
-The parameters of the convolution can be found in `self.weight` (Tensor of
-size `nOutputPlane x nInputPlane x kH x kW`) and `self.bias` (Tensor of
-size `nOutputPlane`). The corresponding gradients can be found in
-`self.gradWeight` and `self.gradBias`.
-
-The output value of the layer can be precisely described as:
-```lua
-output[i][j][k] = bias[k]
-  + sum_l sum_{s=1}^kW sum_{t=1}^kH weight[s][t][l][k]
-    * input[dW*(i-1)+s][dH*(j-1)+t][l]
-```
-
-<a name="nn.VolumetricConvolution"/>
-### VolumetricConvolution ###
-
-```lua
-module = nn.VolumetricConvolution(nInputPlane, nOutputPlane, kT, kW, kH [, dT, dW, dH])
-```
-
-Applies a 3D convolution over an input image composed of several input planes. The `input` tensor in
-`forward(input)` is expected to be a 4D tensor (`nInputPlane x time x height x width`).
-
-The parameters are the following:
- * `nInputPlane`: The number of expected input planes in the image given into `forward()`.
- * `nOutputPlane`: The number of output planes the convolution layer will produce.
- * `kT`: The kernel size of the convolution in time
- * `kW`: The kernel width of the convolution
- * `kH`: The kernel height of the convolution
- * `dT`: The step of the convolution in the time dimension. Default is `1`.
- * `dW`: The step of the convolution in the width dimension. Default is `1`.
- * `dH`: The step of the convolution in the height dimension. Default is `1`.
-
-Note that depending on the size of your kernel, several (of the last)
-columns or rows of the input image might be lost. It is up to the user to
-add proper padding in images.
-
-If the input image is a 4D tensor `nInputPlane x time x height x width`, the output image size
-will be `nOutputPlane x otime x oheight x owidth` where
-```lua
-otime   = (time   - kT) / dT + 1
-owidth  = (width  - kW) / dW + 1
-oheight = (height - kH) / dH + 1 .
-```
-
-The parameters of the convolution can be found in `self.weight` (Tensor of
-size `nOutputPlane x nInputPlane x kT x kH x kW`) and `self.bias` (Tensor of
-size `nOutputPlane`). The corresponding gradients can be found in
-`self.gradWeight` and `self.gradBias`.
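
A minimal usage sketch (the sizes are illustrative), which can be checked against the formulas above:
```lua
module = nn.VolumetricConvolution(3, 8, 2, 5, 5) -- 3 input planes, 8 output planes, kT=2, kW=5, kH=5
input = torch.randn(3, 4, 32, 32)                -- nInputPlane x time x height x width
output = module:forward(input)
print(output:size())  -- 8x3x28x28: otime=(4-2)/1+1, oheight=owidth=(32-5)/1+1
```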
-
-<a name="nn.SpatialConvolutionMap"/>
-### SpatialConvolutionMap ###
-
-```lua
-module = nn.SpatialConvolutionMap(connectionMatrix, kW, kH, [dW], [dH])
-```
-
-This class is a generalization of
-[nn.SpatialConvolution](#nn.SpatialConvolution). It uses a generic
-connection table between input and output features. The
-[nn.SpatialConvolution](#nn.SpatialConvolution) is equivalent to
-using a [full connection table](#nn.tables.full). One can specify
-different types of connection tables.
-
-<a name="nn.tables.full"/>
-#### Full Connection Table ####
-
-`table = nn.tables.full(nin,nout)`
-
-This is a precomputed table that specifies connections between every
-input and output node.
-
-<a name="nn.tables.onetoone"/>
-#### One to One Connection Table ####
-
-`table = nn.tables.oneToOne(n)`
-
-This is a precomputed table that specifies a single connection to each
-output node from the corresponding input node.
-
-<a name="nn.tables.random"/>
-#### Random Connection Table ####
-
-`table = nn.tables.random(nin,nout,nto)`
-
-This table is randomly populated such that each output unit has
-`nto` incoming connections. The algorithm tries to assign a uniform
-number of outgoing connections to each input node if possible.
-
-<a name="nn.SpatialLPPooling"/>
-### SpatialLPPooling ###
-
-```lua
-module = nn.SpatialLPPooling(nInputPlane, pnorm, kW, kH, [dW], [dH])
-```
-
-Computes the `p` norm in a convolutional manner on a set of 2D input planes.
-
-<a name="nn.SpatialMaxPooling"/>
-### SpatialMaxPooling ###
-
-```lua
-module = nn.SpatialMaxPooling(kW, kH [, dW, dH])
-```
-
-Applies a 2D max-pooling operation in `kWxkH` regions by step size
-`dWxdH` steps. The number of output features is equal to the number of
-input planes.
-
-<a name="nn.VolumetricMaxPooling"/>
-### VolumetricMaxPooling ###
-
-```lua
-module = nn.VolumetricMaxPooling(kT, kW, kH [, dT, dW, dH])
-```
-
-Applies a 3D max-pooling operation in `kTxkWxkH` regions by step size
-`dTxdWxdH` steps. The number of output features is equal to the number of
-input planes.
-
-<a name="nn.SpatialSubSampling"/>
-### SpatialSubSampling ###
-
-```lua
-module = nn.SpatialSubSampling(nInputPlane, kW, kH, [dW], [dH])
-```
-
-Applies a 2D sub-sampling over an input image composed of several input planes. The `input` tensor in
-`forward(input)` is expected to be a 3D tensor (`nInputPlane x height x width`). The number of output
-planes will be the same as `nInputPlane`.
-
-The parameters are the following:
- * `nInputPlane`: The number of expected input planes in the image given into `forward()`.
- * `kW`: The kernel width of the sub-sampling
- * `kH`: The kernel height of the sub-sampling
- * `dW`: The step of the sub-sampling in the width dimension. Default is `1`.
- * `dH`: The step of the sub-sampling in the height dimension. Default is `1`.
-
-Note that depending on the size of your kernel, several (of the last)
-columns or rows of the input image might be lost. It is up to the user to
-add proper padding in images.
-
-If the input image is a 3D tensor `nInputPlane x height x width`, the output image size
-will be `nInputPlane x oheight x owidth` where
-```lua
-owidth  = (width  - kW) / dW + 1
-oheight = (height - kH) / dH + 1 .
-```
-
-The parameters of the sub-sampling can be found in `self.weight` (Tensor of
-size `nInputPlane`) and `self.bias` (Tensor of size `nInputPlane`). The
-corresponding gradients can be found in `self.gradWeight` and
-`self.gradBias`.
-
-The output value of the layer can be precisely described as:
-```lua
-output[i][j][k] = bias[k]
-  + weight[k] * sum_{s=1}^kW sum_{t=1}^kH input[dW*(i-1)+s][dH*(j-1)+t][k]
-```
-
-<a name="nn.SpatialZeroPadding"/>
-### SpatialZeroPadding ###
-
-```lua
-module = nn.SpatialZeroPadding(padLeft, padRight, padTop, padBottom)
-```
-
-Each feature map of a given input is padded with the specified number of
-zeros. If padding values are negative, then the input is cropped.
-
-<a name="nn.SpatialSubtractiveNormalization"/>
-### SpatialSubtractiveNormalization ###
-
-```lua
-module = nn.SpatialSubtractiveNormalization(ninputplane, kernel)
-```
-
-Applies a spatial subtraction operation on a series of 2D inputs using
-`kernel` for computing the weighted average in a neighborhood. The
-neighborhood is defined for a local spatial region that is the same size as
-the kernel, and across all features. For a single-channel input image, since
-there is only one feature, the region is only spatial. For an RGB image, the
-weighted average is taken over RGB channels and a spatial region.
-
-If the `kernel` is 1D, then it will be used for constructing a separable
-2D kernel. The operations will be much more efficient in this case.
-
-The kernel is generally chosen as a Gaussian when it is believed that
-the correlation of two pixel locations decreases with increasing
-distance. On the feature dimension, a uniform average is used since
-the weighting across features is not known.
-
-For this example we use an external package
-[image](http://www.github.com/clementfarabet/lua---image/)
-
-```lua
-require 'image'
-require 'nn'
-lena = image.rgb2y(image.lena())
-ker = torch.ones(11)
-m=nn.SpatialSubtractiveNormalization(1,ker)
-processed = m:forward(lena)
-w1=image.display(lena)
-w2=image.display(processed)
-```
-![](lena.jpg)![](lenap.jpg)
-
-<a name="nn.TemporalConvolution"/>
-### TemporalConvolution ###
-
-```lua
-module = nn.TemporalConvolution(inputFrameSize, outputFrameSize, kW, [dW])
-```
-
-Applies a 1D convolution over an input sequence composed of `nInputFrame` frames. The `input` tensor in
-`forward(input)` is expected to be a 2D tensor (`nInputFrame x inputFrameSize`) or a 3D tensor (`nBatchFrame x nInputFrame x inputFrameSize`).
-
-The parameters are the following:
- * `inputFrameSize`: The input frame size expected in sequences given into `forward()`.
- * `outputFrameSize`: The output frame size the convolution layer will produce.
- * `kW`: The kernel width of the convolution
- * `dW`: The step of the convolution. Default is `1`.
-
-Note that depending on the size of your kernel, several (of the last)
-frames of the sequence might be lost. It is up to the user to add proper padding frames in the input
-sequences.
-
-If the input sequence is a 2D tensor of dimension `nInputFrame x inputFrameSize`, the output sequence will be
-`nOutputFrame x outputFrameSize` where
-```lua
-nOutputFrame = (nInputFrame - kW) / dW + 1
-```
-
-If the input sequence is a 3D tensor of dimension `nBatchFrame x nInputFrame x inputFrameSize`, the output sequence will be
-`nBatchFrame x nOutputFrame x outputFrameSize`.
-
-The parameters of the convolution can be found in `self.weight` (Tensor of
-size `outputFrameSize x (inputFrameSize x kW)`) and `self.bias` (Tensor of
-size `outputFrameSize`). The corresponding gradients can be found in
-`self.gradWeight` and `self.gradBias`.
- -For a 2D input, the output value of the layer can be precisely described as: -```lua -output[t][i] = bias[i] - + sum_j sum_{k=1}^kW weight[i][j][k] - * input[dW*(t-1)+k)][j] -``` - -Here is a simple example: - -```lua -inp=5; -- dimensionality of one sequence element -outp=1; -- number of derived features for one sequence element -kw=1; -- kernel only operates on one sequence element per step -dw=1; -- we step once and go on to the next sequence element - -mlp=nn.TemporalConvolution(inp,outp,kw,dw) - -x=torch.rand(7,inp) -- a sequence of 7 elements -print(mlp:forward(x)) -``` -which gives: -```lua --0.9109 --0.9872 --0.6808 --0.9403 --0.9680 --0.6901 --0.6387 -[torch.Tensor of dimension 7x1] -``` - -This is equivalent to: -```lua -weights=torch.reshape(mlp.weight,inp) -- weights applied to all -bias= mlp.bias[1]; -for i=1,x:size(1) do -- for each sequence element - element= x[i]; -- features of ith sequence element - print(element:dot(weights) + bias) -end -``` -which gives: -```lua --0.91094998687717 --0.98721705771773 --0.68075004276185 --0.94030132495887 --0.96798754116609 --0.69008470895581 --0.63871422284166 -``` - -<a name="nn.TemporalMaxPooling"/> -### TemporalMaxPooling ### - -```lua -module = nn.TemporalMaxPooling(kW, [dW]) -``` - -Applies 1D max-pooling operation in `kW` regions by step size -`dW` steps. Input sequence composed of `nInputFrame` frames. The `input` tensor in -`forward(input)` is expected to be a 2D tensor (`nInputFrame x inputFrameSize`) -or a 3D tensor (`nBatchFrame x nInputFrame x inputFrameSize`). - -If the input sequence is a 2D tensor of dimension `nInputFrame x inputFrameSize`, the output sequence will be -`nOutputFrame x inputFrameSize` where -```lua -nOutputFrame = (nInputFrame - kW) / dW + 1 -``` - -<a name="nn.TemporalSubSampling"/> -### TemporalSubSampling ### - -```lua -module = nn.TemporalSubSampling(inputFrameSize, kW, [dW]) -``` - -Applies a 1D sub-sampling over an input sequence composed of `nInputFrame` frames. The `input` tensor in -`forward(input)` is expected to be a 2D tensor (`nInputFrame x inputFrameSize`). The output frame size -will be the same as the input one (`inputFrameSize`). - -The parameters are the following: - * `inputFrameSize`: The input frame size expected in sequences given into `forward()`. - * `kW`: The kernel width of the sub-sampling - * `dW`: The step of the sub-sampling. Default is `1`. - -Note that depending of the size of your kernel, several (of the last) -frames of the sequence might be lost. It is up to the user to add proper padding frames in the input -sequences. - -If the input sequence is a 2D tensor `nInputFrame x inputFrameSize`, the output sequence will be -`inputFrameSize x nOutputFrame` where -```lua -nOutputFrame = (nInputFrame - kW) / dW + 1 -``` - -The parameters of the sub-sampling can be found in `self.weight` (Tensor of -size `inputFrameSize`) and `self.bias` (Tensor of -size `inputFrameSize`). The corresponding gradients can be found in -`self.gradWeight` and `self.gradBias`. - -The output value of the layer can be precisely described as: -```lua -output[i][t] = bias[i] + weight[i] * sum_{k=1}^kW input[i][dW*(t-1)+k)] -``` - -<a name="nn.LookupTable"/> -### LookupTable ### - -```lua -module = nn.LookupTable(nIndex, sizes) -``` -or -```lua -module = nn.LookupTable(nIndex, size1, [size2], [size3], ...) -``` - -This layer is a particular case of a convolution, where the width of the convolution would be `1`. -When calling `forward(input)`, it assumes `input` is a 1D or 2D tensor filled with indices. 
-If the input is a matrix, then each row is assumed to be an input sample of given batch. Indices start -at `1` and can go up to `nIndex`. For each index, it outputs a corresponding `Tensor` of size -specified by `sizes` (a `LongStorage`) or `size1 x size2 x...`. - -Given a 1D input, the output tensors are concatenated, -generating a `n x size1 x size2 x ... x sizeN` tensor, where `n` -is the size of a 1D `input` tensor. - -Again with a 1D input, when only `size1` is provided, the `forward(input)` is equivalent to -performing the following matrix-matrix multiplication in an efficient manner: -```lua -M P -``` -where `M` is a 2D matrix `size1 x nIndex` containing the parameters of the lookup-table and -`P` is a 2D matrix, where each column vector `i` is a zero vector except at index `input[i]` where it is `1`. - -1D example: -```lua - -- a lookup table containing 10 tensors of size 3 - module = nn.LookupTable(10, 3) - - input = torch.Tensor{1,2,1,10} - print(module:forward(input)) -``` - -Outputs something like: -```lua --1.4415 -0.1001 -0.1708 --0.6945 -0.4350 0.7977 --1.4415 -0.1001 -0.1708 --0.0745 1.9275 1.0915 -[torch.DoubleTensor of dimension 4x3] -``` -Note that the first row vector is the same as the 3rd one! - -Given a 2D input tensor of size `m x n`, the output is a `m x n x size1 x size2 x ... x sizeN` -tensor, where `m` is the number of samples in -the batch and `n` is the number of indices per sample. - -2D example: -```lua - -- a lookup table containing 10 tensors of size 3 - module = nn.LookupTable(10, 3) - - -- a batch of 2 samples of 4 indices each - input = torch.Tensor({{1,2,4,5},{4,3,2,10}}) - print(module:forward(input)) -``` - -Outputs something like: -```lua -(1,.,.) = - -0.0570 -1.5354 1.8555 - -0.9067 1.3392 0.6275 - 1.9662 0.4645 -0.8111 - 0.1103 1.7811 1.5969 - -(2,.,.) = - 1.9662 0.4645 -0.8111 - 0.0026 -1.4547 -0.5154 - -0.9067 1.3392 0.6275 - -0.0193 -0.8641 0.7396 -[torch.DoubleTensor of dimension 2x4x3] -``` - - -<a name="nn.TableLayers"/> -## Layers for manipulating tables ## - -This set of modules allows the manipulation of Tables -through the layers of a neural network. -This allows one to build very rich architectures. - -Table-based modules work by supporting forward and backward methods that can accept -tables as inputs. It turns out that the usual [Sequential](#nn.Sequential) module can do this, so all that is needed is other child modules that take advantage of such tables. -```lua -mlp = nn.Sequential(); -t={x,y,z} -pred=mlp:forward(t) -pred=mlp:forward{x,y,z} -- This is equivalent to the line before -``` - -<a name="nn.ConcatTable"/> -### ConcatTable ### - -ConcatTable is a container module that applies each member module to -the same input Tensor. - -Example: -```lua -mlp= nn.ConcatTable() -mlp:add(nn.Linear(5,2)) -mlp:add(nn.Linear(5,3)) - -pred=mlp:forward(torch.randn(5)); -for i,k in pairs(pred) do print(i,k); end -``` -which gives the output: -```lua -1 --0.4073 - 0.0110 -[torch.Tensor of dimension 2] - -2 - 0.0027 --0.0598 --0.1189 -[torch.Tensor of dimension 3] -``` - -<a name="nn.ParallelTable"/> -### ParallelTable ### - -ParallelTable is a container module that, in its `forward` method, applies the `ith` member module to the `ith` input, and outputs a table of the set of outputs. 
- -Example: -```lua -mlp= nn.ParallelTable() -mlp:add(nn.Linear(10,2)) -mlp:add(nn.Linear(5,3)) - -x=torch.randn(10) -y=torch.rand(5) - -pred=mlp:forward{x,y} -for i,k in pairs(pred) do print(i,k); end -``` -which gives the output: -```lua -1 - 0.0331 - 0.7003 -[torch.Tensor of dimension 2] - -2 - 0.0677 --0.1657 --0.7383 -[torch.Tensor of dimension 3] -``` - -<a name="nn.SplitTable"/> -### SplitTable ### - -`module` = `SplitTable(dimension)` - -Creates a module that takes a Tensor as input and outputs several tables, splitting the Tensor along dimension `dimension`. - -Example 1: -```lua -mlp=nn.SplitTable(2) -x=torch.randn(4,3) -pred=mlp:forward(x) -for i,k in pairs(pred) do print(i,k); end -``` -gives the output: -```lua -1 - 1.3885 - 1.3295 - 0.4281 --1.0171 -[torch.Tensor of dimension 4] - -2 --1.1565 --0.8556 --1.0717 --0.8316 -[torch.Tensor of dimension 4] - -3 --1.3678 --0.1709 --0.0191 --2.5871 -[torch.Tensor of dimension 4] -``` - -Example 2: -```lua -mlp=nn.SplitTable(1) -pred=mlp:forward(torch.randn(10,3)) -for i,k in pairs(pred) do print(i,k); end -``` -gives the output: -```lua -1 - 1.6114 - 0.9038 - 0.8419 -[torch.Tensor of dimension 3] - -2 - 2.4742 - 0.2208 - 1.6043 -[torch.Tensor of dimension 3] - -3 - 1.3415 - 0.2984 - 0.2260 -[torch.Tensor of dimension 3] - -4 - 2.0889 - 1.2309 - 0.0983 -[torch.Tensor of dimension 3] -``` - -A more complicated example: -```lua - -mlp=nn.Sequential(); --Create a network that takes a Tensor as input -mlp:add(nn.SplitTable(2)) - c=nn.ParallelTable() --The two Tensors go through two different Linear - c:add(nn.Linear(10,3)) --Layers in Parallel - c:add(nn.Linear(10,7)) -mlp:add(c) --Outputing a table with 2 elements - p=nn.ParallelTable() --These tables go through two more linear layers - p:add(nn.Linear(3,2)) -- separately. - p:add(nn.Linear(7,1)) -mlp:add(p) -mlp:add(nn.JoinTable(1)) --Finally, the tables are joined together and output. - -pred=mlp:forward(torch.randn(10,2)) -print(pred) - -for i=1,100 do -- A few steps of training such a network.. - x=torch.ones(10,2); - y=torch.Tensor(3); y:copy(x:select(2,1,1):narrow(1,1,3)) - pred=mlp:forward(x) - - criterion= nn.MSECriterion() - local err=criterion:forward(pred,y) - local gradCriterion = criterion:backward(pred,y); - mlp:zeroGradParameters(); - mlp:backward(x, gradCriterion); - mlp:updateParameters(0.05); - - print(err) -end -``` - -<a name="nn.JoinTable"/> -### JoinTable ### - -`module` = `JoinTable(dimension)` - -Creates a module that takes a list of Tensors as input and outputs a Tensor by joining them together along dimension `dimension`. - -Example: -```lua -x=torch.randn(5,1) -y=torch.randn(5,1) -z=torch.randn(2,1) - -print(nn.JoinTable(1):forward{x,y}) -print(nn.JoinTable(2):forward{x,y}) -print(nn.JoinTable(1):forward{x,z}) -``` -gives the output: -```lua -1.3965 - 0.5146 --1.5244 --0.9540 - 0.4256 - 0.1575 - 0.4491 - 0.6580 - 0.1784 --1.7362 - - 1.3965 0.1575 - 0.5146 0.4491 --1.5244 0.6580 --0.9540 0.1784 - 0.4256 -1.7362 - - 1.3965 - 0.5146 --1.5244 --0.9540 - 0.4256 --1.2660 - 1.0869 -[torch.Tensor of dimension 7x1] -``` - -A more complicated example: -```lua - -mlp=nn.Sequential(); --Create a network that takes a Tensor as input - c=nn.ConcatTable() --The same Tensor goes through two different Linear - c:add(nn.Linear(10,3)) --Layers in Parallel - c:add(nn.Linear(10,7)) -mlp:add(c) --Outputing a table with 2 elements - p=nn.ParallelTable() --These tables go through two more linear layers - p:add(nn.Linear(3,2)) -- separately. 
- p:add(nn.Linear(7,1)) -mlp:add(p) -mlp:add(nn.JoinTable(1)) --Finally, the tables are joined together and output. - -pred=mlp:forward(torch.randn(10)) -print(pred) - -for i=1,100 do -- A few steps of training such a network.. - x=torch.ones(10); - y=torch.Tensor(3); y:copy(x:narrow(1,1,3)) - pred=mlp:forward(x) - - criterion= nn.MSECriterion() - local err=criterion:forward(pred,y) - local gradCriterion = criterion:backward(pred,y); - mlp:zeroGradParameters(); - mlp:backward(x, gradCriterion); - mlp:updateParameters(0.05); - - print(err) -end -``` - -<a name="nn.Identity"/> -### Identity ### - -`module` = `Identity()` - -Creates a module that returns whatever is input to it as output. -This is useful when combined with the module -[ParallelTable](#nn.ParallelTable) -in case you do not wish to do anything to one of the input Tensors. -Example: -```lua -mlp=nn.Identity() -print(mlp:forward(torch.ones(5,2))) -``` -gives the output: -```lua - 1 1 - 1 1 - 1 1 - 1 1 - 1 1 -[torch.Tensor of dimension 5x2] -``` - -Here is a more useful example, where one can implement a network which also computes a Criterion using this module: -```lua -pred_mlp=nn.Sequential(); -- A network that makes predictions given x. -pred_mlp:add(nn.Linear(5,4)) -pred_mlp:add(nn.Linear(4,3)) - -xy_mlp=nn.ParallelTable();-- A network for predictions and for keeping the -xy_mlp:add(pred_mlp) -- true label for comparison with a criterion -xy_mlp:add(nn.Identity()) -- by forwarding both x and y through the network. - -mlp=nn.Sequential(); -- The main network that takes both x and y. -mlp:add(xy_mlp) -- It feeds x and y to parallel networks; -cr=nn.MSECriterion(); -cr_wrap=nn.CriterionTable(cr) -mlp:add(cr_wrap) -- and then applies the criterion. - -for i=1,100 do -- Do a few training iterations - x=torch.ones(5); -- Make input features. - y=torch.Tensor(3); - y:copy(x:narrow(1,1,3)) -- Make output label. - err=mlp:forward{x,y} -- Forward both input and output. - print(err) -- Print error from criterion. - - mlp:zeroGradParameters(); -- Do backprop... - mlp:backward({x, y} ); - mlp:updateParameters(0.05); -end -``` - -<a name="nn.PairwiseDistance"/> -### PairwiseDistance ### - -`module` = `PairwiseDistance(p)` creates a module that takes a table of two vectors as input and outputs the distance between them using the `p`-norm. - -Example: -```lua -mlp_l1=nn.PairwiseDistance(1) -mlp_l2=nn.PairwiseDistance(2) -x=torch.Tensor(1,2,3) -y=torch.Tensor(4,5,6) -print(mlp_l1:forward({x,y})) -print(mlp_l2:forward({x,y})) -``` -gives the output: -```lua - 9 -[torch.Tensor of dimension 1] - - 5.1962 -[torch.Tensor of dimension 1] -``` - -A more complicated example: -```lua --- imagine we have one network we are interested in, it is called "p1_mlp" -p1_mlp= nn.Sequential(); p1_mlp:add(nn.Linear(5,2)) - --- But we want to push examples towards or away from each other --- so we make another copy of it called p2_mlp --- this *shares* the same weights via the set command, but has its own set of temporary gradient storage --- that's why we create it again (so that the gradients of the pair don't wipe each other) -p2_mlp= nn.Sequential(); p2_mlp:add(nn.Linear(5,2)) -p2_mlp:get(1).weight:set(p1_mlp:get(1).weight) -p2_mlp:get(1).bias:set(p1_mlp:get(1).bias) - --- we make a parallel table that takes a pair of examples as input. 
they both go through the same (cloned) mlp -prl = nn.ParallelTable() -prl:add(p1_mlp) -prl:add(p2_mlp) - --- now we define our top level network that takes this parallel table and computes the pairwise distance betweem --- the pair of outputs -mlp= nn.Sequential() -mlp:add(prl) -mlp:add(nn.PairwiseDistance(1)) - --- and a criterion for pushing together or pulling apart pairs -crit=nn.HingeEmbeddingCriterion(1) - --- lets make two example vectors -x=torch.rand(5) -y=torch.rand(5) - - --- Use a typical generic gradient update function -function gradUpdate(mlp, x, y, criterion, learningRate) -local pred = mlp:forward(x) -local err = criterion:forward(pred, y) -local gradCriterion = criterion:backward(pred, y) -mlp:zeroGradParameters() -mlp:backward(x, gradCriterion) -mlp:updateParameters(learningRate) -end - --- push the pair x and y together, notice how then the distance between them given --- by print(mlp:forward({x,y})[1]) gets smaller -for i=1,10 do -gradUpdate(mlp,{x,y},1,crit,0.01) -print(mlp:forward({x,y})[1]) -end - - --- pull apart the pair x and y, notice how then the distance between them given --- by print(mlp:forward({x,y})[1]) gets larger - -for i=1,10 do -gradUpdate(mlp,{x,y},-1,crit,0.01) -print(mlp:forward({x,y})[1]) -end - -``` - -<a name="nn.DotProduct"/> -### DotProduct ### - -`module` = `DotProduct()` creates a module that takes a table of two vectors as input and outputs the dot product between them. - -Example: -```lua -mlp=nn.DotProduct() -x=torch.Tensor(1,2,3) -y=torch.Tensor(4,5,6) -print(mlp:forward({x,y})) -``` -gives the output: -```lua - 32 -[torch.Tensor of dimension 1] -``` - - -A more complicated example: -```lua - --- Train a ranking function so that mlp:forward({x,y},{x,z}) returns a number --- which indicates whether x is better matched with y or z (larger score = better match), or vice versa. - -mlp1=nn.Linear(5,10) -mlp2=mlp1:clone('weight','bias') - -prl=nn.ParallelTable(); -prl:add(mlp1); prl:add(mlp2) - -mlp1=nn.Sequential() -mlp1:add(prl) -mlp1:add(nn.DotProduct()) - -mlp2=mlp1:clone('weight','bias') - -mlp=nn.Sequential() -prla=nn.ParallelTable() -prla:add(mlp1) -prla:add(mlp2) -mlp:add(prla) - -x=torch.rand(5); -y=torch.rand(5) -z=torch.rand(5) - - -print(mlp1:forward{x,x}) -print(mlp1:forward{x,y}) -print(mlp1:forward{y,y}) - - -crit=nn.MarginRankingCriterion(1); - --- Use a typical generic gradient update function -function gradUpdate(mlp, x, y, criterion, learningRate) - local pred = mlp:forward(x) - local err = criterion:forward(pred, y) - local gradCriterion = criterion:backward(pred, y) - mlp:zeroGradParameters() - mlp:backward(x, gradCriterion) - mlp:updateParameters(learningRate) -end - -inp={{x,y},{x,z}} - -math.randomseed(1) - --- make the pair x and y have a larger dot product than x and z - -for i=1,100 do - gradUpdate(mlp,inp,1,crit,0.05) - o1=mlp1:forward{x,y}[1]; - o2=mlp2:forward{x,z}[1]; - o=crit:forward(mlp:forward{{x,y},{x,z}},1) - print(o1,o2,o) -end - -print "________________**" - --- make the pair x and z have a larger dot product than x and y - -for i=1,100 do - gradUpdate(mlp,inp,-1,crit,0.05) - o1=mlp1:forward{x,y}[1]; - o2=mlp2:forward{x,z}[1]; - o=crit:forward(mlp:forward{{x,y},{x,z}},-1) - print(o1,o2,o) -end -``` - - -<a name="nn.CosineDistance"/> -### CosineDistance ### - -`module` = `CosineDistance()` creates a module that takes a table of two vectors as input and outputs the cosine distance between them. 
-<a name="nn.CosineDistance"/>
-### CosineDistance ###
-
-`module` = `CosineDistance()` creates a module that takes a table of two vectors as input and outputs the cosine distance between them (that is, the cosine of the angle between the two vectors; larger values mean more similar vectors).
-
-Example:
-```lua
-mlp=nn.CosineDistance()
-x=torch.Tensor({1,2,3})
-y=torch.Tensor({4,5,6})
-print(mlp:forward({x,y}))
-```
-gives the output:
-```lua
- 0.9746
-[torch.Tensor of dimension 1]
-```
-
-A more complicated example:
-```lua
--- imagine we have one network we are interested in, it is called "p1_mlp"
-p1_mlp = nn.Sequential(); p1_mlp:add(nn.Linear(5,2))
-
--- But we want to push examples towards or away from each other,
--- so we make another copy of it called p2_mlp, which *shares* the same
--- weights via the clone command, but has its own set of temporary gradient
--- storage (so that the gradients of the pair don't wipe each other)
-p2_mlp = p1_mlp:clone('weight','bias')
-
--- we make a parallel table that takes a pair of examples as input;
--- they both go through the same (cloned) mlp
-prl = nn.ParallelTable()
-prl:add(p1_mlp)
-prl:add(p2_mlp)
-
--- now we define our top level network that takes this parallel table
--- and computes the cosine distance between the pair of outputs
-mlp = nn.Sequential()
-mlp:add(prl)
-mlp:add(nn.CosineDistance())
-
--- let's make two example vectors
-x = torch.rand(5)
-y = torch.rand(5)
-
--- Grad update function: a margin-style update with no explicit criterion;
--- the gradient of the loss with respect to the module output is simply -y
-function gradUpdate(mlp, x, y, learningRate)
-  local pred = mlp:forward(x)
-  if pred[1]*y < 1 then -- only update on margin violations
-    local gradCriterion = torch.Tensor({-y}) -- a 1-element gradient tensor
-    mlp:zeroGradParameters()
-    mlp:backward(x, gradCriterion)
-    mlp:updateParameters(learningRate)
-  end
-end
-
--- push the pair x and y together: the cosine (the module's output) should get larger..
-for i=1,1000 do
-  gradUpdate(mlp, {x,y}, 1, 0.1)
-  if ((i%100)==0) then print(mlp:forward({x,y})[1]) end
-end
-
--- pull apart the pair x and y: the cosine should get smaller..
-for i=1,1000 do
-  gradUpdate(mlp, {x,y}, -1, 0.1)
-  if ((i%100)==0) then print(mlp:forward({x,y})[1]) end
-end
-```
-
-<a name="nn.CriterionTable"/>
-### CriterionTable ###
-
-`module` = `CriterionTable(criterion)`
-
-Creates a module that wraps a Criterion module so that it can accept a table of inputs. Typically the table would contain two elements: the input `x` and the target `y` that the Criterion compares.
-
-Example:
-```lua
-mlp = nn.CriterionTable(nn.MSECriterion())
-x = torch.randn(5)
-y = torch.randn(5)
-print(mlp:forward{x,x})
-print(mlp:forward{x,y})
-```
-gives the output:
-```lua
-0
-1.9028918413199
-```
-
-Here is a more complex example of embedding the criterion into a network:
-```lua
-mlp = nn.Sequential()                -- Create an mlp that takes both the input
-main_mlp = nn.Sequential()           -- and the target, using a ParallelTable.
-main_mlp:add(nn.Linear(5,4))
-main_mlp:add(nn.Linear(4,3))
-cmlp = nn.ParallelTable()
-cmlp:add(main_mlp)
-cmlp:add(nn.Identity())
-mlp:add(cmlp)
-mlp:add(nn.CriterionTable(nn.MSECriterion())) -- Apply the Criterion
-
-for i=1,20 do                        -- Train for a few iterations
-  x = torch.ones(5)
-  y = torch.Tensor(3); y:copy(x:narrow(1,1,3))
-  err = mlp:forward{x,y}             -- Pass in both input and target
-  print(err)
-
-  mlp:zeroGradParameters()
-  mlp:backward({x, y})
-  mlp:updateParameters(0.05)
-end
-```
-<a name="nn.CAddTable"/>
-### CAddTable ###
-
-Takes a table of tensors and outputs the sum of all of them.
-
-```lua
-ii = {torch.ones(5), torch.ones(5)*2, torch.ones(5)*3}
-=ii[1]
- 1
- 1
- 1
- 1
- 1
-[torch.DoubleTensor of dimension 5]
-
-=ii[2]
- 2
- 2
- 2
- 2
- 2
-[torch.DoubleTensor of dimension 5]
-
-=ii[3]
- 3
- 3
- 3
- 3
- 3
-[torch.DoubleTensor of dimension 5]
-
-m=nn.CAddTable()
-=m:forward(ii)
- 6
- 6
- 6
- 6
- 6
-[torch.DoubleTensor of dimension 5]
-```
-
-<a name="nn.CSubTable"/>
-### CSubTable ###
-
-Takes a table with two tensors and returns the component-wise
-subtraction between them.
-
-```lua
-m=nn.CSubTable()
-=m:forward({torch.ones(5)*2.2, torch.ones(5)})
- 1.2000
- 1.2000
- 1.2000
- 1.2000
- 1.2000
-[torch.DoubleTensor of dimension 5]
-```
-
-<a name="nn.CMulTable"/>
-### CMulTable ###
-
-Takes a table of tensors and outputs the component-wise multiplication of all of them.
-
-```lua
-ii = {torch.ones(5)*2, torch.ones(5)*3, torch.ones(5)*4}
-m=nn.CMulTable()
-=m:forward(ii)
- 24
- 24
- 24
- 24
- 24
-[torch.DoubleTensor of dimension 5]
-```
-
-<a name="nn.CDivTable"/>
-### CDivTable ###
-
-Takes a table with two tensors and returns the component-wise
-division between them.
-
-```lua
-m=nn.CDivTable()
-=m:forward({torch.ones(5)*2.2, torch.ones(5)*4.4})
- 0.5000
- 0.5000
- 0.5000
- 0.5000
- 0.5000
-[torch.DoubleTensor of dimension 5]
-```
-
-<a name="nn.Criterions"/>
-# Criterions #
-
-Criterions are helpful to train a neural network. Given an input and a
-target, they compute a gradient according to a given loss
-function. [AbsCriterion](#nn.AbsCriterion) and
-[MSECriterion](#nn.MSECriterion) are perfect for regression problems, while
-[ClassNLLCriterion](#nn.ClassNLLCriterion) is the criterion of choice when
-dealing with classification.
-
-Criterions are [serializable](..:torch:file#torch.file.serialization).
-
-<a name="nn.Criterion"/>
-## Criterion ##
-
-This is an abstract class which declares the methods defined in all criterions.
-This class is [serializable](..:torch:file#torch.file.serialization).
-
-<a name="nn.Criterion.forward"/>
-### [output] forward(input, target) ###
-
-Given an `input` and a `target`, compute the loss function associated with the criterion and return the
-result. In general `input` and `target` are [tensors](..:torch:tensor), but some specific criterions
-might require some other type of object.
-
-The `output` returned should in general be a scalar.
-
-The state variable [self.output](#nn.Criterion.output) should be updated after a call to `forward()`.
-
-<a name="nn.Criterion.backward"/>
-### [gradInput] backward(input, target) ###
-
-Given an `input` and a `target`, compute the gradients of the loss function associated with the criterion and
-return the result. In general `input`, `target` and `gradInput` are [tensors](..:torch:tensor), but some specific criterions
-might require some other type of object.
-
-The state variable [self.gradInput](#nn.Criterion.gradInput) should be updated after a call to `backward()`.
-
-<a name="nn.Criterion.output"/>
-### State variable: output ###
-
-State variable which contains the result of the last [forward(input, target)](#nn.Criterion.forward) call.
-
-<a name="nn.Criterion.gradInput"/>
-### State variable: gradInput ###
-
-State variable which contains the result of the last [backward(input, target)](#nn.Criterion.backward) call.
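-
-This interface is all a new criterion needs to implement. The following is a
-minimal sketch, for illustration only (the class `nn.MyAbsCriterion` is
-hypothetical and not part of the library): a custom criterion computing the
-mean absolute error, updating the `output` and `gradInput` state variables
-as described above.
-```lua
-require "nn"
-
-local MyAbsCriterion, parent = torch.class('nn.MyAbsCriterion', 'nn.Criterion')
-
-function MyAbsCriterion:__init()
-  parent.__init(self)
-end
-
-function MyAbsCriterion:forward(input, target)
-  -- loss(x,y) = 1/n sum_i |x_i - y_i|
-  self.output = torch.add(input, -1, target):abs():sum() / input:nElement()
-  return self.output
-end
-
-function MyAbsCriterion:backward(input, target)
-  -- gradient with respect to the input: sign(x_i - y_i) / n
-  self.gradInput = torch.add(input, -1, target):sign():div(input:nElement())
-  return self.gradInput
-end
-```
-Such a criterion can then be used like any built-in one, for instance with
-the generic gradient update functions shown in the examples above.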
-<a name="nn.AbsCriterion"/>
-## AbsCriterion ##
-
-```lua
-criterion = nn.AbsCriterion()
-```
-
-Creates a criterion that
-measures the mean absolute error between the `n` elements of the input `x`
-and the target `y`:
-
-`loss(x,y)` = `1/n \sum |x_i-y_i|`.
-
-If `x` and `y` are `d`-dimensional Tensors with a total of `n` elements,
-the sum operation still operates over all the elements, and divides by `n`.
-
-The division by `n` can be avoided if one sets the internal variable `sizeAverage` to `false`:
-```lua
-criterion = nn.AbsCriterion()
-criterion.sizeAverage = false
-```
-
-<a name="nn.ClassNLLCriterion"/>
-## ClassNLLCriterion ##
-
-```lua
-criterion = nn.ClassNLLCriterion()
-```
-
-The negative log likelihood criterion. It is useful to train a classification
-problem with `n` classes. The `input` given through a `forward()` is
-expected to contain _log-probabilities_ of each class: `input` has to be a
-1D tensor of size `n`. Obtaining log-probabilities in a neural network is
-easily achieved by adding a [LogSoftMax](#nn.LogSoftMax) layer as the last
-layer of your neural network.
-
-This criterion expects a class index (from 1 to the number of classes) as `target`
-when calling [forward(input, target)](#nn.Criterion.forward) and
-[backward(input, target)](#nn.Criterion.backward).
-
-The loss can be described as:
-```lua
-loss(x, class) = forward(x, class) = -x[class]
-```
-
-The following is a code fragment showing how to make a gradient step
-given an input `x`, a desired output `y` (an integer `1` to `n`,
-in this case `n` = `2` classes),
-a network `mlp` and a learning rate `learningRate`:
-```lua
-function gradUpdate(mlp, x, y, learningRate)
-  local criterion = nn.ClassNLLCriterion()
-  local pred = mlp:forward(x)
-  local err = criterion:forward(pred, y)
-  mlp:zeroGradParameters()
-  local t = criterion:backward(pred, y)
-  mlp:backward(x, t)
-  mlp:updateParameters(learningRate)
-end
-```
-
-<a name="nn.MarginCriterion"/>
-## MarginCriterion ##
-
-```lua
-criterion = nn.MarginCriterion()
-```
-
-Creates a criterion that optimizes a two-class classification hinge loss (margin-based loss) between input `x` (a Tensor of dimension 1) and output `y` (which is a scalar, either 1 or -1):
-
-```lua
-loss(x,y) = forward(x,y) = max(0, m - y*x)
-```
-
-`m` is the margin, which defaults to 1.
-
-```lua
-criterion = nn.MarginCriterion(marginValue)
-```
-
-sets a different value of `m`.
-
-Example:
-```lua
-require "nn"
-
-function gradUpdate(mlp, x, y, criterion, learningRate)
-  local pred = mlp:forward(x)
-  local err = criterion:forward(pred, y)
-  local gradCriterion = criterion:backward(pred, y)
-  mlp:zeroGradParameters()
-  mlp:backward(x, gradCriterion)
-  mlp:updateParameters(learningRate)
-end
-
-mlp = nn.Sequential()
-mlp:add(nn.Linear(5,1))
-
-x1 = torch.rand(5)
-x2 = torch.rand(5)
-criterion = nn.MarginCriterion(1)
-
-for i=1,1000 do
-  gradUpdate(mlp, x1, 1, criterion, 0.01)
-  gradUpdate(mlp, x2, -1, criterion, 0.01)
-end
-
-print(mlp:forward(x1))
-print(mlp:forward(x2))
-
-print(criterion:forward(mlp:forward(x1), 1))
-print(criterion:forward(mlp:forward(x2), -1))
-```
-gives the output:
-```lua
- 1.0043
-[torch.Tensor of dimension 1]
-
--1.0061
-[torch.Tensor of dimension 1]
-
-0
-0
-```
-i.e. the mlp successfully separates the two data points, such that they both have a margin of 1 and hence a loss of 0.
-
-<a name="nn.MultiMarginCriterion"/>
-## MultiMarginCriterion ##
-
-```lua
-criterion = nn.MultiMarginCriterion()
-```
-
-Creates a criterion that optimizes a multi-class classification hinge loss (margin-based loss) between input `x` (a Tensor of dimension 1) and output `y` (a target class index, `1 <= y <= x:size(1)`):
-
-```lua
-loss(x,y) = forward(x,y) = sum_i(max(0, 1 - (x[y] - x[i]))) / x:size(1)
-```
-
-where `i` ranges from 1 to `x:size(1)` with `i ~= y`.
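-
-As a quick numerical illustration (this snippet is not from the original
-documentation), consider three class scores with target class `2`; only the
-terms with `i ~= y` contribute:
-```lua
-criterion = nn.MultiMarginCriterion()
-x = torch.Tensor({0.1, 0.8, 0.4})
--- i=1: max(0, 1 - (0.8 - 0.1)) = 0.3
--- i=3: max(0, 1 - (0.8 - 0.4)) = 0.6
--- loss = (0.3 + 0.6) / 3 = 0.3
-print(criterion:forward(x, 2))
-```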
-<a name="nn.MSECriterion"/>
-## MSECriterion ##
-
-```lua
-criterion = nn.MSECriterion()
-```
-
-Creates a criterion that measures the mean squared error between the `n` elements of the input `x`
-and the target `y`:
-
-```lua
-loss(x,y) = forward(x,y) = 1/n \sum |x_i-y_i|^2
-```
-
-If `x` and `y` are `d`-dimensional Tensors with a total of `n` elements,
-the sum operation still operates over all the elements, and divides by `n`. The two Tensors must
-have the same number of elements (but their sizes may differ).
-
-The division by `n` can be avoided if one sets the internal variable `sizeAverage` to `false`:
-```lua
-criterion = nn.MSECriterion()
-criterion.sizeAverage = false
-```
-
-<a name="nn.MultiCriterion"/>
-## MultiCriterion ##
-
-```lua
-criterion = nn.MultiCriterion()
-```
-
-This returns a Criterion which is a weighted sum of other Criterions.
-Criterions are added using the method:
-
-`criterion:add(singleCriterion, weight)`
-
-where `weight` is a scalar.
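-
-For example (a usage sketch, not from the original documentation), a weighted
-combination of the mean squared error and the mean absolute error:
-```lua
-criterion = nn.MultiCriterion()
-criterion:add(nn.MSECriterion(), 0.5) -- weighted by 0.5
-criterion:add(nn.AbsCriterion(), 1)   -- weighted by 1
-
-x = torch.randn(5)
-y = torch.randn(5)
--- the result equals 0.5 * MSE(x,y) + 1 * ABS(x,y)
-print(criterion:forward(x, y))
-```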
-<a name="nn.HingeEmbeddingCriterion"/>
-## HingeEmbeddingCriterion ##
-
-```lua
-criterion = nn.HingeEmbeddingCriterion()
-```
-
-Creates a criterion that measures the loss given an input
-`x` which is a 1-dimensional vector and a label `y` (1 or -1).
-This is usually used for measuring whether two inputs are similar
-or dissimilar, e.g. using the L1 pairwise distance, and is typically used for
-learning nonlinear embeddings or semi-supervised learning.
-
-```lua
-loss(x,y) = forward(x,y) = x,                  if y ==  1
-                         = max(0, margin - x), if y == -1
-```
-
-The `margin` has a default value of 1, or can be set in the constructor:
-```lua
-criterion = nn.HingeEmbeddingCriterion(marginValue)
-```
-
-Example use:
-```lua
--- imagine we have one network we are interested in, it is called "p1_mlp"
-p1_mlp = nn.Sequential(); p1_mlp:add(nn.Linear(5,2))
-
--- But we want to push examples towards or away from each other,
--- so we make another copy of it called p2_mlp.
--- It *shares* the same weights via the set command, but has its own set of temporary gradient storage;
--- that's why we create it again (so that the gradients of the pair don't wipe each other)
-p2_mlp = nn.Sequential(); p2_mlp:add(nn.Linear(5,2))
-p2_mlp:get(1).weight:set(p1_mlp:get(1).weight)
-p2_mlp:get(1).bias:set(p1_mlp:get(1).bias)
-
--- we make a parallel table that takes a pair of examples as input;
--- they both go through the same (cloned) mlp
-prl = nn.ParallelTable()
-prl:add(p1_mlp)
-prl:add(p2_mlp)
-
--- now we define our top level network that takes this parallel table
--- and computes the pairwise distance between the pair of outputs
-mlp = nn.Sequential()
-mlp:add(prl)
-mlp:add(nn.PairwiseDistance(1))
-
--- and a criterion for pushing together or pulling apart pairs
-crit = nn.HingeEmbeddingCriterion(1)
-
--- let's make two example vectors
-x = torch.rand(5)
-y = torch.rand(5)
-
--- Use a typical generic gradient update function
-function gradUpdate(mlp, x, y, criterion, learningRate)
-  local pred = mlp:forward(x)
-  local err = criterion:forward(pred, y)
-  local gradCriterion = criterion:backward(pred, y)
-  mlp:zeroGradParameters()
-  mlp:backward(x, gradCriterion)
-  mlp:updateParameters(learningRate)
-end
-
--- push the pair x and y together, notice how the distance between them given
--- by print(mlp:forward({x,y})[1]) gets smaller
-for i=1,10 do
-  gradUpdate(mlp, {x,y}, 1, crit, 0.01)
-  print(mlp:forward({x,y})[1])
-end
-
--- pull apart the pair x and y, notice how the distance between them given
--- by print(mlp:forward({x,y})[1]) gets larger
-for i=1,10 do
-  gradUpdate(mlp, {x,y}, -1, crit, 0.01)
-  print(mlp:forward({x,y})[1])
-end
-```
-
-<a name="nn.L1HingeEmbeddingCriterion"/>
-## L1HingeEmbeddingCriterion ##
-
-```lua
-criterion = nn.L1HingeEmbeddingCriterion(margin)
-```
-
-Creates a criterion that measures the loss given an input
-`x` = `{x1, x2}`, a table of two tensors, and a label `y` (1 or -1).
-This is used for measuring whether two inputs are similar
-or dissimilar, using the L1 distance, and is typically used for
-learning nonlinear embeddings or semi-supervised learning.
-
-```lua
-loss(x,y) = forward(x,y) = ||x1-x2||_1,                  if y ==  1
-                         = max(0, margin - ||x1-x2||_1), if y == -1
-```
-
-The `margin` has a default value of 1, or can be set in the constructor:
-```lua
-criterion = nn.L1HingeEmbeddingCriterion(marginValue)
-```
-
-<a name="nn.CosineEmbeddingCriterion"/>
-## CosineEmbeddingCriterion ##
-
-```lua
-criterion = nn.CosineEmbeddingCriterion(margin)
-```
-
-Creates a criterion that measures the loss given an input
-`x` = `{x1, x2}`, a table of two tensors, and a label `y` (1 or -1).
-This is used for measuring whether two inputs are similar
-or dissimilar, using the cosine similarity, and is typically used for
-learning nonlinear embeddings or semi-supervised learning.
-
-`margin` should be a number from -1 to 1; 0 to 0.5 is suggested.
-If `margin` is missing, the default value is 0.
-Forward and backward have to be used alternately.
-
-The loss function is:
-```lua
-loss(x,y) = forward(x,y) = 1 - cos(x1, x2),              if y ==  1
-                         = max(0, cos(x1, x2) - margin), if y == -1
-```
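-
-As a quick numerical check (a sketch, not from the original documentation),
-reusing the vectors from the [CosineDistance](#nn.CosineDistance) example
-above, whose cosine is about 0.9746:
-```lua
-criterion = nn.CosineEmbeddingCriterion(0.3)
-x1 = torch.Tensor({1, 2, 3})
-x2 = torch.Tensor({4, 5, 6})
-print(criterion:forward({x1, x2}, 1)) -- 1 - 0.9746, i.e. about 0.0254
--- with y = -1 the loss would instead be max(0, 0.9746 - 0.3), about 0.6746
-```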
-<a name="nn.MarginRankingCriterion"/>
-## MarginRankingCriterion ##
-
-```lua
-criterion = nn.MarginRankingCriterion(margin)
-```
-
-Creates a criterion that measures the loss given an input
-`x` = `{x1, x2}`, a table of two Tensors of size 1 (they contain only scalars),
-and a label `y` (1 or -1).
-
-If `y` = `1` then it is assumed that the first input should be ranked higher (have a larger value)
-than the second input, and vice-versa for `y` = `-1`.
-
-The loss function is:
-```lua
-loss(x,y) = forward(x,y) = max(0, -y*(x[1]-x[2]) + margin)
-```
-
-Example:
-```lua
-p1_mlp = nn.Linear(5,2)
-p2_mlp = p1_mlp:clone('weight','bias')
-
-prl = nn.ParallelTable()
-prl:add(p1_mlp)
-prl:add(p2_mlp)
-
-mlp1 = nn.Sequential()
-mlp1:add(prl)
-mlp1:add(nn.DotProduct())
-
-mlp2 = mlp1:clone('weight','bias')
-
-mlpa = nn.Sequential()
-prla = nn.ParallelTable()
-prla:add(mlp1)
-prla:add(mlp2)
-mlpa:add(prla)
-
-crit = nn.MarginRankingCriterion(0.1)
-
-x = torch.randn(5)
-y = torch.randn(5)
-z = torch.randn(5)
-
--- Use a typical generic gradient update function
-function gradUpdate(mlp, x, y, criterion, learningRate)
-  local pred = mlp:forward(x)
-  local err = criterion:forward(pred, y)
-  local gradCriterion = criterion:backward(pred, y)
-  mlp:zeroGradParameters()
-  mlp:backward(x, gradCriterion)
-  mlp:updateParameters(learningRate)
-end
-
--- make the pair {x,y} rank higher than the pair {x,z}
-for i=1,100 do
-  gradUpdate(mlpa, {{x,y},{x,z}}, 1, crit, 0.01)
-  o1 = mlp1:forward{x,y}[1]
-  o2 = mlp2:forward{x,z}[1]
-  o = crit:forward(mlpa:forward{{x,y},{x,z}}, 1)
-  print(o1, o2, o)
-end
-
-print "--"
-
--- now make the pair {x,z} rank higher than the pair {x,y}
-for i=1,100 do
-  gradUpdate(mlpa, {{x,y},{x,z}}, -1, crit, 0.01)
-  o1 = mlp1:forward{x,y}[1]
-  o2 = mlp2:forward{x,z}[1]
-  o = crit:forward(mlpa:forward{{x,y},{x,z}}, -1)
-  print(o1, o2, o)
-end
-```
-
-<a name="nn.traningneuralnet.dok"/>
-# Training a neural network #
-
-Training a neural network is easy with a [simple `for` loop](#nn.DoItYourself).
-While doing your own loop provides great flexibility, you might
-sometimes want a quick way of training neural
-networks. [StochasticGradient](#nn.StochasticGradient), a simple class
-which does the job for you, is provided as standard.
-
-<a name="nn.StochasticGradient.dok"/>
-## StochasticGradient ##
-
-`StochasticGradient` is a high-level class for training [neural networks](#nn.Module) using a stochastic gradient
-algorithm. This class is [serializable](..:torch:file#torch.file.serialization).
-
-<a name="nn.StochasticGradient"/>
-### StochasticGradient(module, criterion) ###
-
-Create a `StochasticGradient` instance, using the given [Module](#nn.Module) and [Criterion](#nn.Criterion).
-The class contains [several parameters](#nn.StochasticGradientParameters) you might want to set after initialization.
-
-<a name="nn.StochasticGradientTrain"/>
-### train(dataset) ###
-
-Train the module and criterion given in the
-[constructor](#nn.StochasticGradient) over `dataset`, using the
-internal [parameters](#nn.StochasticGradientParameters).
-
-`StochasticGradient` expects as a `dataset` an object which implements the operator
-`dataset[index]` and the method `dataset:size()`. The `size()` method
-returns the number of examples and `dataset[i]` has to return the i-th example.
-
-An `example` has to be an object which implements the operator
-`example[field]`, where `field` may take the value `1` (input features)
-or `2` (corresponding label which will be given to the criterion).
-The input is usually a Tensor (except if you use special kinds of modules,
-like [table layers](#nn.TableLayers)). The label type depends on the criterion.
-For example, the [MSECriterion](#nn.MSECriterion) expects a Tensor, but the
-[ClassNLLCriterion](#nn.ClassNLLCriterion) expects an integer number (the class index).
-
-Such a dataset is easily constructed by using Lua tables, but it could be any object,
-for example a `C` object, as long as the required operators/methods are implemented.
-[See an example](#nn.DoItStochasticGradient).
-
-<a name="nn.StochasticGradientParameters"/>
-### Parameters ###
-
-`StochasticGradient` has several fields which have an impact on a call to [train()](#nn.StochasticGradientTrain).
-
- * `learningRate`: This is the learning rate used during training. The update of the parameters will be `parameters = parameters - learningRate * parameters_gradient`. Default value is `0.01`.
- * `learningRateDecay`: The learning rate decay. If non-zero, the learning rate (note: the field `learningRate` will not change its value) will be computed after each iteration (pass over the dataset) with: `current_learning_rate = learningRate / (1 + iteration * learningRateDecay)`
- * `maxIteration`: The maximum number of iterations (passes over the dataset). Default is `25`.
- * `shuffleIndices`: Boolean which says if the examples will be randomly sampled or not. Default is `true`. If `false`, the examples will be taken in the order of the dataset.
- * `hookExample`: A possible hook function which will be called (if non-nil) during training after each example has been forwarded and backwarded through the network. The function takes `(self, example)` as parameters. Default is `nil`.
- * `hookIteration`: A possible hook function which will be called (if non-nil) during training after a complete pass over the dataset. The function takes `(self, iteration)` as parameters. Default is `nil`. (A usage sketch of both hooks follows this list.)
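-
-For instance (a sketch, not from the original documentation; `mlp`,
-`criterion` and `dataset` stand for a module, a criterion and a dataset
-built as in the example below), the hooks can be used to monitor training:
-```lua
-trainer = nn.StochasticGradient(mlp, criterion)
-trainer.maxIteration = 5
-
--- called once after every complete pass over the dataset
-trainer.hookIteration = function(self, iteration)
-  print('# pass ' .. iteration .. ' over the dataset done')
-end
-
--- called after each example has been forwarded and backwarded
-trainer.hookExample = function(self, example)
-  -- example[1] is the input, example[2] the label; inspect them here
-end
-
-trainer:train(dataset)
-```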
-<a name="nn.DoItStochasticGradient"/>
-## Example of training using StochasticGradient ##
-
-We show an example here on the classical XOR problem.
-
-__Dataset__
-
-We first need to create a dataset, following the conventions described in
-[StochasticGradient](#nn.StochasticGradientTrain).
-```lua
-dataset = {}
-function dataset:size() return 100 end -- 100 examples
-for i=1,dataset:size() do
-  local input = torch.randn(2)  -- normally distributed example in 2d
-  local output = torch.Tensor(1)
-  if input[1]*input[2] > 0 then -- calculate label for XOR function
-    output[1] = -1
-  else
-    output[1] = 1
-  end
-  dataset[i] = {input, output}
-end
-```
-
-__Neural Network__
-
-We create a simple neural network with one hidden layer.
-```lua
-require "nn"
-mlp = nn.Sequential()              -- make a multi-layer perceptron
-inputs = 2; outputs = 1; HUs = 20  -- parameters
-mlp:add(nn.Linear(inputs, HUs))
-mlp:add(nn.Tanh())
-mlp:add(nn.Linear(HUs, outputs))
-```
-
-__Training__
-
-We choose the Mean Squared Error criterion and train the beast.
-```lua
-criterion = nn.MSECriterion()
-trainer = nn.StochasticGradient(mlp, criterion)
-trainer.learningRate = 0.01
-trainer:train(dataset)
-```
-
-__Test the network__
-
-```lua
-x = torch.Tensor(2)
-x[1] =  0.5; x[2] =  0.5; print(mlp:forward(x))
-x[1] =  0.5; x[2] = -0.5; print(mlp:forward(x))
-x[1] = -0.5; x[2] =  0.5; print(mlp:forward(x))
-x[1] = -0.5; x[2] = -0.5; print(mlp:forward(x))
-```
-
-You should see something like:
-```lua
-> x = torch.Tensor(2)
-> x[1] = 0.5; x[2] = 0.5; print(mlp:forward(x))
-
--0.3490
-[torch.Tensor of dimension 1]
-
-> x[1] = 0.5; x[2] = -0.5; print(mlp:forward(x))
-
- 1.0561
-[torch.Tensor of dimension 1]
-
-> x[1] = -0.5; x[2] = 0.5; print(mlp:forward(x))
-
- 0.8640
-[torch.Tensor of dimension 1]
-
-> x[1] = -0.5; x[2] = -0.5; print(mlp:forward(x))
-
--0.2941
-[torch.Tensor of dimension 1]
-```
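-
-To get a rough training error estimate after training (a sketch, not from
-the original documentation; it reuses the `dataset`, `mlp` and `criterion`
-defined above), one can average the criterion output over the dataset:
-```lua
-local totalErr = 0
-for i=1,dataset:size() do
-  local input, target = dataset[i][1], dataset[i][2]
-  totalErr = totalErr + criterion:forward(mlp:forward(input), target)
-end
-print('mean training error: ' .. totalErr/dataset:size())
-```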
-<a name="nn.DoItYourself"/>
-## Example of manual training of a neural network ##
-
-We show an example here on the classical XOR problem.
-
-__Neural Network__
-
-We create a simple neural network with one hidden layer.
-```lua
-require "nn"
-mlp = nn.Sequential()              -- make a multi-layer perceptron
-inputs = 2; outputs = 1; HUs = 20  -- parameters
-mlp:add(nn.Linear(inputs, HUs))
-mlp:add(nn.Tanh())
-mlp:add(nn.Linear(HUs, outputs))
-```
-
-__Loss function__
-
-We choose the Mean Squared Error criterion.
-```lua
-criterion = nn.MSECriterion()
-```
-
-__Training__
-
-We create data _on the fly_ and feed it to the neural network.
-
-```lua
-for i = 1,2500 do
-  -- random sample
-  local input = torch.randn(2)    -- normally distributed example in 2d
-  local output = torch.Tensor(1)
-  if input[1]*input[2] > 0 then   -- calculate label for XOR function
-    output[1] = -1
-  else
-    output[1] = 1
-  end
-
-  -- feed it to the neural network and the criterion
-  criterion:forward(mlp:forward(input), output)
-
-  -- train over this example in 3 steps
-  -- (1) zero the accumulation of the gradients
-  mlp:zeroGradParameters()
-  -- (2) accumulate gradients
-  mlp:backward(input, criterion:backward(mlp.output, output))
-  -- (3) update parameters with a 0.01 learning rate
-  mlp:updateParameters(0.01)
-end
-```
-
-__Test the network__
-
-```lua
-x = torch.Tensor(2)
-x[1] =  0.5; x[2] =  0.5; print(mlp:forward(x))
-x[1] =  0.5; x[2] = -0.5; print(mlp:forward(x))
-x[1] = -0.5; x[2] =  0.5; print(mlp:forward(x))
-x[1] = -0.5; x[2] = -0.5; print(mlp:forward(x))
-```
-
-You should see something like:
-```lua
-> x = torch.Tensor(2)
-> x[1] = 0.5; x[2] = 0.5; print(mlp:forward(x))
-
--0.6140
-[torch.Tensor of dimension 1]
-
-> x[1] = 0.5; x[2] = -0.5; print(mlp:forward(x))
-
- 0.8878
-[torch.Tensor of dimension 1]
-
-> x[1] = -0.5; x[2] = 0.5; print(mlp:forward(x))
-
- 0.8548
-[torch.Tensor of dimension 1]
-
-> x[1] = -0.5; x[2] = -0.5; print(mlp:forward(x))
-
--0.5498
-[torch.Tensor of dimension 1]
-```