====== Neural Network Package ======
{{anchor:nn.dok}}

This package provides an easy way to build and train simple or complex
neural networks.

A network is composed of [[#nn.Modules|Modules]], and there are several
sub-classes of ''Module'' available: container classes like
[[#nn.Sequential|Sequential]], [[#nn.Parallel|Parallel]] and
[[#nn.Concat|Concat]], which can contain simple layers like
[[#nn.Linear|Linear]], [[#nn.Mean|Mean]], [[#nn.Max|Max]] and
[[#nn.Reshape|Reshape]], as well as convolutional layers and transfer
functions like [[#nn.Tanh|Tanh]].

Loss functions are implemented as sub-classes of
[[#nn.Criterions|Criterion]]. They are helpful to train neural networks on
classical tasks. Common criterions are the Mean Squared Error
criterion implemented in [[#nn.MSECriterion|MSECriterion]] and the
cross-entropy criterion implemented in
[[#nn.ClassNLLCriterion|ClassNLLCriterion]].

Finally, the [[#nn.StochasticGradient|StochasticGradient]] class provides a
high-level way to train the neural network of choice, even though it is
easy to [[#nn.DoItYourself|train a neural network yourself]] with a simple for loop.

For those who want to implement their own modules, we suggest using
the ''nn.Jacobian'' class for testing the derivatives of their class,
together with the [[..:torch:tester|torch.Tester]] class. The sources
of the ''nn'' package contain plenty of examples of such tests.


====== Detailed Overview of the Neural Network Package ======
{{anchor:nn.overview.dok}}

**Module**

A neural network is called a [[#nn.Module|Module]] (or simply
//module// in this documentation) in Torch. ''Module'' is an abstract
class which defines four main methods:
  * [[#nn.Module.forward|forward(input)]], which computes the output of the module given the ''input'' [[..:torch:tensor|Tensor]].
  * [[#nn.Module.backward|backward(input, gradOutput)]], which computes the gradients of the module with respect to its own parameters and its own inputs.
  * [[#nn.Module.zeroGradParameters|zeroGradParameters()]], which zeroes the gradients with respect to the parameters of the module.
  * [[#nn.Module.updateParameters|updateParameters(learningRate)]], which updates the parameters after one has computed the gradients with ''backward()''.

It also declares two members:
  * [[#nn.Module.output|output]], which is the output returned by ''forward()''.
  * [[#nn.Module.gradInput|gradInput]], which contains the gradients with respect to the input of the module, computed in a ''backward()''.

Two other perhaps less used but handy methods are also defined:
  * [[#nn.Module.share|share(mlp,s1,s2,...,sn)]], which makes this module share the parameters s1,...,sn of the module ''mlp''. This is useful if you want to have modules that share the same weights.
  * [[#nn.Module.clone|clone(...)]], which produces a deep copy of (i.e. not just a pointer to) this Module, including the current state of its parameters (if any).

Some important remarks:
  * ''output'' contains only valid values after a [[#nn.Module.forward|forward(input)]].
  * ''gradInput'' contains only valid values after a [[#nn.Module.backward|backward(input, gradOutput)]].
  * [[#nn.Module.backward|backward(input, gradOutput)]] reuses certain computations obtained during [[#nn.Module.forward|forward(input)]]. You //must// call ''forward()'' before calling ''backward()'', on the //same// ''input'', or your gradients are going to be incorrect!
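
To make this contract concrete, here is a minimal sketch of the usage pattern (the module and ''lab'' calls are the ones used throughout this documentation):
<file lua>
-- forward first, then backward on the *same* input
m = nn.Linear(10, 10)
x = lab.randn(10)
out = m:forward(x)
gradOut = lab.randn(10)          -- stands in for a gradient coming from a criterion
gradIn = m:backward(x, gradOut)  -- valid, because forward(x) was called just before
</file>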

**Plug and play**

Building a simple neural network can be achieved by constructing an available layer.
A linear neural network (perceptron!) is built in only one line:
<file lua>
mlp = nn.Linear(10,1) -- perceptron with 10 inputs
</file>

More complex neural networks are easily built using the container classes
[[#nn.Sequential|Sequential]] and [[#nn.Concat|Concat]]. ''Sequential'' plugs
layers together in a feed-forward fully connected manner. ''Concat'' concatenates
several modules into one layer: they take the same inputs, and their outputs are
concatenated.

Creating a one hidden-layer multi-layer perceptron is thus just as easy as:
<file lua>
mlp = nn.Sequential()
mlp:add( nn.Linear(10, 25) ) -- 10 inputs, 25 hidden units
mlp:add( nn.Tanh() ) -- some hyperbolic tangent transfer function
mlp:add( nn.Linear(25, 1) ) -- 1 output
</file>

Of course, ''Sequential'' and ''Concat'' can contain other
''Sequential'' or ''Concat'' modules, allowing you to try the craziest neural
networks you ever dreamt of! See the [[#nn.Modules|complete list of
available modules]].

**Training a neural network**

Once you have built your neural network, you have to choose a particular
[[#nn.Criterions|Criterion]] to train it. A criterion is a class which
describes the cost to be minimized during training.

You can then train the neural network by using the
[[#nn.StochasticGradient|StochasticGradient]] class.

<file lua>
criterion = nn.MSECriterion() -- Mean Squared Error criterion
trainer = nn.StochasticGradient(mlp, criterion)
trainer:train(dataset) -- train using some examples
</file>

''StochasticGradient'' expects as a ''dataset'' an object which implements
the operator ''dataset[index]'' and the method
''dataset:size()''. The ''size()'' method returns the number of
examples and ''dataset[i]'' has to return the i-th example.

An ''example'' has to be an object which implements the operator
''example[field]'', where ''field'' can take the value ''1'' (input
features) or ''2'' (the corresponding label which will be given to the
criterion). The input is usually a Tensor (except if you use special
kinds of modules, like [[#nn.TableLayers|table layers]]). The
label type depends on the criterion. For example, the
[[#nn.MSECriterion|MSECriterion]] expects a Tensor, but the
[[#nn.ClassNLLCriterion|ClassNLLCriterion]] expects an integer (the
class).

Such a dataset is easily constructed by using Lua tables, but it could be
any ''C'' object, for example, as long as the required operators/methods
are implemented. [[#nn.DoItStochasticGradient|See an example]].
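
For instance, a dataset satisfying this contract can be sketched with a plain Lua table (the values below are a placeholder for real training data):
<file lua>
dataset = {}
function dataset:size() return 100 end -- 100 examples
for i = 1, dataset:size() do
   local input = lab.randn(10)    -- a random 10-dimensional input
   local target = torch.Tensor(1)
   target[1] = input[1]           -- a toy regression target
   dataset[i] = {input, target}   -- example[1] = input, example[2] = label
end
</file>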

''StochasticGradient'' being written in ''Lua'', it is extremely easy
to cut-and-paste it and create a variant adapted to your needs
(if the constraints of ''StochasticGradient'' do not satisfy you).

**Low Level Training of a Neural Network**

If you want to program the ''StochasticGradient'' by hand, you
essentially need to control the use of forwards and backwards through
the network yourself. For example, here is the code fragment one
would need to make a gradient step given an input ''x'', a desired
output ''y'', a network ''mlp'', a given criterion ''criterion''
and a learning rate ''learningRate'':

<file lua>
function gradUpdate(mlp, x, y, criterion, learningRate)
   local pred = mlp:forward(x)
   local err = criterion:forward(pred, y)
   local gradCriterion = criterion:backward(pred, y)
   mlp:zeroGradParameters()
   mlp:backward(x, gradCriterion)
   mlp:updateParameters(learningRate)
end
</file>
For example, if you wish to use your own criterion you can simply replace
''gradCriterion'' with the gradient vector of your criterion of choice.


====== Modules ======
{{anchor:nn.Modules}}

Modules are bricks to build neural networks. A [[#nn.Module|Module]] is a neural network
by itself, but it can be combined with other networks using [[#nn.Containers|container classes]] to create
complex neural networks.

===== Module =====
{{anchor:nn.Module}}

''Module'' is an abstract class which defines the fundamental methods necessary
for training a neural network. Modules are [[..:torch:file#torch.file.serialization|serializable]].

Modules contain two state variables: [[#nn.Module.output|output]] and
[[#nn.Module.gradInput|gradInput]].

==== [output] forward(input) ====
{{anchor:nn.Module.forward}}

Takes an ''input'' object, and computes the corresponding ''output'' of the
module. In general ''input'' and ''output'' are
[[..:torch:tensor|Tensors]]. However, some special sub-classes
like [[#nn.TableLayers|table layers]] might expect something else. Please
refer to each module specification for further information.

After a ''forward()'', the [[#nn.Module.output|output]] state variable should
have been updated to the new value.

It is not advised to override this function. Instead, one should
implement the [[#nn.Module.updateOutput|updateOutput(input)]]
function. The ''forward'' method in the abstract parent class
[[#nn.Module|Module]] will call ''updateOutput(input)''.

==== [gradInput] backward(input, gradOutput) ====
{{anchor:nn.Module.backward}}

Performs a //backpropagation step// through the module, with respect to the
given ''input''. In general this method makes the assumption that
[[#nn.Module.forward|forward(input)]] has been called before, //with the same input//.
This is necessary for optimization reasons. If you do not respect
this rule, ''backward()'' will compute incorrect gradients.

In general ''input'', ''gradOutput'' and ''gradInput'' are
[[..:torch:tensor|Tensors]]. However, some special sub-classes
like [[#nn.TableLayers|table layers]] might expect something else. Please
refer to each module specification for further information.

A //backpropagation step// consists in computing two kinds of gradients
at ''input'' given ''gradOutput'' (the gradients with respect to the
output of the module). This function simply performs this task using
two function calls:

  - A function call to [[#nn.Module.updateGradInput|updateGradInput(input, gradOutput)]].
  - A function call to [[#nn.Module.accGradParameters|accGradParameters(input, gradOutput)]].

It is not advised to override this function call in custom classes. It
is better to override the
[[#nn.Module.updateGradInput|updateGradInput(input, gradOutput)]] and
[[#nn.Module.accGradParameters|accGradParameters(input, gradOutput)]]
functions.
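
As an illustration, here is a minimal sketch of a custom parameter-free module that doubles its input, overriding only ''updateOutput'' and ''updateGradInput''. It assumes the usual ''torch.class'' idiom for deriving from ''nn.Module''; a real module should also be checked with ''nn.Jacobian'':
<file lua>
-- hypothetical module, for illustration only
local Double = torch.class('nn.Double', 'nn.Module')

function Double:updateOutput(input)
   self.output:resizeAs(input):copy(input):mul(2)
   return self.output
end

function Double:updateGradInput(input, gradOutput)
   -- d(2x)/dx = 2, so just scale the incoming gradients
   self.gradInput:resizeAs(gradOutput):copy(gradOutput):mul(2)
   return self.gradInput
end
</file>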

==== updateOutput(input) ====
{{anchor:nn.Module.updateOutput}}

Computes the output using the current parameter set of the class and
''input''. This function returns the result, which is stored in the
[[#nn.Module.output|output]] field.

==== updateGradInput(input, gradOutput) ====
{{anchor:nn.Module.updateGradInput}}

Computes the gradient of the module with respect to its own
input. This is returned in ''gradInput''. Also, the
[[#nn.Module.gradInput|gradInput]] state variable is updated
accordingly.

==== accGradParameters(input, gradOutput) ====
{{anchor:nn.Module.accGradParameters}}

Computes the gradient of the module with respect to its
own parameters. Many modules do not perform this step as they do not
have any parameters. The state variable name for the parameters is
module dependent. The module is expected to //accumulate// the
gradients with respect to the parameters in some variable.

Zeroing this accumulation is achieved with
[[#nn.Module.zeroGradParameters|zeroGradParameters()]] and updating
the parameters according to this accumulation is done with
[[#nn.Module.updateParameters|updateParameters()]].

==== zeroGradParameters() ====
{{anchor:nn.Module.zeroGradParameters}}

If the module has parameters, this will zero the accumulation of the
gradients with respect to these parameters, accumulated through
[[#nn.Module.accGradParameters|accGradParameters(input, gradOutput)]]
calls. Otherwise, it does nothing.

==== updateParameters(learningRate) ====
{{anchor:nn.Module.updateParameters}}

If the module has parameters, this will update these parameters according
to the accumulation of the gradients with respect to these parameters,
accumulated through [[#nn.Module.backward|backward()]] calls.

The update is basically:
<file lua>
parameters = parameters - learningRate * gradients_wrt_parameters
</file>
If the module does not have parameters, it does nothing.

==== accUpdateGradParameters(input, gradOutput, learningRate) ====
{{anchor:nn.Module.accUpdateGradParameters}}

This is a convenience method that performs two operations at
once: it calculates the gradients with respect to the weights and
accumulates them directly into the weights, after multiplying with the
negative of the learning rate ''learningRate''. Performing these two
operations at once is more efficient, which might be advantageous in
certain situations.

Keep in mind that this function uses a simple trick to achieve its
goal and it might not be valid for a custom module.

<file lua>
function Module:accUpdateGradParameters(input, gradOutput, lr)
   local gradWeight = self.gradWeight
   local gradBias = self.gradBias
   self.gradWeight = self.weight
   self.gradBias = self.bias
   self:accGradParameters(input, gradOutput, -lr)
   self.gradWeight = gradWeight
   self.gradBias = gradBias
end
</file>

As can be seen, the gradients are accumulated directly into the
weights. This assumption may not be true for a module that computes a
nonlinear operation.

==== share(mlp,s1,s2,...,sn) ====
{{anchor:nn.Module.share}}

This function modifies the parameters of the module named
''s1'',...,''sn'' (if they exist) so that they are shared with (pointers
to) the parameters with the same names in the given module ''mlp''.

The parameters have to be Tensors. This function is typically used if
you want to have modules that share the same weights or biases.

Note that this function, if called on a [[#nn.Containers|Container]]
module, will share the same parameters for all the contained modules as
well.

Example:
<file lua>
-- make an mlp
mlp1=nn.Sequential();
mlp1:add(nn.Linear(100,10));

-- make a second mlp
mlp2=nn.Sequential();
mlp2:add(nn.Linear(100,10));

-- the second mlp shares the bias of the first
mlp2:share(mlp1,'bias');

-- we change the bias of the first
mlp1:get(1).bias[1]=99;

-- and see that the second one's bias has also changed..
print(mlp2:get(1).bias[1])
</file>


==== clone(...) ====
{{anchor:nn.Module.clone}}

Creates a deep copy of (i.e. not just a pointer to) the module,
including the current state of its parameters (e.g. weights, biases
etc., if any).

If arguments are provided to the ''clone(...)'' function, it also calls
[[#nn.Module.share|share(...)]] with those arguments on the cloned
module after creating it, hence making a deep copy of this module with
some shared parameters.

Example:
<file lua>
-- make an mlp
mlp1=nn.Sequential();
mlp1:add(nn.Linear(100,10));

-- make a copy that shares the weights and biases
mlp2=mlp1:clone('weight','bias');

-- we change the bias of the first mlp
mlp1:get(1).bias[1]=99;

-- and see that the second one's bias has also changed..
print(mlp2:get(1).bias[1])
</file>

==== type(type) ====
{{anchor:nn.Module.type}}

This function converts all the parameters of a module to the given
''type''. The ''type'' can be one of the types defined for
[[..:torch:tensor|torch.Tensor]].

==== float() ====
{{anchor:nn.Module.float}}

Convenience method for calling [[#nn.Module.type|module:type('torch.FloatTensor')]].

==== double() ====
{{anchor:nn.Module.double}}

Convenience method for calling [[#nn.Module.type|module:type('torch.DoubleTensor')]].

==== cuda() ====
{{anchor:nn.Module.cuda}}

Convenience method for calling [[#nn.Module.type|module:type('torch.CudaTensor')]].

==== State Variables ====
{{anchor:nn.statevars.dok}}

These state variables are useful objects if one wants to check the guts of
a ''Module''. The object pointer is //never// supposed to change. However, its
contents (including its size if it is a Tensor) are supposed to change.

In general state variables are
[[..:torch:tensor|Tensors]]. However, some special sub-classes
like [[#nn.TableLayers|table layers]] contain something else. Please
refer to each module specification for further information.

=== output ===
{{anchor:nn.Module.output}}

This contains the output of the module, computed with the last call of
[[#nn.Module.forward|forward(input)]].

=== gradInput ===
{{anchor:nn.Module.gradInput}}

This contains the gradients with respect to the inputs of the module, computed with the last call of
[[#nn.Module.updateGradInput|updateGradInput(input, gradOutput)]].

==== Parameters and gradients w.r.t parameters ====

Some modules contain parameters (the ones that we actually want to
train!). The names of these parameters, and of the gradients w.r.t these
parameters, are module dependent.

==== [{weights}, {gradWeights}] parameters() ====
{{anchor:nn.Module.parameters}}

This function should return two tables: one for the learnable
parameters ''{weights}'' and another for the gradients of the energy
wrt the learnable parameters ''{gradWeights}''.

For custom modules, it is a good idea to also override this
function. By default none of the built-in functions/modules use this
function call, but it is especially useful when one wants to obtain a
global view of the whole network.
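
For example, on a ''Linear'' layer one would expect something like this (a sketch):
<file lua>
m = nn.Linear(10, 5)
weights, gradWeights = m:parameters()
print(weights[1]:size()) -- the 5x10 weight matrix
print(weights[2]:size()) -- the 5-dimensional bias vector
print(#gradWeights)      -- the matching gradient tensors
</file>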

===== Containers =====
{{anchor:nn.Containers}}

==== Concat ====
{{anchor:nn.Concat}}

<file lua>
module = nn.Concat(dim)
</file>
Concat concatenates the output of one layer of "parallel" modules along the
provided dimension ''dim'': they take the same inputs, and their output is
concatenated.
<file lua>
mlp=nn.Concat(1);
mlp:add(nn.Linear(5,3))
mlp:add(nn.Linear(5,7))
require "lab"
print(mlp:forward(lab.randn(5)))
</file>
which gives the output:
<file lua>
 0.7486
 0.1349
 0.7924
-0.0371
-0.4794
 0.3044
-0.0835
-0.7928
 0.7856
-0.1815
[torch.Tensor of dimension 10]
</file>


==== Sequential ====
{{anchor:nn.Sequential}}

Sequential provides a means to plug layers together
in a feed-forward fully connected manner.

E.g. creating a one hidden-layer multi-layer perceptron is thus just as easy as:
<file lua>
mlp = nn.Sequential()
mlp:add( nn.Linear(10, 25) ) -- 10 inputs, 25 hidden units
mlp:add( nn.Tanh() ) -- some hyperbolic tangent transfer function
mlp:add( nn.Linear(25, 1) ) -- 1 output

require "lab"
print(mlp:forward(lab.randn(10)))
</file>
which gives the output:
<file lua>
-0.1815
[torch.Tensor of dimension 1]
</file>

==== Parallel ====
{{anchor:nn.Parallel}}

''module'' = ''Parallel(inputDimension,outputDimension)''

Creates a container module that applies its ''ith'' child module to the ''ith'' slice of the input Tensor by using [[..:torch:tensor#torch.tensor.select|select]]
on dimension ''inputDimension''. It concatenates the results of its contained modules together along dimension ''outputDimension''.

Example:
<file lua>
require "lab"
mlp=nn.Parallel(2,1);     -- iterate over dimension 2 of input
mlp:add(nn.Linear(10,3)); -- apply to first slice
mlp:add(nn.Linear(10,2))  -- apply to second slice
print(mlp:forward(lab.randn(10,2)))
</file>
gives the output:
<file lua>
-0.5300
-1.1015
 0.7764
 0.2819
-0.6026
[torch.Tensor of dimension 5]
</file>

A more complicated example:
<file lua>
require "lab"

mlp=nn.Sequential();
c=nn.Parallel(1,2)
for i=1,10 do
   local t=nn.Sequential()
   t:add(nn.Linear(3,2))
   t:add(nn.Reshape(2,1))
   c:add(t)
end
mlp:add(c)

pred=mlp:forward(lab.randn(10,3))
print(pred)

for i=1,10000 do -- Train for a few iterations
   x=lab.randn(10,3);
   y=lab.ones(2,10);
   pred=mlp:forward(x)

   criterion= nn.MSECriterion()
   local err=criterion:forward(pred,y)
   local gradCriterion = criterion:backward(pred,y);
   mlp:zeroGradParameters();
   mlp:backward(x, gradCriterion);
   mlp:updateParameters(0.01);
   print(err)
end
</file>

===== Simple layers =====
{{anchor:nn.simplelayers.dok}}

==== Linear ====
{{anchor:nn.Linear}}

''module'' = ''Linear(inputDimension,outputDimension)''

Applies a linear transformation to the incoming data, i.e. //y =
Ax + b//. The ''input'' tensor given in ''forward(input)'' must be
either a vector (1D tensor) or a matrix (2D tensor). If the input is a
matrix, then each row is assumed to be an input sample of the given batch.

You can create a layer in the following way:
<file lua>
module = nn.Linear(10,5) -- 10 inputs, 5 outputs
</file>
Usually this would be added to a network of some kind, e.g.:
<file lua>
mlp = nn.Sequential();
mlp:add(module)
</file>
The weights and biases (//A// and //b//) can be viewed with:
<file lua>
print(module.weight)
print(module.bias)
</file>
The gradients for these weights can be seen with:
<file lua>
print(module.gradWeight)
print(module.gradBias)
</file>
As usual with ''nn'' modules, applying the linear transformation is
performed with:
<file lua>
x=torch.Tensor(10) -- 10 inputs
y=module:forward(x)
</file>
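
Since the input may also be a matrix of samples, a whole batch can be transformed in one call (a quick sketch):
<file lua>
x = lab.randn(4, 10)  -- a batch of 4 samples with 10 features each
y = module:forward(x) -- y is a 4x5 matrix: one output row per sample
</file>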

==== SparseLinear ====
{{anchor:nn.SparseLinear}}

''module'' = ''SparseLinear(inputDimension,outputDimension)''

Applies a linear transformation to the incoming sparse data, i.e.
//y = Ax + b//. The ''input'' tensor given in ''forward(input)'' must
be a sparse vector represented as a 2D tensor of the form
torch.Tensor(N, 2), where the pairs represent indices and values.
The SparseLinear layer is useful when the number of input
dimensions is very large and the input data is sparse.

You can create a sparse linear layer in the following way:

<file lua>
module= nn.SparseLinear(10000,2) -- 10000 inputs, 2 outputs
</file>
The sparse linear module may be used as part of a larger network,
and apart from the form of the input,
[[#nn.SparseLinear|SparseLinear]]
operates in exactly the same way as the [[#nn.Linear|Linear]] layer.

A sparse input vector may be created like so:
<file lua>
x=lab.new({1, 0.1},{2, 0.3},{10, 0.3},{31, 0.2})

print(x)

  1.0000   0.1000
  2.0000   0.3000
 10.0000   0.3000
 31.0000   0.2000
[torch.Tensor of dimension 4x2]
</file>

The first column contains indices, the second column contains
values in a vector where all other elements are zeros. The
indices should not exceed the stated dimensions of the input to the
layer (10000 in the example).

==== Abs ====
{{anchor:nn.Abs}}

''module'' = ''Abs()''

''output = abs(input)''.

<file lua>
m=nn.Abs()
ii=lab.linspace(-5,5)
oo=m:forward(ii)
go=lab.ones(100)
gi=m:backward(ii,go)
gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'})
gnuplot.grid(true)
</file>

{{abs.png?400}}

==== Add ====
{{anchor:nn.Add}}

''module'' = ''Add(inputDimension,scalar)''

Applies a bias term to the incoming data, i.e.
''y_i = x_i + b_i'', or if ''scalar = true'' then uses a single bias term,
''y_i = x_i + b''.

Example:
<file lua>
y=torch.Tensor(5);
mlp=nn.Sequential()
mlp:add(nn.Add(5))

function gradUpdate(mlp, x, y, criterion, learningRate)
   local pred = mlp:forward(x)
   local err = criterion:forward(pred, y)
   local gradCriterion = criterion:backward(pred, y)
   mlp:zeroGradParameters()
   mlp:backward(x, gradCriterion)
   mlp:updateParameters(learningRate)
   return err
end

for i=1,10000 do
   x=lab.rand(5)
   y:copy(x);
   for i=1,5 do y[i]=y[i]+i; end
   err=gradUpdate(mlp,x,y,nn.MSECriterion(),0.01)
end
print(mlp:get(1).bias)
</file>
gives the output:
<file lua>
 1.0000
 2.0000
 3.0000
 4.0000
 5.0000
[torch.Tensor of dimension 5]
</file>
i.e. the network successfully learns that the input //x// has been shifted
to produce the output //y//.


==== Mul ====
{{anchor:nn.Mul}}

''module'' = ''Mul(inputDimension)''

Applies a //single// scaling factor to the incoming data, i.e.
''y = w x'', where ''w'' is a scalar.

Example:
<file lua>
y=torch.Tensor(5);
mlp=nn.Sequential()
mlp:add(nn.Mul(5))

function gradUpdate(mlp, x, y, criterion, learningRate)
   local pred = mlp:forward(x)
   local err = criterion:forward(pred,y)
   local gradCriterion = criterion:backward(pred,y);
   mlp:zeroGradParameters();
   mlp:backward(x, gradCriterion);
   mlp:updateParameters(learningRate);
   return err
end

for i=1,10000 do
   x=lab.rand(5)
   y:copy(x); y:mul(math.pi);
   err=gradUpdate(mlp,x,y,nn.MSECriterion(),0.01)
end
print(mlp:get(1).weight)
</file>
gives the output:
<file lua>
 3.1416
[torch.Tensor of dimension 1]
</file>
i.e. the network successfully learns that the input ''x'' has been scaled by
pi.

==== CMul ====
{{anchor:nn.CMul}}

''module'' = ''CMul(inputDimension)''

Applies a component-wise multiplication to the incoming data, i.e.
''y_i = w_i * x_i''.

Example:
<file lua>
mlp=nn.Sequential()
mlp:add(nn.CMul(5))

y=torch.Tensor(5);
sc=torch.Tensor(5); for i=1,5 do sc[i]=i; end -- scale input with this

function gradUpdate(mlp,x,y,criterion,learningRate)
   local pred = mlp:forward(x)
   local err = criterion:forward(pred,y)
   local gradCriterion = criterion:backward(pred,y);
   mlp:zeroGradParameters();
   mlp:backward(x, gradCriterion);
   mlp:updateParameters(learningRate);
   return err
end

for i=1,10000 do
   x=lab.rand(5)
   y:copy(x); y:cmul(sc);
   err=gradUpdate(mlp,x,y,nn.MSECriterion(),0.01)
end
print(mlp:get(1).weight)
</file>
gives the output:
<file lua>
 1.0000
 2.0000
 3.0000
 4.0000
 5.0000
[torch.Tensor of dimension 5]
</file>
i.e. the network successfully learns that the input //x// has been scaled by
those scaling factors to produce the output //y//.


==== Max ====
{{anchor:nn.Max}}

''module'' = ''Max(dimension)''

Applies a max operation over dimension ''dimension''.
Hence, if an ''nxpxq'' Tensor was given as input, and ''dimension'' = ''2'',
then an ''nxq'' matrix would be output.


==== Min ====
{{anchor:nn.Min}}

''module'' = ''Min(dimension)''

Applies a min operation over dimension ''dimension''.
Hence, if an ''nxpxq'' Tensor was given as input, and ''dimension'' = ''2'',
then an ''nxq'' matrix would be output.


==== Mean ====
{{anchor:nn.Mean}}

''module'' = ''Mean(dimension)''

Applies a mean operation over dimension ''dimension''.
Hence, if an ''nxpxq'' Tensor was given as input, and ''dimension'' = ''2'',
then an ''nxq'' matrix would be output.

==== Sum ====
{{anchor:nn.Sum}}

''module'' = ''Sum(dimension)''

Applies a sum operation over dimension ''dimension''.
Hence, if an ''nxpxq'' Tensor was given as input, and ''dimension'' = ''2'',
then an ''nxq'' matrix would be output.
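
A quick sketch of these reduction modules on a small matrix:
<file lua>
x = lab.randn(3, 4)
print(nn.Max(2):forward(x))  -- 3-element vector: the max of each row
print(nn.Mean(1):forward(x)) -- 4-element vector: the mean of each column
print(nn.Sum(2):forward(x))  -- 3-element vector: the sum of each row
</file>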

==== Euclidean ====
{{anchor:nn.Euclidean}}

''module'' = ''Euclidean(inputDimension,outputDimension)''

Outputs the Euclidean distance of the input to ''outputDimension'' centers,
i.e. this layer has the weights ''c_j'', ''j'' = ''1'',..,''outputDimension'', where
''c_j'' are vectors of dimension ''inputDimension''. Output dimension ''j'' is
''|| c_j - x ||^2'', where ''x'' is the input.

==== WeightedEuclidean ====
{{anchor:nn.WeightedEuclidean}}

''module'' = ''WeightedEuclidean(inputDimension,outputDimension)''

This module is similar to [[#nn.Euclidean|Euclidean]], but
additionally learns a separate diagonal covariance matrix across the
features of the input space for each center.


==== Copy ====
{{anchor:nn.Copy}}

''module'' = ''Copy(inputType,outputType)''

This layer copies the input to the output with type casting from
''inputType'' to ''outputType''.


==== Narrow ====
{{anchor:nn.Narrow}}

''module'' = ''Narrow(dimension, offset, length)''

Narrow is the application of the
[[..:torch:tensor#torch.Tensor.narrow|narrow]] operation in a
module.
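
For instance (a quick sketch):
<file lua>
x = lab.randn(5, 3)
m = nn.Narrow(1, 2, 3) -- along dimension 1, keep 3 rows starting at row 2
print(m:forward(x))    -- rows 2..4 of x, a 3x3 matrix
</file>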

==== Replicate ====
{{anchor:nn.Replicate}}

''module'' = ''Replicate(nFeature)''

This class creates an output where the input is replicated
''nFeature'' times along a new first dimension. There is no memory
allocation or memory copy in this module. It sets the
[[..:torch:tensor#torch.Tensor.stride|stride]] along the first
dimension to zero.

<file lua>
torch> x=lab.linspace(1,5,5)
torch> =x
 1
 2
 3
 4
 5
[torch.DoubleTensor of dimension 5]

torch> m=nn.Replicate(3)
torch> o=m:forward(x)
torch> =o
 1  2  3  4  5
 1  2  3  4  5
 1  2  3  4  5
[torch.DoubleTensor of dimension 3x5]

torch> x:fill(13)
torch> =x
 13
 13
 13
 13
 13
[torch.DoubleTensor of dimension 5]

torch> =o
 13  13  13  13  13
 13  13  13  13  13
 13  13  13  13  13
[torch.DoubleTensor of dimension 3x5]
</file>


==== Reshape ====
{{anchor:nn.Reshape}}

''module'' = ''Reshape(dimension1, dimension2, ...)''

Reshapes an ''nxpxqx..'' Tensor into a ''dimension1xdimension2x...'' Tensor,
taking the elements column-wise.

Example:
<file lua>
> x=torch.Tensor(4,4)
> for i=1,4 do
>   for j=1,4 do
>     x[i][j]=(i-1)*4+j;
>   end
> end
> print(x)

  1   2   3   4
  5   6   7   8
  9  10  11  12
 13  14  15  16
[torch.Tensor of dimension 4x4]

> print(nn.Reshape(2,8):forward(x))

  1   9   2  10   3  11   4  12
  5  13   6  14   7  15   8  16
[torch.Tensor of dimension 2x8]

> print(nn.Reshape(8,2):forward(x))

  1   3
  5   7
  9  11
 13  15
  2   4
  6   8
 10  12
 14  16
[torch.Tensor of dimension 8x2]

> print(nn.Reshape(16):forward(x))

  1
  5
  9
 13
  2
  6
 10
 14
  3
  7
 11
 15
  4
  8
 12
 16
[torch.Tensor of dimension 16]
</file>


==== Select ====
{{anchor:nn.Select}}

''module'' = ''Select(dimension, index)''

Selects a dimension and index of an ''nxpxqx..'' Tensor.

Example:
<file lua>
mlp=nn.Sequential();
mlp:add(nn.Select(1,3))

require "lab"
x=lab.randn(10,5)
print(x)
print(mlp:forward(x))
</file>
gives the output:
<file lua>
 0.9720 -0.0836  0.0831 -0.2059 -0.0871
 0.8750 -2.0432 -0.1295 -2.3932  0.8168
 0.0369  1.1633  0.6483  1.2862  0.6596
 0.1667 -0.5704 -0.7303  0.3697 -2.2941
 0.4794  2.0636  0.3502  0.3560 -0.5500
-0.1898 -1.1547  0.1145 -1.1399  0.1711
-1.5130  1.4445  0.2356 -0.5393 -0.6222
-0.6587  0.4314  1.1916 -1.4509  1.9400
 0.2733  1.0911  0.7667  0.4002  0.1646
 0.5804 -0.5333  1.1621  1.5683 -0.1978
[torch.Tensor of dimension 10x5]

 0.0369
 1.1633
 0.6483
 1.2862
 0.6596
[torch.Tensor of dimension 5]
</file>

This can be used in conjunction with [[#nn.Concat|Concat]]
to emulate the behavior
of [[#nn.Parallel|Parallel]], or to select various parts of an input Tensor to
perform operations on. Here is a fairly complicated example:
<file lua>
require "lab"

mlp=nn.Sequential();
c=nn.Concat(2)
for i=1,10 do
   local t=nn.Sequential()
   t:add(nn.Select(1,i))
   t:add(nn.Linear(3,2))
   t:add(nn.Reshape(2,1))
   c:add(t)
end
mlp:add(c)

pred=mlp:forward(lab.randn(10,3))
print(pred)

for i=1,10000 do -- Train for a few iterations
   x=lab.randn(10,3);
   y=lab.ones(2,10);
   pred=mlp:forward(x)

   criterion= nn.MSECriterion()
   err=criterion:forward(pred,y)
   gradCriterion = criterion:backward(pred,y);
   mlp:zeroGradParameters();
   mlp:backward(x, gradCriterion);
   mlp:updateParameters(0.01);
   print(err)
end
</file>

==== Exp ====
{{anchor:nn.Exp}}

Applies the ''exp'' function element-wise to the input Tensor,
thus outputting a Tensor of the same dimension.
<file lua>
ii=lab.linspace(-2,2)
m=nn.Exp()
oo=m:forward(ii)
go=lab.ones(100)
gi=m:backward(ii,go)
gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'})
gnuplot.grid(true)
</file>
{{exp.png?400}}


==== Square ====
{{anchor:nn.Square}}

Takes the square of each element.

<file lua>
ii=lab.linspace(-5,5)
m=nn.Square()
oo=m:forward(ii)
go=lab.ones(100)
gi=m:backward(ii,go)
gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'})
gnuplot.grid(true)
</file>
{{square.png?400}}

==== Sqrt ====
{{anchor:nn.Sqrt}}

Takes the square root of each element.

<file lua>
ii=lab.linspace(0,5)
m=nn.Sqrt()
oo=m:forward(ii)
go=lab.ones(100)
gi=m:backward(ii,go)
gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'})
gnuplot.grid(true)
</file>
{{sqrt.png?400}}

==== Power ====
{{anchor:nn.Power}}

''module'' = ''Power(p)''

Raises each element to its ''p''-th power.

<file lua>
ii=lab.linspace(0,2)
m=nn.Power(1.25)
oo=m:forward(ii)
go=lab.ones(100)
gi=m:backward(ii,go)
gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'})
gnuplot.grid(true)
</file>
{{power.png?400}}

===== Transfer Function Layers =====
{{anchor:nn.transfer.dok}}

==== HardTanh ====
{{anchor:nn.HardTanh}}

Applies the ''HardTanh'' function element-wise to the input Tensor,
thus outputting a Tensor of the same dimension.

''HardTanh'' is defined as:

  * ''f(x) = 1, if x > 1''
  * ''f(x) = -1, if x < -1''
  * ''f(x) = x, otherwise''

<file lua>
ii=lab.linspace(-2,2)
m=nn.HardTanh()
oo=m:forward(ii)
go=lab.ones(100)
gi=m:backward(ii,go)
gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'})
gnuplot.grid(true)
</file>
{{htanh.png?400}}


==== HardShrink ====
{{anchor:nn.HardShrink}}

''module = nn.HardShrink(lambda)''

Applies the hard shrinkage function element-wise to the input
[[..:torch:tensor|Tensor]]. The output is the same size as the input.

The ''HardShrinkage'' operator is defined as:

  * ''f(x) = x, if x > lambda''
  * ''f(x) = x, if x < -lambda''
  * ''f(x) = 0, otherwise''

<file lua>
ii=lab.linspace(-2,2)
m=nn.HardShrink(0.85)
oo=m:forward(ii)
go=lab.ones(100)
gi=m:backward(ii,go)
gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'})
gnuplot.grid(true)
</file>
{{hshrink.png?400}}

==== SoftShrink ====
{{anchor:nn.SoftShrink}}

''module = nn.SoftShrink(lambda)''

Applies the soft shrinkage function element-wise to the input
[[..:torch:tensor|Tensor]]. The output is the same size as the input.

The ''SoftShrinkage'' operator is defined as:

  * ''f(x) = x - lambda, if x > lambda''
  * ''f(x) = x + lambda, if x < -lambda''
  * ''f(x) = 0, otherwise''

<file lua>
ii=lab.linspace(-2,2)
m=nn.SoftShrink(0.85)
oo=m:forward(ii)
go=lab.ones(100)
gi=m:backward(ii,go)
gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'})
gnuplot.grid(true)
</file>
{{sshrink.png?400}}


==== SoftMax ====
{{anchor:nn.SoftMax}}

Applies the ''Softmax'' function to an n-dimensional input Tensor,
rescaling it so that the elements of the n-dimensional output Tensor
lie in the range (0,1) and sum to 1.

''Softmax'' is defined as ''f_i(x)'' = ''exp(x_i-shift) / sum_j exp(x_j-shift)'',
where ''shift'' = ''max_i x_i''.

<file lua>
ii=lab.exp(lab.abs(lab.randn(10)))
m=nn.SoftMax()
oo=m:forward(ii)
gnuplot.plot({'Input',ii,'+-'},{'Output',oo,'+-'})
gnuplot.grid(true)
</file>
{{softmax.png?400}}

==== SoftMin ====
{{anchor:nn.SoftMin}}

Applies the ''Softmin'' function to an n-dimensional input Tensor,
rescaling it so that the elements of the n-dimensional output Tensor
lie in the range (0,1) and sum to 1.

''Softmin'' is defined as ''f_i(x)'' = ''exp(-x_i-shift) / sum_j exp(-x_j-shift)'',
where ''shift'' = ''max_i (-x_i)''.

<file lua>
ii=lab.exp(lab.abs(lab.randn(10)))
m=nn.SoftMin()
oo=m:forward(ii)
gnuplot.plot({'Input',ii,'+-'},{'Output',oo,'+-'})
gnuplot.grid(true)
</file>
{{softmin.png?400}}

==== SoftPlus ====
{{anchor:nn.SoftPlus}}

Applies the ''SoftPlus'' function to an n-dimensional input Tensor.
Can be used to constrain the output of a machine to always be positive.

''SoftPlus'' is defined as ''f_i(x)'' = ''log(1 + exp(x_i))''.

<file lua>
ii=lab.randn(10)
m=nn.SoftPlus()
oo=m:forward(ii)
go=lab.ones(10)
gi=m:backward(ii,go)
gnuplot.plot({'Input',ii,'+-'},{'Output',oo,'+-'},{'gradInput',gi,'+-'})
gnuplot.grid(true)
</file>
{{softplus.png?400}}

==== SoftSign ====
{{anchor:nn.SoftSign}}

Applies the ''SoftSign'' function to an n-dimensional input Tensor.

''SoftSign'' is defined as ''f_i(x) = x_i / (1+|x_i|)''.

<file lua>
ii=lab.linspace(-5,5)
m=nn.SoftSign()
oo=m:forward(ii)
go=lab.ones(100)
gi=m:backward(ii,go)
gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'})
gnuplot.grid(true)
</file>
{{softsign.png?400}}

==== LogSigmoid ====
{{anchor:nn.LogSigmoid}}

Applies the ''LogSigmoid'' function to an n-dimensional input Tensor.

''LogSigmoid'' is defined as ''f_i(x)'' = ''log(1/(1+exp(-x_i)))''.

<file lua>
ii=lab.randn(10)
m=nn.LogSigmoid()
oo=m:forward(ii)
go=lab.ones(10)
gi=m:backward(ii,go)
gnuplot.plot({'Input',ii,'+-'},{'Output',oo,'+-'},{'gradInput',gi,'+-'})
gnuplot.grid(true)
</file>
{{logsigmoid.png?400}}


==== LogSoftMax ====
{{anchor:nn.LogSoftMax}}

Applies the ''LogSoftmax'' function to an n-dimensional input Tensor.

''LogSoftmax'' is defined as ''f_i(x)'' = ''log(1/a exp(x_i))'',
where ''a'' = ''sum_j exp(x_j)''.

<file lua>
ii=lab.randn(10)
m=nn.LogSoftMax()
oo=m:forward(ii)
go=lab.ones(10)
gi=m:backward(ii,go)
gnuplot.plot({'Input',ii,'+-'},{'Output',oo,'+-'},{'gradInput',gi,'+-'})
gnuplot.grid(true)
</file>
{{logsoftmax.png?400}}

==== Sigmoid ====
{{anchor:nn.Sigmoid}}

Applies the ''Sigmoid'' function element-wise to the input Tensor,
thus outputting a Tensor of the same dimension.

''Sigmoid'' is defined as ''f(x)'' = ''1/(1+exp(-x))''.

<file lua>
ii=lab.linspace(-5,5)
m=nn.Sigmoid()
oo=m:forward(ii)
go=lab.ones(100)
gi=m:backward(ii,go)
gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'})
gnuplot.grid(true)
</file>
{{sigmoid.png?400}}

==== Tanh ====
{{anchor:nn.Tanh}}

Applies the ''Tanh'' function element-wise to the input Tensor,
thus outputting a Tensor of the same dimension.

<file lua>
ii=lab.linspace(-3,3)
m=nn.Tanh()
oo=m:forward(ii)
go=lab.ones(100)
gi=m:backward(ii,go)
gnuplot.plot({'f(x)',ii,oo,'+-'},{'df/dx',ii,gi,'+-'})
gnuplot.grid(true)
</file>
{{tanh.png?400}}

===== Convolutional layers =====
{{anchor:nn.convlayers.dok}}

SpatialConvolution and SpatialSubSampling apply to inputs with
two-dimensional relationships (e.g. images). TemporalConvolution and
TemporalSubSampling apply to sequences with a one-dimensional
relationship (e.g. strings of some kind).

For spatial convolutional layers, the input is supposed to be 3D. The
first dimension is the number of features, the last two dimensions
are spatial.

==== SpatialConvolution ====
{{anchor:nn.SpatialConvolution}}

<file lua>
module = nn.SpatialConvolution(nInputPlane, nOutputPlane, kW, kH, [dW], [dH])
</file>

Applies a 2D convolution over an input image composed of several input planes. The ''input'' tensor in
''forward(input)'' is expected to be a 3D tensor (''nInputPlane x width x height'').

The parameters are the following:
  * ''nInputPlane'': The number of expected input planes in the image given into ''forward()''.
  * ''nOutputPlane'': The number of output planes the convolution layer will produce.
  * ''kW'': The kernel width of the convolution
  * ''kH'': The kernel height of the convolution
  * ''dW'': The step of the convolution in the width dimension. Default is ''1''.
  * ''dH'': The step of the convolution in the height dimension. Default is ''1''.

Note that depending on the size of your kernel, several (of the last)
columns or rows of the input image might be lost. It is up to the user to
add proper padding in images.

If the input image is a 3D tensor ''nInputPlane x width x height'', the output image size
will be ''nOutputPlane x owidth x oheight'' where
<file lua>
owidth = (width - kW) / dW + 1
oheight = (height - kH) / dH + 1
</file>

The parameters of the convolution can be found in ''self.weight'' (Tensor of
size ''nOutputPlane x nInputPlane x kH x kW'') and ''self.bias'' (Tensor of
size ''nOutputPlane''). The corresponding gradients can be found in
''self.gradWeight'' and ''self.gradBias''.

The output value of the layer can be precisely described as:
<file lua>
output[i][j][k] = bias[k]
  + sum_l sum_{s=1}^kW sum_{t=1}^kH weight[s][t][l][k]
                                    * input[dW*(i-1)+s][dH*(j-1)+t][l]
</file>
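
A quick sketch of the resulting sizes (using the ''nInputPlane x width x height'' layout described above):
<file lua>
m = nn.SpatialConvolution(3, 16, 5, 5) -- 3 input planes, 16 output planes, 5x5 kernels
res = m:forward(lab.randn(3, 32, 32))
print(res:size())                      -- 16 x 28 x 28, since (32-5)/1 + 1 = 28
</file>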

==== SpatialConvolutionMap ====
{{anchor:nn.SpatialConvolutionMap}}

<file lua>
module = nn.SpatialConvolutionMap(connectionMatrix, kW, kH, [dW], [dH])
</file>

This class is a generalization of
[[#nn.SpatialConvolution|nn.SpatialConvolution]]. It uses a generic
connection table between input and output features. The
[[#nn.SpatialConvolution|nn.SpatialConvolution]] is equivalent to
using a [[#nn.tables.full|full connection table]]. One can specify
different types of connection tables.

=== Full Connection Table ===
{{anchor:nn.tables.full}}

''table = nn.tables.full(nin,nout)''

This is a precomputed table that specifies connections between every
input and output node.

=== One to One Connection Table ===
{{anchor:nn.tables.onetoone}}

''table = nn.tables.oneToOne(n)''

This is a precomputed table that specifies a single connection to each
output node from the corresponding input node.

=== Random Connection Table ===
{{anchor:nn.tables.random}}

''table = nn.tables.random(nin,nout,nto)''

This table is randomly populated such that each output unit has
''nto'' incoming connections. The algorithm tries to assign a uniform
number of outgoing connections to each input node, if possible.

==== SpatialLPPooling ====
{{anchor:nn.SpatialLPPooling}}

<file lua>
module = nn.SpatialLPPooling(nInputPlane, pnorm, kW, kH, [dW], [dH])
</file>

Computes the ''p'' norm in a convolutional manner on a set of 2D input planes.

==== SpatialMaxPooling ====
{{anchor:nn.SpatialMaxPooling}}

<file lua>
module = nn.SpatialMaxPooling(kW, kH [, dW, dH])
</file>

Applies a 2D max-pooling operation in ''kWxkH'' regions by step size
''dWxdH''. The number of output features is equal to the number of
input planes.
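
For instance (a quick sketch):
<file lua>
m = nn.SpatialMaxPooling(2, 2, 2, 2)        -- 2x2 pooling regions, stepping 2 in each direction
print(m:forward(lab.randn(1, 8, 8)):size()) -- 1 x 4 x 4: each plane shrinks by half
</file>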

==== SpatialSubSampling ====
{{anchor:nn.SpatialSubSampling}}

<file lua>
module = nn.SpatialSubSampling(nInputPlane, kW, kH, [dW], [dH])
</file>

Applies a 2D sub-sampling over an input image composed of several input planes. The ''input'' tensor in
''forward(input)'' is expected to be a 3D tensor (''nInputPlane x width x height''). The number of output
planes will be the same as ''nInputPlane''.

The parameters are the following:
  * ''nInputPlane'': The number of expected input planes in the image given into ''forward()''.
  * ''kW'': The kernel width of the sub-sampling
  * ''kH'': The kernel height of the sub-sampling
  * ''dW'': The step of the sub-sampling in the width dimension. Default is ''1''.
  * ''dH'': The step of the sub-sampling in the height dimension. Default is ''1''.

Note that depending on the size of your kernel, several (of the last)
columns or rows of the input image might be lost. It is up to the user to
add proper padding in images.

If the input image is a 3D tensor ''nInputPlane x width x height'', the output image size
will be ''nInputPlane x owidth x oheight'' where
<file lua>
owidth = (width - kW) / dW + 1
oheight = (height - kH) / dH + 1
</file>

The parameters of the sub-sampling can be found in ''self.weight'' (Tensor of
size ''nInputPlane'') and ''self.bias'' (Tensor of size ''nInputPlane''). The
corresponding gradients can be found in ''self.gradWeight'' and
''self.gradBias''.

The output value of the layer can be precisely described as:
<file lua>
output[i][j][k] = bias[k]
  + weight[k] * sum_{s=1}^kW sum_{t=1}^kH input[dW*(i-1)+s][dH*(j-1)+t][k]
</file>

==== SpatialZeroPadding ====
{{anchor:nn.SpatialZeroPadding}}

<file lua>
module = nn.SpatialZeroPadding(padLeft, padRight, padTop, padBottom)
</file>

Each feature map of a given input is padded with the specified number of
zeros. If the padding values are negative, then the input is cropped.

==== SpatialSubtractiveNormalization ====
{{anchor:nn.SpatialSubtractiveNormalization}}

<file lua>
module = nn.SpatialSubtractiveNormalization(ninputplane, kernel)
</file>

Applies a spatial subtraction operation on a series of 2D inputs using
''kernel'' for computing the weighted average in a neighborhood. The
neighborhood is defined for a local spatial region that is the same size as
the kernel, and across all features. For an input image with only one
feature, the region is only spatial. For an RGB image, the
weighted average is taken over RGB channels and a spatial region.

If the ''kernel'' is 1D, then it will be used for constructing a separable
2D kernel. The operations will be much more efficient in this case.

The kernel is generally chosen as a Gaussian when it is believed that
the correlation of two pixel locations decreases with increasing
distance. On the feature dimension, a uniform average is used since
the weighting across features is not known.

For this example we use an external package,
[[http://www.github.com/clementfarabet/lua---image/|image]]:

<file lua>
require 'image'
require 'nn'
lena = image.rgb2y(image.lena())
ker = lab.ones(11)
m=nn.SpatialSubtractiveNormalization(1,ker)
processed = m:forward(lena)
w1=image.display(lena)
w2=image.display(processed)
</file>
{{lena.jpg?300}}{{lenap.jpg?300}}

==== TemporalConvolution ====
{{anchor:nn.TemporalConvolution}}

<file lua>
module = nn.TemporalConvolution(inputFrameSize, outputFrameSize, kW, [dW])
</file>

Applies a 1D convolution over an input sequence composed of ''nInputFrame'' frames. The ''input'' tensor in
''forward(input)'' is expected to be a 2D tensor (''nInputFrame x inputFrameSize'').

The parameters are the following:
  * ''inputFrameSize'': The input frame size expected in sequences given into ''forward()''.
  * ''outputFrameSize'': The output frame size the convolution layer will produce.
  * ''kW'': The kernel width of the convolution
  * ''dW'': The step of the convolution. Default is ''1''.

Note that depending on the size of your kernel, several (of the last)
frames of the sequence might be lost. It is up to the user to add proper padding frames in the input
sequences.

If the input sequence is a 2D tensor ''nInputFrame x inputFrameSize'', the output sequence will be
''nOutputFrame x outputFrameSize'' where
<file lua>
nOutputFrame = (nInputFrame - kW) / dW + 1
</file>

The parameters of the convolution can be found in ''self.weight'' (Tensor of
size ''outputFrameSize x (inputFrameSize x kW)'') and ''self.bias'' (Tensor of
size ''outputFrameSize''). The corresponding gradients can be found in
''self.gradWeight'' and ''self.gradBias''.

The output value of the layer can be precisely described as:
<file lua>
output[i][t] = bias[i]
  + sum_j sum_{k=1}^kW weight[j][k][i] * input[j][dW*(t-1)+k]
</file>

Here is a simple example:

<file lua>
inp=5;  -- dimensionality of one sequence element
outp=1; -- number of derived features for one sequence element
kw=1;   -- kernel only operates on one sequence element at once
dw=1;   -- we step once and go on to the next sequence element

mlp=nn.TemporalConvolution(inp,outp,kw,dw)

require "lab"
x=lab.rand(7,inp) -- a sequence of 7 elements
print(mlp:forward(x))
</file>
which gives:
<file lua>
-0.9109
-0.9872
-0.6808
-0.9403
-0.9680
-0.6901
-0.6387
[torch.Tensor of dimension 7x1]
</file>

This is equivalent to:
<file lua>
weights=lab.reshape(mlp.weight,inp) -- weights applied to all
bias= mlp.bias[1];
for i=1,x:size(1) do -- for each sequence element
   element= x[i]; -- features of ith sequence element
   print(element:dot(weights) + bias)
end
</file>
which gives:
<file lua>
-0.91094998687717
-0.98721705771773
-0.68075004276185
-0.94030132495887
-0.96798754116609
-0.69008470895581
-0.63871422284166
</file>


==== TemporalSubSampling ====
{{anchor:nn.TemporalSubSampling}}

<file lua>
module = nn.TemporalSubSampling(inputFrameSize, kW, [dW])
</file>

Applies a 1D sub-sampling over an input sequence composed of ''nInputFrame'' frames. The ''input'' tensor in
''forward(input)'' is expected to be a 2D tensor (''nInputFrame x inputFrameSize''). The output frame size
will be the same as the input one (''inputFrameSize'').

The parameters are the following:
  * ''inputFrameSize'': The input frame size expected in sequences given into ''forward()''.
  * ''kW'': The kernel width of the sub-sampling
  * ''dW'': The step of the sub-sampling. Default is ''1''.

Note that depending on the size of your kernel, several (of the last)
frames of the sequence might be lost. It is up to the user to add proper padding frames in the input
sequences.

If the input sequence is a 2D tensor ''nInputFrame x inputFrameSize'', the output sequence will be
''nOutputFrame x inputFrameSize'' where
<file lua>
nOutputFrame = (nInputFrame - kW) / dW + 1
</file>

The parameters of the sub-sampling can be found in ''self.weight'' (Tensor of
size ''inputFrameSize'') and ''self.bias'' (Tensor of
size ''inputFrameSize''). The corresponding gradients can be found in
''self.gradWeight'' and ''self.gradBias''.

The output value of the layer can be precisely described as:
<file lua>
output[i][t] = bias[i] + weight[i] * sum_{k=1}^kW input[i][dW*(t-1)+k]
</file>
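
For instance (a quick sketch):
<file lua>
m = nn.TemporalSubSampling(5, 2, 2)     -- inputFrameSize=5, kW=2, dW=2
print(m:forward(lab.rand(8, 5)):size()) -- (8-2)/2 + 1 = 4 frames of size 5
</file>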

==== LookupTable ====
{{anchor:nn.LookupTable}}

<file lua>
module = nn.LookupTable(nIndex, sizes)
</file>
or
<file lua>
module = nn.LookupTable(nIndex, size1, [size2], [size3], ...)
</file>

This layer is a particular case of a convolution, where the width of the convolution would be ''1''.
When calling ''forward(input)'', it assumes ''input'' is a 1D tensor filled with indices. Indices start
at ''1'' and can go up to ''nIndex''. For each index, it outputs a corresponding ''Tensor'' of size
specified by ''sizes'' (a ''LongStorage'') or ''size1 x size2 x ...''.

The output tensors are concatenated, generating a ''size1 x size2 x ... x sizeN x n'' tensor, where ''n''
is the size of the ''input'' tensor.

When only ''size1'' is provided, this is equivalent to doing the following matrix-matrix multiplication
in an efficient manner:
<file lua>
M P
</file>
where ''M'' is a 2D matrix ''size1 x nIndex'' containing the parameters of the lookup-table and
''P'' is a 2D matrix, where each column vector ''i'' is a zero vector except at index ''input[i]'' where it is ''1''.

Example:
<file lua>
-- a lookup table containing 10 tensors of size 3
module = nn.LookupTable(10, 3)

input = torch.Tensor(4)
input[1] = 1; input[2] = 2; input[3] = 1; input[4] = 10;
print(module:forward(input))
</file>

Outputs something like:
<file lua>
-0.1784  2.2045 -0.1784 -0.2475
-1.0120  0.0537 -1.0120 -0.2148
-1.2840  0.8685 -1.2840 -0.2792
[torch.Tensor of dimension 3x4]
</file>
Note that the first column vector is the same as the third one!

===== Layers for manipulating tables =====
{{anchor:nn.TableLayers}}

This set of modules allows the manipulation of tables
through the layers of a neural network.
This allows one to build very rich architectures.

Table-based modules work by supporting forward and backward methods that can accept
tables as inputs. It turns out that the usual [[#nn.Sequential|Sequential]] module can do this, so all that is needed is other child modules that take advantage of such tables.
<file lua>
mlp = nn.Sequential();
t={x,y,z}
pred=mlp:forward(t)
pred=mlp:forward{x,y,z} -- This is equivalent to the line before
</file>

==== ConcatTable ====
{{anchor:nn.ConcatTable}}

ConcatTable is a container module that applies each member module to
the same input Tensor.

Example:
<file lua>
mlp= nn.ConcatTable()
mlp:add(nn.Linear(5,2))
mlp:add(nn.Linear(5,3))

require "lab"
pred=mlp:forward(lab.randn(5));
for i,k in pairs(pred) do print(i,k); end
</file>
which gives the output:
<file lua>
1
-0.4073
 0.0110
[torch.Tensor of dimension 2]

2
 0.0027
-0.0598
-0.1189
[torch.Tensor of dimension 3]
</file>

==== ParallelTable ====
{{anchor:nn.ParallelTable}}

ParallelTable is a container module that, in its ''forward'' method, applies the ''ith'' member module to the ''ith'' input, and outputs a table of the set of outputs.

Example:
<file lua>
mlp= nn.ParallelTable()
mlp:add(nn.Linear(10,2))
mlp:add(nn.Linear(5,3))

require "lab"
x=lab.randn(10)
y=lab.rand(5)

pred=mlp:forward{x,y}
for i,k in pairs(pred) do print(i,k); end
</file>
which gives the output:
<file lua>
1
 0.0331
 0.7003
[torch.Tensor of dimension 2]

2
 0.0677
-0.1657
-0.7383
[torch.Tensor of dimension 3]
</file>

==== SplitTable ====
{{anchor:nn.SplitTable}}

''module'' = ''SplitTable(dimension)''

Creates a module that takes a Tensor as input and outputs a table of Tensors, splitting the Tensor along dimension ''dimension''.

Example 1:
<file lua>
require "lab"
mlp=nn.SplitTable(2)
x=lab.randn(4,3)
pred=mlp:forward(x)
for i,k in pairs(pred) do print(i,k); end
</file>
gives the output:
<file lua>
1
 1.3885
 1.3295
 0.4281
-1.0171
[torch.Tensor of dimension 4]

2
-1.1565
-0.8556
-1.0717
-0.8316
[torch.Tensor of dimension 4]

3
-1.3678
-0.1709
-0.0191
-2.5871
[torch.Tensor of dimension 4]
</file>

Example 2:
<file lua>
require "lab"
mlp=nn.SplitTable(1)
pred=mlp:forward(lab.randn(10,3))
for i,k in pairs(pred) do print(i,k); end
</file>
gives the output (only the first four of the ten Tensors are shown here):
<file lua>
1
 1.6114
 0.9038
 0.8419
[torch.Tensor of dimension 3]

2
 2.4742
 0.2208
 1.6043
[torch.Tensor of dimension 3]

3
 1.3415
 0.2984
 0.2260
[torch.Tensor of dimension 3]

4
 2.0889
 1.2309
 0.0983
[torch.Tensor of dimension 3]
</file>

A more complicated example:
<file lua>
require "lab"

mlp=nn.Sequential();     -- Create a network that takes a Tensor as input
mlp:add(nn.SplitTable(2))
c=nn.ParallelTable()     -- The two Tensors go through two different Linear
c:add(nn.Linear(10,3))   -- layers in parallel
c:add(nn.Linear(10,7))
mlp:add(c)               -- outputting a table with 2 elements
p=nn.ParallelTable()     -- These tables go through two more linear layers
p:add(nn.Linear(3,2))    -- separately.
p:add(nn.Linear(7,1))
mlp:add(p)
mlp:add(nn.JoinTable(1)) -- Finally, the tables are joined together and output.

pred=mlp:forward(lab.randn(10,2))
print(pred)

for i=1,100 do           -- A few steps of training such a network..
   x=lab.ones(10,2);
   y=torch.Tensor(3); y:copy(x:select(2,1):narrow(1,1,3))
   pred=mlp:forward(x)

   criterion= nn.MSECriterion()
   local err=criterion:forward(pred,y)
   local gradCriterion = criterion:backward(pred,y);
   mlp:zeroGradParameters();
   mlp:backward(x, gradCriterion);
   mlp:updateParameters(0.05);

   print(err)
end
</file>

==== JoinTable ====
{{anchor:nn.JoinTable}}

''module'' = ''JoinTable(dimension)''

Creates a module that takes a list of Tensors as input and outputs a Tensor by joining them together along dimension ''dimension''.

Example:
<file lua>
require "lab"
x=lab.randn(5,1)
y=lab.randn(5,1)
z=lab.randn(2,1)

print(nn.JoinTable(1):forward{x,y})
print(nn.JoinTable(2):forward{x,y})
print(nn.JoinTable(1):forward{x,z})
</file>
gives the output:
<file lua>
 1.3965
 0.5146
-1.5244
-0.9540
 0.4256
 0.1575
 0.4491
 0.6580
 0.1784
-1.7362
[torch.Tensor of dimension 10x1]

 1.3965  0.1575
 0.5146  0.4491
-1.5244  0.6580
-0.9540  0.1784
 0.4256 -1.7362
[torch.Tensor of dimension 5x2]

 1.3965
 0.5146
-1.5244
-0.9540
 0.4256
-1.2660
 1.0869
[torch.Tensor of dimension 7x1]
</file>

A more complicated example:
<file lua>
require "lab"

mlp=nn.Sequential();     -- Create a network that takes a Tensor as input
c=nn.ConcatTable()       -- The same Tensor goes through two different Linear
c:add(nn.Linear(10,3))   -- layers in parallel
c:add(nn.Linear(10,7))
mlp:add(c)               -- outputting a table with 2 elements
p=nn.ParallelTable()     -- These tables go through two more linear layers
p:add(nn.Linear(3,2))    -- separately.
p:add(nn.Linear(7,1))
mlp:add(p)
mlp:add(nn.JoinTable(1)) -- Finally, the tables are joined together and output.

pred=mlp:forward(lab.randn(10))
print(pred)

for i=1,100 do           -- A few steps of training such a network..
   x=lab.ones(10);
   y=torch.Tensor(3); y:copy(x:narrow(1,1,3))
   pred=mlp:forward(x)

   criterion= nn.MSECriterion()
   local err=criterion:forward(pred,y)
   local gradCriterion = criterion:backward(pred,y);
   mlp:zeroGradParameters();
   mlp:backward(x, gradCriterion);
   mlp:updateParameters(0.05);

   print(err)
end
</file>

==== Identity ====
{{anchor:nn.Identity}}

''module'' = ''Identity()''

Creates a module that returns whatever is input to it as output.
This is useful when combined with the module
[[#nn.ParallelTable|ParallelTable]]
in case you do not wish to do anything to one of the input Tensors.
Example:
<file lua>
require "lab"
mlp=nn.Identity()
print(mlp:forward(lab.ones(5,2)))
</file>
gives the output:
<file lua>
 1  1
 1  1
 1  1
 1  1
 1  1
[torch.Tensor of dimension 5x2]
</file>

Here is a more useful example, where one can implement a network which also computes a Criterion using this module:
<file lua>
pred_mlp=nn.Sequential();  -- A network that makes predictions given x.
pred_mlp:add(nn.Linear(5,4))
pred_mlp:add(nn.Linear(4,3))

xy_mlp=nn.ParallelTable(); -- A network for predictions and for keeping the
xy_mlp:add(pred_mlp)       -- true label for comparison with a criterion
xy_mlp:add(nn.Identity())  -- by forwarding both x and y through the network.

mlp=nn.Sequential();       -- The main network that takes both x and y.
mlp:add(xy_mlp)            -- It feeds x and y to parallel networks;
cr=nn.MSECriterion();
cr_wrap=nn.CriterionTable(cr)
mlp:add(cr_wrap)           -- and then applies the criterion.

for i=1,100 do             -- Do a few training iterations
   x=lab.ones(5);          -- Make input features.
   y=torch.Tensor(3);
   y:copy(x:narrow(1,1,3)) -- Make output label.
   err=mlp:forward{x,y}    -- Forward both input and output.
   print(err)              -- Print error from criterion.

   mlp:zeroGradParameters();  -- Do backprop...
   mlp:backward({x, y});
   mlp:updateParameters(0.05);
end
</file>

==== PairwiseDistance ====
{{anchor:nn.PairwiseDistance}}

''module'' = ''PairwiseDistance(p)'' creates a module that takes a table of two vectors as input and outputs the distance between them using the ''p''-norm.

Example:
<file lua>
mlp_l1=nn.PairwiseDistance(1)
mlp_l2=nn.PairwiseDistance(2)
x=lab.new(1,2,3)
y=lab.new(4,5,6)
print(mlp_l1:forward({x,y}))
print(mlp_l2:forward({x,y}))
</file>
gives the output:
<file lua>
 9
[torch.Tensor of dimension 1]

 5.1962
[torch.Tensor of dimension 1]
</file>

A more complicated example:
<file lua>
-- imagine we have one network we are interested in, it is called "p1_mlp"
p1_mlp= nn.Sequential(); p1_mlp:add(nn.Linear(5,2))

-- But we want to push examples towards or away from each other,
-- so we make another copy of it called p2_mlp.
-- This *shares* the same weights via the set command, but has its own set
-- of temporary gradient storage; that's why we create it again (so that
-- the gradients of the pair don't wipe each other)
p2_mlp= nn.Sequential(); p2_mlp:add(nn.Linear(5,2))
p2_mlp:get(1).weight:set(p1_mlp:get(1).weight)
p2_mlp:get(1).bias:set(p1_mlp:get(1).bias)

-- we make a parallel table that takes a pair of examples as input;
+prl = nn.ParallelTable()
+prl:add(p1_mlp)
+prl:add(p2_mlp)
+
+-- now we define our top level network that takes this parallel table and
+-- computes the pairwise distance between the pair of outputs
+mlp= nn.Sequential()
+mlp:add(prl)
+mlp:add(nn.PairwiseDistance(1))
+
+-- and a criterion for pushing together or pulling apart pairs
+crit=nn.HingeEmbeddingCriterion(1)
+
+-- let's make two example vectors
+x=lab.rand(5)
+y=lab.rand(5)
+
+-- Use a typical generic gradient update function
+function gradUpdate(mlp, x, y, criterion, learningRate)
+ local pred = mlp:forward(x)
+ local err = criterion:forward(pred, y)
+ local gradCriterion = criterion:backward(pred, y)
+ mlp:zeroGradParameters()
+ mlp:backward(x, gradCriterion)
+ mlp:updateParameters(learningRate)
+end
+
+-- push the pair x and y together; notice how the distance between them,
+-- given by print(mlp:forward({x,y})[1]), gets smaller
+for i=1,10 do
+ gradUpdate(mlp,{x,y},1,crit,0.01)
+ print(mlp:forward({x,y})[1])
+end
+
+-- pull apart the pair x and y; notice how the distance between them,
+-- given by print(mlp:forward({x,y})[1]), gets larger
+for i=1,10 do
+ gradUpdate(mlp,{x,y},-1,crit,0.01)
+ print(mlp:forward({x,y})[1])
+end
+</file>
+
+==== DotProduct ====
+{{anchor:nn.DotProduct}}
+
+''module'' = ''DotProduct()'' creates a module that takes a table of two vectors as input and outputs the dot product between them.
+
+Example:
+<file lua>
+require "lab"
+mlp=nn.DotProduct()
+x=lab.new(1,2,3)
+y=lab.new(4,5,6)
+print(mlp:forward({x,y}))
+</file>
+gives the output:
+<file lua>
+ 32
+[torch.Tensor of dimension 1]
+</file>
+
+A more complicated example:
+<file lua>
+-- Train a ranking function so that mlp:forward({x,y},{x,z}) returns a number
+-- which indicates whether x is better matched with y or with z
+-- (larger score = better match).
+
+mlp1=nn.Linear(5,10)
+mlp2=mlp1:clone('weight','bias')
+
+prl=nn.ParallelTable();
+prl:add(mlp1); prl:add(mlp2)
+
+mlp1=nn.Sequential()
+mlp1:add(prl)
+mlp1:add(nn.DotProduct())
+
+mlp2=mlp1:clone('weight','bias')
+
+mlp=nn.Sequential()
+prla=nn.ParallelTable()
+prla:add(mlp1)
+prla:add(mlp2)
+mlp:add(prla)
+
+x=lab.rand(5);
+y=lab.rand(5)
+z=lab.rand(5)
+
+print(mlp1:forward{x,x})
+print(mlp1:forward{x,y})
+print(mlp1:forward{y,y})
+
+crit=nn.MarginRankingCriterion(1);
+
+-- Use a typical generic gradient update function
+function gradUpdate(mlp, x, y, criterion, learningRate)
+   local pred = mlp:forward(x)
+   local err = criterion:forward(pred, y)
+   local gradCriterion = criterion:backward(pred, y)
+   mlp:zeroGradParameters()
+   mlp:backward(x, gradCriterion)
+   mlp:updateParameters(learningRate)
+end
+
+inp={{x,y},{x,z}}
+
+math.randomseed(1)
+
+-- make the pair x and y have a larger dot product than x and z
+for i=1,100 do
+   gradUpdate(mlp,inp,1,crit,0.05)
+   o1=mlp1:forward{x,y}[1];
+   o2=mlp2:forward{x,z}[1];
+   o=crit:forward(mlp:forward{{x,y},{x,z}},1)
+   print(o1,o2,o)
+end
+
+print "******************"
+
+-- make the pair x and z have a larger dot product than x and y
+for i=1,100 do
+   gradUpdate(mlp,inp,-1,crit,0.05)
+   o1=mlp1:forward{x,y}[1];
+   o2=mlp2:forward{x,z}[1];
+   o=crit:forward(mlp:forward{{x,y},{x,z}},-1)
+   print(o1,o2,o)
+end
+</file>
+
+==== CosineDistance ====
+{{anchor:nn.CosineDistance}}
+
+''module'' = ''CosineDistance()'' creates a module that takes a table of two vectors as input and outputs the cosine of the angle between them. (Despite the name, a larger output therefore means the vectors are more similar.)
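+
+Concretely, the value returned is the usual cosine similarity between the two vectors:
+<verbatim>
+cos(x1,x2) = forward({x1,x2}) = (x1 . x2) / (||x1|| * ||x2||)
+</verbatim>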
+
+Example:
+<file lua>
+require "lab"
+mlp=nn.CosineDistance()
+x=lab.new(1,2,3)
+y=lab.new(4,5,6)
+print(mlp:forward({x,y}))
+</file>
+gives the output:
+<file lua>
+ 0.9746
+[torch.Tensor of dimension 1]
+</file>
+
+A more complicated example:
+<file lua>
+-- imagine we have one network we are interested in, it is called "p1_mlp"
+p1_mlp= nn.Sequential(); p1_mlp:add(nn.Linear(5,2))
+
+-- But we want to push examples towards or away from each other,
+-- so we make another copy of it called p2_mlp.
+-- This *clones* the module, sharing the same weights but with its own set of
+-- temporary gradient storage; that's why we create it again
+-- (so that the gradients of the pair don't wipe each other out)
+p2_mlp= p1_mlp:clone('weight','bias')
+
+-- we make a parallel table that takes a pair of examples as input. they both go through the same (cloned) mlp
+prl = nn.ParallelTable()
+prl:add(p1_mlp)
+prl:add(p2_mlp)
+
+-- now we define our top level network that takes this parallel table and
+-- computes the cosine distance between the pair of outputs
+mlp= nn.Sequential()
+mlp:add(prl)
+mlp:add(nn.CosineDistance())
+
+-- let's make two example vectors
+x=lab.rand(5)
+y=lab.rand(5)
+
+-- Grad update function..
+function gradUpdate(mlp, x, y, learningRate)
+ local pred = mlp:forward(x)
+ if pred[1]*y < 1 then
+  gradCriterion=lab.new(-y)
+  mlp:zeroGradParameters()
+  mlp:backward(x, gradCriterion)
+  mlp:updateParameters(learningRate)
+ end
+end
+
+-- push the pair x and y together; the cosine output should get larger..
+for i=1,1000 do
+ gradUpdate(mlp,{x,y},1,0.1)
+ if ((i%100)==0) then print(mlp:forward({x,y})[1]);end
+end
+
+-- pull apart the pair x and y; the cosine output should get smaller..
+for i=1,1000 do
+ gradUpdate(mlp,{x,y},-1,0.1)
+ if ((i%100)==0) then print(mlp:forward({x,y})[1]);end
+end
+</file>
+
+==== CriterionTable ====
+{{anchor:nn.CriterionTable}}
+
+''module'' = ''CriterionTable(criterion)''
+
+Creates a module that wraps a Criterion so that it can accept a table of inputs. Typically the table would contain two elements: the input ''x'' and the target ''y'' that the Criterion compares.
+
+Example:
+<file lua>
+require "lab"
+mlp = nn.CriterionTable(nn.MSECriterion())
+x=lab.randn(5)
+y=lab.randn(5)
+print(mlp:forward{x,x})
+print(mlp:forward{x,y})
+</file>
+gives the output:
+<file lua>
+0
+1.9028918413199
+</file>
+
+Here is a more complex example of embedding the criterion into a network:
+<file lua>
+require "lab"
+
+mlp=nn.Sequential();        -- Create an mlp that takes input
+ main_mlp=nn.Sequential();  -- and output using ParallelTable
+ main_mlp:add(nn.Linear(5,4))
+ main_mlp:add(nn.Linear(4,3))
+ cmlp=nn.ParallelTable();
+ cmlp:add(main_mlp)
+ cmlp:add(nn.Identity())
+mlp:add(cmlp)
+mlp:add(nn.CriterionTable(nn.MSECriterion())) -- Apply the Criterion
+
+for i=1,20 do               -- Train for a few iterations
+ x=lab.ones(5);
+ y=torch.Tensor(3); y:copy(x:narrow(1,1,3))
+ err=mlp:forward{x,y}       -- Pass in both input and output
+ print(err)
+
+ mlp:zeroGradParameters();
+ mlp:backward({x, y});
+ mlp:updateParameters(0.05);
+end
+</file>
+
+==== CAddTable ====
+{{anchor:nn.CAddTable}}
+
+Takes a table of Tensors and outputs the sum of all of them.
+
+<file lua>
+ii = {lab.ones(5),lab.ones(5)*2,lab.ones(5)*3}
+=ii[1]
+ 1
+ 1
+ 1
+ 1
+ 1
+[torch.DoubleTensor of dimension 5]
+
+=ii[2]
+ 2
+ 2
+ 2
+ 2
+ 2
+[torch.DoubleTensor of dimension 5]
+
+=ii[3]
+ 3
+ 3
+ 3
+ 3
+ 3
+[torch.DoubleTensor of dimension 5]
+
+m=nn.CAddTable()
+=m:forward(ii)
+ 6
+ 6
+ 6
+ 6
+ 6
+[torch.DoubleTensor of dimension 5]
+</file>
+
+==== CSubTable ====
+{{anchor:nn.CSubTable}}
+
+Takes a table with two Tensors and returns the component-wise
+subtraction between them.
+
+<file lua>
+m=nn.CSubTable()
+=m:forward({lab.ones(5)*2.2,lab.ones(5)})
+ 1.2000
+ 1.2000
+ 1.2000
+ 1.2000
+ 1.2000
+[torch.DoubleTensor of dimension 5]
+</file>
+
+==== CMulTable ====
+{{anchor:nn.CMulTable}}
+
+Takes a table of Tensors and outputs the component-wise multiplication of all of them.
+
+<file lua>
+ii = {lab.ones(5)*2,lab.ones(5)*3,lab.ones(5)*4}
+m=nn.CMulTable()
+=m:forward(ii)
+ 24
+ 24
+ 24
+ 24
+ 24
+[torch.DoubleTensor of dimension 5]
+</file>
+
+==== CDivTable ====
+{{anchor:nn.CDivTable}}
+
+Takes a table with two Tensors and returns the component-wise
+division between them.
+
+<file lua>
+m=nn.CDivTable()
+=m:forward({lab.ones(5)*2.2,lab.ones(5)*4.4})
+ 0.5000
+ 0.5000
+ 0.5000
+ 0.5000
+ 0.5000
+[torch.DoubleTensor of dimension 5]
+</file>
+
+====== Criterions ======
+{{anchor:nn.Criterions}}
+
+Criterions are helpful to train a neural network. Given an input and a
+target, they compute a gradient according to a given loss
+function. [[#nn.AbsCriterion|AbsCriterion]] and
+[[#nn.MSECriterion|MSECriterion]] are perfect for regression problems, while
+[[#nn.ClassNLLCriterion|ClassNLLCriterion]] is the criterion of choice when
+dealing with classification.
+
+Criterions are [[..:torch:file#torch.file.serialization|serializable]].
+
+===== Criterion =====
+{{anchor:nn.Criterion}}
+
+This is an abstract class which declares methods defined in all criterions.
+This class is [[..:torch:file#torch.file.serialization|serializable]].
+
+==== [output] forward(input, target) ====
+{{anchor:nn.Criterion.forward}}
+
+Given an ''input'' and a ''target'', compute the loss function associated to the criterion and return the
+result. In general ''input'' and ''target'' are [[..:torch:tensor|tensors]], but some specific criterions
+might require some other type of object.
+
+The ''output'' returned should in general be a scalar.
+
+The state variable [[#nn.Criterion.output|self.output]] should be updated after a call to ''forward()''.
+
+==== [gradInput] backward(input, target) ====
+{{anchor:nn.Criterion.backward}}
+
+Given an ''input'' and a ''target'', compute the gradients of the loss function associated to the criterion and
+return the result. In general ''input'', ''target'' and ''gradInput'' are [[..:torch:tensor|tensors]], but some specific criterions
+might require some other type of object.
+
+The state variable [[#nn.Criterion.gradInput|self.gradInput]] should be updated after a call to ''backward()''.
+
+==== State variable: output ====
+{{anchor:nn.Criterion.output}}
+
+State variable which contains the result of the last [[#nn.Criterion.forward|forward(input, target)]] call.
+
+==== State variable: gradInput ====
+{{anchor:nn.Criterion.gradInput}}
+
+State variable which contains the result of the last [[#nn.Criterion.backward|backward(input, target)]] call.
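+
+The following short sketch illustrates this interface with the [[#nn.MSECriterion|MSECriterion]] described below (the exact values depend on the random inputs):
+<file lua>
+require "lab"
+criterion = nn.MSECriterion()
+input  = lab.randn(3)                     -- a prediction
+target = lab.randn(3)                     -- the desired output
+err  = criterion:forward(input, target)   -- scalar loss, also stored in criterion.output
+grad = criterion:backward(input, target)  -- gradient w.r.t. input, also stored in criterion.gradInput
+print(err)
+print(grad)
+</file>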
+
+===== AbsCriterion =====
+{{anchor:nn.AbsCriterion}}
+
+<file lua>
+criterion = nn.AbsCriterion()
+</file>
+
+Creates a criterion that
+measures the mean absolute difference between the ''n'' elements of the input ''x''
+and target ''y'':
+
+''loss(x,y)'' = ''1/n \sum |x_i-y_i|''.
+
+If ''x'' and ''y'' are ''d''-dimensional Tensors with a total of ''n'' elements,
+the sum operation still operates over all the elements, and divides by ''n''.
+
+The division by ''n'' can be avoided if one sets the internal variable ''sizeAverage'' to ''false'':
+<file lua>
+criterion = nn.AbsCriterion()
+criterion.sizeAverage = false
+</file>
+
+===== ClassNLLCriterion =====
+{{anchor:nn.ClassNLLCriterion}}
+
+<file lua>
+criterion = nn.ClassNLLCriterion()
+</file>
+
+The negative log likelihood criterion. It is useful for training a classification
+problem with ''n'' classes. The ''input'' given through a ''forward()'' is
+expected to contain //log-probabilities// of each class: ''input'' has to be a
+1D tensor of size ''n''. Obtaining log-probabilities in a neural network is
+easily achieved by adding a [[#nn.LogSoftMax|LogSoftMax]] layer as the last
+layer of your neural network.
+
+This criterion expects a class index (1 to the number of classes) as ''target''
+when calling [[#nn.Criterion.forward|forward(input, target)]] and
+[[#nn.Criterion.backward|backward(input, target)]].
+
+The loss can be described as:
+<file lua>
+loss(x, class) = forward(x, class) = -x[class]
+</file>
+
+The following is a code fragment showing how to make a gradient step
+given an input ''x'', a desired output ''y'' (an integer ''1'' to ''n'',
+in this case ''n'' = ''2'' classes),
+a network ''mlp'' and a learning rate ''learningRate'':
+<file lua>
+function gradUpdate(mlp, x, y, learningRate)
+   local criterion = nn.ClassNLLCriterion()
+   local pred = mlp:forward(x)
+   local err = criterion:forward(pred, y)
+   mlp:zeroGradParameters()
+   local t = criterion:backward(pred, y)
+   mlp:backward(x, t)
+   mlp:updateParameters(learningRate)
+end
+</file>
+
+===== MarginCriterion =====
+{{anchor:nn.MarginCriterion}}
+
+<file lua>
+criterion = nn.MarginCriterion()
+</file>
+
+Creates a criterion that optimizes a two-class classification hinge loss (margin-based loss) between input ''x'' (a Tensor of dimension 1) and output ''y'' (which is a scalar, either 1 or -1):
+
+<file lua>
+loss(x,y) = forward(x,y) = max(0, m - y*x)
+</file>
+
+''m'' is the margin, which is 1 by default.
+
+<file lua>
+criterion = nn.MarginCriterion(marginValue)
+</file>
+
+sets a different value of ''m''.
+
+Example:
+<file lua>
+require "nn"
+require "lab"
+
+function gradUpdate(mlp, x, y, criterion, learningRate)
+   local pred = mlp:forward(x)
+   local err = criterion:forward(pred, y)
+   local gradCriterion = criterion:backward(pred, y)
+   mlp:zeroGradParameters()
+   mlp:backward(x, gradCriterion)
+   mlp:updateParameters(learningRate)
+end
+
+mlp=nn.Sequential()
+mlp:add(nn.Linear(5,1))
+
+x1=lab.rand(5)
+x2=lab.rand(5)
+criterion=nn.MarginCriterion(1)
+
+for i=1,1000 do
+   gradUpdate(mlp,x1,1,criterion,0.01)
+   gradUpdate(mlp,x2,-1,criterion,0.01)
+end
+
+print(mlp:forward(x1))
+print(mlp:forward(x2))
+
+print(criterion:forward(mlp:forward(x1),1))
+print(criterion:forward(mlp:forward(x2),-1))
+</file>
+gives the output:
+<file lua>
+ 1.0043
+[torch.Tensor of dimension 1]
+
+-1.0061
+[torch.Tensor of dimension 1]
+
+0
+0
+</file>
+i.e. the mlp successfully separates the two data points, such that they both have a margin of 1 and hence a loss of 0.
+
+===== MSECriterion =====
+{{anchor:nn.MSECriterion}}
+
+<file lua>
+criterion = nn.MSECriterion()
+</file>
+
+Creates a criterion that measures the mean squared error between the ''n'' elements of the input ''x''
+and target ''y'':
+
+<file lua>
+loss(x,y) = forward(x,y) = 1/n \sum (x_i - y_i)^2
+</file>
+
+If ''x'' and ''y'' are ''d''-dimensional Tensors with a total of ''n'' elements,
+the sum operation still operates over all the elements, and divides by ''n''. The two Tensors must
+have the same number of elements (but their sizes might differ).
+
+The division by ''n'' can be avoided if one sets the internal variable ''sizeAverage'' to ''false'':
+<file lua>
+criterion = nn.MSECriterion()
+criterion.sizeAverage = false
+</file>
+
+===== MultiCriterion =====
+{{anchor:nn.MultiCriterion}}
+
+<file lua>
+criterion = nn.MultiCriterion()
+</file>
+
+This returns a Criterion which is a weighted sum of other Criterions.
+Criterions are added using the method:
+
+''criterion:add(singleCriterion, weight)''
+
+where ''weight'' is a scalar.
+
+===== HingeEmbeddingCriterion =====
+{{anchor:nn.HingeEmbeddingCriterion}}
+
+<file lua>
+criterion = nn.HingeEmbeddingCriterion()
+</file>
+
+Creates a criterion that measures the loss given an input
+''x'' which is a 1-dimensional vector and a label ''y'' (1 or -1).
+This is usually used for measuring whether two inputs are similar
+or dissimilar, e.g. using the L1 pairwise distance,
+and is typically used for
+learning nonlinear embeddings or semi-supervised learning.
+
+<verbatim>
+loss(x,y) = forward(x,y) = x,                  if y ==  1
+                         = max(0, margin - x), if y == -1
+</verbatim>
+
+The ''margin'' has a default value of 1, or can be set in the constructor:
+<file lua>
+criterion = nn.HingeEmbeddingCriterion(marginValue)
+</file>
+
+Example use:
+<file lua>
+-- imagine we have one network we are interested in, it is called "p1_mlp"
+p1_mlp= nn.Sequential(); p1_mlp:add(nn.Linear(5,2))
+
+-- But we want to push examples towards or away from each other,
+-- so we make another copy of it called p2_mlp.
+-- This *shares* the same weights via the set command, but has its own set of
+-- temporary gradient storage; that's why we create it again
+-- (so that the gradients of the pair don't wipe each other out)
+p2_mlp= nn.Sequential(); p2_mlp:add(nn.Linear(5,2))
+p2_mlp:get(1).weight:set(p1_mlp:get(1).weight)
+p2_mlp:get(1).bias:set(p1_mlp:get(1).bias)
+
+-- we make a parallel table that takes a pair of examples as input. they both go through the same (cloned) mlp
+prl = nn.ParallelTable()
+prl:add(p1_mlp)
+prl:add(p2_mlp)
+
+-- now we define our top level network that takes this parallel table and
+-- computes the pairwise distance between the pair of outputs
+mlp= nn.Sequential()
+mlp:add(prl)
+mlp:add(nn.PairwiseDistance(1))
+
+-- and a criterion for pushing together or pulling apart pairs
+crit=nn.HingeEmbeddingCriterion(1)
+
+-- let's make two example vectors
+x=lab.rand(5)
+y=lab.rand(5)
+
+-- Use a typical generic gradient update function
+function gradUpdate(mlp, x, y, criterion, learningRate)
+ local pred = mlp:forward(x)
+ local err = criterion:forward(pred, y)
+ local gradCriterion = criterion:backward(pred, y)
+ mlp:zeroGradParameters()
+ mlp:backward(x, gradCriterion)
+ mlp:updateParameters(learningRate)
+end
+
+-- push the pair x and y together; notice how the distance between them,
+-- given by print(mlp:forward({x,y})[1]), gets smaller
+for i=1,10 do
+ gradUpdate(mlp,{x,y},1,crit,0.01)
+ print(mlp:forward({x,y})[1])
+end
+
+-- pull apart the pair x and y; notice how the distance between them,
+-- given by print(mlp:forward({x,y})[1]), gets larger
+for i=1,10 do
+ gradUpdate(mlp,{x,y},-1,crit,0.01)
+ print(mlp:forward({x,y})[1])
+end
+</file>
+
+===== L1HingeEmbeddingCriterion =====
+{{anchor:nn.L1HingeEmbeddingCriterion}}
+
+<file lua>
+criterion = nn.L1HingeEmbeddingCriterion(margin)
+</file>
+
+Creates a criterion that measures the loss given an input
+''x'' = ''{x1,x2}'', a table of two Tensors, and a label ''y'' (1 or -1).
+This is used for measuring whether two inputs are similar
+or dissimilar, using the L1 distance, and is typically used for
+learning nonlinear embeddings or semi-supervised learning.
+
+<verbatim>
+loss(x,y) = forward(x,y) = ||x1-x2||_1,                  if y ==  1
+                         = max(0, margin - ||x1-x2||_1), if y == -1
+</verbatim>
+
+The ''margin'' has a default value of 1, or can be set in the constructor:
+<file lua>
+criterion = nn.L1HingeEmbeddingCriterion(marginValue)
+</file>
+
+===== CosineEmbeddingCriterion =====
+{{anchor:nn.CosineEmbeddingCriterion}}
+
+<file lua>
+criterion = nn.CosineEmbeddingCriterion(margin)
+</file>
+
+Creates a criterion that measures the loss given an input
+''x'' = ''{x1,x2}'', a table of two Tensors, and a label ''y'' (1 or -1).
+This is used for measuring whether two inputs are similar
+or dissimilar, using the cosine distance, and is typically used for
+learning nonlinear embeddings or semi-supervised learning.
+
+''margin'' should be a number from -1 to 1; 0 to 0.5 is suggested.
+Forward and Backward have to be used alternately. If ''margin'' is missing, the default value is 0.
+
+The loss function is:
+<verbatim>
+loss(x,y) = forward(x,y) = 1 - cos(x1, x2),              if y ==  1
+                         = max(0, cos(x1, x2) - margin), if y == -1
+</verbatim>
+
+===== MarginRankingCriterion =====
+{{anchor:nn.MarginRankingCriterion}}
+
+<file lua>
+criterion = nn.MarginRankingCriterion(margin)
+</file>
+
+Creates a criterion that measures the loss given an input
+''x'' = ''{x1,x2}'', a table of two Tensors of size 1 (they contain only scalars),
+and a label ''y'' (1 or -1).
+
+If ''y'' == ''1'' then it is assumed the first input should be ranked higher (have a larger value)
+than the second input, and vice-versa for ''y'' == ''-1''.
+
+The loss function is:
+<verbatim>
+loss(x,y) = forward(x,y) = max(0, -y*(x[1]-x[2]) + margin)
+</verbatim>
+
+Example:
+<file lua>
+require "lab"
+
+p1_mlp= nn.Linear(5,2)
+p2_mlp= p1_mlp:clone('weight','bias')
+
+prl=nn.ParallelTable()
+prl:add(p1_mlp)
+prl:add(p2_mlp)
+
+mlp1=nn.Sequential()
+mlp1:add(prl)
+mlp1:add(nn.DotProduct())
+
+mlp2=mlp1:clone('weight','bias')
+
+mlpa=nn.Sequential()
+prla=nn.ParallelTable()
+prla:add(mlp1)
+prla:add(mlp2)
+mlpa:add(prla)
+
+crit=nn.MarginRankingCriterion(0.1)
+
+x=lab.randn(5)
+y=lab.randn(5)
+z=lab.randn(5)
+
+-- Use a typical generic gradient update function
+function gradUpdate(mlp, x, y, criterion, learningRate)
+   local pred = mlp:forward(x)
+   local err = criterion:forward(pred, y)
+   local gradCriterion = criterion:backward(pred, y)
+   mlp:zeroGradParameters()
+   mlp:backward(x, gradCriterion)
+   mlp:updateParameters(learningRate)
+end
+
+-- make the pair {x,y} rank higher than the pair {x,z}
+for i=1,100 do
+   gradUpdate(mlpa,{{x,y},{x,z}},1,crit,0.01)
+   o1=mlp1:forward{x,y}[1];
+   o2=mlp2:forward{x,z}[1];
+   o=crit:forward(mlpa:forward{{x,y},{x,z}},1)
+   print(o1,o2,o)
+end
+
+print "--"
+
+-- now make the pair {x,z} rank higher than the pair {x,y}
+for i=1,100 do
+   gradUpdate(mlpa,{{x,y},{x,z}},-1,crit,0.01)
+   o1=mlp1:forward{x,y}[1];
+   o2=mlp2:forward{x,z}[1];
+   o=crit:forward(mlpa:forward{{x,y},{x,z}},-1)
+   print(o1,o2,o)
+end
+</file>
+
+====== Training a neural network ======
+{{anchor:nn.traningneuralnet.dok}}
+
+Training a neural network is easy with a [[#nn.DoItYourself|simple ''for'' loop]].
+While doing your own loop provides great flexibility, you might
+sometimes want a quick way of training neural
+networks. [[#nn.StochasticGradient|StochasticGradient]], a simple class
+which does the job for you, is provided as standard.
+
+===== StochasticGradient =====
+{{anchor:nn.StochasticGradient.dok}}
+
+''StochasticGradient'' is a high-level class for training [[#nn.Module|neural networks]], using a stochastic gradient
+algorithm. This class is [[..:torch:file#torch.file.serialization|serializable]].
+
+==== StochasticGradient(module, criterion) ====
+{{anchor:nn.StochasticGradient}}
+
+Create a ''StochasticGradient'' class, using the given [[#nn.Module|Module]] and [[#nn.Criterion|Criterion]].
+The class contains [[#nn.StochasticGradientParameters|several parameters]] you might want to set after initialization.
+
+==== train(dataset) ====
+{{anchor:nn.StochasticGradientTrain}}
+
+Train the module and criterion given in the
+[[#nn.StochasticGradient|constructor]] over ''dataset'', using the
+internal [[#nn.StochasticGradientParameters|parameters]].
+
+StochasticGradient expects as a ''dataset'' an object which implements the operator
+''dataset[index]'' and the method ''dataset:size()''. The ''size()'' method
+returns the number of examples and ''dataset[i]'' has to return the i-th example.
+
+An ''example'' has to be an object which implements the operator
+''example[field]'', where ''field'' might take the value ''1'' (input features)
+or ''2'' (corresponding label which will be given to the criterion).
+The input is usually a Tensor (except if you use special kinds of modules,
+like [[#nn.TableLayers|table layers]]). The label type depends on the criterion.
+For example, the [[#nn.MSECriterion|MSECriterion]] expects a Tensor, while the
+[[#nn.ClassNLLCriterion|ClassNLLCriterion]] expects an integer number (the class).
+
+Such a dataset is easily constructed by using Lua tables, but it could be any ''C'' object,
+for example, as long as the required operators/methods are implemented.
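+
+For instance, a bare-bones dataset of random pairs satisfying this interface could be sketched as follows (the XOR example below builds a meaningful one):
+<file lua>
+require "lab"
+dataset = {}
+function dataset:size() return 10 end        -- 10 examples
+for i=1,dataset:size() do
+  dataset[i] = {lab.randn(2), lab.randn(1)}  -- {input, label} pairs
+end
+</file>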
+[[#nn.DoItStochasticGradient|See an example]].
+
+==== Parameters ====
+{{anchor:nn.StochasticGradientParameters}}
+
+''StochasticGradient'' has several fields which have an impact on a call to [[#nn.StochasticGradientTrain|train()]].
+
+ * ''learningRate'': This is the learning rate used during training. The update of the parameters will be ''parameters = parameters - learningRate * parameters_gradient''. Default value is ''0.01''.
+ * ''learningRateDecay'': The learning rate decay. If non-zero, the learning rate (note: the field ''learningRate'' will not change its value) will be computed after each iteration (pass over the dataset) with: ''current_learning_rate = learningRate / (1 + iteration * learningRateDecay)''
+ * ''maxIteration'': The maximum number of iterations (passes over the dataset). Default is ''25''.
+ * ''shuffleIndices'': Boolean which says if the examples will be randomly sampled or not. Default is ''true''. If ''false'', the examples will be taken in the order of the dataset.
+ * ''hookExample'': A possible hook function which will be called (if non-nil) during training after each example has been forwarded and backwarded through the network. The function takes ''(self, example)'' as parameters. Default is ''nil''.
+ * ''hookIteration'': A possible hook function which will be called (if non-nil) during training after a complete pass over the dataset. The function takes ''(self, iteration)'' as parameters. Default is ''nil''.
+
+===== Example of training using StochasticGradient =====
+{{anchor:nn.DoItStochasticGradient}}
+
+We show an example here on a classical XOR problem.
+
+**Dataset**
+
+We first need to create a dataset, following the conventions described in
+[[#nn.StochasticGradientTrain|train(dataset)]].
+<file lua>
+require "lab"
+dataset={};
+function dataset:size() return 100 end -- 100 examples
+for i=1,dataset:size() do
+  local input = lab.randn(2);     -- normally distributed example in 2d
+  local output = torch.Tensor(1);
+  if input[1]*input[2] > 0 then   -- calculate label for XOR function
+    output[1] = -1;
+  else
+    output[1] = 1
+  end
+  dataset[i] = {input, output}
+end
+</file>
+
+**Neural Network**
+
+We create a simple neural network with one hidden layer.
+<file lua>
+require "nn"
+mlp = nn.Sequential();             -- make a multi-layer perceptron
+inputs = 2; outputs = 1; HUs = 20; -- parameters
+mlp:add(nn.Linear(inputs, HUs))
+mlp:add(nn.Tanh())
+mlp:add(nn.Linear(HUs, outputs))
+</file>
+
+**Training**
+
+We choose the Mean Squared Error criterion and train the beast.
+<file lua>
+criterion = nn.MSECriterion()
+trainer = nn.StochasticGradient(mlp, criterion)
+trainer.learningRate = 0.01
+trainer:train(dataset)
+</file>
+
+**Test the network**
+
+<file lua>
+x = torch.Tensor(2)
+x[1] =  0.5; x[2] =  0.5; print(mlp:forward(x))
+x[1] =  0.5; x[2] = -0.5; print(mlp:forward(x))
+x[1] = -0.5; x[2] =  0.5; print(mlp:forward(x))
+x[1] = -0.5; x[2] = -0.5; print(mlp:forward(x))
+</file>
+
+You should see something like:
+<file lua>
+> x = torch.Tensor(2)
+> x[1] = 0.5; x[2] = 0.5; print(mlp:forward(x))
+
+-0.3490
+[torch.Tensor of dimension 1]
+
+> x[1] = 0.5; x[2] = -0.5; print(mlp:forward(x))
+
+ 1.0561
+[torch.Tensor of dimension 1]
+
+> x[1] = -0.5; x[2] = 0.5; print(mlp:forward(x))
+
+ 0.8640
+[torch.Tensor of dimension 1]
+
+> x[1] = -0.5; x[2] = -0.5; print(mlp:forward(x))
+
+-0.2941
+[torch.Tensor of dimension 1]
+</file>
+
+===== Example of manual training of a neural network =====
+{{anchor:nn.DoItYourself}}
+
+We show an example here on the same classical XOR problem.
+
+**Neural Network**
+
+We create a simple neural network with one hidden layer.
+<file lua>
+require "nn"
+mlp = nn.Sequential();             -- make a multi-layer perceptron
+inputs = 2; outputs = 1; HUs = 20; -- parameters
+mlp:add(nn.Linear(inputs, HUs))
+mlp:add(nn.Tanh())
+mlp:add(nn.Linear(HUs, outputs))
+</file>
+
+**Loss function**
+
+We choose the Mean Squared Error criterion.
+<file lua>
+criterion = nn.MSECriterion()
+</file>
+
+**Training**
+
+We create data //on the fly// and feed it to the neural network.
+
+<file lua>
+require "lab"
+for i = 1,2500 do
+  -- random sample
+  local input = lab.randn(2);     -- normally distributed example in 2d
+  local output = torch.Tensor(1);
+  if input[1]*input[2] > 0 then   -- calculate label for XOR function
+    output[1] = -1
+  else
+    output[1] = 1
+  end
+
+  -- feed it to the neural network and the criterion
+  criterion:forward(mlp:forward(input), output)
+
+  -- train over this example in 3 steps
+  -- (1) zero the accumulation of the gradients
+  mlp:zeroGradParameters()
+  -- (2) accumulate gradients
+  mlp:backward(input, criterion:backward(mlp.output, output))
+  -- (3) update parameters with a 0.01 learning rate
+  mlp:updateParameters(0.01)
+end
+</file>
+
+**Test the network**
+
+<file lua>
+x = torch.Tensor(2)
+x[1] =  0.5; x[2] =  0.5; print(mlp:forward(x))
+x[1] =  0.5; x[2] = -0.5; print(mlp:forward(x))
+x[1] = -0.5; x[2] =  0.5; print(mlp:forward(x))
+x[1] = -0.5; x[2] = -0.5; print(mlp:forward(x))
+</file>
+
+You should see something like:
+<file lua>
+> x = torch.Tensor(2)
+> x[1] = 0.5; x[2] = 0.5; print(mlp:forward(x))
+
+-0.6140
+[torch.Tensor of dimension 1]
+
+> x[1] = 0.5; x[2] = -0.5; print(mlp:forward(x))
+
+ 0.8878
+[torch.Tensor of dimension 1]
+
+> x[1] = -0.5; x[2] = 0.5; print(mlp:forward(x))
+
+ 0.8548
+[torch.Tensor of dimension 1]
+
+> x[1] = -0.5; x[2] = -0.5; print(mlp:forward(x))
+
+-0.5498
+[torch.Tensor of dimension 1]
+</file>
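+
+Since modules are [[..:torch:file#torch.file.serialization|serializable]], the trained network can also be written to disk and read back later. A minimal sketch using ''torch.DiskFile'' (the file name ''mlp.bin'' is arbitrary):
+<file lua>
+-- save the trained network
+file = torch.DiskFile("mlp.bin", "w")
+file:writeObject(mlp)
+file:close()
+
+-- ... later, read it back and use it as before
+file = torch.DiskFile("mlp.bin", "r")
+mlp2 = file:readObject()
+file:close()
+print(mlp2:forward(x))
+</file>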