github.com/torch/nn.git
author    Soumith Chintala <soumith@gmail.com>  2015-08-12 07:04:45 +0300
committer Soumith Chintala <soumith@gmail.com>  2015-08-12 07:04:45 +0300
commit    14599d495549fc0b907dccdb44dc118723319dac (patch)
tree      a503ebb64143c27ac9afede86421f5273f579889
parent    0f5c1cc069519817aee8c22fb16d2af875236fb0 (diff)
parent    0e05ac975476fff3ecf75894595d60ba04b5e0d6 (diff)

Merge pull request #348 from nicholas-leonard/readthedocs

nn.readthedocs.org
-rw-r--r--  README.md            2
-rw-r--r--  doc/containers.md   35
-rwxr-xr-x  doc/convolution.md  91
-rwxr-xr-x  doc/criterion.md    82
-rw-r--r--  doc/index.md        23
-rwxr-xr-x  doc/module.md       48
-rw-r--r--  doc/overview.md     14
-rwxr-xr-x  doc/simple.md      131
-rwxr-xr-x  doc/table.md        77
-rw-r--r--  doc/training.md     14
-rwxr-xr-x  doc/transfer.md     32
-rw-r--r--  mkdocs.yml          18
12 files changed, 308 insertions, 259 deletions
diff --git a/README.md b/README.md
index 907be66..378a440 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,5 @@
[![Build Status](https://travis-ci.org/torch/nn.svg?branch=master)](https://travis-ci.org/torch/nn)
-<a name="nn.dok"/>
+<a name="nn.dok"></a>
# Neural Network Package #
This package provides an easy and modular way to build and train simple or complex neural networks using [Torch](https://github.com/torch/torch7/blob/master/README.md):
diff --git a/doc/containers.md b/doc/containers.md
index d691f41..9a83607 100644
--- a/doc/containers.md
+++ b/doc/containers.md
@@ -1,15 +1,16 @@
-<a name="nn.Containers"/>
+<a name="nn.Containers"></a>
# Containers #
Complex neural networks are easily built using container classes:
- * [Container](#nn.Container) : abstract class inherited by containers ;
- * [Sequential](#nn.Sequential) : plugs layers in a feed-forward fully connected manner ;
- * [Parallel](#nn.Parallel) : applies its `ith` child module to the `ith` slice of the input Tensor ;
- * [Concat](#nn.Concat) : concatenates in one layer several modules along dimension `dim` ;
- * [DepthConcat](#nn.DepthConcat) : like Concat, but adds zero-padding when non-`dim` sizes don't match;
+
+ * [Container](#nn.Container) : abstract class inherited by containers ;
+ * [Sequential](#nn.Sequential) : plugs layers in a feed-forward fully connected manner ;
+ * [Parallel](#nn.Parallel) : applies its `ith` child module to the `ith` slice of the input Tensor ;
+ * [Concat](#nn.Concat) : concatenates in one layer several modules along dimension `dim` ;
+ * [DepthConcat](#nn.DepthConcat) : like Concat, but adds zero-padding when non-`dim` sizes don't match;
See also the [Table Containers](#nn.TableContainers) for manipulating tables of [Tensors](https://github.com/torch/torch7/blob/master/doc/tensor.md).
-<a name="nn.Container"/>
+<a name="nn.Container"></a>
## Container ##
This is an abstract [Module](module.md#nn.Module) class which declares methods defined in all containers.
@@ -17,19 +18,19 @@ It reimplements many of the Module methods such that calls are propagated to the
contained modules. For example, a call to [zeroGradParameters](module.md#nn.Module.zeroGradParameters)
will be propagated to all contained modules.
-<a name="nn.Container.add"/>
+<a name="nn.Container.add"></a>
### add(module) ###
Adds the given `module` to the container. The order is important.
-<a name="nn.Container.get"/>
+<a name="nn.Container.get"></a>
### get(index) ###
Returns the contained module at index `index`.
-<a name="nn.Container.size"/>
+<a name="nn.Container.size"></a>
### size() ###
Returns the number of contained modules.
-<a name="nn.Sequential"/>
+<a name="nn.Sequential"></a>
## Sequential ##
Sequential provides a means to plug layers together
@@ -51,7 +52,7 @@ which gives the output:
[torch.Tensor of dimension 1]
```
-<a name="nn.Sequential.remove"/>
+<a name="nn.Sequential.remove"></a>
### remove([index]) ###
Remove the module at the given `index`. If `index` is not specified, remove the last layer.
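
A minimal sketch of `remove` in use; the toy three-layer network is assumed for illustration and is not part of the original example:

```lua
require 'nn'
model = nn.Sequential()
model:add(nn.Linear(10, 20))
model:add(nn.Tanh())
model:add(nn.Linear(20, 1))
model:remove(2)  -- drops the Tanh, so Linear(20, 1) becomes the 2nd module
model:remove()   -- no index: removes the last layer, leaving only Linear(10, 20)
```
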
@@ -71,7 +72,7 @@ nn.Sequential {
```
-<a name="nn.Sequential.insert"/>
+<a name="nn.Sequential.insert"></a>
### insert(module, [index]) ###
Inserts the given `module` at the given `index`. If `index` is not specified, the incremented length of the sequence is used, so this is equivalent to using `add(module)`.
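
A matching sketch for `insert`, again on an assumed toy container:

```lua
require 'nn'
model = nn.Sequential()
model:add(nn.Linear(10, 20))
model:add(nn.Linear(20, 1))
model:insert(nn.Tanh(), 2)  -- Tanh becomes the 2nd module; Linear(20, 1) shifts to 3rd
model:insert(nn.ReLU())     -- no index: appended at the end, same as add(nn.ReLU())
```
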
@@ -92,7 +93,7 @@ nn.Sequential {
-<a name="nn.Parallel"/>
+<a name="nn.Parallel"></a>
## Parallel ##
`module` = `Parallel(inputDimension,outputDimension)`
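
A minimal sketch of the constructor in use; the layer sizes are illustrative:

```lua
require 'nn'
mlp = nn.Parallel(2, 1)    -- slice the input along dimension 2, join outputs along dimension 1
mlp:add(nn.Linear(10, 3))  -- applied to the 1st slice (a 10-element column)
mlp:add(nn.Linear(10, 2))  -- applied to the 2nd slice
pred = mlp:forward(torch.randn(10, 2))  -- a 1D Tensor of size 3 + 2 = 5
```
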
@@ -149,7 +150,7 @@ end
```
-<a name="nn.Concat"/>
+<a name="nn.Concat"></a>
## Concat ##
```lua
@@ -179,7 +180,7 @@ which gives the output:
[torch.Tensor of dimension 10]
```
-<a name="nn.DepthConcat"/>
+<a name="nn.DepthConcat"></a>
## DepthConcat ##
```lua
@@ -273,7 +274,7 @@ module output tensors non-`dim` sizes aren't all odd or even.
To keep the mappings aligned, one need
only ensure that these are all odd (or even).
-<a name="nn.TableContainers"/>
+<a name="nn.TableContainers"></a>
## Table Containers ##
While the above containers are used for manipulating input [Tensors](https://github.com/torch/torch7/blob/master/doc/tensor.md), table containers are used for manipulating tables :
* [ConcatTable](table.md#nn.ConcatTable)
diff --git a/doc/convolution.md b/doc/convolution.md
index 8d9e77b..54b8da9 100755
--- a/doc/convolution.md
+++ b/doc/convolution.md
@@ -1,31 +1,32 @@
-<a name="nn.convlayers.dok"/>
+<a name="nn.convlayers.dok"></a>
# Convolutional layers #
A convolution is an integral that expresses the amount of overlap of one function `g` as it is shifted over another function `f`. It therefore "blends" one function with another. The neural network package supports convolution, pooling, subsampling and other relevant facilities. These are divided based on the dimensionality of the input and output [Tensors](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor):
- * [Temporal Modules](#nn.TemporalModules) apply to sequences with a one-dimensional relationship
+
+ * [Temporal Modules](#nn.TemporalModules) apply to sequences with a one-dimensional relationship
(e.g. sequences of words, phonemes and letters. Strings of some kind).
- * [TemporalConvolution](#nn.TemporalConvolution) : a 1D convolution over an input sequence ;
- * [TemporalSubSampling](#nn.TemporalSubSampling) : a 1D sub-sampling over an input sequence ;
- * [TemporalMaxPooling](#nn.TemporalMaxPooling) : a 1D max-pooling operation over an input sequence ;
- * [LookupTable](#nn.LookupTable) : a convolution of width `1`, commonly used for word embeddings ;
- * [Spatial Modules](#nn.SpatialModules) apply to inputs with two-dimensional relationships (e.g. images):
- * [SpatialConvolution](#nn.SpatialConvolution) : a 2D convolution over an input image ;
- * [SpatialSubSampling](#nn.SpatialSubSampling) : a 2D sub-sampling over an input image ;
- * [SpatialMaxPooling](#nn.SpatialMaxPooling) : a 2D max-pooling operation over an input image ;
- * [SpatialAveragePooling](#nn.SpatialAveragePooling) : a 2D average-pooling operation over an input image ;
- * [SpatialAdaptiveMaxPooling](#nn.SpatialAdaptiveMaxPooling) : a 2D max-pooling operation which adapts its parameters dynamically such that the output is of fixed size ;
- * [SpatialLPPooling](#nn.SpatialLPPooling) : computes the `p` norm in a convolutional manner on a set of input images ;
- * [SpatialConvolutionMap](#nn.SpatialConvolutionMap) : a 2D convolution that uses a generic connection table ;
- * [SpatialZeroPadding](#nn.SpatialZeroPadding) : padds a feature map with specified number of zeros ;
- * [SpatialSubtractiveNormalization](#nn.SpatialSubtractiveNormalization) : a spatial subtraction operation on a series of 2D inputs using
- * [SpatialBatchNormalization](#nn.SpatialBatchNormalization): mean/std normalization over the mini-batch inputs and pixels, with an optional affine transform that follows
+ * [TemporalConvolution](#nn.TemporalConvolution) : a 1D convolution over an input sequence ;
+ * [TemporalSubSampling](#nn.TemporalSubSampling) : a 1D sub-sampling over an input sequence ;
+ * [TemporalMaxPooling](#nn.TemporalMaxPooling) : a 1D max-pooling operation over an input sequence ;
+ * [LookupTable](#nn.LookupTable) : a convolution of width `1`, commonly used for word embeddings ;
+ * [Spatial Modules](#nn.SpatialModules) apply to inputs with two-dimensional relationships (e.g. images):
+ * [SpatialConvolution](#nn.SpatialConvolution) : a 2D convolution over an input image ;
+ * [SpatialSubSampling](#nn.SpatialSubSampling) : a 2D sub-sampling over an input image ;
+ * [SpatialMaxPooling](#nn.SpatialMaxPooling) : a 2D max-pooling operation over an input image ;
+ * [SpatialAveragePooling](#nn.SpatialAveragePooling) : a 2D average-pooling operation over an input image ;
+ * [SpatialAdaptiveMaxPooling](#nn.SpatialAdaptiveMaxPooling) : a 2D max-pooling operation which adapts its parameters dynamically such that the output is of fixed size ;
+ * [SpatialLPPooling](#nn.SpatialLPPooling) : computes the `p` norm in a convolutional manner on a set of input images ;
+ * [SpatialConvolutionMap](#nn.SpatialConvolutionMap) : a 2D convolution that uses a generic connection table ;
+ * [SpatialZeroPadding](#nn.SpatialZeroPadding) : pads a feature map with a specified number of zeros ;
+ * [SpatialSubtractiveNormalization](#nn.SpatialSubtractiveNormalization) : a spatial subtraction operation on a series of 2D inputs using
+ * [SpatialBatchNormalization](#nn.SpatialBatchNormalization): mean/std normalization over the mini-batch inputs and pixels, with an optional affine transform that follows
a kernel for computing the weighted average in a neighborhood ;
- * [Volumetric Modules](#nn.VolumetricModules) apply to inputs with three-dimensional relationships (e.g. videos) :
- * [VolumetricConvolution](#nn.VolumetricConvolution) : a 3D convolution over an input video (a sequence of images) ;
- * [VolumetricMaxPooling](#nn.VolumetricMaxPooling) : a 3D max-pooling operation over an input video.
- * [VolumetricAveragePooling](#nn.VolumetricAveragePooling) : a 3D average-pooling operation over an input video.
+ * [Volumetric Modules](#nn.VolumetricModules) apply to inputs with three-dimensional relationships (e.g. videos) :
+ * [VolumetricConvolution](#nn.VolumetricConvolution) : a 3D convolution over an input video (a sequence of images) ;
+ * [VolumetricMaxPooling](#nn.VolumetricMaxPooling) : a 3D max-pooling operation over an input video.
+ * [VolumetricAveragePooling](#nn.VolumetricAveragePooling) : a 3D average-pooling operation over an input video.
-<a name="nn.TemporalModules"/>
+<a name="nn.TemporalModules"></a>
## Temporal Modules ##
Excluding an optional first batch dimension, temporal layers expect a 2D Tensor as input. The
first dimension is the number of frames in the sequence (e.g. `nInputFrame`), the last dimension
@@ -35,7 +36,7 @@ of dimensions, although the size of each dimension may change. These are commonl
Note: The [LookupTable](#nn.LookupTable) is special in that while it does output a temporal Tensor of size `nOutputFrame x outputFrameSize`,
its input is a 1D Tensor of indices of size `nIndices`. Again, this is excluding the optional first batch dimension.
-<a name="nn.TemporalConvolution"/>
+<a name="nn.TemporalConvolution"></a>
## TemporalConvolution ##
```lua
@@ -121,7 +122,7 @@ which gives:
-0.63871422284166
```
-<a name="nn.TemporalMaxPooling"/>
+<a name="nn.TemporalMaxPooling"></a>
## TemporalMaxPooling ##
```lua
@@ -139,7 +140,7 @@ If the input sequence is a 2D tensor of dimension `nInputFrame x inputFrameSize`
nOutputFrame = (nInputFrame - kW) / dW + 1
```
-<a name="nn.TemporalSubSampling"/>
+<a name="nn.TemporalSubSampling"></a>
## TemporalSubSampling ##
```lua
@@ -175,7 +176,7 @@ The output value of the layer can be precisely described as:
output[i][t] = bias[i] + weight[i] * sum_{k=1}^kW input[i][dW*(t-1)+k]
```
-<a name="nn.LookupTable"/>
+<a name="nn.LookupTable"></a>
## LookupTable ##
```lua
@@ -253,13 +254,13 @@ Outputs something like:
[torch.DoubleTensor of dimension 2x4x3]
```
-<a name="nn.SpatialModules"/>
+<a name="nn.SpatialModules"></a>
## Spatial Modules ##
Excluding an optional batch dimension, spatial layers expect a 3D Tensor as input. The
first dimension is the number of features (e.g. `frameSize`), the last two dimensions
are spatial (e.g. `height x width`). These are commonly used for processing images.
-<a name="nn.SpatialConvolution"/>
+<a name="nn.SpatialConvolution"></a>
### SpatialConvolution ###
```lua
@@ -303,7 +304,7 @@ output[i][j][k] = bias[k]
```
-<a name="nn.SpatialConvolutionMap"/>
+<a name="nn.SpatialConvolutionMap"></a>
### SpatialConvolutionMap ###
```lua
@@ -317,7 +318,7 @@ connection table between input and output features. The
using a [full connection table](#nn.tables.full). One can specify
different types of connection tables.
-<a name="nn.tables.full"/>
+<a name="nn.tables.full"></a>
#### Full Connection Table ####
```lua
@@ -327,7 +328,7 @@ table = nn.tables.full(nin,nout)
This is a precomputed table that specifies connections between every
input and output node.
-<a name="nn.tables.onetoone"/>
+<a name="nn.tables.onetoone"></a>
#### One to One Connection Table ####
```lua
@@ -337,7 +338,7 @@ table = nn.tables.oneToOne(n)
This is a precomputed table that specifies a single connection to each
output node from corresponding input node.
-<a name="nn.tables.random"/>
+<a name="nn.tables.random"></a>
#### Random Connection Table ####
```lua
@@ -348,7 +349,7 @@ This table is randomly populated such that each output unit has
`nto` incoming connections. The algorithm tries to assign a uniform
number of outgoing connections to each input node if possible.
-<a name="nn.SpatialLPPooling"/>
+<a name="nn.SpatialLPPooling"></a>
### SpatialLPPooling ###
```lua
@@ -357,7 +358,7 @@ module = nn.SpatialLPPooling(nInputPlane, pnorm, kW, kH, [dW], [dH])
Computes the `p` norm in a convolutional manner on a set of 2D input planes.
-<a name="nn.SpatialMaxPooling"/>
+<a name="nn.SpatialMaxPooling"></a>
### SpatialMaxPooling ###
```lua
@@ -379,7 +380,7 @@ oheight = op((height + 2*padH - kH) / dH + 1)
`op` is a rounding operator. By default, it is `floor`. It can be changed
by calling `:ceil()` or `:floor()` methods.
-<a name="nn.SpatialAveragePooling"/>
+<a name="nn.SpatialAveragePooling"></a>
### SpatialAveragePooling ###
```lua
@@ -390,7 +391,7 @@ Applies 2D average-pooling operation in `kWxkH` regions by step size
`dWxdH` steps. The number of output features is equal to the number of
input planes.
-<a name="nn.SpatialAdaptiveMaxPooling"/>
+<a name="nn.SpatialAdaptiveMaxPooling"></a>
### SpatialAdaptiveMaxPooling ###
```lua
@@ -413,7 +414,7 @@ y_i_start = floor((i /oheight) * iheight)
y_i_end = ceil(((i+1)/oheight) * iheight)
```
-<a name="nn.SpatialSubSampling"/>
+<a name="nn.SpatialSubSampling"></a>
### SpatialSubSampling ###
```lua
@@ -454,7 +455,7 @@ output[i][j][k] = bias[k]
+ weight[k] sum_{s=1}^kW sum_{t=1}^kH input[dW*(i-1)+s][dH*(j-1)+t][k]
```
-<a name="nn.SpatialUpSamplingNearest"/>
+<a name="nn.SpatialUpSamplingNearest"></a>
### SpatialUpSamplingNearest ###
```lua
@@ -475,7 +476,7 @@ output(u,v) = input(floor((u-1)/scale)+1, floor((v-1)/scale)+1)
Where `u` and `v` are indexed from 1 (as per Lua convention). There are no learnable parameters.
-<a name="nn.SpatialZeroPadding"/>
+<a name="nn.SpatialZeroPadding"></a>
### SpatialZeroPadding ###
```lua
@@ -485,7 +486,7 @@ module = nn.SpatialZeroPadding(padLeft, padRight, padTop, padBottom)
Each feature map of a given input is padded with specified number of
zeros. If padding values are negative, then input is cropped.
-<a name="nn.SpatialSubtractiveNormalization"/>
+<a name="nn.SpatialSubtractiveNormalization"></a>
### SpatialSubtractiveNormalization ###
```lua
@@ -522,7 +523,7 @@ w2=image.display(processed)
```
![](image/lena.jpg)![](image/lenap.jpg)
-<a name="nn.SpatialBatchNormalization"/>
+<a name="nn.SpatialBatchNormalization"></a>
## SpatialBatchNormalization ##
`module` = `nn.SpatialBatchNormalization(N [,eps] [, momentum] [,affine])`
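
A minimal construction sketch, assuming the 4D `batch x features x height x width` input described below; the sizes are made up:

```lua
require 'nn'
model = nn.SpatialBatchNormalization(16)  -- N = 16 feature maps; eps, momentum, affine left at their defaults
input = torch.randn(4, 16, 32, 32)        -- batch x features x height x width
output = model:forward(input)             -- same size as the input
```
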
@@ -565,13 +566,13 @@ A = torch.randn(b, m, h, w)
C = model.forward(A) -- C will be of size `b x m x h x w`
```
-<a name="nn.VolumetricModules"/>
+<a name="nn.VolumetricModules"></a>
## Volumetric Modules ##
Excluding an optional batch dimension, volumetric layers expect a 4D Tensor as input. The
first dimension is the number of features (e.g. `frameSize`), the second is sequential (e.g. `time`) and the
last two dimensions are spatial (e.g. `height x width`). These are commonly used for processing videos (sequences of images).
-<a name="nn.VolumetricConvolution"/>
+<a name="nn.VolumetricConvolution"></a>
### VolumetricConvolution ###
```lua
@@ -608,7 +609,7 @@ size `nOutputPlane x nInputPlane x kT x kH x kW`) and `self.bias` (Tensor of
size `nOutputPlane`). The corresponding gradients can be found in
`self.gradWeight` and `self.gradBias`.
-<a name="nn.VolumetricMaxPooling"/>
+<a name="nn.VolumetricMaxPooling"></a>
### VolumetricMaxPooling ###
```lua
@@ -619,7 +620,7 @@ Applies 3D max-pooling operation in `kTxkWxkH` regions by step size
`dTxdWxdH` steps. The number of output features is equal to the number of
input planes / dT.
-<a name="nn.VolumetricAveragePooling"/>
+<a name="nn.VolumetricAveragePooling"></a>
### VolumetricAveragePooling ###
```lua
diff --git a/doc/criterion.md b/doc/criterion.md
index 64e6d63..2928938 100755
--- a/doc/criterion.md
+++ b/doc/criterion.md
@@ -1,36 +1,36 @@
-<a name="nn.Criterions"/>
+<a name="nn.Criterions"></a>
# Criterions #
[`Criterions`](#nn.Criterion) are helpful to train a neural network. Given an input and a
target, they compute a gradient according to a given loss function.
- * Classification criterions:
- * [`BCECriterion`](#nn.BCECriterion): binary cross-entropy (two-class version of [`ClassNLLCriterion`](#nn.ClassNLLCriterion));
- * [`ClassNLLCriterion`](#nn.ClassNLLCriterion): negative log-likelihood for [`LogSoftMax`](transfer.md#nn.LogSoftMax) (multi-class);
- * [`CrossEntropyCriterion`](#nn.CrossEntropyCriterion): combines [`LogSoftMax`](transfer.md#nn.LogSoftMax) and [`ClassNLLCriterion`](#nn.ClassNLLCriterion);
- * [`MarginCriterion`](#nn.MarginCriterion): two class margin-based loss;
- * [`MultiMarginCriterion`](#nn.MultiMarginCriterion): multi-class margin-based loss;
- * [`MultiLabelMarginCriterion`](#nn.MultiLabelMarginCriterion): multi-class multi-classification margin-based loss;
- * Regression criterions:
- * [`AbsCriterion`](#nn.AbsCriterion): measures the mean absolute value of the element-wise difference between input;
- * [`MSECriterion`](#nn.MSECriterion): mean square error (a classic);
- * [`DistKLDivCriterion`](#nn.DistKLDivCriterion): Kullback–Leibler divergence (for fitting continuous probability distributions);
- * Embedding criterions (measuring whether two inputs are similar or dissimilar):
- * [`HingeEmbeddingCriterion`](#nn.HingeEmbeddingCriterion): takes a distance as input;
- * [`L1HingeEmbeddingCriterion`](#nn.L1HingeEmbeddingCriterion): L1 distance between two inputs;
- * [`CosineEmbeddingCriterion`](#nn.CosineEmbeddingCriterion): cosine distance between two inputs;
- * Miscelaneus criterions:
- * [`MultiCriterion`](#nn.MultiCriterion) : a weighted sum of other criterions each applied to the same input and target;
- * [`ParallelCriterion`](#nn.ParallelCriterion) : a weighted sum of other criterions each applied to a different input and target;
- * [`MarginRankingCriterion`](#nn.MarginRankingCriterion): ranks two inputs;
-
-<a name="nn.Criterion"/>
+ * Classification criterions:
+ * [`BCECriterion`](#nn.BCECriterion): binary cross-entropy (two-class version of [`ClassNLLCriterion`](#nn.ClassNLLCriterion));
+ * [`ClassNLLCriterion`](#nn.ClassNLLCriterion): negative log-likelihood for [`LogSoftMax`](transfer.md#nn.LogSoftMax) (multi-class);
+ * [`CrossEntropyCriterion`](#nn.CrossEntropyCriterion): combines [`LogSoftMax`](transfer.md#nn.LogSoftMax) and [`ClassNLLCriterion`](#nn.ClassNLLCriterion);
+ * [`MarginCriterion`](#nn.MarginCriterion): two class margin-based loss;
+ * [`MultiMarginCriterion`](#nn.MultiMarginCriterion): multi-class margin-based loss;
+ * [`MultiLabelMarginCriterion`](#nn.MultiLabelMarginCriterion): multi-class multi-classification margin-based loss;
+ * Regression criterions:
+ * [`AbsCriterion`](#nn.AbsCriterion): measures the mean absolute value of the element-wise difference between input and target;
+ * [`MSECriterion`](#nn.MSECriterion): mean square error (a classic);
+ * [`DistKLDivCriterion`](#nn.DistKLDivCriterion): Kullback–Leibler divergence (for fitting continuous probability distributions);
+ * Embedding criterions (measuring whether two inputs are similar or dissimilar):
+ * [`HingeEmbeddingCriterion`](#nn.HingeEmbeddingCriterion): takes a distance as input;
+ * [`L1HingeEmbeddingCriterion`](#nn.L1HingeEmbeddingCriterion): L1 distance between two inputs;
+ * [`CosineEmbeddingCriterion`](#nn.CosineEmbeddingCriterion): cosine distance between two inputs;
+ * Miscellaneous criterions:
+ * [`MultiCriterion`](#nn.MultiCriterion) : a weighted sum of other criterions each applied to the same input and target;
+ * [`ParallelCriterion`](#nn.ParallelCriterion) : a weighted sum of other criterions each applied to a different input and target;
+ * [`MarginRankingCriterion`](#nn.MarginRankingCriterion): ranks two inputs;
+
+<a name="nn.Criterion"></a>
## Criterion ##
This is an abstract class which declares methods defined in all criterions.
This class is [serializable](https://github.com/torch/torch7/blob/master/doc/file.md#serialization-methods).
-<a name="nn.Criterion.forward"/>
+<a name="nn.Criterion.forward"></a>
### [output] forward(input, target) ###
Given an `input` and a `target`, compute the loss function associated to the criterion and return the result.
@@ -41,7 +41,7 @@ The `output` returned should be a scalar in general.
The state variable [`self.output`](#nn.Criterion.output) should be updated after a call to `forward()`.
-<a name="nn.Criterion.backward"/>
+<a name="nn.Criterion.backward"></a>
### [gradInput] backward(input, target) ###
Given an `input` and a `target`, compute the gradients of the loss function associated to the criterion and return the result.
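
A minimal sketch of the `forward`/`backward` pair, using `MSECriterion` as a stand-in for any concrete criterion:

```lua
require 'nn'
criterion = nn.MSECriterion()
input, target = torch.randn(10), torch.randn(10)
loss      = criterion:forward(input, target)   -- a scalar, also kept in criterion.output
gradInput = criterion:backward(input, target)  -- same size as input, kept in criterion.gradInput
```
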
@@ -50,19 +50,19 @@ In general `input`, `target` and `gradInput` are [`Tensor`s](..:torch:tensor), b
The state variable [`self.gradInput`](#nn.Criterion.gradInput) should be updated after a call to `backward()`.
-<a name="nn.Criterion.output"/>
+<a name="nn.Criterion.output"></a>
### State variable: output ###
State variable which contains the result of the last [`forward(input, target)`](#nn.Criterion.forward) call.
-<a name="nn.Criterion.gradInput"/>
+<a name="nn.Criterion.gradInput"></a>
### State variable: gradInput ###
State variable which contains the result of the last [`backward(input, target)`](#nn.Criterion.backward) call.
-<a name="nn.AbsCriterion"/>
+<a name="nn.AbsCriterion"></a>
## AbsCriterion ##
```lua
@@ -85,7 +85,7 @@ criterion.sizeAverage = false
```
-<a name="nn.ClassNLLCriterion"/>
+<a name="nn.ClassNLLCriterion"></a>
## ClassNLLCriterion ##
```lua
@@ -128,7 +128,7 @@ end
```
-<a name="nn.CrossEntropyCriterion"/>
+<a name="nn.CrossEntropyCriterion"></a>
## CrossEntropyCriterion ##
```lua
@@ -157,7 +157,7 @@ loss(x, class) = weights[class] * (-x[class] + log(\sum_j exp(x[j])))
```
-<a name="nn.DistKLDivCriterion"/>
+<a name="nn.DistKLDivCriterion"></a>
## DistKLDivCriterion ##
```lua
@@ -177,7 +177,7 @@ loss(x, target) = \sum(target_i * (log(target_i) - x_i))
```
-<a name="nn.BCECriterion"/>
+<a name="nn.BCECriterion"></a>
## BCECriterion
```lua
@@ -193,7 +193,7 @@ loss(t, o) = -(t * log(o) + (1 - t) * log(1 - o))
This is used for measuring the error of a reconstruction, for example in an auto-encoder.
-<a name="nn.MarginCriterion"/>
+<a name="nn.MarginCriterion"></a>
## MarginCriterion ##
```lua
@@ -256,7 +256,7 @@ gives the output:
i.e. the mlp successfully separates the two data points such that they both have a `margin` of `1`, and hence a loss of `0`.
-<a name="nn.MultiMarginCriterion"/>
+<a name="nn.MultiMarginCriterion"></a>
## MultiMarginCriterion ##
```lua
@@ -281,7 +281,7 @@ mlp:add(nn.MulConstant(-1)) -- distance to similarity
```
-<a name="nn.MultiLabelMarginCriterion"/>
+<a name="nn.MultiLabelMarginCriterion"></a>
## MultiLabelMarginCriterion ##
```lua
@@ -309,7 +309,7 @@ criterion:forward(input, target)
```
-<a name="nn.MSECriterion"/>
+<a name="nn.MSECriterion"></a>
## MSECriterion ##
```lua
@@ -333,7 +333,7 @@ criterion.sizeAverage = false
```
-<a name="nn.MultiCriterion"/>
+<a name="nn.MultiCriterion"></a>
## MultiCriterion ##
```lua
@@ -360,7 +360,7 @@ mc = nn.MultiCriterion():add(nll, 0.5):add(nll2)
output = mc:forward(input, target)
```
-<a name="nn.ParallelCriterion"/>
+<a name="nn.ParallelCriterion"></a>
## ParallelCriterion ##
```lua
@@ -390,7 +390,7 @@ output = pc:forward(input, target)
```
-<a name="nn.HingeEmbeddingCriterion"/>
+<a name="nn.HingeEmbeddingCriterion"></a>
## HingeEmbeddingCriterion ##
```lua
@@ -469,7 +469,7 @@ end
```
-<a name="nn.L1HingeEmbeddingCriterion"/>
+<a name="nn.L1HingeEmbeddingCriterion"></a>
## L1HingeEmbeddingCriterion ##
```lua
@@ -486,7 +486,7 @@ loss(x, y) = ⎨
The `margin` has a default value of `1`, or can be set in the constructor.
-<a name="nn.CosineEmbeddingCriterion"/>
+<a name="nn.CosineEmbeddingCriterion"></a>
## CosineEmbeddingCriterion ##
```lua
@@ -508,7 +508,7 @@ loss(x, y) = ⎨
```
-<a name="nn.MarginRankingCriterion"/>
+<a name="nn.MarginRankingCriterion"></a>
## MarginRankingCriterion ##
```lua
diff --git a/doc/index.md b/doc/index.md
new file mode 100644
index 0000000..5c36166
--- /dev/null
+++ b/doc/index.md
@@ -0,0 +1,23 @@
+[![Build Status](https://travis-ci.org/torch/nn.svg?branch=master)](https://travis-ci.org/torch/nn)
+<a name="nn.dok"></a>
+# Neural Network Package #
+
+This package provides an easy and modular way to build and train simple or complex neural networks using [Torch](https://github.com/torch/torch7/blob/master/README.md):
+
+ * Modules are the bricks used to build neural networks. Each are themselves neural networks, but can be combined with other networks using containers to create complex neural networks:
+ * [Module](module.md#nn.Module) : abstract class inherited by all modules;
+ * [Containers](containers.md#nn.Containers) : container classes like [Sequential](containers.md#nn.Sequential), [Parallel](containers.md#nn.Parallel) and [Concat](containers.md#nn.Concat);
+ * [Transfer functions](transfer.md#nn.transfer.dok) : non-linear functions like [Tanh](transfer.md#nn.Tanh) and [Sigmoid](transfer.md#nn.Sigmoid);
+ * [Simple layers](simple.md#nn.simplelayers.dok) : like [Linear](simple.md#nn.Linear), [Mean](simple.md#nn.Mean), [Max](simple.md#nn.Max) and [Reshape](simple.md#nn.Reshape);
+ * [Table layers](table.md#nn.TableLayers) : layers for manipulating tables like [SplitTable](table.md#nn.SplitTable), [ConcatTable](table.md#nn.ConcatTable) and [JoinTable](table.md#nn.JoinTable);
+ * [Convolution layers](convolution.md#nn.convlayers.dok) : [Temporal](convolution.md#nn.TemporalModules), [Spatial](convolution.md#nn.SpatialModules) and [Volumetric](convolution.md#nn.VolumetricModules) convolutions ;
+ * Criterions compute a gradient according to a given loss function given an input and a target:
+ * [Criterions](criterion.md#nn.Criterions) : a list of all criterions, including [Criterion](criterion.md#nn.Criterion), the abstract class;
+ * [MSECriterion](criterion.md#nn.MSECriterion) : the Mean Squared Error criterion used for regression;
+ * [ClassNLLCriterion](criterion.md#nn.ClassNLLCriterion) : the Negative Log Likelihood criterion used for classification;
+ * Additional documentation :
+ * [Overview](overview.md#nn.overview.dok) of the package essentials including modules, containers and training;
+ * [Training](training.md#nn.traningneuralnet.dok) : how to train a neural network using [StochasticGradient](training.md#nn.StochasticGradient);
+ * [Testing](testing.md) : how to test your modules.
+ * [Experimental Modules](https://github.com/clementfarabet/lua---nnx/blob/master/README.md) : a package containing experimental modules and criteria.
+
diff --git a/doc/module.md b/doc/module.md
index 50090c4..97e14a0 100755
--- a/doc/module.md
+++ b/doc/module.md
@@ -1,4 +1,4 @@
-<a name="nn.Module"/>
+<a name="nn.Module"></a>
## Module ##
`Module` is an abstract class which defines fundamental methods necessary
@@ -7,7 +7,7 @@ for a training a neural network. Modules are [serializable](https://github.com/t
Modules contain two state variables: [output](#output) and
[gradInput](#gradinput).
-<a name="nn.Module.forward"/>
+<a name="nn.Module.forward"></a>
### [output] forward(input) ###
Takes an `input` object, and computes the corresponding `output` of the
@@ -24,7 +24,7 @@ implement [updateOutput(input)](#nn.Module.updateOutput)
function. The forward module in the abstract parent class
[Module](#nn.Module) will call `updateOutput(input)`.
-<a name="nn.Module.backward"/>
+<a name="nn.Module.backward"></a>
### [gradInput] backward(input, gradOutput) ###
Performs a _backpropagation step_ through the module, with respect to the
@@ -52,14 +52,14 @@ is better to override
[accGradParameters(input, gradOutput,scale)](#nn.Module.accGradParameters)
functions.
-<a name="nn.Module.updateOutput"/>
+<a name="nn.Module.updateOutput"></a>
### updateOutput(input) ###
Computes the output using the current parameter set of the class and
input. This function returns the result which is stored in the
[output](#output) field.
-<a name="nn.Module.updateGradInput"/>
+<a name="nn.Module.updateGradInput"></a>
### updateGradInput(input, gradOutput) ###
Computing the gradient of the module with respect to its own
@@ -67,7 +67,7 @@ input. This is returned in `gradInput`. Also, the
[gradInput](#gradinput) state variable is updated
accordingly.
-<a name="nn.Module.accGradParameters"/>
+<a name="nn.Module.accGradParameters"></a>
### accGradParameters(input, gradOutput, scale) ###
Computing the gradient of the module with respect to its
@@ -83,7 +83,7 @@ Zeroing this accumulation is achieved with
the parameters according to this accumulation is done with
[updateParameters()](#nn.Module.updateParameters).
-<a name="nn.Module.zeroGradParameters"/>
+<a name="nn.Module.zeroGradParameters"></a>
### zeroGradParameters() ###
If the module has parameters, this will zero the accumulation of the
@@ -91,7 +91,7 @@ gradients with respect to these parameters, accumulated through
[accGradParameters(input, gradOutput,scale)](#nn.Module.accGradParameters)
calls. Otherwise, it does nothing.
-<a name="nn.Module.updateParameters"/>
+<a name="nn.Module.updateParameters"></a>
### updateParameters(learningRate) ###
If the module has parameters, this will update these parameters, according
@@ -104,7 +104,7 @@ parameters = parameters - learningRate * gradients_wrt_parameters
```
If the module does not have parameters, it does nothing.
-<a name="nn.Module.accUpdateGradParameters"/>
+<a name="nn.Module.accUpdateGradParameters"></a>
### accUpdateGradParameters(input, gradOutput, learningRate) ###
This is a convenience module that performs two functions at
@@ -136,7 +136,7 @@ As it can be seen, the gradients are accumulated directly into
weights. This assumption may not be true for a module that computes a
nonlinear operation.
-<a name="nn.Module.share"/>
+<a name="nn.Module.share"></a>
### share(mlp,s1,s2,...,sn) ###
This function modifies the parameters of the module named
@@ -174,7 +174,7 @@ print(mlp2:get(1).bias[1])
```
-<a name="nn.Module.clone"/>
+<a name="nn.Module.clone"></a>
### clone(mlp,...) ###
Creates a deep copy of (i.e. not just a pointer to) the module,
@@ -205,29 +205,29 @@ print(mlp2:get(1).bias[1])
```
-<a name="nn.Module.type"/>
+<a name="nn.Module.type"></a>
### type(type) ###
This function converts all the parameters of a module to the given
`type`. The `type` can be one of the types defined for
[torch.Tensor](https://github.com/torch/torch7/blob/master/doc/tensor.md).
-<a name="nn.Module.float"/>
+<a name="nn.Module.float"></a>
### float() ###
Convenience method for calling [module:type('torch.FloatTensor')](#nn.Module.type)
-<a name="nn.Module.double"/>
+<a name="nn.Module.double"></a>
### double() ###
Convenience method for calling [module:type('torch.DoubleTensor')](#nn.Module.type)
-<a name="nn.Module.cuda"/>
+<a name="nn.Module.cuda"></a>
### cuda() ###
Convenience method for calling [module:type('torch.CudaTensor')](#nn.Module.type)
-<a name="nn.statevars.dok"/>
+<a name="nn.statevars.dok"></a>
### State Variables ###
These state variables are useful objects if one wants to check the guts of
@@ -240,13 +240,13 @@ However, some special sub-classes
like [table layers](table.md#nn.TableLayers) contain something else. Please,
refer to each module specification for further information.
-<a name="nn.Module.output"/>
+<a name="nn.Module.output"></a>
#### output ####
This contains the output of the module, computed with the last call of
[forward(input)](#nn.Module.forward).
-<a name="nn.Module.gradInput"/>
+<a name="nn.Module.gradInput"></a>
#### gradInput ####
This contains the gradients with respect to the inputs of the module, computed with the last call of
@@ -258,7 +258,7 @@ Some modules contain parameters (the ones that we actually want to
train!). The name of these parameters, and gradients w.r.t these parameters
are module dependent.
-<a name="nn.Module.parameters"/>
+<a name="nn.Module.parameters"></a>
### [{weights}, {gradWeights}] parameters() ###
This function should return two tables. One for the learnable
@@ -268,7 +268,7 @@ wrt to the learnable parameters `{gradWeights}`.
Custom modules should override this function if they use learnable
parameters that are stored in tensors.
-<a name="nn.Module.getParameters"/>
+<a name="nn.Module.getParameters"></a>
### [flatParameters, flatGradParameters] getParameters() ###
This function returns two tensors. One for the flattened learnable
@@ -279,15 +279,15 @@ Custom modules should not override this function. They should instead override [
This function will go over all the weights and gradWeights and make them view into a single tensor (one for weights and one for gradWeights). Since the storage of every weight and gradWeight is changed, this function should be called only once on a given network.
-<a name="nn.Module.training"/>
+<a name="nn.Module.training"></a>
### training() ###
This sets the mode of the Module (or sub-modules) to `train=true`. This is useful for modules like [Dropout](simple.md#nn.Dropout) that have a different behaviour during training vs evaluation.
-<a name="nn.Module.evaluate"/>
+<a name="nn.Module.evaluate"></a>
### evaluate() ###
This sets the mode of the Module (or sub-modules) to `train=false`. This is useful for modules like [Dropout](simple.md#nn.Dropout) that have a different behaviour during training vs evaluation.
-<a name="nn.Module.findModules"/>
+<a name="nn.Module.findModules"></a>
### findModules(typename) ###
Find all instances of modules in the network of a certain `typename`. It returns a flattened list of the matching nodes, as well as a flattened list of the container modules for each matching node.
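
A minimal sketch of the call, assuming a container `model` built elsewhere; the documented example further down does the same with `'nn.Threshold'`:

```lua
-- two parallel lists are returned: the matching modules and their enclosing containers
linear_nodes, container_nodes = model:findModules('nn.Linear')
for i = 1, #linear_nodes do
   print(torch.type(container_nodes[i]), linear_nodes[i].weight:size(1))
end
```
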
@@ -331,7 +331,7 @@ for i = 1, #threshold_nodes do
end
```
-<a name="nn.Module.listModules"/>
+<a name="nn.Module.listModules"></a>
### listModules() ###
List all Modules instances in a network. Returns a flattened list of modules,
diff --git a/doc/overview.md b/doc/overview.md
index c9eedae..6aec321 100644
--- a/doc/overview.md
+++ b/doc/overview.md
@@ -1,4 +1,4 @@
-<a name="nn.overview.dok"/>
+<a name="nn.overview.dok"></a>
# Overview #
Each module of a network is composed of [Modules](module.md#nn.Modules) and there
@@ -23,31 +23,35 @@ easy with a simple for loop to [train a neural network yourself](training.md#nn.
## Detailed Overview ##
This section provides a detailed overview of the neural network package. First the omnipresent [Module](#nn.overview.module) is examined, followed by some examples for [combining modules](#nn.overview.plugandplay) together. The last part explores facilities for [training a neural network](#nn.overview.training).
-<a name="nn.overview.module"/>
+<a name="nn.overview.module"></a>
### Module ###
A neural network is called a [Module](module.md#nn.Module) (or simply
_module_ in this documentation) in Torch. `Module` is an abstract
class which defines four main methods:
+
* [forward(input)](module.md#nn.Module.forward) which computes the output of the module given the `input` [Tensor](https://github.com/torch/torch7/blob/master/doc/tensor.md).
* [backward(input, gradOutput)](module.md#nn.Module.backward) which computes the gradients of the module with respect to its own parameters, and its own inputs.
* [zeroGradParameters()](module.md#nn.Module.zeroGradParameters) which zeroes the gradient with respect to the parameters of the module.
* [updateParameters(learningRate)](module.md#nn.Module.updateParameters) which updates the parameters after one has computed the gradients with `backward()`
It also declares two members:
+
* [output](module.md#nn.Module.output) which is the output returned by `forward()`.
* [gradInput](module.md#nn.Module.gradInput) which contains the gradients with respect to the input of the module, computed in a `backward()`.
Two other perhaps less used but handy methods are also defined:
+
* [share(mlp,s1,s2,...,sn)](module.md#nn.Module.share) which makes this module share the parameters s1,..sn of the module `mlp`. This is useful if you want to have modules that share the same weights.
* [clone(...)](module.md#nn.Module.clone) which produces a deep copy of (i.e. not just a pointer to) this Module, including the current state of its parameters (if any).
Some important remarks:
+
* `output` contains only valid values after a [forward(input)](module.md#nn.Module.forward).
* `gradInput` contains only valid values after a [backward(input, gradOutput)](module.md#nn.Module.backward).
* [backward(input, gradOutput)](module.md#nn.Module.backward) uses certain computations obtained during [forward(input)](module.md#nn.Module.forward). You _must_ call `forward()` before calling a `backward()`, on the _same_ `input`, or your gradients are going to be incorrect!
-<a name="nn.overview.plugandplay"/>
+<a name="nn.overview.plugandplay"></a>
### Plug and play ###
Building a simple neural network can be achieved by constructing an available layer.
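
The methods and remarks above fit together as in this minimal sketch; the layer sizes, the criterion and the learning rate are illustrative:

```lua
require 'nn'
mlp = nn.Sequential()
mlp:add(nn.Linear(10, 25))
mlp:add(nn.Tanh())
mlp:add(nn.Linear(25, 1))

criterion = nn.MSECriterion()
input, target = torch.randn(10), torch.randn(1)

output = mlp:forward(input)  -- forward() first, on the same input used below
loss = criterion:forward(output, target)
mlp:zeroGradParameters()
mlp:backward(input, criterion:backward(output, target))
mlp:updateParameters(0.01)   -- learningRate = 0.01
```
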
@@ -75,7 +79,7 @@ Of course, `Sequential` and `Concat` can contains other
networks you ever dreamt of! See the [complete list of
available modules](#nn.Modules).
-<a name="nn.overview.training"/>
+<a name="nn.overview.training"></a>
### Training a neural network ###
Once you built your neural network, you have to choose a particular
@@ -114,7 +118,7 @@ are implemented. [See an example](containers.md#nn.DoItStochasticGradient).
to cut-and-paste it and create a variant to it adapted to your needs
(if the constraints of `StochasticGradient` do not satisfy you).
-<a name="nn.overview.lowlevel"/>
+<a name="nn.overview.lowlevel"></a>
#### Low Level Training ####
If you want to program the `StochasticGradient` by hand, you
diff --git a/doc/simple.md b/doc/simple.md
index 6ef7ed2..ebb2d2f 100755
--- a/doc/simple.md
+++ b/doc/simple.md
@@ -1,42 +1,43 @@
-<a name="nn.simplelayers.dok"/>
+<a name="nn.simplelayers.dok"></a>
# Simple layers #
Simple Modules are used for various tasks like adapting Tensor methods and providing affine transformations :
- * Parameterized Modules :
- * [Linear](#nn.Linear) : a linear transformation ;
- * [SparseLinear](#nn.SparseLinear) : a linear transformation with sparse inputs ;
- * [Add](#nn.Add) : adds a bias term to the incoming data ;
- * [Mul](#nn.Mul) : multiply a single scalar factor to the incoming data ;
- * [CMul](#nn.CMul) : a component-wise multiplication to the incoming data ;
- * [CDiv](#nn.CDiv) : a component-wise division to the incoming data ;
- * [Euclidean](#nn.Euclidean) : the euclidean distance of the input to `k` mean centers ;
- * [WeightedEuclidean](#nn.WeightedEuclidean) : similar to [Euclidean](#nn.Euclidean), but additionally learns a diagonal covariance matrix ;
- * Modules that adapt basic Tensor methods :
- * [Copy](#nn.Copy) : a [copy](https://github.com/torch/torch7/blob/master/doc/tensor.md#torch.Tensor.copy) of the input with [type](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor-or-string-typetype) casting ;
- * [Narrow](#nn.Narrow) : a [narrow](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor-narrowdim-index-size) operation over a given dimension ;
- * [Replicate](#nn.Replicate) : [repeats](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor-repeattensorresult-sizes) input `n` times along its first dimension ;
- * [Reshape](#nn.Reshape) : a [reshape](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchreshaperes-x-m-n) of the inputs ;
- * [View](#nn.View) : a [view](https://github.com/torch/torch7/blob/master/doc/tensor.md#result-viewresult-tensor-sizes) of the inputs ;
- * [Select](#nn.Select) : a [select](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor-selectdim-index) over a given dimension ;
- * Modules that adapt mathematical Tensor methods :
- * [Max](#nn.Max) : a [max](https://github.com/torch/torch7/blob/master/doc/maths.md#torch.max) operation over a given dimension ;
- * [Min](#nn.Min) : a [min](https://github.com/torch/torch7/blob/master/doc/maths.md#torchminresval-resind-x) operation over a given dimension ;
- * [Mean](#nn.Mean) : a [mean](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchmeanres-x-dim) operation over a given dimension ;
- * [Sum](#nn.Sum) : a [sum](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchsumres-x) operation over a given dimension ;
- * [Exp](#nn.Exp) : an element-wise [exp](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchexpres-x) operation ;
- * [Abs](#nn.Abs) : an element-wise [abs](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchabsres-x) operation ;
- * [Power](#nn.Power) : an element-wise [pow](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchpowres-x) operation ;
- * [Square](#nn.Square) : an element-wise square operation ;
- * [Sqrt](#nn.Sqrt) : an element-wise [sqrt](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchsqrtres-x) operation ;
- * [MM](#nn.MM) : matrix-matrix multiplication (also supports batches of matrices) ;
- * Miscellaneous Modules :
- * [BatchNormalization](#nn.BatchNormalization) - mean/std normalization over the mini-batch inputs (with an optional affine transform) ;
- * [Identity](#nn.Identity) : forward input as-is to output (useful with [ParallelTable](table.md#nn.ParallelTable));
- * [Dropout](#nn.Dropout) : masks parts of the `input` using binary samples from a [bernoulli](http://en.wikipedia.org/wiki/Bernoulli_distribution) distribution ;
- * [SpatialDropout](#nn.SpatialDropout) : Same as Dropout but for spatial inputs where adjacent pixels are strongly correlated ;
- * [Padding](#nn.Padding) : adds padding to a dimension ;
- * [L1Penalty](#nn.L1Penalty) : adds an L1 penalty to an input (for sparsity);
-
-<a name="nn.Linear"/>
+
+ * Parameterized Modules :
+ * [Linear](#nn.Linear) : a linear transformation ;
+ * [SparseLinear](#nn.SparseLinear) : a linear transformation with sparse inputs ;
+ * [Add](#nn.Add) : adds a bias term to the incoming data ;
+ * [Mul](#nn.Mul) : multiply a single scalar factor to the incoming data ;
+ * [CMul](#nn.CMul) : a component-wise multiplication to the incoming data ;
+ * [CDiv](#nn.CDiv) : a component-wise division to the incoming data ;
+ * [Euclidean](#nn.Euclidean) : the euclidean distance of the input to `k` mean centers ;
+ * [WeightedEuclidean](#nn.WeightedEuclidean) : similar to [Euclidean](#nn.Euclidean), but additionally learns a diagonal covariance matrix ;
+ * Modules that adapt basic Tensor methods :
+ * [Copy](#nn.Copy) : a [copy](https://github.com/torch/torch7/blob/master/doc/tensor.md#torch.Tensor.copy) of the input with [type](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor-or-string-typetype) casting ;
+ * [Narrow](#nn.Narrow) : a [narrow](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor-narrowdim-index-size) operation over a given dimension ;
+ * [Replicate](#nn.Replicate) : [repeats](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor-repeattensorresult-sizes) input `n` times along its first dimension ;
+ * [Reshape](#nn.Reshape) : a [reshape](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchreshaperes-x-m-n) of the inputs ;
+ * [View](#nn.View) : a [view](https://github.com/torch/torch7/blob/master/doc/tensor.md#result-viewresult-tensor-sizes) of the inputs ;
+ * [Select](#nn.Select) : a [select](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor-selectdim-index) over a given dimension ;
+ * Modules that adapt mathematical Tensor methods :
+ * [Max](#nn.Max) : a [max](https://github.com/torch/torch7/blob/master/doc/maths.md#torch.max) operation over a given dimension ;
+ * [Min](#nn.Min) : a [min](https://github.com/torch/torch7/blob/master/doc/maths.md#torchminresval-resind-x) operation over a given dimension ;
+ * [Mean](#nn.Mean) : a [mean](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchmeanres-x-dim) operation over a given dimension ;
+ * [Sum](#nn.Sum) : a [sum](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchsumres-x) operation over a given dimension ;
+ * [Exp](#nn.Exp) : an element-wise [exp](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchexpres-x) operation ;
+ * [Abs](#nn.Abs) : an element-wise [abs](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchabsres-x) operation ;
+ * [Power](#nn.Power) : an element-wise [pow](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchpowres-x) operation ;
+ * [Square](#nn.Square) : an element-wise square operation ;
+ * [Sqrt](#nn.Sqrt) : an element-wise [sqrt](https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchsqrtres-x) operation ;
+ * [MM](#nn.MM) : matrix-matrix multiplication (also supports batches of matrices) ;
+ * Miscellaneous Modules :
+ * [BatchNormalization](#nn.BatchNormalization) - mean/std normalization over the mini-batch inputs (with an optional affine transform) ;
+ * [Identity](#nn.Identity) : forward input as-is to output (useful with [ParallelTable](table.md#nn.ParallelTable));
+ * [Dropout](#nn.Dropout) : masks parts of the `input` using binary samples from a [bernoulli](http://en.wikipedia.org/wiki/Bernoulli_distribution) distribution ;
+ * [SpatialDropout](#nn.SpatialDropout) : Same as Dropout but for spatial inputs where adjacent pixels are strongly correlated ;
+ * [Padding](#nn.Padding) : adds padding to a dimension ;
+ * [L1Penalty](#nn.L1Penalty) : adds an L1 penalty to an input (for sparsity);
+
+<a name="nn.Linear"></a>
## Linear ##
```lua
@@ -79,7 +80,7 @@ x = torch.Tensor(10) -- 10 inputs
y = module:forward(x)
```
-<a name="nn.SparseLinear"/>
+<a name="nn.SparseLinear"></a>
## SparseLinear ##
```lua
@@ -113,7 +114,7 @@ x = torch.Tensor({ {1, 0.1}, {2, 0.3}, {10, 0.3}, {31, 0.2} })
The first column contains indices, the second column contains values in a vector where all other elements are zeros. The indices should not exceed the stated dimensions of the input to the layer (10000 in the example).
-<a name="nn.Dropout"/>
+<a name="nn.Dropout"></a>
## Dropout ##
```lua
@@ -183,7 +184,7 @@ We can return to training our model by first calling [Module:training()](module.
When used, `Dropout` should normally be applied to the input of parameterized [Modules](module.md#nn.Module) like [Linear](#nn.Linear) or [SpatialConvolution](convolution.md#nn.SpatialConvolution). A `p` of `0.5` (the default) is usually okay for hidden layers. `Dropout` can sometimes be used successfully on the dataset inputs with a `p` around `0.2`. It sometimes works best following [Transfer](transfer.md) Modules like [ReLU](transfer.md#nn.ReLU). All this depends a great deal on the dataset, so it's up to the user to try different combinations.
-<a name="nn.SpatialDropout"/>
+<a name="nn.SpatialDropout"></a>
## SpatialDropout ##
`module` = `nn.SpatialDropout(p)`
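
Picking up the `Dropout` placement advice from the hunk above, a minimal sketch; the layer sizes and rates are illustrative, not taken from the original docs:

```lua
require 'nn'
model = nn.Sequential()
model:add(nn.Dropout(0.2))   -- light dropout directly on the dataset inputs
model:add(nn.Linear(100, 50))
model:add(nn.ReLU())
model:add(nn.Dropout(0.5))   -- the default rate, applied after the transfer function
model:add(nn.Linear(50, 10))

model:training()             -- dropout active while training
-- ... train ...
model:evaluate()             -- dropout disabled for evaluation
```
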
@@ -194,7 +195,7 @@ As described in the paper "Efficient Object Localization Using Convolutional Net
`nn.SpatialDropout` accepts 3D or 4D inputs. If the input is 3D, a layout of (features x height x width) is assumed; for 4D, (batch x features x height x width) is assumed.
-<a name="nn.Abs"/>
+<a name="nn.Abs"></a>
## Abs ##
```lua
@@ -214,7 +215,7 @@ gnuplot.grid(true)
![](image/abs.png)
-<a name='nn.Add'/>
+<a name='nn.Add'></a>
## Add ##
```lua
@@ -264,7 +265,7 @@ gives the output:
i.e. the network successfully learns the input `x` has been shifted to produce the output `y`.
-<a name="nn.Mul"/>
+<a name="nn.Mul"></a>
## Mul ##
```lua
@@ -309,7 +310,7 @@ gives the output:
i.e. the network successfully learns the input `x` has been scaled by pi.
-<a name='nn.CMul'/>
+<a name='nn.CMul'></a>
## CMul ##
```lua
@@ -362,7 +363,7 @@ gives the output:
i.e. the network successfully learns the input `x` has been scaled by those scaling factors to produce the output `y`.
-<a name="nn.Max"/>
+<a name="nn.Max"></a>
## Max ##
```lua
@@ -373,7 +374,7 @@ Applies a max operation over dimension `dimension`.
Hence, if an `nxpxq` Tensor was given as input, and `dimension` = `2` then an `nxq` matrix would be output.
-<a name="nn.Min"/>
+<a name="nn.Min"></a>
## Min ##
```lua
@@ -384,7 +385,7 @@ Applies a min operation over dimension `dimension`.
Hence, if an `nxpxq` Tensor was given as input, and `dimension` = `2` then an `nxq` matrix would be output.
-<a name="nn.Mean"/>
+<a name="nn.Mean"></a>
## Mean ##
```lua
@@ -394,7 +395,7 @@ module = nn.Mean(dimension)
Applies a mean operation over dimension `dimension`.
Hence, if an `nxpxq` Tensor was given as input, and `dimension` = `2` then an `nxq` matrix would be output.
-<a name="nn.Sum"/>
+<a name="nn.Sum"></a>
## Sum ##
```lua
@@ -405,7 +406,7 @@ Applies a sum operation over dimension `dimension`.
Hence, if an `nxpxq` Tensor was given as input, and `dimension` = `2` then an `nxq` matrix would be output.
-<a name="nn.Euclidean"/>
+<a name="nn.Euclidean"></a>
## Euclidean ##
```lua
@@ -416,7 +417,7 @@ Outputs the Euclidean distance of the input to `outputSize` centers, i.e. this l
The distance `y_j` between center `j` and input `x` is formulated as `y_j = || w_j - x ||`.
-<a name="nn.WeightedEuclidean"/>
+<a name="nn.WeightedEuclidean"></a>
## WeightedEuclidean ##
```lua
@@ -429,7 +430,7 @@ In other words, for each of the `outputSize` centers `w_j`, there is a diagonal
The distance `y_j` between center `j` and input `x` is formulated as `y_j = || c_j * (w_j - x) ||`.
-<a name="nn.Identity"/>
+<a name="nn.Identity"></a>
## Identity ##
```lua
@@ -488,7 +489,7 @@ for i = 1, 100 do -- Do a few training iterations
end
```
-<a name="nn.Copy"/>
+<a name="nn.Copy"></a>
## Copy ##
```lua
@@ -498,7 +499,7 @@ module = nn.Copy(inputType, outputType, [forceCopy, dontCast])
This layer copies the input to the output with type casting from `inputType` to `outputType`. Unless `forceCopy` is true, when the two types are the same the input isn't copied, only transferred as the output. The default `forceCopy` is false.
When `dontCast` is true, a call to `nn.Copy:type(type)` will not cast the module's `output` and `gradInput` Tensors to the new type. The default is false.
-<a name="nn.Narrow"/>
+<a name="nn.Narrow"></a>
## Narrow ##
```lua
@@ -507,7 +508,7 @@ module = nn.Narrow(dimension, offset, length)
Narrow is an application of the [narrow](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor-narrowdim-index-size) operation in a module.
-<a name="nn.Replicate"/>
+<a name="nn.Replicate"></a>
## Replicate ##
```lua
@@ -552,7 +553,7 @@ This allows the module to replicate the same non-batch dimension `dim` for both
```
-<a name="nn.Reshape"/>
+<a name="nn.Reshape"></a>
## Reshape ##
```lua
@@ -640,7 +641,7 @@ Example:
```
-<a name="nn.View"/>
+<a name="nn.View"></a>
## View ##
```lua
@@ -723,7 +724,7 @@ Example 2:
[torch.LongStorage of size 2]
```
-<a name="nn.Select"/>
+<a name="nn.Select"></a>
## Select ##
```lua
@@ -798,7 +799,7 @@ for i = 1, 10000 do -- Train for a few iterations
end
```
-<a name="nn.Exp"/>
+<a name="nn.Exp"></a>
## Exp ##
```lua
@@ -820,7 +821,7 @@ gnuplot.grid(true)
![](image/exp.png)
-<a name="nn.Square"/>
+<a name="nn.Square"></a>
## Square ##
```lua
@@ -842,7 +843,7 @@ gnuplot.grid(true)
![](image/square.png)
-<a name="nn.Sqrt"/>
+<a name="nn.Sqrt"></a>
## Sqrt ##
```lua
@@ -864,7 +865,7 @@ gnuplot.grid(true)
![](image/sqrt.png)
-<a name="nn.Power"/>
+<a name="nn.Power"></a>
## Power ##
```lua
@@ -886,7 +887,7 @@ gnuplot.grid(true)
![](image/power.png)
-<a name="nn.MM"/>
+<a name="nn.MM"></a>
## MM ##
```lua
@@ -905,7 +906,7 @@ C = model.forward({A, B}) -- C will be of size `b x m x n`
```
-<a name="nn.BatchNormalization"/>
+<a name="nn.BatchNormalization"></a>
## BatchNormalization ##
```lua
@@ -945,7 +946,7 @@ A = torch.randn(b, m)
C = model.forward(A) -- C will be of size `b x m`
```
-<a name="nn.Padding"/>
+<a name="nn.Padding"></a>
## Padding ##
`module` = `nn.Padding(dim, pad [, nInputDim, value])`
@@ -978,7 +979,7 @@ module:forward(torch.randn(2, 3)) --batch input
```
-<a name="nn.L1Penalty"/>
+<a name="nn.L1Penalty"></a>
## L1Penalty ##
```lua
diff --git a/doc/table.md b/doc/table.md
index 91ea209..61d1085 100755
--- a/doc/table.md
+++ b/doc/table.md
@@ -1,29 +1,30 @@
-<a name="nn.TableLayers"/>
+<a name="nn.TableLayers"></a>
# Table Layers #
This set of modules allows the manipulation of `table`s through the layers of a neural network.
This allows one to build very rich architectures:
- * `table` Container Modules encapsulate sub-Modules:
- * [`ConcatTable`](#nn.ConcatTable): applies each member module to the same input [`Tensor`](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor) and outputs a `table`;
- * [`ParallelTable`](#nn.ParallelTable): applies the `i`-th member module to the `i`-th input and outputs a `table`;
- * Table Conversion Modules convert between `table`s and `Tensor`s or `table`s:
- * [`SplitTable`](#nn.SplitTable): splits a `Tensor` into a `table` of `Tensor`s;
- * [`JoinTable`](#nn.JoinTable): joins a `table` of `Tensor`s into a `Tensor`;
- * [`MixtureTable`](#nn.MixtureTable): mixture of experts weighted by a gater;
- * [`SelectTable`](#nn.SelectTable): select one element from a `table`;
- * [`NarrowTable`](#nn.NarrowTable): select a slice of elements from a `table`;
- * [`FlattenTable`](#nn.FlattenTable): flattens a nested `table` hierarchy;
- * Pair Modules compute a measure like distance or similarity from a pair (`table`) of input `Tensor`s:
- * [`PairwiseDistance`](#nn.PairwiseDistance): outputs the `p`-norm. distance between inputs;
- * [`DotProduct`](#nn.DotProduct): outputs the dot product (similarity) between inputs;
- * [`CosineDistance`](#nn.CosineDistance): outputs the cosine distance between inputs;
- * CMath Modules perform element-wise operations on a `table` of `Tensor`s:
- * [`CAddTable`](#nn.CAddTable): addition of input `Tensor`s;
- * [`CSubTable`](#nn.CSubTable): substraction of input `Tensor`s;
- * [`CMulTable`](#nn.CMulTable): multiplication of input `Tensor`s;
- * [`CDivTable`](#nn.CDivTable): division of input `Tensor`s;
- * `Table` of Criteria:
- * [`CriterionTable`](#nn.CriterionTable): wraps a [Criterion](criterion.md#nn.Criterion) so that it can accept a `table` of inputs.
+
+ * `table` Container Modules encapsulate sub-Modules:
+ * [`ConcatTable`](#nn.ConcatTable): applies each member module to the same input [`Tensor`](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor) and outputs a `table`;
+ * [`ParallelTable`](#nn.ParallelTable): applies the `i`-th member module to the `i`-th input and outputs a `table`;
+ * Table Conversion Modules convert between `table`s and `Tensor`s or `table`s:
+ * [`SplitTable`](#nn.SplitTable): splits a `Tensor` into a `table` of `Tensor`s;
+ * [`JoinTable`](#nn.JoinTable): joins a `table` of `Tensor`s into a `Tensor`;
+ * [`MixtureTable`](#nn.MixtureTable): mixture of experts weighted by a gater;
+ * [`SelectTable`](#nn.SelectTable): select one element from a `table`;
+ * [`NarrowTable`](#nn.NarrowTable): select a slice of elements from a `table`;
+ * [`FlattenTable`](#nn.FlattenTable): flattens a nested `table` hierarchy;
+ * Pair Modules compute a measure like distance or similarity from a pair (`table`) of input `Tensor`s:
+ * [`PairwiseDistance`](#nn.PairwiseDistance): outputs the `p`-norm distance between inputs;
+ * [`DotProduct`](#nn.DotProduct): outputs the dot product (similarity) between inputs;
+ * [`CosineDistance`](#nn.CosineDistance): outputs the cosine distance between inputs;
+ * CMath Modules perform element-wise operations on a `table` of `Tensor`s:
+ * [`CAddTable`](#nn.CAddTable): addition of input `Tensor`s;
+ * [`CSubTable`](#nn.CSubTable): subtraction of input `Tensor`s;
+ * [`CMulTable`](#nn.CMulTable): multiplication of input `Tensor`s;
+ * [`CDivTable`](#nn.CDivTable): division of input `Tensor`s;
+ * `Table` of Criteria:
+ * [`CriterionTable`](#nn.CriterionTable): wraps a [Criterion](criterion.md#nn.Criterion) so that it can accept a `table` of inputs.
`table`-based modules work by supporting `forward()` and `backward()` methods that can accept `table`s as inputs.
It turns out that the usual [`Sequential`](containers.md#nn.Sequential) module can do this, so all that is needed is other child modules that take advantage of such `table`s.
@@ -35,7 +36,7 @@ pred = mlp:forward(t)
pred = mlp:forward{x, y, z} -- This is equivalent to the line before
```
-<a name="nn.ConcatTable"/>
+<a name="nn.ConcatTable"></a>
## ConcatTable ##
```lua
@@ -115,7 +116,7 @@ which gives the output (using [th](https://github.com/torch/trepl)):
```
-<a name="nn.ParallelTable"/>
+<a name="nn.ParallelTable"></a>
## ParallelTable ##
```lua
@@ -164,7 +165,7 @@ which gives the output:
```
-<a name="nn.SplitTable"/>
+<a name="nn.SplitTable"></a>
## SplitTable ##
```lua
@@ -399,7 +400,7 @@ end
```
-<a name="nn.JoinTable"/>
+<a name="nn.JoinTable"></a>
## JoinTable ##
```lua
@@ -534,7 +535,7 @@ end
```
-<a name='nn.MixtureTable'/>
+<a name='nn.MixtureTable'></a>
## MixtureTable ##
`module` = `MixtureTable([dim])`
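A rough mixture-of-experts sketch (the layer sizes and the two-`Linear` experts are assumptions for illustration): a gater and a table of experts are bundled with `ConcatTable`, and `MixtureTable` outputs their gater-weighted sum.

```lua
require 'nn'

local nExperts, inSize, outSize = 2, 3, 4   -- assumed sizes

local experts = nn.ConcatTable()            -- outputs a table of expert outputs
for i = 1, nExperts do
   experts:add(nn.Linear(inSize, outSize))
end

local gater = nn.Sequential()               -- outputs mixture weights over the experts
gater:add(nn.Linear(inSize, nExperts))
gater:add(nn.SoftMax())

local moe = nn.Sequential()
moe:add(nn.ConcatTable():add(gater):add(experts))  -- {gaterOutput, {expertOutputs}}
moe:add(nn.MixtureTable())

local output = moe:forward(torch.randn(5, inSize)) -- 5 x outSize batch
```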
@@ -632,7 +633,7 @@ Forwarding a batch of 2 examples gives us something like this:
```
-<a name="nn.SelectTable"/>
+<a name="nn.SelectTable"></a>
## SelectTable ##
`module` = `SelectTable(index)`
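A small sketch of the expected behaviour (values are illustrative):

```lua
require 'nn'

m = nn.SelectTable(2)
out = m:forward{torch.Tensor{1, 1}, torch.Tensor{2, 2, 2}}  -- returns the 2nd tensor
```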
@@ -725,7 +726,7 @@ Example 2:
```
-<a name="nn.NarrowTable"/>
+<a name="nn.NarrowTable"></a>
## NarrowTable ##
`module` = `NarrowTable(offset [, length])`
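A small sketch (values are illustrative): keep `length` elements starting at `offset`.

```lua
require 'nn'

m = nn.NarrowTable(2, 2)    -- keep elements 2 and 3
t = {torch.rand(1), torch.rand(2), torch.rand(3), torch.rand(4)}
out = m:forward(t)          -- a table holding the 2nd and 3rd tensors
```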
@@ -765,7 +766,7 @@ Example:
```
-<a name="nn.FlattenTable"/>
+<a name="nn.FlattenTable"></a>
## FlattenTable ##
`module` = `FlattenTable()`
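A small sketch (the nesting is illustrative):

```lua
require 'nn'

m = nn.FlattenTable()
out = m:forward{torch.rand(1), {torch.rand(2), {torch.rand(3)}}}
-- out is a flat table of 3 tensors
```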
@@ -802,7 +803,7 @@ gives the output:
}
```
-<a name="nn.PairwiseDistance"/>
+<a name="nn.PairwiseDistance"></a>
## PairwiseDistance ##
`module` = `PairwiseDistance(p)` creates a module that takes a `table` of two vectors as input and outputs the distance between them using the `p`-norm.
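For instance, with `p = 2` (values are illustrative):

```lua
require 'nn'

m = nn.PairwiseDistance(2)                              -- Euclidean distance
d = m:forward{torch.Tensor{1, 0}, torch.Tensor{0, 1}}   -- ~1.4142, as a 1-element Tensor
```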
@@ -885,7 +886,7 @@ end
```
-<a name="nn.DotProduct"/>
+<a name="nn.DotProduct"></a>
## DotProduct ##
`module` = `DotProduct()` creates a module that takes a `table` of two vectors as input and outputs the dot product between them.
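For instance (values are illustrative):

```lua
require 'nn'

m = nn.DotProduct()
s = m:forward{torch.Tensor{1, 2, 3}, torch.Tensor{4, 5, 6}}  -- 32
```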
@@ -978,7 +979,7 @@ end
```
-<a name="nn.CosineDistance"/>
+<a name="nn.CosineDistance"></a>
## CosineDistance ##
`module` = `CosineDistance()` creates a module that takes a `table` of two vectors (or matrices if in batch mode) as input and outputs the cosine distance between them.
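For instance, in non-batch mode (values are illustrative):

```lua
require 'nn'

m = nn.CosineDistance()
c = m:forward{torch.Tensor{1, 0}, torch.Tensor{1, 1}}  -- ~0.7071, the cosine of the angle
```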
@@ -1065,7 +1066,7 @@ end
-<a name="nn.CriterionTable"/>
+<a name="nn.CriterionTable"></a>
## CriterionTable ##
`module` = `CriterionTable(criterion)`
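A small sketch, wrapping `MSECriterion` so that the `{input, target}` pair can be passed as a single `table` (the zero result follows from using identical tensors):

```lua
require 'nn'

m = nn.CriterionTable(nn.MSECriterion())
x = torch.randn(5)
err = m:forward{x, x:clone()}   -- 0, since input and target are identical
```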
@@ -1115,7 +1116,7 @@ for i = 1, 20 do -- Train for a few iterations
end
```
-<a name="nn.CAddTable"/>
+<a name="nn.CAddTable"></a>
## CAddTable ##
Takes a `table` of `Tensor`s and outputs the summation of all `Tensor`s.
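For instance (values are illustrative):

```lua
require 'nn'

m = nn.CAddTable()
out = m:forward{torch.Tensor{1, 2}, torch.Tensor{3, 4}}  -- 4, 6
```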
@@ -1157,7 +1158,7 @@ m = nn.CAddTable()
```
-<a name="nn.CSubTable"/>
+<a name="nn.CSubTable"></a>
## CSubTable ##
Takes a `table` with two `Tensor`s and returns the component-wise
@@ -1174,7 +1175,7 @@ m = nn.CSubTable()
[torch.DoubleTensor of dimension 5]
```
-<a name="nn.CMulTable"/>
+<a name="nn.CMulTable"></a>
## CMulTable ##
Takes a `table` of `Tensor`s and outputs the multiplication of all of them.
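For instance (values are illustrative):

```lua
require 'nn'

m = nn.CMulTable()
out = m:forward{torch.Tensor{1, 2}, torch.Tensor{3, 4}}  -- 3, 8
```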
@@ -1192,7 +1193,7 @@ m = nn.CMulTable()
```
-<a name="nn.CDivTable"/>
+<a name="nn.CDivTable"></a>
## CDivTable ##
Takes a `table` with two `Tensor`s and returns the component-wise
diff --git a/doc/training.md b/doc/training.md
index 016c7c1..1a126d3 100644
--- a/doc/training.md
+++ b/doc/training.md
@@ -1,4 +1,4 @@
-<a name="nn.traningneuralnet.dok"/>
+<a name="nn.traningneuralnet.dok"></a>
# Training a neural network #
Training a neural network is easy with a [simple `for` loop](#nn.DoItYourself).
@@ -7,19 +7,19 @@ want sometimes a quick way of training neural
networks. [StochasticGradient](#nn.StochasticGradient), a simple class
which does the job for you, is provided as standard.
-<a name="nn.StochasticGradient.dok"/>
+<a name="nn.StochasticGradient.dok"></a>
## StochasticGradient ##
`StochasticGradient` is a high-level class for training [neural networks](#nn.Module), using a stochastic gradient
algorithm. This class is [serializable](https://github.com/torch/torch7/blob/master/doc/serialization.md#serialization).
-<a name="nn.StochasticGradient"/>
+<a name="nn.StochasticGradient"></a>
### StochasticGradient(module, criterion) ###
Create a `StochasticGradient` class, using the given [Module](module.md#nn.Module) and [Criterion](criterion.md#nn.Criterion).
The class contains [several parameters](#nn.StochasticGradientParameters) you might want to set after initialization.
-<a name="nn.StochasticGradientTrain"/>
+<a name="nn.StochasticGradientTrain"></a>
### train(dataset) ###
Train the module and criterion given in the
@@ -42,7 +42,7 @@ Such a dataset is easily constructed by using Lua tables, but it could any `C` o
for example, as long as required operators/methods are implemented.
[See an example](#nn.DoItStochasticGradient).
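As a minimal sketch of such a dataset (shapes and targets are illustrative assumptions), a plain Lua table only needs a `size()` method and integer indexing that returns `{input, target}` pairs:

```lua
require 'nn'

dataset = {}
function dataset:size() return 10 end                     -- number of examples
for i = 1, dataset:size() do
   dataset[i] = {torch.randn(2),                          -- input
                 torch.Tensor{i % 2 == 0 and 1 or -1}}    -- target
end
```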
-<a name="nn.StochasticGradientParameters"/>
+<a name="nn.StochasticGradientParameters"></a>
### Parameters ###
`StochasticGradient` has several fields which have an impact on a call to [train()](#nn.StochasticGradientTrain).
@@ -54,7 +54,7 @@ for example, as long as required operators/methods are implemented.
* `hookExample`: A possible hook function which will be called (if non-nil) during training, after each example has been forwarded and backwarded through the network. The function takes `(self, example)` as parameters. Default is `nil`.
* `hookIteration`: A possible hook function which will be called (if non-nil) during training after a complete pass over the dataset. The function takes `(self, iteration)` as parameters. Default is `nil`.
-<a name="nn.DoItStochasticGradient"/>
+<a name="nn.DoItStochasticGradient"></a>
## Example of training using StochasticGradient ##
We show an example here on a classical XOR problem.
@@ -134,7 +134,7 @@ You should see something like:
[torch.Tensor of dimension 1]
```
-<a name="nn.DoItYourself"/>
+<a name="nn.DoItYourself"></a>
## Example of manual training of a neural network ##
We show an example here on a classical XOR problem.
diff --git a/doc/transfer.md b/doc/transfer.md
index c03017d..6b3be00 100755
--- a/doc/transfer.md
+++ b/doc/transfer.md
@@ -1,8 +1,8 @@
-<a name="nn.transfer.dok"/>
+<a name="nn.transfer.dok"></a>
# Transfer Function Layers #
Transfer functions are normally used to introduce a non-linearity after a parameterized layer like [Linear](simple.md#nn.Linear) or [SpatialConvolution](convolution.md#nn.SpatialConvolution). Non-linearities allow the problem space to be divided into more complex regions than a simple logistic regressor would permit.
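For instance, a transfer layer is typically sandwiched between parameterized layers (the sizes and the choice of `Tanh` below are illustrative):

```lua
require 'nn'

mlp = nn.Sequential()
mlp:add(nn.Linear(10, 25))
mlp:add(nn.Tanh())                  -- any transfer layer from this page could go here
mlp:add(nn.Linear(25, 1))
out = mlp:forward(torch.randn(10))
```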
-<a name="nn.HardTanh"/>
+<a name="nn.HardTanh"></a>
## HardTanh ##
Applies the `HardTanh` function element-wise to the input Tensor,
@@ -26,7 +26,7 @@ gnuplot.grid(true)
![](image/htanh.png)
-<a name="nn.HardShrink"/>
+<a name="nn.HardShrink"></a>
## HardShrink ##
`module = nn.HardShrink(lambda)`
@@ -51,7 +51,7 @@ gnuplot.grid(true)
```
![](image/hshrink.png)
-<a name="nn.SoftShrink"/>
+<a name="nn.SoftShrink"></a>
## SoftShrink ##
`module = nn.SoftShrink(lambda)`
@@ -77,7 +77,7 @@ gnuplot.grid(true)
![](image/sshrink.png)
-<a name="nn.SoftMax"/>
+<a name="nn.SoftMax"></a>
## SoftMax ##
Applies the `Softmax` function to an n-dimensional input Tensor,
@@ -99,7 +99,7 @@ gnuplot.grid(true)
Note that this module doesn't work directly with [ClassNLLCriterion](criterion.md#nn.ClassNLLCriterion), which expects an `nn.Log` to be applied between the `SoftMax` and itself. Use [LogSoftMax](#nn.LogSoftMax) instead (it's faster).
-<a name="nn.SoftMin"/>
+<a name="nn.SoftMin"></a>
## SoftMin ##
Applies the `Softmin` function to an n-dimensional input Tensor,
@@ -119,7 +119,7 @@ gnuplot.grid(true)
```
![](image/softmin.png)
-<a name="nn.SoftPlus"/>
+<a name="nn.SoftPlus"></a>
## SoftPlus ##
Applies the `SoftPlus` function to an n-dimensional input Tensor.
@@ -138,7 +138,7 @@ gnuplot.grid(true)
```
![](image/softplus.png)
-<a name="nn.SoftSign"/>
+<a name="nn.SoftSign"></a>
## SoftSign ##
Applies the `SoftSign` function to an n-dimensional input Tensor.
@@ -156,7 +156,7 @@ gnuplot.grid(true)
```
![](image/softsign.png)
-<a name="nn.LogSigmoid"/>
+<a name="nn.LogSigmoid"></a>
## LogSigmoid ##
Applies the `LogSigmoid` function to an n-dimensional input Tensor.
@@ -176,7 +176,7 @@ gnuplot.grid(true)
![](image/logsigmoid.png)
-<a name="nn.LogSoftMax"/>
+<a name="nn.LogSoftMax"></a>
## LogSoftMax ##
Applies the `LogSoftmax` function to an n-dimensional input Tensor.
@@ -195,7 +195,7 @@ gnuplot.grid(true)
```
![](image/logsoftmax.png)
-<a name="nn.Sigmoid"/>
+<a name="nn.Sigmoid"></a>
## Sigmoid ##
Applies the `Sigmoid` function element-wise to the input Tensor,
@@ -214,7 +214,7 @@ gnuplot.grid(true)
```
![](image/sigmoid.png)
-<a name="nn.Tanh"/>
+<a name="nn.Tanh"></a>
## Tanh ##
Applies the `Tanh` function element-wise to the input Tensor,
@@ -231,7 +231,7 @@ gnuplot.grid(true)
```
![](image/tanh.png)
-<a name="nn.ReLU"/>
+<a name="nn.ReLU"></a>
## ReLU ##
Applies the rectified linear unit (`ReLU`) function element-wise to the input Tensor,
@@ -253,7 +253,7 @@ gnuplot.grid(true)
```
![](image/relu.png)
-<a name="nn.PReLU"/>
+<a name="nn.PReLU"></a>
## PReLU ##
Applies the parametric ReLU, whose parameter varies the slope of the negative part:
@@ -267,7 +267,7 @@ Note that weight decay should not be used on it. For reference see http://arxiv.
![](image/prelu.png)
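A small sketch, assuming the default constructor learns a single slope shared across all inputs (initialised to 0.25 in the reference implementation); `nn.PReLU(nOutputPlane)` would instead learn one slope per output plane:

```lua
require 'nn'

m = nn.PReLU()                          -- single shared learnable slope (assumed default)
y = m:forward(torch.Tensor{-2, 3})      -- {-2 * slope, 3}: only negative inputs are scaled
```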
-<a name="nn.AddConstant"/>
+<a name="nn.AddConstant"></a>
## AddConstant ##
Adds a (non-learnable) scalar constant. This module is sometimes useful for debugging purposes: `f(x)` = `x + k`, where `k` is a scalar.
@@ -278,7 +278,7 @@ m=nn.AddConstant(k,true) -- true = in-place, false = keeping separate state.
```
In-place mode restores the original input value after the backward pass, allowing its use after other in-place modules, like [MulConstant](#nn.MulConstant).
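A quick sketch of the in-place variant described above (the constant `3` is illustrative):

```lua
require 'nn'

m = nn.AddConstant(3, true)          -- add 3 to every element, computed in place
y = m:forward(torch.Tensor{1, 2})    -- 4, 5
```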
-<a name="nn.MulConstant"/>
+<a name="nn.MulConstant"></a>
## MulConstant ##
Multiplies the input tensor by a (non-learnable) scalar constant. This module is sometimes useful for debugging purposes: `f(x)` = `k * x`, where `k` is a scalar.
diff --git a/mkdocs.yml b/mkdocs.yml
new file mode 100644
index 0000000..f38456d
--- /dev/null
+++ b/mkdocs.yml
@@ -0,0 +1,18 @@
+site_name: nn
+theme : simplex
+repo_url : https://github.com/torch/nn
+use_directory_urls : false
+markdown_extensions: [extra]
+docs_dir : doc
+pages:
+- [index.md, Home]
+- [module.md, Modules, Module Interface]
+- [containers.md, Modules, Containers]
+- [transfer.md, Modules, Transfer Functions]
+- [simple.md, Modules, Simple Layers]
+- [table.md, Modules, Table Layers]
+- [convolution.md, Modules, Convolution Layers]
+- [criterion.md, Criterion, Criterions]
+- [overview.md, Additional Documentation, Overview]
+- [training.md, Additional Documentation, Training]
+- [testing.md, Additional Documentation, Testing]