Field | Value | Date |
---|---|---|
author | Hugh Perkins <hughperkins@gmail.com> | 2016-05-16 16:51:52 +0300 |
committer | Hugh Perkins <hughperkins@gmail.com> | 2016-05-16 16:52:44 +0300 |
commit | 0594d54443b22c43bd8b5e9e5fcc80361f6e923a | |
tree | 2a641b9a13741f39186962f51f35fa81875d8d54 | |
parent | 9f64ffa3e20c73ada3ab8c1564eb821e719f0155 | |
replace the StochasticGradient example with optim example
-rw-r--r-- | doc/training.md | 151 |
1 files changed, 75 insertions, 76 deletions
diff --git a/doc/training.md b/doc/training.md
index 8125ee0..063e08b 100644
--- a/doc/training.md
+++ b/doc/training.md
@@ -4,102 +4,101 @@
 Training a neural network is easy with a [simple `for` loop](#nn.DoItYourself).
 While doing your own loop provides great flexibility, you might
 want sometimes a quick way of training neural
-networks. [StochasticGradient](#nn.StochasticGradient), a simple class
-which does the job for you is provided as standard.
+networks. [optim](https://github.com/torch/optim) is the standard way of training Torch7 neural networks.
 
-<a name="nn.StochasticGradient.dok"></a>
-## StochasticGradient ##
+`optim` is a quite general optimizer, for minimizing any function that outputs a loss. In our case, our
+function will be the loss of our network, given an input, and a set of weights. The goal of training
+a neural net is to
+optimize the weights to give the lowest loss over our training set of input data. So, we are going to use optim
+to minimize the loss with respect to the weights, over our training set. We will feed the data to
+`optim` in minibatches. For this particular example, we will use just one minibatch, but in your own training
+you will almost certainly want to break your training set into minibatches, and feed each minibatch to `optim`,
+one by one.
 
-`StochasticGradient` is a high-level class for training [neural networks](#nn.Module), using a stochastic gradient
-algorithm. This class is [serializable](https://github.com/torch/torch7/blob/master/doc/serialization.md#serialization).
+We need to give `optim` a function that will output the loss and the derivative of the loss with respect to the
+weights, given a set of input weights. The function will have access to our training minibatch, and use this
+to calculate the loss, for this minibatch. Typically, the function would be defined inside our loop over
+batches, and therefore have access to the current minibatch data.
 
-<a name="nn.StochasticGradient"></a>
-### StochasticGradient(module, criterion) ###
+Here's how this looks:
 
-Create a `StochasticGradient` class, using the given [Module](module.md#nn.Module) and [Criterion](criterion.md#nn.Criterion).
-The class contains [several parameters](#nn.StochasticGradientParameters) you might want to set after initialization.
-
-<a name="nn.StochasticGradientTrain"></a>
-### train(dataset) ###
-
-Train the module and criterion given in the
-[constructor](#nn.StochasticGradient) over `dataset`, using the
-internal [parameters](#nn.StochasticGradientParameters).
-
-StochasticGradient expect as a `dataset` an object which implements the operator
-`dataset[index]` and implements the method `dataset:size()`. The `size()` methods
-returns the number of examples and `dataset[i]` has to return the i-th example.
-
-An `example` has to be an object which implements the operator
-`example[field]`, where `field` might take the value `1` (input features)
-or `2` (corresponding label which will be given to the criterion).
-The input is usually a Tensor (except if you use special kind of gradient modules,
-like [table layers](table.md#nn.TableLayers)). The label type depends of the criterion.
-For example, the [MSECriterion](criterion.md#nn.MSECriterion) expects a Tensor, but the
-[ClassNLLCriterion](criterion.md#nn.ClassNLLCriterion) except a integer number (the class).
+__Neural Network__
 
-Such a dataset is easily constructed by using Lua tables, but it could any `C` object
-for example, as long as required operators/methods are implemented.
-[See an example](#nn.DoItStochasticGradient).
+We create a simple neural network with one hidden layer.
+```lua
+require 'nn'
 
-<a name="nn.StochasticGradientParameters"></a>
-### Parameters ###
+local model = nn.Sequential(); -- make a multi-layer perceptron
+local inputs = 2; outputs = 1; HUs = 20; -- parameters
+model:add(nn.Linear(inputs, HUs))
+model:add(nn.Tanh())
+model:add(nn.Linear(HUs, outputs))
+```
 
-`StochasticGradient` has several field which have an impact on a call to [train()](#nn.StochasticGradientTrain).
+__Criterion__
 
-  * `learningRate`: This is the learning rate used during training. The update of the parameters will be `parameters = parameters - learningRate * parameters_gradient`. Default value is `0.01`.
-  * `learningRateDecay`: The learning rate decay. If non-zero, the learning rate (note: the field learningRate will not change value) will be computed after each iteration (pass over the dataset) with: `current_learning_rate =learningRate / (1 + iteration * learningRateDecay)`
-  * `maxIteration`: The maximum number of iteration (passes over the dataset). Default is `25`.
-  * `shuffleIndices`: Boolean which says if the examples will be randomly sampled or not. Default is `true`. If `false`, the examples will be taken in the order of the dataset.
-  * `hookExample`: A possible hook function which will be called (if non-nil) during training after each example forwarded and backwarded through the network. The function takes `(self, example)` as parameters. Default is `nil`.
-  * `hookIteration`: A possible hook function which will be called (if non-nil) during training after a complete pass over the dataset. The function takes `(self, iteration, currentError)` as parameters. Default is `nil`.
+We choose the Mean Squared Error criterion and train the dataset.
+```lua
+local criterion = nn.MSECriterion()
+```
 
-<a name="nn.DoItStochasticGradient"></a>
-## Example of training using StochasticGradient ##
+__Dataset__
 
-We show an example here on a classical XOR problem.
+We will just create one minibatch of 128 examples. In your own networks, you'd want to break down your
+rather larger dataset into multiple minibatches, of around 32-512 examples each.
 
-__Dataset__
+```
+local batchSize = 128
+local batchInputs = torch.Tensor(batchSize, inputs)
+local batchLabels = torch.ByteTensor(batchSize)
 
-We first need to create a dataset, following the conventions described in
-[StochasticGradient](#nn.StochasticGradientTrain).
-```lua
-dataset={};
-function dataset:size() return 100 end -- 100 examples
-for i=1,dataset:size() do
-  local input = torch.randn(2); -- normally distributed example in 2d
-  local output = torch.Tensor(1);
+for i=1,batchSize do
+  local input = torch.randn(2) -- normally distributed example in 2d
+  local label = 1
   if input[1]*input[2]>0 then -- calculate label for XOR function
-    output[1] = -1;
-  else
-    output[1] = 1
+    label = -1;
   end
-  dataset[i] = {input, output}
+  batchInputs[i]:copy(input)
+  batchLabels[i] = label
 end
 ```
 
-__Neural Network__
-
-We create a simple neural network with one hidden layer.
-```lua
-require "nn"
-mlp = nn.Sequential(); -- make a multi-layer perceptron
-inputs = 2; outputs = 1; HUs = 20; -- parameters
-mlp:add(nn.Linear(inputs, HUs))
-mlp:add(nn.Tanh())
-mlp:add(nn.Linear(HUs, outputs))
-```
-
 __Training__
 
-We choose the Mean Squared Error criterion and train the dataset.
-```lua
-criterion = nn.MSECriterion()
-trainer = nn.StochasticGradient(mlp, criterion)
-trainer.learningRate = 0.01
-trainer:train(dataset)
-```
+`optim` provides []various training algorithms](https://github.com/torch/optim/blob/master/doc/index.md). We
+will use [Stochastic Gradient Descent](https://github.com/torch/optim/blob/master/doc/index.md#x-sgdopfunc-x-state). We
+need to provide the learning rate, via an optimization state table:
+```
+require 'optim'
+
+local optimState = {learningRate=0.01}
+
+-- retrieve the weights and biases from the model, as 1-dimensional flattened tensors
+-- these are views onto the underlying weights and biases, and we will give them to optim
+-- When optim updates these params, it is implicitly updating the weights and biases of our
+-- models
+local params, gradParams = model:getParameters()
+for epoch=1,50 do
+  -- local function we give to optim
+  -- it takes current weights as input, and outputs the loss
+  -- and the gradient of the loss with respect to the weights
+  -- gradParams is calculated implicitly by calling 'backward'
+  -- because gradParams is a view onto the model's weight and bias
+  -- gradients tensor
+  local function feval(params)
+    gradParams:zero()
+
+    local outputs = model:forward(batchInputs)
+    local loss = criterion:forward(outputs, batchLabels)
+    local dloss_doutput = criterion:backward(outputs, batchLabels)
+    model:backward(batchInputs, dloss_doutput)
+
+    return loss,gradParams
+  end
+  optim.sgd(feval, params, optimState)
+end
+```
 
 __Test the network__
 ```lua
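
The text added in this commit trains on a single minibatch and says that real training would feed many minibatches to `optim` one by one, with `feval` defined inside the loop over batches. A minimal sketch of that loop follows. It is not part of the commit: the dataset names (`allInputs`, `allLabels`), the dataset size, and the use of `narrow` to slice out each batch are assumptions made for illustration. Labels are kept in a default `torch.Tensor` rather than the `ByteTensor` used in the diff, on the assumption that a signed floating-point tensor is wanted so the label -1 is stored as written.

```lua
require 'nn'
require 'optim'

-- Hypothetical toy dataset (names and sizes are assumptions, not from the commit)
local nExamples, batchSize = 512, 128
local allInputs = torch.randn(nExamples, 2)
local allLabels = torch.Tensor(nExamples)  -- default (double) tensor, so -1 is representable
for i = 1, nExamples do
  -- same XOR-style labelling rule as the example in the diff
  allLabels[i] = (allInputs[i][1] * allInputs[i][2] > 0) and -1 or 1
end

-- Same network shape and criterion as in the diff
local inputs, outputs, HUs = 2, 1, 20
local model = nn.Sequential()
model:add(nn.Linear(inputs, HUs))
model:add(nn.Tanh())
model:add(nn.Linear(HUs, outputs))
local criterion = nn.MSECriterion()

-- Flattened views onto the model's weights and gradients, fetched once
local params, gradParams = model:getParameters()
local optimState = {learningRate = 0.01}

for epoch = 1, 50 do
  for start = 1, nExamples, batchSize do
    -- slice the current minibatch (the last one may be smaller)
    local size = math.min(batchSize, nExamples - start + 1)
    local batchInputs = allInputs:narrow(1, start, size)
    local batchLabels = allLabels:narrow(1, start, size)

    -- closure over the current minibatch: returns the loss and its gradient
    local function feval(params)
      gradParams:zero()
      local outputs = model:forward(batchInputs)
      local loss = criterion:forward(outputs, batchLabels)
      local dloss_doutput = criterion:backward(outputs, batchLabels)
      model:backward(batchInputs, dloss_doutput)
      return loss, gradParams
    end

    optim.sgd(feval, params, optimState)
  end
end
```

`getParameters()` is called once, outside both loops, because `params` and `gradParams` are views onto the model's storage; only `feval` needs to be rebuilt for each minibatch so that it closes over the current `batchInputs` and `batchLabels`.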