author:    Soumith Chintala <soumith@gmail.com>  2017-05-21 20:48:19 +0300
committer: GitHub <noreply@github.com>  2017-05-21 20:48:19 +0300
commit:    78aac1a015ebba0655a7fdad8a4a09419b68da67 (patch)
tree:      e534fc3f1f1192102fd4b5b25974abe6d4d7f9f2 /doc
parent:    482537275df7fde77cc4dcc1d93de33cbfafde9f (diff)
Revert "Revert "ClassNLLCriterion supports missing targets""
(branch: revert-1217-revert-1215-ClassNLLCriterion-missing-target)
Diffstat (limited to 'doc')
-rw-r--r--  doc/criterion.md | 22
1 file changed, 16 insertions, 6 deletions
diff --git a/doc/criterion.md b/doc/criterion.md
index 0883b24..a3e1b2e 100644
--- a/doc/criterion.md
+++ b/doc/criterion.md
@@ -95,10 +95,10 @@ criterion.sizeAverage = false
 ## ClassNLLCriterion ##
 
 ```lua
-criterion = nn.ClassNLLCriterion([weights])
+criterion = nn.ClassNLLCriterion([weights, sizeAverage, ignoreIndex])
 ```
 
-The negative log likelihood criterion. It is useful to train a classification problem with `n` classes.
+The negative log likelihood (NLL) criterion. It is useful to train a classification problem with `n` classes.
 If provided, the optional argument `weights` should be a 1D `Tensor` assigning weight to each of the classes.
 This is particularly useful when you have an unbalanced training set.
@@ -113,11 +113,21 @@ The loss can be described as:
 loss(x, class) = -x[class]
 ```
 
-or in the case of the `weights` argument it is specified as follows:
+or in the case of the `weights` argument, it is specified as follows:
 
 ```lua
 loss(x, class) = -weights[class] * x[class]
 ```
 
-Due to the behaviour of the backend code, it is necessary to set sizeAverage to false when calculating losses *in non-batch mode*.
+
+or in the case of the `ignoreIndex` argument:
+```
+loss(x, class) = class != ignoreIndex ? -weights[class] * x[class] : 0
+```
+
+Indeed, the `ignoreIndex` (defaults to -100) specifies a value for targets to be ignored.
+The commensurate `gradInput` for that target will be zero.
+When `sizeAverage=true` (the default), the `gradInput` and `output` are averaged over non-ignored targets.
+
+Due to the behaviour of the backend code, it is necessary to set `sizeAverage` to false when calculating losses *in non-batch mode*.
 
 The following is a code fragment showing how to make a gradient step given an input `x`, a desired output `y` (an integer `1` to `n`, in this case `n = 2` classes), a network `mlp` and a learning rate `learningRate`:
@@ -133,7 +143,7 @@ function gradUpdate(mlp, x, y, learningRate)
 end
 ```
 
-By default, the losses are averaged over observations for each minibatch. However, if the field `sizeAverage` is set to `false`, the losses are instead summed for each minibatch.
+By default, the losses are averaged over observations for each minibatch. However, if the argument `sizeAverage` is set to `false`, the losses are instead summed for each minibatch.
 
 <a name="nn.CrossEntropyCriterion"></a>
@@ -758,7 +768,7 @@ Sample example
 tripleModel = nn.ParallelTable()
 tripleModel:add(embeddingModel)
-tripleModel:add(embeddingModel:clone('weight', 'bias', 
+tripleModel:add(embeddingModel:clone('weight', 'bias',
                                      'gradWeight', 'gradBias'))
 tripleModel:add(embeddingModel:clone('weight', 'bias',
                                      'gradWeight', 'gradBias'))
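The `ignoreIndex` semantics recorded in this patch can be sketched numerically. The following is a minimal plain-Python illustration of the documented formula, not the Torch backend: the function name `class_nll_loss` and its argument names are hypothetical, and Python targets here are 0-based indices, whereas Torch targets run from `1` to `n`.

```python
def class_nll_loss(log_probs, targets, weights=None,
                   size_average=True, ignore_index=-100):
    """Sketch of the documented criterion:
    loss(x, class) = class != ignoreIndex ? -weights[class] * x[class] : 0
    with sizeAverage averaging over non-ignored targets only."""
    total = 0.0   # summed weighted NLL
    norm = 0.0    # total weight of non-ignored targets
    for x, cls in zip(log_probs, targets):
        if cls == ignore_index:
            continue  # ignored target: contributes zero loss (and zero gradInput)
        w = weights[cls] if weights is not None else 1.0
        total += -w * x[cls]
        norm += w
    if size_average and norm > 0:
        return total / norm  # average over non-ignored targets
    return total
```

For example, with log-probabilities for three samples where the third target equals the ignore value, only the first two samples contribute, and the averaged loss divides by 2 rather than 3; passing `size_average=False` instead returns the plain sum, matching the behaviour the patch documents for non-batch mode.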