author     Nicholas Leonard <nleonard@twitter.com>   2017-05-12 22:15:39 +0300
committer  Nicholas Leonard <nleonard@twitter.com>   2017-05-12 22:15:39 +0300
commit     a6088d4c08258d211aa1e37519f535ad7af9f08b (patch)
tree       e534fc3f1f1192102fd4b5b25974abe6d4d7f9f2 /doc
parent     5f1f7a267405d86dc3c392dc0d8e5442a5e7908c (diff)
ClassNLLCriterion supports missing targets
Diffstat (limited to 'doc')
-rw-r--r--  doc/criterion.md | 22
1 file changed, 16 insertions(+), 6 deletions(-)
diff --git a/doc/criterion.md b/doc/criterion.md
index 0883b24..a3e1b2e 100644
--- a/doc/criterion.md
+++ b/doc/criterion.md
@@ -95,10 +95,10 @@ criterion.sizeAverage = false
 ## ClassNLLCriterion ##
 
 ```lua
-criterion = nn.ClassNLLCriterion([weights])
+criterion = nn.ClassNLLCriterion([weights, sizeAverage, ignoreIndex])
 ```
 
-The negative log likelihood criterion. It is useful to train a classification problem with `n` classes.
+The negative log likelihood (NLL) criterion. It is useful to train a classification problem with `n` classes.
 If provided, the optional argument `weights` should be a 1D `Tensor` assigning weight to each of the classes.
 This is particularly useful when you have an unbalanced training set.
@@ -113,11 +113,21 @@ The loss can be described as:
 loss(x, class) = -x[class]
 ```
 
-or in the case of the `weights` argument it is specified as follows:
+or in the case of the `weights` argument, it is specified as follows:
 
 ```lua
 loss(x, class) = -weights[class] * x[class]
 ```
 
-Due to the behaviour of the backend code, it is necessary to set sizeAverage to false when calculating losses *in non-batch mode*.
+
+or in the case of the `ignoreIndex` argument:
+```
+loss(x, class) = class != ignoreIndex ? -weights[class] * x[class] : 0
+```
+
+Indeed, the `ignoreIndex` (defaults to -100) specifies a value for targets to be ignored.
+The commensurate `gradInput` for that target will be zero.
+When `sizeAverage=true` (the default), the `gradInput` and `output` are averaged over non-ignored targets.
+
+Due to the behaviour of the backend code, it is necessary to set `sizeAverage` to false when calculating losses *in non-batch mode*.
 
 The following is a code fragment showing how to make a gradient step given an input `x`, a desired output `y` (an integer `1` to `n`, in this case `n = 2` classes), a network `mlp` and a learning rate `learningRate`:
@@ -133,7 +143,7 @@ function gradUpdate(mlp, x, y, learningRate)
 end
 ```
 
-By default, the losses are averaged over observations for each minibatch. However, if the field `sizeAverage` is set to `false`, the losses are instead summed for each minibatch.
+By default, the losses are averaged over observations for each minibatch. However, if the argument `sizeAverage` is set to `false`, the losses are instead summed for each minibatch.
 
 <a name="nn.CrossEntropyCriterion"></a>
@@ -758,7 +768,7 @@ Sample example
 tripleModel = nn.ParallelTable()
-tripleModel:add(embeddingModel:clone('weight', 'bias', 
+tripleModel:add(embeddingModel:clone('weight', 'bias',
                 'gradWeight', 'gradBias'))
 tripleModel:add(embeddingModel:clone('weight', 'bias',
                 'gradWeight', 'gradBias'))
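
For context, a minimal usage sketch of the behaviour this patch documents, assuming the constructor signature from the first hunk, `nn.ClassNLLCriterion([weights, sizeAverage, ignoreIndex])`; the marker value `-1` and all tensor values below are invented for illustration:

```lua
require 'nn'

-- No class weights, sizeAverage = true (the default), and -1 as the
-- "missing target" marker instead of the default -100 (chosen for this sketch).
local criterion = nn.ClassNLLCriterion(nil, true, -1)

-- A batch of 3 log-probability rows over 2 classes (e.g. from nn.LogSoftMax).
local input = torch.Tensor{{-0.5, -1.2},
                           {-0.9, -0.4},
                           {-0.3, -1.5}}

-- The second target matches ignoreIndex, so it is treated as missing.
local target = torch.LongTensor{1, -1, 2}

local loss = criterion:forward(input, target)
local gradInput = criterion:backward(input, target)

print(loss)                -- averaged over the 2 non-ignored targets
print(gradInput[2]:sum())  -- 0: the ignored row gets zero gradient
```

With `sizeAverage=true`, only the two non-ignored rows contribute, so the printed loss should be `(0.5 + 1.5) / 2 = 1.0`.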