author     Nicholas Leonard <nleonard@twitter.com>   2017-05-12 22:15:39 +0300
committer  Nicholas Leonard <nleonard@twitter.com>   2017-05-12 22:15:39 +0300
commit     a6088d4c08258d211aa1e37519f535ad7af9f08b (patch)
tree       e534fc3f1f1192102fd4b5b25974abe6d4d7f9f2 /doc
parent     5f1f7a267405d86dc3c392dc0d8e5442a5e7908c (diff)
ClassNLLCriterion supports missing targets
Diffstat (limited to 'doc')
-rw-r--r--  doc/criterion.md | 22
1 file changed, 16 insertions(+), 6 deletions(-)
diff --git a/doc/criterion.md b/doc/criterion.md
index 0883b24..a3e1b2e 100644
--- a/doc/criterion.md
+++ b/doc/criterion.md
@@ -95,10 +95,10 @@ criterion.sizeAverage = false
 ## ClassNLLCriterion ##
 
 ```lua
-criterion = nn.ClassNLLCriterion([weights])
+criterion = nn.ClassNLLCriterion([weights, sizeAverage, ignoreIndex])
 ```
 
-The negative log likelihood criterion. It is useful to train a classification problem with `n` classes.
+The negative log likelihood (NLL) criterion. It is useful to train a classification problem with `n` classes.
 If provided, the optional argument `weights` should be a 1D `Tensor` assigning weight to each of the classes.
 This is particularly useful when you have an unbalanced training set.
@@ -113,11 +113,21 @@ The loss can be described as:
 loss(x, class) = -x[class]
 ```
 
-or in the case of the `weights` argument it is specified as follows:
+or in the case of the `weights` argument, it is specified as follows:
 
 ```lua
 loss(x, class) = -weights[class] * x[class]
 ```
 
-Due to the behaviour of the backend code, it is necessary to set sizeAverage to false when calculating losses *in non-batch mode*.
+
+or in the case of the `ignoreIndex` argument:
+```
+loss(x, class) = class != ignoreIndex ? -weights[class] * x[class] : 0
+```
+
+Indeed, the `ignoreIndex` (defaults to -100) specifies a value for targets to be ignored.
+The commensurate `gradInput` for that target will be zero.
+When `sizeAverage=true` (the default), the `gradInput` and `output` are averaged over non-ignored targets.
+
+Due to the behaviour of the backend code, it is necessary to set `sizeAverage` to false when calculating losses *in non-batch mode*.
 
 The following is a code fragment showing how to make a gradient step given an input `x`, a desired output `y` (an integer `1` to `n`, in this case `n = 2` classes), a network `mlp` and a learning rate `learningRate`:
@@ -133,7 +143,7 @@ function gradUpdate(mlp, x, y, learningRate)
 end
 ```
 
-By default, the losses are averaged over observations for each minibatch. However, if the field `sizeAverage` is set to `false`, the losses are instead summed for each minibatch.
+By default, the losses are averaged over observations for each minibatch. However, if the argument `sizeAverage` is set to `false`, the losses are instead summed for each minibatch.
 
 <a name="nn.CrossEntropyCriterion"></a>
@@ -758,7 +768,7 @@ Sample example
 tripleModel = nn.ParallelTable()
-tripleModel:add(embeddingModel:clone('weight', 'bias', 
+tripleModel:add(embeddingModel:clone('weight', 'bias',
                 'gradWeight', 'gradBias'))
 tripleModel:add(embeddingModel:clone('weight', 'bias',
                 'gradWeight', 'gradBias'))
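
For context, a minimal usage sketch of the behaviour this patch documents, assuming the constructor signature from the first hunk, `nn.ClassNLLCriterion([weights, sizeAverage, ignoreIndex])`; the marker value `-1` and all tensor values below are invented for illustration:

```lua
require 'nn'

-- No class weights, sizeAverage = true (the default), and -1 as the
-- "missing target" marker instead of the default -100 (chosen for this sketch).
local criterion = nn.ClassNLLCriterion(nil, true, -1)

-- A batch of 3 log-probability rows over 2 classes (e.g. from nn.LogSoftMax).
local input = torch.Tensor{{-0.5, -1.2},
                           {-0.9, -0.4},
                           {-0.3, -1.5}}

-- The second target matches ignoreIndex, so it is treated as missing.
local target = torch.LongTensor{1, -1, 2}

local loss = criterion:forward(input, target)
local gradInput = criterion:backward(input, target)

print(loss)                -- averaged over the 2 non-ignored targets
print(gradInput[2]:sum())  -- 0: the ignored row gets zero gradient
```

With `sizeAverage=true`, only the two non-ignored rows contribute, so the printed loss should be `(0.5 + 1.5) / 2 = 1.0`.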