This PR turns the LSH index and search into a set of operators that live in the expression graph. This makes creation etc. thread-safe (one index per graph) and allows us to implement GPU versions later.
It also allows the LSH index to be mmapped as a Marian parameter, since we now only need to turn the index into something that can be saved to disk using the existing tensors. This happens in marian_conv or in the equivalent interface function of the Quicksand interface.
|
* Doxygen structure for expression graph operators
* Document arithmetic expression operations
* Document comparison expression operations
* Document exp/log and trig operations
* Add missing implementation for cos/tan
* Document expression manipulation operations
* Document misc math operations
* Overview of operators
* Document activation functions
* Document element-wise min/max
* Document debugging/checkpoint operators
* Document topk/argmin/argmax operations
* Document index-based operations
* Document reduction operations
* Document lambda expression operators
* Document product operations
* Document softmax, cross-entropy, unlikelihood operations
* Document dropout operations
* Document scalar product and weighted average operations
* Document layer normalization, highway and pooling operations
* Document shift expression operator
* Extra details on rules for adding specializations to .inc files
* Add SinNodeOp example for specialization documentation
* Additional details in tensor operator documentation
* Remove brief command from doxygen comments
* Prefer @-style Doxygen commands over \ style (see the example after this list)
* Document n-ary function macros
* Enable .cu and .inc files in documentation
* Add a comment about ONNX mapping
* Remove empty lines in doxygen
* Update CHANGELOG
Co-authored-by: Roman Grundkiewicz <rgrundkiewicz@gmail.com>
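As an illustration of the @-style comments preferred above, a minimal sketch modeled on the SinNodeOp example mentioned in this list; the exact signature is assumed here rather than taken from the Marian headers:

```cpp
/**
 * Computes the element-wise sine of an expression.
 *
 * @param a the input expression
 * @return an expression whose forward value is sin(a), element-wise
 */
Expr sin(Expr a);
```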
|
This PR refactors the training graph groups and optimizers to enable and simplify fp16 support.
It deprecates old, unused graph groups and fixes a couple of MPI issues.
|
This branch adds functionality to export ONNX models (with limitations).
|
These are minor comments/fixes I found while working on my ONNX prototype; it would be good to get them out of the way.
|
LSH-based short-list replacement
* Add tuple nodes via views and trickery
* Add `topk` operator, currently unused outside unit tests (its semantics are sketched after this list)
* Add `abs` operator, currently unused outside unit tests
* Change the return type of `Node::allocate()` to `void`. It used to return the number of allocated elements, but that value wasn't used anywhere; to avoid future confusion between elements and bytes, it has been removed for now.
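For reference, a standalone sketch of the `topk` semantics (values and original indices of the k largest entries); this only illustrates what the operator computes and is not the Marian implementation:

```cpp
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <vector>

// Return the values and original indices of the k largest elements of x.
void topk(const std::vector<float>& x, int k,
          std::vector<float>& vals, std::vector<int>& idxs) {
  idxs.resize(x.size());
  std::iota(idxs.begin(), idxs.end(), 0);          // 0, 1, ..., n-1
  std::partial_sort(idxs.begin(), idxs.begin() + k, idxs.end(),
                    [&](int i, int j) { return x[i] > x[j]; });
  idxs.resize(k);
  vals.clear();
  for(int i : idxs)
    vals.push_back(x[i]);
}

int main() {
  std::vector<float> x = {0.1f, 3.0f, -1.0f, 2.5f};
  std::vector<float> vals;
  std::vector<int> idxs;
  topk(x, /*k=*/2, vals, idxs);
  for(size_t i = 0; i < vals.size(); ++i)
    std::printf("top-%zu: x[%d] = %g\n", i + 1, idxs[i], vals[i]);
}
```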
|
* Clear the cache for the RNN object in the transformer; otherwise a stale tensor might be kept around.
* Add missing `hash()` and `equal()` functions everywhere (the usual pattern is sketched after this list).
* Fix a bug found in the deployment test.
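The usual pattern behind those `hash()`/`equal()` additions, sketched with illustrative names (this is not the actual Marian node class):

```cpp
#include <functional>

// A node op with one extra attribute. hash() mixes node-specific state into
// the structural hash over the children so identical subexpressions can be
// found and reused; equal() resolves hash collisions by comparing state.
struct FakeNodeOp {
  size_t childrenHash; // stand-in for the base-class hash over child nodes
  float scalar;        // node-specific state that must enter hash/equal

  size_t hash() const {
    size_t seed = childrenHash;
    // boost-style hash_combine of the extra attribute
    seed ^= std::hash<float>()(scalar) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    return seed;
  }

  bool equal(const FakeNodeOp& other) const {
    return childrenHash == other.childrenHash && scalar == other.scalar;
  }
};
```

If either function is missing, two nodes that differ only in `scalar` may hash and compare as equal, so graph deduplication can silently reuse the wrong node.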
|
Search must reshape the first step correctly.
|
This PR adds the `logsumexp()` reduction, that is,
y = log(sum_j exp(x_j))
With this, `logsoftmax(z, ax)` can now be written as `z - logsumexp(z, ax)`. I need this for factored projections.
The PR merges the near-duplicates `sum()` and `mean()` into a single `ReduceNodeOpCode`, which I extended, for good measure, to implement additional reductions as well. Since we now need reduction operations besides the sum, this PR changes the current `functional::Add()` operation into a `functional::Aggregate()` operation that takes a second `Functor` for the reduction operation.
This made it straightforward to implement a whole range of reduction operations (the names match NumPy):
* `sum()`
* `mean()`
* `std()`
* `var()`
* `min()`
* `max()`
* `logsumexp()`
I just noticed that I forgot the gradient for `prod()`.
Operator tests have been added and pass.
NOTE: There are no gradient tests. Please review the gradients carefully. I will test `logsumexp()` by replacing `logsoftmax` with the formula above in training; a standalone, numerically stable sketch of the reduction follows below.
Related work items: #98143
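A standalone, numerically stable sketch of the reduction: shifting by the maximum avoids overflow in exp and computes y = m + log(sum_j exp(x_j - m)) with m = max_j x_j, which equals log(sum_j exp(x_j)). The identity from above then gives logsoftmax directly. This is an illustration, not the Marian kernel:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Numerically stable logsumexp over a vector.
float logsumexp(const std::vector<float>& x) {
  float m = *std::max_element(x.begin(), x.end());
  float sum = 0.f;
  for(float v : x)
    sum += std::exp(v - m); // exponents are <= 0, no overflow
  return m + std::log(sum);
}

int main() {
  std::vector<float> z = {1.f, 2.f, 3.f};
  float lse = logsumexp(z);
  // logsoftmax(z) == z - logsumexp(z), the identity used in this PR
  for(float v : z)
    std::printf("logsoftmax: %g\n", v - lse);
}
```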
|
(axis before arg)
|
Merge from https://machinetranslation.visualstudio.com/DefaultCollection/Marian/_git/marian-dev into fseide/factoredembeddings
|
them to RowsNodeOp or ColsNodeOp;
tests updated accordingly;
bug fix: missed an axis normalization;
bug fix: ReshapeNodeOp should pass on the value_type so as to allow reshaping IndexType tensors
|
bug fix: SliceViewNodeOp should use the correct size for its memory piece;
new operation stopGradient() (its semantics are sketched below)
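A sketch of the expected `stopGradient()` semantics, assuming it behaves like the stop-gradient operation in other toolkits (identity in the forward pass, no gradient to the child in the backward pass); this is an illustration, not the Marian node:

```cpp
#include <vector>

// Toy forward/backward pair illustrating stop-gradient behavior.
struct StopGradient {
  // Forward: the value passes through unchanged.
  std::vector<float> forward(const std::vector<float>& in) { return in; }

  // Backward: the incoming adjoint is deliberately not propagated.
  std::vector<float> backward(const std::vector<float>& adj) {
    return std::vector<float>(adj.size(), 0.f); // gradient is cut here
  }
};
```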
|
minibatches are now fed in GPU-sized chunks rather than as one massive joint batch for all GPUs per update;
Adam hyper-parameter adjustment is limited to the learning rate, as momentum adjustment is counterproductive for minibatch-size scaling;
log output now includes the last batch size;
log output now shows the current best for stalled validation metrics;
bug fix: the Adam optimizer should persist its denominators (a sketch of the Adam state follows below);
bug fix: Adam and Adagrad should use the correct element size when persisting;
min and max renamed to minimum and maximum, for consistency with other toolkits;
Pathie now compiles in the manual VS project
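For context on the "persist denominators" item: the standard Adam update keeps two running accumulators per parameter, and the second moment is the denominator in the update rule, so losing it across a checkpoint restart changes the effective per-parameter learning rates. A minimal sketch of the standard update (not the Marian optimizer code):

```cpp
#include <cmath>
#include <vector>

// Standard Adam step. The state mt (first moment) and vt (second moment,
// the "denominator") must be saved with checkpoints and restored on resume.
struct Adam {
  float lr = 1e-3f, beta1 = 0.9f, beta2 = 0.999f, eps = 1e-8f;
  std::vector<float> mt, vt; // optimizer state to persist
  long long t = 0;           // step counter for bias correction

  void step(std::vector<float>& w, const std::vector<float>& g) {
    if(mt.empty()) { mt.assign(w.size(), 0.f); vt.assign(w.size(), 0.f); }
    ++t;
    for(size_t i = 0; i < w.size(); ++i) {
      mt[i] = beta1 * mt[i] + (1.f - beta1) * g[i];
      vt[i] = beta2 * vt[i] + (1.f - beta2) * g[i] * g[i];
      float mHat = mt[i] / (1.f - std::pow(beta1, (float)t));
      float vHat = vt[i] / (1.f - std::pow(beta2, (float)t));
      w[i] -= lr * mHat / (std::sqrt(vHat) + eps);
    }
  }
};
```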