* Rename TranslatorPool -> Translator and GeneratorPool -> Generator
* Fix option help
* Fix include
* Add ModelLoader helper class
* Reduce diff
* Remove options normalize_scores and allow_early_exit
* Update test output
* Also apply penalties in greedy search to get consistent scores
* Remove commented code
* Integrate the Whisper model from OpenAI
* Reformat test file
* Install cuDNN devel package
* Return Numpy tensors in example
* Push missing __init__.py file
* Add datasets module in test requirements
* Add decoded audio test file in the repo
* Enable REORDER primitive
* Add Conv1D operator and layer
* Fix typo in CMakeLists.txt
* Skip the tests when the backend is not available
* Add a separate CMake option for cuDNN
* Minor code cleanup
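For reference, a valid-mode 1D convolution over a single channel reduces to a sliding dot product. This is a minimal sketch of the operation, not CTranslate2's Conv1D implementation (which also handles channels, padding, and bias):

```python
def conv1d(signal, kernel, stride=1):
    """Valid-mode 1D convolution (really cross-correlation, as in most
    deep learning frameworks): slide the kernel over the input and take
    the dot product at each position."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]

# A [1, -1] kernel computes a discrete difference.
out = conv1d([1.0, 2.0, 3.0, 4.0], [1.0, -1.0])
```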
* Generalize disable_unk implementation to support a list of tokens
* Update test
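Banning an arbitrary list of token ids (rather than only `<unk>`) is typically implemented by masking the logits of every banned id to -inf before the softmax, so neither greedy search nor sampling can select them. A minimal sketch; the helper name is illustrative, not CTranslate2's API:

```python
import math

def suppress_tokens(logits, banned_ids):
    """Return a copy of the logits with the banned token ids set to -inf.

    After masking, softmax assigns zero probability to the banned tokens,
    so they can never be selected during decoding.
    """
    masked = list(logits)
    for token_id in banned_ids:
        masked[token_id] = -math.inf
    return masked

logits = [1.0, 3.0, 2.0, 0.5]
masked = suppress_tokens(logits, banned_ids=[1, 3])
# The best remaining token is now id 2 (id 1 was banned).
best = max(range(len(masked)), key=masked.__getitem__)
```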
* Get more candidates from topk to replace finished hypotheses
* Handle the case where the number of candidates exceeds the vocabulary size
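The usual approach is to take more than `beam_size` candidates from the top-k (commonly 2 × beam_size, capped at the vocabulary size, which is the edge case the second bullet addresses) so that candidates ending in EOS can be replaced by live hypotheses. A rough illustration under those assumptions; the function name is hypothetical:

```python
def pick_live_candidates(scores, beam_size, eos_id):
    """Refill the beam from an oversampled top-k.

    Taking min(2 * beam_size, vocab_size) candidates guarantees enough
    non-EOS entries to keep beam_size hypotheses alive even if the very
    best candidates are finished.
    """
    k = min(2 * beam_size, len(scores))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    live = [i for i in top if i != eos_id]
    return live[:beam_size]

# Token id 1 (EOS) has the best score, but the beam is refilled
# with the next-best live candidates.
beam = pick_live_candidates([0.1, 0.9, 0.8, 0.7], beam_size=2, eos_id=1)
```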
* Improve support of restricted vocabulary in decoding
* Move mapping to the decoder
* Check before updating ids
* Verify that prefix tokens are included in the restricted vocab
* Restore comment
* Add option to not expand unlikely alternatives
* Reword
* Revert docstring change
* Simply resize state
* Add option no_repeat_ngram_size to prevent repetitions of ngrams
* Remove debug argument
* Update blocks/threads configuration
* Implement a more generic primitive
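No-repeat-ngram blocking works by banning, at each step, any token that would complete an ngram already present in the generated sequence (the banned ids then get their logits masked to -inf). A sketch of the candidate-collection step, assuming the standard algorithm; not CTranslate2's kernel:

```python
def banned_ngram_tokens(sequence, no_repeat_ngram_size):
    """Token ids that would complete an already-generated ngram.

    If the last (n - 1) generated tokens match the prefix of an earlier
    ngram, the token that previously followed that prefix is banned.
    """
    n = no_repeat_ngram_size
    if n == 0 or len(sequence) < n - 1:
        return set()
    prefix = tuple(sequence[-(n - 1):]) if n > 1 else ()
    banned = set()
    for i in range(len(sequence) - n + 1):
        if tuple(sequence[i:i + n - 1]) == prefix:
            banned.add(sequence[i + n - 1])
    return banned

# With n = 2 and history [5, 7, 5], emitting 7 would repeat the
# bigram (5, 7), so 7 is banned.
banned = banned_ngram_tokens([5, 7, 5], no_repeat_ngram_size=2)
```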
* Factorize layer creation
* Fix function signature
* Add tests
* Minor diff cleanup
* Define float16_t type in public headers
The type was hidden in commit
https://github.com/OpenNMT/CTranslate2/commit/b3038b859ed014058033d576157dcf4ad4ece9e4,
but it is actually required by some public functions.
* Cleanup install rule
* Fix crash of scoring methods on empty inputs
* Fix target check
* Define ModelReplica to dissociate the model runtime from the weights
* Keep cached attributes
* Fix diff
* Remove duplicated include
* Cache output layer transformations when possible
* Update removal of excluded ids
* Fix the case where the output size is unchanged but the excluded ids changed
* Minor cleanup
* Manage empty inputs in sample method
* Cleanup JobCreator methods
* Update test
* Also handle return_alternatives case in sample method
* Improve correctness of the scoring output
* Include score of EOS token
* Return the actual tokens that were scored (after vocabulary lookup
and sequence truncation)
* Update doc
At the moment, only the decoding logic cannot run in batch mode.
* Add Swish activation
* Dispatch inside parallel for
* Remove benchmark code
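For reference, Swish is x · sigmoid(x) (also known as SiLU); a scalar sketch of the activation:

```python
import math

def swish(x):
    """Swish activation: x * sigmoid(x), also known as SiLU."""
    return x * (1.0 / (1.0 + math.exp(-x)))

# Unlike ReLU, Swish is smooth and slightly non-monotonic: small
# negative inputs yield small negative outputs instead of being
# clipped to zero.
```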
* Round value before cast in quantization
* Update quantization formula in documentation
* Add missing variables in lambda capture
* Save binary version in named variable
* Defer implicit cast to value assignment
* Fix Python test
* Remove unneeded changes
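Rounding before the cast matters because a plain float-to-int cast truncates toward zero, which systematically biases the quantized values (e.g. 0.9 · scale would truncate to 0 instead of rounding to the nearest level). A sketch of symmetric int8 quantization with the rounded formula; the scale computation here is illustrative, not necessarily CTranslate2's exact formula:

```python
def quantize_int8(values):
    """Symmetric int8 quantization: q = clamp(round(x * scale)).

    Rounding before the integer conversion avoids the truncation
    bias of a bare cast.
    """
    amax = max(abs(v) for v in values) or 1.0
    scale = 127.0 / amax
    quantized = [max(-127, min(127, round(v * scale))) for v in values]
    return quantized, scale

q, scale = quantize_int8([0.5, -1.0, 0.9])
```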
* Generalize batching to support additional input streams
* Use vector as argument
* Add a CUDA kernel for Multinomial op
The current implementation can only return a single sample per batch.
* Install libcurand-devel to build the Python wheels
* Reduce multi-head attention based on options alignment_{layer,heads}
* Factorize number of threads
* Improve variable name
* Improve implementation of the repetition penalty
* Do not support repetition penalty with vmap for now
* Apply penalty on logits during greedy search to match HF implementation
* Also apply penalty on logits during beam search for consistency
* Add test case
* Cleanup test case
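For reference, the repetition penalty as implemented in Hugging Face Transformers divides positive logits of previously generated tokens by the penalty and multiplies negative ones, so both directions make the token less likely when the penalty is greater than 1. A sketch of that formulation:

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """Penalize previously generated tokens (HF-style formulation).

    A positive logit is divided by the penalty and a negative logit is
    multiplied by it; with penalty > 1 both become less likely.
    """
    out = list(logits)
    for token_id in set(generated_ids):
        if out[token_id] > 0:
            out[token_id] /= penalty
        else:
            out[token_id] *= penalty
    return out

# Tokens 0 and 1 were already generated; token 2 is untouched.
penalized = apply_repetition_penalty([2.0, -1.0, 0.5], [0, 1], penalty=2.0)
```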
* Add repetition penalty in beam search
* Fix test compilation
* Remove parallel for in CPU primitive
The normalized score was incorrect in the following case:
* in greedy search when `max_decoding_length` is reached
* in `return_alternatives` mode
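Assuming the normalized score is the cumulative log-probability divided by the number of scored tokens (the exact length penalty may differ), the failure mode in both cases above is dividing by the wrong length. A sketch of the intended computation:

```python
import math

def normalized_score(token_log_probs):
    """Length-normalized score: mean log-probability per token.

    The denominator must be the number of tokens actually scored;
    using a stale step count (e.g. when decoding stops early at
    max_decoding_length) yields an incorrect normalized score.
    """
    return sum(token_log_probs) / len(token_log_probs)

score = normalized_score([math.log(0.5), math.log(0.25)])
```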
The in-place implementation should only be selected when the ids are
strictly increasing, not just increasing.
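Assuming this refers to a gather-style selection done in place, the distinction matters because a repeated index lets an earlier write clobber a slot that a later step still needs to read. A predicate for the safety check; the function name is hypothetical:

```python
def can_gather_in_place(ids):
    """In-place gather is safe only for strictly increasing ids.

    With merely non-decreasing ids such as [0, 0, 1], step 1 writes
    slot 1 with a copy of slot 0 before step 2 reads slot 1, so the
    third output is corrupted; strictly increasing ids read each
    source slot before it can be overwritten.
    """
    return all(a < b for a, b in zip(ids, ids[1:]))
```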
* Add CUDA implementation for the Tile operator
* Simplify shape manipulation in expand_to_beam_size
* Cleanup index computation
* Add a scoring API
* Add missing include
* Adapt initial state for non iterative decoding
* Add missing include
* Pass missing argument
* Revert ScopedDeviceSetter changes in this branch
* Add methods to get the aggregated score in ScoringResult
* Rename function to get_results_from_futures
* Update profile name to match function name
* Move check in while loop condition
* Factorize readers function
* Factorize Python callables wrapper
* Assign default values to sample method arguments
* Add JobCreator class to specialize job creation
* Factorize logic of job creation
* Return statistics from file scoring
* Fix indent
* Fix indent
* Update class comments
The SoftMax op should not assume how to broadcast the length vector.
* Add a translation wrapper that buffers and batches incoming inputs
* Notify outside the lock
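A wrapper like this typically buffers incoming requests and flushes them as one batch when a size (or timeout) threshold is reached, notifying waiters outside the lock to avoid contention. A single-threaded sketch of the size-based flush only; class and method names are hypothetical:

```python
class BatchBuffer:
    """Collect incoming items and process them in batches.

    Real implementations also flush on a timeout and coordinate with
    worker threads; this sketch shows only the size-based trigger.
    """
    def __init__(self, max_batch_size, process):
        self.max_batch_size = max_batch_size
        self.process = process  # callback receiving a full batch
        self.pending = []

    def put(self, item):
        self.pending.append(item)
        if len(self.pending) >= self.max_batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            batch, self.pending = self.pending, []
            self.process(batch)

batches = []
buf = BatchBuffer(max_batch_size=2, process=batches.append)
for item in ["a", "b", "c"]:
    buf.put(item)
buf.flush()  # drain the final partial batch
```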
This reverts commit 6f949c07991a41a2663008201d3ffb225276e753.
In some cases this can improve the beam search output.