Bump version to 1.16.1 (tag: v1.16.1)

author: Guillaume Klein <guillaume.klein@systrangroup.com> 2020-11-23 14:14:16 +0300
committer: Guillaume Klein <guillaume.klein@systrangroup.com> 2020-11-23 14:14:16 +0300
commit: 129047ea7975f73747c6448211db24c293e44da0 (patch)
tree: 0e887b5245d95982dddc9f4873216d034ff08a5e
parent: 7c54f53242da9b79e2d61452e9f057a320a60ad1 (diff)
2 files changed, 10 insertions, 1 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 867c7318..fbfffec7 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,6 +4,15 @@
 
 ### Fixes and improvements
 
+## [v1.16.1](https://github.com/OpenNMT/CTranslate2/releases/tag/v1.16.1) (2020-11-23)
+
+### Fixes and improvements
+
+* Fuse dequantization and bias addition on GPU for improved INT8 performance
+* Improve performance of masked softmax on GPU
+* Fix error when building the CentOS 7 GPU Docker image
+* The previous version listed "Pad size of INT8 matrices to a multiple of 16 when the GPU has INT8 Tensor Cores". However, the padding was not applied due to a bug and fixing it degraded the performance, so this behavior is not implemented for now.
+
 ## [v1.16.0](https://github.com/OpenNMT/CTranslate2/releases/tag/v1.16.0) (2020-11-18)
 
 ### Changes
diff --git a/python/setup.py b/python/setup.py
index 4df18aba..918ce75e 100644
--- a/python/setup.py
+++ b/python/setup.py
@@ -35,7 +35,7 @@ ctranslate2_module = Extension(
 
 setup(
     name="ctranslate2",
-    version="1.16.0",
+    version="1.16.1",
     license="MIT",
     description="Fast inference engine for OpenNMT models",
     long_description=_get_long_description(),
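The first bullet of the v1.16.1 changelog entry fuses dequantization and bias addition. The Python sketch below is only a rough illustration of what that fusion means numerically. The int32 accumulator of an INT8 matrix multiplication is dequantized and the bias is added in a single pass over the output, instead of in two separate passes. The quantization scheme (symmetric, per-row scales of 127 / max|x|) and every function name here are assumptions made for this sketch. They are not CTranslate2's actual GPU kernel or API.

```python
# Hypothetical sketch of fusing INT8 dequantization with bias addition.
# Assumptions (not taken from CTranslate2's source):
#   * symmetric quantization with scale = 127 / max(|x|) per row of the
#     activations and per row (output channel) of the weight matrix;
#   * the INT8 GEMM produces an int32 accumulator c[i][j].
from typing import List, Tuple

FloatMatrix = List[List[float]]
IntMatrix = List[List[int]]


def quantize_rows(x: FloatMatrix) -> Tuple[IntMatrix, List[float]]:
    """Quantize each row to the int8 range [-127, 127]; return values and scales."""
    q, scales = [], []
    for row in x:
        amax = max(abs(v) for v in row) or 1.0  # avoid division by zero on all-zero rows
        scale = 127.0 / amax
        q.append([max(-127, min(127, round(v * scale))) for v in row])
        scales.append(scale)
    return q, scales


def int_gemm_nt(a: IntMatrix, b: IntMatrix) -> IntMatrix:
    """c = a @ b^T with integer accumulation (b is stored as [out, in])."""
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in b] for ra in a]


def dequantize_then_add_bias(c: IntMatrix, a_scales: List[float],
                             b_scales: List[float], bias: List[float]) -> FloatMatrix:
    """Unfused: one pass over the output to dequantize, a second pass to add the bias."""
    y = [[c[i][j] / (a_scales[i] * b_scales[j]) for j in range(len(bias))]
         for i in range(len(c))]
    return [[y[i][j] + bias[j] for j in range(len(bias))] for i in range(len(y))]


def dequantize_add_bias_fused(c: IntMatrix, a_scales: List[float],
                              b_scales: List[float], bias: List[float]) -> FloatMatrix:
    """Fused: each output element is read once, dequantized, and biased."""
    return [[c[i][j] / (a_scales[i] * b_scales[j]) + bias[j] for j in range(len(bias))]
            for i in range(len(c))]


# Usage: a tiny linear layer with made-up values.
x = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.5]]   # activations: [batch, in]
w = [[1.0, 0.5, -0.5], [-0.25, 0.75, 1.0]]  # weights: [out, in]
bias = [0.1, -0.2]

qx, sx = quantize_rows(x)
qw, sw = quantize_rows(w)
c = int_gemm_nt(qx, qw)
fused = dequantize_add_bias_fused(c, sx, sw, bias)
unfused = dequantize_then_add_bias(c, sx, sw, bias)
```

Both functions perform the same floating-point operations in the same order, so they return identical results. On a GPU the gain from fusing presumably comes from launching one kernel and reading and writing the output matrix once rather than twice, which a pure Python sketch cannot demonstrate.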