Documentation!

author: kpu <github@kheafield.com> 2018-06-23 20:58:44 +0300
committer: kpu <github@kheafield.com> 2018-06-23 20:58:44 +0300
commit: 128745c0f00b2fb1fb4c164256400cc585524e32 (patch)
tree: 19994cd10db9b9e4befad8fefa48cb6b0497d94e /README.md
parent: c4830ebb8fde2f2079118b7999fe2c03fa6f4a4a (diff)
1 files changed, 74 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..45cb3ae
--- /dev/null
+++ b/README.md
@@ -0,0 +1,74 @@
+# Integer Matrix Multiplication
+
+This repository implements 8-bit and 16-bit matrix multiplication:
+
+C = A * B
+
+It's designed with neural network inference in mind: A is typically activations, B is typically fixed parameters, and C is activations for the next layer.
+
+A can have any number of rows.  Typically this is a batch size.
+The shared dimension, A's columns and B's rows, must be a multiple of 32 (for 16-bit) or 64 (for 8-bit).
+B's columns must be a multiple of 8.
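
The shape rules above can be checked before preparing anything. This is a minimal sketch. `ShapesAreValid` is a hypothetical helper, not part of intgemm, and it encodes only the three rules stated in this section.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of intgemm): returns true if the shapes satisfy
// the rules above. bits must be 8 or 16.
bool ShapesAreValid(std::size_t A_rows, std::size_t width, std::size_t B_cols, int bits) {
  assert(bits == 8 || bits == 16);
  (void)A_rows;  // A may have any number of rows, so it is not checked.
  // Shared dimension: multiple of 32 for 16-bit, 64 for 8-bit.
  const std::size_t width_multiple = (bits == 16) ? 32 : 64;
  // B's columns: multiple of 8.
  return width % width_multiple == 0 && B_cols % 8 == 0;
}
```

For example, a width of 32 is fine for 16-bit but not for 8-bit, which needs 64.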
+
+## Accuracy
+16-bit multiplication accumulates into 32-bit integers WITHOUT SATURATION (because there is no saturating 32-bit integer add instruction). If the width is too large (e.g. > 2048) or many 16-bit values are large, there is a substantial risk of overflow.  Choose a smaller quantization multiplier to scale things down, or implement periodic upcasting to 64-bit for me.
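
The 2048 figure follows from simple arithmetic if one assumes inputs lie in [-1, 1] and the multiplier is 1024: each product is at most 1024 * 1024 = 2^20, and 2048 such products sum to 2^31, one more than the largest signed 32-bit integer. The sketch below uses hypothetical helpers (not intgemm code) and computes that worst case in 64-bit arithmetic so the check itself cannot overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: worst-case magnitude of a 16-bit dot product, assuming
// every quantized value has magnitude at most max_abs_quantized. Computed in
// 64-bit so the check itself cannot overflow.
std::int64_t WorstCaseDot16(std::int64_t max_abs_quantized, std::int64_t width) {
  assert(max_abs_quantized >= 0 && width >= 0);
  return max_abs_quantized * max_abs_quantized * width;
}

// True if even the worst case fits in a signed 32-bit accumulator.
bool FitsInt32(std::int64_t max_abs_quantized, std::int64_t width) {
  return WorstCaseDot16(max_abs_quantized, width) <=
         std::numeric_limits<std::int32_t>::max();
}
```

Halving the multiplier divides the bound by four, so roughly four times the width fits. This is why a smaller multiplier is the suggested remedy.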
+
+8-bit multiplication accumulates into 16-bit integers with saturation.  This saturates at larger widths (~1024) and is worst on SSSE3, whose 128-bit registers hold fewer 16-bit accumulators than wider instruction sets, so each accumulator sums more products.  It's possible to upcast to 32-bit every so often, but this has not been implemented yet.
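
To see how saturation behaves, here is a scalar model of an 8-bit dot product accumulated into a single saturating 16-bit value. It is an illustration only: it assumes one accumulator, whereas the actual kernels use SIMD registers with several accumulators. `SaturatingAdd16` and `SaturatingDot8` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Adds two 16-bit values, clamping to [INT16_MIN, INT16_MAX] instead of wrapping.
std::int16_t SaturatingAdd16(std::int16_t a, std::int16_t b) {
  const std::int32_t sum = static_cast<std::int32_t>(a) + static_cast<std::int32_t>(b);
  const std::int32_t clamped =
      std::max<std::int32_t>(INT16_MIN, std::min<std::int32_t>(sum, INT16_MAX));
  return static_cast<std::int16_t>(clamped);
}

// Scalar model: 8-bit dot product accumulated into one saturating 16-bit value.
std::int16_t SaturatingDot8(const std::vector<std::int8_t>& a,
                            const std::vector<std::int8_t>& b) {
  assert(a.size() == b.size());
  std::int16_t acc = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    // A single 8-bit x 8-bit product (at most 128 * 128 = 16384) fits in 16 bits.
    const auto product = static_cast<std::int16_t>(a[i] * b[i]);
    acc = SaturatingAdd16(acc, product);
  }
  return acc;
}
```

With every input at 127, each product is 16129, so two products (32258) still fit, but a third pushes the sum past 32767. For these all-positive inputs the result then stays clamped at 32767. Real activations and weights are usually much smaller, which is why the problem shows up only at larger widths.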
+
+## Usage
+
+A full example appears in [example.cc](example.cc).
+
+Both A and B should be prepared before multiplication.
+```C++
+#include "intgemm.h"
+
+/* Not shown: allocate 64-byte aligned memory with e.g. aligned_alloc.
+ * Fill A and B.
+ */
+/* Prepare A for multiplication.  This might be offline or on the fly. */
+intgemm::Generic_16bit::PrepareA(A, A_prepared, quant_mult, A_rows, width);
+/* Prepare B for multiplication.  This is typically done offline. */
+intgemm::Generic_16bit::PrepareB(B, B_prepared, quant_mult, width, B_cols);
+/* Multiply and produce results in C */
+intgemm::Generic_16bit::Multiply(A_prepared, B_prepared, C, 1.0 / (quant_mult * quant_mult), A_rows, width, B_cols);
+```
+For 8-bit, use `Generic_8bit` instead of `Generic_16bit`.
+
+When represented as floats, A, B, and C are all in row-major format.
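
As a plain reference for the arithmetic behind the three calls, here is a scalar sketch of the 16-bit path on row-major matrices. It quantizes with `quant_mult` (the Prepare calls take this multiplier too), multiplies in integers with a 32-bit accumulator, and rescales by `1.0 / (quant_mult * quant_mult)` because each integer product carries the multiplier twice. `ReferenceMultiply16` is hypothetical, not intgemm's API, and it ignores whatever internal layout `PrepareA`/`PrepareB` produce. Signed overflow is undefined in C++, so callers must keep the width and values within the limits described under Accuracy.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar reference for the 16-bit path (not intgemm's API).
// All matrices are row-major: element (r, c) of a matrix with C columns is at r * C + c.
std::vector<float> ReferenceMultiply16(const std::vector<float>& A,
                                       const std::vector<float>& B,
                                       float quant_mult, int A_rows, int width, int B_cols) {
  assert(A.size() == static_cast<std::size_t>(A_rows) * width);
  assert(B.size() == static_cast<std::size_t>(width) * B_cols);
  // Scale and round to the nearest integer; values are assumed to fit in 16 bits.
  auto quantize = [quant_mult](float x) -> std::int32_t {
    const long q = std::lround(x * quant_mult);
    assert(q >= INT16_MIN && q <= INT16_MAX);
    return static_cast<std::int32_t>(q);
  };
  // Each integer product carries quant_mult twice, so divide by its square.
  const float unquant_mult = 1.0f / (quant_mult * quant_mult);
  std::vector<float> C(static_cast<std::size_t>(A_rows) * B_cols);
  for (int r = 0; r < A_rows; ++r) {
    for (int c = 0; c < B_cols; ++c) {
      std::int32_t acc = 0;  // 32-bit accumulator, no saturation
      for (int k = 0; k < width; ++k) {
        acc += quantize(A[r * width + k]) * quantize(B[k * B_cols + c]);
      }
      C[r * B_cols + c] = static_cast<float>(acc) * unquant_mult;
    }
  }
  return C;
}
```

A reference like this can be handy for spot-checking optimized output on small inputs.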
+
+## Quantization
+Floating-point values are multiplied by a user-specified constant (`quant_mult` in the example above) and then rounded to an integer.
+
+For 16-bit, Jacob Devlin recommends 1024.0 for neural networks to prevent the overflow described above.
+
+For 8-bit, use 127.0 divided by the largest absolute value.  Quantization saturates, so a larger multiplier can be used to deliberately clip the largest values.
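
A minimal sketch of the 8-bit recipe above follows. `QuantMult8` and `Quantize8` are hypothetical helpers. The rounding mode (ties away from zero via `std::lround`) and the symmetric clamp to [-127, 127] are assumptions of this sketch, not necessarily what intgemm does.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper: the suggested 8-bit multiplier, 127 / max |x|.
float QuantMult8(const std::vector<float>& values) {
  float max_abs = 0.0f;
  for (float v : values) max_abs = std::max(max_abs, std::fabs(v));
  assert(max_abs > 0.0f);
  return 127.0f / max_abs;
}

// Hypothetical helper: scale, clamp to [-127, 127], then round to nearest
// (ties away from zero). Clamping first keeps std::lround in range.
std::int8_t Quantize8(float x, float quant_mult) {
  const float scaled = std::max(-127.0f, std::min(127.0f, x * quant_mult));
  return static_cast<std::int8_t>(std::lround(scaled));
}
```

Passing a multiplier larger than `QuantMult8(values)` clips the largest values to the ends of the range in exchange for finer resolution of the small ones.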
+
+## Acknowledgments
+The original 16-bit SSE2 code came from:
+
+Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU by Jacob Devlin
+https://arxiv.org/abs/1705.01991
+
+It is used under the following license:
+
+Copyright (c) 2017 Microsoft Corporation
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+