author:    James Reed <jamesreed@fb.com>  2019-08-29 21:11:31 +0300
committer: Facebook Github Bot <facebook-github-bot@users.noreply.github.com>  2019-08-29 21:26:05 +0300
commit:    d4bfa96cdaab5c69dfee4d4c9daa9d565435cb2d
tree:      a438ba751cf7495bea71711a374f90ef965f0561 /include
parent:    280fa17349b763eb474c423a6d1172f81df29103
int8 specialization for AVX2 Quantize routine (#120)
Summary:
This adds a specialization for `int8` to the AVX2 `Quantize` routine.
I also tried adding a specialization for `int32` (the final datatype we support in PyTorch quantization), but it seemed to introduce numerical issues stemming from the difference between the two implementations:
https://github.com/pytorch/FBGEMM/blob/master/include/fbgemm/QuantUtils.h#L63
vs
https://github.com/pytorch/FBGEMM/blob/master/src/QuantUtilsAvx2.cc#L82
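For reference, the non-AVX2 path quantizes each element as roughly clamp(round(src/scale) + zero_point) into the target integer range. A minimal scalar sketch of the templated signature this patch introduces (illustrative only — `QuantizeScalar` and the bare `scale`/`zero_point` parameters are hypothetical stand-ins; the real `QuantizeAvx2` takes a `TensorQuantizationParams` struct):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical scalar analogue of the templated QuantizeAvx2 signature.
// T defaults to std::uint8_t, matching the pre-existing behavior; the
// int8 specialization falls out of instantiating with std::int8_t.
template <typename T = std::uint8_t>
void QuantizeScalar(
    const float* src,
    T* dst,
    int len,
    float scale,
    std::int32_t zero_point) {
  for (int i = 0; i < len; ++i) {
    // Round to nearest, shift by the zero point, then saturate to T's range.
    std::int64_t q = zero_point + std::lrintf(src[i] / scale);
    q = std::max<std::int64_t>(q, std::numeric_limits<T>::min());
    q = std::min<std::int64_t>(q, std::numeric_limits<T>::max());
    dst[i] = static_cast<T>(q);
  }
}
```

Note how the saturation bounds come from `std::numeric_limits<T>`, so the same template body handles both the signed and unsigned 8-bit cases; the AVX2 version instead has to pick different pack/saturate intrinsics per type, which is where implementation differences (and the `int32` numerical issues mentioned above) can creep in.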
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/120
Reviewed By: driazati
Differential Revision: D17115198
Pulled By: jamesr66a
fbshipit-source-id: 119145bb99235a7545389afa61483060200cc2b7
Diffstat (limited to 'include')
-rw-r--r--  include/fbgemm/QuantUtilsAvx2.h | 3 +-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/fbgemm/QuantUtilsAvx2.h b/include/fbgemm/QuantUtilsAvx2.h
index 47f33a8..a001004 100644
--- a/include/fbgemm/QuantUtilsAvx2.h
+++ b/include/fbgemm/QuantUtilsAvx2.h
@@ -40,9 +40,10 @@ struct FBGEMM_API RequantizationParams {
 ////////////////////////////////////////////////////////////////////////////////
 // Utility functions

+template <typename T = std::uint8_t>
 void QuantizeAvx2(
     const float* src,
-    std::uint8_t* dst,
+    T* dst,
     int len,
     const TensorQuantizationParams& qparams);