author:    James Reed <jamesreed@fb.com>  2019-08-29 21:11:31 +0300
committer: Facebook Github Bot <facebook-github-bot@users.noreply.github.com>  2019-08-29 21:26:05 +0300
commit:    d4bfa96cdaab5c69dfee4d4c9daa9d565435cb2d
tree:      a438ba751cf7495bea71711a374f90ef965f0561 /include
parent:    280fa17349b763eb474c423a6d1172f81df29103
int8 specialization for AVX2 Quantize routine (#120)
Summary:
This adds a specialization for `int8` to the AVX2 `Quantize` routine.
I also tried adding a specialization for `int32` (the final datatype we support in PyTorch quantization), but it seemed to introduce numerical issues stemming from the difference between the two implementations:
https://github.com/pytorch/FBGEMM/blob/master/include/fbgemm/QuantUtils.h#L63
vs
https://github.com/pytorch/FBGEMM/blob/master/src/QuantUtilsAvx2.cc#L82
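For reference, the non-AVX2 path quantizes each element as roughly clamp(round(src/scale) + zero_point) into the target integer range. A minimal scalar sketch of the templated signature this patch introduces (illustrative only — `QuantizeScalar` and the bare `scale`/`zero_point` parameters are hypothetical stand-ins; the real `QuantizeAvx2` takes a `TensorQuantizationParams` struct):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical scalar analogue of the templated QuantizeAvx2 signature.
// T defaults to std::uint8_t, matching the pre-existing behavior; the
// int8 specialization falls out of instantiating with std::int8_t.
template <typename T = std::uint8_t>
void QuantizeScalar(
    const float* src,
    T* dst,
    int len,
    float scale,
    std::int32_t zero_point) {
  for (int i = 0; i < len; ++i) {
    // Round to nearest, shift by the zero point, then saturate to T's range.
    std::int64_t q = zero_point + std::lrintf(src[i] / scale);
    q = std::max<std::int64_t>(q, std::numeric_limits<T>::min());
    q = std::min<std::int64_t>(q, std::numeric_limits<T>::max());
    dst[i] = static_cast<T>(q);
  }
}
```

Note how the saturation bounds come from `std::numeric_limits<T>`, so the same template body handles both the signed and unsigned 8-bit cases; the AVX2 version instead has to pick different pack/saturate intrinsics per type, which is where implementation differences (and the `int32` numerical issues mentioned above) can creep in.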
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/120
Reviewed By: driazati
Differential Revision: D17115198
Pulled By: jamesr66a
fbshipit-source-id: 119145bb99235a7545389afa61483060200cc2b7
Diffstat (limited to 'include')
-rw-r--r--  include/fbgemm/QuantUtilsAvx2.h | 3 +-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/fbgemm/QuantUtilsAvx2.h b/include/fbgemm/QuantUtilsAvx2.h
index 47f33a8..a001004 100644
--- a/include/fbgemm/QuantUtilsAvx2.h
+++ b/include/fbgemm/QuantUtilsAvx2.h
@@ -40,9 +40,10 @@ struct FBGEMM_API RequantizationParams {
 ////////////////////////////////////////////////////////////////////////////////
 // Utility functions

+template <typename T = std::uint8_t>
 void QuantizeAvx2(
     const float* src,
-    std::uint8_t* dst,
+    T* dst,
     int len,
     const TensorQuantizationParams& qparams);