
github.com/marian-nmt/FBGEMM.git
2019-09-25  fix linux build error  (Young Jin Kim)
2019-09-25  Fix windows build errors  (Young Jin Kim)
2019-09-25  Merge remote-tracking branch 'upstream/master' into youki/win-jit-debug-int8  (Young Jin Kim)
    Fix for windows build errors
2019-09-25  Fix jit code (AVX512) on windows  (Young Jin Kim)
2019-09-25  JIT code working on windows (AVX512)  (Young Jin Kim)
2019-09-24  remove template parameter from PackedDepthWiseConvMatrix (#128)  (Jongsoo Park)
    Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/128
    We don't really need KERNEL_PROD as a compile-time constant template parameter
    in PackedDepthWiseConvMatrix for performance. Removing the template parameter
    will make generalizing depth-wise convolution to non-3x3 cases easier. This
    diff only changes fbgemm while maintaining the old interface; the follow-up
    diff will change the Caffe2 code using the old interface and then remove it.
    This diff also splits FbgemmI8DepthwiseAvx2.cc into FbgemmI8Depthwise3DAvx2.cc
    and PackDepthwiseConvMatrixAvx2.cc to avoid compilation timeouts in OSS build
    tests.
    Reviewed By: dskhudia
    Differential Revision: D17514003
    fbshipit-source-id: 2214637ac0762a585f619f0035d3449cc4f7669e
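    A sketch of the interface change this summary describes, using hypothetical
    class names and signatures (the real FBGEMM declarations may differ):

    ```
    #include <cstdint>

    // Before: the kernel product is baked into the type, so each kernel
    // shape (e.g. 3x3 = 9) instantiates a separate class.
    template <int KERNEL_PROD>
    class PackedDepthWiseConvMatrixOld {
     public:
      PackedDepthWiseConvMatrixOld(int K, const std::int8_t* smat);
    };

    // After: the kernel product is a runtime constructor argument, so
    // non-3x3 depth-wise shapes can reuse one class.
    class PackedDepthWiseConvMatrixNew {
     public:
      PackedDepthWiseConvMatrixNew(int K, int kernel_prod,
                                   const std::int8_t* smat);
    };
    ```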
2019-09-14  Minor changes in initialization of dilation (#126)  (Daya Khudia)
    Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/126
    The default value for dilation is in the function definition itself.
    Reviewed By: protonu
    Differential Revision: D17371791
    fbshipit-source-id: c3430dfa3faccf549dc066aa8dcd422b910dbcaa
2019-09-13  add missing instantiation for float bias for gconv (#127)  (Daya Khudia)
    Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/127
    float bias was going through a slow path. Adding a missing specialization.
    Reviewed By: protonu, jianyuh
    Differential Revision: D17346881
    fbshipit-source-id: dd6b40d80c3c429b438ea6b4e1520b935e582c4a
2019-09-11  ReQuantization with FP32 bias  (Daya Khudia)
    Summary: There is an issue in eager mode if we quantize bias using
    input_scale * weight_scale. See the following doc: https://fb.quip.com/ru2eAqzsjwXc
    Reviewed By: jianyuh
    Differential Revision: D16948098
    fbshipit-source-id: ff2c2bc560c2c14da1941d65a15c96e18f407569
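    For context, a minimal sketch (illustrative names, not FBGEMM's API) of the
    bias-quantization rule this summary refers to: an int32 bias must be
    quantized with scale = input_scale * weight_scale, and in eager mode the
    input scale is not known until the input arrives, which is why keeping the
    bias in FP32 until requantization sidesteps the issue.

    ```
    #include <cmath>
    #include <cstdint>

    // Illustrative only: the conventional int32 bias quantization that breaks
    // in eager mode because input_scale is unknown ahead of time.
    std::int32_t quantizeBias(float bias, float input_scale, float weight_scale) {
      return static_cast<std::int32_t>(
          std::lrintf(bias / (input_scale * weight_scale)));
    }
    ```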
2019-09-11  API changes to take unquantized bias for depthwise conv  (Daya Khudia)
    Summary: Changing the interface for on-the-fly bias quantization, and adding
    code to quantize bias on the fly.
    Reviewed By: jianyuh
    Differential Revision: D17099709
    fbshipit-source-id: 5cca79189c00710e703044350260a9fcaca77bb3
2019-09-04  remove dw conv refs and use conv_ref instead (#122)  (Jongsoo Park)
    Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/122
    To prepare for depth-wise convolutions other than 3x3. The existing reference
    depth-wise convolution is limited to 3x3; we should reuse the conv_ref
    implementation for easier maintenance.
    Reviewed By: dskhudia
    Differential Revision: D17176591
    fbshipit-source-id: 9f6f90a801a0ad95091f1d085e66861f86c3a8f1
2019-09-04  Adding support for dilations in the conv_param_t constructor  (Protonu Basu)
    Summary: (PART 1) Adding support for convolutions with dilation: modifications
    to the constructor.
    Reviewed By: jianyuh
    Differential Revision: D17165387
    fbshipit-source-id: e005c416683d9d40a4413f8aba1b5f21a7afc156
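    A hedged sketch of constructing conv_param_t with an explicit dilation,
    following the argument order shown in the 2019-08-02 entry below
    (MB, IC, OC, {IH, IW}, groups, {KH, KW}, strides, pads); the trailing
    dilation argument and its {1, 1} default are assumptions based on this
    summary:

    ```
    #include "fbgemm/Fbgemm.h"

    // 2D convolution: batch 1, 32 -> 16 channels, 12x14 input, 4 groups,
    // 3x3 kernel, stride 1, no padding, dilation 2 (assumed trailing argument).
    fbgemm::conv_param_t<2> conv_p(
        1, 32, 16, {12, 14}, 4, {3, 3}, {1, 1}, {0, 0, 0, 0}, {2, 2});
    ```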
2019-08-29  int8 specialization for AVX2 Quantize routine (#120)  (James Reed)
    Summary: This adds a specialization for `int8` to the AVX2 `Quantize` routine.
    I tried also adding a specialization for `int32` (the final datatype we
    support in PyTorch quantization), but it seemed to introduce numerical issues
    stemming from the difference in implementations:
    https://github.com/pytorch/FBGEMM/blob/master/include/fbgemm/QuantUtils.h#L63
    vs https://github.com/pytorch/FBGEMM/blob/master/src/QuantUtilsAvx2.cc#L82
    Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/120
    Reviewed By: driazati
    Differential Revision: D17115198
    Pulled By: jamesr66a
    fbshipit-source-id: 119145bb99235a7545389afa61483060200cc2b7
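    A small usage sketch for the routine this PR specializes; the Quantize
    template and TensorQuantizationParams follow the QuantUtils.h header linked
    above, though the exact field set should be treated as an assumption:

    ```
    #include <cstdint>
    #include <vector>
    #include "fbgemm/QuantUtils.h"

    // Quantize a float buffer to int8 with
    // q = clamp(round(x / scale) + zero_point).
    void quantizeToInt8(const std::vector<float>& src) {
      fbgemm::TensorQuantizationParams qparams;
      qparams.scale = 0.05f;   // chosen for illustration
      qparams.zero_point = 0;
      qparams.precision = 8;   // 8-bit output
      std::vector<std::int8_t> dst(src.size());
      fbgemm::Quantize<std::int8_t>(
          src.data(), dst.data(), static_cast<int>(src.size()), qparams);
    }
    ```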
2019-08-21  Per channel support in fbgemmConv (#119)  (Daya Khudia)
    Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/119
    Some paths in fbgemmConv were missing support for per-channel quantization.
    This diff adds support for per-channel as well as groupwise quantization.
    Reviewed By: jianyuh
    Differential Revision: D16894740
    fbshipit-source-id: 43a2c08d1c8d1b01775f875224774c39fae280bc
2019-08-15  Merge branch 'upstream/master' into youki/prepack_constrcopy  (Young Jin Kim)
2019-08-12  fix error message (#117)  (Daya Khudia)
    Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/117
    Fixes the error message printed on mismatching parameters.
    Before:
    ```
    [FBGEMM_CONV_ERROR] Prepacked weights can't be used with these convolution parameters!
    ```
    After:
    ```
    [FBGEMM_CONV_ERROR] Convolution parameters mismatch between pre-packed weights and conv invocation! stride [1, 1] vs [2, 1]; Please pack weights using the same parameters with which convolution operation is invoked!
    ```
    Reviewed By: jianyuh
    Differential Revision: D16749007
    fbshipit-source-id: 7a3083f2955b798ae28d25ce1963c7de63654551
2019-08-09  Integrate VNNI into FBGEMM master branch (#114)  (Jianyu Huang)
    Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/114
    Adding the VNNI support in FBGEMM. Previously we had an issue with the CMake
    version: the PyTorch and FBGEMM OSS tests use CMake 3.5, while ASMJIT requires
    CMake 3.8+. This caused build failures on some platforms. The CMake version
    issue is now resolved by a PR to ASMJIT that downgrades the CMake requirement:
    https://github.com/asmjit/asmjit/pull/252
    Reviewed By: dskhudia
    Differential Revision: D16720839
    fbshipit-source-id: e5e5f2d26f924df8d9fb955f4a3758561fa73288
2019-08-09  Add unpack to PackedGemmMatrixFP16 (#112)  (Yinghai Lu)
    Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/112
    We need to unpack the layout to support non-CPU arch.
    Reviewed By: jianyuh
    Differential Revision: D16584449
    fbshipit-source-id: 309acaf8f2406e39d6975c0e9fef3e849a6d3950
2019-08-06  Back out "[fbgemm] Integrate VNNI into FBGEMM master branch"  (Jianyu Huang)
    Summary: Original commit changeset: fcaa13cc3159
    ASMJIT requires CMake 3.8, but FBGEMM and PyTorch only need CMake 3.5+. This
    caused the build failure in FBGEMM:
    https://circleci.com/gh/pytorch/FBGEMM/122#build-timing/containers/0
    Reviewed By: dskhudia
    Differential Revision: D16670547
    fbshipit-source-id: 506714c3db1cb82cf98895f58f82f235128f5285
2019-08-06  Integrate VNNI into FBGEMM master branch (#113)  (Jianyu Huang)
    Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/113
    Adding the VNNI support in FBGEMM.
    Reviewed By: dskhudia
    Differential Revision: D16276574
    fbshipit-source-id: 832ccdb27339489ebc138f3b2678e53d107c1b79
2019-08-02  Pass blocking param pointer into packedBufferSize() in PackBMatrix.cc  (Mike Tsai)
    Summary: Pass blocking params in to compute the correct buffer size for each
    group. Fixes the bug for this CONV shape:
    `conv_param_t<2>(1, 32, 16, {12, 14}, 4, {3, 3}, {1, 1}, {0, 0, 0, 0})`
    (corresponding M, N, K = 120, 4, 288) with these params:
        BlockingFactors params;
        params.MCB = 48;
        params.NCB = 16;
        params.KCB = 256;
        params.MR = 1;
        params.NR = 16;
        params.ROW_INTERLEAVE = 4;
        params.NR_MIN = 16;
    Reviewed By: jianyuh
    Differential Revision: D16571367
    fbshipit-source-id: 27c9b003d37c4d3d13767227e8343d44668823d6
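    A sketch of how the blocking parameters above might be threaded through
    packing so that packedBufferSize() sees them; the PackBMatrix constructor
    shape here is an assumption based on this log:

    ```
    #include <cstdint>
    #include "fbgemm/Fbgemm.h"

    // Pack B with explicit blocking factors (values from the summary above).
    void packBWithBlocking(const std::int8_t* smat, int K, int N, int groups) {
      fbgemm::BlockingFactors params;
      params.MCB = 48;  params.NCB = 16;  params.KCB = 256;
      params.MR = 1;    params.NR = 16;   params.NR_MIN = 16;
      params.ROW_INTERLEAVE = 4;
      // Assumed constructor: op, rows, cols, source, leading dim, out buffer,
      // groups, and the blocking params consulted by packedBufferSize().
      fbgemm::PackBMatrix<std::int8_t> packedB(
          fbgemm::matrix_op_t::NoTranspose, K, N, smat, /*ld=*/N,
          /*pmat=*/nullptr, groups, &params);
    }
    ```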
2019-08-01  Merge upstream master  (Young Jin Kim)
2019-08-01  adding a constructor for PackBMatrix with pre-packed data  (Young Jin Kim)
2019-07-19  Support pointwise with unified convolution interface as well (#108)  (Daya Khudia)
    Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/108
    Pointwise gets converted to direct GEMM.
    Reviewed By: jianyuh
    Differential Revision: D16296356
    fbshipit-source-id: 68c88df90e5de669bfcddf426c6488e2a04d55d6
2019-07-17  While calling fbgemmConv with packed weights, packed weights should be compliant with convolution parameters  (Daya Khudia)
    Summary: This is to detect inadvertently calling fbgemmConv with one set of
    conv parameters while packing was done with another set of parameters.
    Reviewed By: jspark1105
    Differential Revision: D16269293
    fbshipit-source-id: 9a166f5298d8246047e40fc880dd87e1037e0456
2019-07-16  Add functions needed for unpacking in PackWeightsForConv (#106)  (Daya Khudia)
    Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/106
    The values returned by these functions are needed while unpacking weights.
    Reviewed By: jianyuh
    Differential Revision: D16193425
    fbshipit-source-id: 8ee3a0dc46768d7cb572bf383be1ce2b450c44c9
2019-07-16  unpack through unified convolution interface (#105)  (Daya Khudia)
    Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/105
    Support for calling unpack using the unified interface for packing
    convolution weights.
    Reviewed By: jianyuh
    Differential Revision: D16190534
    fbshipit-source-id: daebd7b6d1846921232f8391c816e2f0678d813f
2019-07-10  Refactoring unpack weight function (#103)  (Jianyu Huang)
    Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/103
    In the same spirit as D16085552, we do the following in this diff:
    - Refactor the pack/unpack code for PackB: use the same `pack_unpack_`
      function for both `pack` and `unpack`.
    - Add a unit test.
    Reviewed By: dskhudia
    Differential Revision: D16160767
    fbshipit-source-id: 7fb7006750537b0705a180f2014c786298a1c615
2019-07-06  Unpack data for 3x3 (and 3x3x3) depthwise convolution  (Daya Khudia)
    Summary: Unpack weights for 3x3 and 3x3x3 depthwise convolutions.
    Reviewed By: jspark1105
    Differential Revision: D16076463
    fbshipit-source-id: 767749c1a10caefef4c76c2c51323d1a3041621a
2019-07-06  Implement ::unpack() for PackWeightMatrixForGConv  (Jaewon Lee)
    Summary: Implement ::unpack() for PackWeightMatrixForGConv. The unpack index
    calculation is the inverse of ::pack().
    Reviewed By: dskhudia
    Differential Revision: D16085552
    fbshipit-source-id: b8866365dc425fee2cb985b3e48c627198ebc29a
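    The two entries above both implement unpack as the inverse of the pack-time
    index mapping. A generic, FBGEMM-independent illustration of that principle:

    ```
    #include <vector>

    // If pack() scatters src[i] to packed[perm[i]], then unpack() recovers
    // src[i] by gathering packed[perm[i]]; unpack(pack(x)) == x for any
    // permutation perm.
    std::vector<int> pack(const std::vector<int>& src,
                          const std::vector<int>& perm) {
      std::vector<int> packed(src.size());
      for (size_t i = 0; i < src.size(); ++i)
        packed[perm[i]] = src[i];  // scatter through the layout permutation
      return packed;
    }

    std::vector<int> unpack(const std::vector<int>& packed,
                            const std::vector<int>& perm) {
      std::vector<int> src(packed.size());
      for (size_t i = 0; i < packed.size(); ++i)
        src[i] = packed[perm[i]];  // gather through the same permutation
      return src;
    }
    ```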
2019-06-20  Per channel and groupwise quantization (#99)  (Daya Khudia)
    Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/99
    A function to do per-channel and groupwise quantization.
    Reviewed By: jspark1105
    Differential Revision: D15567272
    fbshipit-source-id: e2f326ea7c7463b5c47b3f590e003344a9e41960
2019-06-15  Update the logic of checking valid parameters  (Mike Tsai)
    Summary: Add a check on NR_MIN and fix the ymm/zmm register checks.
    Reviewed By: dskhudia
    Differential Revision: D15772144
    fbshipit-source-id: 11e2c67fb3d47c5570b38ceaf9828ced0e60e65b
2019-06-15  Fix memory allocation bug  (Young Jin Kim)
2019-06-14  Improve some memory allocation code  (Young Jin Kim)
2019-06-13  Compile both on windows and linux  (Young Jin Kim)
2019-06-07  Remove duplicated header and undo some changes in D15399811  (Daya Khudia)
    Summary: Delete the duplicated header; remove #ifndef include guards and
    replace them with #pragma once.
    Reviewed By: jianyuh
    Differential Revision: D15669744
    fbshipit-source-id: 8895f6c867e626ac5813a8952837435e76b09370
2019-06-05  Unified convolution interface  (Daya Khudia)
    Summary: We want to combine three different convolution interfaces under one
    top-level function.
    Reviewed By: protonu
    Differential Revision: D15399811
    fbshipit-source-id: 7390616d92783506fc156f0f6017f10b5f7f8e30
2019-06-04  Add quantized::fbgemm_linear_unpack operator for serialization (#97)  (Jianyu Huang)
    Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/97
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/20721
    - FBGEMM: Add an unpack function to the PackBMatrix class, which unpacks the
      pmat buffer into origin_buf (used during serialization to recover the
      weight matrix).
    - PyTorch Quantizer: Add the quantized::fbgemm_linear_unpack operator for
      serialization.
    Reviewed By: zafartahirov
    Differential Revision: D15314568
    fbshipit-source-id: 12080c8887ce31dc849d23e132ae1766ac319407
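    A sketch of the serialization round trip this enables; the unpack() method
    name comes from this summary, while the constructor arguments are
    assumptions:

    ```
    #include <cstddef>
    #include <cstdint>
    #include <vector>
    #include "fbgemm/Fbgemm.h"

    // Pack a K x N int8 weight matrix, then unpack it back into a plain
    // row-major buffer, as a serializer would to recover the original weights.
    std::vector<std::int8_t> packUnpackRoundTrip(
        const std::int8_t* weights, int K, int N) {
      fbgemm::PackBMatrix<std::int8_t> packedB(
          fbgemm::matrix_op_t::NoTranspose, K, N, weights, /*ld=*/N);
      std::vector<std::int8_t> recovered(static_cast<std::size_t>(K) * N);
      packedB.unpack(recovered.data());  // writes the original layout back out
      return recovered;
    }
    ```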
2019-04-02  Exposing tuning parameters in FBGEMM (MCB, NCB, KCB, MR, NR, Row Interleave) (#90)  (Protonu Basu)
    Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/90
    Exposing tuning parameters in FBGEMM (MCB, NCB, KCB, MR, NR, Row Interleave).
    Reviewed By: dskhudia
    Differential Revision: D14358148
    fbshipit-source-id: 783fb4653fd696dbbd4075ad56cb8682db3011a5
2019-03-21  Improves small N cases back to what they were  (Daya S Khudia)
    Summary: In D14507536 and D14516232, small-N cases suffered when we increased
    NR. This fixes those cases.
    Reviewed By: jianyuh
    Differential Revision: D14529494
    fbshipit-source-id: 6f53797948de760d6ed24b767cbbe8d27768660f
2019-03-21  Allocate some registers for B matrix loading and reuse loaded results  (Daya S Khudia)
    Summary: Instead of loading B matrix values with every vpmaddubsw
    instruction, load them once and reuse them. The downside is that we need some
    registers to hold these B matrix values, which could otherwise have been used
    for C accumulations.
    Reviewed By: jianyuh
    Differential Revision: D14529495
    fbshipit-source-id: 54bd4bcdcf14ac2f25a433ac60bfc08b7359453f
2019-03-21  Further optimize acc16 kernel; cache blocking dimension for B matrix is now free to be autotuned (#88)  (Daya S Khudia)
    Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/88
    acc16 version. We have one more loop (over NR tiles in an NCB block) in the
    generated assembly kernel. This change also frees NCB as an independent
    dimension that can be auto-tuned.
    Reviewed By: jianyuh
    Differential Revision: D14516232
    fbshipit-source-id: f9bac9e7cdd3c89135d35a61c59a275c9a76562b
2019-03-21  Further optimize acc32 kernel; cache blocking dimension for B matrix is now free to be autotuned (#89)  (Daya S Khudia)
    Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/89
    We have one more loop (over NR tiles in an NCB block) in the generated
    assembly kernel. This change also frees NCB as an independent dimension that
    can be auto-tuned. TODO (done in D14516232): similar changes for the acc16
    kernel.
    Reviewed By: jspark1105
    Differential Revision: D14507536
    fbshipit-source-id: 6843fffdd0bcf9bb7cd0231163fbefd6e52d5bf7
2019-03-13  optimize requantize for float out processing (#85)  (Jongsoo Park)
    Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/85
    Optimizing the performance of output processing when the output is
    dequantized right away.
    Reviewed By: protonu
    Differential Revision: D14433141
    fbshipit-source-id: f99a8d82000c43e554461acf036462a4e8f7e300
2019-03-08  Fixes for FBGEMM FP16 performance (#82)  (Jianyu Huang)
    Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/82
    This is a quick fix to match FBGEMM FP16 performance with SKINNY GEMM FP16.
    Basically, this diff switches the register layout of the C accumulation
    buffer inside the micro-kernel from MR x 1 to MR x 2. See the reasons in
    T40816746.
    Reviewed By: zhengwy888
    Differential Revision: D14278430
    fbshipit-source-id: 961dd681deee69e2b7fec6bcdba7920e0b09134a
2019-03-06  Add Avx512BW/VL/DQ check (#84)  (Jianyu Huang)
    Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/84
    Add an AVX512BW check: AVX-512 Byte and Word instructions add support for
    8-bit and 16-bit integer operations such as vpmaddubsw. Similarly, add
    AVX512VL/DQ checks.
    Reviewed By: jspark1105
    Differential Revision: D14321050
    fbshipit-source-id: bd34745fd488ce4efe3248aeb78c54e1c2d91d47
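    A sketch of ISA-gated dispatch in the spirit of this check. The
    fbgemmHasAvx512Support()/fbgemmHasAvx2Support() helpers exist in
    fbgemm/Utils.h; whether this diff's BW/VL/DQ checks sit behind exactly these
    entry points is an assumption:

    ```
    #include "fbgemm/Utils.h"

    // Pick the widest kernel path the CPU actually supports.
    void dispatchByIsa() {
      if (fbgemm::fbgemmHasAvx512Support()) {
        // AVX-512 path; needs BW for 8/16-bit integer ops like vpmaddubsw,
        // and VL/DQ for narrower-register and doubleword/quadword forms.
      } else if (fbgemm::fbgemmHasAvx2Support()) {
        // AVX2 fallback path.
      } else {
        // Scalar reference path.
      }
    }
    ```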
2019-03-01  Add documentation for the cache/register blocking parameters (#81)  (Jianyu Huang)
    Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/81
    Add documentation for choosing the current blocking parameters.
    Reviewed By: dskhudia
    Differential Revision: D14256809
    fbshipit-source-id: e9a355e4611d6cb22791f2585313edc0d1b30ad2
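    For readers new to these parameters, an illustrative-only loop nest showing
    what MCB/NCB/KCB (cache blocking) and MR/NR (register blocking) control;
    FBGEMM's real loop order and packing are more involved than this:

    ```
    // C += A * B with cache blocking; each innermost block is further tiled
    // into MR x NR register tiles by the micro-kernel (not shown).
    void blockedGemm(int M, int N, int K, int MCB, int NCB, int KCB) {
      for (int kc = 0; kc < K; kc += KCB)      // KCB: K-dim block kept in cache
        for (int mc = 0; mc < M; mc += MCB)    // MCB: rows of A per cache block
          for (int nc = 0; nc < N; nc += NCB)  // NCB: cols of B per cache block
            ;  // micro-kernel runs over MR x NR register tiles of this block
    }
    ```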
2019-02-20  optimize PackAWithIm2Col for symmetric b quant  (Jongsoo Park)
    Summary: Add an additional option b_symmetric and skip the row offset
    computation when it is true.
    Reviewed By: jianyuh
    Differential Revision: D14119128
    fbshipit-source-id: fa079347562b7f75727b3a1414e9bdda3f9c65dd
2019-02-13  optimize gconv for b symmetric quantization (#70)  (Jongsoo Park)
    Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/70
    Skip row offset computation if B_zero_point == 0.
    Reviewed By: jianyuh
    Differential Revision: D14020675
    fbshipit-source-id: 88a6e225671762c67afefc15538b79f879d125a6
2019-02-13  no need to subtract col offset if a_zp is 0 (#69)  (Jongsoo Park)
    Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/69
    This diff prepares for D14013931, which folds column offsets into bias. In
    depthwise convolution, we allow passing column_offsets == nullptr, which
    means column offsets are folded into bias. We bypass adding
    column_offset * A_zero_point if either column_offsets == nullptr or
    A_zero_point == 0.
    Reviewed By: jianyuh
    Differential Revision: D14017772
    fbshipit-source-id: ad4a79402f43cbf78dbad68e1bff6d07c19dded0
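    The last three entries all rest on the same identity; a scalar illustration
    (not FBGEMM code) of why each offset term can be skipped when the
    corresponding zero point is zero:

    ```
    #include <cstdint>

    // sum_k (a_k - a_zp)(b_k - b_zp)
    //   = sum_k a_k*b_k - a_zp*sum_k b_k - b_zp*sum_k a_k + K*a_zp*b_zp
    // The column-offset term (sum of B) vanishes when a_zp == 0, and the
    // row-offset term (sum of A) vanishes when b_zp == 0.
    std::int32_t quantizedDot(const std::int8_t* a, const std::int8_t* b, int K,
                              std::int32_t a_zp, std::int32_t b_zp) {
      std::int32_t acc = 0, row_off = 0, col_off = 0;
      for (int k = 0; k < K; ++k) {
        acc += static_cast<std::int32_t>(a[k]) * b[k];
        row_off += a[k];  // needed only if b_zp != 0
        col_off += b[k];  // needed only if a_zp != 0
      }
      return acc - a_zp * col_off - b_zp * row_off + K * a_zp * b_zp;
    }
    ```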