github.com/marian-nmt/FBGEMM.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author
2019-12-07	Fixing merge erroryouki/fp16avx512	Young Jin Kim

2019-09-30	fp16 gemm using avx512 (#135)	Jongsoo Park
	Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/135 fp16 GEMM was not using avx512 falling behind fp32 performance for large m cases. This diff enables using avx512. Further tuning for register blocking size may be needed. Longer term we would also need to use JIT'ing for fp16. Reviewed By: dskhudia Differential Revision: D17623727 fbshipit-source-id: 6605bcecf391141c457f257415b7ffb30d68fb29
2019-09-27	remove old depthwise conv interface (#133)	Jongsoo Park
	Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/133 Follow-up of D17515368 . Remove the old depthwise convolution interface in fbgemm. Reviewed By: jianyuh Differential Revision: D17515379 fbshipit-source-id: 73c34df4d4332064aaeded556f8e00f6d520d5e3
2019-09-26	Linux memory fix	Young Jin Kim

2019-09-26	5x5 depthwise conv calling from unified conv interface (#130)	Jongsoo Park
	Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/130 As title Reviewed By: dskhudia Differential Revision: D17515308 fbshipit-source-id: e92d0c68ec4933e3472c263b01cdfbba21583f82
2019-09-26	debugging linux unit tests	Young Jin Kim

2019-09-26	5x5 depthwise convolution (#131)	Jongsoo Park
	Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/131 Generalize depthwise convolution to support kernel shape other than 3x3. For now only expose 5x5 interface additionally. Reviewed By: dskhudia Differential Revision: D17181878 fbshipit-source-id: f3940e703c977bd81b2f7afe2d7ede626bf35ced
2019-09-25	All functions are running well on windows	Young Jin Kim

2019-09-25	Fix windows build errors	Young Jin Kim

2019-09-25	Merge remote-tracking branch 'upstream/master' into youki/win-jit-debug-int8	Young Jin Kim
	Fix for windows build errors
2019-09-25	Execute groupwise in single thread mode (#26692)	Daya Khudia
	Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26692 Groupwise conv always in single thread mode. ghstack-source-id: 90712828 Reviewed By: jspark1105 Differential Revision: D17560273 fbshipit-source-id: 221c37b4d94fda1d34d8335228dc8a53a40eab73
2019-09-25	Fix jit code (AVX512) on windows	Young Jin Kim

2019-09-25	JIT code working on windows (AVX512)	Young Jin Kim

2019-09-24	remove template parameter from PackedDepthWiseConvMatrix (#128)	Jongsoo Park
	Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/128 We don't really need to have KERNEL_PROD as a compile time constant template parameter in PackedDepthWiseConvMatrix for performance. Removing the template parameter will make generalizing depth-wise convolution to non 3x3 cases easier. This diff only changes fbgemm while maintaining the old interface. The follow-up diff will change Caffe2 code using the old interface and remove the old interface. This diff also splits FbgemmI8DepthwiseAvx2.cc into FbgemmI8Depthwise3DAvx2.cc and PackDepthwiseConvMatrixAvx2.cc to avoid compilation timeouts in OSS build tests. Reviewed By: dskhudia Differential Revision: D17514003 fbshipit-source-id: 2214637ac0762a585f619f0035d3449cc4f7669e
2019-09-16	A bit more refactoring	Aleks Zi
	Summary: Small refactor of the avx2 acc32 generator Reviewed By: dskhudia Differential Revision: D17138005 fbshipit-source-id: 06ded92c5bebb35070a45578feb96e418f8d8489
2019-09-16	Small refactoring of FBGEMM GenerateKernel class	Aleks Zi
	Summary: Removed unnecessary member variables, using sstream instead of strings. Reviewed By: dskhudia Differential Revision: D17134969 fbshipit-source-id: 147d0b39cde9edf5fb70762558e90dced5ba0ab1
2019-09-13	add missing instantiation for float bias for gconv (#127)	Daya Khudia
	Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/127 float bias was going through a slow path. Adding a missing specialization. Reviewed By: protonu, jianyuh Differential Revision: D17346881 fbshipit-source-id: dd6b40d80c3c429b438ea6b4e1520b935e582c4a
2019-09-11	fbgemmPacked and fbgemmConv apis with float bias + tests	Daya Khudia
	Summary: fbgemmPacked and fbgemmConv api changes to take float bias. Reviewed By: jianyuh Differential Revision: D17244262 fbshipit-source-id: 0531c829190d20e31cb957a3f1861d4a65645cee
2019-09-11	ReQuantization with FP32 bias	Daya Khudia
	Summary: There is an issue in eager mode if we quantize bias using input_scale*weight_scale. See the following doc. https://fb.quip.com/ru2eAqzsjwXc Reviewed By: jianyuh Differential Revision: D16948098 fbshipit-source-id: ff2c2bc560c2c14da1941d65a15c96e18f407569
2019-09-11	API changes to take unquantized bias for depthwise conv	Daya Khudia
	Summary: Changing interface for on the fly bias quantization Also adding code to quantize bias on the fly Reviewed By: jianyuh Differential Revision: D17099709 fbshipit-source-id: 5cca79189c00710e703044350260a9fcaca77bb3
2019-09-11	Add assert to ensure the divisor is not 0 (#25960)	Jianyu Huang
	Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25960 Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/124 Reviewed By: dskhudia Differential Revision: D17292372 fbshipit-source-id: 71a72f87b99c65b3b956bd8361694b1de05fc333
2019-09-05	CodeCache implemented with correct initialization of Static variables (#123)	Aleks Zi
	Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/123 Same as D16968373 but fixed the static initialization dependencies problem (https://isocpp.org/wiki/faq/ctors#static-init-order). Reviewed By: dskhudia Differential Revision: D17194751 fbshipit-source-id: 274f111996ab4f1c4386bd3b9ee8f3790739fdcd
2019-09-05	Revert D16968373: Introduced CodeCache container to share the microkernels ↵	Edward Yang
	among different threads. Differential Revision: D16968373 Original commit changeset: 22d66e50d9b3 fbshipit-source-id: 6163979bdb36cb0b1b95bfa1caeab67e7d23eee5
2019-09-05	Modifying PackAWithIm2Col to support dilated convolution and adding test cases	Protonu Basu
	Summary: Modifying PackAWithIm2Col to support dilated convolution and adding test cases Reviewed By: dskhudia Differential Revision: D17184638 fbshipit-source-id: e2935b1e1577505440019f732d03be630d1be040
2019-09-04	remove dw conv refs and use conv_ref instead (#122)	Jongsoo Park
	Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/122 To prepare depth-wise convolution other than 3x3. The existing reference depth-wise convolution is limited to 3x3 and we should reuse conv_ref implementation for easier maintenance. Reviewed By: dskhudia Differential Revision: D17176591 fbshipit-source-id: 9f6f90a801a0ad95091f1d085e66861f86c3a8f1
2019-09-04	Introduced CodeCache container to share the microkernels among different ↵	Aleks Zi
	threads. Summary: CodeCache is thread safe and ensures single creation of each microkernel. Uses a single jitRuntiume written to under a lock. The CodeHolder was removed from the class members as it is only a tmporary class, and can be created/destroyed on demand - no need to keep the metadata of the last generated microkernel. Reviewed By: dskhudia Differential Revision: D16968373 fbshipit-source-id: 22d66e50d9b3173c542e28daa322e7869eb52b14
2019-09-04	Modifying reference conv2d/3d, im2col2d.3d to support dilated convolutions	Protonu Basu
	Summary: Modifying reference conv2d/3d, im2col2d.3d to support dilated convolutions Reviewed By: dskhudia Differential Revision: D17169707 fbshipit-source-id: f6862f79d9cf10f0b72df1b6feafc3d35ba7e5d5
2019-09-03	disable clang formatting in a few array definitions (#121)	Jongsoo Park
	Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/121 By adding "// clang-format off" and "// clang-format on" we can still apply clang-format to these files. Reviewed By: jianyuh Differential Revision: D17159312 fbshipit-source-id: de523536df4c33f0efe332f9bc7b0290cdac1ba0
2019-08-29	int8 specialization for AVX2 Quantize routine (#120)	James Reed
	Summary: This adds a specialization for `int8` to the AVX2 `Quantize` routine. I tried also adding a specialization for `int32` (the final datatype we support in PyTorch quantization), but it seemed to introduce numerical issues stemming from the difference in implementations: https://github.com/pytorch/FBGEMM/blob/master/include/fbgemm/QuantUtils.h#L63 vs https://github.com/pytorch/FBGEMM/blob/master/src/QuantUtilsAvx2.cc#L82 Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/120 Reviewed By: driazati Differential Revision: D17115198 Pulled By: jamesr66a fbshipit-source-id: 119145bb99235a7545389afa61483060200cc2b7
2019-08-21	Per channel support in fbgemmConv (#119)	Daya Khudia
	Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/119 Some paths in fbgemmConv had missing support for per channel quantization. Adding support for per channel as well as groupwise quantization support with this diff. Reviewed By: jianyuh Differential Revision: D16894740 fbshipit-source-id: 43a2c08d1c8d1b01775f875224774c39fae280bc
2019-08-15	Merge branch 'upstream/master' into youki/prepack_constrcopyPublic	Young Jin Kim

2019-08-12	fix error message (#117)	Daya Khudia
	Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/117 Fixes error message with mismatching parameters. Before: ``` [FBGEMM_CONV_ERROR] Prepacked weights can't be used with these convolution parameters! ``` After ``` [FBGEMM_CONV_ERROR] Convolution parameters mismatch between pre-packed weights and conv invocation! stride [1, 1] vs [2, 1]; Please pack weights using the same parameters with which convolution operation is invoked! ``` Reviewed By: jianyuh Differential Revision: D16749007 fbshipit-source-id: 7a3083f2955b798ae28d25ce1963c7de63654551
2019-08-09	Integrate VNNI into FBGEMM master branch (#114)	Jianyu Huang
	Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/114 Adding the VNNI support in FBGEMM. Previously, we have the issue on CMake version. Currently PyTorch and FBGEMM OSS test has the CMake 3.5 test, while ASMJIT requires CMake to be 3.8+. This caused the build failure for some platforms. Now the CMake version issue is resolved by a PR to ASMJIT to downgrade the CMake requirement: https://github.com/asmjit/asmjit/pull/252. Reviewed By: dskhudia Differential Revision: D16720839 fbshipit-source-id: e5e5f2d26f924df8d9fb955f4a3758561fa73288
2019-08-06	Back out "[fbgemm] Integrate VNNI into FBGEMM master branch"	Jianyu Huang
	Summary: Original commit changeset: fcaa13cc3159 ASMJIT requires the CMake version to be 3.8 However, FBGEMM and PyTorch only need the CMake version to be 3.5+. This caused the build failure in FBGEMM: https://circleci.com/gh/pytorch/FBGEMM/122#build-timing/containers/0 Reviewed By: dskhudia Differential Revision: D16670547 fbshipit-source-id: 506714c3db1cb82cf98895f58f82f235128f5285
2019-08-06	Integrate VNNI into FBGEMM master branch (#113)	Jianyu Huang
	Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/113 Adding the VNNI support in FBGEMM. Reviewed By: dskhudia Differential Revision: D16276574 fbshipit-source-id: 832ccdb27339489ebc138f3b2678e53d107c1b79
2019-08-02	Pass blocking param pointer into packedBufferSize() in PackBMatrix.cc	Mike Tsai
	Summary: Pass blocking params in to compute correct buffer size for each group. Fix the bug for this CONV shape: `conv_param_t<2>(1, 32, 16, {12, 14}, 4, {3, 3}, {1, 1}, {0, 0, 0, 0})` Corresponding M, N, K = 120, 4, 288 with these params: BlockingFactors params; params.MCB = 48; params.NCB = 16; params.KCB = 256; params.MR = 1; params.NR = 16; params.ROW_INTERLEAVE = 4; params.NR_MIN = 16; Reviewed By: jianyuh Differential Revision: D16571367 fbshipit-source-id: 27c9b003d37c4d3d13767227e8343d44668823d6
2019-08-01	Merge upstream master	Young Jin Kim

2019-08-01	adding a constructor for PackBMatrix with pre-packed data	Young Jin Kim

2019-07-19	Fix fbgemm OSS failure	Jianyu Huang
	Summary: std::multiplier is not found. Reviewed By: jspark1105 Differential Revision: D16373256 fbshipit-source-id: ae273a3f447f95e4b26d3f1a43e7ddad288b78ab
2019-07-19	Support pointwise with unified convolution interface as well (#108)	Daya Khudia
	Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/108 Pointwise gets converted to direct GEMM Reviewed By: jianyuh Differential Revision: D16296356 fbshipit-source-id: 68c88df90e5de669bfcddf426c6488e2a04d55d6
2019-07-18	Fix missing blocking params in conv im2col code path.	Mike Tsai
	Summary: Add blocking params as argument of rowOffsetBufferSize() so the allocated vector will be sized correctlly. Reviewed By: dskhudia, jianyuh Differential Revision: D16348913 fbshipit-source-id: c70a05f2f69db3ce71ec2c27a8db4d143649ddd6
2019-07-17	While calling fbgemmConv with packed weights, packed weights should be ↵	Daya Khudia
	compliant with convolution parameters Summary: This is to detect inadvertent calling for fbgemmConv with one set of conv parameters while packing was done with another set of parameters. Reviewed By: jspark1105 Differential Revision: D16269293 fbshipit-source-id: 9a166f5298d8246047e40fc880dd87e1037e0456
2019-07-16	changes to remove warnings when building in opt mode	Protonu Basu
	Summary: Changes to remove warnings when building FBGEMM in opt mode. Cleanup to address initialization of MCB, KCB, NCBX Reviewed By: jianyuh Differential Revision: D16283443 fbshipit-source-id: 0829aee45ed1d262a18bcf4dd294393ef018a688
2019-07-16	Add functions needed for unpacking in PackWeightsForConv (#106)	Daya Khudia
	Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/106 The values returned by these functions is needed while unpacking weights. Reviewed By: jianyuh Differential Revision: D16193425 fbshipit-source-id: 8ee3a0dc46768d7cb572bf383be1ce2b450c44c9
2019-07-16	unpack through unified convolution interface (#105)	Daya Khudia
	Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/105 Support for calling unpack using unified interface for packing convolution weights Reviewed By: jianyuh Differential Revision: D16190534 fbshipit-source-id: daebd7b6d1846921232f8391c816e2f0678d813f
2019-07-16	Assume input weights to be in transposed format for convUnified (#104)	Daya Khudia
	Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/104 For consistency, we always assume that weights to PackWeightsForConv are in format K R S C/G, which is same as G K/G R S C/G cc: Huihan Liu: Please note this change. Reviewed By: jianyuh Differential Revision: D16186932 fbshipit-source-id: 9ca2562f213d6b296ef8bd2eca1e5b6e98c436ec
2019-07-11	Add cpuinfo_initialize() into the isa query functions	Young Jin Kim

2019-07-10	Refactoring unpack weight function (#103)	Jianyu Huang
	Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/103 In the same spirit of D16085552, we do the following in this Diff: - Refactor the pack/unpack code for PackB: use the same ```pack_unpack_``` function for both ```pack``` and ```unpack``` function. - Add a unit test. Reviewed By: dskhudia Differential Revision: D16160767 fbshipit-source-id: 7fb7006750537b0705a180f2014c786298a1c615
2019-07-06	Unpack data for 3x3 (and 3x3x3) depthwise convolution	Daya Khudia
	Summary: unpack weight for 3x3 depthwise and 3x3x3 depthwise convolutions. Reviewed By: jspark1105 Differential Revision: D16076463 fbshipit-source-id: 767749c1a10caefef4c76c2c51323d1a3041621a
2019-07-06	Implement ::unpack() for PackWeightMatrixForGConv	Jaewon Lee
	Summary: Implement ::unpack() for PackWeightMatrixForGConv. Unpack index calculation is the inverse of ::pack(). Reviewed By: dskhudia Differential Revision: D16085552 fbshipit-source-id: b8866365dc425fee2cb985b3e48c627198ebc29a