github.com/marian-nmt/FBGEMM.git - FBGEMM, a library for low-precision (quantized) matrix multiplication and convolution: commit log.

2019-04-19  make sure cpuinfo_initialize is called before fbgemmHasAvx2/512Support (#94)
Author: Jongsoo Park
Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/94. If we don't call cpuinfo_initialize beforehand, fbgemmHasAvx2/512Support will always return false; we should be really careful about this.
Reviewed By: jianyuh
Differential Revision: D14994129
fbshipit-source-id: b78028f0543d05595caaa627be2feb743d0694b1

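For context, a minimal sketch of the usage this commit enforces, assuming the standard cpuinfo and fbgemm/Utils.h headers: cpuinfo_initialize() must succeed before the fbgemmHasAvx2Support()/fbgemmHasAvx512Support() queries return meaningful results.

```cpp
#include <cstdio>

#include <cpuinfo.h>       // pytorch/cpuinfo
#include "fbgemm/Utils.h"  // fbgemmHasAvx2Support / fbgemmHasAvx512Support

int main() {
  // Without this call, the capability checks below would always report
  // false, even on AVX2/AVX-512 hardware (the bug this commit guards
  // against inside FBGEMM itself).
  if (!cpuinfo_initialize()) {
    std::fprintf(stderr, "cpuinfo_initialize() failed\n");
    return 1;
  }
  std::printf("AVX2:    %d\n", static_cast<int>(fbgemm::fbgemmHasAvx2Support()));
  std::printf("AVX-512: %d\n", static_cast<int>(fbgemm::fbgemmHasAvx512Support()));
  return 0;
}
```
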
2019-03-13  optimize requantize for float out processing (#85)
Author: Jongsoo Park
Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/85. Optimizes the performance of output processing when the output is dequantized right away.
Reviewed By: protonu
Differential Revision: D14433141
fbshipit-source-id: f99a8d82000c43e554461acf036462a4e8f7e300

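As a scalar sketch of what the float-output path computes (illustrative names, not FBGEMM's vectorized kernel): the int32 accumulator of a quantized GEMM is corrected for the zero-point cross terms and scaled straight to float, with no rescale back to int8.

```cpp
#include <cstdint>

// acc       = sum_k qa[k] * qb[k] for one output element
// row_sum_a = sum_k qa[k] (the A row offset); col_sum_b = sum_k qb[k]
float DequantizeAcc(int32_t acc, int32_t row_sum_a, int32_t col_sum_b,
                    int32_t a_zero, int32_t b_zero,
                    float a_scale, float b_scale, int k) {
  // Expansion of sum_k (qa[k] - a_zero) * (qb[k] - b_zero).
  int32_t corrected =
      acc - a_zero * col_sum_b - b_zero * row_sum_a + k * a_zero * b_zero;
  // A single multiply produces the float output directly.
  return a_scale * b_scale * static_cast<float>(corrected);
}
```
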
2019-02-13  group conv optimized for 16 channels per group (#68)
Author: Jongsoo Park
Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/68. Continues the optimizations for group convolution. Even though the op-level speedup for 16 channels per group is lower than in the 4- and 8-channel cases, we get a nice overall speedup in resnext101-32x4d because it has many Conv operators with 16 channels per group.
Reviewed By: protonu
Differential Revision: D13949873
fbshipit-source-id: 1dff4b1acfdabe23616e7df365daf2b7f6e8aea9

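To see why resnext101-32x4d exercises several channels-per-group sizes: its grouped 3x3 convolutions all use 32 groups, and the bottleneck width doubles per stage, so channels per group is the stage width divided by 32. A quick check, assuming the standard ResNeXt-101 32x4d stage widths:

```cpp
#include <cstdio>

int main() {
  const int groups = 32;  // cardinality of ResNeXt 32x4d
  const int stage_widths[] = {128, 256, 512, 1024};
  for (int w : stage_widths) {
    // 128/32 = 4, 256/32 = 8, 512/32 = 16, 1024/32 = 32, so the 4-, 8-,
    // and 16-channel-per-group kernels from these commits all get used.
    std::printf("stage width %4d -> %2d channels per group\n", w, w / groups);
  }
  return 0;
}
```
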
2019-02-02  gconv optimized for 8 channels per group (#65)
Author: Jongsoo Park
Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/65. As the title says.
Reviewed By: jianyuh
Differential Revision: D13834287
fbshipit-source-id: ff174fdfcc27bcc227e435ff27e5c2a7024bf736

2019-02-01  specialized requantization for gconv (#61)
Author: Jongsoo Park
Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/61. Requantization was the bottleneck of group convolution with 4 channels per group. This diff implements a version of requantization specialized for group convolution with 4 channels per group. TODO: generalize for other group convolution shapes.
Reviewed By: dskhudia
Differential Revision: D13831466
fbshipit-source-id: 1ac7225d3133a2304c5b07730374584afc6ec259

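For reference, the scalar semantics that a requantization kernel like this vectorizes (a sketch with illustrative names, not the specialized FBGEMM code): rescale the corrected int32 accumulator to the output scale, add the output zero point, and saturate to uint8.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// corrected_acc: int32 accumulator after the zero-point corrections
// multiplier:    a_scale * b_scale / c_scale
uint8_t Requantize(int32_t corrected_acc, float multiplier, int32_t c_zero) {
  int32_t v =
      static_cast<int32_t>(std::nearbyint(corrected_acc * multiplier)) + c_zero;
  // Saturate to the uint8 output range.
  return static_cast<uint8_t>(std::min(255, std::max(0, v)));
}
```
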
2019-01-31  optimize requantization remainder (#64)
Author: Jongsoo Park
Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/64. Uses a mask instead of scalar code for the remainder loop.
Reviewed By: dskhudia
Differential Revision: D13893809
fbshipit-source-id: 8e33c85d65b2dcf0cdb8e92372c44dcc9bcf6824

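The idea, sketched with a toy operation and a hypothetical mask table (FBGEMM's actual remainder handling may differ in detail): when fewer than eight int32 lanes remain, load a mask that enables exactly rem lanes and use AVX2 masked load/store instead of a scalar tail loop.

```cpp
#include <immintrin.h>
#include <cstdint>

// Sliding-window mask table: reading 8 ints starting at index (8 - rem)
// yields `rem` enabled lanes (-1) followed by disabled lanes (0).
static const int32_t kMaskTable[16] = {-1, -1, -1, -1, -1, -1, -1, -1,
                                       0,  0,  0,  0,  0,  0,  0,  0};

// Process a remainder of 1 <= rem < 8 elements in one masked pass.
void AddOneTail(const int* src, int* dst, int rem) {
  const __m256i mask = _mm256_loadu_si256(
      reinterpret_cast<const __m256i*>(kMaskTable + (8 - rem)));
  __m256i v = _mm256_maskload_epi32(src, mask);  // disabled lanes read as 0
  v = _mm256_add_epi32(v, _mm256_set1_epi32(1));
  _mm256_maskstore_epi32(dst, mask, v);          // disabled lanes untouched
}
```
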
2018-12-06  avx2 intrinsic separation from OutputProcessing-inl.h (#38)
Author: Daya S Khudia
Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/38. Moves intrinsics code from OutputProcessing-inl.h (included in Fbgemm.h) to src/QuantUtilsAvx2.cc.
Reviewed By: Maratyszcza
Differential Revision: D13328841
fbshipit-source-id: 0a5c7b065ba9d69573390f3fbcd68df8d82827a0

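The pattern behind this and the two commits below, sketched with hypothetical names: the widely included header exposes a plain declaration, and the intrinsics live only in a translation unit compiled with -mavx2, so including Fbgemm.h never requires AVX2 compile flags.

```cpp
// QuantUtilsAvx2.h (public header: no <immintrin.h>, no AVX2 types)
#include <cstdint>
void AddOneAvx2(const int32_t* src, int32_t* dst, int len);

// QuantUtilsAvx2.cc (the only file built with -mavx2)
#include <immintrin.h>
void AddOneAvx2(const int32_t* src, int32_t* dst, int len) {
  int i = 0;
  for (; i + 8 <= len; i += 8) {  // full 8-lane vectors
    __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(src + i));
    v = _mm256_add_epi32(v, _mm256_set1_epi32(1));
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(dst + i), v);
  }
  for (; i < len; ++i) dst[i] = src[i] + 1;  // scalar tail
}
```
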
2018-12-05  clean up PackAWithQuantRowOffset from avx2 intrinsics (#36)
Author: Daya S Khudia
Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/36. Isolates avx2 usage from the quantization that is fused with packing.
Reviewed By: jianyuh
Differential Revision: D13311108
fbshipit-source-id: 3be39aa9c84efb6b4f2cc06d7abcab97c232098b

2018-12-05  avx2 specific code in a separate file for QuantUtils (#29)
Author: Daya S Khudia
Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/29. Separates the avx2 code for QuantUtils into its own file.
Reviewed By: jianyuh
Differential Revision: D13269041
fbshipit-source-id: df798cc0d93e0f2081cb832f4341fb2effa68294