|
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/94
If we don't call cpuinfo_initialize beforehand, fbgemmHasAvx2/512Support will always return false. We should be really careful about this.
Reviewed By: jianyuh
Differential Revision: D14994129
fbshipit-source-id: b78028f0543d05595caaa627be2feb743d0694b1
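The failure mode fixed here can be modeled in a short sketch. The function names below mirror the cpuinfo/FBGEMM API, but the bodies are stand-ins, not the real implementation: the point is that the feature query silently reports "no AVX2" until the CPU-info cache has been filled.

```python
_initialized = False
_features = {}

def cpuinfo_initialize():
    # Stand-in for cpuinfo_initialize(): probe the CPU once and cache
    # the result. The real library issues CPUID (or equivalent) queries.
    global _initialized
    _features["avx2"] = True  # assume an AVX2-capable host for this sketch
    _initialized = True

def fbgemm_has_avx2_support():
    # Models the bug fixed in this PR: if the cache was never filled,
    # the query reports False and the scalar fallback kernels are used.
    return _initialized and _features.get("avx2", False)
```

Calling `fbgemm_has_avx2_support()` before `cpuinfo_initialize()` returns False even on AVX2 hardware; after initialization it reports the cached capability.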
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/85
Optimizes the performance of output processing when the output is dequantized right away.
Reviewed By: protonu
Differential Revision: D14433141
fbshipit-source-id: f99a8d82000c43e554461acf036462a4e8f7e300
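For context, the dequantization applied to the output is the standard affine mapping. A minimal numeric sketch (parameter names are illustrative, not FBGEMM's exact signature):

```python
def dequantize(q, scale, zero_point):
    # Affine dequantization: recover an approximate real value from a
    # quantized integer using its scale and zero point.
    return scale * (q - zero_point)

def dequantize_row(qs, scale, zero_point):
    # The optimized path fuses this per-element mapping into the GEMM
    # output pipeline instead of running it as a separate pass over memory.
    return [dequantize(q, scale, zero_point) for q in qs]
```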
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/68
Continuing optimizations for group convolution. Even though the op-level speedup for 16 channels per group is lower than in the 4- and 8-channel cases, we get a nice overall speedup in resnext101-32x4d because it has many Conv operators with 16 channels per group.
Reviewed By: protonu
Differential Revision: D13949873
fbshipit-source-id: 1dff4b1acfdabe23616e7df365daf2b7f6e8aea9
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/65
As the title says.
Reviewed By: jianyuh
Differential Revision: D13834287
fbshipit-source-id: ff174fdfcc27bcc227e435ff27e5c2a7024bf736
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/61
Requantization was the bottleneck of group convolution with 4 channels per group. This diff implements a version of requantization specialized for group convolution with 4 channels per group.
TODO: generalize to other group convolution configurations.
Reviewed By: dskhudia
Differential Revision: D13831466
fbshipit-source-id: 1ac7225d3133a2304c5b07730374584afc6ec259
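Requantization takes the 32-bit accumulator produced by the int8 GEMM back to the 8-bit output domain. A scalar sketch of the arithmetic the specialized kernel vectorizes (the real FBGEMM code uses fixed-point multipliers and AVX2 intrinsics, not Python floats):

```python
def requantize(acc, multiplier, output_zero_point, qmin=0, qmax=255):
    # Scale the int32 accumulator to the output scale, round to the
    # nearest integer, shift by the output zero point, and saturate
    # into the representable uint8 range.
    q = int(round(acc * multiplier)) + output_zero_point
    return max(qmin, min(qmax, q))
```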
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/64
Use a mask instead of scalar code.
Reviewed By: dskhudia
Differential Revision: D13893809
fbshipit-source-id: 8e33c85d65b2dcf0cdb8e92372c44dcc9bcf6824
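The idea behind mask-based remainder handling can be modeled in scalar form. The actual code uses AVX2 masked loads/stores; the lane count of 8 below matches the 32-bit lanes of a 256-bit register, and the helper names are hypothetical:

```python
LANES = 8  # 32-bit lanes in a 256-bit AVX2 register

def masked_store(dst, src, mask):
    # Scalar model of a SIMD masked store: lane i is written only where
    # the mask is set, so a partial final vector never touches memory
    # past the end of the row. This replaces a scalar cleanup loop.
    for i in range(LANES):
        if mask[i]:
            dst[i] = src[i]

def remainder_mask(n):
    # Mask enabling only the first n lanes, for a row whose length is
    # not a multiple of the vector width.
    return [i < n for i in range(LANES)]
```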
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/38
Moves intrinsics code from OutputProcessing-inl.h (included in Fbgemm.h) to src/QuantUtilsAvx2.cc
Reviewed By: Maratyszcza
Differential Revision: D13328841
fbshipit-source-id: 0a5c7b065ba9d69573390f3fbcd68df8d82827a0
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/36
Isolate AVX2 usage from quantization fusion with packing.
Reviewed By: jianyuh
Differential Revision: D13311108
fbshipit-source-id: 3be39aa9c84efb6b4f2cc06d7abcab97c232098b
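The isolation pattern behind these commits is runtime dispatch: the ISA-specific kernel lives in its own translation unit compiled with AVX2 flags, and common code only selects a function pointer, so no intrinsics leak into shared headers. A language-neutral sketch (names and signatures hypothetical, not FBGEMM's):

```python
def quantize_reference(xs, scale, zero_point):
    # Portable scalar path, always available.
    return [max(0, min(255, int(round(x / scale)) + zero_point)) for x in xs]

def quantize_avx2(xs, scale, zero_point):
    # Stand-in for the AVX2 kernel; in FBGEMM this definition lives in a
    # separate .cc file so only that file needs AVX2 compile flags.
    return quantize_reference(xs, scale, zero_point)

def get_quantize_kernel(cpu_has_avx2):
    # Dispatch point in common code: pick the kernel at runtime based
    # on the detected CPU capability.
    return quantize_avx2 if cpu_has_avx2 else quantize_reference
```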
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/29
AVX2 code separation for QuantUtils.
Reviewed By: jianyuh
Differential Revision: D13269041
fbshipit-source-id: df798cc0d93e0f2081cb832f4341fb2effa68294
|