Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/135
fp16 GEMM was not using avx512, so it fell behind fp32 performance for large m cases.
This diff enables using avx512. Further tuning for register blocking size may be needed.
Longer term we would also need to use JIT'ing for fp16.
Reviewed By: dskhudia
Differential Revision: D17623727
fbshipit-source-id: 6605bcecf391141c457f257415b7ffb30d68fb29
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/133
Follow-up of D17515368 . Remove the old depthwise convolution interface in fbgemm.
Reviewed By: jianyuh
Differential Revision: D17515379
fbshipit-source-id: 73c34df4d4332064aaeded556f8e00f6d520d5e3
|
|
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/130
As title
Reviewed By: dskhudia
Differential Revision: D17515308
fbshipit-source-id: e92d0c68ec4933e3472c263b01cdfbba21583f82
|
|
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/131
Generalize depthwise convolution to support kernel shape other than 3x3.
For now only expose 5x5 interface additionally.
Reviewed By: dskhudia
Differential Revision: D17181878
fbshipit-source-id: f3940e703c977bd81b2f7afe2d7ede626bf35ced
|
|
|
|
|
|
Fix for windows build errors
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26692
Groupwise conv always in single thread mode.
ghstack-source-id: 90712828
Reviewed By: jspark1105
Differential Revision: D17560273
fbshipit-source-id: 221c37b4d94fda1d34d8335228dc8a53a40eab73
|
|
|
|
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/128
We don't really need to have KERNEL_PROD as a compile time constant template parameter in PackedDepthWiseConvMatrix for performance. Removing the template parameter will make generalizing depth-wise convolution to non 3x3 cases easier.
This diff only changes fbgemm while maintaining the old interface. The follow-up diff will change Caffe2 code using the old interface and remove the old interface.
This diff also splits FbgemmI8DepthwiseAvx2.cc into FbgemmI8Depthwise3DAvx2.cc and PackDepthwiseConvMatrixAvx2.cc to avoid compilation timeouts in OSS build tests.
Reviewed By: dskhudia
Differential Revision: D17514003
fbshipit-source-id: 2214637ac0762a585f619f0035d3449cc4f7669e
|
|
Summary: Small refactor of the avx2 acc32 generator
Reviewed By: dskhudia
Differential Revision: D17138005
fbshipit-source-id: 06ded92c5bebb35070a45578feb96e418f8d8489
|
|
Summary: Removed unnecessary member variables, using sstream instead of strings.
Reviewed By: dskhudia
Differential Revision: D17134969
fbshipit-source-id: 147d0b39cde9edf5fb70762558e90dced5ba0ab1
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/127
float bias was going through a slow path. Adding a missing specialization.
Reviewed By: protonu, jianyuh
Differential Revision: D17346881
fbshipit-source-id: dd6b40d80c3c429b438ea6b4e1520b935e582c4a
|
|
Summary: fbgemmPacked and fbgemmConv api changes to take float bias.
Reviewed By: jianyuh
Differential Revision: D17244262
fbshipit-source-id: 0531c829190d20e31cb957a3f1861d4a65645cee
|
|
Summary:
There is an issue in eager mode if we quantize bias using input_scale*weight_scale. See the following doc.
https://fb.quip.com/ru2eAqzsjwXc
Reviewed By: jianyuh
Differential Revision: D16948098
fbshipit-source-id: ff2c2bc560c2c14da1941d65a15c96e18f407569
|
|
Summary:
Changing interface for on the fly bias quantization
Also adding code to quantize bias on the fly
Reviewed By: jianyuh
Differential Revision: D17099709
fbshipit-source-id: 5cca79189c00710e703044350260a9fcaca77bb3
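As a rough illustration of what quantizing a bias on the fly can look like, the sketch below assumes the common convention that the bias scale is the product of the activation scale and the weight scale. The function name `quantizeBiasOnTheFly` and its signature are hypothetical and are not FBGEMM's interface.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper (not FBGEMM's API): quantize a float bias to int32
// using bias_scale = act_scale * weight_scale, an assumed convention.
std::vector<std::int32_t> quantizeBiasOnTheFly(
    const std::vector<float>& bias,
    float act_scale,
    float weight_scale) {
  const float bias_scale = act_scale * weight_scale;
  std::vector<std::int32_t> quantized;
  quantized.reserve(bias.size());
  for (float b : bias) {
    // Round to nearest using the current floating-point rounding mode.
    quantized.push_back(
        static_cast<std::int32_t>(std::nearbyint(b / bias_scale)));
  }
  return quantized;
}
```

Doing this at call time rather than ahead of time means the bias can stay in float until the activation scale is actually known.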
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25960
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/124
Reviewed By: dskhudia
Differential Revision: D17292372
fbshipit-source-id: 71a72f87b99c65b3b956bd8361694b1de05fc333
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/123
Same as D16968373 but fixed the static initialization dependencies problem (https://isocpp.org/wiki/faq/ctors#static-init-order).
Reviewed By: dskhudia
Differential Revision: D17194751
fbshipit-source-id: 274f111996ab4f1c4386bd3b9ee8f3790739fdcd
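One common way to avoid the static initialization order problem referenced above is the construct-on-first-use idiom. The sketch below is only an illustration of that idiom; `KernelCache`, `kernelCache`, and `registerKernel` are hypothetical names, and the source does not say which fix this commit actually applied.

```cpp
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for a shared cache object.
struct KernelCache {
  std::mutex mu;
  std::map<std::string, int> entries;
};

// Construct-on-first-use: the function-local static is initialized the
// first time this function is called (thread-safe since C++11), so it
// cannot be used before it has been constructed.
KernelCache& kernelCache() {
  static KernelCache cache;
  return cache;
}

// Returns a stable id for `name`, inserting it on first sight.
int registerKernel(const std::string& name) {
  KernelCache& c = kernelCache();
  std::lock_guard<std::mutex> lock(c.mu);
  auto it =
      c.entries.emplace(name, static_cast<int>(c.entries.size())).first;
  return it->second;
}
```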
|
|
among different threads.
Differential Revision: D16968373
Original commit changeset: 22d66e50d9b3
fbshipit-source-id: 6163979bdb36cb0b1b95bfa1caeab67e7d23eee5
|
|
Summary: Modifying PackAWithIm2Col to support dilated convolution and adding test cases
Reviewed By: dskhudia
Differential Revision: D17184638
fbshipit-source-id: e2935b1e1577505440019f732d03be630d1be040
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/122
Prepares for depth-wise convolution with kernel shapes other than 3x3.
The existing reference depth-wise convolution is limited to 3x3 and we should reuse conv_ref implementation for easier maintenance.
Reviewed By: dskhudia
Differential Revision: D17176591
fbshipit-source-id: 9f6f90a801a0ad95091f1d085e66861f86c3a8f1
|
|
threads.
Summary: CodeCache is thread safe and ensures single creation of each microkernel. It uses a single jitRuntime that is written to under a lock. The CodeHolder was removed from the class members as it is only a temporary class and can be created/destroyed on demand; there is no need to keep the metadata of the last generated microkernel.
Reviewed By: dskhudia
Differential Revision: D16968373
fbshipit-source-id: 22d66e50d9b3173c542e28daa322e7869eb52b14
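The "single creation under a lock" behavior described in this commit can be sketched as a get-or-create cache. This is a minimal illustration under assumptions, not FBGEMM's CodeCache: the class name `CodeCacheSketch` and the `generate` callback are hypothetical, and holding the lock while generating is a simplification that serializes code generation.

```cpp
#include <functional>
#include <map>
#include <mutex>

// Hypothetical sketch of a thread-safe cache that creates each entry once.
template <typename Key, typename Value>
class CodeCacheSketch {
 public:
  // Returns the cached value for `key`, calling `generate` only if the
  // key has not been seen before.
  Value getOrCreate(const Key& key, const std::function<Value()>& generate) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = cache_.find(key);
    if (it != cache_.end()) {
      return it->second;
    }
    Value value = generate();
    cache_.emplace(key, value);
    return value;
  }

 private:
  std::mutex mutex_;
  std::map<Key, Value> cache_;
};
```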
|
|
Summary: Modifying reference conv2d/3d, im2col2d/3d to support dilated convolutions
Reviewed By: dskhudia
Differential Revision: D17169707
fbshipit-source-id: f6862f79d9cf10f0b72df1b6feafc3d35ba7e5d5
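For orientation, dilation changes the output size of a convolution by enlarging the effective kernel extent. The helper below is a sketch using the standard formula; `convOutDim` is a hypothetical name and is not taken from the FBGEMM reference implementation.

```cpp
#include <cassert>

// Hypothetical helper: output size along one spatial dimension.
// A kernel of size k with dilation d spans d * (k - 1) + 1 input elements.
int convOutDim(
    int in,
    int kernel,
    int stride,
    int pad_begin,
    int pad_end,
    int dilation) {
  assert(kernel > 0 && stride > 0 && dilation > 0);
  const int effective_kernel = dilation * (kernel - 1) + 1;
  return (in + pad_begin + pad_end - effective_kernel) / stride + 1;
}
```

With dilation 1 this reduces to the usual non-dilated formula.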
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/121
By adding "// clang-format off" and "// clang-format on" we can still apply clang-format to these files.
Reviewed By: jianyuh
Differential Revision: D17159312
fbshipit-source-id: de523536df4c33f0efe332f9bc7b0290cdac1ba0
|
|
Summary:
This adds a specialization for `int8` to the AVX2 `Quantize` routine.
I tried also adding a specialization for `int32` (the final datatype we support in PyTorch quantization), but it seemed to introduce numerical issues stemming from the difference in implementations:
https://github.com/pytorch/FBGEMM/blob/master/include/fbgemm/QuantUtils.h#L63
vs
https://github.com/pytorch/FBGEMM/blob/master/src/QuantUtilsAvx2.cc#L82
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/120
Reviewed By: driazati
Differential Revision: D17115198
Pulled By: jamesr66a
fbshipit-source-id: 119145bb99235a7545389afa61483060200cc2b7
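As a point of reference for what an `int8` quantize step computes, here is a scalar sketch assuming the usual affine scheme (divide by scale, add zero point, clip, round). `quantizeToInt8` is a hypothetical name; the vectorized AVX2 routine in the library may order these operations differently, which is the kind of difference the note above about numerical issues refers to.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical scalar sketch of affine quantization to int8.
std::int8_t quantizeToInt8(float x, float scale, std::int32_t zero_point) {
  const float transformed = static_cast<float>(zero_point) + x / scale;
  // Clip to the representable int8 range before rounding.
  const float clipped = std::min(127.0f, std::max(-128.0f, transformed));
  return static_cast<std::int8_t>(std::nearbyint(clipped));
}
```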
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/119
Some paths in fbgemmConv had missing support for per channel quantization. This diff adds support for per channel as well as groupwise quantization.
Reviewed By: jianyuh
Differential Revision: D16894740
fbshipit-source-id: 43a2c08d1c8d1b01775f875224774c39fae280bc
|
|
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/117
Fixes error message with mismatching parameters.
Before:
```
[FBGEMM_CONV_ERROR] Prepacked weights can't be used with these convolution parameters!
```
After:
```
[FBGEMM_CONV_ERROR] Convolution parameters mismatch between pre-packed weights and conv invocation! stride [1, 1] vs [2, 1]; Please pack weights using the same parameters with which convolution operation is invoked!
```
Reviewed By: jianyuh
Differential Revision: D16749007
fbshipit-source-id: 7a3083f2955b798ae28d25ce1963c7de63654551
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/114
Adding the VNNI support in FBGEMM.
Previously, there was an issue with the CMake version: the PyTorch and FBGEMM OSS tests build with CMake 3.5, while ASMJIT required CMake 3.8+. This caused build failures on some platforms. The CMake version issue is now resolved by a PR to ASMJIT that lowers its CMake requirement: https://github.com/asmjit/asmjit/pull/252.
Reviewed By: dskhudia
Differential Revision: D16720839
fbshipit-source-id: e5e5f2d26f924df8d9fb955f4a3758561fa73288
|
|
Summary:
Original commit changeset: fcaa13cc3159
ASMJIT requires the CMake version to be 3.8+.
However, FBGEMM and PyTorch only need the CMake version to be 3.5+.
This caused a build failure in FBGEMM:
https://circleci.com/gh/pytorch/FBGEMM/122#build-timing/containers/0
Reviewed By: dskhudia
Differential Revision: D16670547
fbshipit-source-id: 506714c3db1cb82cf98895f58f82f235128f5285
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/113
Adding the VNNI support in FBGEMM.
Reviewed By: dskhudia
Differential Revision: D16276574
fbshipit-source-id: 832ccdb27339489ebc138f3b2678e53d107c1b79
|
|
Summary:
Pass blocking params in to compute correct buffer size for each group.
Fix the bug for this CONV shape:
`conv_param_t<2>(1, 32, 16, {12, 14}, 4, {3, 3}, {1, 1}, {0, 0, 0, 0})`
Corresponding M, N, K = 120, 4, 288
with these params:
BlockingFactors params;
params.MCB = 48;
params.NCB = 16;
params.KCB = 256;
params.MR = 1;
params.NR = 16;
params.ROW_INTERLEAVE = 4;
params.NR_MIN = 16;
Reviewed By: jianyuh
Differential Revision: D16571367
fbshipit-source-id: 27c9b003d37c4d3d13767227e8343d44668823d6
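The M, N, K values quoted in this commit can be reproduced with a little arithmetic. The sketch below assumes the argument order of the `conv_param_t<2>` call is batch, input channels, output channels, input size, groups, kernel, stride, padding, and that K counts the full im2col row (R * S * IC), which is what yields 288; `GemmShape` and `im2colGemmShape` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical container for the GEMM dimensions of a convolution.
struct GemmShape {
  int M;
  int N;
  int K;
};

// Hypothetical helper: derive M, N, K for a 2D convolution without dilation.
GemmShape im2colGemmShape(
    int MB, int IC, int OC, int IH, int IW, int G,
    int R, int S, int stride_h, int stride_w, int pad_h, int pad_w) {
  assert(G > 0 && OC % G == 0 && stride_h > 0 && stride_w > 0);
  const int OH = (IH + 2 * pad_h - R) / stride_h + 1;
  const int OW = (IW + 2 * pad_w - S) / stride_w + 1;
  GemmShape shape;
  shape.M = MB * OH * OW; // one row per output pixel
  shape.N = OC / G;       // output channels per group
  shape.K = R * S * IC;   // assumed: im2col row spans all input channels
  return shape;
}
```

For the shape above, OH = 10 and OW = 12, so M = 120, N = 16 / 4 = 4, and K = 3 * 3 * 32 = 288.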
|
|
|
|
|
|
Summary: std::multiplier is not found.
Reviewed By: jspark1105
Differential Revision: D16373256
fbshipit-source-id: ae273a3f447f95e4b26d3f1a43e7ddad288b78ab
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/108
Pointwise gets converted to direct GEMM
Reviewed By: jianyuh
Differential Revision: D16296356
fbshipit-source-id: 68c88df90e5de669bfcddf426c6488e2a04d55d6
|
|
Summary: Add blocking params as an argument of rowOffsetBufferSize() so the allocated vector will be sized correctly.
Reviewed By: dskhudia, jianyuh
Differential Revision: D16348913
fbshipit-source-id: c70a05f2f69db3ce71ec2c27a8db4d143649ddd6
|
|
compliant with convolution parameters
Summary: This is to detect inadvertent calls to fbgemmConv with one set of conv parameters when packing was done with another set of parameters.
Reviewed By: jspark1105
Differential Revision: D16269293
fbshipit-source-id: 9a166f5298d8246047e40fc880dd87e1037e0456
|
|
Summary:
Changes to remove warnings when building FBGEMM in opt mode.
Cleanup to address initialization of MCB, KCB, NCBX
Reviewed By: jianyuh
Differential Revision: D16283443
fbshipit-source-id: 0829aee45ed1d262a18bcf4dd294393ef018a688
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/106
The values returned by these functions are needed while unpacking weights.
Reviewed By: jianyuh
Differential Revision: D16193425
fbshipit-source-id: 8ee3a0dc46768d7cb572bf383be1ce2b450c44c9
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/105
Support for calling unpack using unified interface for packing convolution weights
Reviewed By: jianyuh
Differential Revision: D16190534
fbshipit-source-id: daebd7b6d1846921232f8391c816e2f0678d813f
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/104
For consistency, we always assume that weights to PackWeightsForConv are in format K R S C/G, which is same as G K/G R S C/G
cc: Huihan Liu: Please note this change.
Reviewed By: jianyuh
Differential Revision: D16186932
fbshipit-source-id: 9ca2562f213d6b296ef8bd2eca1e5b6e98c436ec
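The claim that K R S C/G is the same layout as G K/G R S C/G can be checked by comparing flat offsets. The two index helpers below are hypothetical and only illustrate the equivalence for an output channel k = g * (K/G) + kg.

```cpp
#include <cstddef>

// Hypothetical helper: flat offset in a K x R x S x (C/G) tensor.
std::size_t indexKRSC(
    std::size_t k, std::size_t r, std::size_t s, std::size_t c,
    std::size_t R, std::size_t S, std::size_t C_per_G) {
  return ((k * R + r) * S + s) * C_per_G + c;
}

// Hypothetical helper: flat offset in a G x (K/G) x R x S x (C/G) tensor.
std::size_t indexGKRSC(
    std::size_t g, std::size_t kg, std::size_t r, std::size_t s,
    std::size_t c, std::size_t K_per_G, std::size_t R, std::size_t S,
    std::size_t C_per_G) {
  return (((g * K_per_G + kg) * R + r) * S + s) * C_per_G + c;
}
```

Because g * (K/G) + kg enumerates the output channels in order, both functions return the same offset for corresponding elements.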
|
|
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/103
In the same spirit of D16085552, we do the following in this Diff:
- Refactor the pack/unpack code for PackB: use the same `pack_unpack_` function for both the `pack` and `unpack` functions.
- Add a unit test.
Reviewed By: dskhudia
Differential Revision: D16160767
fbshipit-source-id: 7fb7006750537b0705a180f2014c786298a1c615
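The idea of sharing one routine between packing and unpacking can be sketched as follows. The blocked layout here is a made-up example and not PackB's actual format; `packedIndex` and `packUnpack` are hypothetical names. The point is only that a single index mapping serves both directions, so the two cannot drift apart.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: a rows x cols row-major matrix is stored as
// consecutive column tiles, each `block_cols` wide.
std::size_t packedIndex(
    std::size_t r, std::size_t c, std::size_t rows, std::size_t block_cols) {
  return (c / block_cols) * rows * block_cols + r * block_cols +
      (c % block_cols);
}

// One routine for both directions: copies plain -> packed when `is_pack`
// is true, and packed -> plain otherwise.
void packUnpack(
    std::vector<int>& plain,
    std::vector<int>& packed,
    std::size_t rows,
    std::size_t cols,
    std::size_t block_cols,
    bool is_pack) {
  assert(block_cols > 0 && cols % block_cols == 0);
  assert(plain.size() == rows * cols && packed.size() == rows * cols);
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      const std::size_t plain_idx = r * cols + c;
      const std::size_t packed_idx = packedIndex(r, c, rows, block_cols);
      if (is_pack) {
        packed[packed_idx] = plain[plain_idx];
      } else {
        plain[plain_idx] = packed[packed_idx];
      }
    }
  }
}
```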
|
|
Summary: unpack weight for 3x3 depthwise and 3x3x3 depthwise convolutions.
Reviewed By: jspark1105
Differential Revision: D16076463
fbshipit-source-id: 767749c1a10caefef4c76c2c51323d1a3041621a
|
|
Summary: Implement ::unpack() for PackWeightMatrixForGConv. Unpack index calculation is the inverse of ::pack().
Reviewed By: dskhudia
Differential Revision: D16085552
fbshipit-source-id: b8866365dc425fee2cb985b3e48c627198ebc29a
|