Fix for Windows build errors
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/128
We don't really need KERNEL_PROD as a compile-time constant template parameter in PackedDepthWiseConvMatrix for performance. Removing the template parameter will make it easier to generalize depth-wise convolution to non-3x3 cases.
This diff only changes fbgemm while maintaining the old interface. A follow-up diff will change the Caffe2 code using the old interface and then remove the old interface.
This diff also splits FbgemmI8DepthwiseAvx2.cc into FbgemmI8Depthwise3DAvx2.cc and PackDepthwiseConvMatrixAvx2.cc to avoid compilation timeouts in OSS build tests.
Reviewed By: dskhudia
Differential Revision: D17514003
fbshipit-source-id: 2214637ac0762a585f619f0035d3449cc4f7669e
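A rough sketch of the interface direction described here (signatures approximate, not the exact FBGEMM declaration): the kernel product moves from a template parameter to a constructor argument, so one class covers 3x3, 3x3x3, and future shapes.
```
#include <cstdint>

// Before: kernel product fixed at compile time.
// template <int KERNEL_PROD>
// class PackedDepthWiseConvMatrix { ... };

// After (approximate): kernel product is a runtime argument.
class PackedDepthWiseConvMatrix {
 public:
  // K output channels; kernel_prod is the product of the kernel dims
  // (9 for 3x3, 27 for 3x3x3); smat points to the source weights.
  PackedDepthWiseConvMatrix(int K, int kernel_prod, const std::int8_t* smat);
};
```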
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/126
The default value for dilation is in the function definition itself.
Reviewed By: protonu
Differential Revision: D17371791
fbshipit-source-id: c3430dfa3faccf549dc066aa8dcd422b910dbcaa
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/127
Float bias was going through a slow path. Add the missing specialization.
Reviewed By: protonu, jianyuh
Differential Revision: D17346881
fbshipit-source-id: dd6b40d80c3c429b438ea6b4e1520b935e582c4a
Summary:
There is an issue in eager mode if we quantize bias using input_scale * weight_scale. See the following doc:
https://fb.quip.com/ru2eAqzsjwXc
Reviewed By: jianyuh
Differential Revision: D16948098
fbshipit-source-id: ff2c2bc560c2c14da1941d65a15c96e18f407569
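For context, the scheme in question is the common convention of quantizing bias to int32 with scale = input_scale * weight_scale and zero point 0; a minimal sketch, not FBGEMM's code:
```
#include <cmath>
#include <cstdint>

// Quantize a float bias with scale = input_scale * weight_scale, zero_point = 0.
std::int32_t quantize_bias(float bias, float input_scale, float weight_scale) {
  return static_cast<std::int32_t>(
      std::lrintf(bias / (input_scale * weight_scale)));
}
```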
Summary:
Change the interface for on-the-fly bias quantization.
Also add code to quantize bias on the fly.
Reviewed By: jianyuh
Differential Revision: D17099709
fbshipit-source-id: 5cca79189c00710e703044350260a9fcaca77bb3
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/122
To prepare for depth-wise convolutions other than 3x3.
The existing reference depth-wise convolution is limited to 3x3; we should reuse the conv_ref implementation for easier maintenance.
Reviewed By: dskhudia
Differential Revision: D17176591
fbshipit-source-id: 9f6f90a801a0ad95091f1d085e66861f86c3a8f1
Summary: (PART 1) Adding support for convolutions with dilation: modifications to the constructor.
Reviewed By: jianyuh
Differential Revision: D17165387
fbshipit-source-id: e005c416683d9d40a4413f8aba1b5f21a7afc156
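For reference, dilation enters the shape math as follows (standard convolution arithmetic, not code from the diff): a kernel of size K with dilation d covers an effective extent of d*(K-1)+1, so each spatial output dimension becomes:
```
// Output size of one spatial dimension with padding, stride, and dilation d.
int conv_out_dim(int in, int K, int stride, int pad_begin, int pad_end, int d) {
  return (in + pad_begin + pad_end - (d * (K - 1) + 1)) / stride + 1;
}
```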
Summary:
This adds a specialization for `int8` to the AVX2 `Quantize` routine.
I also tried adding a specialization for `int32` (the final datatype we support in PyTorch quantization), but it seemed to introduce numerical issues stemming from the difference between the implementations:
https://github.com/pytorch/FBGEMM/blob/master/include/fbgemm/QuantUtils.h#L63
vs
https://github.com/pytorch/FBGEMM/blob/master/src/QuantUtilsAvx2.cc#L82
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/120
Reviewed By: driazati
Differential Revision: D17115198
Pulled By: jamesr66a
fbshipit-source-id: 119145bb99235a7545389afa61483060200cc2b7
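For readers outside the codebase, a scalar sketch of what the `Quantize` routine computes for `int8` (simplified; the reference and AVX2 paths differ in rounding details, which is what caused the `int32` issues noted above):
```
#include <algorithm>
#include <cmath>
#include <cstdint>

std::int8_t quantize_int8(float src, std::int32_t zero_point, float scale) {
  // Round-to-nearest under the current rounding mode, then clamp to int8.
  float transformed = zero_point + std::nearbyint(src / scale);
  return static_cast<std::int8_t>(
      std::min(127.0f, std::max(-128.0f, transformed)));
}
```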
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/119
Some paths in fbgemmConv were missing support for per-channel quantization. This diff adds support for per-channel as well as groupwise quantization.
Reviewed By: jianyuh
Differential Revision: D16894740
fbshipit-source-id: 43a2c08d1c8d1b01775f875224774c39fae280bc
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/117
Fixes the error message for mismatched parameters.
Before:
```
[FBGEMM_CONV_ERROR] Prepacked weights can't be used with these convolution parameters!
```
After:
```
[FBGEMM_CONV_ERROR] Convolution parameters mismatch between pre-packed weights and conv invocation! stride [1, 1] vs [2, 1]; Please pack weights using the same parameters with which convolution operation is invoked!
```
Reviewed By: jianyuh
Differential Revision: D16749007
fbshipit-source-id: 7a3083f2955b798ae28d25ce1963c7de63654551
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/114
Adding VNNI support in FBGEMM.
Previously, we had an issue with the CMake version: the PyTorch and FBGEMM OSS tests use CMake 3.5, while ASMJIT requires CMake 3.8+. This caused build failures on some platforms. The CMake version issue is now resolved by a PR to ASMJIT that downgrades its CMake requirement: https://github.com/asmjit/asmjit/pull/252.
Reviewed By: dskhudia
Differential Revision: D16720839
fbshipit-source-id: e5e5f2d26f924df8d9fb955f4a3758561fa73288
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/112
We need to unpack the layout to support non-CPU architectures.
Reviewed By: jianyuh
Differential Revision: D16584449
fbshipit-source-id: 309acaf8f2406e39d6975c0e9fef3e849a6d3950
Summary:
Original commit changeset: fcaa13cc3159
ASMJIT requires CMake 3.8, while FBGEMM and PyTorch only need CMake 3.5+.
This caused the build failure in FBGEMM:
https://circleci.com/gh/pytorch/FBGEMM/122#build-timing/containers/0
Reviewed By: dskhudia
Differential Revision: D16670547
fbshipit-source-id: 506714c3db1cb82cf98895f58f82f235128f5285
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/113
Adding VNNI support in FBGEMM.
Reviewed By: dskhudia
Differential Revision: D16276574
fbshipit-source-id: 832ccdb27339489ebc138f3b2678e53d107c1b79
Summary:
Pass blocking params in to compute the correct buffer size for each group.
This fixes the bug for the following CONV shape:
`conv_param_t<2>(1, 32, 16, {12, 14}, 4, {3, 3}, {1, 1}, {0, 0, 0, 0})`
The corresponding M, N, K are 120, 4, 288, with these params:
```
BlockingFactors params;
params.MCB = 48;
params.NCB = 16;
params.KCB = 256;
params.MR = 1;
params.NR = 16;
params.ROW_INTERLEAVE = 4;
params.NR_MIN = 16;
```
Reviewed By: jianyuh
Differential Revision: D16571367
fbshipit-source-id: 27c9b003d37c4d3d13767227e8343d44668823d6
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/108
Pointwise convolutions get converted to a direct GEMM.
Reviewed By: jianyuh
Differential Revision: D16296356
fbshipit-source-id: 68c88df90e5de669bfcddf426c6488e2a04d55d6
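Why this works (an illustrative sketch, assuming NHWC layout): a 1x1 convolution with stride 1 and no padding is exactly a [N*H*W x C] by [C x M] matrix multiply, so no im2col buffer is needed.
```
// Naive reference: out[i][m] = sum_c in[i][c] * w[c][m], i over N*H*W pixels.
void pointwise_as_gemm(const float* in, const float* w, float* out,
                       int NHW, int C, int M) {
  for (int i = 0; i < NHW; ++i)
    for (int m = 0; m < M; ++m) {
      float acc = 0.0f;
      for (int c = 0; c < C; ++c)
        acc += in[i * C + c] * w[c * M + m];
      out[i * M + m] = acc;
    }
}
```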
compliant with convolution parameters
Summary: This is to detect inadvertently calling fbgemmConv with one set of conv parameters when packing was done with another set.
Reviewed By: jspark1105
Differential Revision: D16269293
fbshipit-source-id: 9a166f5298d8246047e40fc880dd87e1037e0456
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/106
The values returned by these functions are needed while unpacking weights.
Reviewed By: jianyuh
Differential Revision: D16193425
fbshipit-source-id: 8ee3a0dc46768d7cb572bf383be1ce2b450c44c9
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/105
Support for calling unpack through the unified interface for packing convolution weights.
Reviewed By: jianyuh
Differential Revision: D16190534
fbshipit-source-id: daebd7b6d1846921232f8391c816e2f0678d813f
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/103
In the same spirit as D16085552, we do the following in this diff:
- Refactor the pack/unpack code for PackB: use the same ```pack_unpack_``` function for both ```pack``` and ```unpack```.
- Add a unit test.
Reviewed By: dskhudia
Differential Revision: D16160767
fbshipit-source-id: 7fb7006750537b0705a180f2014c786298a1c615
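The idea, as a minimal sketch (the 4-row-interleaved layout and names are illustrative, not PackB's exact internals): one routine owns the index mapping and a flag picks the copy direction, so pack and unpack cannot drift apart.
```
#include <cstdint>

// Assumes K % 4 == 0 for brevity.
void pack_unpack_(std::int8_t* packed, std::int8_t* unpacked,
                  int K, int N, bool ispack) {
  constexpr int RI = 4; // row interleave
  for (int k = 0; k < K; ++k)
    for (int n = 0; n < N; ++n) {
      int p = (k / RI) * N * RI + n * RI + (k % RI); // interleaved index
      int u = k * N + n;                             // row-major index
      if (ispack)
        packed[p] = unpacked[u];
      else
        unpacked[u] = packed[p];
    }
}
```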
Summary: Unpack weights for 3x3 depthwise and 3x3x3 depthwise convolutions.
Reviewed By: jspark1105
Differential Revision: D16076463
fbshipit-source-id: 767749c1a10caefef4c76c2c51323d1a3041621a
Summary: Implement ::unpack() for PackWeightMatrixForGConv. Unpack index calculation is the inverse of ::pack().
Reviewed By: dskhudia
Differential Revision: D16085552
fbshipit-source-id: b8866365dc425fee2cb985b3e48c627198ebc29a
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/99
A function to do per-channel and groupwise quantization.
Reviewed By: jspark1105
Differential Revision: D15567272
fbshipit-source-id: e2f326ea7c7463b5c47b3f590e003344a9e41960
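A minimal sketch of the per-channel idea (illustrative; the routine added here and its signature differ): each output channel gets its own scale from that channel's min/max rather than one scale for the whole tensor.
```
#include <algorithm>
#include <cmath>
#include <vector>

// Symmetric int8 scales, one per output channel; assumes non-empty channels.
std::vector<float> per_channel_scales(
    const std::vector<std::vector<float>>& channels) {
  std::vector<float> scales;
  for (const auto& w : channels) {
    auto mm = std::minmax_element(w.begin(), w.end());
    float absmax = std::max(std::fabs(*mm.first), std::fabs(*mm.second));
    scales.push_back(absmax / 127.0f);
  }
  return scales;
}
```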
Summary: Add the check on NR_MIN and fix ymm/zmm register checks.
Reviewed By: dskhudia
Differential Revision: D15772144
fbshipit-source-id: 11e2c67fb3d47c5570b38ceaf9828ced0e60e65b
Summary:
Delete a duplicated header.
Remove #ifndef include guards and replace them with #pragma once.
Reviewed By: jianyuh
Differential Revision: D15669744
fbshipit-source-id: 8895f6c867e626ac5813a8952837435e76b09370
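The two guard styles in question (guard name hypothetical):
```
// Before: a classic include guard, with a macro name that must stay unique.
#ifndef FBGEMM_EXAMPLE_H_
#define FBGEMM_EXAMPLE_H_
// ... declarations ...
#endif // FBGEMM_EXAMPLE_H_

// After: one line, no macro to keep unique across headers.
#pragma once
```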
Summary: We want to combine three different convolution interfaces under one top-level function.
Reviewed By: protonu
Differential Revision: D15399811
fbshipit-source-id: 7390616d92783506fc156f0f6017f10b5f7f8e30
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/97
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20721
- FBGEMM: Add an unpack function to the PackBMatrix class: unpack the pmat buffer to the origin_buf (used for serialization to recover the weight matrix).
- PyTorch Quantizer: Add quantized::fbgemm_linear_unpack operator for serialization.
Reviewed By: zafartahirov
Differential Revision: D15314568
fbshipit-source-id: 12080c8887ce31dc849d23e132ae1766ac319407
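A hedged usage sketch of the round trip this enables (constructor and unpack signatures approximate):
```
#include <cstdint>
#include <vector>
#include "fbgemm/Fbgemm.h"

void weight_roundtrip(int K, int N, const std::vector<std::int8_t>& Bq) {
  // Pack the K x N row-major weight matrix.
  fbgemm::PackBMatrix<std::int8_t> packedB(
      fbgemm::matrix_op_t::NoTranspose, K, N, Bq.data(), /*ld=*/N);
  // Recover the original row-major buffer from the packed layout.
  std::vector<std::int8_t> recovered(K * N);
  packedB.unpack(recovered.data()); // recovered should now equal Bq
}
```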
Exposing tuning parameters in FBGEMM (MCB, NCB, KCB, MR, NR, Row Interleave) (#90)
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/90
Exposing tuning parameters in FBGEMM (MCB, NCB, KCB, MR, NR, Row Interleave)
Reviewed By: dskhudia
Differential Revision: D14358148
fbshipit-source-id: 783fb4653fd696dbbd4075ad56cb8682db3011a5
Summary: In D14507536 and D14516232, small-N cases suffered if we increased NR. This fixes those cases.
Reviewed By: jianyuh
Differential Revision: D14529494
fbshipit-source-id: 6f53797948de760d6ed24b767cbbe8d27768660f
Summary: Instead of loading B matrix values with every vpmaddubsw instruction, load them once and reuse them. The downside is that we need to use some registers to hold these B matrix values, registers that could otherwise be used for C accumulations.
Reviewed By: jianyuh
Differential Revision: D14529495
fbshipit-source-id: 54bd4bcdcf14ac2f25a433ac60bfc08b7359453f
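An illustration of the trade-off with intrinsics (a sketch, not the generated assembly): the B vector is loaded into a register once and reused across two row accumulations, occupying a register that could otherwise hold C values.
```
#include <cstdint>
#include <immintrin.h>

void madd_two_rows(const std::uint8_t* a0, const std::uint8_t* a1,
                   const std::int8_t* b, __m256i& acc0, __m256i& acc1) {
  // One load of B serves both vpmaddubsw (maddubs) operations below.
  __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b));
  __m256i va0 = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a0));
  __m256i va1 = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a1));
  acc0 = _mm256_adds_epi16(acc0, _mm256_maddubs_epi16(va0, vb));
  acc1 = _mm256_adds_epi16(acc1, _mm256_maddubs_epi16(va1, vb));
}
```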
now free to be autotuned (#88)
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/88
This is the acc16 version of D14507536.
We have one more loop (over NR tiles in NCB block) in the generated assembly kernel. This change also frees NCB as an independent dimension that can be auto-tuned.
Reviewed By: jianyuh
Differential Revision: D14516232
fbshipit-source-id: f9bac9e7cdd3c89135d35a61c59a275c9a76562b
now free to be autotuned (#89)
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/89
We have one more loop (over NR tiles in NCB block) in the generated assembly kernel. This change also frees NCB as an independent dimension that can be auto-tuned.
~~TODO: Similar changes for acc16 kernel.~~ D14516232
Reviewed By: jspark1105
Differential Revision: D14507536
fbshipit-source-id: 6843fffdd0bcf9bb7cd0231163fbefd6e52d5bf7
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/85
Optimize the performance of output processing when the output is dequantized right away.
Reviewed By: protonu
Differential Revision: D14433141
fbshipit-source-id: f99a8d82000c43e554461acf036462a4e8f7e300
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/82
This is a quick fix for matching FBGEMM FP16 performance with SKINNY GEMM FP16.
Basically, this diff switches the register layout of the C accumulation buffer inside the micro-kernel from MR * 1 to MR * 2. See T40816746 for the reasons.
Reviewed By: zhengwy888
Differential Revision: D14278430
fbshipit-source-id: 961dd681deee69e2b7fec6bcdba7920e0b09134a
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/84
Add an AVX512BW check:
AVX-512 Byte and Word instructions add support for 8-bit and 16-bit integer operations such as vpmaddubsw.
Similarly, add AVX512VL/DQ checks.
Reviewed By: jspark1105
Differential Revision: D14321050
fbshipit-source-id: bd34745fd488ce4efe3248aeb78c54e1c2d91d47
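A sketch of this kind of runtime check using the GCC/Clang builtin (FBGEMM's actual detection code differs):
```
// True only if the AVX-512 subsets the int8 kernels rely on are present.
bool has_required_avx512() {
  return __builtin_cpu_supports("avx512f") &&
         __builtin_cpu_supports("avx512bw") &&
         __builtin_cpu_supports("avx512vl") &&
         __builtin_cpu_supports("avx512dq");
}
```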
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/81
Add documentation on choosing the current blocking parameters.
Reviewed By: dskhudia
Differential Revision: D14256809
fbshipit-source-id: e9a355e4611d6cb22791f2585313edc0d1b30ad2
Summary: Add an additional option b_symmetric and skip the row offset computation if it's true.
Reviewed By: jianyuh
Differential Revision: D14119128
fbshipit-source-id: fa079347562b7f75727b3a1414e9bdda3f9c65dd
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/70
Skip the row offset computation if B_zero_point == 0.
Reviewed By: jianyuh
Differential Revision: D14020675
fbshipit-source-id: 88a6e225671762c67afefc15538b79f879d125a6
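The algebra behind this diff and the next one in the log (standard quantized-GEMM bookkeeping, not FBGEMM code): expanding sum_k (A[i][k] - A_zp) * (B[k][j] - B_zp) leaves four terms, and an offset term drops out exactly when the corresponding zero point is zero (or, for column offsets, when they are folded into the bias).
```
#include <cstdint>

std::int32_t corrected_acc(std::int32_t raw_dot,     // sum_k A[i][k] * B[k][j]
                           std::int32_t row_offset,  // sum_k A[i][k]
                           std::int32_t col_offset,  // sum_k B[k][j]
                           std::int32_t A_zp, std::int32_t B_zp, int K) {
  // The row_offset term vanishes when B_zp == 0; col_offset when A_zp == 0.
  return raw_dot - B_zp * row_offset - A_zp * col_offset + K * A_zp * B_zp;
}
```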
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/69
This diff prepares for D14013931, which folds column offsets into the bias.
In depthwise convolution, we allow passing column_offsets == nullptr, which means the column offsets are folded into the bias. We bypass adding column_offset * A_zero_point if either column_offset == nullptr or A_zero_point == 0.
Reviewed By: jianyuh
Differential Revision: D14017772
fbshipit-source-id: ad4a79402f43cbf78dbad68e1bff6d07c19dded0