Age | Commit message | Author |
|
|
|
Summary: Removed unnecessary member variables, using sstream instead of strings.
Reviewed By: dskhudia
Differential Revision: D17134969
fbshipit-source-id: 147d0b39cde9edf5fb70762558e90dced5ba0ab1
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/123
Same as D16968373 but fixed the static initialization dependencies problem (https://isocpp.org/wiki/faq/ctors#static-init-order).
Reviewed By: dskhudia
Differential Revision: D17194751
fbshipit-source-id: 274f111996ab4f1c4386bd3b9ee8f3790739fdcd
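
The linked FAQ entry describes the construct-on-first-use idiom as one common remedy for static initialization order problems. The sketch below illustrates that idiom only; the commit message does not say which fix was applied, and `Registry`, `registry()` and `registerEntry()` are hypothetical names, not FBGEMM code.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical shared state that other static objects may touch during
// their own initialization.
struct Registry {
  std::mutex mutex;
  std::map<std::string, int> entries;
};

// Construct-on-first-use: the function-local static is initialized the first
// time control reaches it (thread-safe since C++11), so it is guaranteed to
// exist before any caller uses it, regardless of translation-unit order.
Registry& registry() {
  static Registry instance;
  return instance;
}

// Returns the id for `name`, assigning the next free id on first sight.
int registerEntry(const std::string& name) {
  assert(!name.empty());
  Registry& r = registry();
  std::lock_guard<std::mutex> guard(r.mutex);
  auto result = r.entries.emplace(name, static_cast<int>(r.entries.size()));
  return result.first->second;
}
```

A namespace-scope `static Registry instance;` would instead be initialized in an unspecified order relative to statics in other translation units, which is the failure mode the FAQ describes.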
|
|
among different threads.
Differential Revision:
D16968373
Original commit changeset: 22d66e50d9b3
fbshipit-source-id: 6163979bdb36cb0b1b95bfa1caeab67e7d23eee5
|
|
threads.
Summary: CodeCache is thread safe and ensures that each microkernel is created only once. It uses a single JitRuntime that is written to under a lock. The CodeHolder was removed from the class members because it is only a temporary object and can be created and destroyed on demand; there is no need to keep the metadata of the last generated microkernel.
Reviewed By: dskhudia
Differential Revision: D16968373
fbshipit-source-id: 22d66e50d9b3173c542e28daa322e7869eb52b14
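
The caching behaviour described in the summary above can be sketched roughly as follows. This is a minimal illustration, not the FBGEMM implementation: `CodeCacheSketch`, `KernelKey`, `KernelFn` and the `generate` callback are hypothetical, and a plain function pointer stands in for JIT-generated code.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <mutex>
#include <tuple>

// Hypothetical key identifying one microkernel shape.
using KernelKey = std::tuple<int, int, bool>;
// Stand-in for a pointer to generated code.
using KernelFn = int (*)(int);

class CodeCacheSketch {
 public:
  // Returns the cached kernel for `key`. `generate` runs at most once per
  // key because lookup, generation and insertion share one lock.
  KernelFn getOrCreate(
      const KernelKey& key,
      const std::function<KernelFn()>& generate) {
    std::lock_guard<std::mutex> guard(mutex_);
    auto it = kernels_.find(key);
    if (it != kernels_.end()) {
      return it->second;
    }
    KernelFn fn = generate();
    assert(fn != nullptr);
    kernels_.emplace(key, fn);
    return fn;
  }

 private:
  std::mutex mutex_;
  std::map<KernelKey, KernelFn> kernels_;
};
```

Holding the lock while generating keeps the sketch simple at the cost of serializing generation of different kernels; whether the real CodeCache makes the same trade-off is not stated in the commit message.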
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/114
Adding the VNNI support in FBGEMM.
Previously, we had an issue with the CMake version: the PyTorch and FBGEMM OSS tests use CMake 3.5, while ASMJIT required CMake 3.8+. This caused build failures on some platforms. The issue is now resolved by a PR to ASMJIT that lowers its CMake requirement: https://github.com/asmjit/asmjit/pull/252.
Reviewed By: dskhudia
Differential Revision: D16720839
fbshipit-source-id: e5e5f2d26f924df8d9fb955f4a3758561fa73288
|
|
Summary:
Original commit changeset: fcaa13cc3159
ASMJIT requires CMake 3.8+.
However, FBGEMM and PyTorch only require CMake 3.5+.
This caused a build failure in FBGEMM:
https://circleci.com/gh/pytorch/FBGEMM/122#build-timing/containers/0
Reviewed By: dskhudia
Differential Revision: D16670547
fbshipit-source-id: 506714c3db1cb82cf98895f58f82f235128f5285
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/113
Adding the VNNI support in FBGEMM.
Reviewed By: dskhudia
Differential Revision: D16276574
fbshipit-source-id: 832ccdb27339489ebc138f3b2678e53d107c1b79
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/101
Some code cleanup:
- Both ```leadingDimCReg``` and ```leadingDimCRegAssign``` are used in ```GenerateKernelU8S8S32ACC32.c```. We should unify them to only use one variable name.
- Remove the redundant register variable ```asmjit::X86Ymm tmpReg = x86::ymm14;```.
Reviewed By: dskhudia
Differential Revision: D15673269
fbshipit-source-id: 81eb3673d0ff97391557413a13f1972561a1f2db
|
|
(#90)
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/90
Exposing tuning parameters in FBGEMM (MCB, NCB, KCB, MR, NR, Row Interleave)
Reviewed By: dskhudia
Differential Revision: D14358148
fbshipit-source-id: 783fb4653fd696dbbd4075ad56cb8682db3011a5
|
|
Summary: Dump generated kernels in files for debugging purposes.
Reviewed By: jianyuh
Differential Revision: D14449803
fbshipit-source-id: 58d2b5bc8402ef800a6eeaf573abd2a9ee4f95f4
|
|
multiple of NR
Summary:
Before this Diff:
For the last block of B, we pass nc = NCB (packedB_.blockColSize()) to the JIT kernel instead of nc = leftover size (packedB_.lastBcol()) (diffusion/FBS/browse/master/fbcode/deeplearning/fbgemm/src/ExecuteKernelU8S8.cc;1adfe7977ef7ea2a1aee0ed785bd3fed5b7c4a20$102), which causes additional computation when n is small.
After this Diff:
For the last block of B, we pass a smaller portion of NCB (still a multiple of NR) to the JIT kernel.
The main performance gain is for Acc16, because NCB = 4 * NR for Acc16 and NCB = NR for Acc32 in our current settings (AVX2 and AVX512).
Reviewed By: jspark1105
Differential Revision: D14063628
fbshipit-source-id: 5829d06553daf617e2fefa7d26cb2d761af402c1
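
One way to read "a smaller portion of NCB (still a multiple of NR)" is to round the leftover column count up to the next multiple of NR. The sketch below assumes that reading; `lastBlockNc` is a hypothetical helper, not an FBGEMM function.

```cpp
#include <cassert>

// Rounds the leftover column count of the last block of B up to the next
// multiple of nr. Because ncb is itself a multiple of nr and
// leftoverCols <= ncb, the result never exceeds ncb.
int lastBlockNc(int leftoverCols, int ncb, int nr) {
  assert(nr > 0 && ncb % nr == 0);
  assert(leftoverCols > 0 && leftoverCols <= ncb);
  return (leftoverCols + nr - 1) / nr * nr;
}
```

With illustrative values NR = 16 and NCB = 4 * NR = 64 (the Acc16 ratio mentioned above), 5 leftover columns give nc = 16 rather than 64. When NCB = NR, as for Acc32, the result always equals NCB, which matches the note that the gain is mainly for Acc16.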
|
|
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/51
Use Clang formatting with "clang-format -i *.cc *.h".
Reviewed By: dskhudia
Differential Revision: D13532121
fbshipit-source-id: 6792d008f3295c128942f4896e8221aebbf2566e
|
|
Summary:
The build would fail on my Mac because these TLS variables were being instantiated in both the regular and AVX512 compilation units of GenerateKernel*.cc. This moves them into the regular version and lets the AVX512 version link to those.
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/43
Reviewed By: jianyuh
Differential Revision: D13362476
Pulled By: dskhudia
fbshipit-source-id: e15ae957a38533df0565a1262267616dbd4ad88f