Welcome to mirror list, hosted at ThFree Co, Russian Federation.

github.com/marian-nmt/FBGEMM.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-09-25All functions are running well on windowsYoung Jin Kim
2019-09-16Small refactoring of FBGEMM GenerateKernel classAleks Zi
Summary: Removed unnecessary member variables, using sstream instead of strings. Reviewed By: dskhudia Differential Revision: D17134969 fbshipit-source-id: 147d0b39cde9edf5fb70762558e90dced5ba0ab1
2019-09-05CodeCache implemented with correct initialization of Static variables (#123)Aleks Zi
Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/123 Same as D16968373 but fixed the static initialization dependencies problem (https://isocpp.org/wiki/faq/ctors#static-init-order). Reviewed By: dskhudia Differential Revision: D17194751 fbshipit-source-id: 274f111996ab4f1c4386bd3b9ee8f3790739fdcd
2019-09-05Revert D16968373: Introduced CodeCache container to share the microkernels ↵Edward Yang
among different threads. Differential Revision: D16968373 Original commit changeset: 22d66e50d9b3 fbshipit-source-id: 6163979bdb36cb0b1b95bfa1caeab67e7d23eee5
2019-09-04Introduced CodeCache container to share the microkernels among different ↵Aleks Zi
threads. Summary: CodeCache is thread safe and ensures single creation of each microkernel. Uses a single jitRuntiume written to under a lock. The CodeHolder was removed from the class members as it is only a tmporary class, and can be created/destroyed on demand - no need to keep the metadata of the last generated microkernel. Reviewed By: dskhudia Differential Revision: D16968373 fbshipit-source-id: 22d66e50d9b3173c542e28daa322e7869eb52b14
2019-08-09Integrate VNNI into FBGEMM master branch (#114)Jianyu Huang
Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/114 Adding the VNNI support in FBGEMM. Previously, we have the issue on CMake version. Currently PyTorch and FBGEMM OSS test has the CMake 3.5 test, while ASMJIT requires CMake to be 3.8+. This caused the build failure for some platforms. Now the CMake version issue is resolved by a PR to ASMJIT to downgrade the CMake requirement: https://github.com/asmjit/asmjit/pull/252. Reviewed By: dskhudia Differential Revision: D16720839 fbshipit-source-id: e5e5f2d26f924df8d9fb955f4a3758561fa73288
2019-08-06Back out "[fbgemm] Integrate VNNI into FBGEMM master branch"Jianyu Huang
Summary: Original commit changeset: fcaa13cc3159 ASMJIT requires the CMake version to be 3.8 However, FBGEMM and PyTorch only need the CMake version to be 3.5+. This caused the build failure in FBGEMM: https://circleci.com/gh/pytorch/FBGEMM/122#build-timing/containers/0 Reviewed By: dskhudia Differential Revision: D16670547 fbshipit-source-id: 506714c3db1cb82cf98895f58f82f235128f5285
2019-08-06Integrate VNNI into FBGEMM master branch (#113)Jianyu Huang
Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/113 Adding the VNNI support in FBGEMM. Reviewed By: dskhudia Differential Revision: D16276574 fbshipit-source-id: 832ccdb27339489ebc138f3b2678e53d107c1b79
2019-07-01Clean up some code for JIT code generator (#101)Jianyu Huang
Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/101 Some code cleanup: - Both ```leadingDimCReg``` and ```leadingDimCRegAssign``` are used in ```GenerateKernelU8S8S32ACC32.c```. We should unify them to only use one variable name. - Remove some redundant register variable ```asmjit::X86Ymm tmpReg = x86::ymm14;```. Reviewed By: dskhudia Differential Revision: D15673269 fbshipit-source-id: 81eb3673d0ff97391557413a13f1972561a1f2db
2019-04-02Exposing tuning parameters in FBGEMM (MCB, NCB, KCB, MR, NR, Row Interleave) ↵Protonu Basu
(#90) Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/90 Exposing tuning parameters in FBGEMM (MCB, NCB, KCB, MR, NR, Row Interleave) Reviewed By: dskhudia Differential Revision: D14358148 fbshipit-source-id: 783fb4653fd696dbbd4075ad56cb8682db3011a5
2019-03-19Dump generated kernels in filesDaya S Khudia
Summary: Dump generated kernels in files for debugging purposes. Reviewed By: jianyuh Differential Revision: D14449803 fbshipit-source-id: 58d2b5bc8402ef800a6eeaf573abd2a9ee4f95f4
2019-02-14JIT kernel should only handle a small portion of NCB for the last block: ↵Jianyu Huang
multiple of NR Summary: Before this Diff: we pass into the JIT kernel with nc = NCB ( packedB_.blockColSize() ) instead of nc = leftover size (packedB_.lastBcol() ) for the last block of B (diffusion/FBS/browse/master/fbcode/deeplearning/fbgemm/src/ExecuteKernelU8S8.cc;1adfe7977ef7ea2a1aee0ed785bd3fed5b7c4a20$102), which cause the additional computation when n is small. After this Diff: we pass into the JIT kernel with a small portion of NCB (still multiple of NR) for the last block of B. The main performance gain is for Acc16, because NCB = 4 * NR for Acc16 and NCB = NR for Acc32 in our current settings (AVX2 and AVX512). Reviewed By: jspark1105 Differential Revision: D14063628 fbshipit-source-id: 5829d06553daf617e2fefa7d26cb2d761af402c1
2018-12-21Update with clang format (#51)Jianyu Huang
Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/51 Use Clang formatting with "clang-format -i *.cc *.h". Reviewed By: dskhudia Differential Revision: D13532121 fbshipit-source-id: 6792d008f3295c128942f4896e8221aebbf2566e
2018-12-06Fix duplicate symbols for thread local member variables (#43)James Reed
Summary: Build would fail on my mac because these TLS variables were being instantiated in both the regular and AVX512 compilation units of GenerateKernel*.cc. This moves them into the regular version, and just lets the AVX512 version link to those Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/43 Reviewed By: jianyuh Differential Revision: D13362476 Pulled By: dskhudia fbshipit-source-id: e15ae957a38533df0565a1262267616dbd4ad88f
2018-11-08Sync with internal copy: Asymmetric padding; fbgemm2 -> fbgemmJianyu Huang
2018-11-04Syncing with internal version. Fixes for Mac/clang build. Other minor fixesdskhudia
2018-10-31Initial commitDaya S Khudia