
github.com/marian-nmt/intgemm.git
Age  Commit message  Author
2020-04-20  Rename and fix interface [absolute_std]  Nikolay Bogoychev
2020-04-20  Rename and move the if outside the hot loop  Nikolay Bogoychev
2020-04-20  Merge branch 'master' into absolute_std  Nikolay Bogoychev
2020-04-20  Fix OMP parallel wrap typing for Shift  Kenneth Heafield
2020-04-20  Workaround gcc bug producing extra move instructions  Kenneth Heafield
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94663
    Improvement ranges from 3% (1x64x8) to 35% (8x2048x256) and is often 21-25%.
    Benchmark program output:

    Multiply 1 64 8 Samples=75
      BEFORE  8-bit AVX512VNNI  64  65.4933  0.875698
      AFTER   8-bit AVX512VNNI  62  64.8533  1.36256
    Multiply 8 256 256 Samples=75
      BEFORE  8-bit AVX512VNNI  13296  13385.3  36.0012
      AFTER   8-bit AVX512VNNI  10754  10873.9  31.3479
    Multiply 8 2048 256 Samples=75
      BEFORE  8-bit AVX512VNNI  86800  86974.3  59.9597
      AFTER   8-bit AVX512VNNI  64222  65428.6  222.893
    Multiply 8 256 2048 Samples=75
      BEFORE  8-bit AVX512VNNI  106780  107392  232.955
      AFTER   8-bit AVX512VNNI  86176  88366.1  402.335
    Multiply 320 256 256 Samples=75
      BEFORE  8-bit AVX512VNNI  531720  533687  1419.3
      AFTER   8-bit AVX512VNNI  436536  437186  352.487
    Multiply 472 256 256 Samples=75
      BEFORE  8-bit AVX512VNNI  785026  787784  2068.05
      AFTER   8-bit AVX512VNNI  646240  647382  416.252
    Multiply 248 256 256 Samples=75
      BEFORE  8-bit AVX512VNNI  412282  413484  971.843
      AFTER   8-bit AVX512VNNI  338368  338656  141.354
    Multiply 200 256 256 Samples=75
      BEFORE  8-bit AVX512VNNI  332578  333463  742.297
      AFTER   8-bit AVX512VNNI  272890  273103  77.2789
    Multiply 256 256 256 Samples=75
      BEFORE  8-bit AVX512VNNI  425654  427240  1095.53
      AFTER   8-bit AVX512VNNI  349418  349580  80.8586
    Multiply 512 512 512 Samples=75
      BEFORE  8-bit AVX512VNNI  3122382  3.13179e+06  4215.88
      AFTER   8-bit AVX512VNNI  2493984  2.51602e+06  6052.1
    Multiply 1024 1024 1024 Samples=3
      BEFORE  8-bit AVX512VNNI  24927622  2.49795e+07  44940.9
      AFTER   8-bit AVX512VNNI  19210646  1.9229e+07  17037
    Multiply 4096 4096 128 Samples=3
      BEFORE  8-bit AVX512VNNI  49870840  4.99655e+07  133057
      AFTER   8-bit AVX512VNNI  46146812  4.62847e+07  205448
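The quoted percentages can be reproduced from the first number reported for each BEFORE/AFTER pair. The sketch below assumes the improvement is measured as before/after - 1; the commit message does not state this, but that definition yields the quoted 3% for 1x64x8 and 35% for 8x2048x256. The helper names are hypothetical and are not part of intgemm.

```cpp
#include <cmath>

// Hypothetical helper: relative improvement of `after` over `before`,
// assumed here to be before/after - 1 (not stated in the commit message).
inline double Improvement(double before, double after) {
  return before / after - 1.0;
}

// The same value, rounded to the nearest whole percent.
inline long ImprovementPercent(double before, double after) {
  return std::lround(100.0 * Improvement(before, after));
}
```

For example, `ImprovementPercent(64, 62)` covers the 1x64x8 case and `ImprovementPercent(86800, 64222)` the 8x2048x256 case.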
2020-04-13  Just use posix_memalign everywhere  Kenneth Heafield
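For readers unfamiliar with the call, here is a minimal sketch of aligned allocation with POSIX `posix_memalign`. The wrapper names `AllocAligned` and `IsAligned` are hypothetical and are not taken from intgemm; the 64-byte alignment in the usage note is likewise only an illustrative choice.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign and free (POSIX)

// Hypothetical wrapper: returns memory aligned to `alignment` bytes, or
// nullptr on failure. POSIX requires `alignment` to be a power of two and a
// multiple of sizeof(void*). Release the result with free().
inline void *AllocAligned(std::size_t alignment, std::size_t bytes) {
  void *ptr = nullptr;
  if (posix_memalign(&ptr, alignment, bytes) != 0) return nullptr;
  return ptr;
}

// Hypothetical check that a pointer sits on an `alignment`-byte boundary.
inline bool IsAligned(const void *ptr, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}
```

A caller might request `AllocAligned(64, 256)` to get a buffer suitable for 64-byte aligned vector loads, then pass the pointer to `free` when done.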
2020-04-06  Merge pull request #77 from kpuatamazon/master  Kenneth Heafield
    OMP parallelization for Multiply
2020-04-02  Merge pull request #76 from kpu/static-loop-empty-iterator  Mateusz Chudyk
    Add support for empty iterator to static loop
2020-04-02  Add support for empty iterator to static loop  Mateusz Chudyk
2020-03-27  Merge pull request #74 from kpu/unify-reference-mulitplies-funs  Kenneth Heafield
    Unify reference multiplies funs
2020-03-27  Add round_up function  Mateusz Chudyk
2020-03-27  Add missing static keyword in utils.h  Mateusz Chudyk
2020-03-27  Unify references::MultiplyFF and references::Multiply  Mateusz Chudyk
2020-03-26  Add option for absolute value STD  Nikolay Bogoychev
2020-03-26  Merge branch 'warning_fail' of https://github.com/kpu/intgemm into warning_fail  Nikolay Bogoychev
2020-03-26  Make it inline  Nikolay Bogoychev
2020-03-26  Add inline  Kenneth Heafield
2020-03-26  Merge remote-tracking branch 'origin/master' into warning_fail  Nikolay Bogoychev
2020-03-26  Merge branch 'master' of https://github.com/kpu/intgemm  Nikolay Bogoychev
2020-03-26  do not use fake intrinsics  Nikolay Bogoychev
2020-03-26  Move QuantizerStd outside of the 8bit  Nikolay Bogoychev
2020-03-25  Merge pull request #54 from kpu/stdQuantizer  Kenneth Heafield
    Add standard deviation quantizer
2020-03-25  Address comments  Nikolay Bogoychev
2020-03-25  merge WiP  Nikolay Bogoychev
2020-03-23  OMP parallelization for Multiply  Kenneth Heafield
2020-03-17  Add tmmintrin.h header  Kenneth Heafield
2020-03-17  Merge pull request #71 from kpuatamazon/master  Kenneth Heafield
    Improve compiler support
2020-03-17  Merge branch 'master' of github.com:kpu/intgemm  Kenneth Heafield
2020-03-08  Change to INTGEMM_COMPILER_SUPPORTS_AVX512BW and update test  Kenneth Heafield
2020-03-06  clang unused parameter  Kenneth Heafield
2020-03-06  __clang__ instead of __CLANG__  Kenneth Heafield
2020-03-06  Change include for CPUID  Kenneth Heafield
2020-03-06  Fix kName for unsupported cases  Kenneth Heafield
2020-03-06  Merge branch 'master' of github.com:kpu/intgemm  Kenneth Heafield
2020-03-06  Fix assert  Kenneth Heafield
2020-03-06  Fix OpenMP compilation on gcc 7.4.0  Kenneth Heafield
    The #pragma omp parallel is implemented by creating another function for the thread to launch. gcc 7.4.0 fails to carry the target attributes to that new function, so intrinsics were not working. Copying register types causes an internal compiler error. These for loops need constants initialized in registers, like -127, and just moving the constants into the for loop generated code that initializes them on every iteration (no cross-loop constant extraction). The workaround is to split #pragma omp parallel so that it launches a function with target attributes, which initializes the constants and then uses #pragma omp for to just divvy up the work.
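The split described above can be sketched roughly as follows. This is an illustration of the pattern only, not the intgemm code: `ClampAll`, `ClampBody`, and the `SKETCH_TARGET` macro are hypothetical names, the clamp loop stands in for the real SIMD kernels, and the macro is left empty so the sketch runs on any CPU (a real kernel would put a compiler target attribute there). Without OpenMP enabled the pragmas are ignored and the code runs serially.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical placeholder for a target attribute. A real kernel would use
// something like __attribute__((target("avx512bw"))) on gcc or clang.
#define SKETCH_TARGET

// Runs inside an already-open parallel region. Because this is an ordinary
// function, it can carry its own attributes, and the constants are set up
// once per thread before the work-sharing loop starts.
SKETCH_TARGET static void ClampBody(const std::int32_t *in, std::int8_t *out,
                                    std::size_t size) {
  const std::int32_t lo = -127;  // stands in for a constant held in a register
  const std::int32_t hi = 127;
  // Orphaned work-sharing construct: splits iterations among the threads of
  // the enclosing parallel region.
  #pragma omp for
  for (std::size_t i = 0; i < size; ++i) {
    std::int32_t v = in[i];
    if (v < lo) v = lo;
    if (v > hi) v = hi;
    out[i] = static_cast<std::int8_t>(v);
  }
}

// Opens the parallel region and does nothing else, so no target-specific
// code lives in the function the compiler outlines for the threads.
void ClampAll(const std::int32_t *in, std::int8_t *out, std::size_t size) {
  #pragma omp parallel
  {
    ClampBody(in, out, size);
  }
}
```

Calling `ClampAll(in, out, n)` saturates each 32-bit input to the range [-127, 127] and writes it as an 8-bit value, whether or not the program is built with OpenMP.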
2020-03-06  fabsf to fix std::max  Kenneth Heafield
2020-03-06  Merge branch 'master' of github.com:kpu/intgemm  Kenneth Heafield
2020-03-05  Make multiply_sat_test compile on older compilers  Kenneth Heafield
2020-03-05  Change to option for OpenMP  Kenneth Heafield
2020-03-05  Change CPUID dispatch for gcc and clang  Kenneth Heafield
2020-03-05  clang warning  Kenneth Heafield
2020-03-05  Incorrect assert  Kenneth Heafield
2020-03-05  Merge branch 'master' of github.com:kpu/intgemm  Kenneth Heafield
2020-03-04  Tiny tolerance for failing tests on newer compilers  Kenneth Heafield
2020-03-04  Get at least the PrepareA tests working  Kenneth Heafield
2020-03-03  Cap unsigned quantized value no good reason?  Kenneth Heafield
2020-03-03  Use _mm512_mask_cvtusepi32_storeu_epi8  Kenneth Heafield
2020-03-03  Fix output address for quantizer  Kenneth Heafield
2020-03-03  Fix compiler warning for compilers without VNNI  Kenneth Heafield