Age | Commit message | Author | |
---|---|---|---|
2020-04-20 | Fix XXXCustomTile functions [multiply-tiling-8x] | Mateusz Chudyk | |
2020-04-20 | Merge remote-tracking branch 'origin/master' into multiply-tiling-8x | Mateusz Chudyk | |
2020-04-20 | Fix OMP parallel wrap typing for Shift | Kenneth Heafield | |
2020-04-20 | Workaround gcc bug producing extra move instructions | Kenneth Heafield | |
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94663 Improvement ranges from 3% (1x64x8) to 35% (8x2048x256) and is often 21-25%. Benchmark program output, BEFORE then AFTER for each shape: Multiply 1 64 8 Samples=75: BEFORE 8-bit AVX512VNNI 64 65.4933 0.875698, AFTER 8-bit AVX512VNNI 62 64.8533 1.36256; Multiply 8 256 256 Samples=75: BEFORE 8-bit AVX512VNNI 13296 13385.3 36.0012, AFTER 8-bit AVX512VNNI 10754 10873.9 31.3479; Multiply 8 2048 256 Samples=75: BEFORE 8-bit AVX512VNNI 86800 86974.3 59.9597, AFTER 8-bit AVX512VNNI 64222 65428.6 222.893; Multiply 8 256 2048 Samples=75: BEFORE 8-bit AVX512VNNI 106780 107392 232.955, AFTER 8-bit AVX512VNNI 86176 88366.1 402.335; Multiply 320 256 256 Samples=75: BEFORE 8-bit AVX512VNNI 531720 533687 1419.3, AFTER 8-bit AVX512VNNI 436536 437186 352.487; Multiply 472 256 256 Samples=75: BEFORE 8-bit AVX512VNNI 785026 787784 2068.05, AFTER 8-bit AVX512VNNI 646240 647382 416.252; Multiply 248 256 256 Samples=75: BEFORE 8-bit AVX512VNNI 412282 413484 971.843, AFTER 8-bit AVX512VNNI 338368 338656 141.354; Multiply 200 256 256 Samples=75: BEFORE 8-bit AVX512VNNI 332578 333463 742.297, AFTER 8-bit AVX512VNNI 272890 273103 77.2789; Multiply 256 256 256 Samples=75: BEFORE 8-bit AVX512VNNI 425654 427240 1095.53, AFTER 8-bit AVX512VNNI 349418 349580 80.8586; Multiply 512 512 512 Samples=75: BEFORE 8-bit AVX512VNNI 3122382 3.13179e+06 4215.88, AFTER 8-bit AVX512VNNI 2493984 2.51602e+06 6052.1; Multiply 1024 1024 1024 Samples=3: BEFORE 8-bit AVX512VNNI 24927622 2.49795e+07 44940.9, AFTER 8-bit AVX512VNNI 19210646 1.9229e+07 17037; Multiply 4096 4096 128 Samples=3: BEFORE 8-bit AVX512VNNI 49870840 4.99655e+07 133057, AFTER 8-bit AVX512VNNI 46146812 4.62847e+07 205448 | |||
2020-04-17 | Merge remote-tracking branch 'origin/master' into multiply-tiling-8x | Mateusz Chudyk | |
2020-04-16 | Add PrepareBCustomTile | Mateusz Chudyk | |
2020-04-16 | Add Multiply[8Shift]CustomTile | Mateusz Chudyk | |
2020-04-16 | Use static loops in Multiply8Shift AVX512VNNI | Mateusz Chudyk | |
2020-04-13 | Just use posix_memalign everywhere | Kenneth Heafield | |
2020-04-06 | Merge pull request #77 from kpuatamazon/master | Kenneth Heafield | |
OMP parallelization for Multiply | |||
2020-04-02 | Merge pull request #76 from kpu/static-loop-empty-iterator | Mateusz Chudyk | |
Add support for empty iterator to static loop | |||
2020-04-02 | Add support for empty iterator to static loop | Mateusz Chudyk | |
2020-03-27 | Merge pull request #74 from kpu/unify-reference-mulitplies-funs | Kenneth Heafield | |
Unify reference multiplies funs | |||
2020-03-27 | Add round_up function | Mateusz Chudyk | |
2020-03-27 | Add missing static keyword in utils.h | Mateusz Chudyk | |
2020-03-27 | Unify references::MultiplyFF and references::Multiply | Mateusz Chudyk | |
2020-03-26 | Merge branch 'warning_fail' of https://github.com/kpu/intgemm into warning_fail | Nikolay Bogoychev | |
2020-03-26 | Make it inline | Nikolay Bogoychev | |
2020-03-26 | Add inline | Kenneth Heafield | |
2020-03-26 | Merge remote-tracking branch 'origin/master' into warning_fail | Nikolay Bogoychev | |
2020-03-26 | Merge branch 'master' of https://github.com/kpu/intgemm | Nikolay Bogoychev | |
2020-03-26 | do not use fake intrinsics | Nikolay Bogoychev | |
2020-03-26 | Move QuantizerStd outside of the 8bit | Nikolay Bogoychev | |
2020-03-25 | Merge pull request #54 from kpu/stdQuantizer | Kenneth Heafield | |
Add standard deviation quantizer | |||
2020-03-25 | Address comments | Nikolay Bogoychev | |
2020-03-25 | merge WiP | Nikolay Bogoychev | |
2020-03-23 | OMP parallelization for Multiply | Kenneth Heafield | |
2020-03-17 | Add tmmintrin.h header | Kenneth Heafield | |
2020-03-17 | Merge pull request #71 from kpuatamazon/master | Kenneth Heafield | |
Improve compiler support | |||
2020-03-17 | Merge branch 'master' of github.com:kpu/intgemm | Kenneth Heafield | |
2020-03-16 | Fix AVX512 16bit | Mateusz Chudyk | |
2020-03-08 | Change to INTGEMM_COMPILER_SUPPORTS_AVX512BW and update test | Kenneth Heafield | |
2020-03-06 | clang unused parameter | Kenneth Heafield | |
2020-03-06 | __clang__ instead of __CLANG__ | Kenneth Heafield | |
2020-03-06 | Change include for CPUID | Kenneth Heafield | |
2020-03-06 | Fix kName for unsupported cases | Kenneth Heafield | |
2020-03-06 | Merge branch 'master' of github.com:kpu/intgemm | Kenneth Heafield | |
2020-03-06 | Fix assert | Kenneth Heafield | |
2020-03-06 | Fix OpenMP compilation on gcc 7.4.0 | Kenneth Heafield | |
The #pragma omp parallel is implemented by creating another function for the thread to launch. gcc 7.4.0 fails to carry the target attributes to that new function, so intrinsics were not working. Copying register types causes an internal compiler error. These for loops need constants such as -127 initialized in registers, and simply moving the constants into the for loop generated code that initializes them on every iteration (no cross-loop constant extraction). The workaround is to split #pragma omp parallel so that it launches a function with target attributes, which initializes the constants and then uses #pragma omp for to just divvy up the work. | |||
2020-03-06 | fabsf to fix std::max | Kenneth Heafield | |
2020-03-06 | Merge branch 'master' of github.com:kpu/intgemm | Kenneth Heafield | |
2020-03-05 | Make multiply_sat_test compile on older compilers | Kenneth Heafield | |
2020-03-05 | Change to option for OpenMP | Kenneth Heafield | |
2020-03-05 | Change CPUID dispatch for gcc and clang | Kenneth Heafield | |
2020-03-05 | clang warning | Kenneth Heafield | |
2020-03-05 | Incorrect assert | Kenneth Heafield | |
2020-03-05 | Merge branch 'master' of github.com:kpu/intgemm | Kenneth Heafield | |
2020-03-04 | Tiny tolerance for failing tests on newer compilers | Kenneth Heafield | |
2020-03-04 | Get at least the PrepareA tests working | Kenneth Heafield | |
2020-03-03 | Cap unsigned quantized value no good reason? | Kenneth Heafield | |
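
The workaround described in the 2020-03-06 commit "Fix OpenMP compilation on gcc 7.4.0" can be sketched roughly as follows. This is a minimal illustration of the pattern, not the intgemm code: the names `AddOffsetWorker` and `AddOffset` are hypothetical, the loop body is plain scalar arithmetic rather than intrinsics, and the target attribute is only indicated in a comment so the sketch builds on any machine. The parallel region does nothing except call a worker function; the worker sets up its constant once and then uses an orphaned `#pragma omp for` to share the iterations among the threads.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-thread worker. In real SIMD code this is the function that
// would carry the target attribute, e.g. __attribute__((target("avx2"))),
// so that intrinsics are usable inside it.
static void AddOffsetWorker(const std::int8_t *in, std::int32_t *out, std::ptrdiff_t size) {
  // The constant is set up once per thread, outside the loop.
  const std::int32_t offset = -127;
  // Orphaned worksharing construct: it binds to the parallel region of the
  // caller and divides the iterations among that region's threads.
#pragma omp for
  for (std::ptrdiff_t i = 0; i < size; ++i) {
    out[i] = static_cast<std::int32_t>(in[i]) + offset;
  }
}

// Hypothetical entry point: the parallel region only launches the worker.
std::vector<std::int32_t> AddOffset(const std::vector<std::int8_t> &in) {
  std::vector<std::int32_t> out(in.size());
  const std::int8_t *in_ptr = in.data();
  std::int32_t *out_ptr = out.data();
  const std::ptrdiff_t size = static_cast<std::ptrdiff_t>(in.size());
#pragma omp parallel
  {
    AddOffsetWorker(in_ptr, out_ptr, size);
  }
  return out;
}
```

Built without OpenMP enabled, the pragmas are ignored and the loop simply runs serially, so the result is the same either way; with `-fopenmp` each thread calls the worker and processes its own share of the iterations.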