github.com/marian-nmt/intgemm.git
Age | Commit message | Author
2020-04-30 | Remove need of passing template parameter which can be deduced by a compiler [static] | Mateusz Chudyk
2020-04-29 | Fix assert in tile_test.inl | Mateusz Chudyk
2020-04-28 | Fix void * error on g++ 8.4.0. Weird error. | Kenneth Heafield
2020-04-28 | Smaller tile for compiling checked in code | Kenneth Heafield
2020-04-25 | Memoising benchmark program to decide tile size, but it likes 1x16 | Kenneth Heafield
2020-04-24 | Rudimentary tile benchmark. Keep in mind Multiply still needs optimization. | Kenneth Heafield
2020-04-24 | Silence compiler warnings on 1<< overflow | Kenneth Heafield
2020-04-24 | Extract randomly generated matrix class | Kenneth Heafield
2020-04-24 | Comment | Kenneth Heafield
2020-04-24 | Oops use memcmp in test for whole array | Kenneth Heafield
2020-04-24 | Basic general sized multiply, not optimized yet | Kenneth Heafield
2020-04-24 | Add empty check for Tile | Kenneth Heafield
2020-04-23 | Comment ends of ifdefs | Kenneth Heafield
2020-04-23 | General write working on AVX512, at least for tested cases | Kenneth Heafield
2020-04-23 | Insane implementation of most cases for writing C. Still missing offset scatter. | Kenneth Heafield
2020-04-23 | Tests for unrolled inner dimension are tricky | Kenneth Heafield
2020-04-22 | Lots of tests, including inner failing | Kenneth Heafield
2020-04-22 | Fix TestMultiplyNoOverhangShapes to call kernel | Kenneth Heafield
2020-04-22 | Merge remote-tracking branch 'origin/master' into static | Kenneth Heafield
2020-04-20 | Merge pull request #73 from kpu/absolute_std | Kenneth Heafield
    Add option for absolute value STD
2020-04-20 | Rename and fix interface [absolute_std] | Nikolay Bogoychev
2020-04-20 | Rename and move the if outside the hot loop | Nikolay Bogoychev
2020-04-20 | Merge branch 'master' into absolute_std | Nikolay Bogoychev
2020-04-20 | Fix OMP parallel wrap typing for Shift | Kenneth Heafield
2020-04-20 | Workaround gcc bug producing extra move instructions | Kenneth Heafield
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94663
    Improvement ranges from 3% (1x64x8) to 35% (8x2048x256) and is often 21-25%.
    Benchmark program output (all rows are 8-bit AVX512VNNI):

    Multiply         Samples   BEFORE                          AFTER
    1 64 8           75        64 65.4933 0.875698             62 64.8533 1.36256
    8 256 256        75        13296 13385.3 36.0012           10754 10873.9 31.3479
    8 2048 256       75        86800 86974.3 59.9597           64222 65428.6 222.893
    8 256 2048       75        106780 107392 232.955           86176 88366.1 402.335
    320 256 256      75        531720 533687 1419.3            436536 437186 352.487
    472 256 256      75        785026 787784 2068.05           646240 647382 416.252
    248 256 256      75        412282 413484 971.843           338368 338656 141.354
    200 256 256      75        332578 333463 742.297           272890 273103 77.2789
    256 256 256      75        425654 427240 1095.53           349418 349580 80.8586
    512 512 512      75        3122382 3.13179e+06 4215.88     2493984 2.51602e+06 6052.1
    1024 1024 1024   3         24927622 2.49795e+07 44940.9    19210646 1.9229e+07 17037
    4096 4096 128    3         49870840 4.99655e+07 133057     46146812 4.62847e+07 205448
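    The quoted percentages can be checked against the benchmark output. The sketch below assumes the improvement is computed as before/after - 1 on the first number reported for each configuration; the commit message does not say which column it used or what the three numbers mean, but this reading reproduces the 3% and 35% figures. The function and constant names are illustrative only.

```cpp
// Relative improvement, in percent, when a measurement drops from `before`
// to `after`. Assumption: the commit message's figures are before/after - 1.
constexpr double ImprovementPercent(double before, double after) {
  return (before / after - 1.0) * 100.0;
}

// First reported number of the BEFORE and AFTER results for three shapes,
// copied from the benchmark output in the commit message.
constexpr double kSmallCase  = ImprovementPercent(64, 62);        // Multiply 1 64 8
constexpr double kMiddleCase = ImprovementPercent(13296, 10754);  // Multiply 8 256 256
constexpr double kLargeCase  = ImprovementPercent(86800, 64222);  // Multiply 8 2048 256
```

    Under this assumption the small case lands just above 3%, the large case just above 35%, and the middle case inside the 21-25% band the message describes as typical.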
2020-04-19 | Don't catch clang with the gcc hack, move VNNI to a function | Kenneth Heafield
2020-04-19 | Fix comment | Kenneth Heafield
2020-04-19 | Work around gcc _mm512_dpbusds_epi32 spurious vmovdqa64 instructions | Kenneth Heafield
    Use asm ("vpdpbusds %2, %1, %0" : "+x"(c) : "x"(a), "mx"(b)); instead of c = _mm512_dpbusds_epi32(c, a, b);
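    For readers unfamiliar with the instruction, both forms in this commit compute the same thing: in each 32-bit lane, four unsigned bytes of a are multiplied by the corresponding signed bytes of b, the products are summed, and the sum is added to c with signed saturation. The scalar model below is a sketch of that per-lane arithmetic, not code from the repository; DpbusdsLane is a hypothetical name, and packing four bytes into a 32-bit integer is an illustration choice.

```cpp
#include <cstdint>
#include <limits>

// Scalar model of one 32-bit lane of vpdpbusds / _mm512_dpbusds_epi32.
// `a` holds four unsigned bytes, `b` holds four signed bytes, both packed
// little-endian into 32 bits. Hypothetical helper for illustration.
constexpr std::int32_t DpbusdsLane(std::int32_t c, std::uint32_t a, std::uint32_t b) {
  std::int64_t sum = c;  // widen so the accumulate cannot overflow
  for (int i = 0; i < 4; ++i) {
    const std::int64_t ua = (a >> (8 * i)) & 0xFF;   // unsigned byte, 0..255
    const std::int64_t raw = (b >> (8 * i)) & 0xFF;  // raw byte of b
    const std::int64_t sb = (raw ^ 0x80) - 0x80;     // sign-extend to -128..127
    sum += ua * sb;
  }
  // Signed saturation of the final accumulate (the trailing "s" in the mnemonic).
  if (sum > std::numeric_limits<std::int32_t>::max())
    return std::numeric_limits<std::int32_t>::max();
  if (sum < std::numeric_limits<std::int32_t>::min())
    return std::numeric_limits<std::int32_t>::min();
  return static_cast<std::int32_t>(sum);
}
```

    The "+x"(c) constraint in the asm statement marks c as both input and output, which matches the accumulate-in-place behaviour modelled here.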
2020-04-19 | template argument for shuffle immediate | Kenneth Heafield
    makes clang happy
2020-04-19 | Remove StaticLoop | Kenneth Heafield
2020-04-19 | Change tile_test to variadic index_sequence | Kenneth Heafield
2020-04-19 | Sum16To32 using variadic templates | Kenneth Heafield
2020-04-19 | Replace StaticLoop with variadic template | Kenneth Heafield
2020-04-19 | Document unordered_unfurl | Kenneth Heafield
2020-04-19 | Header for std::size_t | Kenneth Heafield
2020-04-19 | Change Index to size_t | Kenneth Heafield
2020-04-19 | Switch reduce to taking RegisterPair | Kenneth Heafield
2020-04-19 | Change to integer sequence for unrolling kernels | Kenneth Heafield
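    Several of the 2020-04-19 commits move from a loop helper to integer sequences for unrolling. As a general illustration of that technique (not the repository's code), the sketch below uses std::index_sequence from the standard library to expand a dot product into one multiply-add per index at compile time. DotImpl and DotUnrolled are hypothetical names, and the log does not say whether the project uses the standard sequence types or its own.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Expands to one multiply-add per index in Is..., with no runtime loop.
// Hypothetical helper for illustration.
template <std::size_t N, std::size_t... Is>
constexpr int DotImpl(const std::array<int, N> &a, const std::array<int, N> &b,
                      std::index_sequence<Is...>) {
  int sum = 0;
  // Pack expansion inside a braced initializer evaluates left to right.
  // The leading 0 keeps the array non-empty when N == 0.
  int expand[] = {0, ((sum += a[Is] * b[Is]), 0)...};
  (void)expand;  // only the side effects on sum matter
  return sum;
}

// Entry point: generates the sequence 0, 1, ..., N-1 and unrolls over it.
template <std::size_t N>
constexpr int DotUnrolled(const std::array<int, N> &a, const std::array<int, N> &b) {
  return DotImpl(a, b, std::make_index_sequence<N>{});
}
```

    A call such as DotUnrolled(x, y) on two std::array<int, 4> values then compiles to four multiply-adds; the same pattern applies when each index selects a register or tile position instead of an array element.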
2020-04-18 | Even more test configurations | Kenneth Heafield
2020-04-18 | Test statically unrolled multiplies too | Kenneth Heafield
2020-04-18 | Tiled multiply with basic testing work | Kenneth Heafield
2020-04-18 | Merge remote-tracking branch 'origin/master' into static | Kenneth Heafield
2020-04-13 | Just use posix_memalign everywhere | Kenneth Heafield
2020-04-06 | Merge pull request #77 from kpuatamazon/master | Kenneth Heafield
    OMP parallelization for Multiply
2020-04-05 | Comments | Kenneth Heafield
2020-04-04 | Test SSE2 | Kenneth Heafield
2020-04-04 | Rename Pack to Reduce | Kenneth Heafield
2020-04-04 | More thoroughly test reduction code | Kenneth Heafield
2020-04-04 | Does AVX512 reduce work? | Kenneth Heafield
2020-04-04 | Reduce working for SSE2 and AVX2, working on AVX512 | Kenneth Heafield