github.com/marian-nmt/intgemm.git

Age	Commit message	Author
2020-04-30	Remove need to pass a template parameter that the compiler can deduce [static]	Mateusz Chudyk

2020-04-29	Fix assert in tile_test.inl	Mateusz Chudyk

2020-04-28	Fix void * error on g++ 8.4.0. Weird error.	Kenneth Heafield

2020-04-28	Smaller tile for compiling checked in code	Kenneth Heafield

2020-04-25	Memoising benchmark program to decide tile size, but it likes 1x16	Kenneth Heafield

2020-04-24	Rudimentary tile benchmark. Keep in mind Multiply still needs optimization.	Kenneth Heafield

2020-04-24	Silence compiler warnings on 1<< overflow	Kenneth Heafield

2020-04-24	Extract randomly generated matrix class	Kenneth Heafield

2020-04-24	Comment	Kenneth Heafield

2020-04-24	Oops use memcmp in test for whole array	Kenneth Heafield

2020-04-24	Basic general sized multiply, not optimized yet	Kenneth Heafield

2020-04-24	Add empty check for Tile	Kenneth Heafield

2020-04-23	Comment ends of ifdefs	Kenneth Heafield

2020-04-23	General write working on AVX512, at least for tested cases	Kenneth Heafield

2020-04-23	Insane implementation of most cases for writing C. Still missing offset scatter.	Kenneth Heafield

2020-04-23	Tests for unrolled inner dimension are tricky	Kenneth Heafield

2020-04-22	Lots of tests, including inner failing	Kenneth Heafield

2020-04-22	Fix TestMultiplyNoOverhangShapes to call kernel	Kenneth Heafield

2020-04-22	Merge remote-tracking branch 'origin/master' into static	Kenneth Heafield

2020-04-20	Merge pull request #73 from kpu/absolute_std	Kenneth Heafield
	Add option for absolute value STD
2020-04-20	Rename and fix interface [absolute_std]	Nikolay Bogoychev

2020-04-20	Rename and move the if outside the hot loop	Nikolay Bogoychev

2020-04-20	Merge branch 'master' into absolute_std	Nikolay Bogoychev

2020-04-20	Fix OMP parallel wrap typing for Shift	Kenneth Heafield

2020-04-20	Workaround gcc bug producing extra move instructions	Kenneth Heafield
	https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94663
	Improvement ranges from 3% (1x64x8) to 35% (8x2048x256) and is often 21-25%.
	Benchmark program output for 8-bit AVX512VNNI, three figures per run as printed:
	Multiply         Samples  BEFORE                        AFTER
	1 64 8           75       64 65.4933 0.875698           62 64.8533 1.36256
	8 256 256        75       13296 13385.3 36.0012         10754 10873.9 31.3479
	8 2048 256       75       86800 86974.3 59.9597         64222 65428.6 222.893
	8 256 2048       75       106780 107392 232.955         86176 88366.1 402.335
	320 256 256      75       531720 533687 1419.3          436536 437186 352.487
	472 256 256      75       785026 787784 2068.05         646240 647382 416.252
	248 256 256      75       412282 413484 971.843         338368 338656 141.354
	200 256 256      75       332578 333463 742.297         272890 273103 77.2789
	256 256 256      75       425654 427240 1095.53         349418 349580 80.8586
	512 512 512      75       3122382 3.13179e+06 4215.88   2493984 2.51602e+06 6052.1
	1024 1024 1024   3        24927622 2.49795e+07 44940.9  19210646 1.9229e+07 17037
	4096 4096 128    3        49870840 4.99655e+07 133057   46146812 4.62847e+07 205448
2020-04-19	Don't catch clang with the gcc hack, move VNNI to a function	Kenneth Heafield

2020-04-19	Fix comment	Kenneth Heafield

2020-04-19	Work around gcc _mm512_dpbusds_epi32 spurious vmovdqa64 instructions	Kenneth Heafield
	Use asm ("vpdpbusds %2, %1, %0" : "+x"(c) : "x"(a), "mx"(b)); instead of c = _mm512_dpbusds_epi32(c, a, b);
2020-04-19	template argument for shuffle immediate	Kenneth Heafield
	makes clang happy
2020-04-19	Remove StaticLoop	Kenneth Heafield

2020-04-19	Change tile_test to variadic index_sequence	Kenneth Heafield

2020-04-19	Sum16To32 using variadic templates	Kenneth Heafield

2020-04-19	Replace StaticLoop with variadic template	Kenneth Heafield

2020-04-19	Document unordered_unfurl	Kenneth Heafield

2020-04-19	Header for std::size_t	Kenneth Heafield

2020-04-19	Change Index to size_t	Kenneth Heafield

2020-04-19	Switch reduce to taking RegisterPair	Kenneth Heafield

2020-04-19	Change to integer sequence for unrolling kernels	Kenneth Heafield

2020-04-18	Even more test configurations	Kenneth Heafield

2020-04-18	Test statically unrolled multiplies too	Kenneth Heafield

2020-04-18	Tiled multiply with basic testing work	Kenneth Heafield

2020-04-18	Merge remote-tracking branch 'origin/master' into static	Kenneth Heafield

2020-04-13	Just use posix_memalign everywhere	Kenneth Heafield

2020-04-06	Merge pull request #77 from kpuatamazon/master	Kenneth Heafield
	OMP parallelization for Multiply
2020-04-05	Comments	Kenneth Heafield

2020-04-04	Test SSE2	Kenneth Heafield

2020-04-04	Rename Pack to Reduce	Kenneth Heafield

2020-04-04	More thoroughly test reduction code	Kenneth Heafield

2020-04-04	Does AVX512 reduce work?	Kenneth Heafield

2020-04-04	Reduce working for SSE2 and AVX2, working on AVX512	Kenneth Heafield
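
The 2020-04-19 commits "Replace StaticLoop with variadic template" and "Change to integer sequence for unrolling kernels" move compile-time loop unrolling onto C++14 integer sequences. The following is a minimal sketch of that general technique, not intgemm's code; `Unroll`, `UnrollImpl` and `SumUnrolled` are hypothetical names. A pack expansion over `std::index_sequence` calls a function once per compile-time index:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical helper: calls f(std::integral_constant<std::size_t, I>{}) for
// each I in 0..N-1. The pack expansion is resolved at compile time, so there
// is no runtime loop counter.
template <class F, std::size_t... I>
void UnrollImpl(F &&f, std::index_sequence<I...>) {
  // Elements of a braced initializer are evaluated left to right, so the
  // calls happen in index order. The leading 0 keeps the array non-empty
  // when N == 0.
  int dummy[] = {0, (f(std::integral_constant<std::size_t, I>{}), 0)...};
  (void)dummy;
}

template <std::size_t N, class F>
void Unroll(F &&f) {
  UnrollImpl(std::forward<F>(f), std::make_index_sequence<N>{});
}

// Hypothetical usage: sum a fixed-size array with a fully unrolled body.
template <std::size_t N>
int SumUnrolled(const std::array<int, N> &a) {
  int total = 0;
  Unroll<N>([&](auto i) {
    // decltype(i)::value is a constant expression, usable as a template argument.
    total += std::get<decltype(i)::value>(a);
  });
  return total;
}
```

Because each call receives a `std::integral_constant`, the index stays usable as a template argument (for example in `std::get`), which an ordinary runtime loop counter cannot provide.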
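
The 2020-04-13 commit switches allocation to `posix_memalign` everywhere. Below is a small sketch of an aligned allocator built on it. It assumes a POSIX system and uses a 64-byte default alignment, chosen here to match AVX512 register width. `AlignedAlloc` and `AlignedFree` are hypothetical helpers, not intgemm's interface:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <stdlib.h>  // posix_memalign is declared here on POSIX systems

// Hypothetical helper: allocate `bytes` bytes aligned to `alignment`.
// posix_memalign requires the alignment to be a power of two and a multiple
// of sizeof(void*). C11 aligned_alloc additionally requires `bytes` to be a
// multiple of `alignment`; posix_memalign does not.
void *AlignedAlloc(std::size_t bytes, std::size_t alignment = 64) {
  assert(alignment >= sizeof(void *) && (alignment & (alignment - 1)) == 0);
  void *ptr = nullptr;
  // posix_memalign returns an error code instead of setting errno.
  if (posix_memalign(&ptr, alignment, bytes) != 0) throw std::bad_alloc();
  return ptr;
}

// Memory obtained from posix_memalign is released with plain free().
void AlignedFree(void *ptr) { std::free(ptr); }
```

A single allocation path like this means every buffer is freed the same way, with `free`, regardless of where it was created.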
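
Pull request #73 adds an option for a standard deviation of absolute values, and a follow-up commit moves the `if` outside the hot loop. One common way to do that is to turn the runtime flag into a template parameter and dispatch once. The sketch below assumes the statistic is the population mean and standard deviation of either the raw or the absolute values. `MeanStd`, `MeanStdImpl` and `VectorMeanStd` are hypothetical names, and this is not the code from the pull request:

```cpp
#include <cassert>
#include <cmath>

struct MeanStd {
  float mean;
  float stddev;
};

// Hypothetical sketch: kAbsolute is a compile-time constant, so each
// instantiation's loop contains no data-dependent branch on the flag.
template <bool kAbsolute>
MeanStd MeanStdImpl(const float *begin, const float *end) {
  assert(end > begin);
  double sum = 0.0, sum_sq = 0.0;
  for (const float *it = begin; it != end; ++it) {
    double x = kAbsolute ? std::fabs(*it) : *it;
    sum += x;
    sum_sq += x * x;
  }
  const double n = static_cast<double>(end - begin);
  const double mean = sum / n;
  // Population variance E[x^2] - E[x]^2, clamped at 0 to absorb rounding.
  double var = sum_sq / n - mean * mean;
  if (var < 0.0) var = 0.0;
  return MeanStd{static_cast<float>(mean), static_cast<float>(std::sqrt(var))};
}

// The runtime flag is tested exactly once, outside the hot loop.
MeanStd VectorMeanStd(const float *begin, const float *end, bool absolute) {
  return absolute ? MeanStdImpl<true>(begin, end) : MeanStdImpl<false>(begin, end);
}
```

For the input {-1, 1, -1, 1}, the plain statistics are mean 0 and standard deviation 1. The absolute-value statistics are mean 1 and standard deviation 0.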