github.com/marian-nmt/intgemm.git

Age	Commit message	Author
2020-04-20	Rename and fix interface (branch: absolute_std)	Nikolay Bogoychev

2020-04-20	Rename and move the if outside the hot loop	Nikolay Bogoychev

2020-04-20	Merge branch 'master' into absolute_std	Nikolay Bogoychev

2020-04-20	Fix OMP parallel wrap typing for Shift	Kenneth Heafield

2020-04-20	Workaround gcc bug producing extra move instructions	Kenneth Heafield
	https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94663
	Improvement ranges from 3% (1x64x8) to 35% (8x2048x256) and is often 21-25%.
	Benchmark program output:
	Multiply 1 64 8 Samples=75
		BEFORE	8-bit AVX512VNNI 64 65.4933 0.875698
		AFTER	8-bit AVX512VNNI 62 64.8533 1.36256
	Multiply 8 256 256 Samples=75
		BEFORE	8-bit AVX512VNNI 13296 13385.3 36.0012
		AFTER	8-bit AVX512VNNI 10754 10873.9 31.3479
	Multiply 8 2048 256 Samples=75
		BEFORE	8-bit AVX512VNNI 86800 86974.3 59.9597
		AFTER	8-bit AVX512VNNI 64222 65428.6 222.893
	Multiply 8 256 2048 Samples=75
		BEFORE	8-bit AVX512VNNI 106780 107392 232.955
		AFTER	8-bit AVX512VNNI 86176 88366.1 402.335
	Multiply 320 256 256 Samples=75
		BEFORE	8-bit AVX512VNNI 531720 533687 1419.3
		AFTER	8-bit AVX512VNNI 436536 437186 352.487
	Multiply 472 256 256 Samples=75
		BEFORE	8-bit AVX512VNNI 785026 787784 2068.05
		AFTER	8-bit AVX512VNNI 646240 647382 416.252
	Multiply 248 256 256 Samples=75
		BEFORE	8-bit AVX512VNNI 412282 413484 971.843
		AFTER	8-bit AVX512VNNI 338368 338656 141.354
	Multiply 200 256 256 Samples=75
		BEFORE	8-bit AVX512VNNI 332578 333463 742.297
		AFTER	8-bit AVX512VNNI 272890 273103 77.2789
	Multiply 256 256 256 Samples=75
		BEFORE	8-bit AVX512VNNI 425654 427240 1095.53
		AFTER	8-bit AVX512VNNI 349418 349580 80.8586
	Multiply 512 512 512 Samples=75
		BEFORE	8-bit AVX512VNNI 3122382 3.13179e+06 4215.88
		AFTER	8-bit AVX512VNNI 2493984 2.51602e+06 6052.1
	Multiply 1024 1024 1024 Samples=3
		BEFORE	8-bit AVX512VNNI 24927622 2.49795e+07 44940.9
		AFTER	8-bit AVX512VNNI 19210646 1.9229e+07 17037
	Multiply 4096 4096 128 Samples=3
		BEFORE	8-bit AVX512VNNI 49870840 4.99655e+07 133057
		AFTER	8-bit AVX512VNNI 46146812 4.62847e+07 205448
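	The quoted percentages appear to be consistent with comparing the first number of each BEFORE/AFTER pair as a ratio. The log does not label the benchmark columns, so that reading is an assumption, and the helper name `ImprovementPercent` below is hypothetical rather than part of intgemm. A minimal sketch of the arithmetic:

```cpp
#include <cassert>

// Hypothetical helper, not part of intgemm: percentage by which the
// `before` timing exceeds the `after` timing.
double ImprovementPercent(double before, double after) {
  assert(after > 0.0);  // a zero or negative timing would make the ratio meaningless
  return (before / after - 1.0) * 100.0;
}
```

	Under that assumption, `ImprovementPercent(64, 62)` gives roughly 3.2 for the 1x64x8 case and `ImprovementPercent(86800, 64222)` gives roughly 35.2 for the 8x2048x256 case, in line with the 3% and 35% stated in the message.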
2020-04-13	Just use posix_memalign everywhere	Kenneth Heafield

2020-04-06	Merge pull request #77 from kpuatamazon/master	Kenneth Heafield
	OMP parallelization for Multiply
2020-04-02	Merge pull request #76 from kpu/static-loop-empty-iterator	Mateusz Chudyk
	Add support for empty iterator to static loop
2020-04-02	Add support for empty iterator to static loop	Mateusz Chudyk

2020-03-27	Merge pull request #74 from kpu/unify-reference-mulitplies-funs	Kenneth Heafield
	Unify reference multiplies funs
2020-03-27	Add round_up function	Mateusz Chudyk

2020-03-27	Add missing static keyword in utils.h	Mateusz Chudyk

2020-03-27	Unify references::MultiplyFF and references::Multiply	Mateusz Chudyk

2020-03-26	Add option for absolute value STD	Nikolay Bogoychev

2020-03-26	Merge branch 'warning_fail' of https://github.com/kpu/intgemm into warning_fail	Nikolay Bogoychev

2020-03-26	Make it inline	Nikolay Bogoychev

2020-03-26	Add inline	Kenneth Heafield

2020-03-26	Merge remote-tracking branch 'origin/master' into warning_fail	Nikolay Bogoychev

2020-03-26	Merge branch 'master' of https://github.com/kpu/intgemm	Nikolay Bogoychev

2020-03-26	do not use fake intrinsics	Nikolay Bogoychev

2020-03-26	Move QuantizerStd outside of the 8bit	Nikolay Bogoychev

2020-03-25	Merge pull request #54 from kpu/stdQuantizer	Kenneth Heafield
	Add standard deviation quantizer
2020-03-25	Address comments	Nikolay Bogoychev

2020-03-25	merge WiP	Nikolay Bogoychev

2020-03-23	OMP parallelization for Multiply	Kenneth Heafield

2020-03-17	Add tmmintrin.h header	Kenneth Heafield

2020-03-17	Merge pull request #71 from kpuatamazon/master	Kenneth Heafield
	Improve compiler support
2020-03-17	Merge branch 'master' of github.com:kpu/intgemm	Kenneth Heafield

2020-03-08	Change to INTGEMM_COMPILER_SUPPORTS_AVX512BW and update test	Kenneth Heafield

2020-03-06	clang unused parameter	Kenneth Heafield

2020-03-06	__clang__ instead of __CLANG__	Kenneth Heafield

2020-03-06	Change include for CPUID	Kenneth Heafield

2020-03-06	Fix kName for unsupported cases	Kenneth Heafield

2020-03-06	Merge branch 'master' of github.com:kpu/intgemm	Kenneth Heafield

2020-03-06	Fix assert	Kenneth Heafield

2020-03-06	Fix OpenMP compilation on gcc 7.4.0	Kenneth Heafield
	The #pragma omp parallel is implemented by creating another function for the thread to launch. gcc 7.4.0 fails to carry the target attributes to that new function, so intrinsics were not working. Copying register types causes an internal compiler error. These for loops need constants, such as -127, initialized in registers, and simply moving the constants into the for loop generated code that initializes them on every iteration (no cross-loop constant extraction). The workaround is to split #pragma omp parallel so that it launches a function with target attributes; that function initializes the constants and then uses #pragma omp for to just divvy up the work.
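	The split described above can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: the names `SaturateWorker` and `SaturateParallel` are hypothetical, plain integers stand in for the SIMD register constants, and the target attribute is only mentioned in a comment so the sketch builds on any compiler. Without OpenMP enabled the pragmas are ignored and the loop runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical worker. In a SIMD kernel this is the function that would carry
// the target attribute (for example __attribute__((target("avx512bw")))), so
// the attribute applies to the code each thread actually executes.
static void SaturateWorker(const std::int32_t *input, std::int8_t *output,
                           std::ptrdiff_t size) {
  // Constants are set up once per thread, before the work-shared loop,
  // instead of being re-initialized on every iteration.
  const std::int32_t kFloor = -127;
  const std::int32_t kCeil = 127;
  // Orphaned work-sharing loop: it binds to the enclosing parallel region
  // at run time and splits the iterations among that region's threads.
#pragma omp for
  for (std::ptrdiff_t i = 0; i < size; ++i) {
    std::int32_t v = input[i];
    if (v < kFloor) v = kFloor;
    if (v > kCeil) v = kCeil;
    output[i] = static_cast<std::int8_t>(v);
  }
}

// Hypothetical entry point: the parallel region only launches the worker.
void SaturateParallel(const std::int32_t *input, std::int8_t *output,
                      std::ptrdiff_t size) {
  assert(input != nullptr && output != nullptr);
  // Every thread in the team calls the worker; the loop inside is divided
  // among them by the "omp for" directive.
#pragma omp parallel
  SaturateWorker(input, output, size);
}
```

	Keeping the parallel region and the loop in separate functions means the compiler-generated thread entry point contains nothing but a call, so it does not matter whether that generated function inherits the target attributes.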
2020-03-06	fabsf to fix std::max	Kenneth Heafield

2020-03-06	Merge branch 'master' of github.com:kpu/intgemm	Kenneth Heafield

2020-03-05	Make multiply_sat_test compile on older compilers	Kenneth Heafield

2020-03-05	Change to option for OpenMP	Kenneth Heafield

2020-03-05	Change CPUID dispatch for gcc and clang	Kenneth Heafield

2020-03-05	clang warning	Kenneth Heafield

2020-03-05	Incorrect assert	Kenneth Heafield

2020-03-05	Merge branch 'master' of github.com:kpu/intgemm	Kenneth Heafield

2020-03-04	Tiny tolerance for failing tests on newer compilers	Kenneth Heafield

2020-03-04	Get at least the PrepareA tests working	Kenneth Heafield

2020-03-03	Cap unsigned quantized value no good reason?	Kenneth Heafield

2020-03-03	Use _mm512_mask_cvtusepi32_storeu_epi8	Kenneth Heafield

2020-03-03	Fix output address for quantizer	Kenneth Heafield

2020-03-03	Fix compiler warning for compilers without VNNI	Kenneth Heafield