
github.com/google/ruy.git
author    Benoit Jacob <benoitjacob@google.com> 2020-07-21 06:08:19 +0300
committer Copybara-Service <copybara-worker@google.com> 2020-07-21 06:08:41 +0300
commit    ec99c704a19d38ea502e81c0a9f5b82026471cef
tree      68815e0c2cd78cdad56114121816e2d9332f8254 /ruy/kernel_avx512.cc
parent    bebf022784e9b22277b84373c9877aebff8411a7
Optimized packing code path for row-major float inputs.

This is implemented in plain C++ with memcpy and memset because:
- The 1x8 kernel block layout lends itself well to such an implementation when the source is row-major.
- This makes it possible to cover ARM64, ARM32, and x86 AVX2 and AVX512 at once. These kernels' layouts differ only in the number of columns, so implementing this in C++ made it possible to turn that into a simple `int KernelCols` template parameter.
- Surprisingly, despite the humble implementation, this already seems to make row-major sources faster than column-major ones on x86, ARM32, and ARM64. I don't have an explanation for that!

PiperOrigin-RevId: 322279263
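To illustrate why a row-major source suits a 1 x KernelCols block layout, here is a minimal sketch, not ruy's actual code. The function name `PackRowMajorFloat`, its parameters, and the exact destination layout are hypothetical and were chosen for illustration. The assumed layout uses column blocks `KernelCols` wide, where each source row contributes `KernelCols` contiguous floats per block and partial blocks are zero-padded. Under these assumptions, each block row becomes one `memcpy`, plus one `memset` for the padding tail.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical sketch (not ruy's API): packs a row-major float matrix of
// `rows` x `cols`, with row stride `src_stride` (in floats), into `dst`.
// Assumed destination layout: column blocks of width KernelCols, stored one
// after another; within a block, each row contributes KernelCols contiguous
// floats (a 1 x KernelCols kernel block). `dst` must hold
// rows * round_up(cols, KernelCols) floats.
template <int KernelCols>
void PackRowMajorFloat(const float* src, int rows, int cols, int src_stride,
                       float* dst) {
  assert(rows >= 0 && cols >= 0 && src_stride >= cols);
  for (int col_block = 0; col_block < cols; col_block += KernelCols) {
    // Number of real (non-padding) columns in this block.
    const int remaining = cols - col_block;
    const int valid = remaining < KernelCols ? remaining : KernelCols;
    for (int row = 0; row < rows; ++row) {
      const float* src_ptr = src + row * src_stride + col_block;
      // The source row is contiguous, so one memcpy fills the block row.
      std::memcpy(dst, src_ptr, valid * sizeof(float));
      // Zero-fill the padding columns of a partial last block
      // (all-zero bytes represent +0.0f in IEEE 754 floats).
      if (valid < KernelCols) {
        std::memset(dst + valid, 0, (KernelCols - valid) * sizeof(float));
      }
      dst += KernelCols;
    }
  }
}
```

For example, packing the 2x3 row-major matrix `{1, 2, 3, 4, 5, 6}` with `KernelCols = 2` yields `{1, 2, 4, 5, 3, 0, 6, 0}`. The first block holds columns 0 and 1 of both rows, and the second block holds column 2 padded with zeros. Only the block width changes between kernels, so the layout difference can be expressed as a template parameter.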
Diffstat (limited to 'ruy/kernel_avx512.cc')
0 files changed, 0 insertions, 0 deletions