github.com/google/ruy.git
author Benoit Jacob <benoitjacob@google.com> 2020-07-13 20:27:06 +0300
committer Copybara-Service <copybara-worker@google.com> 2020-07-13 20:27:26 +0300
commit 7784e18d9f29e01ce16a62dfa05d58007f1c021c
tree 82553fd6961e328da9800e2c3bd134890155961d /ruy/have_built_path_for.h
parent 27d16d0b47ad31a81aa1d7b044a4a2162159d928
FMA is technically a separate ISA extension from AVX2.

In practice, at least Intel CPUs supporting AVX2 also support FMA. We have always chosen to implement a code path only for AVX2+FMA, not for AVX2 without FMA. At some point we also fixed our internal ruy_copts_avx2() to pass -mfma in addition to -mavx2. So our code was technically correct, but it was a bit misleading because this AVX2+FMA path was named just AVX2.

One area where this has led to confusion is benchmarking against other libraries that rely on the user manually passing copts to enable ISA extensions (header-only libraries) and that are rigorous about using FMA instructions only if enabled, without assuming that AVX2 implies FMA. Concretely, benchmarking against Eigen with -mavx2 alone gives the false impression that ruy is 2x faster in its AVX2 code path, while benchmarking with -mavx2 -mfma paints the correct picture: ruy is only about 5% faster.

PiperOrigin-RevId: 320982698
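To illustrate why -mavx2 alone does not enable FMA in a library that checks each extension separately, here is a minimal sketch. It assumes GCC or Clang, which predefine __AVX2__ and __FMA__ independently depending on the flags in effect. The function names BuildHasAvx2, BuildHasFma and BuildHasAvx2Fma are hypothetical and are not part of ruy or Eigen.

```cpp
// Sketch only: these helpers are hypothetical, not ruy or Eigen API.
// GCC and Clang predefine __AVX2__ when -mavx2 is in effect and __FMA__
// when -mfma is in effect (or when an -march value implies them).
// The two macros are independent of each other.

// True if this translation unit was compiled with AVX2 enabled.
constexpr bool BuildHasAvx2() {
#if defined(__AVX2__)
  return true;
#else
  return false;
#endif
}

// True if this translation unit was compiled with FMA enabled.
constexpr bool BuildHasFma() {
#if defined(__FMA__)
  return true;
#else
  return false;
#endif
}

// True only if both extensions are enabled, which is what a combined
// "AVX2+FMA" code path requires.
constexpr bool BuildHasAvx2Fma() { return BuildHasAvx2() && BuildHasFma(); }
```

Under these assumptions, a build with only -mavx2 should make BuildHasAvx2() true while BuildHasAvx2Fma() stays false, and adding -mfma should make both true. That is the distinction the rename to Avx2Fma makes explicit.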
Diffstat (limited to 'ruy/have_built_path_for.h')
-rw-r--r-- ruy/have_built_path_for.h | 2
1 files changed, 1 insertions, 1 deletions
diff --git a/ruy/have_built_path_for.h b/ruy/have_built_path_for.h
index 94761a7..60e98e1 100644
--- a/ruy/have_built_path_for.h
+++ b/ruy/have_built_path_for.h
@@ -21,7 +21,7 @@ limitations under the License.
namespace ruy {
#if RUY_PLATFORM_X86
-bool HaveBuiltPathForAvx2();
+bool HaveBuiltPathForAvx2Fma();
bool HaveBuiltPathForAvx512();
#endif // RUY_PLATFORM_X86