github.com/google/ruy.git
author Benoit Jacob <benoitjacob@google.com> 2020-07-13 20:27:06 +0300
committer Copybara-Service <copybara-worker@google.com> 2020-07-13 20:27:26 +0300
commit 7784e18d9f29e01ce16a62dfa05d58007f1c021c
tree 82553fd6961e328da9800e2c3bd134890155961d /ruy/path.h
parent 27d16d0b47ad31a81aa1d7b044a4a2162159d928
FMA is technically a separate ISA extension from AVX2.
In practice, at least Intel CPUs that support AVX2 also support FMA. We have always chosen to implement a code path only for AVX2+FMA, not for AVX2 without FMA. At some point we also fixed our internal ruy_copts_avx2() to pass -mfma in addition to -mavx2. So our code was technically correct, but it was a bit misleading because this AVX2+FMA path was named just AVX2.

One area where this has led to confusion is benchmarking against other libraries that rely on the user manually passing copts to enable ISA extensions (header-only libraries) and that are rigorous about using FMA instructions only if FMA is enabled, without assuming that AVX2 implies it. Concretely, benchmarking against Eigen with -mavx2 gives the false impression that ruy is 2x faster in its AVX2 code path, while benchmarking with -mavx2 -mfma paints the correct picture: ruy is only about 5% faster.

PiperOrigin-RevId: 320982698
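
The distinction between the two flags can be checked at compile time. The following is a minimal sketch, assuming a GCC or Clang toolchain, where `__AVX2__` and `__FMA__` are predefined when -mavx2 and -mfma (or an -march setting that implies them) are in effect. The names kBuiltWithAvx2, kBuiltWithFma, kBuiltWithAvx2Fma and DescribeBuild are hypothetical and do not come from ruy or Eigen.

```cpp
// Sketch only: reports which of the two ISA extensions this translation unit
// was compiled with. Assumes GCC or Clang, which predefine __AVX2__ and
// __FMA__ independently of each other. All identifiers here are illustrative.

#if defined(__AVX2__)
constexpr bool kBuiltWithAvx2 = true;
#else
constexpr bool kBuiltWithAvx2 = false;
#endif

#if defined(__FMA__)
constexpr bool kBuiltWithFma = true;
#else
constexpr bool kBuiltWithFma = false;
#endif

// True only when both extensions are enabled, which is the configuration the
// renamed AVX2+FMA path is compiled for.
constexpr bool kBuiltWithAvx2Fma = kBuiltWithAvx2 && kBuiltWithFma;

// Human-readable summary of the flags seen by the compiler.
constexpr const char* DescribeBuild() {
  return kBuiltWithAvx2Fma ? "AVX2+FMA"
         : kBuiltWithAvx2  ? "AVX2 without FMA"
         : kBuiltWithFma   ? "FMA without AVX2"
                           : "neither AVX2 nor FMA";
}
```

Printing DescribeBuild() from a benchmark binary is one way to confirm that both libraries under comparison were built with the same flags: a build with only -mavx2 should report "AVX2 without FMA", which is the case where a header-only library that checks `__FMA__` would not emit FMA instructions.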
Diffstat (limited to 'ruy/path.h')
-rw-r--r-- ruy/path.h 8
1 file changed, 5 insertions, 3 deletions
diff --git a/ruy/path.h b/ruy/path.h
index 94d3089..a3cd939 100644
--- a/ruy/path.h
+++ b/ruy/path.h
@@ -77,9 +77,11 @@ enum class Path : std::uint8_t {
#endif // RUY_PLATFORM_ARM
#if RUY_PLATFORM_X86
- // Optimized for AVX2.
- kAvx2 = 0x10,
+ // Optimized for AVX2+FMA.
+ // Compiled with -mavx2 -mfma.
+ kAvx2Fma = 0x10,
// Optimized for AVX-512.
+ // Compiled with -mavx512f -mavx512vl -mavx512cd -mavx512bw -mavx512dq.
kAvx512 = 0x20,
#endif // RUY_PLATFORM_X86
};
@@ -143,7 +145,7 @@ constexpr Path kExtraArchPaths = Path::kNone;
constexpr Path kDefaultArchPaths = Path::kNeon;
constexpr Path kExtraArchPaths = Path::kNone;
#elif RUY_PLATFORM_X86
-constexpr Path kDefaultArchPaths = Path::kAvx2 | Path::kAvx512;
+constexpr Path kDefaultArchPaths = Path::kAvx2Fma | Path::kAvx512;
constexpr Path kExtraArchPaths = Path::kNone;
#else
constexpr Path kDefaultArchPaths = Path::kNone;
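
The diff above combines Path values with `|`, which works because each path occupies its own bit. The following is a simplified, standalone sketch of that pattern, not ruy's actual header: the values 0x10 and 0x20 are taken from the diff, while kNone = 0, the operator definitions and the Contains helper are assumptions made for illustration.

```cpp
#include <cstdint>

// Simplified stand-in for the enum in the diff; only the x86 values are shown.
enum class Path : std::uint8_t {
  kNone = 0,        // assumed value, not shown in the diff
  kAvx2Fma = 0x10,  // compiled with -mavx2 -mfma
  kAvx512 = 0x20,
};

// Bitwise OR builds a set of paths. The operands are promoted to int, so the
// result is cast back to the enum type.
constexpr Path operator|(Path a, Path b) {
  return static_cast<Path>(static_cast<std::uint8_t>(a) |
                           static_cast<std::uint8_t>(b));
}

// Bitwise AND extracts the paths common to both operands.
constexpr Path operator&(Path a, Path b) {
  return static_cast<Path>(static_cast<std::uint8_t>(a) &
                           static_cast<std::uint8_t>(b));
}

// Hypothetical helper: true if `set` includes the single path `p`.
constexpr bool Contains(Path set, Path p) {
  return (set & p) != Path::kNone;
}

// Mirrors the x86 branch of the diff after the rename.
constexpr Path kDefaultArchPaths = Path::kAvx2Fma | Path::kAvx512;

static_assert(Contains(kDefaultArchPaths, Path::kAvx2Fma),
              "default x86 paths include AVX2+FMA");
```

Because the rename keeps the value 0x10, only the identifier changes; any code that spelled out Path::kAvx2 needs updating, while the bit pattern of kDefaultArchPaths stays the same.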