
gitlab.xiph.org/xiph/opus.git
author    Timothy B. Terriberry <tterribe@xiph.org>  2015-01-03 02:48:54 +0300
committer Timothy B. Terriberry <tterribe@xiph.org>  2015-01-03 03:16:21 +0300
commit    7422189ab16de442554da7f73c3c6f3c15130d22 (patch)
tree      38894f0c3d4cca820268fe881fc334d5bb1a7422
parent    23f503ad1c388aa9171af931ccb2f114f0839e0e (diff)
Fix silk_VQ_WMat_EC_sse4_1().
During review of c95c9a048f32, I replaced a call to _mm_cvtepi8_epi32() with the OP_CVTEPI16_EPI32_M64() macro (note the 16 instead of 8). Make a separate OP_CVTEPI8_EPI32_M32() macro and use that instead. Thanks to Wei Zhou for the report.
 celt/x86/x86cpu.h         | 24 ++++++++++++++++--------
 silk/x86/VQ_WMat_EC_sse.c |  2 +-
 2 files changed, 17 insertions(+), 9 deletions(-)
diff --git a/celt/x86/x86cpu.h b/celt/x86/x86cpu.h
index 2394b05e..44b3a597 100644
--- a/celt/x86/x86cpu.h
+++ b/celt/x86/x86cpu.h
@@ -44,18 +44,26 @@
int opus_select_arch(void);
# endif
-/*gcc appears to emit MOVDQA's to load the argument of an _mm_cvtepi16_epi32()
- when optimizations are disabled, even though the actual PMOVSXWD instruction
- takes an m64. Unlike a normal m64 reference, these require 16-byte alignment
- and load 16 bytes instead of 8, possibly reading out of bounds.
-
- We can insert an explicit MOVQ using _mm_loadl_epi64(), which should have the
- same semantics as an m64 reference in the PMOVSXWD instruction itself, but
- gcc is not smart enough to optimize this out when optimizations ARE enabled.*/
+/*gcc appears to emit MOVDQA's to load the argument of an _mm_cvtepi8_epi32()
+ or _mm_cvtepi16_epi32() when optimizations are disabled, even though the
+ actual PMOVSXWD instruction takes an m32 or m64. Unlike a normal memory
+ reference, these require 16-byte alignment and load a full 16 bytes (instead
+ of 4 or 8), possibly reading out of bounds.
+
+ We can insert an explicit MOVD or MOVQ using _mm_cvtsi32_si128() or
+ _mm_loadl_epi64(), which should have the same semantics as an m32 or m64
+ reference in the PMOVSXWD instruction itself, but gcc is not smart enough to
+ optimize this out when optimizations ARE enabled.*/
# if !defined(__OPTIMIZE__)
+# define OP_CVTEPI8_EPI32_M32(x) \
+ (_mm_cvtepi8_epi32(_mm_cvtsi32_si128(*(int *)(x))))
+
# define OP_CVTEPI16_EPI32_M64(x) \
(_mm_cvtepi16_epi32(_mm_loadl_epi64((__m128i *)(x))))
# else
+# define OP_CVTEPI8_EPI32_M32(x) \
+ (_mm_cvtepi8_epi32(*(__m128i *)(x)))
+
# define OP_CVTEPI16_EPI32_M64(x) \
(_mm_cvtepi16_epi32(*(__m128i *)(x)))
# endif
diff --git a/silk/x86/VQ_WMat_EC_sse.c b/silk/x86/VQ_WMat_EC_sse.c
index 1460cead..74d6c6d0 100644
--- a/silk/x86/VQ_WMat_EC_sse.c
+++ b/silk/x86/VQ_WMat_EC_sse.c
@@ -65,7 +65,7 @@ void silk_VQ_WMat_EC_sse4_1(
diff_Q14[ 0 ] = in_Q14[ 0 ] - silk_LSHIFT( cb_row_Q7[ 0 ], 7 );
C_tmp1 = OP_CVTEPI16_EPI32_M64( &in_Q14[ 1 ] );
- C_tmp2 = OP_CVTEPI16_EPI32_M64( &cb_row_Q7[ 1 ] );
+ C_tmp2 = OP_CVTEPI8_EPI32_M32( &cb_row_Q7[ 1 ] );
C_tmp2 = _mm_slli_epi32( C_tmp2, 7 );
C_tmp1 = _mm_sub_epi32( C_tmp1, C_tmp2 );