| field | value | date |
|---|---|---|
| author | Kyle Siefring <kylesiefring@gmail.com> | 2021-01-08 20:29:39 +0300 |
| committer | Jean-Baptiste Kempf <jb@videolan.org> | 2021-01-11 15:27:31 +0300 |
| commit | 0bd57c6b22f42e9236707c0d7dc3b69a72df6225 | |
| tree | ea8431514af1f120b69bc18266afb3b86049a2e3 /src/decode.c | |
| parent | 3ccfc25a8451d5f17d80b06eb9d6a622722ec08d | |
Rework the usage of noskip_mask
Remove half of the masks, since cdef only uses them at an 8x8
level of granularity.
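Why half the rows are enough can be sketched as follows. This is a hypothetical helper, not dav1d code: it assumes one 32-bit column mask per 4px row (dav1d actually stores each row as two `uint16_t` halves) and 32 such rows for a 128px superblock. If the consumer only asks whether anything in an 8x8 area is coded, OR-ing each pair of 4px rows gives the same answer with half the storage.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout sizes: 32 rows of 4px, or 16 rows of 8px,
 * for a 128px superblock. */
#define ROWS_4PX 32
#define ROWS_8PX (ROWS_4PX / 2)

/* Collapse a per-4px-row noskip layout into a per-8px-row layout.
 * Bit n of each row is assumed to mean "4px column n is not skipped". */
static void collapse_to_8px_rows(const uint32_t rows4[ROWS_4PX],
                                 uint32_t rows8[ROWS_8PX])
{
    for (int i = 0; i < ROWS_8PX; i++)
        rows8[i] = rows4[2 * i] | rows4[2 * i + 1]; /* 8px row = two 4px rows */
}
```

For example, if 4px rows 4 and 5 hold `0x3` and `0x30`, the collapsed 8px row 2 holds `0x33`, and every other 8px row stays zero.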
Load the mask and combine its two 16-bit sections into a single 32-bit
value outside of the inner cdef loop. This should save some registers.
This results in a mild performance improvement.
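The "combine outside the inner loop" idea can be illustrated with a hypothetical consumer, loosely modeled on the commit message rather than on the actual cdef code in dav1d. It assumes one bit per 4px column, with `[0]` covering columns 0-15 and `[1]` covering columns 16-31. The two halves are merged once, so the per-8x8 test inside the loop is a single shift and AND on one 32-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: count the 8px columns in one superblock row that
 * contain at least one non-skipped 4px column. */
static int count_coded_8x8_cols(const uint16_t halves[2])
{
    /* Merge the two 16-bit halves once, before the per-block loop. */
    const uint32_t row = (uint32_t)halves[0] | ((uint32_t)halves[1] << 16);
    int n = 0;
    for (int bx = 0; bx < 32; bx += 2) { /* step over 8px columns */
        if (row & (3u << bx))            /* either 4px half coded? */
            n++;
    }
    return n;
}
```

The alternative of selecting `halves[bx >> 4]` and shifting by `bx & 15` on every iteration keeps an extra pointer and index live inside the loop, which is the kind of register pressure the commit message refers to.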
Diffstat (limited to 'src/decode.c')
-rw-r--r-- src/decode.c | 4
1 file changed, 2 insertions, 2 deletions
```diff
diff --git a/src/decode.c b/src/decode.c
index 4b076ca..197af98 100644
--- a/src/decode.c
+++ b/src/decode.c
@@ -1984,10 +1984,10 @@ static int decode_b(Dav1dTileContext *const t,
 #undef set_ctx
     }
     if (!b->skip) {
-        uint16_t (*noskip_mask)[2] = &t->lf_mask->noskip_mask[by4];
+        uint16_t (*noskip_mask)[2] = &t->lf_mask->noskip_mask[by4 >> 1];
         const unsigned mask = (~0U >> (32 - bw4)) << (bx4 & 15);
         const int bx_idx = (bx4 & 16) >> 4;
-        for (int y = 0; y < bh4; y++, noskip_mask++) {
+        for (int y = 0; y < bh4; y += 2, noskip_mask++) {
             (*noskip_mask)[bx_idx] |= mask;
             if (bw4 == 32) // this should be mask >> 16, but it's 0xffffffff anyway
                 (*noskip_mask)[1] |= mask;
```
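The patched loop can be exercised in isolation with the standalone sketch below. The static array stands in for `t->lf_mask->noskip_mask`; its size (16 rows of 8px, i.e. a 128px superblock) and the free-standing `mark_noskip()` wrapper are assumptions for illustration. The loop body itself is copied from the `+` side of the diff: `by4 >> 1` maps a 4px row to its 8px row, and `y += 2` advances one mask row per 8 pixels.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for t->lf_mask->noskip_mask (zero-initialized). */
static uint16_t noskip_rows[16][2];

/* Hypothetical wrapper around the patched loop from decode_b(). */
static void mark_noskip(int bx4, int by4, int bw4, int bh4)
{
    uint16_t (*noskip_mask)[2] = &noskip_rows[by4 >> 1]; /* 4px row -> 8px row */
    const unsigned mask = (~0U >> (32 - bw4)) << (bx4 & 15); /* bw4 bits at bx4 */
    const int bx_idx = (bx4 & 16) >> 4;                  /* which 16-column half */
    for (int y = 0; y < bh4; y += 2, noskip_mask++) {    /* one step per 8px */
        (*noskip_mask)[bx_idx] |= mask; /* truncated to 16 bits on store */
        if (bw4 == 32) /* mask is 0xffffffff here, so it also fills [1] */
            (*noskip_mask)[1] |= mask;
    }
}
```

For a 4px-tall block (`bh4 == 1`), the loop still runs once, so the block marks the 8px row that contains it. A 2x2-unit block at `bx4 = 18`, `by4 = 4` sets bits 2 and 3 (`0x000C`) of the upper half of 8px row 2, while a full-width block covering 4px rows 8 to 11 fills 8px rows 4 and 5 in both halves.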