NEWS


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144

Changes for 0.6.0 'Gyrfalcon':
------------------------------

0.6.0 is a major release for dav1d:
 - New ARM64 optimizations for the 10/12bit depth:
    - mc_avg, mc_w_avg, mc_mask
    - mc_put/mc_prep 8tap/bilin
    - mc_warp_8x8
    - wiener
    - SGR
    - loopfilter
    - cdef
 - New AVX-512 optimizations for prep_bilin, prep_8tap, cdef_filter, mc_avg/w_avg/mask
 - New SSSE3 optimizations for film grain
 - New AVX2 optimizations for msac_adapt16
 - Fix rare mismatches against the reference decoder, notably because of clipping
 - Improvements on ARM64 on msac, cdef and looprestoration optimizations
 - Improvements on AVX2 optimizations for cdef_filter
 - Improvements in the C version for itxfm, cdef_filter


Changes for 0.5.2 'Asiatic Cheetah':
------------------------------------

0.5.2 is a small release improving speed for ARM32 and adding minor features:
 - ARM32 optimizations for loopfilter, ipred_dc|h|v
 - Add section-5 raw OBU demuxer
 - Improve the speed by reducing the L2 cache collisions
 - Fix minor issues


Changes for 0.5.1 'Asiatic Cheetah':
------------------------------------

0.5.1 is a small release improving speeds and fixing minor issues
compared to 0.5.0:
 - SSE2 optimizations for CDEF, wiener and warp_affine
 - NEON optimizations for SGR on ARM32
 - Fix mismatch issue in x86 asm in inverse identity transforms
 - Fix build issue in ARM64 assembly if debug info was enabled
 - Add a workaround for Xcode 11 -fstack-check bug


Changes for 0.5.0 'Asiatic Cheetah':
------------------------------------

0.5.0 is a medium release fixing regressions and minor issues,
and improving speed significantly:
 - Export ITU T.35 metadata
 - Speed improvements on blend_ on ARM
 - Speed improvements on decode_coef and MSAC
 - NEON optimizations for blend*, w_mask_, ipred functions for ARM64
 - NEON optimizations for CDEF and warp on ARM32
 - SSE2 optimizations for MSAC hi_tok decoding
 - SSSE3 optimizations for deblocking loopfilters and warp_affine
 - AVX2 optimizations for film grain and ipred_z2
 - SSE4 optimizations for warp_affine
 - VSX optimizations for wiener
 - Fix inverse transform overflows in x86 and NEON asm
 - Fix integer overflows with large frames
 - Improve film grain generation to match reference code
 - Improve compatibility with older binutils for ARM
 - More advanced Player example in tools


Changes for 0.4.0 'Cheetah':
----------------------------

 - Fix playback with unknown OBUs
 - Add an option to limit the maximum frame size
 - SSE2 and ARM64 optimizations for MSAC
 - Improve speed on 32bits systems
 - Optimization in obmc blend
 - Reduce RAM usage significantly
 - The initial PPC SIMD code, cdef_filter
 - NEON optimizations for blend functions on ARM
 - NEON optimizations for w_mask functions on ARM
 - NEON optimizations for inverse transforms on ARM64
 - VSX optimizations for CDEF filter
 - Improve handling of malloc failures
 - Simple Player example in tools


Changes for 0.3.1 'Sailfish':
------------------------------

 - Fix a buffer overflow in frame-threading mode on SSSE3 CPUs
 - Reduce binary size, notably on Windows
 - SSSE3 optimizations for ipred_filter
 - ARM optimizations for MSAC


Changes for 0.3.0 'Sailfish':
------------------------------

This is the final release for the numerous speed improvements of 0.3.0-rc.
It mostly:
 - Fixes an annoying crash on SSSE3 that happened in the itx functions


Changes for 0.2.2 (0.3.0-rc) 'Antelope':
-----------------------------

 - Large improvement on MSAC decoding with SSE, bringing 4-6% speed increase
   The impact is important on SSSE3, SSE4 and AVX2 cpus
 - SSSE3 optimizations for all blocks size in itx
 - SSSE3 optimizations for ipred_paeth and ipred_cfl (420, 422 and 444)
 - Speed improvements on CDEF for SSE4 CPUs
 - NEON optimizations for SGR and loop filter
 - Minor crashes, improvements and build changes


Changes for 0.2.1 'Antelope':
----------------------------

 - SSSE3 optimization for cdef_dir
 - AVX2 improvements of the existing CDEF optimizations
 - NEON improvements of the existing CDEF and wiener optimizations
 - Clarification about the numbering/versionning scheme


Changes for 0.2.0 'Antelope':
----------------------------

 - ARM64 and ARM optimizations using NEON instructions
 - SSSE3 optimizations for both 32 and 64bits
 - More AVX2 assembly, reaching almost completion
 - Fix installation of includes
 - Rewrite inverse transforms to avoid overflows
 - Snap packaging for Linux
 - Updated API (ABI and API break)
 - Fixes for un-decodable samples


Changes for 0.1.0 'Gazelle':
----------------------------

Initial release of dav1d, the fast and small AV1 decoder.
 - Support for all features of the AV1 bitstream
 - Support for all bitdepth, 8, 10 and 12bits
 - Support for all chroma subsamplings 4:2:0, 4:2:2, 4:4:4 *and* grayscale
 - Full acceleration for AVX2 64bits processors, making it the fastest decoder
 - Partial acceleration for SSSE3 processors
 - Partial acceleration for NEON processors