Age | Commit message | Author |
|
From section 6.8.2 in the AV1 spec:
"It is a requirement of bitstream conformance that when show_existing_frame is
used to show a previous frame with RefFrameType[ frame_to_show_map_idx ] equal
to KEY_FRAME, that the frame is output via the show_existing_frame mechanism at
most once."
|
|
the next visible picture in display order
If the first picture in coding order after a new sequence header is parsed is
not visible, the first picture output by dav1d after the fact (which is coded
after the aforementioned invisible picture) would not trigger the new seq
header event flag as expected, despite being the first containing a reference
to a new sequence header.
Assuming the invisible picture is ever output, the result of this change will
be two pictures signaling a new sequence header was seen despite there being
only one new sequence header.
|
|
Merges the 3 threading parameters into a single `--threads=` argument.
Frame threading can still be controlled via the `--framedelay=` argument.
Internally, the threading model is now a global thread/task pool design.
Co-authored-by: Ronald S. Bultje <rsbultje@gmail.com>
|
|
And a function to fetch them. Should be useful to signal changes in the
bitstream the user may want to know about.
Starting with two flags, DAV1D_EVENT_FLAG_NEW_SEQUENCE and
DAV1D_EVENT_FLAG_NEW_OP_PARAMS_INFO, which signal the presence of an updated
sequence header in the last returned (or to be returned) picture.
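A minimal sketch of how such flags might be accumulated and fetched. All names here (`DecoderSketch`, `raise_event`, `fetch_event_flags`, and the shortened flag names) are hypothetical stand-ins rather than dav1d's actual declarations, and clearing the flags on fetch is an assumption made for illustration:

```c
#include <stdint.h>

/* Hypothetical stand-ins for the two flags named above. */
enum EventFlags {
    EVENT_FLAG_NEW_SEQUENCE       = 1 << 0,
    EVENT_FLAG_NEW_OP_PARAMS_INFO = 1 << 1,
};

/* Hypothetical decoder state; only the pending flags are modelled. */
typedef struct DecoderSketch {
    uint32_t event_flags; /* flags accumulated since the last fetch */
} DecoderSketch;

/* Called by the decoder when a picture carrying an event is queued. */
static void raise_event(DecoderSketch *d, uint32_t flags) {
    d->event_flags |= flags;
}

/* Hand the pending flags to the caller and reset them
 * (fetch-and-clear is an assumption of this sketch). */
static uint32_t fetch_event_flags(DecoderSketch *d) {
    const uint32_t flags = d->event_flags;
    d->event_flags = 0;
    return flags;
}
```

A caller would typically fetch the flags right after receiving a picture and test them with a bitwise AND against the flag of interest.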
|
|
Add buffer pools for miscellaneous smaller buffers that are
repeatedly being freed and reallocated.
Also improve dav1d_ref_create() by consolidating two separate
memory allocations into a single one.
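The consolidation idea can be sketched as follows. `RefSketch`, `ref_create` and `ref_destroy` are hypothetical names used for illustration, and the layout (payload first, bookkeeping struct after it) is one possible choice, not necessarily what dav1d_ref_create() does:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reference-counted buffer, not dav1d's actual Dav1dRef. */
typedef struct RefSketch {
    uint8_t *data;    /* payload, stored in the same allocation */
    size_t   size;    /* payload size in bytes */
    int      ref_cnt; /* number of owners */
} RefSketch;

/* Round n up to a multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align) {
    return (n + align - 1) & ~(align - 1);
}

/* One malloc() covers both the payload and the bookkeeping struct.
 * The payload goes first so it inherits malloc's alignment; the struct
 * follows at an offset rounded up to its own alignment. */
static RefSketch *ref_create(size_t size) {
    const size_t hdr_off = round_up(size, _Alignof(RefSketch));
    uint8_t *const base = malloc(hdr_off + sizeof(RefSketch));
    if (!base) return NULL;
    RefSketch *const ref = (RefSketch *)(base + hdr_off);
    ref->data = base;
    ref->size = size;
    ref->ref_cnt = 1;
    return ref;
}

/* Freeing the payload pointer releases the struct as well. */
static void ref_destroy(RefSketch *ref) {
    free(ref->data);
}
```

Compared with two separate allocations, this halves the number of allocator calls per reference and keeps the header close to its data.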
|
|
Reuse buffers allocated for picture data instead of constantly
freeing and allocating new ones.
The impact of this can vary significantly between systems. It is
particularly beneficial on Windows, where it can result in an overall
performance increase of up to 10% in some cases.
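As an illustration of buffer reuse, here is a minimal single-threaded free-list pool. `PoolSketch`, `pool_get`, `pool_put` and `pool_drain` are hypothetical names; a real picture pool would additionally need locking and a stronger payload alignment than this sketch provides:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header placed in front of every pooled buffer. */
typedef struct PoolBuf {
    struct PoolBuf *next; /* next free buffer, valid only while pooled */
} PoolBuf;

/* Hypothetical pool of equally sized buffers kept on a free list. */
typedef struct PoolSketch {
    PoolBuf *free_list;
    size_t   buf_size; /* payload size in bytes */
} PoolSketch;

/* Return a buffer, reusing a pooled one when available. */
static void *pool_get(PoolSketch *pool) {
    PoolBuf *buf = pool->free_list;
    if (buf) {
        pool->free_list = buf->next;          /* reuse, no allocator call */
    } else {
        buf = malloc(sizeof(PoolBuf) + pool->buf_size);
        if (!buf) return NULL;
    }
    return buf + 1;                           /* payload follows the header */
}

/* Give a buffer back to the pool instead of freeing it. */
static void pool_put(PoolSketch *pool, void *data) {
    PoolBuf *buf = (PoolBuf *)data - 1;
    buf->next = pool->free_list;
    pool->free_list = buf;
}

/* Free every pooled buffer, e.g. when the decoder is closed. */
static void pool_drain(PoolSketch *pool) {
    while (pool->free_list) {
        PoolBuf *buf = pool->free_list;
        pool->free_list = buf->next;
        free(buf);
    }
}
```

Once the pool is warm, the steady-state decode loop no longer touches the system allocator for picture data, which is where the gain described above would come from.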
|
|
Memory addresses with certain power-of-two offsets will map to the
same set of cache lines. Using such offsets as strides will cause
excessive cache evictions resulting in more cache misses.
Avoid this by adding a small padding when the stride is a multiple
of 1024 (somewhat arbitrarily chosen as the specific number depends
on the hardware implementation) when allocating picture buffers.
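The padding rule can be sketched as below. The helper name `pad_stride` and the 64-byte padding amount (`PICTURE_ALIGNMENT`) are assumptions for illustration; only the multiple-of-1024 check comes from the description above:

```c
#include <stddef.h>

/* Hypothetical padding unit in bytes; chosen here to match a common
 * cache-line size so the padded stride stays aligned. */
#define PICTURE_ALIGNMENT 64

/* Return a stride that is never a multiple of 1024 bytes. */
static ptrdiff_t pad_stride(ptrdiff_t stride) {
    if (!(stride & 1023))            /* stride is a multiple of 1024 */
        stride += PICTURE_ALIGNMENT; /* shift rows off the shared cache sets */
    return stride;
}
```

For example, a 1024-byte stride becomes 1088 bytes, while a 1920-byte stride is left unchanged.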
|
|
When compiling in release mode, instead of just deleting assertions,
use them to give hints to the compiler. This allows for slightly
better code generation in some cases.
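One way to express this is a macro along the following lines. `HINT_ASSERT` and `sum_block` are hypothetical names, and the use of `__builtin_unreachable()` assumes a GCC- or Clang-compatible compiler; other compilers fall back to a no-op in release builds:

```c
#include <stdio.h>
#include <stdlib.h>

#if defined(NDEBUG) && defined(__GNUC__)
/* Release build: tell the optimizer the condition always holds. */
#define HINT_ASSERT(x) do { if (!(x)) __builtin_unreachable(); } while (0)
#elif defined(NDEBUG)
/* Release build without the builtin: drop the check entirely. */
#define HINT_ASSERT(x) ((void)0)
#else
/* Debug build: check the condition and abort on failure. */
#define HINT_ASSERT(x) do {                                       \
        if (!(x)) {                                               \
            fprintf(stderr, "Assertion failed: %s (%s:%d)\n",     \
                    #x, __FILE__, __LINE__);                      \
            abort();                                              \
        }                                                         \
    } while (0)
#endif

/* Hypothetical example: the hint lets the compiler assume that n is a
 * positive multiple of 4, which can simplify a vectorized loop. */
static int sum_block(const int *v, int n) {
    HINT_ASSERT(n > 0 && (n & 3) == 0);
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += v[i];
    return sum;
}
```

The trade-off is that a violated assertion becomes undefined behavior in release builds, so the asserted conditions must be true invariants rather than input validation.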
|
|
Based on a patch by Renato Cassaca.
|
|
The Doxygen documentation for Dav1dPicAllocator.alloc_picture_callback()
states that the value returned on error must be a negative errno value.
Propagate it in picture_alloc_with_edges() as well.
|
|
allocating pictures
|
|
Allows simplified SIMD function implementations that don't exactly
respect picture boundaries when reading picture data. Fixes #251 and
#250.
|
|
dav1d_picture_alloc_copy()
The references in the Dav1dContext may not necessarily apply to the picture being copied.
|
|
It's called from a single function in the entire codebase, so no point
passing so many individual arguments to it when almost all of them are
derived from a single struct.
|
|
Unlikely to cause problems in practice, but technically required, since the
compiler is free, for example, to use aligned AVX stores to zero
stack-allocated local variables (when the appropriate compiler flags are used).
|
|
Also remove redundant entries from Dav1dPictureParameters, and move
documentation of these fields into Dav1dFrame/SequenceHeader instead.
|
|
The old flushing logic would simply leave frame threads (and tile
threads) running without caring how much latency that might impose
in the post-seek time-to-first-frame. This commit adds a 'flush'
state that will abort all running frame/tile threads from decoding
their current frame, as well as dispose of all frames in the output
queue.
Then, we use dav1d_flush() in dav1d_close() to abort running threads
on exit, instead of signaling their respective dependents to prevent
deadlocks. The advantage of this approach is that we don't signal on
objects we don't have ownership over, and thus this should prevent
race conditions where the owning thread could dispose of the object
just as we're signaling it, which I believe is what causes #193.
|
|
Fixes #59.
|
|
Fixed 00000802.ivf.
|
|
Also ensure we apply film-grain to delayed pictures.
|
|
This is using a slightly adapted version of my GPU-based algorithm. The
major difference to the algorithm suggested by the spec (and implemented
in libaom) is that instead of using a line buffer to hold the previous
row's film grain blocks, we compute each row/block fully independently.
This opens up the door to exploit parallelism in the future, since we
don't have any left->right or top->down dependency except for the PRNG
state. (Which we could pre-compute for a massively parallel / GPU
implementation)
That being said, it's probably somewhat slower than using a line buffer
for the serial / single CPU case, although most likely not by much
(since the areas with the most redundant work get progressively smaller,
down to a single 2x2 square for the worst case).
|
|
This becomes part of the picture properties, since users may want to
apply film grain themselves (e.g. for a GPU implementation).
|
|
Fixes #172.
|
|
Fixes #127
|
|
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
Fix #120.
|
|
"comparison between signed and unsigned integer expressions"
|
|