
github.com/videolan/dav1d.git
Age | Commit message | Author
2022-11-10 | Add info to dav1d_send_data docs [HEAD, master] | Charlie Hayden
2022-10-30 | build: drop -D_DARWIN_C_SOURCE on macOS/iOS after 6b611d36acab | Jan Beich
Already implied when -D_POSIX_C_SOURCE is not passed.
2022-10-30 | build: drop -D_POSIX_C_SOURCE on non-Linux after 6b611d36acab | Jan Beich
Non-GNU systems enable extensions (XSI, BSD, GNU) by default.
2022-10-27 | threading: Add a pending list for async task insertion | Victorien Le Couviour--Tuffet
2022-10-26 | Implement atomic_compare_exchange_strong in the atomic compat headers | Martin Storsjö
This fixes building with MSVC (and older GCC versions) after 3e7886db54d0cb3ce32909c71ad2a8c9d9eab223.
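As an illustration of what such a compat shim has to provide, here is a minimal sketch of C11 compare-exchange-strong semantics built on the legacy GCC `__sync_val_compare_and_swap` builtin. The function name `compat_cas_strong_int` is hypothetical and this is not dav1d's actual header; an MSVC variant would need a different intrinsic.

```c
#include <stdbool.h>

/* Hypothetical helper, not dav1d's compat header.
 * Mirrors atomic_compare_exchange_strong() for a plain int. */
bool compat_cas_strong_int(volatile int *obj, int *expected, int desired)
{
    const int want = *expected;
    /* Legacy GCC builtin: atomically stores `desired` if *obj == want and
     * returns the value *obj held before the operation. */
    const int prev = __sync_val_compare_and_swap(obj, want, desired);
    if (prev == want)
        return true;
    /* C11 semantics: on failure, report the observed value to the caller. */
    *expected = prev;
    return false;
}
```

On success the object holds `desired`; on failure `*expected` is updated to the value that was actually observed, which is what lets callers retry in a loop.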
2022-10-20 | threading: Fix a race around frame completion (frame-mt) | Victorien Le Couviour--Tuffet
The completion of the first frame to decode while an async reset request on that same frame is pending will render it stale. The processing of such a stale request is likely to result in a hang.
One reason this happens is the skip condition at the beginning of reset_task_cur().
=> Consume the async request before that check.
Another reason is several threads producing async reset requests in parallel: an async request for the first frame could cascade through the other threads (other frames) during completion of that frame, meaning not being caught by the last synchronous reset_task_cur() after signaling the main thread and before releasing the lock.
=> To solve this we need to add protections at the racy locations. That means after we increase first, before returning from reset_task_cur_async(), and after consuming the async request.
2022-10-10 | Handle host_machine.system() 'ios' and 'tvos' the same way as 'darwin' | Sebastian Dröge
Despite not being documented in Meson's list of canonical system names, Meson does accept 'ios', mostly as a synonym for 'darwin'. Using 'ios' instead of 'darwin' allows distinguishing between the two in the cases where that is necessary. Therefore, within dav1d, allow using the 'ios' name as an alias for 'darwin' for the system name, to allow using cross files that make this distinction. Meson itself also allows 'tvos' in addition to 'ios' in the internal `is_darwin()` function, so all 3 are handled the same here.
2022-09-30 | x86: Add 10-bit 8x8/8x16/16x8/16x16 itx AVX-512 (Ice Lake) asm | Henrik Gramner
2022-09-30 | Specify hidden visibility for global data symbol declarations | Henrik Gramner
'-fvisibility=hidden' only applies to definitions, not declarations, so the compiler has to be conservative about how references to global data symbols are performed. Explicitly specifying the visibility allows for better code generation.
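A minimal sketch of the idea, assuming a GCC-compatible compiler: the visibility attribute is attached to the `extern` declaration itself, so code that only sees the declaration can still assume the symbol is local to the shared object. The macro `EXTERN_HIDDEN` and the symbol `example_table` are hypothetical names, not taken from dav1d.

```c
/* Hypothetical macro and symbol names, for illustration only. */
#if defined(__GNUC__) || defined(__clang__)
#define EXTERN_HIDDEN extern __attribute__((visibility("hidden")))
#else
#define EXTERN_HIDDEN extern
#endif

/* Declaration, as it would appear in a header. */
EXTERN_HIDDEN const unsigned char example_table[4];

/* Definition, as it would appear in one source file. */
const unsigned char example_table[4] = { 1, 2, 4, 8 };

/* A reference to the table from code that only needs the declaration. */
unsigned example_lookup(unsigned idx)
{
    return example_table[idx & 3];
}
```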
2022-09-28 | build: strip() the result of cc.get_define() | Henrik Gramner
Whitespace is added to the result if compiling with MSVC using /std:c11 which breaks various things. Adding strip() fixes the problem.
2022-09-28 | checkasm: Move printf format string to .rodata on x86 | Henrik Gramner
2022-09-28 | checkasm: Improve 32-bit parameter clobbering on x86-64 | Henrik Gramner
Use explicit parameter type detection and manually clobber the upper bits instead of relying on internal compiler behavior.
2022-09-26 | x86: Fix incorrect 32-bit parameter usage in high bit-depth AVX-512 mc | Henrik Gramner
The 32-bit width parameter was used directly as a pointer offset, but the upper half is undefined. Fix it by replacing 'cmp' with 'sub' to explicitly zero those bits.
2022-09-19 | arm: itx: Add clipping to row_clip_min/max in the 10 bpc codepaths | Martin Storsjö
This fixes conformance with the argon test samples, in particular with these samples:
profile0_core/streams/test10100_579_8614.obu
profile0_core/streams/test10218_6914.obu

This gives a pretty notable slowdown to these transforms - some examples:

Before:                                   Cortex A53      A72      A73  Apple M1
inv_txfm_add_8x8_dct_dct_1_10bpc_neon:         365.7    290.2    299.8       0.3
inv_txfm_add_16x16_dct_dct_2_10bpc_neon:      1865.2   1384.1   1457.5       2.6
inv_txfm_add_64x64_dct_dct_4_10bpc_neon:     33976.3  26817.0  24864.2      40.4
After:
inv_txfm_add_8x8_dct_dct_1_10bpc_neon:         397.7    322.2    335.1       0.4
inv_txfm_add_16x16_dct_dct_2_10bpc_neon:      2121.9   1336.7   1664.6       2.6
inv_txfm_add_64x64_dct_dct_4_10bpc_neon:     38569.4  27622.6  28176.0      51.0

Thus, for the transforms alone, it makes them around 10-13% slower (the Apple M1 measurements are too noisy to be conclusive here). Measured on actual full decoding, it makes decoding of 10 bpc Chimera around maybe 1% slower on an Apple M1 - close to measurement noise anyway.
2022-09-19 | x86: Fix overflows in 12bpc AVX2 IDCT/IADST | Henrik Gramner
2022-09-19 | x86: Fix overflows in 12bpc AVX2 DC-only IDCT | Henrik Gramner
Using smaller immediates also results in a small code size reduction in some cases, so apply those changes to the (10bpc-only) SSE code as well.
2022-09-19 | x86: Fix clipping in high bit-depth AVX2 4x16 IDCT | Henrik Gramner
Certain clips were incorrectly performed on negated values, which caused things to be off-by-one in both directions. Correct this by negating such values prior to clipping instead of afterwards.
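The off-by-one follows from the clip range being asymmetric. The scalar sketch below is only an illustration: the function names and the signed 16-bit range used in the example are assumptions, and the real fix is in AVX2 assembly.

```c
/* Scalar illustration only; names and ranges are hypothetical. */
int clip_int(int v, int lo, int hi)
{
    return v < lo ? lo : v > hi ? hi : v;
}

/* Clip first, negate afterwards: the order described as incorrect. */
int negate_after_clip(int v, int lo, int hi)
{
    return -clip_int(v, lo, hi);
}

/* Negate first, then clip: the result always lies in [lo, hi]. */
int negate_before_clip(int v, int lo, int hi)
{
    return clip_int(-v, lo, hi);
}
```

With the range [-32768, 32767], an input of -40000 yields 32768 when negated after clipping (one above the maximum) and an input of 40000 yields -32767 (one short of the minimum), while negating before clipping yields 32767 and -32768.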
2022-09-15 | Don't use gas-preprocessor with clang-cl for arm targets | Martin Storsjö
Since meson 0.58.0 (released in May 2021), meson accepts adding '.S' assembly files as source files to the clang-cl compiler. If using an older version of meson, keep using gas-preprocessor just like for MSVC builds.
2022-09-15 | Fix checking the reference dimensions for the projection process | David Conrad
Section 7.9.2 returns 0 "If RefMiRows[ srcIdx ] is not equal to MiRows, RefMiCols[ srcIdx ] is not equal to MiCols". dav1d was comparing pixel width/height, not block width/height, so conform to the spec.
2022-09-15 | Fix calculation of OBMC lap dimensions | David Conrad
Individual OBMC lapped predictions have a max width of 64 pixels for the top lap and a max height of 64 pixels for the left laps.
This is from 7.11.3.9, Overlapped motion compensation process:
step4 = Clip3( 2, 16, Num_4x4_Blocks_Wide[ candSz ] )
dav1d wasn't clipping this as needed, which means that with scaled MC, the interpolation of the 2nd half of a 128 block was incorrect, since mx/my for subpel filter selection need to be reset at the 64 pixel boundary.
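The clip quoted above can be sketched in C as follows. `clip3` follows the spec's argument order (low, high, value); the helper names and the conversion to pixels (4 pixels per 4x4 unit) are illustrative assumptions, not dav1d code.

```c
/* Clip3(lo, hi, v) with the spec's argument order. */
int clip3(int lo, int hi, int v)
{
    return v < lo ? lo : v > hi ? hi : v;
}

/* step4 for a candidate block that is `num_4x4_blocks_wide` units wide. */
int obmc_step4(int num_4x4_blocks_wide)
{
    return clip3(2, 16, num_4x4_blocks_wide);
}

/* The same step expressed in pixels, assuming 4-pixel units. */
int obmc_step_pixels(int num_4x4_blocks_wide)
{
    return 4 * obmc_step4(num_4x4_blocks_wide);
}
```

A 128-pixel-wide candidate is 32 units wide, so the step is clipped to 16 units, which is the 64-pixel limit mentioned above.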
2022-09-15 | Support film grain application whose only effect is clipping to video range | David Conrad
This is the parameter combination:
num_y_points == 0 && num_cb_points == 0 && num_cr_points == 0 && chroma_scaling_from_luma == 1 && clip_to_restricted_range == 1
Film grain application has two effects: adding noise, and optionally clipping to video range.
For luma, the spec skips film grain application if there's no noise (num_y_points == 0), but for chroma, it's only skipped if there's no chroma noise *and* chroma_scaling_from_luma is false.
This means it's possible for there to be no noise (num_*_points = 0), but if clip_to_restricted_range is true then chroma pixels can be clipped to video range, if chroma_scaling_from_luma is true. Luma pixels, however, aren't clipped to video range unless there's noise to apply.
dav1d currently skips applying film grain entirely if there is no noise, regardless of the secondary clipping.
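The per-plane skip rules described above can be written down as a small sketch. The field names come from the commit message; the `GrainParams` struct and the function names are hypothetical and do not reflect dav1d's actual data structures.

```c
#include <stdbool.h>

/* Hypothetical struct; only the field names come from the commit message. */
typedef struct {
    int num_y_points, num_cb_points, num_cr_points;
    bool chroma_scaling_from_luma;
    bool clip_to_restricted_range;
} GrainParams;

/* Luma is skipped whenever there is no luma noise. */
bool luma_pass_needed(const GrainParams *p)
{
    return p->num_y_points > 0;
}

/* Chroma is skipped only if there is no chroma noise *and*
 * chroma_scaling_from_luma is false. */
bool cb_pass_needed(const GrainParams *p)
{
    return p->num_cb_points > 0 || p->chroma_scaling_from_luma;
}

bool cr_pass_needed(const GrainParams *p)
{
    return p->num_cr_points > 0 || p->chroma_scaling_from_luma;
}

/* Film grain can be skipped entirely only when no plane needs a pass. */
bool grain_can_be_skipped(const GrainParams *p)
{
    return !luma_pass_needed(p) && !cb_pass_needed(p) && !cr_pass_needed(p);
}
```

For the parameter combination listed above, the luma pass is skipped but both chroma passes still run, so an early-out based only on the point counts would be wrong.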
2022-09-15 | Ignore T.35 metadata if the OBU contains no payload | David Conrad
The syntax of itu_t_t35_payload_bytes is not defined in the AV1 specification, but it does state that decoders should ignore the entire OBU if they do not understand it.
2022-09-15 | Fix chroma deblock filter size calculation for lossless | David Conrad
In section 5.11.34, txSz is always defined to TX_4X4 if Lossless is true. Chroma deblock filter size calculation needs to use this overridden txSz when lossless is enabled.
2022-09-15 | Fix rounding in the calculation of initialSubpelX | David Conrad
The spec divides err by two, rounding toward zero, instead of using >>1, which rounds towards negative infinity.
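The difference only shows up for negative odd values. A minimal sketch, with hypothetical function names, and assuming an arithmetic right shift (right-shifting a negative value is implementation-defined in C, but mainstream compilers behave this way):

```c
/* Integer division truncates toward zero. */
int half_toward_zero(int err)
{
    return err / 2;
}

/* Arithmetic right shift rounds toward negative infinity.
 * Assumes the compiler implements >> on negative ints as an
 * arithmetic shift, which is implementation-defined in C. */
int half_toward_neg_inf(int err)
{
    return err >> 1;
}
```

For err = -3 the division gives -1 while the shift gives -2; for non-negative or even inputs both agree.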
2022-09-15 | Fix overflow when saturating dequantized coefficients clipped to 0 | David Conrad
It's possible to encode a large coefficient that becomes 0 after the clipping in dequant (Abs( dq ) & 0xFFFFFF), e.g. 0x1000000. After that &0xFFFFFF, coeffs are saturated in the range of [-(1 << (bitdepth+7)), 1 << (bitdepth+7)).
dav1d implements this saturation via umin(dq - sign, cf_max), then applies the sign afterwards via xor. However, for dq = 0 and sign = 1, this step evaluates to umin(UINT_MAX, cf_max) == cf_max instead of the expected 0.
So instead, do unsigned saturate as umin(dq, cf_max + sign), then apply sign via (sign ? -dq : dq).
On arm this is the same number of instructions, since cneg exists and is used. On x86 this requires an additional instruction, but this isn't a latency-critical path.
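The two variants can be compared in a scalar sketch. This is an illustration rather than dav1d's code: the function names are hypothetical, `cf_max` is assumed to be `(1 << (bitdepth + 7)) - 1`, and the xor is assumed to be with `-sign` (a conditional bitwise NOT).

```c
/* Scalar illustration; names are hypothetical. */
unsigned umin(unsigned a, unsigned b)
{
    return a < b ? a : b;
}

/* Old scheme: saturate (dq - sign), then apply the sign via xor.
 * For dq == 0 and sign == 1, dq - sign wraps around to UINT_MAX. */
int saturate_old(unsigned dq, unsigned sign, unsigned cf_max)
{
    const unsigned v = umin(dq - sign, cf_max);
    /* xor with all-ones computes ~v, i.e. -(v + 1) in two's complement. */
    return (int)(v ^ (0u - sign));
}

/* New scheme: saturate dq against (cf_max + sign), then negate if needed. */
int saturate_new(unsigned dq, unsigned sign, unsigned cf_max)
{
    const unsigned v = umin(dq, cf_max + sign);
    return sign ? -(int)v : (int)v;
}
```

With cf_max = 32767 (the assumed 8-bit value), `saturate_old(0, 1, 32767)` returns -32768 where 0 is expected, while `saturate_new` returns 0; for other inputs the two agree.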
2022-09-15 | Fix overflow in 8-bit NEON ADST | David Conrad
In 8-bit adst, it's possible that the final Round2(x[0], 12) can exceed 16-bits signed.
Specifically, in 7.13.2.6. Inverse ADST4 process, the precision requirement is: "It is a requirement of bitstream conformance that all values stored in the s and x arrays by this process are representable by a signed integer using r + 12 bits of precision."
For 8 bits, r is 16 for both row and column, so x[] can be 28-bit signed. For values [134215680, 134217727] (within 2047 of the maximum 28-bit value), the final Round2(x[0], 12) evaluates to 32768, exceeding 16-bits signed.
So switch to using sqrshrn, which saturates to 16-bits signed.
This is a continuation of: Commit b53ff29d80a21180e5ad9bbe39a02541151f4f53 arm: itx: Do clipping in all narrowing downshifts
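The arithmetic can be checked with a scalar model. The sketch below is not the NEON code; `round2` follows the Round2 definition used in the text, and `round2_sat16` is a hypothetical helper that models a rounding shift followed by saturation to 16 bits, which is the behavior the commit relies on from sqrshrn.

```c
#include <assert.h>
#include <stdint.h>

/* Round2(x, n): add half of the divisor, then shift right by n.
 * Assumes an arithmetic right shift for negative x. */
int32_t round2(int32_t x, int n)
{
    assert(n > 0 && n < 31);
    return (x + (1 << (n - 1))) >> n;
}

/* Rounding shift followed by saturation to the signed 16-bit range. */
int16_t round2_sat16(int32_t x, int n)
{
    const int32_t r = round2(x, n);
    if (r > INT16_MAX) return INT16_MAX;
    if (r < INT16_MIN) return INT16_MIN;
    return (int16_t)r;
}
```

134215680 + 2048 is exactly 2^27, so `round2(134215680, 12)` is 32768, one more than INT16_MAX; the saturating variant returns 32767 instead.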
2022-09-14 | tools: Allocate the priv structs with proper alignment | Martin Storsjö
Previously, they could be allocated with any random alignment matching the end of the MuxerContext/DemuxerContext. The priv structs themselves can have members that require specific alignment, or at least the default alignment of malloc()/calloc() (which is sufficient for native types such as uint64_t and doubles). This fixes crashes in some arm builds, where GCC (correctly) wants to use 64 bit aligned stores to write to MD5Context.
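One common way to get this right is to round the offset of the private struct up to a suitable alignment before carving it out of the same allocation. The sketch below shows that general technique under stated assumptions: `PRIV_ALIGN`, `align_up` and `alloc_ctx_with_priv` are hypothetical names, 16 bytes is an assumed alignment, and this is not necessarily how the dav1d tools were changed.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed alignment; must be a power of two. */
enum { PRIV_ALIGN = 16 };

/* Round n up to the next multiple of a (a must be a power of two). */
size_t align_up(size_t n, size_t a)
{
    return (n + a - 1) & ~(a - 1);
}

/* Allocate a context of ctx_size bytes followed by a priv struct of
 * priv_size bytes, with the priv struct starting at an aligned offset.
 * Returns the context pointer (free() it when done), or NULL on failure. */
void *alloc_ctx_with_priv(size_t ctx_size, size_t priv_size, void **priv)
{
    const size_t priv_offset = align_up(ctx_size, PRIV_ALIGN);
    unsigned char *const base = calloc(1, priv_offset + priv_size);
    if (!base)
        return NULL;
    *priv = base + priv_offset;
    return base;
}
```

Because calloc() returns memory aligned for any native type, an offset that is a multiple of the required alignment keeps the priv struct aligned for its members as well, provided that alignment does not exceed what the allocator guarantees.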
2022-09-12 | x86: Fix clipping in 10bpc SSE4.1 IDCT asm | Henrik Gramner
2022-09-10 | build: Improve Windows linking options | Henrik Gramner
2022-09-09 | tools: Improve demuxer probing | Henrik Gramner
Increase the probing size, and change the logic to assume a stream is valid even if no conclusive decision could be made within the probing window as long as a sequence header was detected.
2022-09-09 | CI: Disable trimming on some tests | Matthias Dressel
Allow checkasm to run.
2022-09-09 | CI: Remove git 'safe.directory' config | Matthias Dressel
It is now handled by the gitlab runner. Ref: 7d859f9c728e5042f9e1fbb98625d624c489a50e
2022-09-09 | gcovr: Ignore parsing errors | Matthias Dressel
2022-09-09 | crossfiles: Update Android toolchains | Matthias Dressel
* Android armv7: target API 19 since it's the lowest directly provided by the new NDK.
* Newer NDK has generic tools for ar, strip, etc.
* Remove windres as it's only relevant for Windows targets.
2022-09-09 | CI: Update images | Matthias Dressel
Remove experimental since gcc12, clang14, mold are now in unstable.
2022-09-08 | threading: Limit the progress bitfields to the used size | Victorien Le Couviour--Tuffet
Store the used size instead of the allocated size. The used size can be smaller than the allocated size, which results in a wrong computation of the linear progress from the frame_progress bitfield.
2022-09-08 | x86: Fix rare crash in chroma film grain asm | Henrik Gramner
The width parameter is used directly as a pointer offset, so ensure that it has an appropriately sized data type. This has been done previously for luma, but chroma was overlooked.
2022-09-07 | x86: Fix overflows in 12bpc AVX2 identity itx asm | Henrik Gramner
2022-09-07 | x86: Fix an alignment issue in 8-bit AVX-512 loop restoration | Henrik Gramner
We don't have a separate 8-bit AVX-512 5-tap Wiener filter so the 7-tap function is used for chroma as well, and in some esoteric edge cases chroma dst pointers may only have a 32-byte alignment despite having a width larger than 32, so use an unaligned store as a workaround.
2022-09-02 | checkasm: Add short options | Victorien Le Couviour--Tuffet
2022-09-02 | checkasm: Add pattern matching to --test | Victorien Le Couviour--Tuffet
2022-09-02 | checkasm: Remove pattern matching from --bench | Victorien Le Couviour--Tuffet
The pattern matching feature has been improved and is now performed under the new --function parameter, rendering this one obsolete.
2022-09-02 | checkasm: Add a --function option | Victorien Le Couviour--Tuffet
Allows running checkasm only for functions matching a given pattern.
2022-08-30 | threading: Fix copy_lpf_progress initialization | Victorien Le Couviour--Tuffet
The copy_lpf_progress bitfield might not be fully cleared when size goes down. Credit to Oss-Fuzz.
2022-08-19 | data: don't overwrite the Dav1dDataProps size value | James Almer
Fixes a regression since commit 3d3c51a07cc3dd1e3687da40fdb6fbb857cbced1.
2022-07-25 | Adjust inlining attributes on some functions | Henrik Gramner
The code size increase of inlining every call to certain functions isn't a worthwhile trade-off, and most compilers actually end up overriding those particular inlining hints anyway. In some cases it's also better to split the function into separate luma and chroma functions.
2022-07-19 | x86: Remove leftover instruction in loopfilter AVX2 asm | Henrik Gramner
In 0aca76c, sequences of pand/pandn/por were replaced by pblendvb, but one instruction (which now acts as a no-op) was accidentally left in.
2022-07-14 | Enable pointer authentication in assembly when building arm64e | David Conrad
2022-07-11 | Don't trash the return stack buffer in the NEON loop filter | David Conrad
The NEON loop filter's innermost asm function can return to a different location than the address that called it. This messes up the return stack predictor, causing returns to be mispredicted.
Instead, rework the function to always return to the address that calls it, and return the information needed for the caller to short-circuit storing pixels.
2022-07-06 | CI: Removed snap package generation | Konstantin Pavlov
The snapcraft version we use is no longer compatible with the authentication schemes the snap store uses. This could be fixed by updating the snapcraft inside the docker image, but Ubuntu no longer ships an up-to-date snapcraft version in their own repositories. The other way to install snapcraft is to manually fetch the project and core snaps just like we do in https://code.videolan.org/videolan/docker-images/-/blob/master/vlc-ubuntu-focal/Dockerfile, but that currently fails on Jammy due to a conflict in Python versions between what is shipped in Jammy and inside the snapcraft project. All in all, snapcraft seems to be abandoned for our CI use-case, and the usefulness of the dav1d snap is disputable, so just drop it altogether. Packaging is still available in package/snap/ for the brave souls who want to build it on their own.