1 files changed, 308 insertions, 0 deletions
diff --git a/third_party/dav1d/NEWS b/third_party/dav1d/NEWS
new file mode 100644
index 0000000000..45ffd4f8f6
--- /dev/null
+++ b/third_party/dav1d/NEWS
@@ -0,0 +1,308 @@
+Changes for 1.1.0 'Arctic Peregrine Falcon':
+-------------------------------------------
+
+1.1.0 is an important release of dav1d, fixing numerous bugs and adding SIMD optimizations:
+
+- New function dav1d_get_frame_delay to query the decoder frame delay
+- Numerous fixes for strict conformity to the specs and samples
+- NEON and AVX-512 misc fixes and improvements
+- Partial AVX2 12bpc transform implementations
+- AVX-512 high bit-depth cdef_filter, loopfilter, itx
+- NEON z1/z3 optimization for 8bpc
+- SSSE3 z1 optimization for 8bpc
+
+ "From VideoLAN with love"
+
+
+Changes for 1.0.0 'Peregrine Falcon':
+-------------------------------------
+
+1.0.0 is a major release of dav1d, adding important features and bug fixes.
+
+It notably changes the way threading works by adding automatic thread
+management.
+
+It also adds support for AVX-512 acceleration, and adds speedups to existing x86
+code (from SSE2 to AVX2).
+
+1.0.0 adds a new grain API to ease acceleration on the GPU, and adds an API call
+to report which frame failed to decode in error cases.
+
+Finally, 1.0.0 fixes numerous small bugs reported since the beginning of the
+project, making it a proper release.
+
+                                     .''.
+         .''.      .        *''*    :_\/_:     .
+        :_\/_:   _\(/_  .:.*_\/_*   : /\ :  .'.:.'.
+    .''.: /\ :   ./)\   ':'* /\ * :  '..'.  -=:o:=-
+   :_\/_:'.:::.    ' *''*    * '.\'/.' _\(/_'.':'.'
+   : /\ : :::::     *_\/_*     -= o =-  /)\    '  *
+    '..'  ':::'     * /\ *     .'/.\'.   '
+        *            *..*         :
+          *                       :
+          *         1.0.0
+
+
+
+Changes for 0.9.2 'Golden Eagle':
+---------------------------------
+
+0.9.2 is a small update of dav1d on the 0.9.x branch:
+ - x86: SSE4 optimizations of inverse transforms for 10bit for all sizes
+ - x86: mc.resize optimizations with AVX2/SSSE3 for 10/12b
+ - x86: SSSE3 optimizations for cdef_filter in 10/12b and mc_w_mask_422/444 in 8b
+ - ARM NEON optimizations for FilmGrain Gen_grain functions
+ - Optimizations for splat_mv in SSE2/AVX2 and NEON
+ - x86: SGR improvements for SSSE3 CPUs
+ - x86: AVX2 optimizations for cfl_ac
+
+
+Changes for 0.9.1 'Golden Eagle':
+---------------------------------
+
+0.9.1 is a medium-size revision of dav1d, notably adding 10b acceleration for SSSE3:
+ - 10/12b SSSE3 optimizations for mc (avg, w_avg, mask, w_mask, emu_edge),
+   prep/put_bilin, prep/put_8tap, ipred (dc/h/v, paeth, smooth, pal, filter), wiener,
+   sgr (10b), warp8x8, deblock, film_grain, cfl_ac/pred for 32bit and 64bit x86 processors
+ - Film grain NEON for fguv 10/12b, fgy/fguv 8b and fgy/fguv 10/12 arm32
+ - Fixes for filmgrain on ARM
+ - itx 10bit optimizations for 4x4/x8/x16, 8x4/x8/x16 for SSE4
+ - Misc improvements on SSE2, SSE4
+
+
+Changes for 0.9.0 'Golden Eagle':
+---------------------------------
+
+0.9.0 is a major version of dav1d, notably adding 10b acceleration on x64.
+
+Details:
+ - x86 (64bit) AVX2 implementation of most 10b/12b functions, which should provide
+   a large boost for high-bitdepth decoding on modern x86 computers and servers.
+ - ARM64 NEON implementation of FilmGrain (4:2:0/4:2:2/4:4:4 8bit)
+ - New API to signal events happening during the decoding process
+
+
+Changes for 0.8.2 'Eurasian Hobby':
+-----------------------------------
+
+0.8.2 is a medium-size update of the 0.8.0 branch:
+ - ARM32 optimizations for ipred and itx in 10/12bits,
+   completing the 10b/12b work on ARM64 and ARM32
+ - Give the post-filters their own threads
+ - ARM64: rewrite the wiener functions
+ - Speed up coefficient decoding, 0.5%-3% global decoding gain
+ - x86 optimizations for CDEF_filter and wiener in 10/12bit
+ - x86: rewrite the SGR AVX2 asm
+ - x86: improve msac speed on SSE2+ machines
+ - ARM32: improve speed of ipred and warp
+ - ARM64: improve speed of ipred, cdef_dir, cdef_filter, warp_motion and itx16
+ - ARM32/64: improve speed of looprestoration
+ - Add seeking, pausing to the player
+ - Update the player for rendering of 10b/12b
+ - Misc speed improvements and fixes on all platforms
+ - Add an xxh3 muxer in the dav1d application
+
+
+Changes for 0.8.1 'Eurasian Hobby':
+-----------------------------------
+
+0.8.1 is a minor update on 0.8.0:
+ - Keep references to buffers valid after dav1d_close(). Fixes a regression
+   caused by the picture buffer pool added in 0.8.0.
+ - ARM32 optimizations for 10bit bitdepth for SGR
+ - ARM32 optimizations for 16bit bitdepth for blend/w_mask/emu_edge
+ - ARM64 optimizations for 10bit bitdepth for SGR
+ - x86 optimizations for wiener in SSE2/SSSE3/AVX2
+
+
+Changes for 0.8.0 'Eurasian Hobby':
+-----------------------------------
+
+0.8.0 is a major update for dav1d:
+ - Improve the performance by using a picture buffer pool;
+   the improvements can reach 10% in some cases on Windows.
+ - Support for Apple ARM Silicon
+ - ARM32 optimizations for 8bit bitdepth for ipred paeth, smooth, cfl
+ - ARM32 optimizations for 10/12/16bit bitdepth for mc_avg/mask/w_avg,
+   put/prep 8tap/bilin, wiener and CDEF filters
+ - ARM64 optimizations for cfl_ac 444 for all bitdepths
+ - x86 optimizations for MC 8-tap, mc_scaled in AVX2
+ - x86 optimizations for CDEF in SSE and {put/prep}_{8tap/bilin} in SSSE3
+
+
+Changes for 0.7.1 'Frigatebird':
+--------------------------------
+
+0.7.1 is a minor update on 0.7.0:
+ - ARM32 NEON optimizations for itxfm, which can give up to 28% speedup, and MSAC
+ - SSE2 optimizations for prep_bilin and prep_8tap
+ - AVX2 optimizations for MC scaled
+ - Fix a clamping issue in motion vector projection
+ - Fix an issue on some specific Haswell CPUs in the ipred_z AVX2 functions
+ - Improvements on the dav1dplay utility player to support resizing
+
+
+Changes for 0.7.0 'Frigatebird':
+--------------------------------
+
+0.7.0 is a major release for dav1d:
+ - Faster refmv implementation, gaining up to 12% in speed while using 25% less RAM (single thread)
+ - 10b/12b ARM64 optimizations are mostly complete:
+   - ipred (paeth, smooth, dc, pal, filter, cfl)
+   - itxfm (only 10b)
+ - AVX2/SSSE3 for non-4:2:0 film grain and for mc.resize
+ - AVX2 for cfl 4:4:4
+ - AVX-512 CDEF filter
+ - ARM64 8b improvements for cfl_ac and itxfm
+ - ARM64 implementation for emu_edge in 8b/10b/12b
+ - ARM32 implementation for emu_edge in 8b
+ - Improvements on the dav1dplay utility player to support 10 bit,
+   non-4:2:0 pixel formats and film grain on the GPU
+
+
+Changes for 0.6.0 'Gyrfalcon':
+------------------------------
+
+0.6.0 is a major release for dav1d:
+ - New ARM64 optimizations for the 10/12bit depth:
+    - mc_avg, mc_w_avg, mc_mask
+    - mc_put/mc_prep 8tap/bilin
+    - mc_warp_8x8
+    - mc_w_mask
+    - mc_blend
+    - wiener
+    - SGR
+    - loopfilter
+    - cdef
+ - New AVX-512 optimizations for prep_bilin, prep_8tap, cdef_filter, mc_avg/w_avg/mask
+ - New SSSE3 optimizations for film grain
+ - New AVX2 optimizations for msac_adapt16
+ - Fix rare mismatches against the reference decoder, notably because of clipping
+ - Improvements on ARM64 on msac, cdef and looprestoration optimizations
+ - Improvements on AVX2 optimizations for cdef_filter
+ - Improvements in the C version for itxfm, cdef_filter
+
+
+Changes for 0.5.2 'Asiatic Cheetah':
+------------------------------------
+
+0.5.2 is a small release improving speed for ARM32 and adding minor features:
+ - ARM32 optimizations for loopfilter, ipred_dc|h|v
+ - Add section-5 raw OBU demuxer
+ - Improve the speed by reducing the L2 cache collisions
+ - Fix minor issues
+
+
+Changes for 0.5.1 'Asiatic Cheetah':
+------------------------------------
+
+0.5.1 is a small release improving speed and fixing minor issues
+compared to 0.5.0:
+ - SSE2 optimizations for CDEF, wiener and warp_affine
+ - NEON optimizations for SGR on ARM32
+ - Fix mismatch issue in x86 asm in inverse identity transforms
+ - Fix build issue in ARM64 assembly if debug info was enabled
+ - Add a workaround for Xcode 11 -fstack-check bug
+
+
+Changes for 0.5.0 'Asiatic Cheetah':
+------------------------------------
+
+0.5.0 is a medium release fixing regressions and minor issues,
+and improving speed significantly:
+ - Export ITU T.35 metadata
+ - Speed improvements on blend_ on ARM
+ - Speed improvements on decode_coef and MSAC
+ - NEON optimizations for blend*, w_mask_, ipred functions for ARM64
+ - NEON optimizations for CDEF and warp on ARM32
+ - SSE2 optimizations for MSAC hi_tok decoding
+ - SSSE3 optimizations for deblocking loopfilters and warp_affine
+ - AVX2 optimizations for film grain and ipred_z2
+ - SSE4 optimizations for warp_affine
+ - VSX optimizations for wiener
+ - Fix inverse transform overflows in x86 and NEON asm
+ - Fix integer overflows with large frames
+ - Improve film grain generation to match reference code
+ - Improve compatibility with older binutils for ARM
+ - More advanced Player example in tools
+
+
+Changes for 0.4.0 'Cheetah':
+----------------------------
+
+ - Fix playback with unknown OBUs
+ - Add an option to limit the maximum frame size
+ - SSE2 and ARM64 optimizations for MSAC
+ - Improve speed on 32-bit systems
+ - Optimization in obmc blend
+ - Reduce RAM usage significantly
+ - Initial PPC SIMD code, for cdef_filter
+ - NEON optimizations for blend functions on ARM
+ - NEON optimizations for w_mask functions on ARM
+ - NEON optimizations for inverse transforms on ARM64
+ - VSX optimizations for CDEF filter
+ - Improve handling of malloc failures
+ - Simple Player example in tools
+
+
+Changes for 0.3.1 'Sailfish':
+------------------------------
+
+ - Fix a buffer overflow in frame-threading mode on SSSE3 CPUs
+ - Reduce binary size, notably on Windows
+ - SSSE3 optimizations for ipred_filter
+ - ARM optimizations for MSAC
+
+
+Changes for 0.3.0 'Sailfish':
+------------------------------
+
+This is the final release for the numerous speed improvements of 0.3.0-rc.
+It mostly:
+ - Fixes an annoying crash on SSSE3 that happened in the itx functions
+
+
+Changes for 0.2.2 (0.3.0-rc) 'Antelope':
+----------------------------------------
+
+ - Large improvement on MSAC decoding with SSE, bringing a 4-6% speed increase.
+   The impact is significant on SSSE3, SSE4 and AVX2 CPUs
+ - SSSE3 optimizations for all block sizes in itx
+ - SSSE3 optimizations for ipred_paeth and ipred_cfl (420, 422 and 444)
+ - Speed improvements on CDEF for SSE4 CPUs
+ - NEON optimizations for SGR and loop filter
+ - Minor crash fixes, improvements and build changes
+
+
+Changes for 0.2.1 'Antelope':
+----------------------------
+
+ - SSSE3 optimization for cdef_dir
+ - AVX2 improvements of the existing CDEF optimizations
+ - NEON improvements of the existing CDEF and wiener optimizations
+ - Clarification about the numbering/versioning scheme
+
+
+Changes for 0.2.0 'Antelope':
+----------------------------
+
+ - ARM64 and ARM optimizations using NEON instructions
+ - SSSE3 optimizations for both 32-bit and 64-bit
+ - More AVX2 assembly, now nearly complete
+ - Fix installation of includes
+ - Rewrite inverse transforms to avoid overflows
+ - Snap packaging for Linux
+ - Updated API (ABI and API break)
+ - Fixes for un-decodable samples
+
+
+Changes for 0.1.0 'Gazelle':
+----------------------------
+
+Initial release of dav1d, the fast and small AV1 decoder.
+ - Support for all features of the AV1 bitstream
+ - Support for all bitdepths: 8, 10 and 12 bits
+ - Support for all chroma subsamplings 4:2:0, 4:2:2, 4:4:4 *and* grayscale
+ - Full acceleration for AVX2 64-bit processors, making it the fastest decoder
+ - Partial acceleration for SSSE3 processors
+ - Partial acceleration for NEON processors