# Test vectors for MS-XCA [de-]compression

There are currently two supported variants of the Xpress Compression
Algorithm, "Plain LZ77" and "LZ77 + Huffman". For each we have two
directories of files compressed on Windows, corresponding to the two
compression levels that Windows offers.

The subdirectories are

- ./decompressed - test files to compress, with .decomp extension.
- ./compressed-huffman - LZ77+Huffman compressed, with .lzhuff extension.
- ./compressed-more-huffman - LZ77+Huffman compressed, with .lzhuff extension.
- ./compressed-plain - Plain LZ77 compressed, with .lzplain extension.
- ./compressed-more-plain - Plain LZ77 compressed, with .lzplain extension.

where the compressed-more-* directories have the files that Windows put
more effort into compressing (largely in vain -- they are similar in
size). Windows probably does not use this more effortful compression in
network protocols, but these files must be decompressible.

The compressed files were made using the Windows Compression API, which
uses the same underlying code as MS-XCA, but which puts some annoying
hurdles in the way. In particular, it won't perform LZ77+Huffman
compression on any file smaller than 300 bytes. The relationship between
the Compression API and MS-XCA is covered in various messages in

https://lists.samba.org/archive/cifs-protocol/2022-October/
https://lists.samba.org/archive/cifs-protocol/2022-November/

To recreate these files or add more, use
lib/compression/tests/scripts/generate-windows-test-vectors.c under
Cygwin or MSYS2. This file is also in the decompressed directory.

Some of the decompressed files were found via fuzzing, some are designed
to test one aspect or another of the format, while others are public
domain texts.

These are used in compression and decompression tests.

- For decompression tests, we need the decompressed versions to compare
  against.

- For compression tests, we do not assert that the compressed file is
  identical to the Windows compressed file. MS-XCA does not require
  exact equality and leaves room for implementation tricks, but the
  size of the Windows compressed file allows us to make ballpark
  assertions about expected compression ratios.
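
As a rough illustration of how a test might use this layout, the sketch below pairs a decompressed file with its Windows-compressed counterparts and makes a ballpark size check. It is not the test code used in this repository. The helper names, the 10% tolerance, and the 16-byte slack are hypothetical, and it assumes each compressed file shares its base name with the .decomp file it came from.

```python
from pathlib import Path

# Directory name and file extension for each compressed variant,
# following the subdirectory list in this README.
VARIANTS = {
    "compressed-huffman": ".lzhuff",
    "compressed-more-huffman": ".lzhuff",
    "compressed-plain": ".lzplain",
    "compressed-more-plain": ".lzplain",
}


def compressed_paths(root, decomp_file):
    """Return {variant: path} for the Windows-compressed versions that exist.

    Assumption: a file decompressed/NAME.decomp corresponds to
    VARIANT/NAME.EXT in each compressed directory.
    """
    root = Path(root)
    stem = Path(decomp_file).name
    if stem.endswith(".decomp"):
        stem = stem[: -len(".decomp")]
    found = {}
    for variant, ext in VARIANTS.items():
        candidate = root / variant / (stem + ext)
        # Some vectors may be missing, for example LZ77+Huffman
        # versions of files smaller than 300 bytes.
        if candidate.exists():
            found[variant] = candidate
    return found


def size_is_ballpark(our_size, windows_size, tolerance=0.1, slack=16):
    """True if our output is no more than `tolerance` (as a fraction)
    plus `slack` bytes larger than the Windows output.

    Both thresholds are hypothetical; pick values that suit the test.
    """
    return our_size <= windows_size * (1 + tolerance) + slack
```

A compression test could then compress the .decomp file with the implementation under test and call `size_is_ballpark(len(our_output), path.stat().st_size)` for each path returned by `compressed_paths`. The check is one-sided here, so output smaller than the Windows file always passes.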