-rw-r--r-- | AUTHORS | 7
-rw-r--r-- | COPYING | 17
-rw-r--r-- | COPYING.GPL | 338
-rw-r--r-- | ChangeLog | 255
-rw-r--r-- | INSTALL | 80
-rw-r--r-- | Makefile.in | 202
-rw-r--r-- | NEWS | 15
-rw-r--r-- | README | 103
-rw-r--r-- | bbexample.c | 367
-rw-r--r-- | carg_parser.c | 319
-rw-r--r-- | carg_parser.h | 97
-rw-r--r-- | cbuffer.c | 143
-rwxr-xr-x | configure | 239
-rw-r--r-- | decoder.c | 145
-rw-r--r-- | decoder.h | 463
-rw-r--r-- | doc/lzlib.info | 1323
-rw-r--r-- | doc/lzlib.texi | 1395
-rw-r--r-- | doc/minilzip.1 | 134
-rw-r--r-- | encoder.c | 586
-rw-r--r-- | encoder.h | 326
-rw-r--r-- | encoder_base.c | 196
-rw-r--r-- | encoder_base.h | 612
-rw-r--r-- | fast_encoder.c | 175
-rw-r--r-- | fast_encoder.h | 70
-rw-r--r-- | ffexample.c | 300
-rw-r--r-- | lzcheck.c | 367
-rw-r--r-- | lzip.h | 298
-rw-r--r-- | lzlib.c | 601
-rw-r--r-- | lzlib.h | 110
-rw-r--r-- | minilzip.c | 1290
-rwxr-xr-x | testsuite/check.sh | 444
-rw-r--r-- | testsuite/fox.lz | bin | 0 -> 80 bytes
-rw-r--r-- | testsuite/fox_bcrc.lz | bin | 0 -> 80 bytes
-rw-r--r-- | testsuite/fox_crc0.lz | bin | 0 -> 80 bytes
-rw-r--r-- | testsuite/fox_das46.lz | bin | 0 -> 80 bytes
-rw-r--r-- | testsuite/fox_de20.lz | bin | 0 -> 80 bytes
-rw-r--r-- | testsuite/fox_lf | 9
-rw-r--r-- | testsuite/fox_mes81.lz | bin | 0 -> 80 bytes
-rw-r--r-- | testsuite/fox_s11.lz | bin | 0 -> 80 bytes
-rw-r--r-- | testsuite/fox_v2.lz | bin | 0 -> 80 bytes
-rw-r--r-- | testsuite/test.txt | 676
-rw-r--r-- | testsuite/test.txt.lz | bin | 0 -> 7376 bytes
-rw-r--r-- | testsuite/test_em.txt.lz | bin | 0 -> 14024 bytes
-rw-r--r-- | testsuite/test_sync.lz | bin | 0 -> 7568 bytes
44 files changed, 11702 insertions, 0 deletions
@@ -0,0 +1,7 @@ +Lzlib was written by Antonio Diaz Diaz. + +The ideas embodied in lzlib are due to (at least) the following people: +Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for the +definition of Markov chains), G.N.N. Martin (for the definition of range +encoding), Igor Pavlov (for putting all the above together in LZMA), and +Julian Seward (for bzip2's CLI). @@ -0,0 +1,17 @@ + Lzlib - Compression library for the lzip format + Copyright (C) Antonio Diaz Diaz. + + This library is free software. Redistribution and use in source and + binary forms, with or without modification, are permitted provided + that the following conditions are met: + + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions, and the following disclaimer. + + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions, and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + This library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. diff --git a/COPYING.GPL b/COPYING.GPL new file mode 100644 index 0000000..4ad17ae --- /dev/null +++ b/COPYING.GPL @@ -0,0 +1,338 @@ + GNU GENERAL PUBLIC LICENSE + Version 2, June 1991 + + Copyright (C) 1989, 1991 Free Software Foundation, Inc., + 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + + Preamble + + The licenses for most software are designed to take away your +freedom to share and change it. By contrast, the GNU General Public +License is intended to guarantee your freedom to share and change free +software--to make sure the software is free for all its users. 
This +General Public License applies to most of the Free Software +Foundation's software and to any other program whose authors commit to +using it. (Some other Free Software Foundation software is covered by +the GNU Lesser General Public License instead.) You can apply it to +your programs, too. + + When we speak of free software, we are referring to freedom, not +price. Our General Public Licenses are designed to make sure that you +have the freedom to distribute copies of free software (and charge for +this service if you wish), that you receive source code or can get it +if you want it, that you can change the software or use pieces of it +in new free programs; and that you know you can do these things. + + To protect your rights, we need to make restrictions that forbid +anyone to deny you these rights or to ask you to surrender the rights. +These restrictions translate to certain responsibilities for you if you +distribute copies of the software, or if you modify it. + + For example, if you distribute copies of such a program, whether +gratis or for a fee, you must give the recipients all the rights that +you have. You must make sure that they, too, receive or can get the +source code. And you must show them these terms so they know their +rights. + + We protect your rights with two steps: (1) copyright the software, and +(2) offer you this license which gives you legal permission to copy, +distribute and/or modify the software. + + Also, for each author's protection and ours, we want to make certain +that everyone understands that there is no warranty for this free +software. If the software is modified by someone else and passed on, we +want its recipients to know that what they have is not the original, so +that any problems introduced by others will not reflect on the original +authors' reputations. + + Finally, any free program is threatened constantly by software +patents. 
We wish to avoid the danger that redistributors of a free +program will individually obtain patent licenses, in effect making the +program proprietary. To prevent this, we have made it clear that any +patent must be licensed for everyone's free use or not licensed at all. + + The precise terms and conditions for copying, distribution and +modification follow. + + GNU GENERAL PUBLIC LICENSE + TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION + + 0. This License applies to any program or other work which contains +a notice placed by the copyright holder saying it may be distributed +under the terms of this General Public License. The "Program", below, +refers to any such program or work, and a "work based on the Program" +means either the Program or any derivative work under copyright law: +that is to say, a work containing the Program or a portion of it, +either verbatim or with modifications and/or translated into another +language. (Hereinafter, translation is included without limitation in +the term "modification".) Each licensee is addressed as "you". + +Activities other than copying, distribution and modification are not +covered by this License; they are outside its scope. The act of +running the Program is not restricted, and the output from the Program +is covered only if its contents constitute a work based on the +Program (independent of having been made by running the Program). +Whether that is true depends on what the Program does. + + 1. You may copy and distribute verbatim copies of the Program's +source code as you receive it, in any medium, provided that you +conspicuously and appropriately publish on each copy an appropriate +copyright notice and disclaimer of warranty; keep intact all the +notices that refer to this License and to the absence of any warranty; +and give any other recipients of the Program a copy of this License +along with the Program. 
+ +You may charge a fee for the physical act of transferring a copy, and +you may at your option offer warranty protection in exchange for a fee. + + 2. You may modify your copy or copies of the Program or any portion +of it, thus forming a work based on the Program, and copy and +distribute such modifications or work under the terms of Section 1 +above, provided that you also meet all of these conditions: + + a) You must cause the modified files to carry prominent notices + stating that you changed the files and the date of any change. + + b) You must cause any work that you distribute or publish, that in + whole or in part contains or is derived from the Program or any + part thereof, to be licensed as a whole at no charge to all third + parties under the terms of this License. + + c) If the modified program normally reads commands interactively + when run, you must cause it, when started running for such + interactive use in the most ordinary way, to print or display an + announcement including an appropriate copyright notice and a + notice that there is no warranty (or else, saying that you provide + a warranty) and that users may redistribute the program under + these conditions, and telling the user how to view a copy of this + License. (Exception: if the Program itself is interactive but + does not normally print such an announcement, your work based on + the Program is not required to print an announcement.) + +These requirements apply to the modified work as a whole. If +identifiable sections of that work are not derived from the Program, +and can be reasonably considered independent and separate works in +themselves, then this License, and its terms, do not apply to those +sections when you distribute them as separate works. 
But when you +distribute the same sections as part of a whole which is a work based +on the Program, the distribution of the whole must be on the terms of +this License, whose permissions for other licensees extend to the +entire whole, and thus to each and every part regardless of who wrote it. + +Thus, it is not the intent of this section to claim rights or contest +your rights to work written entirely by you; rather, the intent is to +exercise the right to control the distribution of derivative or +collective works based on the Program. + +In addition, mere aggregation of another work not based on the Program +with the Program (or with a work based on the Program) on a volume of +a storage or distribution medium does not bring the other work under +the scope of this License. + + 3. You may copy and distribute the Program (or a work based on it, +under Section 2) in object code or executable form under the terms of +Sections 1 and 2 above provided that you also do one of the following: + + a) Accompany it with the complete corresponding machine-readable + source code, which must be distributed under the terms of Sections + 1 and 2 above on a medium customarily used for software interchange; or, + + b) Accompany it with a written offer, valid for at least three + years, to give any third party, for a charge no more than your + cost of physically performing source distribution, a complete + machine-readable copy of the corresponding source code, to be + distributed under the terms of Sections 1 and 2 above on a medium + customarily used for software interchange; or, + + c) Accompany it with the information you received as to the offer + to distribute corresponding source code. (This alternative is + allowed only for noncommercial distribution and only if you + received the program in object code or executable form with such + an offer, in accord with Subsection b above.) 
+ +The source code for a work means the preferred form of the work for +making modifications to it. For an executable work, complete source +code means all the source code for all modules it contains, plus any +associated interface definition files, plus the scripts used to +control compilation and installation of the executable. However, as a +special exception, the source code distributed need not include +anything that is normally distributed (in either source or binary +form) with the major components (compiler, kernel, and so on) of the +operating system on which the executable runs, unless that component +itself accompanies the executable. + +If distribution of executable or object code is made by offering +access to copy from a designated place, then offering equivalent +access to copy the source code from the same place counts as +distribution of the source code, even though third parties are not +compelled to copy the source along with the object code. + + 4. You may not copy, modify, sublicense, or distribute the Program +except as expressly provided under this License. Any attempt +otherwise to copy, modify, sublicense or distribute the Program is +void, and will automatically terminate your rights under this License. +However, parties who have received copies, or rights, from you under +this License will not have their licenses terminated so long as such +parties remain in full compliance. + + 5. You are not required to accept this License, since you have not +signed it. However, nothing else grants you permission to modify or +distribute the Program or its derivative works. These actions are +prohibited by law if you do not accept this License. Therefore, by +modifying or distributing the Program (or any work based on the +Program), you indicate your acceptance of this License to do so, and +all its terms and conditions for copying, distributing or modifying +the Program or works based on it. + + 6. 
Each time you redistribute the Program (or any work based on the +Program), the recipient automatically receives a license from the +original licensor to copy, distribute or modify the Program subject to +these terms and conditions. You may not impose any further +restrictions on the recipients' exercise of the rights granted herein. +You are not responsible for enforcing compliance by third parties to +this License. + + 7. If, as a consequence of a court judgment or allegation of patent +infringement or for any other reason (not limited to patent issues), +conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot +distribute so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you +may not distribute the Program at all. For example, if a patent +license would not permit royalty-free redistribution of the Program by +all those who receive copies directly or indirectly through you, then +the only way you could satisfy both it and this License would be to +refrain entirely from distribution of the Program. + +If any portion of this section is held invalid or unenforceable under +any particular circumstance, the balance of the section is intended to +apply and the section as a whole is intended to apply in other +circumstances. + +It is not the purpose of this section to induce you to infringe any +patents or other property right claims or to contest validity of any +such claims; this section has the sole purpose of protecting the +integrity of the free software distribution system, which is +implemented by public license practices. 
Many people have made +generous contributions to the wide range of software distributed +through that system in reliance on consistent application of that +system; it is up to the author/donor to decide if he or she is willing +to distribute software through any other system and a licensee cannot +impose that choice. + +This section is intended to make thoroughly clear what is believed to +be a consequence of the rest of this License. + + 8. If the distribution and/or use of the Program is restricted in +certain countries either by patents or by copyrighted interfaces, the +original copyright holder who places the Program under this License +may add an explicit geographical distribution limitation excluding +those countries, so that distribution is permitted only in or among +countries not thus excluded. In such case, this License incorporates +the limitation as if written in the body of this License. + + 9. The Free Software Foundation may publish revised and/or new versions +of the General Public License from time to time. Such new versions will +be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + +Each version is given a distinguishing version number. If the Program +specifies a version number of this License which applies to it and "any +later version", you have the option of following the terms and conditions +either of that version or of any later version published by the Free +Software Foundation. If the Program does not specify a version number of +this License, you may choose any version ever published by the Free Software +Foundation. + + 10. If you wish to incorporate parts of the Program into other free +programs whose distribution conditions are different, write to the author +to ask for permission. For software which is copyrighted by the Free +Software Foundation, write to the Free Software Foundation; we sometimes +make exceptions for this. 
Our decision will be guided by the two goals +of preserving the free status of all derivatives of our free software and +of promoting the sharing and reuse of software generally. + + NO WARRANTY + + 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY +FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN +OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES +PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED +OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF +MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS +TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE +PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, +REPAIR OR CORRECTION. + + 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING +WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR +REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, +INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING +OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED +TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY +YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER +PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE +POSSIBILITY OF SUCH DAMAGES. + + END OF TERMS AND CONDITIONS + + How to Apply These Terms to Your New Programs + + If you develop a new program, and you want it to be of the greatest +possible use to the public, the best way to achieve this is to make it +free software which everyone can redistribute and change under these terms. + + To do so, attach the following notices to the program. 
It is safest +to attach them to the start of each source file to most effectively +convey the exclusion of warranty; and each file should have at least +the "copyright" line and a pointer to where the full notice is found. + + <one line to give the program's name and a brief idea of what it does.> + Copyright (C) <year> <name of author> + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 2 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see <http://www.gnu.org/licenses/>. + +Also add information on how to contact you by electronic and paper mail. + +If the program is interactive, make it output a short notice like this +when it starts in an interactive mode: + + Gnomovision version 69, Copyright (C) <year> <name of author> + Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. + This is free software, and you are welcome to redistribute it + under certain conditions; type `show c' for details. + +The hypothetical commands `show w' and `show c' should show the appropriate +parts of the General Public License. Of course, the commands you use may +be called something other than `show w' and `show c'; they could even be +mouse-clicks or menu items--whatever suits your program. + +You should also get your employer (if you work as a programmer) or your +school, if any, to sign a "copyright disclaimer" for the program, if +necessary. 
Here is a sample; alter the names: + + Yoyodyne, Inc., hereby disclaims all copyright interest in the program + `Gnomovision' (which makes passes at compilers) written by James Hacker. + + <signature of Ty Coon>, 1 April 1989 + Ty Coon, President of Vice + +This General Public License does not permit incorporating your program into +proprietary programs. If your program is a subroutine library, you may +consider it more useful to permit linking proprietary applications with the +library. If this is what you want to do, use the GNU Lesser General +Public License instead of this License. diff --git a/ChangeLog b/ChangeLog new file mode 100644 index 0000000..8d7da96 --- /dev/null +++ b/ChangeLog @@ -0,0 +1,255 @@ +2022-01-23 Antonio Diaz Diaz <antonio@gnu.org> + + * Version 1.13 released. + * Set variables AR and ARFLAGS from configure. + (Reported by Hoël Bézier). + * main.c: Rename to minilzip.c. + * minilzip.c (getnum): Show option name and valid range if error. + (check_lib): Check that LZ_API_VERSION and LZ_version_string match. + * Improve several descriptions in manual, '--help', and man page. + * lzlib.texi: Change GNU Texinfo category to 'Compression'. + (Reported by Alfred M. Szmidt). + +2021-01-02 Antonio Diaz Diaz <antonio@gnu.org> + + * Version 1.12 released. + * lzlib.h: Define LZ_API_VERSION as 1000 * major + minor. 1.12 = 1012. + This change does not affect the soversion. + * lzlib.h, lzlib.c: New function LZ_api_version. + * LZd_try_verify_trailer: Return 2 if EOF at trailer or EOS marker. + * Decompression speed has been slightly increased. + * decoder.h: Increase 'rd_min_available_bytes' from 8 to 10. + * encoder_base.c (LZeb_try_sync_flush): + Compensate for the increase in 'rd_min_available_bytes'. + * main.c (do_decompress): Fix false report about library stall. + * main.c: New option '--check-lib'. + * main.c (main): Report an error if a file name is empty. + Make '-o' behave like '-c', but writing to file instead of stdout. 
+ Make '-c' and '-o' check whether the output is a terminal only once. + Do not open output if input is a terminal. + Replace 'decompressed', 'compressed' with 'out', 'in' in output. + Set a valid invocation_name even if argc == 0. + * lzlib.texi: Document the new way of verifying the library version. + Document that 'LZ_(de)compress_close' and 'LZ_(de)compress_errno' + can be called with a null argument. + Document that sync flush marker is not allowed in lzip files. + Document the consequences of not calling 'LZ_decompress_finish'. + Document that 'LZ_decompress_read' returns at least once per member. + Document that 'LZ_(de)compress_read' can be called with a null + buffer pointer argument. + Real code examples for common uses have been added to the tutorial. + * bbexample.c: Don't use 'LZ_(de)compress_write_size'. + * lzcheck.c: New options '-s' (sync) and '-m' (member by member). + Test member by member without 'LZ_decompress_finish'. + * ffexample.c: New file containing example functions for file-to-file + compression/decompression. + * Document extraction from tar.lz in '--help' output and man page. + * Makefile.in: 'install-bin' no longer installs the man page. + New targets 'install-bin-compress' and 'install-bin-strip-compress'. + * testsuite: Add 9 new test files. + +2019-01-02 Antonio Diaz Diaz <antonio@gnu.org> + + * Version 1.11 released. + * Rename File_* to Lzip_*. + * LZ_decompress_read: Don't return error until all data is read. + * decoder.c (LZd_decode_member): Decode truncated data until EOF. + * cbuffer.c (Cb_read_data): Allow a null buffer pointer. + * main.c: Don't allow mixing different operations (-d and -t). + * main.c: Check return value of close( infd ). + * main.c: Compile on DOS with DJGPP. + * lzlib.texi: Improve descriptions of '-0..-9', '-m', and '-s'. + Document that 'LZ_(de)compress_finish' can be called repeatedly. + * configure: Accept appending to CFLAGS; 'CFLAGS+=OPTIONS'. 
+ * Makefile.in: Rename targets 'install-bin*' to 'install-lib*'. + * Makefile.in: Targets 'install-bin*' now install minilzip. + * INSTALL: Document use of CFLAGS+='-D __USE_MINGW_ANSI_STDIO'. + +2018-02-07 Antonio Diaz Diaz <antonio@gnu.org> + + * Version 1.10 released. + * LZ_compress_finish now adjusts dictionary size for each member. + (Older versions can adjust dictionary size only once). + * lzlib.c (LZ_decompress_read): Detect corrupt header with HD=3. + * main.c: New option '--loose-trailing'. + * main.c (main): Option '-S, --volume-size' now keeps input files. + * main.c: Replace 'bits/byte' with inverse compression ratio. + * main.c: Show final diagnostic when testing multiple files. + * main.c: Do not add a second .lz extension to the arg of -o. + * main.c: Show dictionary size at verbosity level 4 (-vvvv). + * lzlib.texi: New chapter 'Invoking minilzip'. + +2017-04-11 Antonio Diaz Diaz <antonio@gnu.org> + + * Version 1.9 released. + * Compression time of option '-0' has been reduced by 3%. + * Compression time of options -1 to -9 has been reduced by 1%. + * Decompression time has been reduced by 3%. + * main.c: Continue testing if any input file is a terminal. + * Change the license of the library to "2-clause BSD". + +2016-05-17 Antonio Diaz Diaz <antonio@gnu.org> + + * Version 1.8 released. + * lzlib.h: Define LZ_API_VERSION to 1. + * lzlib.c (LZ_decompress_sync_to_member): Add skipped size to in_size. + * decoder.c (LZd_verify_trailer): Remove test of final code. + * main.c: New option '-a, --trailing-error'. + * main.c (main): Delete '--output' file if infd is a terminal. + * main.c (main): Don't use stdin more than once. + * configure: Avoid warning on some shells when testing for gcc. + * Makefile.in: Detect the existence of install-info. + * check.sh: A POSIX shell is required to run the tests. + * check.sh: Don't check error messages. + +2015-07-08 Antonio Diaz Diaz <antonio@gnu.org> + + * Version 1.7 released. 
+ * Port fast encoder and option '-0' from lzip. + * If open-->write-->finish, produce same dictionary size as lzip. + * Makefile.in: New targets 'install*-compress'. + +2014-08-27 Antonio Diaz Diaz <antonio@gnu.org> + + * Version 1.6 released. + * Compression ratio of option '-9' has been slightly increased. + * configure: New options '--disable-static' and '--disable-ldconfig'. + * Makefile.in: Ignore errors from ldconfig. + * Makefile.in: Use 'CFLAGS' in every invocation of 'CC'. + * main.c (close_and_set_permissions): Behave like 'cp -p'. + * lzlib.texinfo: Rename to lzlib.texi. + * Change license to "GPL version 2 or later with link exception". + +2013-09-15 Antonio Diaz Diaz <antonio@gnu.org> + + * Version 1.5 released. + * Remove decompression support for version 0 files. + * The LZ_compress_sync_flush mechanism has been fixed (again). + * Minor fixes. + +2013-05-28 Antonio Diaz Diaz <antonio@gnu.org> + + * Version 1.4 released. + * Multi-step trials have been implemented. + * Compression ratio has been slightly increased. + * Compression time has been reduced by 8%. + * Decompression time has been reduced by 7%. + * lzlib.h: Change 'long long' values to 'unsigned long long'. + * encoder.c (Mf_init): Reduce minimum buffer size to 64KiB. + * lzlib.c (LZ_decompress_read): Tell LZ_header_error from + LZ_unexpected_eof the same way as lzip does. + * Makefile.in: New targets 'install-as-lzip' and 'install-bin'. + * main.c: Use 'setmode' instead of '_setmode' on Windows and OS/2. + * main.c: Define 'strtoull' to 'strtoul' on Windows. + +2012-02-29 Antonio Diaz Diaz <ant_diaz@teleline.es> + + * Version 1.3 released. + * Translated to C from the C++ source of lzlib 1.2. + * configure: Rename 'datadir' to 'datarootdir'. + +2011-10-25 Antonio Diaz Diaz <ant_diaz@teleline.es> + + * Version 1.2 released. + * encoder.h (Lee_update_prices): Update high length symbol prices + independently of the value of 'pos_state'. 
This gives better + compression for large values of '--match-length' without being + slower. + * encoder.h, encoder.cc: Optimize pair price calculations, reducing + compression time for large values of '--match-length' by up to 6%. + * main.cc: New option '-F, --recompress'. + * Makefile.in: 'make install' no longer tries to run + '/sbin/ldconfig' on systems lacking it. + +2011-01-03 Antonio Diaz Diaz <ant_diaz@teleline.es> + + * Version 1.1 released. + * Compression time has been reduced by 2%. + * All declarations not belonging to the API have been + encapsulated in the namespace 'Lzlib'. + * testsuite: Rename 'test1' to 'test.txt'. New tests. + * Match length limits set by options -1 to -9 of minilzip have + been changed to match those of lzip 1.11. + * main.cc: Set stdin/stdout in binary mode on OS2. + * bbexample.cc: New file containing example functions for + buffer-to-buffer compression/decompression. + +2010-05-08 Antonio Diaz Diaz <ant_diaz@teleline.es> + + * Version 1.0 released. + * New functions LZ_decompress_member_version, LZ_decompress_data_crc, + LZ_decompress_member_finished, and LZ_decompress_dictionary_size. + * Variables declared 'extern' have been encapsulated in a namespace. + * main.cc: Fix warning about fchown's return value being ignored. + * decoder.h: Integrate Input_buffer in Range_decoder. + +2010-02-10 Antonio Diaz Diaz <ant_diaz@teleline.es> + + * Version 0.9 released. + * Compression time has been reduced by 8%. + * main.cc: New constant 'o_binary'. + +2010-01-17 Antonio Diaz Diaz <ant_diaz@teleline.es> + + * Version 0.8 released. + * New functions LZ_decompress_reset, LZ_decompress_sync_to_member, + LZ_decompress_write_size, and LZ_strerror. + * lzlib.h: API change. Replace 'enum' with functions for values of + dictionary size limits to make interface names consistent. + * lzlib.h: API change. Rename 'LZ_errno' to 'LZ_Errno'. + * lzlib.h: API change. 
Replace 'void *' with 'struct LZ_Encoder *' + and 'struct LZ_Decoder *' to make interface type safe. + * decoder.cc: A truncated member trailer is now correctly detected. + * encoder.cc: Matchfinder::reset now also clears at_stream_end_, + allowing LZ_compress_restart_member to restart a finished stream. + * lzlib.cc: Accept only query or close operations after a fatal + error has occurred. + * The shared version of lzlib is no longer built by default. + * check.sh: Use 'test1' instead of 'COPYING' for testing. + +2009-10-20 Antonio Diaz Diaz <ant_diaz@teleline.es> + + * Version 0.7 released. + * Compression time has been reduced by 4%. + * check.sh: Remove -9 to run in less than 256MiB of RAM. + * lzcheck.cc: Read files of any size up to 2^63 bytes. + +2009-09-02 Antonio Diaz Diaz <ant_diaz@teleline.es> + + * Version 0.6 released. + * The LZ_compress_sync_flush mechanism has been fixed. + +2009-07-03 Antonio Diaz Diaz <ant_diaz@teleline.es> + + * Version 0.5 released. + * Decompression speed has been improved. + * main.cc (signal_handler): Declare as 'extern "C"'. + +2009-06-03 Antonio Diaz Diaz <ant_diaz@teleline.es> + + * Version 0.4 released. + * New functions LZ_compress_sync_flush and LZ_compress_write_size. + * Decompression speed has been improved. + * lzlib.texinfo: New chapter 'Buffering'. + +2009-05-03 Antonio Diaz Diaz <ant_diaz@teleline.es> + + * Version 0.3 released. + * Lzlib is now built as a shared library (in addition to static). + +2009-04-26 Antonio Diaz Diaz <ant_diaz@teleline.es> + + * Version 0.2 released. + * Fix a segfault when decompressing trailing garbage. + * Fix a false positive in LZ_(de)compress_finished. + +2009-04-21 Antonio Diaz Diaz <ant_diaz@teleline.es> + + * Version 0.1 released. + + +Copyright (C) 2009-2022 Antonio Diaz Diaz. + +This file is a collection of facts, and thus it is not copyrightable, +but just in case, you have unlimited permission to copy, distribute, and +modify it. 
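Note on the 1.12 ChangeLog entry above: it states that LZ_API_VERSION is defined as 1000 * major + minor, so that 1.12 becomes 1012. A minimal sketch of that arithmetic in C follows. The helper name api_version_from is hypothetical and is not part of lzlib; it only illustrates the encoding described in the entry.

```c
/* Hypothetical helper, NOT part of lzlib: encodes a version number the
   way the 1.12 ChangeLog entry describes LZ_API_VERSION, that is,
   1000 * major + minor. */
int api_version_from( const int major, const int minor )
  {
  return 1000 * major + minor;
  }
```

Under this scheme, api_version_from( 1, 12 ) evaluates to 1012, matching the value given in the entry, and version 1.13 would map to 1013.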
@@ -0,0 +1,80 @@ +Requirements +------------ +You will need a C99 compiler. (gcc 3.3.6 or newer is recommended). +I use gcc 6.1.0 and 3.3.6, but the code should compile with any standards +compliant compiler. +Gcc is available at http://gcc.gnu.org. + +The operating system must allow signal handlers read access to objects with +static storage duration so that the cleanup handler for Control-C can delete +the partial output file. (This requirement is for minilzip only). + + +Procedure +--------- +1. Unpack the archive if you have not done so already: + + tar -xf lzlib[version].tar.lz +or + lzip -cd lzlib[version].tar.lz | tar -xf - + +This creates the directory ./lzlib[version] containing the source from +the main archive. + +2. Change to lzlib directory and run configure. + (Try 'configure --help' for usage instructions). + + cd lzlib[version] + ./configure + + If you are compiling on MinGW, use: + + ./configure CFLAGS+='-D __USE_MINGW_ANSI_STDIO' + +3. Run make. + + make + +4. Optionally, type 'make check' to run the tests that come with lzlib. + +5. Type 'make install' to install the library and any data files and + documentation. (You may need to run ldconfig also). + + Or type 'make install-compress', which additionally compresses the + info manual after installation. + (Installing compressed docs may become the default in the future). + + You can install only the library or the info manual by typing + 'make install-lib' or 'make install-info' respectively. + + 'make install-bin install-man' installs the program minilzip and its man + page. 'install-bin' installs a shared minilzip if the shared library has + been configured. Else it installs a static minilzip. + 'make install-bin-compress' additionally compresses the man page after + installation. + + 'make install-as-lzip' runs 'make install-bin' and then links minilzip to + the name 'lzip'. + + +Another way +----------- +You can also compile lzlib into a separate directory. 
+To do this, you must use a version of 'make' that supports the variable +'VPATH', such as GNU 'make'. 'cd' to the directory where you want the +object files and executables to go and run the 'configure' script. +'configure' automatically checks for the source code in '.', in '..', and +in the directory that 'configure' is in. + +'configure' recognizes the option '--srcdir=DIR' to control where to +look for the sources. Usually 'configure' can determine that directory +automatically. + +After running 'configure', you can run 'make' and 'make install' as +explained above. + + +Copyright (C) 2009-2022 Antonio Diaz Diaz. + +This file is free documentation: you have unlimited permission to copy, +distribute, and modify it. diff --git a/Makefile.in b/Makefile.in new file mode 100644 index 0000000..81b404b --- /dev/null +++ b/Makefile.in @@ -0,0 +1,202 @@ + +DISTNAME = $(pkgname)-$(pkgversion) +INSTALL = install +INSTALL_PROGRAM = $(INSTALL) -m 755 +INSTALL_DATA = $(INSTALL) -m 644 +INSTALL_DIR = $(INSTALL) -d -m 755 +LDCONFIG = /sbin/ldconfig +SHELL = /bin/sh +CAN_RUN_INSTALLINFO = $(SHELL) -c "install-info --version" > /dev/null 2>&1 + +objs = carg_parser.o minilzip.o + + +.PHONY : all install install-bin install-info install-man \ + install-strip install-compress install-strip-compress \ + install-bin-strip install-info-compress install-man-compress \ + install-bin-compress install-bin-strip-compress \ + install-lib install-lib-strip \ + install-as-lzip \ + uninstall uninstall-bin uninstall-lib uninstall-info uninstall-man \ + doc info man check dist clean distclean + +all : $(progname_static) $(progname_shared) + +lib$(libname).a : lzlib.o + $(AR) $(ARFLAGS) $@ $< + +lib$(libname).so.$(pkgversion) : lzlib_sh.o + $(CC) $(CFLAGS) $(LDFLAGS) -fpic -fPIC -shared -Wl,--soname=lib$(libname).so.$(soversion) -o $@ $< + +$(progname) : $(objs) lib$(libname).a + $(CC) $(CFLAGS) $(LDFLAGS) -o $@ $(objs) lib$(libname).a + +$(progname)_shared : $(objs) 
lib$(libname).so.$(pkgversion) + $(CC) $(CFLAGS) $(LDFLAGS) -o $@ $(objs) lib$(libname).so.$(pkgversion) + +bbexample : bbexample.o lib$(libname).a + $(CC) $(CFLAGS) $(LDFLAGS) -o $@ bbexample.o lib$(libname).a + +ffexample : ffexample.o lib$(libname).a + $(CC) $(CFLAGS) $(LDFLAGS) -o $@ ffexample.o lib$(libname).a + +lzcheck : lzcheck.o lib$(libname).a + $(CC) $(CFLAGS) $(LDFLAGS) -o $@ lzcheck.o lib$(libname).a + +minilzip.o : minilzip.c + $(CC) $(CPPFLAGS) $(CFLAGS) -DPROGVERSION=\"$(pkgversion)\" -c -o $@ $< + +lzlib_sh.o : lzlib.c + $(CC) $(CPPFLAGS) $(CFLAGS) -fpic -fPIC -c -o $@ $< + +%.o : %.c + $(CC) $(CPPFLAGS) $(CFLAGS) -c -o $@ $< + +lzdeps = lzlib.h lzip.h cbuffer.c decoder.h decoder.c encoder_base.h \ + encoder_base.c encoder.h encoder.c fast_encoder.h fast_encoder.c + +$(objs) : Makefile +carg_parser.o : carg_parser.h +lzlib.o : Makefile $(lzdeps) +lzlib_sh.o : Makefile $(lzdeps) +minilzip.o : carg_parser.h lzlib.h +bbexample.o : Makefile lzlib.h +ffexample.o : Makefile lzlib.h +lzcheck.o : Makefile lzlib.h + + +doc : info man + +info : $(VPATH)/doc/$(pkgname).info + +$(VPATH)/doc/$(pkgname).info : $(VPATH)/doc/$(pkgname).texi + cd $(VPATH)/doc && makeinfo $(pkgname).texi + +man : $(VPATH)/doc/$(progname).1 + +$(VPATH)/doc/$(progname).1 : $(progname) + help2man -n 'reduces the size of files' -o $@ --info-page=$(pkgname) ./$(progname) + +Makefile : $(VPATH)/configure $(VPATH)/Makefile.in + ./config.status + +check : $(progname) bbexample ffexample lzcheck + @$(VPATH)/testsuite/check.sh $(VPATH)/testsuite $(pkgversion) + +install : install-lib install-info +install-strip : install-lib-strip install-info +install-compress : install-lib install-info-compress +install-strip-compress : install-lib-strip install-info-compress +install-bin-compress : install-bin install-man-compress +install-bin-strip-compress : install-bin-strip install-man-compress + +install-bin : all + if [ ! 
-d "$(DESTDIR)$(bindir)" ] ; then $(INSTALL_DIR) "$(DESTDIR)$(bindir)" ; fi + $(INSTALL_PROGRAM) ./$(progname_lzip) "$(DESTDIR)$(bindir)/$(progname)" + +install-bin-strip : all + $(MAKE) INSTALL_PROGRAM='$(INSTALL_PROGRAM) -s' install-bin + +install-lib : all + if [ ! -d "$(DESTDIR)$(includedir)" ] ; then $(INSTALL_DIR) "$(DESTDIR)$(includedir)" ; fi + if [ ! -d "$(DESTDIR)$(libdir)" ] ; then $(INSTALL_DIR) "$(DESTDIR)$(libdir)" ; fi + $(INSTALL_DATA) $(VPATH)/$(libname)lib.h "$(DESTDIR)$(includedir)/$(libname)lib.h" + if [ -n "$(progname_static)" ] ; then \ + $(INSTALL_DATA) ./lib$(libname).a "$(DESTDIR)$(libdir)/lib$(libname).a" ; \ + fi + if [ -n "$(progname_shared)" ] ; then \ + $(INSTALL_PROGRAM) ./lib$(libname).so.$(pkgversion) "$(DESTDIR)$(libdir)/lib$(libname).so.$(pkgversion)" ; \ + if [ -e "$(DESTDIR)$(libdir)/lib$(libname).so.$(soversion)" ] ; then \ + run_ldconfig=no ; \ + else run_ldconfig=yes ; \ + fi ; \ + rm -f "$(DESTDIR)$(libdir)/lib$(libname).so" ; \ + rm -f "$(DESTDIR)$(libdir)/lib$(libname).so.$(soversion)" ; \ + cd "$(DESTDIR)$(libdir)" && ln -s lib$(libname).so.$(pkgversion) lib$(libname).so ; \ + cd "$(DESTDIR)$(libdir)" && ln -s lib$(libname).so.$(pkgversion) lib$(libname).so.$(soversion) ; \ + if [ "${disable_ldconfig}" != yes ] && [ $${run_ldconfig} = yes ] && \ + [ -x "$(LDCONFIG)" ] ; then "$(LDCONFIG)" -n "$(DESTDIR)$(libdir)" || true ; fi ; \ + fi + +install-lib-strip : all + $(MAKE) INSTALL_PROGRAM='$(INSTALL_PROGRAM) -s' install-lib + +install-info : + if [ ! -d "$(DESTDIR)$(infodir)" ] ; then $(INSTALL_DIR) "$(DESTDIR)$(infodir)" ; fi + -rm -f "$(DESTDIR)$(infodir)/$(pkgname).info"* + $(INSTALL_DATA) $(VPATH)/doc/$(pkgname).info "$(DESTDIR)$(infodir)/$(pkgname).info" + -if $(CAN_RUN_INSTALLINFO) ; then \ + install-info --info-dir="$(DESTDIR)$(infodir)" "$(DESTDIR)$(infodir)/$(pkgname).info" ; \ + fi + +install-info-compress : install-info + lzip -v -9 "$(DESTDIR)$(infodir)/$(pkgname).info" + +install-man : + if [ ! 
-d "$(DESTDIR)$(mandir)/man1" ] ; then $(INSTALL_DIR) "$(DESTDIR)$(mandir)/man1" ; fi + -rm -f "$(DESTDIR)$(mandir)/man1/$(progname).1"* + $(INSTALL_DATA) $(VPATH)/doc/$(progname).1 "$(DESTDIR)$(mandir)/man1/$(progname).1" + +install-man-compress : install-man + lzip -v -9 "$(DESTDIR)$(mandir)/man1/$(progname).1" + +install-as-lzip : install-bin + -rm -f "$(DESTDIR)$(bindir)/lzip" + cd "$(DESTDIR)$(bindir)" && ln -s $(progname) lzip + +uninstall : uninstall-info uninstall-lib + +uninstall-bin : + -rm -f "$(DESTDIR)$(bindir)/$(progname)" + +uninstall-lib : + -rm -f "$(DESTDIR)$(includedir)/$(libname)lib.h" + -rm -f "$(DESTDIR)$(libdir)/lib$(libname).a" + -rm -f "$(DESTDIR)$(libdir)/lib$(libname).so" + -rm -f "$(DESTDIR)$(libdir)/lib$(libname).so.$(soversion)" + -rm -f "$(DESTDIR)$(libdir)/lib$(libname).so.$(pkgversion)" + +uninstall-info : + -if $(CAN_RUN_INSTALLINFO) ; then \ + install-info --info-dir="$(DESTDIR)$(infodir)" --remove "$(DESTDIR)$(infodir)/$(pkgname).info" ; \ + fi + -rm -f "$(DESTDIR)$(infodir)/$(pkgname).info"* + +uninstall-man : + -rm -f "$(DESTDIR)$(mandir)/man1/$(progname).1"* + +dist : doc + ln -sf $(VPATH) $(DISTNAME) + tar -Hustar --owner=root --group=root -cvf $(DISTNAME).tar \ + $(DISTNAME)/AUTHORS \ + $(DISTNAME)/COPYING \ + $(DISTNAME)/COPYING.GPL \ + $(DISTNAME)/ChangeLog \ + $(DISTNAME)/INSTALL \ + $(DISTNAME)/Makefile.in \ + $(DISTNAME)/NEWS \ + $(DISTNAME)/README \ + $(DISTNAME)/configure \ + $(DISTNAME)/doc/$(progname).1 \ + $(DISTNAME)/doc/$(pkgname).info \ + $(DISTNAME)/doc/$(pkgname).texi \ + $(DISTNAME)/*.h \ + $(DISTNAME)/*.c \ + $(DISTNAME)/testsuite/check.sh \ + $(DISTNAME)/testsuite/test.txt \ + $(DISTNAME)/testsuite/fox_lf \ + $(DISTNAME)/testsuite/fox.lz \ + $(DISTNAME)/testsuite/fox_*.lz \ + $(DISTNAME)/testsuite/test_sync.lz \ + $(DISTNAME)/testsuite/test.txt.lz \ + $(DISTNAME)/testsuite/test_em.txt.lz + rm -f $(DISTNAME) + lzip -v -9 $(DISTNAME).tar + +clean : + -rm -f $(progname) $(objs) lzlib.o lib$(libname).a + -rm -f 
$(progname)_shared lzlib_sh.o lib$(libname).so* + -rm -f bbexample bbexample.o ffexample ffexample.o lzcheck lzcheck.o + +distclean : clean + -rm -f Makefile config.status *.tar *.tar.lz @@ -0,0 +1,15 @@ +Changes in version 1.13: + +The variables AR and ARFLAGS can now be set from configure. (Before you +needed to run 'make AR=<ar_command>'). (Reported by Hoël Bézier). + +In case of error in a numerical argument to a command line option, minilzip +now shows the name of the option and the range of valid values. + +'minilzip --check-lib' now checks that LZ_API_VERSION and LZ_version_string +match. + +Several descriptions have been improved in manual, '--help', and man page. + +The texinfo category of the manual has been changed from 'Data Compression' +to 'Compression' to match that of gzip. (Reported by Alfred M. Szmidt). @@ -0,0 +1,103 @@ +Description + +Lzlib is a data compression library providing in-memory LZMA compression and +decompression functions, including integrity checking of the decompressed +data. The compressed data format used by the library is the lzip format. +Lzlib is written in C. + +The lzip file format is designed for data sharing and long-term archiving, +taking into account both data integrity and decoder availability: + + * The lzip format provides very safe integrity checking and some data + recovery means. The program lziprecover can repair bit flip errors + (one of the most common forms of data corruption) in lzip files, and + provides data recovery capabilities, including error-checked merging + of damaged copies of a file. + + * The lzip format is as simple as possible (but not simpler). The lzip + manual provides the source code of a simple decompressor along with a + detailed explanation of how it works, so that with only the help of the + lzip manual it would be possible for a digital archaeologist to extract + the data from a lzip file long after quantum computers eventually + render LZMA obsolete. 
+ + * Additionally the lzip reference implementation is copylefted, which + guarantees that it will remain free forever. + +A nice feature of the lzip format is that a corrupt byte is easier to repair +the nearer it is to the beginning of the file. Therefore, with the help of +lziprecover, losing an entire archive just because of a corrupt byte near +the beginning is a thing of the past. + +The functions and variables forming the interface of the compression library +are declared in the file 'lzlib.h'. Usage examples of the library are given +in the files 'bbexample.c', 'ffexample.c', and 'minilzip.c' from the source +distribution. + +All the library functions are thread safe. The library does not install any +signal handler. The decoder checks the consistency of the compressed data, +so the library should never crash even in case of corrupted input. + +Compression/decompression is done by repeatedly calling a couple of +read/write functions until all the data have been processed by the library. +This interface is safer and less error prone than the traditional zlib +interface. + +Compression/decompression is done when the read function is called. This +means the value returned by the position functions will not be updated until +a read call, even if a lot of data are written. If you want the data to be +compressed in advance, just call the read function with a size equal to 0. + +If all the data to be compressed are written in advance, lzlib will +automatically adjust the header of the compressed data to use the largest +dictionary size that exceeds neither the data size nor the limit +given to 'LZ_compress_open'. This feature reduces the amount of memory +needed for decompression and allows minilzip to produce the same compressed +output as lzip. + +Lzlib will correctly decompress a data stream which is the concatenation of +two or more compressed data streams. The result is the concatenation of the +corresponding decompressed data streams. 
Integrity testing of concatenated +compressed data streams is also supported. + +Lzlib is able to compress and decompress streams of unlimited size by +automatically creating multimember output. The members so created are large, +about 2 PiB each. + +In spite of its name (Lempel-Ziv-Markov chain-Algorithm), LZMA is not a +concrete algorithm; it is more like "any algorithm using the LZMA coding +scheme". For example, the option '-0' of lzip uses the scheme in almost the +simplest way possible: issuing the longest match it can find, or a literal +byte if it can't find a match. Conversely, a much more elaborate way of +finding coding sequences of minimum size than the one currently used by lzip +could be developed, and the resulting sequence could also be coded using the +LZMA coding scheme. + +Lzlib currently implements two variants of the LZMA algorithm: fast (used by +option '-0' of minilzip) and normal (used by all other compression levels). + +The high compression of LZMA comes from combining two basic, well-proven +compression ideas: sliding dictionaries (LZ77/78) and Markov models (the +thing used by every compression algorithm that uses a range encoder or +similar order-0 entropy coder as its last stage) with segregation of +contexts according to what the bits are used for. + +The ideas embodied in lzlib are due to (at least) the following people: +Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for the +definition of Markov chains), G.N.N. Martin (for the definition of range +encoding), Igor Pavlov (for putting all the above together in LZMA), and +Julian Seward (for bzip2's CLI). + +LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never have +been compressed. Decompressed is used to refer to data which have undergone +the process of decompression. + + +Copyright (C) 2009-2022 Antonio Diaz Diaz. + +This file is free documentation: you have unlimited permission to copy, +distribute, and modify it. 
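The README's note on automatic dictionary size adjustment can be illustrated with a small sketch. It is not lzlib's code: the names sketch_min_dict, sketch_max_dict, and sketch_adjust_dictionary_size are hypothetical, the 4 KiB .. 512 MiB range is taken from the comment in bbexample.c, and the sketch ignores how the lzip header actually encodes the size.

```c
#include <assert.h>

/* Assumed limits, taken from the comment in bbexample.c (4 KiB .. 512 MiB).
   The real values are reported by the library's dictionary size functions. */
enum { sketch_min_dict = 1 << 12, sketch_max_dict = 1 << 29 };

/* Hypothetical helper (not part of lzlib's API).
   Return the largest dictionary size that exceeds neither 'data_size'
   nor 'limit', but never less than the assumed minimum. */
static int sketch_adjust_dictionary_size( const long data_size, int limit )
  {
  assert( data_size >= 0 );		/* a data size can't be negative */
  /* keep the requested limit inside the assumed valid range */
  if( limit > sketch_max_dict ) limit = sketch_max_dict;
  if( limit < sketch_min_dict ) limit = sketch_min_dict;
  /* a dictionary larger than the data only wastes decoder memory */
  if( data_size < limit ) limit = (int)data_size;
  /* tiny inputs still need the minimum dictionary size */
  if( limit < sketch_min_dict ) limit = sketch_min_dict;
  return limit;
  }
```

Under these assumptions, a 100-byte input with a 1 MiB limit ends up with the minimum 4 KiB dictionary, while a 1 GiB input with an 8 MiB limit keeps the 8 MiB limit. This mirrors the clamping that bbcompressl performs before calling LZ_compress_open.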
+ +The file Makefile.in is a data file used by configure to produce the +Makefile. It has the same copyright owner and permissions that configure +itself. diff --git a/bbexample.c b/bbexample.c new file mode 100644 index 0000000..074f7ae --- /dev/null +++ b/bbexample.c @@ -0,0 +1,367 @@ +/* Buffer to buffer example - Test program for the library lzlib + Copyright (C) 2010-2022 Antonio Diaz Diaz. + + This program is free software: you have unlimited permission + to copy, distribute, and modify it. + + Usage: bbexample filename + + This program is an example of how buffer-to-buffer + compression/decompression can be implemented using lzlib. +*/ + +#define _FILE_OFFSET_BITS 64 + +#include <errno.h> +#include <limits.h> +#include <stdbool.h> +#include <stdint.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> + +#include "lzlib.h" + +#ifndef min + #define min(x,y) ((x) <= (y) ? (x) : (y)) +#endif + + +/* Return the address of a malloc'd buffer containing the file data and + the file size in '*file_sizep'. + In case of error, return 0 and do not modify '*file_sizep'. +*/ +uint8_t * read_file( const char * const name, long * const file_sizep ) + { + long buffer_size = 1 << 20, file_size; + uint8_t * buffer, * tmp; + FILE * const f = fopen( name, "rb" ); + if( !f ) + { fprintf( stderr, "bbexample: Can't open input file '%s': %s\n", + name, strerror( errno ) ); return 0; } + + buffer = (uint8_t *)malloc( buffer_size ); + if( !buffer ) + { fputs( "bbexample: read_file: Not enough memory.\n", stderr ); + fclose( f ); return 0; } + file_size = fread( buffer, 1, buffer_size, f ); + while( file_size >= buffer_size ) + { + if( buffer_size >= LONG_MAX ) + { + fprintf( stderr, "bbexample: Input file '%s' is too large.\n", name ); + free( buffer ); fclose( f ); return 0; + } + buffer_size = ( buffer_size <= LONG_MAX / 2 ) ? 
2 * buffer_size : LONG_MAX; + tmp = (uint8_t *)realloc( buffer, buffer_size ); + if( !tmp ) + { fputs( "bbexample: read_file: Not enough memory.\n", stderr ); + free( buffer ); fclose( f ); return 0; } + buffer = tmp; + file_size += fread( buffer + file_size, 1, buffer_size - file_size, f ); + } + if( ferror( f ) || !feof( f ) ) + { + fprintf( stderr, "bbexample: Error reading file '%s': %s\n", + name, strerror( errno ) ); + free( buffer ); fclose( f ); return 0; + } + fclose( f ); + *file_sizep = file_size; + return buffer; + } + + +/* Compress 'insize' bytes from 'inbuf'. + Return the address of a malloc'd buffer containing the compressed data, + and the size of the data in '*outlenp'. + In case of error, return 0 and do not modify '*outlenp'. +*/ +uint8_t * bbcompressl( const uint8_t * const inbuf, const long insize, + const int level, long * const outlenp ) + { + struct Lzma_options + { + int dictionary_size; /* 4 KiB .. 512 MiB */ + int match_len_limit; /* 5 .. 273 */ + }; + /* Mapping from gzip/bzip2 style 1..9 compression modes + to the corresponding LZMA compression modes. 
*/ + const struct Lzma_options option_mapping[] = + { + { 65535, 16 }, /* -0 (65535,16 chooses fast encoder) */ + { 1 << 20, 5 }, /* -1 */ + { 3 << 19, 6 }, /* -2 */ + { 1 << 21, 8 }, /* -3 */ + { 3 << 20, 12 }, /* -4 */ + { 1 << 22, 20 }, /* -5 */ + { 1 << 23, 36 }, /* -6 */ + { 1 << 24, 68 }, /* -7 */ + { 3 << 23, 132 }, /* -8 */ + { 1 << 25, 273 } }; /* -9 */ + struct Lzma_options encoder_options; + struct LZ_Encoder * encoder; + uint8_t * outbuf; + const long delta_size = ( insize / 4 ) + 64; /* insize may be zero */ + long outsize = delta_size; /* initial outsize */ + long inpos = 0; + long outpos = 0; + bool error = false; + + if( level < 0 || level > 9 ) return 0; + encoder_options = option_mapping[level]; + + if( encoder_options.dictionary_size > insize && level != 0 ) + encoder_options.dictionary_size = insize; /* saves memory */ + if( encoder_options.dictionary_size < LZ_min_dictionary_size() ) + encoder_options.dictionary_size = LZ_min_dictionary_size(); + encoder = LZ_compress_open( encoder_options.dictionary_size, + encoder_options.match_len_limit, INT64_MAX ); + outbuf = (uint8_t *)malloc( outsize ); + if( !encoder || LZ_compress_errno( encoder ) != LZ_ok || !outbuf ) + { free( outbuf ); LZ_compress_close( encoder ); return 0; } + + while( true ) + { + int ret = LZ_compress_write( encoder, inbuf + inpos, + min( INT_MAX, insize - inpos ) ); + if( ret < 0 ) { error = true; break; } + inpos += ret; + if( inpos >= insize ) LZ_compress_finish( encoder ); + ret = LZ_compress_read( encoder, outbuf + outpos, + min( INT_MAX, outsize - outpos ) ); + if( ret < 0 ) { error = true; break; } + outpos += ret; + if( LZ_compress_finished( encoder ) == 1 ) break; + if( outpos >= outsize ) + { + uint8_t * tmp; + if( outsize > LONG_MAX - delta_size ) { error = true; break; } + outsize += delta_size; + tmp = (uint8_t *)realloc( outbuf, outsize ); + if( !tmp ) { error = true; break; } + outbuf = tmp; + } + } + + if( LZ_compress_close( encoder ) < 0 ) error = true; + if( 
error ) { free( outbuf ); return 0; } + *outlenp = outpos; + return outbuf; + } + + +/* Decompress 'insize' bytes from 'inbuf'. + Return the address of a malloc'd buffer containing the decompressed + data, and the size of the data in '*outlenp'. + In case of error, return 0 and do not modify '*outlenp'. +*/ +uint8_t * bbdecompressl( const uint8_t * const inbuf, const long insize, + long * const outlenp ) + { + struct LZ_Decoder * const decoder = LZ_decompress_open(); + const long delta_size = insize; /* insize must be > zero */ + long outsize = delta_size; /* initial outsize */ + uint8_t * outbuf = (uint8_t *)malloc( outsize ); + long inpos = 0; + long outpos = 0; + bool error = false; + if( !decoder || LZ_decompress_errno( decoder ) != LZ_ok || !outbuf ) + { free( outbuf ); LZ_decompress_close( decoder ); return 0; } + + while( true ) + { + int ret = LZ_decompress_write( decoder, inbuf + inpos, + min( INT_MAX, insize - inpos ) ); + if( ret < 0 ) { error = true; break; } + inpos += ret; + if( inpos >= insize ) LZ_decompress_finish( decoder ); + ret = LZ_decompress_read( decoder, outbuf + outpos, + min( INT_MAX, outsize - outpos ) ); + if( ret < 0 ) { error = true; break; } + outpos += ret; + if( LZ_decompress_finished( decoder ) == 1 ) break; + if( outpos >= outsize ) + { + uint8_t * tmp; + if( outsize > LONG_MAX - delta_size ) { error = true; break; } + outsize += delta_size; + tmp = (uint8_t *)realloc( outbuf, outsize ); + if( !tmp ) { error = true; break; } + outbuf = tmp; + } + } + + if( LZ_decompress_close( decoder ) < 0 ) error = true; + if( error ) { free( outbuf ); return 0; } + *outlenp = outpos; + return outbuf; + } + + +/* Test the whole file at all levels. 
*/ +int full_test( const uint8_t * const inbuf, const long insize ) + { + int level; + for( level = 0; level <= 9; ++level ) + { + long midsize = 0, outsize = 0; + uint8_t * outbuf; + uint8_t * midbuf = bbcompressl( inbuf, insize, level, &midsize ); + if( !midbuf ) + { fputs( "bbexample: full_test: Not enough memory or compress error.\n", + stderr ); return 1; } + + outbuf = bbdecompressl( midbuf, midsize, &outsize ); + free( midbuf ); + if( !outbuf ) + { fputs( "bbexample: full_test: Not enough memory or decompress error.\n", + stderr ); return 1; } + + if( insize != outsize || + ( insize > 0 && memcmp( inbuf, outbuf, insize ) != 0 ) ) + { fputs( "bbexample: full_test: Decompressed data differs from original.\n", + stderr ); free( outbuf ); return 1; } + + free( outbuf ); + } + return 0; + } + + +/* Compress 'insize' bytes from 'inbuf' to 'outbuf'. + Return the size of the compressed data in '*outlenp'. + In case of error, or if 'outsize' is too small, return false and do not + modify '*outlenp'. 
+*/ +bool bbcompress( const uint8_t * const inbuf, const int insize, + const int dictionary_size, const int match_len_limit, + uint8_t * const outbuf, const int outsize, + int * const outlenp ) + { + int inpos = 0, outpos = 0; + bool error = false; + struct LZ_Encoder * const encoder = + LZ_compress_open( dictionary_size, match_len_limit, INT64_MAX ); + if( !encoder || LZ_compress_errno( encoder ) != LZ_ok ) + { LZ_compress_close( encoder ); return false; } + + while( true ) + { + int ret = LZ_compress_write( encoder, inbuf + inpos, insize - inpos ); + if( ret < 0 ) { error = true; break; } + inpos += ret; + if( inpos >= insize ) LZ_compress_finish( encoder ); + ret = LZ_compress_read( encoder, outbuf + outpos, outsize - outpos ); + if( ret < 0 ) { error = true; break; } + outpos += ret; + if( LZ_compress_finished( encoder ) == 1 ) break; + if( outpos >= outsize ) { error = true; break; } + } + + if( LZ_compress_close( encoder ) < 0 ) error = true; + if( error ) return false; + *outlenp = outpos; + return true; + } + + +/* Decompress 'insize' bytes from 'inbuf' to 'outbuf'. + Return the size of the decompressed data in '*outlenp'. + In case of error, or if 'outsize' is too small, return false and do not + modify '*outlenp'. 
+*/ +bool bbdecompress( const uint8_t * const inbuf, const int insize, + uint8_t * const outbuf, const int outsize, + int * const outlenp ) + { + int inpos = 0, outpos = 0; + bool error = false; + struct LZ_Decoder * const decoder = LZ_decompress_open(); + if( !decoder || LZ_decompress_errno( decoder ) != LZ_ok ) + { LZ_decompress_close( decoder ); return false; } + + while( true ) + { + int ret = LZ_decompress_write( decoder, inbuf + inpos, insize - inpos ); + if( ret < 0 ) { error = true; break; } + inpos += ret; + if( inpos >= insize ) LZ_decompress_finish( decoder ); + ret = LZ_decompress_read( decoder, outbuf + outpos, outsize - outpos ); + if( ret < 0 ) { error = true; break; } + outpos += ret; + if( LZ_decompress_finished( decoder ) == 1 ) break; + if( outpos >= outsize ) { error = true; break; } + } + + if( LZ_decompress_close( decoder ) < 0 ) error = true; + if( error ) return false; + *outlenp = outpos; + return true; + } + + +/* Test at most INT_MAX bytes from the file with buffers of fixed size. 
*/ +int fixed_test( const uint8_t * const inbuf, const int insize ) + { + int dictionary_size = 65535; /* fast encoder */ + int midsize = min( INT_MAX, ( insize / 8 ) * 9LL + 44 ), outsize = insize; + uint8_t * midbuf = (uint8_t *)malloc( midsize ); + uint8_t * outbuf = (uint8_t *)malloc( outsize ); + if( !midbuf || !outbuf ) + { fputs( "bbexample: fixed_test: Not enough memory.\n", stderr ); + free( outbuf ); free( midbuf ); return 1; } + + for( ; dictionary_size <= 8 << 20; dictionary_size += 8323073 ) + { + int midlen, outlen; + if( !bbcompress( inbuf, insize, dictionary_size, 16, midbuf, midsize, &midlen ) ) + { fputs( "bbexample: fixed_test: Not enough memory or compress error.\n", + stderr ); free( outbuf ); free( midbuf ); return 1; } + + if( !bbdecompress( midbuf, midlen, outbuf, outsize, &outlen ) ) + { fputs( "bbexample: fixed_test: Not enough memory or decompress error.\n", + stderr ); free( outbuf ); free( midbuf ); return 1; } + + if( insize != outlen || + ( insize > 0 && memcmp( inbuf, outbuf, insize ) != 0 ) ) + { fputs( "bbexample: fixed_test: Decompressed data differs from original.\n", + stderr ); free( outbuf ); free( midbuf ); return 1; } + + } + free( outbuf ); + free( midbuf ); + return 0; + } + + +int main( const int argc, const char * const argv[] ) + { + int retval = 0, i; + int open_failures = 0; + const bool verbose = ( argc > 2 ); + + if( argc < 2 ) + { + fputs( "Usage: bbexample filename\n", stderr ); + return 1; + } + + for( i = 1; i < argc && retval == 0; ++i ) + { + long insize; + uint8_t * const inbuf = read_file( argv[i], &insize ); + if( !inbuf ) { ++open_failures; continue; } + if( verbose ) fprintf( stderr, " Testing file '%s'\n", argv[i] ); + + retval = full_test( inbuf, insize ); + if( retval == 0 ) retval = fixed_test( inbuf, min( INT_MAX, insize ) ); + free( inbuf ); + } + if( open_failures > 0 && verbose ) + fprintf( stderr, "bbexample: warning: %d %s failed to open.\n", + open_failures, ( open_failures == 1 ) ? 
"file" : "files" ); + if( retval == 0 && open_failures ) retval = 1; + return retval; + } diff --git a/carg_parser.c b/carg_parser.c new file mode 100644 index 0000000..181ba23 --- /dev/null +++ b/carg_parser.c @@ -0,0 +1,319 @@ +/* Arg_parser - POSIX/GNU command line argument parser. (C version) + Copyright (C) 2006-2022 Antonio Diaz Diaz. + + This library is free software. Redistribution and use in source and + binary forms, with or without modification, are permitted provided + that the following conditions are met: + + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions, and the following disclaimer. + + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions, and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + This library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
+*/ + +#include <stdlib.h> +#include <string.h> + +#include "carg_parser.h" + + +/* assure at least a minimum size for buffer 'buf' */ +static void * ap_resize_buffer( void * buf, const int min_size ) + { + if( buf ) buf = realloc( buf, min_size ); + else buf = malloc( min_size ); + return buf; + } + + +static char push_back_record( struct Arg_parser * const ap, const int code, + const char * const long_name, + const char * const argument ) + { + struct ap_Record * p; + void * tmp = ap_resize_buffer( ap->data, + ( ap->data_size + 1 ) * sizeof (struct ap_Record) ); + if( !tmp ) return 0; + ap->data = (struct ap_Record *)tmp; + p = &(ap->data[ap->data_size]); + p->code = code; + if( long_name ) + { + const int len = strlen( long_name ); + p->parsed_name = (char *)malloc( len + 2 + 1 ); + if( !p->parsed_name ) return 0; + p->parsed_name[0] = p->parsed_name[1] = '-'; + strncpy( p->parsed_name + 2, long_name, len + 1 ); + } + else if( code > 0 && code < 256 ) + { + p->parsed_name = (char *)malloc( 2 + 1 ); + if( !p->parsed_name ) return 0; + p->parsed_name[0] = '-'; p->parsed_name[1] = code; p->parsed_name[2] = 0; + } + else p->parsed_name = 0; + if( argument ) + { + const int len = strlen( argument ); + p->argument = (char *)malloc( len + 1 ); + if( !p->argument ) { free( p->parsed_name ); return 0; } + strncpy( p->argument, argument, len + 1 ); + } + else p->argument = 0; + ++ap->data_size; + return 1; + } + + +static char add_error( struct Arg_parser * const ap, const char * const msg ) + { + const int len = strlen( msg ); + void * tmp = ap_resize_buffer( ap->error, ap->error_size + len + 1 ); + if( !tmp ) return 0; + ap->error = (char *)tmp; + strncpy( ap->error + ap->error_size, msg, len + 1 ); + ap->error_size += len; + return 1; + } + + +static void free_data( struct Arg_parser * const ap ) + { + int i; + for( i = 0; i < ap->data_size; ++i ) + { free( ap->data[i].argument ); free( ap->data[i].parsed_name ); } + if( ap->data ) { free( ap->data ); ap->data = 0; } + 
ap->data_size = 0; + } + + +/* Return 0 only if out of memory. */ +static char parse_long_option( struct Arg_parser * const ap, + const char * const opt, const char * const arg, + const struct ap_Option options[], + int * const argindp ) + { + unsigned len; + int index = -1, i; + char exact = 0, ambig = 0; + + for( len = 0; opt[len+2] && opt[len+2] != '='; ++len ) ; + + /* Test all long options for either exact match or abbreviated matches. */ + for( i = 0; options[i].code != 0; ++i ) + if( options[i].long_name && + strncmp( options[i].long_name, &opt[2], len ) == 0 ) + { + if( strlen( options[i].long_name ) == len ) /* Exact match found */ + { index = i; exact = 1; break; } + else if( index < 0 ) index = i; /* First nonexact match found */ + else if( options[index].code != options[i].code || + options[index].has_arg != options[i].has_arg ) + ambig = 1; /* Second or later nonexact match found */ + } + + if( ambig && !exact ) + { + add_error( ap, "option '" ); add_error( ap, opt ); + add_error( ap, "' is ambiguous" ); + return 1; + } + + if( index < 0 ) /* nothing found */ + { + add_error( ap, "unrecognized option '" ); add_error( ap, opt ); + add_error( ap, "'" ); + return 1; + } + + ++*argindp; + + if( opt[len+2] ) /* '--<long_option>=<argument>' syntax */ + { + if( options[index].has_arg == ap_no ) + { + add_error( ap, "option '--" ); add_error( ap, options[index].long_name ); + add_error( ap, "' doesn't allow an argument" ); + return 1; + } + if( options[index].has_arg == ap_yes && !opt[len+3] ) + { + add_error( ap, "option '--" ); add_error( ap, options[index].long_name ); + add_error( ap, "' requires an argument" ); + return 1; + } + return push_back_record( ap, options[index].code, + options[index].long_name, &opt[len+3] ); + } + + if( options[index].has_arg == ap_yes ) + { + if( !arg || !arg[0] ) + { + add_error( ap, "option '--" ); add_error( ap, options[index].long_name ); + add_error( ap, "' requires an argument" ); + return 1; + } + ++*argindp; + return 
push_back_record( ap, options[index].code,
+                             options[index].long_name, arg );
+    }
+
+  return push_back_record( ap, options[index].code,
+                           options[index].long_name, 0 );
+  }
+
+
+/* Return 0 only if out of memory. */
+static char parse_short_option( struct Arg_parser * const ap,
+                                const char * const opt, const char * const arg,
+                                const struct ap_Option options[],
+                                int * const argindp )
+  {
+  int cind = 1;  /* character index in opt */
+
+  while( cind > 0 )
+    {
+    int index = -1, i;
+    const unsigned char c = opt[cind];
+    char code_str[2];
+    code_str[0] = c; code_str[1] = 0;
+
+    if( c != 0 )
+      for( i = 0; options[i].code; ++i )
+        if( c == options[i].code )
+          { index = i; break; }
+
+    if( index < 0 )
+      {
+      add_error( ap, "invalid option -- '" ); add_error( ap, code_str );
+      add_error( ap, "'" );
+      return 1;
+      }
+
+    if( opt[++cind] == 0 ) { ++*argindp; cind = 0; }  /* opt finished */
+
+    if( options[index].has_arg != ap_no && cind > 0 && opt[cind] )
+      {
+      if( !push_back_record( ap, c, 0, &opt[cind] ) ) return 0;
+      ++*argindp; cind = 0;
+      }
+    else if( options[index].has_arg == ap_yes )
+      {
+      if( !arg || !arg[0] )
+        {
+        add_error( ap, "option requires an argument -- '" );
+        add_error( ap, code_str ); add_error( ap, "'" );
+        return 1;
+        }
+      ++*argindp; cind = 0;
+      if( !push_back_record( ap, c, 0, arg ) ) return 0;
+      }
+    else if( !push_back_record( ap, c, 0, 0 ) ) return 0;
+    }
+  return 1;
+  }
+
+
+char ap_init( struct Arg_parser * const ap,
+              const int argc, const char * const argv[],
+              const struct ap_Option options[], const char in_order )
+  {
+  const char ** non_options = 0;  /* skipped non-options */
+  int non_options_size = 0;  /* number of skipped non-options */
+  int argind = 1;  /* index in argv */
+  char done = 0;  /* false until success */
+
+  ap->data = 0;
+  ap->error = 0;
+  ap->data_size = 0;
+  ap->error_size = 0;
+  if( argc < 2 || !argv || !options ) return 1;
+
+  while( argind < argc )
+    {
+    const unsigned char ch1 = argv[argind][0];
+    const unsigned char ch2 = ch1
? argv[argind][1] : 0;
+
+    if( ch1 == '-' && ch2 )  /* we found an option */
+      {
+      const char * const opt = argv[argind];
+      const char * const arg = ( argind + 1 < argc ) ? argv[argind+1] : 0;
+      if( ch2 == '-' )
+        {
+        if( !argv[argind][2] ) { ++argind; break; }  /* we found "--" */
+        else if( !parse_long_option( ap, opt, arg, options, &argind ) ) goto out;
+        }
+      else if( !parse_short_option( ap, opt, arg, options, &argind ) ) goto out;
+      if( ap->error ) break;
+      }
+    else
+      {
+      if( in_order )
+        { if( !push_back_record( ap, 0, 0, argv[argind++] ) ) goto out; }
+      else
+        {
+        void * tmp = ap_resize_buffer( non_options,
+                     ( non_options_size + 1 ) * sizeof *non_options );
+        if( !tmp ) goto out;
+        non_options = (const char **)tmp;
+        non_options[non_options_size++] = argv[argind++];
+        }
+      }
+    }
+  if( ap->error ) free_data( ap );
+  else
+    {
+    int i;
+    for( i = 0; i < non_options_size; ++i )
+      if( !push_back_record( ap, 0, 0, non_options[i] ) ) goto out;
+    while( argind < argc )
+      if( !push_back_record( ap, 0, 0, argv[argind++] ) ) goto out;
+    }
+  done = 1;
+out: if( non_options ) free( non_options );
+  return done;
+  }
+
+
+void ap_free( struct Arg_parser * const ap )
+  {
+  free_data( ap );
+  if( ap->error ) { free( ap->error ); ap->error = 0; }
+  ap->error_size = 0;
+  }
+
+
+const char * ap_error( const struct Arg_parser * const ap )
+  { return ap->error; }
+
+
+int ap_arguments( const struct Arg_parser * const ap )
+  { return ap->data_size; }
+
+
+int ap_code( const struct Arg_parser * const ap, const int i )
+  {
+  if( i < 0 || i >= ap_arguments( ap ) ) return 0;
+  return ap->data[i].code;
+  }
+
+
+const char * ap_parsed_name( const struct Arg_parser * const ap, const int i )
+  {
+  if( i < 0 || i >= ap_arguments( ap ) || !ap->data[i].parsed_name ) return "";
+  return ap->data[i].parsed_name;
+  }
+
+
+const char * ap_argument( const struct Arg_parser * const ap, const int i )
+  {
+  if( i < 0 || i >= ap_arguments( ap ) || !ap->data[i].argument ) return "";
+  return
ap->data[i].argument;
+  }
diff --git a/carg_parser.h b/carg_parser.h
new file mode 100644
index 0000000..0c64861
--- /dev/null
+++ b/carg_parser.h
@@ -0,0 +1,97 @@
+/* Arg_parser - POSIX/GNU command line argument parser. (C version)
+   Copyright (C) 2006-2022 Antonio Diaz Diaz.
+
+   This library is free software. Redistribution and use in source and
+   binary forms, with or without modification, are permitted provided
+   that the following conditions are met:
+
+   1. Redistributions of source code must retain the above copyright
+   notice, this list of conditions, and the following disclaimer.
+
+   2. Redistributions in binary form must reproduce the above copyright
+   notice, this list of conditions, and the following disclaimer in the
+   documentation and/or other materials provided with the distribution.
+
+   This library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+*/
+
+/* Arg_parser reads the arguments in 'argv' and creates a number of
+   option codes, option arguments, and non-option arguments.
+
+   In case of error, 'ap_error' returns a non-null pointer to an error
+   message.
+
+   'options' is an array of 'struct ap_Option' terminated by an element
+   containing a code which is zero. A null long_name means a short-only
+   option. A code value outside the unsigned char range means a long-only
+   option.
+
+   Arg_parser normally makes it appear as if all the option arguments
+   were specified before all the non-option arguments for the purposes
+   of parsing, even if the user of your program intermixed option and
+   non-option arguments. If you want the arguments in the exact order
+   the user typed them, call 'ap_init' with 'in_order' = true.
+
+   The argument '--' terminates all options; any following arguments are
+   treated as non-option arguments, even if they begin with a hyphen.
+
+   The syntax for optional option arguments is '-<short_option><argument>'
+   (without whitespace), or '--<long_option>=<argument>'.
+*/
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+enum ap_Has_arg { ap_no, ap_yes, ap_maybe };
+
+struct ap_Option
+  {
+  int code;  /* Short option letter or code ( code != 0 ) */
+  const char * long_name;  /* Long option name (maybe null) */
+  enum ap_Has_arg has_arg;
+  };
+
+
+struct ap_Record
+  {
+  int code;
+  char * parsed_name;
+  char * argument;
+  };
+
+
+struct Arg_parser
+  {
+  struct ap_Record * data;
+  char * error;
+  int data_size;
+  int error_size;
+  };
+
+
+char ap_init( struct Arg_parser * const ap,
+              const int argc, const char * const argv[],
+              const struct ap_Option options[], const char in_order );
+
+void ap_free( struct Arg_parser * const ap );
+
+const char * ap_error( const struct Arg_parser * const ap );
+
+/* The number of arguments parsed. May be different from argc. */
+int ap_arguments( const struct Arg_parser * const ap );
+
+/* If ap_code( i ) is 0, ap_argument( i ) is a non-option.
+   Else ap_argument( i ) is the option's argument (or empty). */
+int ap_code( const struct Arg_parser * const ap, const int i );
+
+/* Full name of the option parsed (short or long). */
+const char * ap_parsed_name( const struct Arg_parser * const ap, const int i );
+
+const char * ap_argument( const struct Arg_parser * const ap, const int i );
+
+#ifdef __cplusplus
+}
+#endif
diff --git a/cbuffer.c b/cbuffer.c
new file mode 100644
index 0000000..812de42
--- /dev/null
+++ b/cbuffer.c
@@ -0,0 +1,143 @@
+/* Lzlib - Compression library for the lzip format
+   Copyright (C) 2009-2022 Antonio Diaz Diaz.
+
+   This library is free software. Redistribution and use in source and
+   binary forms, with or without modification, are permitted provided
+   that the following conditions are met:
+
+   1. Redistributions of source code must retain the above copyright
+   notice, this list of conditions, and the following disclaimer.
+
+   2.
Redistributions in binary form must reproduce the above copyright
+   notice, this list of conditions, and the following disclaimer in the
+   documentation and/or other materials provided with the distribution.
+
+   This library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+*/
+
+struct Circular_buffer
+  {
+  uint8_t * buffer;
+  unsigned buffer_size;  /* capacity == buffer_size - 1 */
+  unsigned get;  /* buffer is empty when get == put */
+  unsigned put;
+  };
+
+static inline bool Cb_init( struct Circular_buffer * const cb,
+                            const unsigned buf_size )
+  {
+  cb->buffer_size = buf_size + 1;
+  cb->get = 0;
+  cb->put = 0;
+  cb->buffer =
+    ( cb->buffer_size > 1 ) ? (uint8_t *)malloc( cb->buffer_size ) : 0;
+  return ( cb->buffer != 0 );
+  }
+
+static inline void Cb_free( struct Circular_buffer * const cb )
+  { free( cb->buffer ); cb->buffer = 0; }
+
+static inline void Cb_reset( struct Circular_buffer * const cb )
+  { cb->get = 0; cb->put = 0; }
+
+static inline unsigned Cb_empty( const struct Circular_buffer * const cb )
+  { return cb->get == cb->put; }
+
+static inline unsigned Cb_used_bytes( const struct Circular_buffer * const cb )
+  { return ( (cb->get <= cb->put) ? 0 : cb->buffer_size ) + cb->put - cb->get; }
+
+static inline unsigned Cb_free_bytes( const struct Circular_buffer * const cb )
+  { return ( (cb->get <= cb->put) ?
cb->buffer_size : 0 ) - cb->put + cb->get - 1; }
+
+static inline uint8_t Cb_get_byte( struct Circular_buffer * const cb )
+  {
+  const uint8_t b = cb->buffer[cb->get];
+  if( ++cb->get >= cb->buffer_size ) cb->get = 0;
+  return b;
+  }
+
+static inline void Cb_put_byte( struct Circular_buffer * const cb,
+                                const uint8_t b )
+  {
+  cb->buffer[cb->put] = b;
+  if( ++cb->put >= cb->buffer_size ) cb->put = 0;
+  }
+
+
+static bool Cb_unread_data( struct Circular_buffer * const cb,
+                            const unsigned size )
+  {
+  if( size > Cb_free_bytes( cb ) ) return false;
+  if( cb->get >= size ) cb->get -= size;
+  else cb->get = cb->buffer_size - size + cb->get;
+  return true;
+  }
+
+
+/* Copy up to 'out_size' bytes to 'out_buffer' and update 'get'.
+   If 'out_buffer' is null, the bytes are discarded.
+   Return the number of bytes copied or discarded.
+*/
+static unsigned Cb_read_data( struct Circular_buffer * const cb,
+                              uint8_t * const out_buffer,
+                              const unsigned out_size )
+  {
+  unsigned size = 0;
+  if( out_size == 0 ) return 0;
+  if( cb->get > cb->put )
+    {
+    size = min( cb->buffer_size - cb->get, out_size );
+    if( size > 0 )
+      {
+      if( out_buffer ) memcpy( out_buffer, cb->buffer + cb->get, size );
+      cb->get += size;
+      if( cb->get >= cb->buffer_size ) cb->get = 0;
+      }
+    }
+  if( cb->get < cb->put )
+    {
+    const unsigned size2 = min( cb->put - cb->get, out_size - size );
+    if( size2 > 0 )
+      {
+      if( out_buffer ) memcpy( out_buffer + size, cb->buffer + cb->get, size2 );
+      cb->get += size2;
+      size += size2;
+      }
+    }
+  return size;
+  }
+
+
+/* Copy up to 'in_size' bytes from 'in_buffer' and update 'put'.
+   Return the number of bytes copied.
+*/
+static unsigned Cb_write_data( struct Circular_buffer * const cb,
+                               const uint8_t * const in_buffer,
+                               const unsigned in_size )
+  {
+  unsigned size = 0;
+  if( in_size == 0 ) return 0;
+  if( cb->put >= cb->get )
+    {
+    size = min( cb->buffer_size - cb->put - (cb->get == 0), in_size );
+    if( size > 0 )
+      {
+      memcpy( cb->buffer + cb->put, in_buffer, size );
+      cb->put += size;
+      if( cb->put >= cb->buffer_size ) cb->put = 0;
+      }
+    }
+  if( cb->put < cb->get )
+    {
+    const unsigned size2 = min( cb->get - cb->put - 1, in_size - size );
+    if( size2 > 0 )
+      {
+      memcpy( cb->buffer + cb->put, in_buffer + size, size2 );
+      cb->put += size2;
+      size += size2;
+      }
+    }
+  return size;
+  }
diff --git a/configure b/configure
new file mode 100755
index 0000000..4060472
--- /dev/null
+++ b/configure
@@ -0,0 +1,239 @@
+#! /bin/sh
+# configure script for Lzlib - Compression library for the lzip format
+# Copyright (C) 2009-2022 Antonio Diaz Diaz.
+#
+# This configure script is free software: you have unlimited permission
+# to copy, distribute, and modify it.
+
+pkgname=lzlib
+pkgversion=1.13
+soversion=1
+progname=minilzip
+progname_static=${progname}
+progname_shared=
+progname_lzip=${progname}
+disable_ldconfig=
+libname=lz
+srctrigger=doc/${pkgname}.texi
+
+# clear some things potentially inherited from environment.
+LC_ALL=C
+export LC_ALL
+srcdir=
+prefix=/usr/local
+exec_prefix='$(prefix)'
+bindir='$(exec_prefix)/bin'
+datarootdir='$(prefix)/share'
+includedir='$(prefix)/include'
+infodir='$(datarootdir)/info'
+libdir='$(exec_prefix)/lib'
+mandir='$(datarootdir)/man'
+CC=gcc
+AR=ar
+CPPFLAGS=
+CFLAGS='-Wall -W -O2'
+LDFLAGS=
+ARFLAGS=-rcs
+
+# checking whether we are using GNU C.
+/bin/sh -c "${CC} --version" > /dev/null 2>&1 || { CC=cc ; CFLAGS=-O2 ; }
+
+# Loop over all args
+args=
+no_create=
+while [ $# != 0 ] ; do
+
+  # Get the first arg, and shuffle
+  option=$1 ; arg2=no
+  shift
+
+  # Add the argument quoted to args
+  if [ -z "${args}" ] ; then args="\"${option}\""
+  else args="${args} \"${option}\"" ; fi
+
+  # Split out the argument for options that take them
+  case ${option} in
+  *=*) optarg=`echo "${option}" | sed -e 's,^[^=]*=,,;s,/$,,'` ;;
+  esac
+
+  # Process the options
+  case ${option} in
+  --help | -h)
+    echo "Usage: $0 [OPTION]... [VAR=VALUE]..."
+    echo
+    echo "To assign makefile variables (e.g., CC, CFLAGS...), specify them as"
+    echo "arguments to configure in the form VAR=VALUE."
+    echo
+    echo "Options and variables: [defaults in brackets]"
+    echo " -h, --help display this help and exit"
+    echo " -V, --version output version information and exit"
+    echo " --srcdir=DIR find the sources in DIR [. or ..]"
+    echo " --prefix=DIR install into DIR [${prefix}]"
+    echo " --exec-prefix=DIR base directory for arch-dependent files [${exec_prefix}]"
+    echo " --bindir=DIR user executables directory [${bindir}]"
+    echo " --datarootdir=DIR base directory for doc and data [${datarootdir}]"
+    echo " --includedir=DIR C header files [${includedir}]"
+    echo " --infodir=DIR info files directory [${infodir}]"
+    echo " --libdir=DIR object code libraries [${libdir}]"
+    echo " --mandir=DIR man pages directory [${mandir}]"
+    echo " --disable-static don't build a static library [enable]"
+    echo " (implies --enable-shared)"
+    echo " --enable-shared build also a shared library [disable]"
+    echo " --disable-ldconfig don't run ldconfig after install"
+    echo " CC=COMPILER C compiler to use [${CC}]"
+    echo " AR=ARCHIVER library archiver to use [${AR}]"
+    echo " CPPFLAGS=OPTIONS command line options for the preprocessor [${CPPFLAGS}]"
+    echo " CFLAGS=OPTIONS command line options for the C compiler [${CFLAGS}]"
+    echo " CFLAGS+=OPTIONS append options to the
current value of CFLAGS"
+    echo " LDFLAGS=OPTIONS command line options for the linker [${LDFLAGS}]"
+    echo " ARFLAGS=OPTIONS command line options for the library archiver [${ARFLAGS}]"
+    echo
+    exit 0 ;;
+  --version | -V)
+    echo "Configure script for ${pkgname} version ${pkgversion}"
+    exit 0 ;;
+  --srcdir) srcdir=$1 ; arg2=yes ;;
+  --prefix) prefix=$1 ; arg2=yes ;;
+  --exec-prefix) exec_prefix=$1 ; arg2=yes ;;
+  --bindir) bindir=$1 ; arg2=yes ;;
+  --datarootdir) datarootdir=$1 ; arg2=yes ;;
+  --includedir) includedir=$1 ; arg2=yes ;;
+  --infodir) infodir=$1 ; arg2=yes ;;
+  --libdir) libdir=$1 ; arg2=yes ;;
+  --mandir) mandir=$1 ; arg2=yes ;;
+
+  --srcdir=*) srcdir=${optarg} ;;
+  --prefix=*) prefix=${optarg} ;;
+  --exec-prefix=*) exec_prefix=${optarg} ;;
+  --bindir=*) bindir=${optarg} ;;
+  --datarootdir=*) datarootdir=${optarg} ;;
+  --includedir=*) includedir=${optarg} ;;
+  --infodir=*) infodir=${optarg} ;;
+  --libdir=*) libdir=${optarg} ;;
+  --mandir=*) mandir=${optarg} ;;
+  --no-create) no_create=yes ;;
+  --disable-static)
+    progname_static=
+    progname_shared=${progname}_shared
+    progname_lzip=${progname}_shared ;;
+  --enable-shared)
+    progname_shared=${progname}_shared
+    progname_lzip=${progname}_shared ;;
+  --disable-ldconfig) disable_ldconfig=yes ;;
+
+  CC=*) CC=${optarg} ;;
+  AR=*) AR=${optarg} ;;
+  CPPFLAGS=*) CPPFLAGS=${optarg} ;;
+  CFLAGS=*) CFLAGS=${optarg} ;;
+  CFLAGS+=*) CFLAGS="${CFLAGS} ${optarg}" ;;
+  LDFLAGS=*) LDFLAGS=${optarg} ;;
+  ARFLAGS=*) ARFLAGS=${optarg} ;;
+
+  --*)
+    echo "configure: WARNING: unrecognized option: '${option}'" 1>&2 ;;
+  *=* | *-*-*) ;;
+  *)
+    echo "configure: unrecognized option: '${option}'" 1>&2
+    echo "Try 'configure --help' for more information."
1>&2
+    exit 1 ;;
+  esac
+
+  # Check if the option took a separate argument
+  if [ "${arg2}" = yes ] ; then
+    if [ $# != 0 ] ; then args="${args} \"$1\"" ; shift
+    else echo "configure: Missing argument to '${option}'" 1>&2
+      exit 1
+    fi
+  fi
+done
+
+# Find the source files, if location was not specified.
+srcdirtext=
+if [ -z "${srcdir}" ] ; then
+  srcdirtext="or . or .." ; srcdir=.
+  if [ ! -r "${srcdir}/${srctrigger}" ] ; then srcdir=.. ; fi
+  if [ ! -r "${srcdir}/${srctrigger}" ] ; then
+    ## the sed command below emulates the dirname command
+    srcdir=`echo "$0" | sed -e 's,[^/]*$,,;s,/$,,;s,^$,.,'`
+  fi
+fi
+
+if [ ! -r "${srcdir}/${srctrigger}" ] ; then
+  echo "configure: Can't find sources in ${srcdir} ${srcdirtext}" 1>&2
+  echo "configure: (At least ${srctrigger} is missing)." 1>&2
+  exit 1
+fi
+
+# Set srcdir to . if that's what it is.
+if [ "`pwd`" = "`cd "${srcdir}" ; pwd`" ] ; then srcdir=. ; fi
+
+echo
+if [ -z "${no_create}" ] ; then
+  echo "creating config.status"
+  rm -f config.status
+  cat > config.status << EOF
+#! /bin/sh
+# This file was generated automatically by configure. Don't edit.
+# Run this file to recreate the current configuration.
+#
+# This script is free software: you have unlimited permission
+# to copy, distribute, and modify it.
+
+exec /bin/sh $0 ${args} --no-create
+EOF
+  chmod +x config.status
+fi
+
+echo "creating Makefile"
+echo "VPATH = ${srcdir}"
+echo "prefix = ${prefix}"
+echo "exec_prefix = ${exec_prefix}"
+echo "bindir = ${bindir}"
+echo "datarootdir = ${datarootdir}"
+echo "includedir = ${includedir}"
+echo "infodir = ${infodir}"
+echo "libdir = ${libdir}"
+echo "mandir = ${mandir}"
+echo "CC = ${CC}"
+echo "AR = ${AR}"
+echo "CPPFLAGS = ${CPPFLAGS}"
+echo "CFLAGS = ${CFLAGS}"
+echo "LDFLAGS = ${LDFLAGS}"
+echo "ARFLAGS = ${ARFLAGS}"
+rm -f Makefile
+cat > Makefile << EOF
+# Makefile for Lzlib - Compression library for the lzip format
+# Copyright (C) 2009-2022 Antonio Diaz Diaz.
+# This file was generated automatically by configure. Don't edit.
+#
+# This Makefile is free software: you have unlimited permission
+# to copy, distribute, and modify it.
+
+pkgname = ${pkgname}
+pkgversion = ${pkgversion}
+soversion = ${soversion}
+progname = ${progname}
+progname_static = ${progname_static}
+progname_shared = ${progname_shared}
+progname_lzip = ${progname_lzip}
+disable_ldconfig = ${disable_ldconfig}
+libname = ${libname}
+VPATH = ${srcdir}
+prefix = ${prefix}
+exec_prefix = ${exec_prefix}
+bindir = ${bindir}
+datarootdir = ${datarootdir}
+includedir = ${includedir}
+infodir = ${infodir}
+libdir = ${libdir}
+mandir = ${mandir}
+CC = ${CC}
+AR = ${AR}
+CPPFLAGS = ${CPPFLAGS}
+CFLAGS = ${CFLAGS}
+LDFLAGS = ${LDFLAGS}
+ARFLAGS = ${ARFLAGS}
+EOF
+cat "${srcdir}/Makefile.in" >> Makefile
+
+echo "OK. Now you can run make."
diff --git a/decoder.c b/decoder.c
new file mode 100644
index 0000000..16f6532
--- /dev/null
+++ b/decoder.c
@@ -0,0 +1,145 @@
+/* Lzlib - Compression library for the lzip format
+   Copyright (C) 2009-2022 Antonio Diaz Diaz.
+
+   This library is free software. Redistribution and use in source and
+   binary forms, with or without modification, are permitted provided
+   that the following conditions are met:
+
+   1. Redistributions of source code must retain the above copyright
+   notice, this list of conditions, and the following disclaimer.
+
+   2. Redistributions in binary form must reproduce the above copyright
+   notice, this list of conditions, and the following disclaimer in the
+   documentation and/or other materials provided with the distribution.
+
+   This library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+*/ + +static int LZd_try_verify_trailer( struct LZ_decoder * const d ) + { + Lzip_trailer trailer; + if( Rd_available_bytes( d->rdec ) < Lt_size ) + { if( !d->rdec->at_stream_end ) return 0; else return 2; } + d->verify_trailer_pending = false; + d->member_finished = true; + + if( Rd_read_data( d->rdec, trailer, Lt_size ) == Lt_size && + Lt_get_data_crc( trailer ) == LZd_crc( d ) && + Lt_get_data_size( trailer ) == LZd_data_position( d ) && + Lt_get_member_size( trailer ) == d->rdec->member_position ) return 0; + return 3; + } + + +/* Return value: 0 = OK, 1 = decoder error, 2 = unexpected EOF, + 3 = trailer error, 4 = unknown marker found, + 5 = library error. */ +static int LZd_decode_member( struct LZ_decoder * const d ) + { + struct Range_decoder * const rdec = d->rdec; + State * const state = &d->state; + /* unsigned old_mpos = rdec->member_position; */ + + if( d->member_finished ) return 0; + if( !Rd_try_reload( rdec ) ) + { if( !rdec->at_stream_end ) return 0; else return 2; } + if( d->verify_trailer_pending ) return LZd_try_verify_trailer( d ); + + while( !Rd_finished( rdec ) ) + { + /* const unsigned mpos = rdec->member_position; + if( mpos - old_mpos > rd_min_available_bytes ) return 5; + old_mpos = mpos; */ + if( !Rd_enough_available_bytes( rdec ) ) /* check unexpected EOF */ + { if( !rdec->at_stream_end ) return 0; + if( Cb_empty( &rdec->cb ) ) break; } /* decode until EOF */ + if( !LZd_enough_free_bytes( d ) ) return 0; + const int pos_state = LZd_data_position( d ) & pos_state_mask; + if( Rd_decode_bit( rdec, &d->bm_match[*state][pos_state] ) == 0 ) /* 1st bit */ + { + /* literal byte */ + Bit_model * const bm = d->bm_literal[get_lit_state(LZd_peek_prev( d ))]; + if( ( *state = St_set_char( *state ) ) < 4 ) + LZd_put_byte( d, Rd_decode_tree8( rdec, bm ) ); + else + LZd_put_byte( d, Rd_decode_matched( rdec, bm, LZd_peek( d, d->rep0 ) ) ); + continue; + } + /* match or repeated match */ + int len; + if( Rd_decode_bit( rdec, &d->bm_rep[*state] ) != 0 ) 
/* 2nd bit */ + { + if( Rd_decode_bit( rdec, &d->bm_rep0[*state] ) == 0 ) /* 3rd bit */ + { + if( Rd_decode_bit( rdec, &d->bm_len[*state][pos_state] ) == 0 ) /* 4th bit */ + { *state = St_set_short_rep( *state ); + LZd_put_byte( d, LZd_peek( d, d->rep0 ) ); continue; } + } + else + { + unsigned distance; + if( Rd_decode_bit( rdec, &d->bm_rep1[*state] ) == 0 ) /* 4th bit */ + distance = d->rep1; + else + { + if( Rd_decode_bit( rdec, &d->bm_rep2[*state] ) == 0 ) /* 5th bit */ + distance = d->rep2; + else + { distance = d->rep3; d->rep3 = d->rep2; } + d->rep2 = d->rep1; + } + d->rep1 = d->rep0; + d->rep0 = distance; + } + *state = St_set_rep( *state ); + len = Rd_decode_len( rdec, &d->rep_len_model, pos_state ); + } + else /* match */ + { + len = Rd_decode_len( rdec, &d->match_len_model, pos_state ); + unsigned distance = Rd_decode_tree6( rdec, d->bm_dis_slot[get_len_state(len)] ); + if( distance >= start_dis_model ) + { + const unsigned dis_slot = distance; + const int direct_bits = ( dis_slot >> 1 ) - 1; + distance = ( 2 | ( dis_slot & 1 ) ) << direct_bits; + if( dis_slot < end_dis_model ) + distance += Rd_decode_tree_reversed( rdec, + d->bm_dis + ( distance - dis_slot ), direct_bits ); + else + { + distance += + Rd_decode( rdec, direct_bits - dis_align_bits ) << dis_align_bits; + distance += Rd_decode_tree_reversed4( rdec, d->bm_align ); + if( distance == 0xFFFFFFFFU ) /* marker found */ + { + Rd_normalize( rdec ); + /* const unsigned mpos = rdec->member_position; + if( mpos - old_mpos > rd_min_available_bytes ) return 5; + old_mpos = mpos; */ + if( len == min_match_len ) /* End Of Stream marker */ + { + d->verify_trailer_pending = true; + return LZd_try_verify_trailer( d ); + } + if( len == min_match_len + 1 ) /* Sync Flush marker */ + { + rdec->reload_pending = true; + if( Rd_try_reload( rdec ) ) continue; + else { if( !rdec->at_stream_end ) return 0; else break; } + } + return 4; + } + } + } + d->rep3 = d->rep2; d->rep2 = d->rep1; d->rep1 = d->rep0; d->rep0 = 
distance; + *state = St_set_match( *state ); + if( d->rep0 >= d->dictionary_size || + ( d->rep0 >= d->cb.put && !d->pos_wrapped ) ) return 1; + } + LZd_copy_block( d, d->rep0, len ); + } + return 2; + } diff --git a/decoder.h b/decoder.h new file mode 100644 index 0000000..27de9cb --- /dev/null +++ b/decoder.h @@ -0,0 +1,463 @@ +/* Lzlib - Compression library for the lzip format + Copyright (C) 2009-2022 Antonio Diaz Diaz. + + This library is free software. Redistribution and use in source and + binary forms, with or without modification, are permitted provided + that the following conditions are met: + + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions, and the following disclaimer. + + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions, and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + This library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
+*/ + +enum { rd_min_available_bytes = 10 }; + +struct Range_decoder + { + struct Circular_buffer cb; /* input buffer */ + unsigned long long member_position; + uint32_t code; + uint32_t range; + bool at_stream_end; + bool reload_pending; + }; + +static inline bool Rd_init( struct Range_decoder * const rdec ) + { + if( !Cb_init( &rdec->cb, 65536 + rd_min_available_bytes ) ) return false; + rdec->member_position = 0; + rdec->code = 0; + rdec->range = 0xFFFFFFFFU; + rdec->at_stream_end = false; + rdec->reload_pending = false; + return true; + } + +static inline void Rd_free( struct Range_decoder * const rdec ) + { Cb_free( &rdec->cb ); } + +static inline bool Rd_finished( const struct Range_decoder * const rdec ) + { return rdec->at_stream_end && Cb_empty( &rdec->cb ); } + +static inline void Rd_finish( struct Range_decoder * const rdec ) + { rdec->at_stream_end = true; } + +static inline bool Rd_enough_available_bytes( const struct Range_decoder * const rdec ) + { return ( Cb_used_bytes( &rdec->cb ) >= rd_min_available_bytes ); } + +static inline unsigned Rd_available_bytes( const struct Range_decoder * const rdec ) + { return Cb_used_bytes( &rdec->cb ); } + +static inline unsigned Rd_free_bytes( const struct Range_decoder * const rdec ) + { return rdec->at_stream_end ? 0 : Cb_free_bytes( &rdec->cb ); } + +static inline unsigned long long Rd_purge( struct Range_decoder * const rdec ) + { + const unsigned long long size = + rdec->member_position + Cb_used_bytes( &rdec->cb ); + Cb_reset( &rdec->cb ); + rdec->member_position = 0; rdec->at_stream_end = true; + return size; + } + +static inline void Rd_reset( struct Range_decoder * const rdec ) + { Cb_reset( &rdec->cb ); + rdec->member_position = 0; rdec->at_stream_end = false; } + + +/* Seek for a member header and update 'get'. Set '*skippedp' to the number + of bytes skipped. Return true if a valid header is found. 
+*/ +static bool Rd_find_header( struct Range_decoder * const rdec, + unsigned * const skippedp ) + { + *skippedp = 0; + while( rdec->cb.get != rdec->cb.put ) + { + if( rdec->cb.buffer[rdec->cb.get] == lzip_magic[0] ) + { + unsigned get = rdec->cb.get; + int i; + Lzip_header header; + for( i = 0; i < Lh_size; ++i ) + { + if( get == rdec->cb.put ) return false; /* not enough data */ + header[i] = rdec->cb.buffer[get]; + if( ++get >= rdec->cb.buffer_size ) get = 0; + } + if( Lh_verify( header ) ) return true; + } + if( ++rdec->cb.get >= rdec->cb.buffer_size ) rdec->cb.get = 0; + ++*skippedp; + } + return false; + } + + +static inline int Rd_write_data( struct Range_decoder * const rdec, + const uint8_t * const inbuf, const int size ) + { + if( rdec->at_stream_end || size <= 0 ) return 0; + return Cb_write_data( &rdec->cb, inbuf, size ); + } + +static inline uint8_t Rd_get_byte( struct Range_decoder * const rdec ) + { + /* 0xFF avoids decoder error if member is truncated at EOS marker */ + if( Rd_finished( rdec ) ) return 0xFF; + ++rdec->member_position; + return Cb_get_byte( &rdec->cb ); + } + +static inline int Rd_read_data( struct Range_decoder * const rdec, + uint8_t * const outbuf, const int size ) + { + const int sz = Cb_read_data( &rdec->cb, outbuf, size ); + if( sz > 0 ) rdec->member_position += sz; + return sz; + } + +static inline bool Rd_unread_data( struct Range_decoder * const rdec, + const unsigned size ) + { + if( size > rdec->member_position || !Cb_unread_data( &rdec->cb, size ) ) + return false; + rdec->member_position -= size; + return true; + } + +static bool Rd_try_reload( struct Range_decoder * const rdec ) + { + if( rdec->reload_pending && Rd_available_bytes( rdec ) >= 5 ) + { + int i; + rdec->reload_pending = false; + rdec->code = 0; + for( i = 0; i < 5; ++i ) rdec->code = (rdec->code << 8) | Rd_get_byte( rdec ); + rdec->range = 0xFFFFFFFFU; + rdec->code &= rdec->range; /* make sure that first byte is discarded */ + } + return 
!rdec->reload_pending; + } + +static inline void Rd_normalize( struct Range_decoder * const rdec ) + { + if( rdec->range <= 0x00FFFFFFU ) + { rdec->range <<= 8; rdec->code = (rdec->code << 8) | Rd_get_byte( rdec ); } + } + +static inline unsigned Rd_decode( struct Range_decoder * const rdec, + const int num_bits ) + { + unsigned symbol = 0; + int i; + for( i = num_bits; i > 0; --i ) + { + Rd_normalize( rdec ); + rdec->range >>= 1; +/* symbol <<= 1; */ +/* if( rdec->code >= rdec->range ) { rdec->code -= rdec->range; symbol |= 1; } */ + const bool bit = ( rdec->code >= rdec->range ); + symbol <<= 1; symbol += bit; + rdec->code -= rdec->range & ( 0U - bit ); + } + return symbol; + } + +static inline unsigned Rd_decode_bit( struct Range_decoder * const rdec, + Bit_model * const probability ) + { + Rd_normalize( rdec ); + const uint32_t bound = ( rdec->range >> bit_model_total_bits ) * *probability; + if( rdec->code < bound ) + { + rdec->range = bound; + *probability += ( bit_model_total - *probability ) >> bit_model_move_bits; + return 0; + } + else + { + rdec->code -= bound; + rdec->range -= bound; + *probability -= *probability >> bit_model_move_bits; + return 1; + } + } + +static inline void Rd_decode_symbol_bit( struct Range_decoder * const rdec, + Bit_model * const probability, unsigned * symbol ) + { + Rd_normalize( rdec ); + *symbol <<= 1; + const uint32_t bound = ( rdec->range >> bit_model_total_bits ) * *probability; + if( rdec->code < bound ) + { + rdec->range = bound; + *probability += ( bit_model_total - *probability ) >> bit_model_move_bits; + } + else + { + rdec->code -= bound; + rdec->range -= bound; + *probability -= *probability >> bit_model_move_bits; + *symbol |= 1; + } + } + +static inline void Rd_decode_symbol_bit_reversed( struct Range_decoder * const rdec, + Bit_model * const probability, unsigned * model, + unsigned * symbol, const int i ) + { + Rd_normalize( rdec ); + *model <<= 1; + const uint32_t bound = ( rdec->range >> bit_model_total_bits 
) * *probability; + if( rdec->code < bound ) + { + rdec->range = bound; + *probability += ( bit_model_total - *probability ) >> bit_model_move_bits; + } + else + { + rdec->code -= bound; + rdec->range -= bound; + *probability -= *probability >> bit_model_move_bits; + *model |= 1; + *symbol |= 1 << i; + } + } + +static inline unsigned Rd_decode_tree6( struct Range_decoder * const rdec, + Bit_model bm[] ) + { + unsigned symbol = 1; + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + return symbol & 0x3F; + } + +static inline unsigned Rd_decode_tree8( struct Range_decoder * const rdec, + Bit_model bm[] ) + { + unsigned symbol = 1; + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + return symbol & 0xFF; + } + +static inline unsigned +Rd_decode_tree_reversed( struct Range_decoder * const rdec, + Bit_model bm[], const int num_bits ) + { + unsigned model = 1; + unsigned symbol = 0; + int i; + for( i = 0; i < num_bits; ++i ) + Rd_decode_symbol_bit_reversed( rdec, &bm[model], &model, &symbol, i ); + return symbol; + } + +static inline unsigned +Rd_decode_tree_reversed4( struct Range_decoder * const rdec, Bit_model bm[] ) + { + unsigned model = 1; + unsigned symbol = 0; + Rd_decode_symbol_bit_reversed( rdec, &bm[model], &model, &symbol, 0 ); + Rd_decode_symbol_bit_reversed( rdec, &bm[model], &model, &symbol, 1 ); + 
Rd_decode_symbol_bit_reversed( rdec, &bm[model], &model, &symbol, 2 ); + Rd_decode_symbol_bit_reversed( rdec, &bm[model], &model, &symbol, 3 ); + return symbol; + } + +static inline unsigned Rd_decode_matched( struct Range_decoder * const rdec, + Bit_model bm[], unsigned match_byte ) + { + unsigned symbol = 1; + unsigned mask = 0x100; + while( true ) + { + const unsigned match_bit = ( match_byte <<= 1 ) & mask; + const unsigned bit = Rd_decode_bit( rdec, &bm[symbol+match_bit+mask] ); + symbol <<= 1; symbol += bit; + if( symbol > 0xFF ) return symbol & 0xFF; + mask &= ~(match_bit ^ (bit << 8)); /* if( match_bit != bit ) mask = 0; */ + } + } + +static inline unsigned Rd_decode_len( struct Range_decoder * const rdec, + struct Len_model * const lm, + const int pos_state ) + { + Bit_model * bm; + unsigned mask, offset, symbol = 1; + + if( Rd_decode_bit( rdec, &lm->choice1 ) == 0 ) + { bm = lm->bm_low[pos_state]; mask = 7; offset = 0; goto len3; } + if( Rd_decode_bit( rdec, &lm->choice2 ) == 0 ) + { bm = lm->bm_mid[pos_state]; mask = 7; offset = len_low_symbols; goto len3; } + bm = lm->bm_high; mask = 0xFF; offset = len_low_symbols + len_mid_symbols; + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); +len3: + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + Rd_decode_symbol_bit( rdec, &bm[symbol], &symbol ); + return ( symbol & mask ) + min_match_len + offset; + } + + +enum { lzd_min_free_bytes = max_match_len }; + +struct LZ_decoder + { + struct Circular_buffer cb; + unsigned long long partial_data_pos; + struct Range_decoder * rdec; + unsigned dictionary_size; + uint32_t crc; + bool member_finished; + bool verify_trailer_pending; + bool pos_wrapped; + unsigned rep0; /* rep[0-3] latest four 
distances */ + unsigned rep1; /* used for efficient coding of */ + unsigned rep2; /* repeated distances */ + unsigned rep3; + State state; + + Bit_model bm_literal[1<<literal_context_bits][0x300]; + Bit_model bm_match[states][pos_states]; + Bit_model bm_rep[states]; + Bit_model bm_rep0[states]; + Bit_model bm_rep1[states]; + Bit_model bm_rep2[states]; + Bit_model bm_len[states][pos_states]; + Bit_model bm_dis_slot[len_states][1<<dis_slot_bits]; + Bit_model bm_dis[modeled_distances-end_dis_model+1]; + Bit_model bm_align[dis_align_size]; + + struct Len_model match_len_model; + struct Len_model rep_len_model; + }; + +static inline bool LZd_enough_free_bytes( const struct LZ_decoder * const d ) + { return Cb_free_bytes( &d->cb ) >= lzd_min_free_bytes; } + +static inline uint8_t LZd_peek_prev( const struct LZ_decoder * const d ) + { return d->cb.buffer[((d->cb.put > 0) ? d->cb.put : d->cb.buffer_size)-1]; } + +static inline uint8_t LZd_peek( const struct LZ_decoder * const d, + const unsigned distance ) + { + const unsigned i = ( ( d->cb.put > distance ) ? 
0 : d->cb.buffer_size ) + + d->cb.put - distance - 1; + return d->cb.buffer[i]; + } + +static inline void LZd_put_byte( struct LZ_decoder * const d, const uint8_t b ) + { + CRC32_update_byte( &d->crc, b ); + d->cb.buffer[d->cb.put] = b; + if( ++d->cb.put >= d->cb.buffer_size ) + { d->partial_data_pos += d->cb.put; d->cb.put = 0; d->pos_wrapped = true; } + } + +static inline void LZd_copy_block( struct LZ_decoder * const d, + const unsigned distance, unsigned len ) + { + unsigned lpos = d->cb.put, i = lpos - distance - 1; + bool fast, fast2; + if( lpos > distance ) + { + fast = ( len < d->cb.buffer_size - lpos ); + fast2 = ( fast && len <= lpos - i ); + } + else + { + i += d->cb.buffer_size; + fast = ( len < d->cb.buffer_size - i ); /* (i == pos) may happen */ + fast2 = ( fast && len <= i - lpos ); + } + if( fast ) /* no wrap */ + { + const unsigned tlen = len; + if( fast2 ) /* no wrap, no overlap */ + memcpy( d->cb.buffer + lpos, d->cb.buffer + i, len ); + else + for( ; len > 0; --len ) d->cb.buffer[lpos++] = d->cb.buffer[i++]; + CRC32_update_buf( &d->crc, d->cb.buffer + d->cb.put, tlen ); + d->cb.put += tlen; + } + else for( ; len > 0; --len ) + { + LZd_put_byte( d, d->cb.buffer[i] ); + if( ++i >= d->cb.buffer_size ) i = 0; + } + } + +static inline bool LZd_init( struct LZ_decoder * const d, + struct Range_decoder * const rde, + const unsigned dict_size ) + { + if( !Cb_init( &d->cb, max( 65536, dict_size ) + lzd_min_free_bytes ) ) + return false; + d->partial_data_pos = 0; + d->rdec = rde; + d->dictionary_size = dict_size; + d->crc = 0xFFFFFFFFU; + d->member_finished = false; + d->verify_trailer_pending = false; + d->pos_wrapped = false; + /* prev_byte of first byte; also for LZd_peek( 0 ) on corrupt file */ + d->cb.buffer[d->cb.buffer_size-1] = 0; + d->rep0 = 0; + d->rep1 = 0; + d->rep2 = 0; + d->rep3 = 0; + d->state = 0; + + Bm_array_init( d->bm_literal[0], (1 << literal_context_bits) * 0x300 ); + Bm_array_init( d->bm_match[0], states * pos_states ); + 
Bm_array_init( d->bm_rep, states ); + Bm_array_init( d->bm_rep0, states ); + Bm_array_init( d->bm_rep1, states ); + Bm_array_init( d->bm_rep2, states ); + Bm_array_init( d->bm_len[0], states * pos_states ); + Bm_array_init( d->bm_dis_slot[0], len_states * (1 << dis_slot_bits) ); + Bm_array_init( d->bm_dis, modeled_distances - end_dis_model + 1 ); + Bm_array_init( d->bm_align, dis_align_size ); + Lm_init( &d->match_len_model ); + Lm_init( &d->rep_len_model ); + return true; + } + +static inline void LZd_free( struct LZ_decoder * const d ) + { Cb_free( &d->cb ); } + +static inline bool LZd_member_finished( const struct LZ_decoder * const d ) + { return ( d->member_finished && Cb_empty( &d->cb ) ); } + +static inline unsigned LZd_crc( const struct LZ_decoder * const d ) + { return d->crc ^ 0xFFFFFFFFU; } + +static inline unsigned long long +LZd_data_position( const struct LZ_decoder * const d ) + { return d->partial_data_pos + d->cb.put; } diff --git a/doc/lzlib.info b/doc/lzlib.info new file mode 100644 index 0000000..d81bc88 --- /dev/null +++ b/doc/lzlib.info @@ -0,0 +1,1323 @@ +This is lzlib.info, produced by makeinfo version 4.13+ from lzlib.texi. + +INFO-DIR-SECTION Compression +START-INFO-DIR-ENTRY +* Lzlib: (lzlib). Compression library for the lzip format +END-INFO-DIR-ENTRY + + +File: lzlib.info, Node: Top, Next: Introduction, Up: (dir) + +Lzlib Manual +************ + +This manual is for Lzlib (version 1.13, 23 January 2022). 
+ +* Menu: + +* Introduction:: Purpose and features of lzlib +* Library version:: Checking library version +* Buffering:: Sizes of lzlib's buffers +* Parameter limits:: Min / max values for some parameters +* Compression functions:: Descriptions of the compression functions +* Decompression functions:: Descriptions of the decompression functions +* Error codes:: Meaning of codes returned by functions +* Error messages:: Error messages corresponding to error codes +* Invoking minilzip:: Command line interface of the test program +* Data format:: Detailed format of the compressed data +* Examples:: A small tutorial with examples +* Problems:: Reporting bugs +* Concept index:: Index of concepts + + + Copyright (C) 2009-2022 Antonio Diaz Diaz. + + This manual is free documentation: you have unlimited permission to copy, +distribute, and modify it. + + +File: lzlib.info, Node: Introduction, Next: Library version, Prev: Top, Up: Top + +1 Introduction +************** + +Lzlib is a data compression library providing in-memory LZMA compression and +decompression functions, including integrity checking of the decompressed +data. The compressed data format used by the library is the lzip format. +Lzlib is written in C. + + The lzip file format is designed for data sharing and long-term +archiving, taking into account both data integrity and decoder availability: + + * The lzip format provides very safe integrity checking and some data + recovery means. The program lziprecover can repair bit flip errors + (one of the most common forms of data corruption) in lzip files, and + provides data recovery capabilities, including error-checked merging + of damaged copies of a file. *Note Data safety: (lziprecover)Data + safety. + + * The lzip format is as simple as possible (but not simpler). 
The lzip + manual provides the source code of a simple decompressor along with a + detailed explanation of how it works, so that with only the help of the + lzip manual it would be possible for a digital archaeologist to extract + the data from a lzip file long after quantum computers eventually + render LZMA obsolete. + + * Additionally the lzip reference implementation is copylefted, which + guarantees that it will remain free forever. + + A nice feature of the lzip format is that a corrupt byte is easier to +repair the nearer it is to the beginning of the file. Therefore, with the +help of lziprecover, losing an entire archive just because of a corrupt +byte near the beginning is a thing of the past. + + The functions and variables forming the interface of the compression +library are declared in the file 'lzlib.h'. Usage examples of the library +are given in the files 'bbexample.c', 'ffexample.c', and 'minilzip.c' from +the source distribution. + + All the library functions are thread safe. The library does not install +any signal handler. The decoder checks the consistency of the compressed +data, so the library should never crash even in case of corrupted input. + + Compression/decompression is done by repeatedly calling a couple of +read/write functions until all the data have been processed by the library. +This interface is safer and less error prone than the traditional zlib +interface. + + Compression/decompression is done when the read function is called. This +means the value returned by the position functions will not be updated until +a read call, even if a lot of data are written. If you want the data to be +compressed in advance, just call the read function with a SIZE equal to 0. + + If all the data to be compressed are written in advance, lzlib will +automatically adjust the header of the compressed data to use the largest +dictionary size that exceeds neither the data size nor the limit +given to 'LZ_compress_open'.
This feature reduces the amount of memory +needed for decompression and allows minilzip to produce the same compressed +output as lzip. + + Lzlib will correctly decompress a data stream which is the concatenation +of two or more compressed data streams. The result is the concatenation of +the corresponding decompressed data streams. Integrity testing of +concatenated compressed data streams is also supported. + + Lzlib is able to compress and decompress streams of unlimited size by +automatically creating multimember output. The members so created are large, +about 2 PiB each. + + In spite of its name (Lempel-Ziv-Markov chain-Algorithm), LZMA is not a +concrete algorithm; it is more like "any algorithm using the LZMA coding +scheme". For example, the option '-0' of lzip uses the scheme in almost the +simplest way possible: issuing the longest match it can find, or a literal +byte if it can't find a match. Conversely, a much more elaborate way of +finding coding sequences of minimum size than the one currently used by lzip +could be developed, and the resulting sequence could also be coded using the +LZMA coding scheme. + + Lzlib currently implements two variants of the LZMA algorithm: fast +(used by option '-0' of minilzip) and normal (used by all other compression +levels). + + The high compression of LZMA comes from combining two basic, well-proven +compression ideas: sliding dictionaries (LZ77/78) and Markov models (the +thing used by every compression algorithm that uses a range encoder or +similar order-0 entropy coder as its last stage) with segregation of +contexts according to what the bits are used for. + + The ideas embodied in lzlib are due to (at least) the following people: +Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for the +definition of Markov chains), G.N.N. Martin (for the definition of range +encoding), Igor Pavlov (for putting all the above together in LZMA), and +Julian Seward (for bzip2's CLI).
+ + LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never +have been compressed. Decompressed is used to refer to data which have +undergone the process of decompression. + + +File: lzlib.info, Node: Library version, Next: Buffering, Prev: Introduction, Up: Top + +2 Library version +***************** + +One goal of lzlib is to keep perfect backward compatibility with older +versions of itself down to 1.0. Any application working with an older lzlib +should work with a newer lzlib. Installing a newer lzlib should not break +anything. This chapter describes the constants and functions that the +application can use to discover the version of the library being used. All +of them are declared in 'lzlib.h'. + + -- Constant: LZ_API_VERSION + This constant is defined in 'lzlib.h' and works as a version test + macro. The application should verify at compile time that + LZ_API_VERSION is greater than or equal to the version required by the + application: + + #if !defined LZ_API_VERSION || LZ_API_VERSION < 1012 + #error "lzlib 1.12 or newer needed." + #endif + + Before version 1.8, lzlib didn't define LZ_API_VERSION. + LZ_API_VERSION was first defined in lzlib 1.8 to 1. + Since lzlib 1.12, LZ_API_VERSION is defined as (major * 1000 + minor). + + NOTE: Version test macros are the library's way of announcing +functionality to the application. They should not be confused with feature +test macros, which allow the application to announce to the library its +desire to have certain symbols and prototypes exposed. + + -- Function: int LZ_api_version ( void ) + If LZ_API_VERSION >= 1012, this function is declared in 'lzlib.h' (else + it doesn't exist). It returns the LZ_API_VERSION of the library object + code being used. The application should verify at run time that the + value returned by 'LZ_api_version' is greater than or equal to the + version required by the application. 
An application may be dynamically + linked at run time with a different version of lzlib than the one it + was compiled for, and this should not break the program as long as the + library used provides the functionality required by the application. + + #if defined LZ_API_VERSION && LZ_API_VERSION >= 1012 + if( LZ_api_version() < 1012 ) + show_error( "lzlib 1.12 or newer needed." ); + #endif + + -- Constant: const char * LZ_version_string + This string constant is defined in the header file 'lzlib.h' and + represents the version of the library being used at compile time. + + -- Function: const char * LZ_version ( void ) + This function returns a string representing the version of the library + being used at run time. + + +File: lzlib.info, Node: Buffering, Next: Parameter limits, Prev: Library version, Up: Top + +3 Buffering +*********** + +Lzlib internal functions need access to a memory chunk at least as large as +the dictionary size (sliding window). For efficiency reasons, the input +buffer for compression is twice or sixteen times as large as the dictionary +size. + + Finally, for safety reasons, lzlib uses two more internal buffers. + + These are the four buffers used by lzlib, and their guaranteed minimum +sizes: + + * Input compression buffer. Written to by the function + 'LZ_compress_write'. For the normal variant of LZMA, its size is two + times the dictionary size set with the function 'LZ_compress_open' or + 64 KiB, whichever is larger. For the fast variant, its size is 1 MiB. + + * Output compression buffer. Read from by the function + 'LZ_compress_read'. Its size is 64 KiB. + + * Input decompression buffer. Written to by the function + 'LZ_decompress_write'. Its size is 64 KiB. + + * Output decompression buffer. Read from by the function + 'LZ_decompress_read'. Its size is the dictionary size set in the header + of the member currently being decompressed or 64 KiB, whichever is + larger.
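The guaranteed minimum sizes of the two variable-size buffers listed above reduce to a couple of comparisons. The following is a minimal sketch of that arithmetic only; the helper names 'min_input_compression_buffer' and 'min_output_decompression_buffer' are hypothetical, they are not part of the lzlib interface, and the library itself may allocate more than these minimums.

```c
/* Hypothetical helpers (NOT part of lzlib). They only restate the
   guaranteed minimum buffer sizes given in the Buffering chapter. */

enum { kib64 = 64 * 1024, mib1 = 1024 * 1024 };

static unsigned long long larger_of( const unsigned long long a,
                                     const unsigned long long b )
  { return ( a > b ) ? a : b; }

/* Input compression buffer: 1 MiB for the fast variant; otherwise twice
   the dictionary size or 64 KiB, whichever is larger. */
unsigned long long
min_input_compression_buffer( const unsigned long long dictionary_size,
                              const int fast_variant )
  {
  if( fast_variant ) return mib1;
  return larger_of( 2 * dictionary_size, kib64 );
  }

/* Output decompression buffer: the dictionary size stored in the member
   header or 64 KiB, whichever is larger. */
unsigned long long
min_output_decompression_buffer( const unsigned long long dictionary_size )
  { return larger_of( dictionary_size, kib64 ); }
```

For instance, with the smallest dictionary (4 KiB) both helpers return 64 KiB, while an 8 MiB dictionary gives a 16 MiB input compression buffer under these rules.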
+ + +File: lzlib.info, Node: Parameter limits, Next: Compression functions, Prev: Buffering, Up: Top + +4 Parameter limits +****************** + +These functions provide minimum and maximum values for some parameters. +Current values are shown in square brackets. + + -- Function: int LZ_min_dictionary_bits ( void ) + Returns the base 2 logarithm of the smallest valid dictionary size + [12]. + + -- Function: int LZ_min_dictionary_size ( void ) + Returns the smallest valid dictionary size [4 KiB]. + + -- Function: int LZ_max_dictionary_bits ( void ) + Returns the base 2 logarithm of the largest valid dictionary size [29]. + + -- Function: int LZ_max_dictionary_size ( void ) + Returns the largest valid dictionary size [512 MiB]. + + -- Function: int LZ_min_match_len_limit ( void ) + Returns the smallest valid match length limit [5]. + + -- Function: int LZ_max_match_len_limit ( void ) + Returns the largest valid match length limit [273]. + + +File: lzlib.info, Node: Compression functions, Next: Decompression functions, Prev: Parameter limits, Up: Top + +5 Compression functions +*********************** + +These are the functions used to compress data. In case of error, all of +them return -1 or 0, for signed and unsigned return values respectively, +except 'LZ_compress_open' whose return value must be verified by calling +'LZ_compress_errno' before using it. + + -- Function: struct LZ_Encoder * LZ_compress_open ( const int + DICTIONARY_SIZE, const int MATCH_LEN_LIMIT, const unsigned long + long MEMBER_SIZE ) + Initializes the internal stream state for compression and returns a + pointer that can only be used as the ENCODER argument for the other + LZ_compress functions, or a null pointer if the encoder could not be + allocated. + + The returned pointer must be verified by calling 'LZ_compress_errno' + before using it. 
If 'LZ_compress_errno' does not return 'LZ_ok', the + returned pointer must not be used and should be freed with + 'LZ_compress_close' to avoid memory leaks. + + DICTIONARY_SIZE sets the dictionary size to be used, in bytes. Valid + values range from 4 KiB to 512 MiB. Note that dictionary sizes are + quantized. If the size specified does not match one of the valid + sizes, it will be rounded upwards by adding up to + (DICTIONARY_SIZE / 8) to it. + + MATCH_LEN_LIMIT sets the match length limit in bytes. Valid values + range from 5 to 273. Larger values usually give better compression + ratios but longer compression times. + + If DICTIONARY_SIZE is 65535 and MATCH_LEN_LIMIT is 16, the fast + variant of LZMA is chosen, which produces identical compressed output + as 'lzip -0'. (The dictionary size used will be rounded upwards to + 64 KiB). + + MEMBER_SIZE sets the member size limit in bytes. Valid values range + from 4 KiB to 2 PiB. A small member size may degrade compression + ratio, so use it only when needed. To produce a single-member data + stream, give MEMBER_SIZE a value larger than the amount of data to be + produced. Values larger than 2 PiB will be reduced to 2 PiB to prevent + the uncompressed size of the member from overflowing. + + -- Function: int LZ_compress_close ( struct LZ_Encoder * const ENCODER ) + Frees all dynamically allocated data structures for this stream. This + function discards any unprocessed input and does not flush any pending + output. After a call to 'LZ_compress_close', ENCODER can no longer be + used as an argument to any LZ_compress function. It is safe to call + 'LZ_compress_close' with a null argument. + + -- Function: int LZ_compress_finish ( struct LZ_Encoder * const ENCODER ) + Use this function to tell 'lzlib' that all the data for this member + have already been written (with the function 'LZ_compress_write'). It + is safe to call 'LZ_compress_finish' as many times as needed. 
After + all the compressed data have been read with 'LZ_compress_read' and + 'LZ_compress_member_finished' returns 1, a new member can be started + with 'LZ_compress_restart_member'. + + -- Function: int LZ_compress_restart_member ( struct LZ_Encoder * const + ENCODER, const unsigned long long MEMBER_SIZE ) + Use this function to start a new member in a multimember data stream. + Call this function only after 'LZ_compress_member_finished' indicates + that the current member has been fully read (with the function + 'LZ_compress_read'). *Note member_size::, for a description of + MEMBER_SIZE. + + -- Function: int LZ_compress_sync_flush ( struct LZ_Encoder * const + ENCODER ) + Use this function to make available to 'LZ_compress_read' all the data + already written with the function 'LZ_compress_write'. First call + 'LZ_compress_sync_flush'. Then call 'LZ_compress_read' until it + returns 0. + + This function writes at least one LZMA marker '3' ("Sync Flush" marker) + to the compressed output. Note that the sync flush marker is not + allowed in lzip files; it is a device for interactive communication + between applications using lzlib, but is useless and wasteful in a + file, and is excluded from the media type 'application/lzip'. The LZMA + marker '2' ("End Of Stream" marker) is the only marker allowed in lzip + files. *Note Data format::. + + Repeated use of 'LZ_compress_sync_flush' may degrade compression + ratio, so use it only when needed. If the interval between calls to + 'LZ_compress_sync_flush' is large (comparable to dictionary size), + creating a multimember data stream with 'LZ_compress_restart_member' + may be an alternative. + + Combining multimember stream creation with flushing may be tricky. If + there are more bytes available than those needed to complete + MEMBER_SIZE, 'LZ_compress_restart_member' needs to be called when + 'LZ_compress_member_finished' returns 1, followed by a new call to + 'LZ_compress_sync_flush'. 
+ + -- Function: int LZ_compress_read ( struct LZ_Encoder * const ENCODER, + uint8_t * const BUFFER, const int SIZE ) + Reads up to SIZE bytes from the stream pointed to by ENCODER, storing + the results in BUFFER. If LZ_API_VERSION >= 1012, BUFFER may be a null + pointer, in which case the bytes read are discarded. + + Returns the number of bytes actually read. This might be less than + SIZE; for example, if there aren't that many bytes left in the stream + or if more bytes have to be yet written with the function + 'LZ_compress_write'. Note that reading less than SIZE bytes is not an + error. + + -- Function: int LZ_compress_write ( struct LZ_Encoder * const ENCODER, + uint8_t * const BUFFER, const int SIZE ) + Writes up to SIZE bytes from BUFFER to the stream pointed to by + ENCODER. Returns the number of bytes actually written. This might be + less than SIZE. Note that writing less than SIZE bytes is not an error. + + -- Function: int LZ_compress_write_size ( struct LZ_Encoder * const + ENCODER ) + Returns the maximum number of bytes that can be immediately written + through 'LZ_compress_write'. For efficiency reasons, once the input + buffer is full and 'LZ_compress_write_size' returns 0, almost all the + buffer must be compressed before a size greater than 0 is returned + again. (This is done to minimize the amount of data that must be + copied to the beginning of the buffer before new data can be accepted). + + It is guaranteed that an immediate call to 'LZ_compress_write' will + accept a SIZE up to the returned number of bytes. + + -- Function: enum LZ_Errno LZ_compress_errno ( struct LZ_Encoder * const + ENCODER ) + Returns the current error code for ENCODER. *Note Error codes::. It is + safe to call 'LZ_compress_errno' with a null argument, in which case + it returns 'LZ_bad_argument'. + + -- Function: int LZ_compress_finished ( struct LZ_Encoder * const ENCODER ) + Returns 1 if all the data have been read and 'LZ_compress_close' can + be safely called. 
Otherwise it returns 0. 'LZ_compress_finished' + implies 'LZ_compress_member_finished'. + + -- Function: int LZ_compress_member_finished ( struct LZ_Encoder * const + ENCODER ) + Returns 1 if the current member, in a multimember data stream, has been + fully read and 'LZ_compress_restart_member' can be safely called. + Otherwise it returns 0. + + -- Function: unsigned long long LZ_compress_data_position ( struct + LZ_Encoder * const ENCODER ) + Returns the number of input bytes already compressed in the current + member. + + -- Function: unsigned long long LZ_compress_member_position ( struct + LZ_Encoder * const ENCODER ) + Returns the number of compressed bytes already produced, but perhaps + not yet read, in the current member. + + -- Function: unsigned long long LZ_compress_total_in_size ( struct + LZ_Encoder * const ENCODER ) + Returns the total number of input bytes already compressed. + + -- Function: unsigned long long LZ_compress_total_out_size ( struct + LZ_Encoder * const ENCODER ) + Returns the total number of compressed bytes already produced, but + perhaps not yet read. + + +File: lzlib.info, Node: Decompression functions, Next: Error codes, Prev: Compression functions, Up: Top + +6 Decompression functions +************************* + +These are the functions used to decompress data. In case of error, all of +them return -1 or 0, for signed and unsigned return values respectively, +except 'LZ_decompress_open' whose return value must be verified by calling +'LZ_decompress_errno' before using it. + + -- Function: struct LZ_Decoder * LZ_decompress_open ( void ) + Initializes the internal stream state for decompression and returns a + pointer that can only be used as the DECODER argument for the other + LZ_decompress functions, or a null pointer if the decoder could not be + allocated. + + The returned pointer must be verified by calling 'LZ_decompress_errno' + before using it. 
If 'LZ_decompress_errno' does not return 'LZ_ok', the + returned pointer must not be used and should be freed with + 'LZ_decompress_close' to avoid memory leaks. + + -- Function: int LZ_decompress_close ( struct LZ_Decoder * const DECODER ) + Frees all dynamically allocated data structures for this stream. This + function discards any unprocessed input and does not flush any pending + output. After a call to 'LZ_decompress_close', DECODER can no longer + be used as an argument to any LZ_decompress function. It is safe to + call 'LZ_decompress_close' with a null argument. + + -- Function: int LZ_decompress_finish ( struct LZ_Decoder * const DECODER ) + Use this function to tell 'lzlib' that all the data for this stream + have already been written (with the function 'LZ_decompress_write'). + It is safe to call 'LZ_decompress_finish' as many times as needed. It + is not required to call 'LZ_decompress_finish' if the input stream + only contains whole members, but not calling it prevents lzlib from + detecting a truncated member. + + -- Function: int LZ_decompress_reset ( struct LZ_Decoder * const DECODER ) + Resets the internal state of DECODER as it was just after opening it + with the function 'LZ_decompress_open'. Data stored in the internal + buffers is discarded. Position counters are set to 0. + + -- Function: int LZ_decompress_sync_to_member ( struct LZ_Decoder * const + DECODER ) + Resets the error state of DECODER and enters a search state that lasts + until a new member header (or the end of the stream) is found. After a + successful call to 'LZ_decompress_sync_to_member', data written with + 'LZ_decompress_write' will be consumed and 'LZ_decompress_read' will + return 0 until a header is found. + + This function is useful to discard any data preceding the first member, + or to discard the rest of the current member, for example in case of a + data error. If the decoder is already at the beginning of a member, + this function does nothing. 
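As a rough illustration of what searching for a member header involves, the sketch below scans a buffer for the four magic bytes "LZIP" that begin a lzip member header. It is not lzlib's code: the function name 'find_lzip_magic' is hypothetical, and the library's own search is internal and may validate more of the header than the magic bytes.

```c
#include <string.h>

/* Hypothetical helper (NOT part of lzlib).
   Returns the offset of the first occurrence of the magic bytes "LZIP"
   in buf[0..size-1], or -1 if they are not present. */
long find_lzip_magic( const unsigned char * const buf, const long size )
  {
  long i;
  for( i = 0; i + 4 <= size; ++i )          /* need 4 bytes from offset i */
    if( memcmp( buf + i, "LZIP", 4 ) == 0 ) return i;
  return -1;
  }
```

An application normally does not need such a scan; calling 'LZ_decompress_sync_to_member' and continuing to feed data with 'LZ_decompress_write' is the documented way to skip to the next member.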
+ + -- Function: int LZ_decompress_read ( struct LZ_Decoder * const DECODER, + uint8_t * const BUFFER, const int SIZE ) + Reads up to SIZE bytes from the stream pointed to by DECODER, storing + the results in BUFFER. If LZ_API_VERSION >= 1012, BUFFER may be a null + pointer, in which case the bytes read are discarded. + + Returns the number of bytes actually read. This might be less than + SIZE; for example, if there aren't that many bytes left in the stream + or if more bytes have to be yet written with the function + 'LZ_decompress_write'. Note that reading less than SIZE bytes is not + an error. + + 'LZ_decompress_read' returns at least once per member so that + 'LZ_decompress_member_finished' can be called (and trailer data + retrieved) for each member, even for empty members. Therefore, + 'LZ_decompress_read' returning 0 does not mean that the end of the + stream has been reached. The increase in the value returned by + 'LZ_decompress_total_in_size' can be used to tell the end of the stream + from an empty member. + + In case of decompression error caused by corrupt or truncated data, + 'LZ_decompress_read' does not signal the error immediately to the + application, but waits until all the bytes decoded have been read. This + allows tools like tarlz to recover as much data as possible from each + damaged member. *Note tarlz manual: (tarlz)Top. + + -- Function: int LZ_decompress_write ( struct LZ_Decoder * const DECODER, + uint8_t * const BUFFER, const int SIZE ) + Writes up to SIZE bytes from BUFFER to the stream pointed to by + DECODER. Returns the number of bytes actually written. This might be + less than SIZE. Note that writing less than SIZE bytes is not an error. + + -- Function: int LZ_decompress_write_size ( struct LZ_Decoder * const + DECODER ) + Returns the maximum number of bytes that can be immediately written + through 'LZ_decompress_write'. 
This number varies smoothly; each + compressed byte consumed may be overwritten immediately, increasing by + 1 the value returned. + + It is guaranteed that an immediate call to 'LZ_decompress_write' will + accept a SIZE up to the returned number of bytes. + + -- Function: enum LZ_Errno LZ_decompress_errno ( struct LZ_Decoder * const + DECODER ) + Returns the current error code for DECODER. *Note Error codes::. It is + safe to call 'LZ_decompress_errno' with a null argument, in which case + it returns 'LZ_bad_argument'. + + -- Function: int LZ_decompress_finished ( struct LZ_Decoder * const + DECODER ) + Returns 1 if all the data have been read and 'LZ_decompress_close' can + be safely called. Otherwise it returns 0. 'LZ_decompress_finished' + does not imply 'LZ_decompress_member_finished'. + + -- Function: int LZ_decompress_member_finished ( struct LZ_Decoder * const + DECODER ) + Returns 1 if the previous call to 'LZ_decompress_read' finished reading + the current member, indicating that final values for the member are + available through 'LZ_decompress_data_crc', + 'LZ_decompress_data_position', and 'LZ_decompress_member_position'. + Otherwise it returns 0. + + -- Function: int LZ_decompress_member_version ( struct LZ_Decoder * const + DECODER ) + Returns the version of the current member, read from the member header. + + -- Function: int LZ_decompress_dictionary_size ( struct LZ_Decoder * const + DECODER ) + Returns the dictionary size of the current member, read from the + member header. + + -- Function: unsigned LZ_decompress_data_crc ( struct LZ_Decoder * const + DECODER ) + Returns the 32 bit Cyclic Redundancy Check of the data decompressed + from the current member. The value returned is valid only when + 'LZ_decompress_member_finished' returns 1. 
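To compare the value returned by 'LZ_decompress_data_crc' with a checksum computed independently, an application needs its own CRC routine. The sketch below is a bitwise CRC-32 that starts from 0xFFFFFFFF and inverts the result at the end, mirroring the initial value and final XOR visible in 'LZd_init' and 'LZd_crc'. The reflected polynomial 0xEDB88320 is an assumption (it is the common CRC-32 polynomial), and the names 'crc32_update' and 'crc32_of' are hypothetical, not lzlib functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers (NOT part of lzlib).
   Bitwise CRC-32 update over buf[0..size-1]; assumes the reflected
   polynomial 0xEDB88320. 'crc' is the running, not yet inverted, state. */
static uint32_t crc32_update( uint32_t crc, const uint8_t * const buf,
                              const size_t size )
  {
  size_t i;
  int k;
  for( i = 0; i < size; ++i )
    {
    crc ^= buf[i];
    for( k = 0; k < 8; ++k )    /* XOR the polynomial only if bit 0 is set */
      crc = ( crc >> 1 ) ^ ( 0xEDB88320U & ( 0U - ( crc & 1 ) ) );
    }
  return crc;
  }

/* CRC of a whole buffer: start from 0xFFFFFFFF, invert at the end. */
uint32_t crc32_of( const uint8_t * const buf, const size_t size )
  { return crc32_update( 0xFFFFFFFFU, buf, size ) ^ 0xFFFFFFFFU; }
```

A table-driven version is faster; the bitwise loop is used here only because it is short enough to check by eye.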
+ + -- Function: unsigned long long LZ_decompress_data_position ( struct + LZ_Decoder * const DECODER ) + Returns the number of decompressed bytes already produced, but perhaps + not yet read, in the current member. + + -- Function: unsigned long long LZ_decompress_member_position ( struct + LZ_Decoder * const DECODER ) + Returns the number of input bytes already decompressed in the current + member. + + -- Function: unsigned long long LZ_decompress_total_in_size ( struct + LZ_Decoder * const DECODER ) + Returns the total number of input bytes already decompressed. + + -- Function: unsigned long long LZ_decompress_total_out_size ( struct + LZ_Decoder * const DECODER ) + Returns the total number of decompressed bytes already produced, but + perhaps not yet read. + + +File: lzlib.info, Node: Error codes, Next: Error messages, Prev: Decompression functions, Up: Top + +7 Error codes +************* + +Most library functions return -1 to indicate that they have failed. But +this return value only tells you that an error has occurred. To find out +what kind of error it was, you need to verify the error code by calling +'LZ_(de)compress_errno'. + + Library functions don't change the value returned by +'LZ_(de)compress_errno' when they succeed; thus, the value returned by +'LZ_(de)compress_errno' after a successful call is not necessarily LZ_ok, +and you should not use 'LZ_(de)compress_errno' to determine whether a call +failed. If the call failed, then you can examine 'LZ_(de)compress_errno'. + + The error codes are defined in the header file 'lzlib.h'. + + -- Constant: enum LZ_Errno LZ_ok + The value of this constant is 0 and is used to indicate that there is + no error. + + -- Constant: enum LZ_Errno LZ_bad_argument + At least one of the arguments passed to the library function was + invalid. + + -- Constant: enum LZ_Errno LZ_mem_error + No memory available. The system cannot allocate more virtual memory + because its capacity is full. 
+ + -- Constant: enum LZ_Errno LZ_sequence_error + A library function was called in the wrong order. For example + 'LZ_compress_restart_member' was called before + 'LZ_compress_member_finished' indicates that the current member is + finished. + + -- Constant: enum LZ_Errno LZ_header_error + An invalid member header (one with the wrong magic bytes) was read. If + this happens at the end of the data stream it may indicate trailing + data. + + -- Constant: enum LZ_Errno LZ_unexpected_eof + The end of the data stream was reached in the middle of a member. + + -- Constant: enum LZ_Errno LZ_data_error + The data stream is corrupt. If 'LZ_decompress_member_position' is 6 or + less, it indicates either a format version not supported, an invalid + dictionary size, a corrupt header in a multimember data stream, or + trailing data too similar to a valid lzip header. Lziprecover can be + used to remove conflicting trailing data from a file. + + -- Constant: enum LZ_Errno LZ_library_error + A bug was detected in the library. Please, report it. *Note Problems::. + + +File: lzlib.info, Node: Error messages, Next: Invoking minilzip, Prev: Error codes, Up: Top + +8 Error messages +**************** + + -- Function: const char * LZ_strerror ( const enum LZ_Errno LZ_ERRNO ) + Returns the standard error message for a given error code. The messages + are fairly short; there are no multi-line messages or embedded + newlines. This function makes it easy for your program to report + informative error messages about the failure of a library call. + + The value of LZ_ERRNO normally comes from a call to + 'LZ_(de)compress_errno'. + + +File: lzlib.info, Node: Invoking minilzip, Next: Data format, Prev: Error messages, Up: Top + +9 Invoking minilzip +******************* + +Minilzip is a test program for the compression library lzlib, fully +compatible with lzip 1.4 or newer. + + Lzip is a lossless data compressor with a user interface similar to the +one of gzip or bzip2. 
Lzip uses a simplified form of the 'Lempel-Ziv-Markov +chain-Algorithm' (LZMA) stream format and provides a 3 factor integrity +checking to maximize interoperability and optimize safety. Lzip can compress +about as fast as gzip (lzip -0) or compress most files more than bzip2 +(lzip -9). Decompression speed is intermediate between gzip and bzip2. Lzip +is better than gzip and bzip2 from a data recovery perspective. Lzip has +been designed, written, and tested with great care to replace gzip and +bzip2 as the standard general-purpose compressed format for unix-like +systems. + +The format for running minilzip is: + + minilzip [OPTIONS] [FILES] + +If no file names are specified, minilzip compresses (or decompresses) from +standard input to standard output. A hyphen '-' used as a FILE argument +means standard input. It can be mixed with other FILES and is read just +once, the first time it appears in the command line. + + minilzip supports the following options: *Note Argument syntax: +(arg_parser)Argument syntax. + +'-h' +'--help' + Print an informative help message describing the options and exit. + +'-V' +'--version' + Print the version number of minilzip on the standard output and exit. + This version number should be included in all bug reports. + +'-a' +'--trailing-error' + Exit with error status 2 if any remaining input is detected after + decompressing the last member. Such remaining input is usually trailing + garbage that can be safely ignored. + +'-b BYTES' +'--member-size=BYTES' + When compressing, set the member size limit to BYTES. It is advisable + to keep members smaller than RAM size so that they can be repaired with + lziprecover in case of corruption. A small member size may degrade + compression ratio, so use it only when needed. Valid values range from + 100 kB to 2 PiB. Defaults to 2 PiB. + +'-c' +'--stdout' + Compress or decompress to standard output; keep input files unchanged. 
+ If compressing several files, each file is compressed independently. + (The output consists of a sequence of independently compressed + members). This option (or '-o') is needed when reading from a named + pipe (fifo) or from a device. Use it also to recover as much of the + decompressed data as possible when decompressing a corrupt file. '-c' + overrides '-o' and '-S'. '-c' has no effect when testing or listing. + +'-d' +'--decompress' + Decompress the files specified. If a file does not exist, can't be + opened, or the destination file already exists and '--force' has not + been specified, minilzip continues decompressing the rest of the files + and exits with error status 1. If a file fails to decompress, or is a + terminal, minilzip exits immediately with error status 2 without + decompressing the rest of the files. A terminal is considered an + uncompressed file, and therefore invalid. + +'-f' +'--force' + Force overwrite of output files. + +'-F' +'--recompress' + When compressing, force re-compression of files whose name already has + the '.lz' or '.tlz' suffix. + +'-k' +'--keep' + Keep (don't delete) input files during compression or decompression. + +'-m BYTES' +'--match-length=BYTES' + When compressing, set the match length limit in bytes. After a match + this long is found, the search is finished. Valid values range from 5 + to 273. Larger values usually give better compression ratios but longer + compression times. + +'-o FILE' +'--output=FILE' + If '-c' has not been also specified, write the (de)compressed output to + FILE; keep input files unchanged. If compressing several files, each + file is compressed independently. (The output consists of a sequence of + independently compressed members). This option (or '-c') is needed when + reading from a named pipe (fifo) or from a device. '-o -' is + equivalent to '-c'. '-o' has no effect when testing or listing. 
+ + When compressing and splitting the output in volumes, FILE is used as + a prefix, and several files named 'FILE00001.lz', 'FILE00002.lz', etc, + are created. In this case, only one input file is allowed. + +'-q' +'--quiet' + Quiet operation. Suppress all messages. + +'-s BYTES' +'--dictionary-size=BYTES' + When compressing, set the dictionary size limit in bytes. Minilzip + will use for each file the largest dictionary size that exceeds + neither the file size nor this limit. Valid values range from + 4 KiB to 512 MiB. Values 12 to 29 are interpreted as powers of two, + meaning 2^12 to 2^29 bytes. Dictionary sizes are quantized so that + they can be coded in just one byte (*note coded-dict-size::). If the + size specified does not match one of the valid sizes, it will be + rounded upwards by adding up to (BYTES / 8) to it. + + For maximum compression you should use a dictionary size limit as large + as possible, but keep in mind that the decompression memory requirement + is affected at compression time by the choice of dictionary size limit. + +'-S BYTES' +'--volume-size=BYTES' + When compressing, and '-c' has not also been specified, split the + compressed output into several volume files with names + 'original_name00001.lz', 'original_name00002.lz', etc, and set the + volume size limit to BYTES. Input files are kept unchanged. Each + volume is a complete, maybe multimember, lzip file. A small volume + size may degrade compression ratio, so use it only when needed. Valid + values range from 100 kB to 4 EiB. + +'-t' +'--test' + Check integrity of the files specified, but don't decompress them. This + really performs a trial decompression and throws away the result. Use + it together with '-v' to see information about the files. If a file + fails the test, does not exist, can't be opened, or is a terminal, + minilzip continues checking the rest of the files.
A final diagnostic + is shown at verbosity level 1 or higher if any file fails the test + when testing multiple files. + +'-v' +'--verbose' + Verbose mode. + When compressing, show the compression ratio and size for each file + processed. + When decompressing or testing, further -v's (up to 4) increase the + verbosity level, showing status, compression ratio, dictionary size, + and trailer contents (CRC, data size, member size). + +'-0 .. -9' + Compression level. Set the compression parameters (dictionary size and + match length limit) as shown in the table below. The default + compression level is '-6', equivalent to '-s8MiB -m36'. Note that '-9' + can be much slower than '-0'. These options have no effect when + decompressing or testing. + + The bidimensional parameter space of LZMA can't be mapped to a linear + scale optimal for all files. If your files are large, very repetitive, + etc, you may need to use the options '--dictionary-size' and + '--match-length' directly to achieve optimal performance. + + If several compression levels or '-s' or '-m' options are given, the + last setting is used. For example '-9 -s64MiB' is equivalent to + '-s64MiB -m273' + + Level Dictionary size (-s) Match length limit (-m) + -0 64 KiB 16 bytes + -1 1 MiB 5 bytes + -2 1.5 MiB 6 bytes + -3 2 MiB 8 bytes + -4 3 MiB 12 bytes + -5 4 MiB 20 bytes + -6 8 MiB 36 bytes + -7 16 MiB 68 bytes + -8 24 MiB 132 bytes + -9 32 MiB 273 bytes + +'--fast' +'--best' + Aliases for GNU gzip compatibility. + +'--loose-trailing' + When decompressing or testing, allow trailing data whose first bytes + are so similar to the magic bytes of a lzip header that they can be + confused with a corrupt header. Use this option if a file triggers a + "corrupt header" error and the cause is not indeed a corrupt header. + +'--check-lib' + Compare the version of lzlib used to compile minilzip with the version + actually being used at run time and exit. Report any differences + found. 
Exit with error status 1 if differences are found. A mismatch + may indicate that lzlib is not correctly installed or that a different + version of lzlib has been installed after compiling the shared version + of minilzip. Exit with error status 2 if LZ_API_VERSION and + LZ_version_string don't match. 'minilzip -v --check-lib' shows the + version of lzlib being used and the value of LZ_API_VERSION (if + defined). *Note Library version::. + + + Numbers given as arguments to options may be followed by a multiplier +and an optional 'B' for "byte". + + Table of SI and binary prefixes (unit multipliers): + +Prefix Value | Prefix Value +k kilobyte (10^3 = 1000) | Ki kibibyte (2^10 = 1024) +M megabyte (10^6) | Mi mebibyte (2^20) +G gigabyte (10^9) | Gi gibibyte (2^30) +T terabyte (10^12) | Ti tebibyte (2^40) +P petabyte (10^15) | Pi pebibyte (2^50) +E exabyte (10^18) | Ei exbibyte (2^60) +Z zettabyte (10^21) | Zi zebibyte (2^70) +Y yottabyte (10^24) | Yi yobibyte (2^80) + + + Exit status: 0 for a normal exit, 1 for environmental problems (file not +found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or invalid +input file, 3 for an internal consistency error (e.g., bug) which caused +minilzip to panic. + + +File: lzlib.info, Node: Data format, Next: Examples, Prev: Invoking minilzip, Up: Top + +10 Data format +************** + +Perfection is reached, not when there is no longer anything to add, but +when there is no longer anything to take away. +-- Antoine de Saint-Exupery + + + In the diagram below, a box like this: + ++---+ +| | <-- the vertical bars might be missing ++---+ + + represents one byte; a box like this: + ++==============+ +| | ++==============+ + + represents a variable number of bytes. + + + Lzip data consist of a series of independent "members" (compressed data +sets). The members simply appear one after another in the data stream, with +no additional information before, between, or after them. 
Each member can +encode in compressed form up to 16 EiB - 1 byte of uncompressed data. The +size of a multimember data stream is unlimited. + + Each member has the following structure: + ++--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| ID string | VN | DS | LZMA stream | CRC32 | Data size | Member size | ++--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + All multibyte values are stored in little endian order. + +'ID string (the "magic" bytes)' + A four byte string, identifying the lzip format, with the value "LZIP" + (0x4C, 0x5A, 0x49, 0x50). + +'VN (version number, 1 byte)' + Just in case something needs to be modified in the future. 1 for now. + +'DS (coded dictionary size, 1 byte)' + The dictionary size is calculated by taking a power of 2 (the base + size) and subtracting from it a fraction between 0/16 and 7/16 of the + base size. + Bits 4-0 contain the base 2 logarithm of the base size (12 to 29). + Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract + from the base size to obtain the dictionary size. + Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB + Valid values for dictionary size range from 4 KiB to 512 MiB. + +'LZMA stream' + The LZMA stream, finished by an "End Of Stream" marker. Uses default + values for encoder properties. *Note Stream format: (lzip)Stream + format, for a complete description. + Lzip only uses the LZMA marker '2' ("End Of Stream" marker). Lzlib + also uses the LZMA marker '3' ("Sync Flush" marker). *Note + sync_flush::. + +'CRC32 (4 bytes)' + Cyclic Redundancy Check (CRC) of the original uncompressed data. + +'Data size (8 bytes)' + Size of the original uncompressed data. + +'Member size (8 bytes)' + Total size of the member, including header and trailer. This field acts + as a distributed index, allows the verification of stream integrity, + and facilitates the safe recovery of undamaged members from + multimember files. 
Member size should be limited to 2 PiB to prevent + the data size field from overflowing. + + + +File: lzlib.info, Node: Examples, Next: Problems, Prev: Data format, Up: Top + +11 A small tutorial with examples +********************************* + +This chapter provides real code examples for the most common uses of the +library. See these examples in context in the files 'bbexample.c' and +'ffexample.c' from the source distribution of lzlib. + + Note that the interface of lzlib is symmetrical. That is, the code for +normal compression and decompression is identical except that one calls +LZ_compress* functions while the other calls LZ_decompress* functions. + +* Menu: + +* Buffer compression:: Buffer-to-buffer single-member compression +* Buffer decompression:: Buffer-to-buffer decompression +* File compression:: File-to-file single-member compression +* File decompression:: File-to-file decompression +* File compression mm:: File-to-file multimember compression +* Skipping data errors:: Decompression with automatic resynchronization + + +File: lzlib.info, Node: Buffer compression, Next: Buffer decompression, Up: Examples + +11.1 Buffer compression +======================= + +Buffer-to-buffer single-member compression (MEMBER_SIZE > total output). + +/* Compress 'insize' bytes from 'inbuf' to 'outbuf'. + Return the size of the compressed data in '*outlenp'. + In case of error, or if 'outsize' is too small, return false and do not + modify '*outlenp'.
+*/ +bool bbcompress( const uint8_t * const inbuf, const int insize, + const int dictionary_size, const int match_len_limit, + uint8_t * const outbuf, const int outsize, + int * const outlenp ) + { + int inpos = 0, outpos = 0; + bool error = false; + struct LZ_Encoder * const encoder = + LZ_compress_open( dictionary_size, match_len_limit, INT64_MAX ); + if( !encoder || LZ_compress_errno( encoder ) != LZ_ok ) + { LZ_compress_close( encoder ); return false; } + + while( true ) + { + int ret = LZ_compress_write( encoder, inbuf + inpos, insize - inpos ); + if( ret < 0 ) { error = true; break; } + inpos += ret; + if( inpos >= insize ) LZ_compress_finish( encoder ); + ret = LZ_compress_read( encoder, outbuf + outpos, outsize - outpos ); + if( ret < 0 ) { error = true; break; } + outpos += ret; + if( LZ_compress_finished( encoder ) == 1 ) break; + if( outpos >= outsize ) { error = true; break; } + } + + if( LZ_compress_close( encoder ) < 0 ) error = true; + if( error ) return false; + *outlenp = outpos; + return true; + } + + +File: lzlib.info, Node: Buffer decompression, Next: File compression, Prev: Buffer compression, Up: Examples + +11.2 Buffer decompression +========================= + +Buffer-to-buffer decompression. + +/* Decompress 'insize' bytes from 'inbuf' to 'outbuf'. + Return the size of the decompressed data in '*outlenp'. + In case of error, or if 'outsize' is too small, return false and do not + modify '*outlenp'. 
+*/ +bool bbdecompress( const uint8_t * const inbuf, const int insize, + uint8_t * const outbuf, const int outsize, + int * const outlenp ) + { + int inpos = 0, outpos = 0; + bool error = false; + struct LZ_Decoder * const decoder = LZ_decompress_open(); + if( !decoder || LZ_decompress_errno( decoder ) != LZ_ok ) + { LZ_decompress_close( decoder ); return false; } + + while( true ) + { + int ret = LZ_decompress_write( decoder, inbuf + inpos, insize - inpos ); + if( ret < 0 ) { error = true; break; } + inpos += ret; + if( inpos >= insize ) LZ_decompress_finish( decoder ); + ret = LZ_decompress_read( decoder, outbuf + outpos, outsize - outpos ); + if( ret < 0 ) { error = true; break; } + outpos += ret; + if( LZ_decompress_finished( decoder ) == 1 ) break; + if( outpos >= outsize ) { error = true; break; } + } + + if( LZ_decompress_close( decoder ) < 0 ) error = true; + if( error ) return false; + *outlenp = outpos; + return true; + } + + +File: lzlib.info, Node: File compression, Next: File decompression, Prev: Buffer decompression, Up: Examples + +11.3 File compression +===================== + +File-to-file compression using LZ_compress_write_size. 
+ +int ffcompress( struct LZ_Encoder * const encoder, + FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384 }; + uint8_t buffer[buffer_size]; + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_compress_write_size( encoder ) ); + if( size > 0 ) + { + len = fread( buffer, 1, size, infile ); + ret = LZ_compress_write( encoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) ) LZ_compress_finish( encoder ); + } + ret = LZ_compress_read( encoder, buffer, buffer_size ); + if( ret < 0 ) break; + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( LZ_compress_finished( encoder ) == 1 ) return 0; + } + return 1; + } + + +File: lzlib.info, Node: File decompression, Next: File compression mm, Prev: File compression, Up: Examples + +11.4 File decompression +======================= + +File-to-file decompression using LZ_decompress_write_size. + +int ffdecompress( struct LZ_Decoder * const decoder, + FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384 }; + uint8_t buffer[buffer_size]; + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_decompress_write_size( decoder ) ); + if( size > 0 ) + { + len = fread( buffer, 1, size, infile ); + ret = LZ_decompress_write( decoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) ) LZ_decompress_finish( decoder ); + } + ret = LZ_decompress_read( decoder, buffer, buffer_size ); + if( ret < 0 ) break; + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( LZ_decompress_finished( decoder ) == 1 ) return 0; + } + return 1; + } + + +File: lzlib.info, Node: File compression mm, Next: Skipping data errors, Prev: File decompression, Up: Examples + +11.5 File-to-file multimember compression +========================================= + +Example 1: Multimember compression with members of fixed size +(MEMBER_SIZE < total output). 
+ +int ffmmcompress( FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384, member_size = 4096 }; + uint8_t buffer[buffer_size]; + bool done = false; + struct LZ_Encoder * const encoder = + LZ_compress_open( 65535, 16, member_size ); + if( !encoder || LZ_compress_errno( encoder ) != LZ_ok ) + { fputs( "ffexample: Not enough memory.\n", stderr ); + LZ_compress_close( encoder ); return 1; } + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_compress_write_size( encoder ) ); + if( size > 0 ) + { + len = fread( buffer, 1, size, infile ); + ret = LZ_compress_write( encoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) ) LZ_compress_finish( encoder ); + } + ret = LZ_compress_read( encoder, buffer, buffer_size ); + if( ret < 0 ) break; + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( LZ_compress_member_finished( encoder ) == 1 ) + { + if( LZ_compress_finished( encoder ) == 1 ) { done = true; break; } + if( LZ_compress_restart_member( encoder, member_size ) < 0 ) break; + } + } + if( LZ_compress_close( encoder ) < 0 ) done = false; + return done; + } + + +Example 2: Multimember compression (user-restarted members). (Call +LZ_compress_open with MEMBER_SIZE > largest member). + +/* Compress 'infile' to 'outfile' as a multimember stream with one member + for each line of text terminated by a newline character or by EOF. + Return 0 if success, 1 if error. 
+*/ +int fflfcompress( struct LZ_Encoder * const encoder, + FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384 }; + uint8_t buffer[buffer_size]; + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_compress_write_size( encoder ) ); + if( size > 0 ) + { + for( len = 0; len < size; ) + { + int ch = getc( infile ); + if( ch == EOF || ( buffer[len++] = ch ) == '\n' ) break; + } + /* avoid writing an empty member to outfile */ + if( len == 0 && LZ_compress_data_position( encoder ) == 0 ) return 0; + ret = LZ_compress_write( encoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) || buffer[len-1] == '\n' ) + LZ_compress_finish( encoder ); + } + ret = LZ_compress_read( encoder, buffer, buffer_size ); + if( ret < 0 ) break; + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( LZ_compress_member_finished( encoder ) == 1 ) + { + if( feof( infile ) && LZ_compress_finished( encoder ) == 1 ) return 0; + if( LZ_compress_restart_member( encoder, INT64_MAX ) < 0 ) break; + } + } + return 1; + } + + +File: lzlib.info, Node: Skipping data errors, Prev: File compression mm, Up: Examples + +11.6 Skipping data errors +========================= + +/* Decompress 'infile' to 'outfile' with automatic resynchronization to + next member in case of data error, including the automatic removal of + leading garbage. 
+*/ +int ffrsdecompress( struct LZ_Decoder * const decoder, + FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384 }; + uint8_t buffer[buffer_size]; + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_decompress_write_size( decoder ) ); + if( size > 0 ) + { + len = fread( buffer, 1, size, infile ); + ret = LZ_decompress_write( decoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) ) LZ_decompress_finish( decoder ); + } + ret = LZ_decompress_read( decoder, buffer, buffer_size ); + if( ret < 0 ) + { + if( LZ_decompress_errno( decoder ) == LZ_header_error || + LZ_decompress_errno( decoder ) == LZ_data_error ) + { LZ_decompress_sync_to_member( decoder ); continue; } + else break; + } + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( LZ_decompress_finished( decoder ) == 1 ) return 0; + } + return 1; + } + + +File: lzlib.info, Node: Problems, Next: Concept index, Prev: Examples, Up: Top + +12 Reporting bugs +***************** + +There are probably bugs in lzlib. There are certainly errors and omissions +in this manual. If you report them, they will get fixed. If you don't, no +one will ever know about them and they will remain unfixed for all +eternity, if not longer. + + If you find a bug in lzlib, please send electronic mail to +<lzip-bug@nongnu.org>. Include the version number, which you can find by +running 'minilzip --version' and 'minilzip -v --check-lib'. + + +File: lzlib.info, Node: Concept index, Prev: Problems, Up: Top + +Concept index +************* + + +* Menu: + +* buffer compression: Buffer compression. (line 6) +* buffer decompression: Buffer decompression. (line 6) +* buffering: Buffering. (line 6) +* bugs: Problems. (line 6) +* compression functions: Compression functions. (line 6) +* data format: Data format. (line 6) +* decompression functions: Decompression functions. (line 6) +* error codes: Error codes. (line 6) +* error messages: Error messages. 
(line 6) +* examples: Examples. (line 6) +* file compression: File compression. (line 6) +* file decompression: File decompression. (line 6) +* getting help: Problems. (line 6) +* introduction: Introduction. (line 6) +* invoking: Invoking minilzip. (line 6) +* library version: Library version. (line 6) +* multimember compression: File compression mm. (line 6) +* options: Invoking minilzip. (line 6) +* parameter limits: Parameter limits. (line 6) +* skipping data errors: Skipping data errors. (line 6) + + + +Tag Table: +Node: Top215 +Node: Introduction1338 +Node: Library version6413 +Node: Buffering8957 +Node: Parameter limits10182 +Node: Compression functions11136 +Ref: member_size12946 +Ref: sync_flush14712 +Node: Decompression functions19400 +Node: Error codes26968 +Node: Error messages29259 +Node: Invoking minilzip29838 +Node: Data format39786 +Ref: coded-dict-size41232 +Node: Examples42641 +Node: Buffer compression43602 +Node: Buffer decompression45122 +Node: File compression46536 +Node: File decompression47519 +Node: File compression mm48523 +Node: Skipping data errors51552 +Node: Problems52862 +Node: Concept index53423 + +End Tag Table + + +Local Variables: +coding: iso-8859-15 +End: diff --git a/doc/lzlib.texi b/doc/lzlib.texi new file mode 100644 index 0000000..3caf9dd --- /dev/null +++ b/doc/lzlib.texi @@ -0,0 +1,1395 @@ +\input texinfo @c -*-texinfo-*- +@c %**start of header +@setfilename lzlib.info +@documentencoding ISO-8859-15 +@settitle Lzlib Manual +@finalout +@c %**end of header + +@set UPDATED 23 January 2022 +@set VERSION 1.13 + +@dircategory Compression +@direntry +* Lzlib: (lzlib). 
Compression library for the lzip format +@end direntry + + +@ifnothtml +@titlepage +@title Lzlib +@subtitle Compression library for the lzip format +@subtitle for Lzlib version @value{VERSION}, @value{UPDATED} +@author by Antonio Diaz Diaz + +@page +@vskip 0pt plus 1filll +@end titlepage + +@contents +@end ifnothtml + +@ifnottex +@node Top +@top + +This manual is for Lzlib (version @value{VERSION}, @value{UPDATED}). + +@menu +* Introduction:: Purpose and features of lzlib +* Library version:: Checking library version +* Buffering:: Sizes of lzlib's buffers +* Parameter limits:: Min / max values for some parameters +* Compression functions:: Descriptions of the compression functions +* Decompression functions:: Descriptions of the decompression functions +* Error codes:: Meaning of codes returned by functions +* Error messages:: Error messages corresponding to error codes +* Invoking minilzip:: Command line interface of the test program +* Data format:: Detailed format of the compressed data +* Examples:: A small tutorial with examples +* Problems:: Reporting bugs +* Concept index:: Index of concepts +@end menu + +@sp 1 +Copyright @copyright{} 2009-2022 Antonio Diaz Diaz. + +This manual is free documentation: you have unlimited permission to copy, +distribute, and modify it. +@end ifnottex + + +@node Introduction +@chapter Introduction +@cindex introduction + +@uref{http://www.nongnu.org/lzip/lzlib.html,,Lzlib} +is a data compression library providing in-memory LZMA compression and +decompression functions, including integrity checking of the decompressed +data. The compressed data format used by the library is the lzip format. +Lzlib is written in C. + +The lzip file format is designed for data sharing and long-term archiving, +taking into account both data integrity and decoder availability: + +@itemize @bullet +@item +The lzip format provides very safe integrity checking and some data +recovery means. 
The program +@uref{http://www.nongnu.org/lzip/manual/lziprecover_manual.html#Data-safety,,lziprecover} +can repair bit flip errors (one of the most common forms of data corruption) +in lzip files, and provides data recovery capabilities, including +error-checked merging of damaged copies of a file. +@ifnothtml +@xref{Data safety,,,lziprecover}. +@end ifnothtml + +@item +The lzip format is as simple as possible (but not simpler). The lzip +manual provides the source code of a simple decompressor along with a +detailed explanation of how it works, so that with the only help of the +lzip manual it would be possible for a digital archaeologist to extract +the data from a lzip file long after quantum computers eventually +render LZMA obsolete. + +@item +Additionally the lzip reference implementation is copylefted, which +guarantees that it will remain free forever. +@end itemize + +A nice feature of the lzip format is that a corrupt byte is easier to repair +the nearer it is from the beginning of the file. Therefore, with the help of +lziprecover, losing an entire archive just because of a corrupt byte near +the beginning is a thing of the past. + +The functions and variables forming the interface of the compression library +are declared in the file @samp{lzlib.h}. Usage examples of the library are +given in the files @samp{bbexample.c}, @samp{ffexample.c}, and +@samp{minilzip.c} from the source distribution. + +All the library functions are thread safe. The library does not install any +signal handler. The decoder checks the consistency of the compressed data, +so the library should never crash even in case of corrupted input. + +Compression/decompression is done by repeatedly calling a couple of +read/write functions until all the data have been processed by the library. +This interface is safer and less error prone than the traditional zlib +interface. + +Compression/decompression is done when the read function is called. 
This +means the value returned by the position functions will not be updated until +a read call, even if a lot of data are written. If you want the data to be +compressed in advance, just call the read function with a @var{size} equal +to 0. + +If all the data to be compressed are written in advance, lzlib will +automatically adjust the header of the compressed data to use the largest +dictionary size that exceeds neither the data size nor the limit +given to @samp{LZ_compress_open}. This feature reduces the amount of memory +needed for decompression and allows minilzip to produce the same compressed +output as lzip. + +Lzlib will correctly decompress a data stream which is the concatenation of +two or more compressed data streams. The result is the concatenation of the +corresponding decompressed data streams. Integrity testing of concatenated +compressed data streams is also supported. + +Lzlib is able to compress and decompress streams of unlimited size by +automatically creating multimember output. The members so created are large, +about @w{2 PiB} each. + +In spite of its name (Lempel-Ziv-Markov chain-Algorithm), LZMA is not a +concrete algorithm; it is more like "any algorithm using the LZMA coding +scheme". For example, the option @samp{-0} of lzip uses the scheme in almost the +simplest way possible: issuing the longest match it can find, or a literal +byte if it can't find a match. Conversely, a much more elaborate way of +finding coding sequences of minimum size than the one currently used by lzip +could be developed, and the resulting sequence could also be coded using the +LZMA coding scheme. + +Lzlib currently implements two variants of the LZMA algorithm: fast (used by +option @samp{-0} of minilzip) and normal (used by all other compression levels).
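The greedy strategy described above for option -0 (issue the longest match that can be found, or a literal byte if there is no match) can be illustrated with a toy parser. This is only a sketch of the idea and not lzlib's fast encoder: it uses a brute-force search, it does no range coding, and all the names in it ('toy_longest_match', 'toy_greedy_parse', 'toy_min_match', 'toy_max_match') are hypothetical. The minimum match length of 3 is an arbitrary choice for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Toy limits. 273 is the largest match length limit quoted in this manual;
   the minimum of 3 is an arbitrary choice for this sketch. */
enum { toy_min_match = 3, toy_max_match = 273 };

/* Brute-force search for the longest earlier occurrence of the bytes
   starting at 'pos'. Returns the match length, or 0 if no match of at
   least toy_min_match bytes exists. On success stores the distance back
   to the start of the match in '*distp'. */
static int toy_longest_match( const uint8_t * const buf, const int size,
                              const int pos, int * const distp )
  {
  int best_len = 0;
  assert( pos >= 0 && pos <= size );
  for( int start = 0; start < pos; ++start )
    {
    int len = 0;
    /* the match may overlap 'pos', as in any LZ77-style coder */
    while( pos + len < size && len < toy_max_match &&
           buf[start+len] == buf[pos+len] ) ++len;
    /* '>=' prefers the nearest of several equally long matches */
    if( len >= toy_min_match && len >= best_len )
      { best_len = len; *distp = pos - start; }
    }
  return best_len;
  }

/* Greedy parse of 'buf': at each position take the longest match if there
   is one, else a literal byte. Only counts the symbols; a real encoder
   would pass them to the range encoder. */
static void toy_greedy_parse( const uint8_t * const buf, const int size,
                              int * const literalsp, int * const matchesp )
  {
  int pos = 0;
  *literalsp = 0; *matchesp = 0;
  while( pos < size )
    {
    int dist = 0;
    const int len = toy_longest_match( buf, size, pos, &dist );
    (void)dist;				/* the distance is not used here */
    if( len > 0 ) { ++*matchesp; pos += len; }
    else { ++*literalsp; ++pos; }
    }
  }
```

For the 9-byte input "abcabcabc" this parse produces three literals followed by one overlapping match of length 6 at distance 3. A more elaborate encoder could weigh several candidate sequences instead of always taking the longest match.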
+ +The high compression of LZMA comes from combining two basic, well-proven +compression ideas: sliding dictionaries (LZ77/78) and Markov models (the +thing used by every compression algorithm that uses a range encoder or +similar order-0 entropy coder as its last stage) with segregation of +contexts according to what the bits are used for. + +The ideas embodied in lzlib are due to (at least) the following people: +Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for the +definition of Markov chains), G.N.N. Martin (for the definition of range +encoding), Igor Pavlov (for putting all the above together in LZMA), and +Julian Seward (for bzip2's CLI). + +LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never have +been compressed. Decompressed is used to refer to data which have undergone +the process of decompression. + + +@node Library version +@chapter Library version +@cindex library version + +One goal of lzlib is to keep perfect backward compatibility with older +versions of itself down to 1.0. Any application working with an older lzlib +should work with a newer lzlib. Installing a newer lzlib should not break +anything. This chapter describes the constants and functions that the +application can use to discover the version of the library being used. All +of them are declared in @samp{lzlib.h}. + +@defvr Constant LZ_API_VERSION +This constant is defined in @samp{lzlib.h} and works as a version test +macro. The application should verify at compile time that LZ_API_VERSION is +greater than or equal to the version required by the application: + +@example +#if !defined LZ_API_VERSION || LZ_API_VERSION < 1012 +#error "lzlib 1.12 or newer needed." +#endif +@end example + +Before version 1.8, lzlib didn't define LZ_API_VERSION.@* +LZ_API_VERSION was first defined in lzlib 1.8, with the value 1.@* +Since lzlib 1.12, LZ_API_VERSION is defined as (major * 1000 + minor).
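As a worked example of the (major * 1000 + minor) encoding, lzlib 1.12 corresponds to 1 * 1000 + 12 = 1012, the value tested in the preprocessor check. The helpers below are a small sketch of how an application might split such a value for display. They are hypothetical names, not part of lzlib, and they treat any value below 1000 (such as the 1 used when the macro was first defined) as not following the encoding.

```c
#include <assert.h>

/* Hypothetical helpers (not part of lzlib) for the encoding
   LZ_API_VERSION = major * 1000 + minor used since lzlib 1.12. */

/* Returns 1 if 'api_version' follows the major * 1000 + minor encoding.
   The value 1 used when the macro was first defined does not. */
static int api_is_encoded( const int api_version )
  { return api_version >= 1000; }

/* Major part of an encoded value, e.g. 1 for 1012. */
static int api_major( const int api_version )
  { assert( api_is_encoded( api_version ) ); return api_version / 1000; }

/* Minor part of an encoded value, e.g. 12 for 1012. */
static int api_minor( const int api_version )
  { assert( api_is_encoded( api_version ) ); return api_version % 1000; }

/* Builds the encoded value from its parts, e.g. ( 1, 12 ) gives 1012. */
static int api_encode( const int major, const int minor )
  {
  assert( major >= 1 && minor >= 0 && minor < 1000 );
  return major * 1000 + minor;
  }
```

A program could pass the value of LZ_API_VERSION to these helpers to print a message such as "built against lzlib API 1.12", after checking with 'api_is_encoded' that the value is not the older 1.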
+@end defvr + +NOTE: Version test macros are the library's way of announcing functionality +to the application. They should not be confused with feature test macros, +which allow the application to announce to the library its desire to have +certain symbols and prototypes exposed. + +@deftypefun int LZ_api_version ( void ) +If LZ_API_VERSION >= 1012, this function is declared in @samp{lzlib.h} (else +it doesn't exist). It returns the LZ_API_VERSION of the library object code +being used. The application should verify at run time that the value +returned by @code{LZ_api_version} is greater than or equal to the version +required by the application. An application may be dynamically linked at run +time with a different version of lzlib than the one it was compiled for, and +this should not break the program as long as the library used provides the +functionality required by the application. + +@example +#if defined LZ_API_VERSION && LZ_API_VERSION >= 1012 + if( LZ_api_version() < 1012 ) + show_error( "lzlib 1.12 or newer needed." ); +#endif +@end example +@end deftypefun + +@deftypevr Constant {const char *} LZ_version_string +This string constant is defined in the header file @samp{lzlib.h} and +represents the version of the library being used at compile time. +@end deftypevr + +@deftypefun {const char *} LZ_version ( void ) +This function returns a string representing the version of the library being +used at run time. +@end deftypefun + + +@node Buffering +@chapter Buffering +@cindex buffering + +Lzlib internal functions need access to a memory chunk at least as large +as the dictionary size (sliding window). For efficiency reasons, the +input buffer for compression is twice or sixteen times as large as the +dictionary size. + +Finally, for safety reasons, lzlib uses two more internal buffers. + +These are the four buffers used by lzlib, and their guaranteed minimum sizes: + +@itemize @bullet +@item Input compression buffer.
Written to by the function +@samp{LZ_compress_write}. For the normal variant of LZMA, its size is two +times the dictionary size set with the function @samp{LZ_compress_open} or +@w{64 KiB}, whichever is larger. For the fast variant, its size is @w{1 MiB}. + +@item Output compression buffer. Read from by the function +@samp{LZ_compress_read}. Its size is @w{64 KiB}. + +@item Input decompression buffer. Written to by the function +@samp{LZ_decompress_write}. Its size is @w{64 KiB}. + +@item Output decompression buffer. Read from by the function +@samp{LZ_decompress_read}. Its size is the dictionary size set in the header +of the member currently being decompressed or @w{64 KiB}, whichever is larger. +@end itemize + + +@node Parameter limits +@chapter Parameter limits +@cindex parameter limits + +These functions provide minimum and maximum values for some parameters. +Current values are shown in square brackets. + +@deftypefun int LZ_min_dictionary_bits ( void ) +Returns the base 2 logarithm of the smallest valid dictionary size [12]. +@end deftypefun + +@deftypefun int LZ_min_dictionary_size ( void ) +Returns the smallest valid dictionary size [4 KiB]. +@end deftypefun + +@deftypefun int LZ_max_dictionary_bits ( void ) +Returns the base 2 logarithm of the largest valid dictionary size [29]. +@end deftypefun + +@deftypefun int LZ_max_dictionary_size ( void ) +Returns the largest valid dictionary size [512 MiB]. +@end deftypefun + +@deftypefun int LZ_min_match_len_limit ( void ) +Returns the smallest valid match length limit [5]. +@end deftypefun + +@deftypefun int LZ_max_match_len_limit ( void ) +Returns the largest valid match length limit [273]. +@end deftypefun + + +@node Compression functions +@chapter Compression functions +@cindex compression functions + +These are the functions used to compress data. 
In case of error, all of +them return -1 or 0, for signed and unsigned return values respectively, +except @samp{LZ_compress_open} whose return value must be verified by +calling @samp{LZ_compress_errno} before using it. + + +@deftypefun {struct LZ_Encoder *} LZ_compress_open ( const int @var{dictionary_size}, const int @var{match_len_limit}, const unsigned long long @var{member_size} ) +Initializes the internal stream state for compression and returns a +pointer that can only be used as the @var{encoder} argument for the +other LZ_compress functions, or a null pointer if the encoder could not +be allocated. + +The returned pointer must be verified by calling +@samp{LZ_compress_errno} before using it. If @samp{LZ_compress_errno} +does not return @samp{LZ_ok}, the returned pointer must not be used and +should be freed with @samp{LZ_compress_close} to avoid memory leaks. + +@var{dictionary_size} sets the dictionary size to be used, in bytes. +Valid values range from @w{4 KiB} to @w{512 MiB}. Note that dictionary +sizes are quantized. If the size specified does not match one of the +valid sizes, it will be rounded upwards by adding up to +@w{(@var{dictionary_size} / 8)} to it. + +@var{match_len_limit} sets the match length limit in bytes. Valid values +range from 5 to 273. Larger values usually give better compression ratios +but longer compression times. + +If @var{dictionary_size} is 65535 and @var{match_len_limit} is 16, the fast +variant of LZMA is chosen, which produces the same compressed output as +@w{@samp{lzip -0}}. (The dictionary size used will be rounded upwards to +@w{64 KiB}). + +@anchor{member_size} +@var{member_size} sets the member size limit in bytes. Valid values range +from @w{4 KiB} to @w{2 PiB}. A small member size may degrade compression +ratio, so use it only when needed. To produce a single-member data stream, +give @var{member_size} a value larger than the amount of data to be +produced.
Values larger than @w{2 PiB} will be reduced to @w{2 PiB} to +prevent the uncompressed size of the member from overflowing. +@end deftypefun + + +@deftypefun int LZ_compress_close ( struct LZ_Encoder * const @var{encoder} ) +Frees all dynamically allocated data structures for this stream. This +function discards any unprocessed input and does not flush any pending +output. After a call to @samp{LZ_compress_close}, @var{encoder} can no +longer be used as an argument to any LZ_compress function. +It is safe to call @samp{LZ_compress_close} with a null argument. +@end deftypefun + + +@deftypefun int LZ_compress_finish ( struct LZ_Encoder * const @var{encoder} ) +Use this function to tell @samp{lzlib} that all the data for this member +have already been written (with the function @samp{LZ_compress_write}). +It is safe to call @samp{LZ_compress_finish} as many times as needed. +After all the compressed data have been read with @samp{LZ_compress_read} +and @samp{LZ_compress_member_finished} returns 1, a new member can be +started with @samp{LZ_compress_restart_member}. +@end deftypefun + + +@deftypefun int LZ_compress_restart_member ( struct LZ_Encoder * const @var{encoder}, const unsigned long long @var{member_size} ) +Use this function to start a new member in a multimember data stream. Call +this function only after @samp{LZ_compress_member_finished} indicates that +the current member has been fully read (with the function +@samp{LZ_compress_read}). @xref{member_size}, for a description of +@var{member_size}. +@end deftypefun + + +@anchor{sync_flush} +@deftypefun int LZ_compress_sync_flush ( struct LZ_Encoder * const @var{encoder} ) +Use this function to make available to @samp{LZ_compress_read} all the data +already written with the function @samp{LZ_compress_write}. First call +@samp{LZ_compress_sync_flush}. Then call @samp{LZ_compress_read} until it +returns 0. 
+ +This function writes at least one LZMA marker @samp{3} ("Sync Flush" marker) +to the compressed output. Note that the sync flush marker is not allowed in +lzip files; it is a device for interactive communication between +applications using lzlib, but is useless and wasteful in a file, and is +excluded from the media type @samp{application/lzip}. The LZMA marker +@samp{2} ("End Of Stream" marker) is the only marker allowed in lzip files. +@xref{Data format}. + +Repeated use of @samp{LZ_compress_sync_flush} may degrade compression +ratio, so use it only when needed. If the interval between calls to +@samp{LZ_compress_sync_flush} is large (comparable to dictionary size), +creating a multimember data stream with @samp{LZ_compress_restart_member} +may be an alternative. + +Combining multimember stream creation with flushing may be tricky. If there +are more bytes available than those needed to complete @var{member_size}, +@samp{LZ_compress_restart_member} needs to be called when +@samp{LZ_compress_member_finished} returns 1, followed by a new call to +@samp{LZ_compress_sync_flush}. +@end deftypefun + + +@deftypefun int LZ_compress_read ( struct LZ_Encoder * const @var{encoder}, uint8_t * const @var{buffer}, const int @var{size} ) +Reads up to @var{size} bytes from the stream pointed to by @var{encoder}, +storing the results in @var{buffer}. If @w{LZ_API_VERSION >= 1012}, +@var{buffer} may be a null pointer, in which case the bytes read are +discarded. + +Returns the number of bytes actually read. This might be less than +@var{size}; for example, if there aren't that many bytes left in the stream +or if more bytes have yet to be written with the function +@samp{LZ_compress_write}. Note that reading less than @var{size} bytes is +not an error.
+@end deftypefun + + +@deftypefun int LZ_compress_write ( struct LZ_Encoder * const @var{encoder}, uint8_t * const @var{buffer}, const int @var{size} ) +Writes up to @var{size} bytes from @var{buffer} to the stream pointed to by +@var{encoder}. Returns the number of bytes actually written. This might be +less than @var{size}. Note that writing less than @var{size} bytes is not an +error. +@end deftypefun + + +@deftypefun int LZ_compress_write_size ( struct LZ_Encoder * const @var{encoder} ) +Returns the maximum number of bytes that can be immediately written through +@samp{LZ_compress_write}. For efficiency reasons, once the input buffer is +full and @samp{LZ_compress_write_size} returns 0, almost all the buffer must +be compressed before a size greater than 0 is returned again. (This is done +to minimize the amount of data that must be copied to the beginning of the +buffer before new data can be accepted). + +It is guaranteed that an immediate call to @samp{LZ_compress_write} will +accept a @var{size} up to the returned number of bytes. +@end deftypefun + + +@deftypefun {enum LZ_Errno} LZ_compress_errno ( struct LZ_Encoder * const @var{encoder} ) +Returns the current error code for @var{encoder}. @xref{Error codes}. +It is safe to call @samp{LZ_compress_errno} with a null argument, in which +case it returns @samp{LZ_bad_argument}. +@end deftypefun + + +@deftypefun int LZ_compress_finished ( struct LZ_Encoder * const @var{encoder} ) +Returns 1 if all the data have been read and @samp{LZ_compress_close} +can be safely called. Otherwise it returns 0. @samp{LZ_compress_finished} +implies @samp{LZ_compress_member_finished}. +@end deftypefun + + +@deftypefun int LZ_compress_member_finished ( struct LZ_Encoder * const @var{encoder} ) +Returns 1 if the current member, in a multimember data stream, has been +fully read and @samp{LZ_compress_restart_member} can be safely called. +Otherwise it returns 0. 
+@end deftypefun + + +@deftypefun {unsigned long long} LZ_compress_data_position ( struct LZ_Encoder * const @var{encoder} ) +Returns the number of input bytes already compressed in the current member. +@end deftypefun + + +@deftypefun {unsigned long long} LZ_compress_member_position ( struct LZ_Encoder * const @var{encoder} ) +Returns the number of compressed bytes already produced, but perhaps not +yet read, in the current member. +@end deftypefun + + +@deftypefun {unsigned long long} LZ_compress_total_in_size ( struct LZ_Encoder * const @var{encoder} ) +Returns the total number of input bytes already compressed. +@end deftypefun + + +@deftypefun {unsigned long long} LZ_compress_total_out_size ( struct LZ_Encoder * const @var{encoder} ) +Returns the total number of compressed bytes already produced, but +perhaps not yet read. +@end deftypefun + + +@node Decompression functions +@chapter Decompression functions +@cindex decompression functions + +These are the functions used to decompress data. In case of error, all of +them return -1 or 0, for signed and unsigned return values respectively, +except @samp{LZ_decompress_open} whose return value must be verified by +calling @samp{LZ_decompress_errno} before using it. + + +@deftypefun {struct LZ_Decoder *} LZ_decompress_open ( void ) +Initializes the internal stream state for decompression and returns a +pointer that can only be used as the @var{decoder} argument for the +other LZ_decompress functions, or a null pointer if the decoder could +not be allocated. + +The returned pointer must be verified by calling +@samp{LZ_decompress_errno} before using it. If +@samp{LZ_decompress_errno} does not return @samp{LZ_ok}, the returned +pointer must not be used and should be freed with +@samp{LZ_decompress_close} to avoid memory leaks. +@end deftypefun + + +@deftypefun int LZ_decompress_close ( struct LZ_Decoder * const @var{decoder} ) +Frees all dynamically allocated data structures for this stream. 
This +function discards any unprocessed input and does not flush any pending +output. After a call to @samp{LZ_decompress_close}, @var{decoder} can no +longer be used as an argument to any LZ_decompress function. +It is safe to call @samp{LZ_decompress_close} with a null argument. +@end deftypefun + + +@deftypefun int LZ_decompress_finish ( struct LZ_Decoder * const @var{decoder} ) +Use this function to tell @samp{lzlib} that all the data for this stream +have already been written (with the function @samp{LZ_decompress_write}). +It is safe to call @samp{LZ_decompress_finish} as many times as needed. +It is not required to call @samp{LZ_decompress_finish} if the input stream +only contains whole members, but not calling it prevents lzlib from +detecting a truncated member. +@end deftypefun + + +@deftypefun int LZ_decompress_reset ( struct LZ_Decoder * const @var{decoder} ) +Resets the internal state of @var{decoder} as it was just after opening +it with the function @samp{LZ_decompress_open}. Data stored in the +internal buffers is discarded. Position counters are set to 0. +@end deftypefun + + +@deftypefun int LZ_decompress_sync_to_member ( struct LZ_Decoder * const @var{decoder} ) +Resets the error state of @var{decoder} and enters a search state that +lasts until a new member header (or the end of the stream) is found. +After a successful call to @samp{LZ_decompress_sync_to_member}, data +written with @samp{LZ_decompress_write} will be consumed and +@samp{LZ_decompress_read} will return 0 until a header is found. + +This function is useful to discard any data preceding the first member, +or to discard the rest of the current member, for example in case of a +data error. If the decoder is already at the beginning of a member, this +function does nothing. 
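To give an idea of what searching for a member header involves, the sketch below scans a buffer for the four magic bytes "LZIP" (0x4C, 0x5A, 0x49, 0x50). It is an assumption-laden illustration, not lzlib's implementation: `find_member_header` is a hypothetical name, and a real search would also have to validate the rest of the header and cope with magic bytes split across successive writes.

```c
#include <stddef.h>
#include <string.h>

/* Return the offset of the first "LZIP" magic string found at or after
   'from', or 'size' if there is none. Hypothetical helper, not lzlib API. */
static size_t find_member_header(const unsigned char *buf, size_t size,
                                 size_t from)
{
  static const unsigned char magic[4] = { 0x4C, 0x5A, 0x49, 0x50 };
  for (size_t i = from; i + sizeof magic <= size; ++i)
    if (memcmp(buf + i, magic, sizeof magic) == 0) return i;
  return size;
}
```

Everything before the returned offset corresponds to the data that the search state would consume and discard.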
+@end deftypefun + + +@deftypefun int LZ_decompress_read ( struct LZ_Decoder * const @var{decoder}, uint8_t * const @var{buffer}, const int @var{size} ) +Reads up to @var{size} bytes from the stream pointed to by @var{decoder}, +storing the results in @var{buffer}. If @w{LZ_API_VERSION >= 1012}, +@var{buffer} may be a null pointer, in which case the bytes read are +discarded. + +Returns the number of bytes actually read. This might be less than +@var{size}; for example, if there aren't that many bytes left in the stream +or if more bytes have yet to be written with the function +@samp{LZ_decompress_write}. Note that reading less than @var{size} bytes is +not an error. + +@samp{LZ_decompress_read} returns at least once per member so that +@samp{LZ_decompress_member_finished} can be called (and trailer data +retrieved) for each member, even for empty members. Therefore, +@samp{LZ_decompress_read} returning 0 does not mean that the end of the +stream has been reached. The increase in the value returned by +@samp{LZ_decompress_total_in_size} can be used to tell the end of the stream +from an empty member. + +In case of decompression error caused by corrupt or truncated data, +@samp{LZ_decompress_read} does not signal the error immediately to the +application, but waits until all the bytes decoded have been read. This +allows tools like +@uref{http://www.nongnu.org/lzip/manual/tarlz_manual.html,,tarlz} to +recover as much data as possible from each damaged member. +@ifnothtml +@xref{Top,tarlz manual,,tarlz}. +@end ifnothtml +@end deftypefun + + +@deftypefun int LZ_decompress_write ( struct LZ_Decoder * const @var{decoder}, uint8_t * const @var{buffer}, const int @var{size} ) +Writes up to @var{size} bytes from @var{buffer} to the stream pointed to by +@var{decoder}. Returns the number of bytes actually written. This might be +less than @var{size}. Note that writing less than @var{size} bytes is not an +error.
+@end deftypefun + + +@deftypefun int LZ_decompress_write_size ( struct LZ_Decoder * const @var{decoder} ) +Returns the maximum number of bytes that can be immediately written through +@samp{LZ_decompress_write}. This number varies smoothly; each compressed +byte consumed may be overwritten immediately, increasing by 1 the value +returned. + +It is guaranteed that an immediate call to @samp{LZ_decompress_write} will +accept a @var{size} up to the returned number of bytes. +@end deftypefun + + +@deftypefun {enum LZ_Errno} LZ_decompress_errno ( struct LZ_Decoder * const @var{decoder} ) +Returns the current error code for @var{decoder}. @xref{Error codes}. +It is safe to call @samp{LZ_decompress_errno} with a null argument, in which +case it returns @samp{LZ_bad_argument}. +@end deftypefun + + +@deftypefun int LZ_decompress_finished ( struct LZ_Decoder * const @var{decoder} ) +Returns 1 if all the data have been read and @samp{LZ_decompress_close} +can be safely called. Otherwise it returns 0. @samp{LZ_decompress_finished} +does not imply @samp{LZ_decompress_member_finished}. +@end deftypefun + + +@deftypefun int LZ_decompress_member_finished ( struct LZ_Decoder * const @var{decoder} ) +Returns 1 if the previous call to @samp{LZ_decompress_read} finished reading +the current member, indicating that final values for the member are available +through @samp{LZ_decompress_data_crc}, @samp{LZ_decompress_data_position}, +and @samp{LZ_decompress_member_position}. Otherwise it returns 0. +@end deftypefun + + +@deftypefun int LZ_decompress_member_version ( struct LZ_Decoder * const @var{decoder} ) +Returns the version of the current member, read from the member header. +@end deftypefun + + +@deftypefun int LZ_decompress_dictionary_size ( struct LZ_Decoder * const @var{decoder} ) +Returns the dictionary size of the current member, read from the member header. 
+@end deftypefun + + +@deftypefun {unsigned} LZ_decompress_data_crc ( struct LZ_Decoder * const @var{decoder} ) +Returns the 32 bit Cyclic Redundancy Check of the data decompressed from +the current member. The value returned is valid only when +@samp{LZ_decompress_member_finished} returns 1. +@end deftypefun + + +@deftypefun {unsigned long long} LZ_decompress_data_position ( struct LZ_Decoder * const @var{decoder} ) +Returns the number of decompressed bytes already produced, but perhaps +not yet read, in the current member. +@end deftypefun + + +@deftypefun {unsigned long long} LZ_decompress_member_position ( struct LZ_Decoder * const @var{decoder} ) +Returns the number of input bytes already decompressed in the current member. +@end deftypefun + + +@deftypefun {unsigned long long} LZ_decompress_total_in_size ( struct LZ_Decoder * const @var{decoder} ) +Returns the total number of input bytes already decompressed. +@end deftypefun + + +@deftypefun {unsigned long long} LZ_decompress_total_out_size ( struct LZ_Decoder * const @var{decoder} ) +Returns the total number of decompressed bytes already produced, but +perhaps not yet read. +@end deftypefun + + +@node Error codes +@chapter Error codes +@cindex error codes + +Most library functions return -1 to indicate that they have failed. But +this return value only tells you that an error has occurred. To find out +what kind of error it was, you need to verify the error code by calling +@samp{LZ_(de)compress_errno}. + +Library functions don't change the value returned by +@samp{LZ_(de)compress_errno} when they succeed; thus, the value returned +by @samp{LZ_(de)compress_errno} after a successful call is not +necessarily LZ_ok, and you should not use @samp{LZ_(de)compress_errno} +to determine whether a call failed. If the call failed, then you can +examine @samp{LZ_(de)compress_errno}. + +The error codes are defined in the header file @samp{lzlib.h}. 
+ +@deftypevr Constant {enum LZ_Errno} LZ_ok +The value of this constant is 0 and is used to indicate that there is no error. +@end deftypevr + +@deftypevr Constant {enum LZ_Errno} LZ_bad_argument +At least one of the arguments passed to the library function was invalid. +@end deftypevr + +@deftypevr Constant {enum LZ_Errno} LZ_mem_error +No memory available. The system cannot allocate more virtual memory +because its capacity is full. +@end deftypevr + +@deftypevr Constant {enum LZ_Errno} LZ_sequence_error +A library function was called in the wrong order. For example +@samp{LZ_compress_restart_member} was called before +@samp{LZ_compress_member_finished} indicates that the current member is +finished. +@end deftypevr + +@deftypevr Constant {enum LZ_Errno} LZ_header_error +An invalid member header (one with the wrong magic bytes) was read. If +this happens at the end of the data stream it may indicate trailing data. +@end deftypevr + +@deftypevr Constant {enum LZ_Errno} LZ_unexpected_eof +The end of the data stream was reached in the middle of a member. +@end deftypevr + +@deftypevr Constant {enum LZ_Errno} LZ_data_error +The data stream is corrupt. If @samp{LZ_decompress_member_position} is 6 +or less, it indicates either a format version not supported, an invalid +dictionary size, a corrupt header in a multimember data stream, or +trailing data too similar to a valid lzip header. Lziprecover can be +used to remove conflicting trailing data from a file. +@end deftypevr + +@deftypevr Constant {enum LZ_Errno} LZ_library_error +A bug was detected in the library. Please, report it. @xref{Problems}. +@end deftypevr + + +@node Error messages +@chapter Error messages +@cindex error messages + +@deftypefun {const char *} LZ_strerror ( const enum LZ_Errno @var{lz_errno} ) +Returns the standard error message for a given error code. The messages +are fairly short; there are no multi-line messages or embedded newlines. 
+This function makes it easy for your program to report informative error +messages about the failure of a library call. + +The value of @var{lz_errno} normally comes from a call to +@samp{LZ_(de)compress_errno}. +@end deftypefun + + +@node Invoking minilzip +@chapter Invoking minilzip +@cindex invoking +@cindex options + +Minilzip is a test program for the compression library lzlib, fully +compatible with lzip 1.4 or newer. + +@uref{http://www.nongnu.org/lzip/lzip.html,,Lzip} +is a lossless data compressor with a user interface similar to the one +of gzip or bzip2. Lzip uses a simplified form of the 'Lempel-Ziv-Markov +chain-Algorithm' (LZMA) stream format and provides 3-factor integrity +checking to maximize interoperability and optimize safety. Lzip can compress +about as fast as gzip @w{(lzip -0)} or compress most files more than bzip2 +@w{(lzip -9)}. Decompression speed is intermediate between gzip and bzip2. +Lzip is better than gzip and bzip2 from a data recovery perspective. Lzip +has been designed, written, and tested with great care to replace gzip and +bzip2 as the standard general-purpose compressed format for unix-like +systems. + +@noindent +The format for running minilzip is: + +@example +minilzip [@var{options}] [@var{files}] +@end example + +@noindent +If no file names are specified, minilzip compresses (or decompresses) from +standard input to standard output. A hyphen @samp{-} used as a @var{file} +argument means standard input. It can be mixed with other @var{files} and is +read just once, the first time it appears in the command line. + +minilzip supports the following +@uref{http://www.nongnu.org/arg-parser/manual/arg_parser_manual.html#Argument-syntax,,options}: +@ifnothtml +@xref{Argument syntax,,,arg_parser}. +@end ifnothtml + +@table @code +@item -h +@itemx --help +Print an informative help message describing the options and exit. + +@item -V +@itemx --version +Print the version number of minilzip on the standard output and exit.
+This version number should be included in all bug reports. + +@item -a +@itemx --trailing-error +Exit with error status 2 if any remaining input is detected after +decompressing the last member. Such remaining input is usually trailing +garbage that can be safely ignored. + +@item -b @var{bytes} +@itemx --member-size=@var{bytes} +When compressing, set the member size limit to @var{bytes}. It is advisable +to keep members smaller than RAM size so that they can be repaired with +lziprecover in case of corruption. A small member size may degrade +compression ratio, so use it only when needed. Valid values range from +@w{100 kB} to @w{2 PiB}. Defaults to @w{2 PiB}. + +@item -c +@itemx --stdout +Compress or decompress to standard output; keep input files unchanged. If +compressing several files, each file is compressed independently. (The +output consists of a sequence of independently compressed members). This +option (or @samp{-o}) is needed when reading from a named pipe (fifo) or +from a device. Use it also to recover as much of the decompressed data as +possible when decompressing a corrupt file. @samp{-c} overrides @samp{-o} +and @samp{-S}. @samp{-c} has no effect when testing or listing. + +@item -d +@itemx --decompress +Decompress the files specified. If a file does not exist, can't be opened, +or the destination file already exists and @samp{--force} has not been +specified, minilzip continues decompressing the rest of the files and exits with +error status 1. If a file fails to decompress, or is a terminal, minilzip exits +immediately with error status 2 without decompressing the rest of the files. +A terminal is considered an uncompressed file, and therefore invalid. + +@item -f +@itemx --force +Force overwrite of output files. + +@item -F +@itemx --recompress +When compressing, force re-compression of files whose name already has +the @samp{.lz} or @samp{.tlz} suffix. 
+ +@item -k +@itemx --keep +Keep (don't delete) input files during compression or decompression. + +@item -m @var{bytes} +@itemx --match-length=@var{bytes} +When compressing, set the match length limit in bytes. After a match +this long is found, the search is finished. Valid values range from 5 to +273. Larger values usually give better compression ratios but longer +compression times. + +@item -o @var{file} +@itemx --output=@var{file} +If @samp{-c} has not been also specified, write the (de)compressed output to +@var{file}; keep input files unchanged. If compressing several files, each +file is compressed independently. (The output consists of a sequence of +independently compressed members). This option (or @samp{-c}) is needed when +reading from a named pipe (fifo) or from a device. @w{@samp{-o -}} is +equivalent to @samp{-c}. @samp{-o} has no effect when testing or listing. + +When compressing and splitting the output in volumes, @var{file} is used as +a prefix, and several files named @samp{@var{file}00001.lz}, +@samp{@var{file}00002.lz}, etc, are created. In this case, only one input +file is allowed. + +@item -q +@itemx --quiet +Quiet operation. Suppress all messages. + +@item -s @var{bytes} +@itemx --dictionary-size=@var{bytes} +When compressing, set the dictionary size limit in bytes. Minilzip will use +for each file the largest dictionary size that exceeds neither +the file size nor this limit. Valid values range from @w{4 KiB} to +@w{512 MiB}. Values 12 to 29 are interpreted as powers of two, meaning +2^12 to 2^29 bytes. Dictionary sizes are quantized so that they can be +coded in just one byte (@pxref{coded-dict-size}). If the size specified +does not match one of the valid sizes, it will be rounded upwards by +adding up to @w{(@var{bytes} / 8)} to it.
+ +For maximum compression you should use a dictionary size limit as large +as possible, but keep in mind that the decompression memory requirement +is affected at compression time by the choice of dictionary size limit. + +@item -S @var{bytes} +@itemx --volume-size=@var{bytes} +When compressing, and @samp{-c} has not been also specified, split the +compressed output into several volume files with names +@samp{original_name00001.lz}, @samp{original_name00002.lz}, etc, and set the +volume size limit to @var{bytes}. Input files are kept unchanged. Each +volume is a complete, maybe multimember, lzip file. A small volume size may +degrade compression ratio, so use it only when needed. Valid values range +from @w{100 kB} to @w{4 EiB}. + +@item -t +@itemx --test +Check integrity of the files specified, but don't decompress them. This +really performs a trial decompression and throws away the result. Use it +together with @samp{-v} to see information about the files. If a file +fails the test, does not exist, can't be opened, or is a terminal, minilzip +continues checking the rest of the files. A final diagnostic is shown at +verbosity level 1 or higher if any file fails the test when testing +multiple files. + +@item -v +@itemx --verbose +Verbose mode.@* +When compressing, show the compression ratio and size for each file +processed.@* +When decompressing or testing, further -v's (up to 4) increase the +verbosity level, showing status, compression ratio, dictionary size, +and trailer contents (CRC, data size, member size). + +@item -0 .. -9 +Compression level. Set the compression parameters (dictionary size and +match length limit) as shown in the table below. The default compression +level is @samp{-6}, equivalent to @w{@samp{-s8MiB -m36}}. Note that +@samp{-9} can be much slower than @samp{-0}. These options have no +effect when decompressing or testing. + +The bidimensional parameter space of LZMA can't be mapped to a linear +scale optimal for all files. 
If your files are large, very repetitive, +etc, you may need to use the options @samp{--dictionary-size} and +@samp{--match-length} directly to achieve optimal performance. + +If several compression levels or @samp{-s} or @samp{-m} options are +given, the last setting is used. For example @w{@samp{-9 -s64MiB}} is +equivalent to @w{@samp{-s64MiB -m273}} + +@multitable {Level} {Dictionary size (-s)} {Match length limit (-m)} +@item Level @tab Dictionary size (-s) @tab Match length limit (-m) +@item -0 @tab 64 KiB @tab 16 bytes +@item -1 @tab 1 MiB @tab 5 bytes +@item -2 @tab 1.5 MiB @tab 6 bytes +@item -3 @tab 2 MiB @tab 8 bytes +@item -4 @tab 3 MiB @tab 12 bytes +@item -5 @tab 4 MiB @tab 20 bytes +@item -6 @tab 8 MiB @tab 36 bytes +@item -7 @tab 16 MiB @tab 68 bytes +@item -8 @tab 24 MiB @tab 132 bytes +@item -9 @tab 32 MiB @tab 273 bytes +@end multitable + +@item --fast +@itemx --best +Aliases for GNU gzip compatibility. + +@item --loose-trailing +When decompressing or testing, allow trailing data whose first bytes are +so similar to the magic bytes of a lzip header that they can be confused +with a corrupt header. Use this option if a file triggers a "corrupt +header" error and the cause is not indeed a corrupt header. + +@item --check-lib +Compare the @uref{#Library-version,,version of lzlib} used to compile +minilzip with the version actually being used at run time and exit. Report +any differences found. Exit with error status 1 if differences are found. A +mismatch may indicate that lzlib is not correctly installed or that a +different version of lzlib has been installed after compiling the shared +version of minilzip. Exit with error status 2 if LZ_API_VERSION and +LZ_version_string don't match. @w{@samp{minilzip -v --check-lib}} shows the +version of lzlib being used and the value of LZ_API_VERSION (if defined). +@ifnothtml +@xref{Library version}. 
+@end ifnothtml + +@end table + +Numbers given as arguments to options may be followed by a multiplier +and an optional @samp{B} for "byte". + +Table of SI and binary prefixes (unit multipliers): + +@multitable {Prefix} {kilobyte (10^3 = 1000)} {|} {Prefix} {kibibyte (2^10 = 1024)} +@item Prefix @tab Value @tab | @tab Prefix @tab Value +@item k @tab kilobyte (10^3 = 1000) @tab | @tab Ki @tab kibibyte (2^10 = 1024) +@item M @tab megabyte (10^6) @tab | @tab Mi @tab mebibyte (2^20) +@item G @tab gigabyte (10^9) @tab | @tab Gi @tab gibibyte (2^30) +@item T @tab terabyte (10^12) @tab | @tab Ti @tab tebibyte (2^40) +@item P @tab petabyte (10^15) @tab | @tab Pi @tab pebibyte (2^50) +@item E @tab exabyte (10^18) @tab | @tab Ei @tab exbibyte (2^60) +@item Z @tab zettabyte (10^21) @tab | @tab Zi @tab zebibyte (2^70) +@item Y @tab yottabyte (10^24) @tab | @tab Yi @tab yobibyte (2^80) +@end multitable + +@sp 1 +Exit status: 0 for a normal exit, 1 for environmental problems (file not +found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or invalid +input file, 3 for an internal consistency error (e.g., bug) which caused +minilzip to panic. + + +@node Data format +@chapter Data format +@cindex data format + +Perfection is reached, not when there is no longer anything to add, but +when there is no longer anything to take away.@* +--- Antoine de Saint-Exupery + +@sp 1 +In the diagram below, a box like this: + +@verbatim ++---+ +| | <-- the vertical bars might be missing ++---+ +@end verbatim + +represents one byte; a box like this: + +@verbatim ++==============+ +| | ++==============+ +@end verbatim + +represents a variable number of bytes. + +@sp 1 +Lzip data consist of a series of independent "members" (compressed data +sets). The members simply appear one after another in the data stream, with +no additional information before, between, or after them. Each member can +encode in compressed form up to @w{16 EiB - 1 byte} of uncompressed data. 
+The size of a multimember data stream is unlimited. + +Each member has the following structure: + +@verbatim ++--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| ID string | VN | DS | LZMA stream | CRC32 | Data size | Member size | ++--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +@end verbatim + +All multibyte values are stored in little endian order. + +@table @samp +@item ID string (the "magic" bytes) +A four byte string, identifying the lzip format, with the value "LZIP" +(0x4C, 0x5A, 0x49, 0x50). + +@item VN (version number, 1 byte) +Just in case something needs to be modified in the future. 1 for now. + +@anchor{coded-dict-size} +@item DS (coded dictionary size, 1 byte) +The dictionary size is calculated by taking a power of 2 (the base size) +and subtracting from it a fraction between 0/16 and 7/16 of the base size.@* +Bits 4-0 contain the base 2 logarithm of the base size (12 to 29).@* +Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract +from the base size to obtain the dictionary size.@* +Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB@* +Valid values for dictionary size range from 4 KiB to 512 MiB. + +@item LZMA stream +The LZMA stream, finished by an "End Of Stream" marker. Uses default values +for encoder properties. +@ifnothtml +@xref{Stream format,,,lzip}, +@end ifnothtml +@ifhtml +See +@uref{http://www.nongnu.org/lzip/manual/lzip_manual.html#Stream-format,,Stream format} +@end ifhtml +for a complete description.@* +Lzip only uses the LZMA marker @samp{2} ("End Of Stream" marker). Lzlib +also uses the LZMA marker @samp{3} ("Sync Flush" marker). @xref{sync_flush}. + +@item CRC32 (4 bytes) +Cyclic Redundancy Check (CRC) of the original uncompressed data. + +@item Data size (8 bytes) +Size of the original uncompressed data. + +@item Member size (8 bytes) +Total size of the member, including header and trailer. 
This field acts +as a distributed index, allows the verification of stream integrity, and +facilitates the safe recovery of undamaged members from multimember files. +Member size should be limited to @w{2 PiB} to prevent the data size field +from overflowing. + +@end table + + +@node Examples +@chapter A small tutorial with examples +@cindex examples + +This chapter provides real code examples for the most common uses of the +library. See these examples in context in the files @samp{bbexample.c} and +@samp{ffexample.c} from the source distribution of lzlib. + +Note that the interface of lzlib is symmetrical. That is, the code for +normal compression and decompression is identical except that one calls +LZ_compress* functions while the other calls LZ_decompress* functions. + +@menu +* Buffer compression:: Buffer-to-buffer single-member compression +* Buffer decompression:: Buffer-to-buffer decompression +* File compression:: File-to-file single-member compression +* File decompression:: File-to-file decompression +* File compression mm:: File-to-file multimember compression +* Skipping data errors:: Decompression with automatic resynchronization +@end menu + + +@node Buffer compression +@section Buffer compression +@cindex buffer compression + +Buffer-to-buffer single-member compression +@w{(@var{member_size} > total output)}. + +@verbatim +/* Compress 'insize' bytes from 'inbuf' to 'outbuf'. + Return the size of the compressed data in '*outlenp'. + In case of error, or if 'outsize' is too small, return false and do not + modify '*outlenp'. 
+*/ +bool bbcompress( const uint8_t * const inbuf, const int insize, + const int dictionary_size, const int match_len_limit, + uint8_t * const outbuf, const int outsize, + int * const outlenp ) + { + int inpos = 0, outpos = 0; + bool error = false; + struct LZ_Encoder * const encoder = + LZ_compress_open( dictionary_size, match_len_limit, INT64_MAX ); + if( !encoder || LZ_compress_errno( encoder ) != LZ_ok ) + { LZ_compress_close( encoder ); return false; } + + while( true ) + { + int ret = LZ_compress_write( encoder, inbuf + inpos, insize - inpos ); + if( ret < 0 ) { error = true; break; } + inpos += ret; + if( inpos >= insize ) LZ_compress_finish( encoder ); + ret = LZ_compress_read( encoder, outbuf + outpos, outsize - outpos ); + if( ret < 0 ) { error = true; break; } + outpos += ret; + if( LZ_compress_finished( encoder ) == 1 ) break; + if( outpos >= outsize ) { error = true; break; } + } + + if( LZ_compress_close( encoder ) < 0 ) error = true; + if( error ) return false; + *outlenp = outpos; + return true; + } +@end verbatim + + +@node Buffer decompression +@section Buffer decompression +@cindex buffer decompression + +Buffer-to-buffer decompression. + +@verbatim +/* Decompress 'insize' bytes from 'inbuf' to 'outbuf'. + Return the size of the decompressed data in '*outlenp'. + In case of error, or if 'outsize' is too small, return false and do not + modify '*outlenp'. 
+*/ +bool bbdecompress( const uint8_t * const inbuf, const int insize, + uint8_t * const outbuf, const int outsize, + int * const outlenp ) + { + int inpos = 0, outpos = 0; + bool error = false; + struct LZ_Decoder * const decoder = LZ_decompress_open(); + if( !decoder || LZ_decompress_errno( decoder ) != LZ_ok ) + { LZ_decompress_close( decoder ); return false; } + + while( true ) + { + int ret = LZ_decompress_write( decoder, inbuf + inpos, insize - inpos ); + if( ret < 0 ) { error = true; break; } + inpos += ret; + if( inpos >= insize ) LZ_decompress_finish( decoder ); + ret = LZ_decompress_read( decoder, outbuf + outpos, outsize - outpos ); + if( ret < 0 ) { error = true; break; } + outpos += ret; + if( LZ_decompress_finished( decoder ) == 1 ) break; + if( outpos >= outsize ) { error = true; break; } + } + + if( LZ_decompress_close( decoder ) < 0 ) error = true; + if( error ) return false; + *outlenp = outpos; + return true; + } +@end verbatim + + +@node File compression +@section File compression +@cindex file compression + +File-to-file compression using LZ_compress_write_size. + +@verbatim +int ffcompress( struct LZ_Encoder * const encoder, + FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384 }; + uint8_t buffer[buffer_size]; + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_compress_write_size( encoder ) ); + if( size > 0 ) + { + len = fread( buffer, 1, size, infile ); + ret = LZ_compress_write( encoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) ) LZ_compress_finish( encoder ); + } + ret = LZ_compress_read( encoder, buffer, buffer_size ); + if( ret < 0 ) break; + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( LZ_compress_finished( encoder ) == 1 ) return 0; + } + return 1; + } +@end verbatim + + +@node File decompression +@section File decompression +@cindex file decompression + +File-to-file decompression using LZ_decompress_write_size. 
+ +@verbatim +int ffdecompress( struct LZ_Decoder * const decoder, + FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384 }; + uint8_t buffer[buffer_size]; + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_decompress_write_size( decoder ) ); + if( size > 0 ) + { + len = fread( buffer, 1, size, infile ); + ret = LZ_decompress_write( decoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) ) LZ_decompress_finish( decoder ); + } + ret = LZ_decompress_read( decoder, buffer, buffer_size ); + if( ret < 0 ) break; + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( LZ_decompress_finished( decoder ) == 1 ) return 0; + } + return 1; + } +@end verbatim + + +@node File compression mm +@section File-to-file multimember compression +@cindex multimember compression + +Example 1: Multimember compression with members of fixed size +@w{(@var{member_size} < total output)}. + +@verbatim +int ffmmcompress( FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384, member_size = 4096 }; + uint8_t buffer[buffer_size]; + bool done = false; + struct LZ_Encoder * const encoder = + LZ_compress_open( 65535, 16, member_size ); + if( !encoder || LZ_compress_errno( encoder ) != LZ_ok ) + { fputs( "ffexample: Not enough memory.\n", stderr ); + LZ_compress_close( encoder ); return 1; } + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_compress_write_size( encoder ) ); + if( size > 0 ) + { + len = fread( buffer, 1, size, infile ); + ret = LZ_compress_write( encoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) ) LZ_compress_finish( encoder ); + } + ret = LZ_compress_read( encoder, buffer, buffer_size ); + if( ret < 0 ) break; + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( LZ_compress_member_finished( encoder ) == 1 ) + { + if( LZ_compress_finished( encoder ) == 1 ) { done = true; break; } + if( 
LZ_compress_restart_member( encoder, member_size ) < 0 ) break; + } + } + if( LZ_compress_close( encoder ) < 0 ) done = false; + return done; + } +@end verbatim + +@sp 1 +@noindent +Example 2: Multimember compression (user-restarted members). +(Call LZ_compress_open with @var{member_size} > largest member). + +@verbatim +/* Compress 'infile' to 'outfile' as a multimember stream with one member + for each line of text terminated by a newline character or by EOF. + Return 0 if success, 1 if error. +*/ +int fflfcompress( struct LZ_Encoder * const encoder, + FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384 }; + uint8_t buffer[buffer_size]; + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_compress_write_size( encoder ) ); + if( size > 0 ) + { + for( len = 0; len < size; ) + { + int ch = getc( infile ); + if( ch == EOF || ( buffer[len++] = ch ) == '\n' ) break; + } + /* avoid writing an empty member to outfile */ + if( len == 0 && LZ_compress_data_position( encoder ) == 0 ) return 0; + ret = LZ_compress_write( encoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) || buffer[len-1] == '\n' ) + LZ_compress_finish( encoder ); + } + ret = LZ_compress_read( encoder, buffer, buffer_size ); + if( ret < 0 ) break; + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( LZ_compress_member_finished( encoder ) == 1 ) + { + if( feof( infile ) && LZ_compress_finished( encoder ) == 1 ) return 0; + if( LZ_compress_restart_member( encoder, INT64_MAX ) < 0 ) break; + } + } + return 1; + } +@end verbatim + + +@node Skipping data errors +@section Skipping data errors +@cindex skipping data errors + +@verbatim +/* Decompress 'infile' to 'outfile' with automatic resynchronization to + next member in case of data error, including the automatic removal of + leading garbage. 
+*/ +int ffrsdecompress( struct LZ_Decoder * const decoder, + FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384 }; + uint8_t buffer[buffer_size]; + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_decompress_write_size( decoder ) ); + if( size > 0 ) + { + len = fread( buffer, 1, size, infile ); + ret = LZ_decompress_write( decoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) ) LZ_decompress_finish( decoder ); + } + ret = LZ_decompress_read( decoder, buffer, buffer_size ); + if( ret < 0 ) + { + if( LZ_decompress_errno( decoder ) == LZ_header_error || + LZ_decompress_errno( decoder ) == LZ_data_error ) + { LZ_decompress_sync_to_member( decoder ); continue; } + else break; + } + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( LZ_decompress_finished( decoder ) == 1 ) return 0; + } + return 1; + } +@end verbatim + + +@node Problems +@chapter Reporting bugs +@cindex bugs +@cindex getting help + +There are probably bugs in lzlib. There are certainly errors and +omissions in this manual. If you report them, they will get fixed. If +you don't, no one will ever know about them and they will remain unfixed +for all eternity, if not longer. + +If you find a bug in lzlib, please send electronic mail to +@email{lzip-bug@@nongnu.org}. Include the version number, which you can +find by running @w{@samp{minilzip --version}} and +@w{@samp{minilzip -v --check-lib}}. + + +@node Concept index +@unnumbered Concept index + +@printindex cp + +@bye diff --git a/doc/minilzip.1 b/doc/minilzip.1 new file mode 100644 index 0000000..0c4c06d --- /dev/null +++ b/doc/minilzip.1 @@ -0,0 +1,134 @@ +.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.16. 
+.TH MINILZIP "1" "January 2022" "minilzip 1.13" "User Commands" +.SH NAME +minilzip \- reduces the size of files +.SH SYNOPSIS +.B minilzip +[\fI\,options\/\fR] [\fI\,files\/\fR] +.SH DESCRIPTION +Minilzip is a test program for the compression library lzlib, fully +compatible with lzip 1.4 or newer. +.PP +Lzip is a lossless data compressor with a user interface similar to the one +of gzip or bzip2. Lzip uses a simplified form of the 'Lempel\-Ziv\-Markov +chain\-Algorithm' (LZMA) stream format and provides a 3 factor integrity +checking to maximize interoperability and optimize safety. Lzip can compress +about as fast as gzip (lzip \fB\-0\fR) or compress most files more than bzip2 +(lzip \fB\-9\fR). Decompression speed is intermediate between gzip and bzip2. +Lzip is better than gzip and bzip2 from a data recovery perspective. Lzip +has been designed, written, and tested with great care to replace gzip and +bzip2 as the standard general\-purpose compressed format for unix\-like +systems. 
+.SH OPTIONS +.TP +\fB\-h\fR, \fB\-\-help\fR +display this help and exit +.TP +\fB\-V\fR, \fB\-\-version\fR +output version information and exit +.TP +\fB\-a\fR, \fB\-\-trailing\-error\fR +exit with error status if trailing data +.TP +\fB\-b\fR, \fB\-\-member\-size=\fR<bytes> +set member size limit in bytes +.TP +\fB\-c\fR, \fB\-\-stdout\fR +write to standard output, keep input files +.TP +\fB\-d\fR, \fB\-\-decompress\fR +decompress +.TP +\fB\-f\fR, \fB\-\-force\fR +overwrite existing output files +.TP +\fB\-F\fR, \fB\-\-recompress\fR +force re\-compression of compressed files +.TP +\fB\-k\fR, \fB\-\-keep\fR +keep (don't delete) input files +.TP +\fB\-m\fR, \fB\-\-match\-length=\fR<bytes> +set match length limit in bytes [36] +.TP +\fB\-o\fR, \fB\-\-output=\fR<file> +write to <file>, keep input files +.TP +\fB\-q\fR, \fB\-\-quiet\fR +suppress all messages +.TP +\fB\-s\fR, \fB\-\-dictionary\-size=\fR<bytes> +set dictionary size limit in bytes [8 MiB] +.TP +\fB\-S\fR, \fB\-\-volume\-size=\fR<bytes> +set volume size limit in bytes +.TP +\fB\-t\fR, \fB\-\-test\fR +test compressed file integrity +.TP +\fB\-v\fR, \fB\-\-verbose\fR +be verbose (a 2nd \fB\-v\fR gives more) +.TP +\fB\-0\fR .. \fB\-9\fR +set compression level [default 6] +.TP +\fB\-\-fast\fR +alias for \fB\-0\fR +.TP +\fB\-\-best\fR +alias for \fB\-9\fR +.TP +\fB\-\-loose\-trailing\fR +allow trailing data seeming corrupt header +.TP +\fB\-\-check\-lib\fR +compare version of lzlib.h with liblz.{a,so} +.PP +If no file names are given, or if a file is '\-', minilzip compresses or +decompresses from standard input to standard output. +Numbers may be followed by a multiplier: k = kB = 10^3 = 1000, +Ki = KiB = 2^10 = 1024, M = 10^6, Mi = 2^20, G = 10^9, Gi = 2^30, etc... +Dictionary sizes 12 to 29 are interpreted as powers of two, meaning 2^12 +to 2^29 bytes. +.PP +The bidimensional parameter space of LZMA can't be mapped to a linear +scale optimal for all files. 
If your files are large, very repetitive, +etc, you may need to use the options \fB\-\-dictionary\-size\fR and \fB\-\-match\-length\fR +directly to achieve optimal performance. +.PP +To extract all the files from archive 'foo.tar.lz', use the commands +\&'tar \fB\-xf\fR foo.tar.lz' or 'minilzip \fB\-cd\fR foo.tar.lz | tar \fB\-xf\fR \-'. +.PP +Exit status: 0 for a normal exit, 1 for environmental problems (file +not found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or +invalid input file, 3 for an internal consistency error (e.g., bug) which +caused minilzip to panic. +.PP +The ideas embodied in lzlib are due to (at least) the following people: +Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for the +definition of Markov chains), G.N.N. Martin (for the definition of range +encoding), Igor Pavlov (for putting all the above together in LZMA), and +Julian Seward (for bzip2's CLI). +.SH "REPORTING BUGS" +Report bugs to lzip\-bug@nongnu.org +.br +Lzlib home page: http://www.nongnu.org/lzip/lzlib.html +.SH COPYRIGHT +Copyright \(co 2022 Antonio Diaz Diaz. +Using lzlib 1.13 +License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html> +.br +This is free software: you are free to change and redistribute it. +There is NO WARRANTY, to the extent permitted by law. +.SH "SEE ALSO" +The full documentation for +.B minilzip +is maintained as a Texinfo manual. If the +.B info +and +.B minilzip +programs are properly installed at your site, the command +.IP +.B info lzlib +.PP +should give you access to the complete manual. diff --git a/encoder.c b/encoder.c new file mode 100644 index 0000000..b76dafa --- /dev/null +++ b/encoder.c @@ -0,0 +1,586 @@ +/* Lzlib - Compression library for the lzip format + Copyright (C) 2009-2022 Antonio Diaz Diaz. + + This library is free software. Redistribution and use in source and + binary forms, with or without modification, are permitted provided + that the following conditions are met: + + 1. 
Redistributions of source code must retain the above copyright + notice, this list of conditions, and the following disclaimer. + + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions, and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + This library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. +*/ + +static int LZe_get_match_pairs( struct LZ_encoder * const e, struct Pair * pairs ) + { + int32_t * ptr0 = e->eb.mb.pos_array + ( e->eb.mb.cyclic_pos << 1 ); + int32_t * ptr1 = ptr0 + 1; + int len_limit = e->match_len_limit; + if( len_limit > Mb_available_bytes( &e->eb.mb ) ) + { + e->been_flushed = true; + len_limit = Mb_available_bytes( &e->eb.mb ); + if( len_limit < 4 ) { *ptr0 = *ptr1 = 0; return 0; } + } + + int maxlen = 3; /* only used if pairs != 0 */ + int num_pairs = 0; + const int min_pos = ( e->eb.mb.pos > e->eb.mb.dictionary_size ) ? 
+ e->eb.mb.pos - e->eb.mb.dictionary_size : 0; + const uint8_t * const data = Mb_ptr_to_current_pos( &e->eb.mb ); + + unsigned tmp = crc32[data[0]] ^ data[1]; + const int key2 = tmp & ( num_prev_positions2 - 1 ); + tmp ^= (unsigned)data[2] << 8; + const int key3 = num_prev_positions2 + ( tmp & ( num_prev_positions3 - 1 ) ); + const int key4 = num_prev_positions2 + num_prev_positions3 + + ( ( tmp ^ ( crc32[data[3]] << 5 ) ) & e->eb.mb.key4_mask ); + + if( pairs ) + { + const int np2 = e->eb.mb.prev_positions[key2]; + const int np3 = e->eb.mb.prev_positions[key3]; + if( np2 > min_pos && e->eb.mb.buffer[np2-1] == data[0] ) + { + pairs[0].dis = e->eb.mb.pos - np2; + pairs[0].len = maxlen = 2 + ( np2 == np3 ); + num_pairs = 1; + } + if( np2 != np3 && np3 > min_pos && e->eb.mb.buffer[np3-1] == data[0] ) + { + maxlen = 3; + pairs[num_pairs++].dis = e->eb.mb.pos - np3; + } + if( num_pairs > 0 ) + { + const int delta = pairs[num_pairs-1].dis + 1; + while( maxlen < len_limit && data[maxlen-delta] == data[maxlen] ) + ++maxlen; + pairs[num_pairs-1].len = maxlen; + if( maxlen < 3 ) maxlen = 3; + if( maxlen >= len_limit ) pairs = 0; /* done. now just skip */ + } + } + + const int pos1 = e->eb.mb.pos + 1; + e->eb.mb.prev_positions[key2] = pos1; + e->eb.mb.prev_positions[key3] = pos1; + int newpos1 = e->eb.mb.prev_positions[key4]; + e->eb.mb.prev_positions[key4] = pos1; + + int len = 0, len0 = 0, len1 = 0; + + int count; + for( count = e->cycles; ; ) + { + if( newpos1 <= min_pos || --count < 0 ) { *ptr0 = *ptr1 = 0; break; } + + if( e->been_flushed ) len = 0; + const int delta = pos1 - newpos1; + int32_t * const newptr = e->eb.mb.pos_array + + ( ( e->eb.mb.cyclic_pos - delta + + ( (e->eb.mb.cyclic_pos >= delta) ? 
0 : e->eb.mb.dictionary_size + 1 ) ) << 1 ); + if( data[len-delta] == data[len] ) + { + while( ++len < len_limit && data[len-delta] == data[len] ) {} + if( pairs && maxlen < len ) + { + pairs[num_pairs].dis = delta - 1; + pairs[num_pairs].len = maxlen = len; + ++num_pairs; + } + if( len >= len_limit ) + { + *ptr0 = newptr[0]; + *ptr1 = newptr[1]; + break; + } + } + if( data[len-delta] < data[len] ) + { + *ptr0 = newpos1; + ptr0 = newptr + 1; + newpos1 = *ptr0; + len0 = len; if( len1 < len ) len = len1; + } + else + { + *ptr1 = newpos1; + ptr1 = newptr; + newpos1 = *ptr1; + len1 = len; if( len0 < len ) len = len0; + } + } + return num_pairs; + } + + +static void LZe_update_distance_prices( struct LZ_encoder * const e ) + { + int dis, len_state; + for( dis = start_dis_model; dis < modeled_distances; ++dis ) + { + const int dis_slot = dis_slots[dis]; + const int direct_bits = ( dis_slot >> 1 ) - 1; + const int base = ( 2 | ( dis_slot & 1 ) ) << direct_bits; + const int price = price_symbol_reversed( e->eb.bm_dis + ( base - dis_slot ), + dis - base, direct_bits ); + for( len_state = 0; len_state < len_states; ++len_state ) + e->dis_prices[len_state][dis] = price; + } + + for( len_state = 0; len_state < len_states; ++len_state ) + { + int * const dsp = e->dis_slot_prices[len_state]; + const Bit_model * const bmds = e->eb.bm_dis_slot[len_state]; + int slot = 0; + for( ; slot < end_dis_model; ++slot ) + dsp[slot] = price_symbol6( bmds, slot ); + for( ; slot < e->num_dis_slots; ++slot ) + dsp[slot] = price_symbol6( bmds, slot ) + + (((( slot >> 1 ) - 1 ) - dis_align_bits ) << price_shift_bits ); + + int * const dp = e->dis_prices[len_state]; + for( dis = 0; dis < start_dis_model; ++dis ) + dp[dis] = dsp[dis]; + for( ; dis < modeled_distances; ++dis ) + dp[dis] += dsp[dis_slots[dis]]; + } + } + + +/* Return the number of bytes advanced (ahead). + trials[0]..trials[ahead-1] contain the steps to encode. + ( trials[0].dis4 == -1 ) means literal. 
+ A match/rep longer than or equal to match_len_limit finishes the sequence. +*/ +static int LZe_sequence_optimizer( struct LZ_encoder * const e, + const int reps[num_rep_distances], + const State state ) + { + int num_pairs, num_trials; + int i, rep, len; + + if( e->pending_num_pairs > 0 ) /* from previous call */ + { + num_pairs = e->pending_num_pairs; + e->pending_num_pairs = 0; + } + else + num_pairs = LZe_read_match_distances( e ); + const int main_len = ( num_pairs > 0 ) ? e->pairs[num_pairs-1].len : 0; + + int replens[num_rep_distances]; + int rep_index = 0; + for( i = 0; i < num_rep_distances; ++i ) + { + replens[i] = Mb_true_match_len( &e->eb.mb, 0, reps[i] + 1 ); + if( replens[i] > replens[rep_index] ) rep_index = i; + } + if( replens[rep_index] >= e->match_len_limit ) + { + e->trials[0].price = replens[rep_index]; + e->trials[0].dis4 = rep_index; + if( !LZe_move_and_update( e, replens[rep_index] ) ) return 0; + return replens[rep_index]; + } + + if( main_len >= e->match_len_limit ) + { + e->trials[0].price = main_len; + e->trials[0].dis4 = e->pairs[num_pairs-1].dis + num_rep_distances; + if( !LZe_move_and_update( e, main_len ) ) return 0; + return main_len; + } + + const int pos_state = Mb_data_position( &e->eb.mb ) & pos_state_mask; + const int match_price = price1( e->eb.bm_match[state][pos_state] ); + const int rep_match_price = match_price + price1( e->eb.bm_rep[state] ); + const uint8_t prev_byte = Mb_peek( &e->eb.mb, 1 ); + const uint8_t cur_byte = Mb_peek( &e->eb.mb, 0 ); + const uint8_t match_byte = Mb_peek( &e->eb.mb, reps[0] + 1 ); + + e->trials[1].price = price0( e->eb.bm_match[state][pos_state] ); + if( St_is_char( state ) ) + e->trials[1].price += LZeb_price_literal( &e->eb, prev_byte, cur_byte ); + else + e->trials[1].price += LZeb_price_matched( &e->eb, prev_byte, cur_byte, match_byte ); + e->trials[1].dis4 = -1; /* literal */ + + if( match_byte == cur_byte ) + Tr_update( &e->trials[1], rep_match_price + + LZeb_price_shortrep( &e->eb, state, 
pos_state ), 0, 0 ); + + num_trials = max( main_len, replens[rep_index] ); + + if( num_trials < min_match_len ) + { + e->trials[0].price = 1; + e->trials[0].dis4 = e->trials[1].dis4; + if( !Mb_move_pos( &e->eb.mb ) ) return 0; + return 1; + } + + e->trials[0].state = state; + for( i = 0; i < num_rep_distances; ++i ) + e->trials[0].reps[i] = reps[i]; + + for( len = min_match_len; len <= num_trials; ++len ) + e->trials[len].price = infinite_price; + + for( rep = 0; rep < num_rep_distances; ++rep ) + { + if( replens[rep] < min_match_len ) continue; + const int price = rep_match_price + LZeb_price_rep( &e->eb, rep, state, pos_state ); + for( len = min_match_len; len <= replens[rep]; ++len ) + Tr_update( &e->trials[len], price + + Lp_price( &e->rep_len_prices, len, pos_state ), rep, 0 ); + } + + if( main_len > replens[0] ) + { + const int normal_match_price = match_price + price0( e->eb.bm_rep[state] ); + int i = 0, len = max( replens[0] + 1, min_match_len ); + while( len > e->pairs[i].len ) ++i; + while( true ) + { + const int dis = e->pairs[i].dis; + Tr_update( &e->trials[len], normal_match_price + + LZe_price_pair( e, dis, len, pos_state ), + dis + num_rep_distances, 0 ); + if( ++len > e->pairs[i].len && ++i >= num_pairs ) break; + } + } + + int cur = 0; + while( true ) /* price optimization loop */ + { + if( !Mb_move_pos( &e->eb.mb ) ) return 0; + if( ++cur >= num_trials ) /* no more initialized trials */ + { + LZe_backward( e, cur ); + return cur; + } + + const int num_pairs = LZe_read_match_distances( e ); + const int newlen = ( num_pairs > 0 ) ? 
e->pairs[num_pairs-1].len : 0; + if( newlen >= e->match_len_limit ) + { + e->pending_num_pairs = num_pairs; + LZe_backward( e, cur ); + return cur; + } + + /* give final values to current trial */ + struct Trial * cur_trial = &e->trials[cur]; + State cur_state; + { + const int dis4 = cur_trial->dis4; + int prev_index = cur_trial->prev_index; + const int prev_index2 = cur_trial->prev_index2; + + if( prev_index2 == single_step_trial ) + { + cur_state = e->trials[prev_index].state; + if( prev_index + 1 == cur ) /* len == 1 */ + { + if( dis4 == 0 ) cur_state = St_set_short_rep( cur_state ); + else cur_state = St_set_char( cur_state ); /* literal */ + } + else if( dis4 < num_rep_distances ) cur_state = St_set_rep( cur_state ); + else cur_state = St_set_match( cur_state ); + } + else + { + if( prev_index2 == dual_step_trial ) /* dis4 == 0 (rep0) */ + --prev_index; + else /* prev_index2 >= 0 */ + prev_index = prev_index2; + cur_state = St_set_char_rep(); + } + cur_trial->state = cur_state; + for( i = 0; i < num_rep_distances; ++i ) + cur_trial->reps[i] = e->trials[prev_index].reps[i]; + mtf_reps( dis4, cur_trial->reps ); /* literal is ignored */ + } + + const int pos_state = Mb_data_position( &e->eb.mb ) & pos_state_mask; + const uint8_t prev_byte = Mb_peek( &e->eb.mb, 1 ); + const uint8_t cur_byte = Mb_peek( &e->eb.mb, 0 ); + const uint8_t match_byte = Mb_peek( &e->eb.mb, cur_trial->reps[0] + 1 ); + + int next_price = cur_trial->price + + price0( e->eb.bm_match[cur_state][pos_state] ); + if( St_is_char( cur_state ) ) + next_price += LZeb_price_literal( &e->eb, prev_byte, cur_byte ); + else + next_price += LZeb_price_matched( &e->eb, prev_byte, cur_byte, match_byte ); + + /* try last updates to next trial */ + struct Trial * next_trial = &e->trials[cur+1]; + + Tr_update( next_trial, next_price, -1, cur ); /* literal */ + + const int match_price = cur_trial->price + price1( e->eb.bm_match[cur_state][pos_state] ); + const int rep_match_price = match_price + price1( 
e->eb.bm_rep[cur_state] ); + + if( match_byte == cur_byte && next_trial->dis4 != 0 && + next_trial->prev_index2 == single_step_trial ) + { + const int price = rep_match_price + + LZeb_price_shortrep( &e->eb, cur_state, pos_state ); + if( price <= next_trial->price ) + { + next_trial->price = price; + next_trial->dis4 = 0; /* rep0 */ + next_trial->prev_index = cur; + } + } + + const int triable_bytes = + min( Mb_available_bytes( &e->eb.mb ), max_num_trials - 1 - cur ); + if( triable_bytes < min_match_len ) continue; + + const int len_limit = min( e->match_len_limit, triable_bytes ); + + /* try literal + rep0 */ + if( match_byte != cur_byte && next_trial->prev_index != cur ) + { + const uint8_t * const data = Mb_ptr_to_current_pos( &e->eb.mb ); + const int dis = cur_trial->reps[0] + 1; + const int limit = min( e->match_len_limit + 1, triable_bytes ); + int len = 1; + while( len < limit && data[len-dis] == data[len] ) ++len; + if( --len >= min_match_len ) + { + const int pos_state2 = ( pos_state + 1 ) & pos_state_mask; + const State state2 = St_set_char( cur_state ); + const int price = next_price + + price1( e->eb.bm_match[state2][pos_state2] ) + + price1( e->eb.bm_rep[state2] ) + + LZe_price_rep0_len( e, len, state2, pos_state2 ); + while( num_trials < cur + 1 + len ) + e->trials[++num_trials].price = infinite_price; + Tr_update2( &e->trials[cur+1+len], price, cur + 1 ); + } + } + + int start_len = min_match_len; + + /* try rep distances */ + for( rep = 0; rep < num_rep_distances; ++rep ) + { + const uint8_t * const data = Mb_ptr_to_current_pos( &e->eb.mb ); + const int dis = cur_trial->reps[rep] + 1; + + if( data[0-dis] != data[0] || data[1-dis] != data[1] ) continue; + for( len = min_match_len; len < len_limit; ++len ) + if( data[len-dis] != data[len] ) break; + while( num_trials < cur + len ) + e->trials[++num_trials].price = infinite_price; + int price = rep_match_price + LZeb_price_rep( &e->eb, rep, cur_state, pos_state ); + for( i = min_match_len; i <= len; 
++i ) + Tr_update( &e->trials[cur+i], price + + Lp_price( &e->rep_len_prices, i, pos_state ), rep, cur ); + + if( rep == 0 ) start_len = len + 1; /* discard shorter matches */ + + /* try rep + literal + rep0 */ + int len2 = len + 1; + const int limit = min( e->match_len_limit + len2, triable_bytes ); + while( len2 < limit && data[len2-dis] == data[len2] ) ++len2; + len2 -= len + 1; + if( len2 < min_match_len ) continue; + + int pos_state2 = ( pos_state + len ) & pos_state_mask; + State state2 = St_set_rep( cur_state ); + price += Lp_price( &e->rep_len_prices, len, pos_state ) + + price0( e->eb.bm_match[state2][pos_state2] ) + + LZeb_price_matched( &e->eb, data[len-1], data[len], data[len-dis] ); + pos_state2 = ( pos_state2 + 1 ) & pos_state_mask; + state2 = St_set_char( state2 ); + price += price1( e->eb.bm_match[state2][pos_state2] ) + + price1( e->eb.bm_rep[state2] ) + + LZe_price_rep0_len( e, len2, state2, pos_state2 ); + while( num_trials < cur + len + 1 + len2 ) + e->trials[++num_trials].price = infinite_price; + Tr_update3( &e->trials[cur+len+1+len2], price, rep, cur + len + 1, cur ); + } + + /* try matches */ + if( newlen >= start_len && newlen <= len_limit ) + { + const int normal_match_price = match_price + + price0( e->eb.bm_rep[cur_state] ); + + while( num_trials < cur + newlen ) + e->trials[++num_trials].price = infinite_price; + + int i = 0; + while( e->pairs[i].len < start_len ) ++i; + int dis = e->pairs[i].dis; + for( len = start_len; ; ++len ) + { + int price = normal_match_price + LZe_price_pair( e, dis, len, pos_state ); + Tr_update( &e->trials[cur+len], price, dis + num_rep_distances, cur ); + + /* try match + literal + rep0 */ + if( len == e->pairs[i].len ) + { + const uint8_t * const data = Mb_ptr_to_current_pos( &e->eb.mb ); + const int dis2 = dis + 1; + int len2 = len + 1; + const int limit = min( e->match_len_limit + len2, triable_bytes ); + while( len2 < limit && data[len2-dis2] == data[len2] ) ++len2; + len2 -= len + 1; + if( len2 >= 
min_match_len )
+            {
+            int pos_state2 = ( pos_state + len ) & pos_state_mask;
+            State state2 = St_set_match( cur_state );
+            price += price0( e->eb.bm_match[state2][pos_state2] ) +
+                     LZeb_price_matched( &e->eb, data[len-1], data[len], data[len-dis2] );
+            pos_state2 = ( pos_state2 + 1 ) & pos_state_mask;
+            state2 = St_set_char( state2 );
+            price += price1( e->eb.bm_match[state2][pos_state2] ) +
+                     price1( e->eb.bm_rep[state2] ) +
+                     LZe_price_rep0_len( e, len2, state2, pos_state2 );
+
+            while( num_trials < cur + len + 1 + len2 )
+              e->trials[++num_trials].price = infinite_price;
+            Tr_update3( &e->trials[cur+len+1+len2], price,
+                        dis + num_rep_distances, cur + len + 1, cur );
+            }
+          if( ++i >= num_pairs ) break;
+          dis = e->pairs[i].dis;
+          }
+        }
+      }
+    }
+  }
+
+
+static bool LZe_encode_member( struct LZ_encoder * const e )
+  {
+  const bool best = ( e->match_len_limit > 12 );
+  const int dis_price_count = best ? 1 : 512;
+  const int align_price_count = best ? 1 : dis_align_size;
+  const int price_count = ( e->match_len_limit > 36 ) ?
1013 : 4093;
+  int i;
+  State * const state = &e->eb.state;
+
+  if( e->eb.member_finished ) return true;
+  if( Re_member_position( &e->eb.renc ) >= e->eb.member_size_limit )
+    { LZeb_try_full_flush( &e->eb ); return true; }
+
+  if( Mb_data_position( &e->eb.mb ) == 0 &&
+      !Mb_data_finished( &e->eb.mb ) )       /* encode first byte */
+    {
+    if( !Mb_enough_available_bytes( &e->eb.mb ) ||
+        !Re_enough_free_bytes( &e->eb.renc ) ) return true;
+    const uint8_t prev_byte = 0;
+    const uint8_t cur_byte = Mb_peek( &e->eb.mb, 0 );
+    Re_encode_bit( &e->eb.renc, &e->eb.bm_match[*state][0], 0 );
+    LZeb_encode_literal( &e->eb, prev_byte, cur_byte );
+    CRC32_update_byte( &e->eb.crc, cur_byte );
+    LZe_get_match_pairs( e, 0 );
+    if( !Mb_move_pos( &e->eb.mb ) ) return false;
+    }
+
+  while( !Mb_data_finished( &e->eb.mb ) )
+    {
+    if( !Mb_enough_available_bytes( &e->eb.mb ) ||
+        !Re_enough_free_bytes( &e->eb.renc ) ) return true;
+    if( e->price_counter <= 0 && e->pending_num_pairs == 0 )
+      {
+      e->price_counter = price_count;  /* recalculate prices every these bytes */
+      if( e->dis_price_counter <= 0 )
+        { e->dis_price_counter = dis_price_count; LZe_update_distance_prices( e ); }
+      if( e->align_price_counter <= 0 )
+        {
+        e->align_price_counter = align_price_count;
+        for( i = 0; i < dis_align_size; ++i )
+          e->align_prices[i] = price_symbol_reversed( e->eb.bm_align, i, dis_align_bits );
+        }
+      Lp_update_prices( &e->match_len_prices );
+      Lp_update_prices( &e->rep_len_prices );
+      }
+
+    int ahead = LZe_sequence_optimizer( e, e->eb.reps, *state );
+    e->price_counter -= ahead;
+
+    for( i = 0; ahead > 0; )
+      {
+      const int pos_state =
+        ( Mb_data_position( &e->eb.mb ) - ahead ) & pos_state_mask;
+      const int len = e->trials[i].price;
+      int dis = e->trials[i].dis4;
+
+      bool bit = ( dis < 0 );
+      Re_encode_bit( &e->eb.renc, &e->eb.bm_match[*state][pos_state], !bit );
+      if( bit )                              /* literal byte */
+        {
+        const uint8_t prev_byte = Mb_peek( &e->eb.mb, ahead + 1 );
+        const uint8_t cur_byte = Mb_peek( &e->eb.mb,
ahead );
+        CRC32_update_byte( &e->eb.crc, cur_byte );
+        if( ( *state = St_set_char( *state ) ) < 4 )
+          LZeb_encode_literal( &e->eb, prev_byte, cur_byte );
+        else
+          {
+          const uint8_t match_byte = Mb_peek( &e->eb.mb, ahead + e->eb.reps[0] + 1 );
+          LZeb_encode_matched( &e->eb, prev_byte, cur_byte, match_byte );
+          }
+        }
+      else                                   /* match or repeated match */
+        {
+        CRC32_update_buf( &e->eb.crc, Mb_ptr_to_current_pos( &e->eb.mb ) - ahead, len );
+        mtf_reps( dis, e->eb.reps );
+        bit = ( dis < num_rep_distances );
+        Re_encode_bit( &e->eb.renc, &e->eb.bm_rep[*state], bit );
+        if( bit )                            /* repeated match */
+          {
+          bit = ( dis == 0 );
+          Re_encode_bit( &e->eb.renc, &e->eb.bm_rep0[*state], !bit );
+          if( bit )
+            Re_encode_bit( &e->eb.renc, &e->eb.bm_len[*state][pos_state], len > 1 );
+          else
+            {
+            Re_encode_bit( &e->eb.renc, &e->eb.bm_rep1[*state], dis > 1 );
+            if( dis > 1 )
+              Re_encode_bit( &e->eb.renc, &e->eb.bm_rep2[*state], dis > 2 );
+            }
+          if( len == 1 ) *state = St_set_short_rep( *state );
+          else
+            {
+            Re_encode_len( &e->eb.renc, &e->eb.rep_len_model, len, pos_state );
+            Lp_decrement_counter( &e->rep_len_prices, pos_state );
+            *state = St_set_rep( *state );
+            }
+          }
+        else                                 /* match */
+          {
+          dis -= num_rep_distances;
+          LZeb_encode_pair( &e->eb, dis, len, pos_state );
+          if( dis >= modeled_distances ) --e->align_price_counter;
+          --e->dis_price_counter;
+          Lp_decrement_counter( &e->match_len_prices, pos_state );
+          *state = St_set_match( *state );
+          }
+        }
+      ahead -= len; i += len;
+      if( Re_member_position( &e->eb.renc ) >= e->eb.member_size_limit )
+        {
+        if( !Mb_dec_pos( &e->eb.mb, ahead ) ) return false;
+        LZeb_try_full_flush( &e->eb );
+        return true;
+        }
+      }
+    }
+  LZeb_try_full_flush( &e->eb );
+  return true;
+  }
diff --git a/encoder.h b/encoder.h
new file mode 100644
index 0000000..f17bb99
--- /dev/null
+++ b/encoder.h
@@ -0,0 +1,326 @@
+/* Lzlib - Compression library for the lzip format
+   Copyright (C) 2009-2022 Antonio Diaz Diaz.
+
+   This library is free software.
Redistribution and use in source and
+   binary forms, with or without modification, are permitted provided
+   that the following conditions are met:
+
+   1. Redistributions of source code must retain the above copyright
+   notice, this list of conditions, and the following disclaimer.
+
+   2. Redistributions in binary form must reproduce the above copyright
+   notice, this list of conditions, and the following disclaimer in the
+   documentation and/or other materials provided with the distribution.
+
+   This library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+*/
+
+struct Len_prices
+  {
+  const struct Len_model * lm;
+  int len_symbols;
+  int count;
+  int prices[pos_states][max_len_symbols];
+  int counters[pos_states];            /* may decrement below 0 */
+  };
+
+static inline void Lp_update_low_mid_prices( struct Len_prices * const lp,
+                                             const int pos_state )
+  {
+  int * const pps = lp->prices[pos_state];
+  int tmp = price0( lp->lm->choice1 );
+  int len = 0;
+  for( ; len < len_low_symbols && len < lp->len_symbols; ++len )
+    pps[len] = tmp + price_symbol3( lp->lm->bm_low[pos_state], len );
+  if( len >= lp->len_symbols ) return;
+  tmp = price1( lp->lm->choice1 ) + price0( lp->lm->choice2 );
+  for( ; len < len_low_symbols + len_mid_symbols && len < lp->len_symbols; ++len )
+    pps[len] = tmp +
+               price_symbol3( lp->lm->bm_mid[pos_state], len - len_low_symbols );
+  }
+
+static inline void Lp_update_high_prices( struct Len_prices * const lp )
+  {
+  const int tmp = price1( lp->lm->choice1 ) + price1( lp->lm->choice2 );
+  int len;
+  for( len = len_low_symbols + len_mid_symbols; len < lp->len_symbols; ++len )
+    /* using 4 slots per value makes "Lp_price" faster */
+    lp->prices[3][len] = lp->prices[2][len] =
+    lp->prices[1][len] = lp->prices[0][len] = tmp +
+      price_symbol8( lp->lm->bm_high, len - len_low_symbols - len_mid_symbols );
+  }
+
+static inline void Lp_reset(
struct Len_prices * const lp )
+  { int i; for( i = 0; i < pos_states; ++i ) lp->counters[i] = 0; }
+
+static inline void Lp_init( struct Len_prices * const lp,
+                            const struct Len_model * const lm,
+                            const int match_len_limit )
+  {
+  lp->lm = lm;
+  lp->len_symbols = match_len_limit + 1 - min_match_len;
+  lp->count = ( match_len_limit > 12 ) ? 1 : lp->len_symbols;
+  Lp_reset( lp );
+  }
+
+static inline void Lp_decrement_counter( struct Len_prices * const lp,
+                                         const int pos_state )
+  { --lp->counters[pos_state]; }
+
+static inline void Lp_update_prices( struct Len_prices * const lp )
+  {
+  int pos_state;
+  bool high_pending = false;
+  for( pos_state = 0; pos_state < pos_states; ++pos_state )
+    if( lp->counters[pos_state] <= 0 )
+      { lp->counters[pos_state] = lp->count;
+        Lp_update_low_mid_prices( lp, pos_state ); high_pending = true; }
+  if( high_pending && lp->len_symbols > len_low_symbols + len_mid_symbols )
+    Lp_update_high_prices( lp );
+  }
+
+static inline int Lp_price( const struct Len_prices * const lp,
+                            const int len, const int pos_state )
+  { return lp->prices[pos_state][len - min_match_len]; }
+
+
+struct Pair                    /* distance-length pair */
+  {
+  int dis;
+  int len;
+  };
+
+enum { infinite_price = 0x0FFFFFFF,
+       max_num_trials = 1 << 13,
+       single_step_trial = -2,
+       dual_step_trial = -1 };
+
+struct Trial
+  {
+  State state;
+  int price;           /* dual use var; cumulative price, match length */
+  int dis4;            /* -1 for literal, or rep, or match distance + 4 */
+  int prev_index;      /* index of prev trial in trials[] */
+  int prev_index2;     /* -2 trial is single step */
+                       /* -1 literal + rep0 */
+                       /* >= 0 ( rep or match ) + literal + rep0 */
+  int reps[num_rep_distances];
+  };
+
+static inline void Tr_update( struct Trial * const trial, const int pr,
+                              const int distance4, const int p_i )
+  {
+  if( pr < trial->price )
+    { trial->price = pr; trial->dis4 = distance4; trial->prev_index = p_i;
+      trial->prev_index2 = single_step_trial; }
+  }
+
+static inline void Tr_update2( struct
Trial * const trial, const int pr,
+                               const int p_i )
+  {
+  if( pr < trial->price )
+    { trial->price = pr; trial->dis4 = 0; trial->prev_index = p_i;
+      trial->prev_index2 = dual_step_trial; }
+  }
+
+static inline void Tr_update3( struct Trial * const trial, const int pr,
+                               const int distance4, const int p_i,
+                               const int p_i2 )
+  {
+  if( pr < trial->price )
+    { trial->price = pr; trial->dis4 = distance4; trial->prev_index = p_i;
+      trial->prev_index2 = p_i2; }
+  }
+
+
+struct LZ_encoder
+  {
+  struct LZ_encoder_base eb;
+  int cycles;
+  int match_len_limit;
+  struct Len_prices match_len_prices;
+  struct Len_prices rep_len_prices;
+  int pending_num_pairs;
+  struct Pair pairs[max_match_len+1];
+  struct Trial trials[max_num_trials];
+
+  int dis_slot_prices[len_states][2*max_dictionary_bits];
+  int dis_prices[len_states][modeled_distances];
+  int align_prices[dis_align_size];
+  int num_dis_slots;
+  int price_counter;           /* counters may decrement below 0 */
+  int dis_price_counter;
+  int align_price_counter;
+  bool been_flushed;
+  };
+
+static inline bool Mb_dec_pos( struct Matchfinder_base * const mb,
+                               const int ahead )
+  {
+  if( ahead < 0 || mb->pos < ahead ) return false;
+  mb->pos -= ahead;
+  if( mb->cyclic_pos < ahead ) mb->cyclic_pos += mb->dictionary_size + 1;
+  mb->cyclic_pos -= ahead;
+  return true;
+  }
+
+static int LZe_get_match_pairs( struct LZ_encoder * const e, struct Pair * pairs );
+
+    /* move-to-front dis in/into reps; do nothing if( dis4 <= 0 ) */
+static inline void mtf_reps( const int dis4, int reps[num_rep_distances] )
+  {
+  if( dis4 >= num_rep_distances )              /* match */
+    {
+    reps[3] = reps[2]; reps[2] = reps[1]; reps[1] = reps[0];
+    reps[0] = dis4 - num_rep_distances;
+    }
+  else if( dis4 > 0 )                          /* repeated match */
+    {
+    const int distance = reps[dis4];
+    int i; for( i = dis4; i > 0; --i ) reps[i] = reps[i-1];
+    reps[0] = distance;
+    }
+  }
+
+static inline int LZeb_price_shortrep( const struct LZ_encoder_base * const eb,
+                                       const State state, const int
pos_state )
+  {
+  return price0( eb->bm_rep0[state] ) + price0( eb->bm_len[state][pos_state] );
+  }
+
+static inline int LZeb_price_rep( const struct LZ_encoder_base * const eb,
+                                  const int rep, const State state,
+                                  const int pos_state )
+  {
+  if( rep == 0 ) return price0( eb->bm_rep0[state] ) +
+                        price1( eb->bm_len[state][pos_state] );
+  int price = price1( eb->bm_rep0[state] );
+  if( rep == 1 )
+    price += price0( eb->bm_rep1[state] );
+  else
+    {
+    price += price1( eb->bm_rep1[state] );
+    price += price_bit( eb->bm_rep2[state], rep - 2 );
+    }
+  return price;
+  }
+
+static inline int LZe_price_rep0_len( const struct LZ_encoder * const e,
+                                      const int len, const State state,
+                                      const int pos_state )
+  {
+  return LZeb_price_rep( &e->eb, 0, state, pos_state ) +
+         Lp_price( &e->rep_len_prices, len, pos_state );
+  }
+
+static inline int LZe_price_pair( const struct LZ_encoder * const e,
+                                  const int dis, const int len,
+                                  const int pos_state )
+  {
+  const int price = Lp_price( &e->match_len_prices, len, pos_state );
+  const int len_state = get_len_state( len );
+  if( dis < modeled_distances )
+    return price + e->dis_prices[len_state][dis];
+  else
+    return price + e->dis_slot_prices[len_state][get_slot( dis )] +
+           e->align_prices[dis & (dis_align_size - 1)];
+  }
+
+static inline int LZe_read_match_distances( struct LZ_encoder * const e )
+  {
+  const int num_pairs = LZe_get_match_pairs( e, e->pairs );
+  if( num_pairs > 0 )
+    {
+    const int len = e->pairs[num_pairs-1].len;
+    if( len == e->match_len_limit && len < max_match_len )
+      e->pairs[num_pairs-1].len =
+        Mb_true_match_len( &e->eb.mb, len, e->pairs[num_pairs-1].dis + 1 );
+    }
+  return num_pairs;
+  }
+
+static inline bool LZe_move_and_update( struct LZ_encoder * const e, int n )
+  {
+  while( true )
+    {
+    if( !Mb_move_pos( &e->eb.mb ) ) return false;
+    if( --n <= 0 ) break;
+    LZe_get_match_pairs( e, 0 );
+    }
+  return true;
+  }
+
+static inline void LZe_backward( struct LZ_encoder * const e, int cur )
+  {
+  int dis4 =
e->trials[cur].dis4;
+  while( cur > 0 )
+    {
+    const int prev_index = e->trials[cur].prev_index;
+    struct Trial * const prev_trial = &e->trials[prev_index];
+
+    if( e->trials[cur].prev_index2 != single_step_trial )
+      {
+      prev_trial->dis4 = -1;                       /* literal */
+      prev_trial->prev_index = prev_index - 1;
+      prev_trial->prev_index2 = single_step_trial;
+      if( e->trials[cur].prev_index2 >= 0 )
+        {
+        struct Trial * const prev_trial2 = &e->trials[prev_index-1];
+        prev_trial2->dis4 = dis4; dis4 = 0;        /* rep0 */
+        prev_trial2->prev_index = e->trials[cur].prev_index2;
+        prev_trial2->prev_index2 = single_step_trial;
+        }
+      }
+    prev_trial->price = cur - prev_index;          /* len */
+    cur = dis4; dis4 = prev_trial->dis4; prev_trial->dis4 = cur;
+    cur = prev_index;
+    }
+  }
+
+enum { num_prev_positions3 = 1 << 16,
+       num_prev_positions2 = 1 << 10 };
+
+static inline bool LZe_init( struct LZ_encoder * const e,
+                             const int dict_size, const int len_limit,
+                             const unsigned long long member_size )
+  {
+  enum { before_size = max_num_trials,
+         /* bytes to keep in buffer after pos */
+         after_size = max_num_trials + ( 2 * max_match_len ) + 1,
+         dict_factor = 2,
+         num_prev_positions23 = num_prev_positions2 + num_prev_positions3,
+         pos_array_factor = 2,
+         min_free_bytes = 2 * max_num_trials };
+
+  if( !LZeb_init( &e->eb, before_size, dict_size, after_size, dict_factor,
+                  num_prev_positions23, pos_array_factor, min_free_bytes,
+                  member_size ) ) return false;
+  e->cycles = ( len_limit < max_match_len ) ?
16 + ( len_limit / 2 ) : 256;
+  e->match_len_limit = len_limit;
+  Lp_init( &e->match_len_prices, &e->eb.match_len_model, e->match_len_limit );
+  Lp_init( &e->rep_len_prices, &e->eb.rep_len_model, e->match_len_limit );
+  e->pending_num_pairs = 0;
+  e->num_dis_slots = 2 * real_bits( e->eb.mb.dictionary_size - 1 );
+  e->trials[1].prev_index = 0;
+  e->trials[1].prev_index2 = single_step_trial;
+  e->price_counter = 0;
+  e->dis_price_counter = 0;
+  e->align_price_counter = 0;
+  e->been_flushed = false;
+  return true;
+  }
+
+static inline void LZe_reset( struct LZ_encoder * const e,
+                              const unsigned long long member_size )
+  {
+  LZeb_reset( &e->eb, member_size );
+  Lp_reset( &e->match_len_prices );
+  Lp_reset( &e->rep_len_prices );
+  e->pending_num_pairs = 0;
+  e->price_counter = 0;
+  e->dis_price_counter = 0;
+  e->align_price_counter = 0;
+  e->been_flushed = false;
+  }
diff --git a/encoder_base.c b/encoder_base.c
new file mode 100644
index 0000000..4535352
--- /dev/null
+++ b/encoder_base.c
@@ -0,0 +1,196 @@
+/* Lzlib - Compression library for the lzip format
+   Copyright (C) 2009-2022 Antonio Diaz Diaz.
+
+   This library is free software. Redistribution and use in source and
+   binary forms, with or without modification, are permitted provided
+   that the following conditions are met:
+
+   1. Redistributions of source code must retain the above copyright
+   notice, this list of conditions, and the following disclaimer.
+
+   2. Redistributions in binary form must reproduce the above copyright
+   notice, this list of conditions, and the following disclaimer in the
+   documentation and/or other materials provided with the distribution.
+
+   This library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+*/
+
+static bool Mb_normalize_pos( struct Matchfinder_base * const mb )
+  {
+  if( mb->pos > mb->stream_pos )
+    { mb->pos = mb->stream_pos; return false; }
+  if( !mb->at_stream_end )
+    {
+    int i;
+    /* offset is int32_t for the min below */
+    const int32_t offset = mb->pos - mb->before_size - mb->dictionary_size;
+    const int size = mb->stream_pos - offset;
+    memmove( mb->buffer, mb->buffer + offset, size );
+    mb->partial_data_pos += offset;
+    mb->pos -= offset;         /* pos = before_size + dictionary_size */
+    mb->stream_pos -= offset;
+    for( i = 0; i < mb->num_prev_positions; ++i )
+      mb->prev_positions[i] -= min( mb->prev_positions[i], offset );
+    for( i = 0; i < mb->pos_array_size; ++i )
+      mb->pos_array[i] -= min( mb->pos_array[i], offset );
+    }
+  return true;
+  }
+
+
+static bool Mb_init( struct Matchfinder_base * const mb, const int before_size,
+                     const int dict_size, const int after_size,
+                     const int dict_factor, const int num_prev_positions23,
+                     const int pos_array_factor )
+  {
+  const int buffer_size_limit =
+    ( dict_factor * dict_size ) + before_size + after_size;
+  int i;
+
+  mb->partial_data_pos = 0;
+  mb->before_size = before_size;
+  mb->after_size = after_size;
+  mb->pos = 0;
+  mb->cyclic_pos = 0;
+  mb->stream_pos = 0;
+  mb->num_prev_positions23 = num_prev_positions23;
+  mb->at_stream_end = false;
+  mb->sync_flush_pending = false;
+
+  mb->buffer_size = max( 65536, buffer_size_limit );
+  mb->buffer = (uint8_t *)malloc( mb->buffer_size );
+  if( !mb->buffer ) return false;
+  mb->saved_dictionary_size = dict_size;
+  mb->dictionary_size = dict_size;
+  mb->pos_limit = mb->buffer_size - after_size;
+  unsigned size = 1 << max( 16, real_bits( mb->dictionary_size - 1 ) - 2 );
+  if( mb->dictionary_size > 1 << 26 ) size >>= 1;      /* 64 MiB */
+  mb->key4_mask = size - 1;            /* increases with dictionary size */
+  size += num_prev_positions23;
+  mb->num_prev_positions = size;
+
+  mb->pos_array_size = pos_array_factor * ( mb->dictionary_size + 1 );
+  size += mb->pos_array_size;
+
if( size * sizeof mb->prev_positions[0] <= size ) mb->prev_positions = 0;
+  else mb->prev_positions =
+         (int32_t *)malloc( size * sizeof mb->prev_positions[0] );
+  if( !mb->prev_positions ) { free( mb->buffer ); return false; }
+  mb->pos_array = mb->prev_positions + mb->num_prev_positions;
+  for( i = 0; i < mb->num_prev_positions; ++i ) mb->prev_positions[i] = 0;
+  return true;
+  }
+
+
+static void Mb_adjust_array( struct Matchfinder_base * const mb )
+  {
+  int size = 1 << max( 16, real_bits( mb->dictionary_size - 1 ) - 2 );
+  if( mb->dictionary_size > 1 << 26 ) size >>= 1;      /* 64 MiB */
+  mb->key4_mask = size - 1;
+  size += mb->num_prev_positions23;
+  mb->num_prev_positions = size;
+  mb->pos_array = mb->prev_positions + mb->num_prev_positions;
+  }
+
+
+static void Mb_adjust_dictionary_size( struct Matchfinder_base * const mb )
+  {
+  if( mb->stream_pos < mb->dictionary_size )
+    {
+    mb->dictionary_size = max( min_dictionary_size, mb->stream_pos );
+    Mb_adjust_array( mb );
+    mb->pos_limit = mb->buffer_size;
+    }
+  }
+
+
+static void Mb_reset( struct Matchfinder_base * const mb )
+  {
+  int i;
+  if( mb->stream_pos > mb->pos )
+    memmove( mb->buffer, mb->buffer + mb->pos, mb->stream_pos - mb->pos );
+  mb->partial_data_pos = 0;
+  mb->stream_pos -= mb->pos;
+  mb->pos = 0;
+  mb->cyclic_pos = 0;
+  mb->at_stream_end = false;
+  mb->sync_flush_pending = false;
+  mb->dictionary_size = mb->saved_dictionary_size;
+  Mb_adjust_array( mb );
+  mb->pos_limit = mb->buffer_size - mb->after_size;
+  for( i = 0; i < mb->num_prev_positions; ++i ) mb->prev_positions[i] = 0;
+  }
+
+
+/* End Of Stream marker => (dis == 0xFFFFFFFFU, len == min_match_len) */
+static void LZeb_try_full_flush( struct LZ_encoder_base * const eb )
+  {
+  if( eb->member_finished ||
+      Cb_free_bytes( &eb->renc.cb ) < max_marker_size + eb->renc.ff_count + Lt_size )
+    return;
+  eb->member_finished = true;
+  const int pos_state = Mb_data_position( &eb->mb ) & pos_state_mask;
+  const State state = eb->state;
+  Re_encode_bit(
&eb->renc, &eb->bm_match[state][pos_state], 1 );
+  Re_encode_bit( &eb->renc, &eb->bm_rep[state], 0 );
+  LZeb_encode_pair( eb, 0xFFFFFFFFU, min_match_len, pos_state );
+  Re_flush( &eb->renc );
+  Lzip_trailer trailer;
+  Lt_set_data_crc( trailer, LZeb_crc( eb ) );
+  Lt_set_data_size( trailer, Mb_data_position( &eb->mb ) );
+  Lt_set_member_size( trailer, Re_member_position( &eb->renc ) + Lt_size );
+  int i;
+  for( i = 0; i < Lt_size; ++i )
+    Cb_put_byte( &eb->renc.cb, trailer[i] );
+  }
+
+
+/* Sync Flush marker => (dis == 0xFFFFFFFFU, len == min_match_len + 1) */
+static void LZeb_try_sync_flush( struct LZ_encoder_base * const eb )
+  {
+  const unsigned min_size = eb->renc.ff_count + max_marker_size;
+  if( eb->member_finished ||
+      Cb_free_bytes( &eb->renc.cb ) < min_size + max_marker_size ) return;
+  eb->mb.sync_flush_pending = false;
+  const unsigned long long old_mpos = Re_member_position( &eb->renc );
+  const int pos_state = Mb_data_position( &eb->mb ) & pos_state_mask;
+  const State state = eb->state;
+  do { /* size of markers must be >= rd_min_available_bytes + 5 */
+    Re_encode_bit( &eb->renc, &eb->bm_match[state][pos_state], 1 );
+    Re_encode_bit( &eb->renc, &eb->bm_rep[state], 0 );
+    LZeb_encode_pair( eb, 0xFFFFFFFFU, min_match_len + 1, pos_state );
+    Re_flush( &eb->renc );
+    }
+  while( Re_member_position( &eb->renc ) - old_mpos < min_size );
+  }
+
+
+static void LZeb_reset( struct LZ_encoder_base * const eb,
+                        const unsigned long long member_size )
+  {
+  const unsigned long long min_member_size = min_dictionary_size;
+  const unsigned long long max_member_size = 0x0008000000000000ULL; /* 2 PiB */
+  int i;
+  Mb_reset( &eb->mb );
+  eb->member_size_limit =
+    min( max( min_member_size, member_size ), max_member_size ) -
+    Lt_size - max_marker_size;
+  eb->crc = 0xFFFFFFFFU;
+  Bm_array_init( eb->bm_literal[0], (1 << literal_context_bits) * 0x300 );
+  Bm_array_init( eb->bm_match[0], states * pos_states );
+  Bm_array_init( eb->bm_rep, states );
+  Bm_array_init(
eb->bm_rep0, states );
+  Bm_array_init( eb->bm_rep1, states );
+  Bm_array_init( eb->bm_rep2, states );
+  Bm_array_init( eb->bm_len[0], states * pos_states );
+  Bm_array_init( eb->bm_dis_slot[0], len_states * (1 << dis_slot_bits) );
+  Bm_array_init( eb->bm_dis, modeled_distances - end_dis_model + 1 );
+  Bm_array_init( eb->bm_align, dis_align_size );
+  Lm_init( &eb->match_len_model );
+  Lm_init( &eb->rep_len_model );
+  Re_reset( &eb->renc, eb->mb.dictionary_size );
+  for( i = 0; i < num_rep_distances; ++i ) eb->reps[i] = 0;
+  eb->state = 0;
+  eb->member_finished = false;
+  }
diff --git a/encoder_base.h b/encoder_base.h
new file mode 100644
index 0000000..17ffc93
--- /dev/null
+++ b/encoder_base.h
@@ -0,0 +1,612 @@
+/* Lzlib - Compression library for the lzip format
+   Copyright (C) 2009-2022 Antonio Diaz Diaz.
+
+   This library is free software. Redistribution and use in source and
+   binary forms, with or without modification, are permitted provided
+   that the following conditions are met:
+
+   1. Redistributions of source code must retain the above copyright
+   notice, this list of conditions, and the following disclaimer.
+
+   2. Redistributions in binary form must reproduce the above copyright
+   notice, this list of conditions, and the following disclaimer in the
+   documentation and/or other materials provided with the distribution.
+
+   This library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+*/ + +enum { price_shift_bits = 6, + price_step_bits = 2 }; + +static const uint8_t dis_slots[1<<10] = + { + 0, 1, 2, 3, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, + 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, + 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, + 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, + 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, + 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, + 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, + 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, + 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, + 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, + 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, + 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, + 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, + 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, + 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, + 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, + 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, + 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, + 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, + 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, + 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, + 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, + 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, + 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, + 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, + 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, + 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, + 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, + 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, + 17, 17, 
17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, + 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, + 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, + 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, + 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, + 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, + 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, + 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, + 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, + 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, + 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, + 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, + 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, + 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, + 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, + 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, + 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, + 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, + 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, + 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, + 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, + 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, + 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, + 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, + 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, + 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, + 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, + 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, + 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, + 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, + 19, 19, 19, 19, 19, 19, 19, 
19, 19, 19, 19, 19, 19, 19, 19, 19, + 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, + 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, + 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, + 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19 }; + +static inline uint8_t get_slot( const unsigned dis ) + { + if( dis < (1 << 10) ) return dis_slots[dis]; + if( dis < (1 << 19) ) return dis_slots[dis>> 9] + 18; + if( dis < (1 << 28) ) return dis_slots[dis>>18] + 36; + return dis_slots[dis>>27] + 54; + } + + +static const short prob_prices[bit_model_total >> price_step_bits] = +{ +640, 539, 492, 461, 438, 419, 404, 390, 379, 369, 359, 351, 343, 336, 330, 323, +318, 312, 307, 302, 298, 293, 289, 285, 281, 277, 274, 270, 267, 264, 261, 258, +255, 252, 250, 247, 244, 242, 239, 237, 235, 232, 230, 228, 226, 224, 222, 220, +218, 216, 214, 213, 211, 209, 207, 206, 204, 202, 201, 199, 198, 196, 195, 193, +192, 190, 189, 188, 186, 185, 184, 182, 181, 180, 178, 177, 176, 175, 174, 172, +171, 170, 169, 168, 167, 166, 165, 164, 163, 162, 161, 159, 158, 157, 157, 156, +155, 154, 153, 152, 151, 150, 149, 148, 147, 146, 145, 145, 144, 143, 142, 141, +140, 140, 139, 138, 137, 136, 136, 135, 134, 133, 133, 132, 131, 130, 130, 129, +128, 127, 127, 126, 125, 125, 124, 123, 123, 122, 121, 121, 120, 119, 119, 118, +117, 117, 116, 115, 115, 114, 114, 113, 112, 112, 111, 111, 110, 109, 109, 108, +108, 107, 106, 106, 105, 105, 104, 104, 103, 103, 102, 101, 101, 100, 100, 99, + 99, 98, 98, 97, 97, 96, 96, 95, 95, 94, 94, 93, 93, 92, 92, 91, + 91, 90, 90, 89, 89, 88, 88, 88, 87, 87, 86, 86, 85, 85, 84, 84, + 83, 83, 83, 82, 82, 81, 81, 80, 80, 80, 79, 79, 78, 78, 77, 77, + 77, 76, 76, 75, 75, 75, 74, 74, 73, 73, 73, 72, 72, 71, 71, 71, + 70, 70, 70, 69, 69, 68, 68, 68, 67, 67, 67, 66, 66, 65, 65, 65, + 64, 64, 64, 63, 63, 63, 62, 62, 61, 61, 61, 60, 60, 60, 59, 59, + 59, 58, 58, 58, 57, 57, 57, 56, 56, 56, 55, 55, 55, 54, 54, 54, + 53, 53, 
53, 53, 52, 52, 52, 51, 51, 51, 50, 50, 50, 49, 49, 49, + 48, 48, 48, 48, 47, 47, 47, 46, 46, 46, 45, 45, 45, 45, 44, 44, + 44, 43, 43, 43, 43, 42, 42, 42, 41, 41, 41, 41, 40, 40, 40, 40, + 39, 39, 39, 38, 38, 38, 38, 37, 37, 37, 37, 36, 36, 36, 35, 35, + 35, 35, 34, 34, 34, 34, 33, 33, 33, 33, 32, 32, 32, 32, 31, 31, + 31, 31, 30, 30, 30, 30, 29, 29, 29, 29, 28, 28, 28, 28, 27, 27, + 27, 27, 26, 26, 26, 26, 26, 25, 25, 25, 25, 24, 24, 24, 24, 23, + 23, 23, 23, 22, 22, 22, 22, 22, 21, 21, 21, 21, 20, 20, 20, 20, + 20, 19, 19, 19, 19, 18, 18, 18, 18, 18, 17, 17, 17, 17, 17, 16, + 16, 16, 16, 15, 15, 15, 15, 15, 14, 14, 14, 14, 14, 13, 13, 13, + 13, 13, 12, 12, 12, 12, 12, 11, 11, 11, 11, 10, 10, 10, 10, 10, + 9, 9, 9, 9, 9, 9, 8, 8, 8, 8, 8, 7, 7, 7, 7, 7, + 6, 6, 6, 6, 6, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, + 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1 }; + +static inline int get_price( const int probability ) + { return prob_prices[probability >> price_step_bits]; } + + +static inline int price0( const Bit_model probability ) + { return get_price( probability ); } + +static inline int price1( const Bit_model probability ) + { return get_price( bit_model_total - probability ); } + +static inline int price_bit( const Bit_model bm, const bool bit ) + { return ( bit ? 
price1( bm ) : price0( bm ) ); }
+
+
+static inline int price_symbol3( const Bit_model bm[], int symbol )
+  {
+  bool bit = symbol & 1;
+  symbol |= 8; symbol >>= 1;
+  int price = price_bit( bm[symbol], bit );
+  bit = symbol & 1; symbol >>= 1; price += price_bit( bm[symbol], bit );
+  return price + price_bit( bm[1], symbol & 1 );
+  }
+
+
+static inline int price_symbol6( const Bit_model bm[], unsigned symbol )
+  {
+  bool bit = symbol & 1;
+  symbol |= 64; symbol >>= 1;
+  int price = price_bit( bm[symbol], bit );
+  bit = symbol & 1; symbol >>= 1; price += price_bit( bm[symbol], bit );
+  bit = symbol & 1; symbol >>= 1; price += price_bit( bm[symbol], bit );
+  bit = symbol & 1; symbol >>= 1; price += price_bit( bm[symbol], bit );
+  bit = symbol & 1; symbol >>= 1; price += price_bit( bm[symbol], bit );
+  return price + price_bit( bm[1], symbol & 1 );
+  }
+
+
+static inline int price_symbol8( const Bit_model bm[], int symbol )
+  {
+  bool bit = symbol & 1;
+  symbol |= 0x100; symbol >>= 1;
+  int price = price_bit( bm[symbol], bit );
+  bit = symbol & 1; symbol >>= 1; price += price_bit( bm[symbol], bit );
+  bit = symbol & 1; symbol >>= 1; price += price_bit( bm[symbol], bit );
+  bit = symbol & 1; symbol >>= 1; price += price_bit( bm[symbol], bit );
+  bit = symbol & 1; symbol >>= 1; price += price_bit( bm[symbol], bit );
+  bit = symbol & 1; symbol >>= 1; price += price_bit( bm[symbol], bit );
+  bit = symbol & 1; symbol >>= 1; price += price_bit( bm[symbol], bit );
+  return price + price_bit( bm[1], symbol & 1 );
+  }
+
+
+static inline int price_symbol_reversed( const Bit_model bm[], int symbol,
+                                         const int num_bits )
+  {
+  int price = 0;
+  int model = 1;
+  int i;
+  for( i = num_bits; i > 0; --i )
+    {
+    const bool bit = symbol & 1;
+    symbol >>= 1;
+    price += price_bit( bm[model], bit );
+    model <<= 1; model |= bit;
+    }
+  return price;
+  }
+
+
+static inline int price_matched( const Bit_model bm[], unsigned symbol,
+                                 unsigned match_byte )
+  {
+  int price = 0;
+  unsigned mask =
0x100; + symbol |= mask; + while( true ) + { + const unsigned match_bit = ( match_byte <<= 1 ) & mask; + const bool bit = ( symbol <<= 1 ) & 0x100; + price += price_bit( bm[(symbol>>9)+match_bit+mask], bit ); + if( symbol >= 0x10000 ) return price; + mask &= ~(match_bit ^ symbol); /* if( match_bit != bit ) mask = 0; */ + } + } + + +struct Matchfinder_base + { + unsigned long long partial_data_pos; + uint8_t * buffer; /* input buffer */ + int32_t * prev_positions; /* 1 + last seen position of key. else 0 */ + int32_t * pos_array; /* may be tree or chain */ + int before_size; /* bytes to keep in buffer before dictionary */ + int after_size; /* bytes to keep in buffer after pos */ + int buffer_size; + int dictionary_size; /* bytes to keep in buffer before pos */ + int pos; /* current pos in buffer */ + int cyclic_pos; /* cycles through [0, dictionary_size] */ + int stream_pos; /* first byte not yet read from file */ + int pos_limit; /* when reached, a new block must be read */ + int key4_mask; + int num_prev_positions23; + int num_prev_positions; /* size of prev_positions */ + int pos_array_size; + int saved_dictionary_size; /* dictionary_size restored by Mb_reset */ + bool at_stream_end; /* stream_pos shows real end of file */ + bool sync_flush_pending; + }; + +static bool Mb_normalize_pos( struct Matchfinder_base * const mb ); + +static bool Mb_init( struct Matchfinder_base * const mb, const int before_size, + const int dict_size, const int after_size, + const int dict_factor, const int num_prev_positions23, + const int pos_array_factor ); + +static inline void Mb_free( struct Matchfinder_base * const mb ) + { free( mb->prev_positions ); free( mb->buffer ); } + +static inline uint8_t Mb_peek( const struct Matchfinder_base * const mb, + const int distance ) + { return mb->buffer[mb->pos-distance]; } + +static inline int Mb_available_bytes( const struct Matchfinder_base * const mb ) + { return mb->stream_pos - mb->pos; } + +static inline unsigned long long 
+Mb_data_position( const struct Matchfinder_base * const mb ) + { return mb->partial_data_pos + mb->pos; } + +static inline void Mb_finish( struct Matchfinder_base * const mb ) + { mb->at_stream_end = true; mb->sync_flush_pending = false; } + +static inline bool Mb_data_finished( const struct Matchfinder_base * const mb ) + { return mb->at_stream_end && mb->pos >= mb->stream_pos; } + +static inline bool Mb_flushing_or_end( const struct Matchfinder_base * const mb ) + { return mb->at_stream_end || mb->sync_flush_pending; } + +static inline int Mb_free_bytes( const struct Matchfinder_base * const mb ) + { if( Mb_flushing_or_end( mb ) ) return 0; + return mb->buffer_size - mb->stream_pos; } + +static inline bool +Mb_enough_available_bytes( const struct Matchfinder_base * const mb ) + { return ( mb->pos + mb->after_size <= mb->stream_pos || + ( Mb_flushing_or_end( mb ) && mb->pos < mb->stream_pos ) ); } + +static inline const uint8_t * +Mb_ptr_to_current_pos( const struct Matchfinder_base * const mb ) + { return mb->buffer + mb->pos; } + +static int Mb_write_data( struct Matchfinder_base * const mb, + const uint8_t * const inbuf, const int size ) + { + const int sz = min( mb->buffer_size - mb->stream_pos, size ); + if( Mb_flushing_or_end( mb ) || sz <= 0 ) return 0; + memcpy( mb->buffer + mb->stream_pos, inbuf, sz ); + mb->stream_pos += sz; + return sz; + } + +static inline int Mb_true_match_len( const struct Matchfinder_base * const mb, + const int index, const int distance ) + { + const uint8_t * const data = mb->buffer + mb->pos; + int i = index; + const int len_limit = min( Mb_available_bytes( mb ), max_match_len ); + while( i < len_limit && data[i-distance] == data[i] ) ++i; + return i; + } + +static inline bool Mb_move_pos( struct Matchfinder_base * const mb ) + { + if( ++mb->cyclic_pos > mb->dictionary_size ) mb->cyclic_pos = 0; + if( ++mb->pos >= mb->pos_limit ) return Mb_normalize_pos( mb ); + return true; + } + + +struct Range_encoder + { + struct 
Circular_buffer cb; + unsigned min_free_bytes; + uint64_t low; + unsigned long long partial_member_pos; + uint32_t range; + unsigned ff_count; + uint8_t cache; + Lzip_header header; + }; + +static inline void Re_shift_low( struct Range_encoder * const renc ) + { + if( renc->low >> 24 != 0xFF ) + { + const bool carry = ( renc->low > 0xFFFFFFFFU ); + Cb_put_byte( &renc->cb, renc->cache + carry ); + for( ; renc->ff_count > 0; --renc->ff_count ) + Cb_put_byte( &renc->cb, 0xFF + carry ); + renc->cache = renc->low >> 24; + } + else ++renc->ff_count; + renc->low = ( renc->low & 0x00FFFFFFU ) << 8; + } + +static inline void Re_reset( struct Range_encoder * const renc, + const unsigned dictionary_size ) + { + int i; + Cb_reset( &renc->cb ); + renc->low = 0; + renc->partial_member_pos = 0; + renc->range = 0xFFFFFFFFU; + renc->ff_count = 0; + renc->cache = 0; + Lh_set_dictionary_size( renc->header, dictionary_size ); + for( i = 0; i < Lh_size; ++i ) + Cb_put_byte( &renc->cb, renc->header[i] ); + } + +static inline bool Re_init( struct Range_encoder * const renc, + const unsigned dictionary_size, + const unsigned min_free_bytes ) + { + if( !Cb_init( &renc->cb, 65536 + min_free_bytes ) ) return false; + renc->min_free_bytes = min_free_bytes; + Lh_set_magic( renc->header ); + Re_reset( renc, dictionary_size ); + return true; + } + +static inline void Re_free( struct Range_encoder * const renc ) + { Cb_free( &renc->cb ); } + +static inline unsigned long long +Re_member_position( const struct Range_encoder * const renc ) + { return renc->partial_member_pos + Cb_used_bytes( &renc->cb ) + renc->ff_count; } + +static inline bool Re_enough_free_bytes( const struct Range_encoder * const renc ) + { return Cb_free_bytes( &renc->cb ) >= renc->min_free_bytes + renc->ff_count; } + +static inline int Re_read_data( struct Range_encoder * const renc, + uint8_t * const out_buffer, const int out_size ) + { + const int size = Cb_read_data( &renc->cb, out_buffer, out_size ); + if( size > 0 ) 
renc->partial_member_pos += size; + return size; + } + +static inline void Re_flush( struct Range_encoder * const renc ) + { + int i; for( i = 0; i < 5; ++i ) Re_shift_low( renc ); + renc->low = 0; + renc->range = 0xFFFFFFFFU; + renc->ff_count = 0; + renc->cache = 0; + } + +static inline void Re_encode( struct Range_encoder * const renc, + const int symbol, const int num_bits ) + { + unsigned mask; + for( mask = 1 << ( num_bits - 1 ); mask > 0; mask >>= 1 ) + { + renc->range >>= 1; + if( symbol & mask ) renc->low += renc->range; + if( renc->range <= 0x00FFFFFFU ) + { renc->range <<= 8; Re_shift_low( renc ); } + } + } + +static inline void Re_encode_bit( struct Range_encoder * const renc, + Bit_model * const probability, const bool bit ) + { + const uint32_t bound = ( renc->range >> bit_model_total_bits ) * *probability; + if( !bit ) + { + renc->range = bound; + *probability += (bit_model_total - *probability) >> bit_model_move_bits; + } + else + { + renc->low += bound; + renc->range -= bound; + *probability -= *probability >> bit_model_move_bits; + } + if( renc->range <= 0x00FFFFFFU ) { renc->range <<= 8; Re_shift_low( renc ); } + } + +static inline void Re_encode_tree3( struct Range_encoder * const renc, + Bit_model bm[], const int symbol ) + { + bool bit = ( symbol >> 2 ) & 1; + Re_encode_bit( renc, &bm[1], bit ); + int model = 2 | bit; + bit = ( symbol >> 1 ) & 1; + Re_encode_bit( renc, &bm[model], bit ); model <<= 1; model |= bit; + Re_encode_bit( renc, &bm[model], symbol & 1 ); + } + +static inline void Re_encode_tree6( struct Range_encoder * const renc, + Bit_model bm[], const unsigned symbol ) + { + bool bit = ( symbol >> 5 ) & 1; + Re_encode_bit( renc, &bm[1], bit ); + int model = 2 | bit; + bit = ( symbol >> 4 ) & 1; + Re_encode_bit( renc, &bm[model], bit ); model <<= 1; model |= bit; + bit = ( symbol >> 3 ) & 1; + Re_encode_bit( renc, &bm[model], bit ); model <<= 1; model |= bit; + bit = ( symbol >> 2 ) & 1; + Re_encode_bit( renc, &bm[model], bit ); model 
<<= 1; model |= bit; + bit = ( symbol >> 1 ) & 1; + Re_encode_bit( renc, &bm[model], bit ); model <<= 1; model |= bit; + Re_encode_bit( renc, &bm[model], symbol & 1 ); + } + +static inline void Re_encode_tree8( struct Range_encoder * const renc, + Bit_model bm[], const int symbol ) + { + int model = 1; + int i; + for( i = 7; i >= 0; --i ) + { + const bool bit = ( symbol >> i ) & 1; + Re_encode_bit( renc, &bm[model], bit ); + model <<= 1; model |= bit; + } + } + +static inline void Re_encode_tree_reversed( struct Range_encoder * const renc, + Bit_model bm[], int symbol, const int num_bits ) + { + int model = 1; + int i; + for( i = num_bits; i > 0; --i ) + { + const bool bit = symbol & 1; + symbol >>= 1; + Re_encode_bit( renc, &bm[model], bit ); + model <<= 1; model |= bit; + } + } + +static inline void Re_encode_matched( struct Range_encoder * const renc, + Bit_model bm[], unsigned symbol, + unsigned match_byte ) + { + unsigned mask = 0x100; + symbol |= mask; + while( true ) + { + const unsigned match_bit = ( match_byte <<= 1 ) & mask; + const bool bit = ( symbol <<= 1 ) & 0x100; + Re_encode_bit( renc, &bm[(symbol>>9)+match_bit+mask], bit ); + if( symbol >= 0x10000 ) break; + mask &= ~(match_bit ^ symbol); /* if( match_bit != bit ) mask = 0; */ + } + } + +static inline void Re_encode_len( struct Range_encoder * const renc, + struct Len_model * const lm, + int symbol, const int pos_state ) + { + bool bit = ( ( symbol -= min_match_len ) >= len_low_symbols ); + Re_encode_bit( renc, &lm->choice1, bit ); + if( !bit ) + Re_encode_tree3( renc, lm->bm_low[pos_state], symbol ); + else + { + bit = ( ( symbol -= len_low_symbols ) >= len_mid_symbols ); + Re_encode_bit( renc, &lm->choice2, bit ); + if( !bit ) + Re_encode_tree3( renc, lm->bm_mid[pos_state], symbol ); + else + Re_encode_tree8( renc, lm->bm_high, symbol - len_mid_symbols ); + } + } + + +enum { max_marker_size = 16, + num_rep_distances = 4 }; /* must be 4 */ + +struct LZ_encoder_base + { + struct Matchfinder_base 
mb; + unsigned long long member_size_limit; + uint32_t crc; + + Bit_model bm_literal[1<<literal_context_bits][0x300]; + Bit_model bm_match[states][pos_states]; + Bit_model bm_rep[states]; + Bit_model bm_rep0[states]; + Bit_model bm_rep1[states]; + Bit_model bm_rep2[states]; + Bit_model bm_len[states][pos_states]; + Bit_model bm_dis_slot[len_states][1<<dis_slot_bits]; + Bit_model bm_dis[modeled_distances-end_dis_model+1]; + Bit_model bm_align[dis_align_size]; + struct Len_model match_len_model; + struct Len_model rep_len_model; + struct Range_encoder renc; + int reps[num_rep_distances]; + State state; + bool member_finished; + }; + +static void LZeb_reset( struct LZ_encoder_base * const eb, + const unsigned long long member_size ); + +static inline bool LZeb_init( struct LZ_encoder_base * const eb, + const int before_size, const int dict_size, + const int after_size, const int dict_factor, + const int num_prev_positions23, + const int pos_array_factor, + const unsigned min_free_bytes, + const unsigned long long member_size ) + { + if( !Mb_init( &eb->mb, before_size, dict_size, after_size, dict_factor, + num_prev_positions23, pos_array_factor ) ) return false; + if( !Re_init( &eb->renc, eb->mb.dictionary_size, min_free_bytes ) ) + return false; + LZeb_reset( eb, member_size ); + return true; + } + +static inline bool LZeb_member_finished( const struct LZ_encoder_base * const eb ) + { return ( eb->member_finished && Cb_empty( &eb->renc.cb ) ); } + +static inline void LZeb_free( struct LZ_encoder_base * const eb ) + { Re_free( &eb->renc ); Mb_free( &eb->mb ); } + +static inline unsigned LZeb_crc( const struct LZ_encoder_base * const eb ) + { return eb->crc ^ 0xFFFFFFFFU; } + +static inline int LZeb_price_literal( const struct LZ_encoder_base * const eb, + const uint8_t prev_byte, const uint8_t symbol ) + { return price_symbol8( eb->bm_literal[get_lit_state(prev_byte)], symbol ); } + +static inline int LZeb_price_matched( const struct LZ_encoder_base * const eb, + const 
uint8_t prev_byte, const uint8_t symbol, const uint8_t match_byte ) + { return price_matched( eb->bm_literal[get_lit_state(prev_byte)], symbol, + match_byte ); } + +static inline void LZeb_encode_literal( struct LZ_encoder_base * const eb, + const uint8_t prev_byte, const uint8_t symbol ) + { Re_encode_tree8( &eb->renc, eb->bm_literal[get_lit_state(prev_byte)], symbol ); } + +static inline void LZeb_encode_matched( struct LZ_encoder_base * const eb, + const uint8_t prev_byte, const uint8_t symbol, const uint8_t match_byte ) + { Re_encode_matched( &eb->renc, eb->bm_literal[get_lit_state(prev_byte)], + symbol, match_byte ); } + +static inline void LZeb_encode_pair( struct LZ_encoder_base * const eb, + const unsigned dis, const int len, + const int pos_state ) + { + Re_encode_len( &eb->renc, &eb->match_len_model, len, pos_state ); + const unsigned dis_slot = get_slot( dis ); + Re_encode_tree6( &eb->renc, eb->bm_dis_slot[get_len_state(len)], dis_slot ); + + if( dis_slot >= start_dis_model ) + { + const int direct_bits = ( dis_slot >> 1 ) - 1; + const unsigned base = ( 2 | ( dis_slot & 1 ) ) << direct_bits; + const unsigned direct_dis = dis - base; + + if( dis_slot < end_dis_model ) + Re_encode_tree_reversed( &eb->renc, eb->bm_dis + ( base - dis_slot ), + direct_dis, direct_bits ); + else + { + Re_encode( &eb->renc, direct_dis >> dis_align_bits, + direct_bits - dis_align_bits ); + Re_encode_tree_reversed( &eb->renc, eb->bm_align, direct_dis, dis_align_bits ); + } + } + } diff --git a/fast_encoder.c b/fast_encoder.c new file mode 100644 index 0000000..618c3d6 --- /dev/null +++ b/fast_encoder.c @@ -0,0 +1,175 @@ +/* Lzlib - Compression library for the lzip format + Copyright (C) 2009-2022 Antonio Diaz Diaz. + + This library is free software. Redistribution and use in source and + binary forms, with or without modification, are permitted provided + that the following conditions are met: + + 1. 
Redistributions of source code must retain the above copyright + notice, this list of conditions, and the following disclaimer. + + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions, and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + This library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. +*/ + +static int FLZe_longest_match_len( struct FLZ_encoder * const fe, int * const distance ) + { + enum { len_limit = 16 }; + int32_t * ptr0 = fe->eb.mb.pos_array + fe->eb.mb.cyclic_pos; + const int available = min( Mb_available_bytes( &fe->eb.mb ), max_match_len ); + if( available < len_limit ) { *ptr0 = 0; return 0; } + + const uint8_t * const data = Mb_ptr_to_current_pos( &fe->eb.mb ); + fe->key4 = ( ( fe->key4 << 4 ) ^ data[3] ) & fe->eb.mb.key4_mask; + const int pos1 = fe->eb.mb.pos + 1; + int newpos1 = fe->eb.mb.prev_positions[fe->key4]; + fe->eb.mb.prev_positions[fe->key4] = pos1; + int maxlen = 0, count; + + for( count = 4; ; ) + { + int delta; + if( newpos1 <= 0 || --count < 0 || + ( delta = pos1 - newpos1 ) > fe->eb.mb.dictionary_size ) + { *ptr0 = 0; break; } + int32_t * const newptr = fe->eb.mb.pos_array + + ( fe->eb.mb.cyclic_pos - delta + + ( ( fe->eb.mb.cyclic_pos >= delta ) ? 
0 : fe->eb.mb.dictionary_size + 1 ) ); + + if( data[maxlen-delta] == data[maxlen] ) + { + int len = 0; + while( len < available && data[len-delta] == data[len] ) ++len; + if( maxlen < len ) + { maxlen = len; *distance = delta - 1; + if( maxlen >= len_limit ) { *ptr0 = *newptr; break; } } + } + + *ptr0 = newpos1; + ptr0 = newptr; + newpos1 = *ptr0; + } + return maxlen; + } + + +static bool FLZe_encode_member( struct FLZ_encoder * const fe ) + { + int rep = 0, i; + State * const state = &fe->eb.state; + + if( fe->eb.member_finished ) return true; + if( Re_member_position( &fe->eb.renc ) >= fe->eb.member_size_limit ) + { LZeb_try_full_flush( &fe->eb ); return true; } + + if( Mb_data_position( &fe->eb.mb ) == 0 && + !Mb_data_finished( &fe->eb.mb ) ) /* encode first byte */ + { + if( !Mb_enough_available_bytes( &fe->eb.mb ) || + !Re_enough_free_bytes( &fe->eb.renc ) ) return true; + const uint8_t prev_byte = 0; + const uint8_t cur_byte = Mb_peek( &fe->eb.mb, 0 ); + Re_encode_bit( &fe->eb.renc, &fe->eb.bm_match[*state][0], 0 ); + LZeb_encode_literal( &fe->eb, prev_byte, cur_byte ); + CRC32_update_byte( &fe->eb.crc, cur_byte ); + FLZe_reset_key4( fe ); + if( !FLZe_update_and_move( fe, 1 ) ) return false; + } + + while( !Mb_data_finished( &fe->eb.mb ) && + Re_member_position( &fe->eb.renc ) < fe->eb.member_size_limit ) + { + if( !Mb_enough_available_bytes( &fe->eb.mb ) || + !Re_enough_free_bytes( &fe->eb.renc ) ) return true; + int match_distance = 0; /* avoid warning from gcc 6.1.0 */ + const int main_len = FLZe_longest_match_len( fe, &match_distance ); + const int pos_state = Mb_data_position( &fe->eb.mb ) & pos_state_mask; + int len = 0; + + for( i = 0; i < num_rep_distances; ++i ) + { + const int tlen = Mb_true_match_len( &fe->eb.mb, 0, fe->eb.reps[i] + 1 ); + if( tlen > len ) { len = tlen; rep = i; } + } + if( len > min_match_len && len + 3 > main_len ) + { + CRC32_update_buf( &fe->eb.crc, Mb_ptr_to_current_pos( &fe->eb.mb ), len ); + Re_encode_bit( &fe->eb.renc, 
&fe->eb.bm_match[*state][pos_state], 1 ); + Re_encode_bit( &fe->eb.renc, &fe->eb.bm_rep[*state], 1 ); + Re_encode_bit( &fe->eb.renc, &fe->eb.bm_rep0[*state], rep != 0 ); + if( rep == 0 ) + Re_encode_bit( &fe->eb.renc, &fe->eb.bm_len[*state][pos_state], 1 ); + else + { + Re_encode_bit( &fe->eb.renc, &fe->eb.bm_rep1[*state], rep > 1 ); + if( rep > 1 ) + Re_encode_bit( &fe->eb.renc, &fe->eb.bm_rep2[*state], rep > 2 ); + const int distance = fe->eb.reps[rep]; + for( i = rep; i > 0; --i ) fe->eb.reps[i] = fe->eb.reps[i-1]; + fe->eb.reps[0] = distance; + } + *state = St_set_rep( *state ); + Re_encode_len( &fe->eb.renc, &fe->eb.rep_len_model, len, pos_state ); + if( !Mb_move_pos( &fe->eb.mb ) ) return false; + if( !FLZe_update_and_move( fe, len - 1 ) ) return false; + continue; + } + + if( main_len > min_match_len ) + { + CRC32_update_buf( &fe->eb.crc, Mb_ptr_to_current_pos( &fe->eb.mb ), main_len ); + Re_encode_bit( &fe->eb.renc, &fe->eb.bm_match[*state][pos_state], 1 ); + Re_encode_bit( &fe->eb.renc, &fe->eb.bm_rep[*state], 0 ); + *state = St_set_match( *state ); + for( i = num_rep_distances - 1; i > 0; --i ) fe->eb.reps[i] = fe->eb.reps[i-1]; + fe->eb.reps[0] = match_distance; + LZeb_encode_pair( &fe->eb, match_distance, main_len, pos_state ); + if( !Mb_move_pos( &fe->eb.mb ) ) return false; + if( !FLZe_update_and_move( fe, main_len - 1 ) ) return false; + continue; + } + + const uint8_t prev_byte = Mb_peek( &fe->eb.mb, 1 ); + const uint8_t cur_byte = Mb_peek( &fe->eb.mb, 0 ); + const uint8_t match_byte = Mb_peek( &fe->eb.mb, fe->eb.reps[0] + 1 ); + if( !Mb_move_pos( &fe->eb.mb ) ) return false; + CRC32_update_byte( &fe->eb.crc, cur_byte ); + + if( match_byte == cur_byte ) + { + const int short_rep_price = price1( fe->eb.bm_match[*state][pos_state] ) + + price1( fe->eb.bm_rep[*state] ) + + price0( fe->eb.bm_rep0[*state] ) + + price0( fe->eb.bm_len[*state][pos_state] ); + int price = price0( fe->eb.bm_match[*state][pos_state] ); + if( St_is_char( *state ) ) + price += 
LZeb_price_literal( &fe->eb, prev_byte, cur_byte ); + else + price += LZeb_price_matched( &fe->eb, prev_byte, cur_byte, match_byte ); + if( short_rep_price < price ) + { + Re_encode_bit( &fe->eb.renc, &fe->eb.bm_match[*state][pos_state], 1 ); + Re_encode_bit( &fe->eb.renc, &fe->eb.bm_rep[*state], 1 ); + Re_encode_bit( &fe->eb.renc, &fe->eb.bm_rep0[*state], 0 ); + Re_encode_bit( &fe->eb.renc, &fe->eb.bm_len[*state][pos_state], 0 ); + *state = St_set_short_rep( *state ); + continue; + } + } + + /* literal byte */ + Re_encode_bit( &fe->eb.renc, &fe->eb.bm_match[*state][pos_state], 0 ); + if( ( *state = St_set_char( *state ) ) < 4 ) + LZeb_encode_literal( &fe->eb, prev_byte, cur_byte ); + else + LZeb_encode_matched( &fe->eb, prev_byte, cur_byte, match_byte ); + } + + LZeb_try_full_flush( &fe->eb ); + return true; + } diff --git a/fast_encoder.h b/fast_encoder.h new file mode 100644 index 0000000..54756bd --- /dev/null +++ b/fast_encoder.h @@ -0,0 +1,70 @@ +/* Lzlib - Compression library for the lzip format + Copyright (C) 2009-2022 Antonio Diaz Diaz. + + This library is free software. Redistribution and use in source and + binary forms, with or without modification, are permitted provided + that the following conditions are met: + + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions, and the following disclaimer. + + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions, and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + This library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
+*/ + +struct FLZ_encoder + { + struct LZ_encoder_base eb; + unsigned key4; /* key made from latest 4 bytes */ + }; + +static inline void FLZe_reset_key4( struct FLZ_encoder * const fe ) + { + int i; + fe->key4 = 0; + for( i = 0; i < 3 && i < Mb_available_bytes( &fe->eb.mb ); ++i ) + fe->key4 = ( fe->key4 << 4 ) ^ fe->eb.mb.buffer[i]; + } + +static inline bool FLZe_update_and_move( struct FLZ_encoder * const fe, int n ) + { + struct Matchfinder_base * const mb = &fe->eb.mb; + while( --n >= 0 ) + { + if( Mb_available_bytes( mb ) >= 4 ) + { + fe->key4 = ( ( fe->key4 << 4 ) ^ mb->buffer[mb->pos+3] ) & mb->key4_mask; + mb->pos_array[mb->cyclic_pos] = mb->prev_positions[fe->key4]; + mb->prev_positions[fe->key4] = mb->pos + 1; + } + else mb->pos_array[mb->cyclic_pos] = 0; + if( !Mb_move_pos( mb ) ) return false; + } + return true; + } + +static inline bool FLZe_init( struct FLZ_encoder * const fe, + const unsigned long long member_size ) + { + enum { before_size = 0, + dict_size = 65536, + /* bytes to keep in buffer after pos */ + after_size = max_match_len, + dict_factor = 16, + min_free_bytes = max_marker_size, + num_prev_positions23 = 0, + pos_array_factor = 1 }; + + return LZeb_init( &fe->eb, before_size, dict_size, after_size, dict_factor, + num_prev_positions23, pos_array_factor, min_free_bytes, + member_size ); + } + +static inline void FLZe_reset( struct FLZ_encoder * const fe, + const unsigned long long member_size ) + { LZeb_reset( &fe->eb, member_size ); } diff --git a/ffexample.c b/ffexample.c new file mode 100644 index 0000000..59345ee --- /dev/null +++ b/ffexample.c @@ -0,0 +1,300 @@ +/* File to file example - Test program for the library lzlib + Copyright (C) 2010-2022 Antonio Diaz Diaz. + + This program is free software: you have unlimited permission + to copy, distribute, and modify it. + + Try 'ffexample -h' for usage information. + + This program is an example of how file-to-file + compression/decompression can be implemented using lzlib. 
+*/ + +#define _FILE_OFFSET_BITS 64 + +#include <errno.h> +#include <limits.h> +#include <stdbool.h> +#include <stdint.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> +#if defined __MSVCRT__ || defined __OS2__ || defined __DJGPP__ +#include <fcntl.h> +#include <io.h> +#endif + +#include "lzlib.h" + +#ifndef min + #define min(x,y) ((x) <= (y) ? (x) : (y)) +#endif + + +static void show_help( void ) + { + printf( "ffexample is an example program showing how file-to-file (de)compression can\n" + "be implemented using lzlib. The content of infile is compressed,\n" + "decompressed, or both, and then written to outfile.\n" + "\nUsage: ffexample operation [infile [outfile]]\n" ); + printf( "\nOperation:\n" + " -h display this help and exit\n" + " -c compress infile to outfile\n" + " -d decompress infile to outfile\n" + " -b both (compress then decompress) infile to outfile\n" + " -m compress (multimember) infile to outfile\n" + " -l compress (1 member per line) infile to outfile\n" + " -r decompress with resync if data error or leading garbage\n" + "\nIf infile or outfile are omitted, or are specified as '-', standard input or\n" + "standard output are used in their place respectively.\n" + "\nReport bugs to lzip-bug@nongnu.org\n" + "Lzlib home page: http://www.nongnu.org/lzip/lzlib.html\n" ); + } + + +int ffcompress( struct LZ_Encoder * const encoder, + FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384 }; + uint8_t buffer[buffer_size]; + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_compress_write_size( encoder ) ); + if( size > 0 ) + { + len = fread( buffer, 1, size, infile ); + ret = LZ_compress_write( encoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) ) LZ_compress_finish( encoder ); + } + ret = LZ_compress_read( encoder, buffer, buffer_size ); + if( ret < 0 ) break; + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( 
LZ_compress_finished( encoder ) == 1 ) return 0; + } + return 1; + } + + +int ffdecompress( struct LZ_Decoder * const decoder, + FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384 }; + uint8_t buffer[buffer_size]; + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_decompress_write_size( decoder ) ); + if( size > 0 ) + { + len = fread( buffer, 1, size, infile ); + ret = LZ_decompress_write( decoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) ) LZ_decompress_finish( decoder ); + } + ret = LZ_decompress_read( decoder, buffer, buffer_size ); + if( ret < 0 ) break; + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( LZ_decompress_finished( decoder ) == 1 ) return 0; + } + return 1; + } + + +int ffboth( struct LZ_Encoder * const encoder, + struct LZ_Decoder * const decoder, + FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384 }; + uint8_t buffer[buffer_size]; + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_compress_write_size( encoder ) ); + if( size > 0 ) + { + len = fread( buffer, 1, size, infile ); + ret = LZ_compress_write( encoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) ) LZ_compress_finish( encoder ); + } + size = min( buffer_size, LZ_decompress_write_size( decoder ) ); + if( size > 0 ) + { + ret = LZ_compress_read( encoder, buffer, size ); + if( ret < 0 ) break; + ret = LZ_decompress_write( decoder, buffer, ret ); + if( ret < 0 ) break; + if( LZ_compress_finished( encoder ) == 1 ) + LZ_decompress_finish( decoder ); + } + ret = LZ_decompress_read( decoder, buffer, buffer_size ); + if( ret < 0 ) break; + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( LZ_decompress_finished( decoder ) == 1 ) return 0; + } + return 1; + } + + +int ffmmcompress( FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384, member_size = 4096 }; + uint8_t 
buffer[buffer_size]; + bool done = false; + struct LZ_Encoder * const encoder = + LZ_compress_open( 65535, 16, member_size ); + if( !encoder || LZ_compress_errno( encoder ) != LZ_ok ) + { fputs( "ffexample: Not enough memory.\n", stderr ); + LZ_compress_close( encoder ); return 1; } + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_compress_write_size( encoder ) ); + if( size > 0 ) + { + len = fread( buffer, 1, size, infile ); + ret = LZ_compress_write( encoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) ) LZ_compress_finish( encoder ); + } + ret = LZ_compress_read( encoder, buffer, buffer_size ); + if( ret < 0 ) break; + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( LZ_compress_member_finished( encoder ) == 1 ) + { + if( LZ_compress_finished( encoder ) == 1 ) { done = true; break; } + if( LZ_compress_restart_member( encoder, member_size ) < 0 ) break; + } + } + if( LZ_compress_close( encoder ) < 0 ) done = false; + return !done; /* 0 if success, 1 if error, like the other ff* functions */ + } + + +/* Compress 'infile' to 'outfile' as a multimember stream with one member + for each line of text terminated by a newline character or by EOF. + Return 0 if success, 1 if error. 
+*/ +int fflfcompress( struct LZ_Encoder * const encoder, + FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384 }; + uint8_t buffer[buffer_size]; + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_compress_write_size( encoder ) ); + if( size > 0 ) + { + for( len = 0; len < size; ) + { + int ch = getc( infile ); + if( ch == EOF || ( buffer[len++] = ch ) == '\n' ) break; + } + /* avoid writing an empty member to outfile */ + if( len == 0 && LZ_compress_data_position( encoder ) == 0 ) return 0; + ret = LZ_compress_write( encoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) || buffer[len-1] == '\n' ) + LZ_compress_finish( encoder ); + } + ret = LZ_compress_read( encoder, buffer, buffer_size ); + if( ret < 0 ) break; + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( LZ_compress_member_finished( encoder ) == 1 ) + { + if( feof( infile ) && LZ_compress_finished( encoder ) == 1 ) return 0; + if( LZ_compress_restart_member( encoder, INT64_MAX ) < 0 ) break; + } + } + return 1; + } + + +/* Decompress 'infile' to 'outfile' with automatic resynchronization to + next member in case of data error, including the automatic removal of + leading garbage. 
+*/ +int ffrsdecompress( struct LZ_Decoder * const decoder, + FILE * const infile, FILE * const outfile ) + { + enum { buffer_size = 16384 }; + uint8_t buffer[buffer_size]; + while( true ) + { + int len, ret; + int size = min( buffer_size, LZ_decompress_write_size( decoder ) ); + if( size > 0 ) + { + len = fread( buffer, 1, size, infile ); + ret = LZ_decompress_write( decoder, buffer, len ); + if( ret < 0 || ferror( infile ) ) break; + if( feof( infile ) ) LZ_decompress_finish( decoder ); + } + ret = LZ_decompress_read( decoder, buffer, buffer_size ); + if( ret < 0 ) + { + if( LZ_decompress_errno( decoder ) == LZ_header_error || + LZ_decompress_errno( decoder ) == LZ_data_error ) + { LZ_decompress_sync_to_member( decoder ); continue; } + else break; + } + len = fwrite( buffer, 1, ret, outfile ); + if( len < ret ) break; + if( LZ_decompress_finished( decoder ) == 1 ) return 0; + } + return 1; + } + + +int main( const int argc, const char * const argv[] ) + { +#if defined __MSVCRT__ || defined __OS2__ || defined __DJGPP__ + setmode( STDIN_FILENO, O_BINARY ); + setmode( STDOUT_FILENO, O_BINARY ); +#endif + + struct LZ_Encoder * const encoder = LZ_compress_open( 65535, 16, INT64_MAX ); + struct LZ_Decoder * const decoder = LZ_decompress_open(); + FILE * const infile = ( argc >= 3 && strcmp( argv[2], "-" ) != 0 ) ? + fopen( argv[2], "rb" ) : stdin; + FILE * const outfile = ( argc >= 4 && strcmp( argv[3], "-" ) != 0 ) ? 
+ fopen( argv[3], "wb" ) : stdout; + int retval; + + if( argc < 2 || argc > 4 || strlen( argv[1] ) != 2 || argv[1][0] != '-' ) + { show_help(); return 1; } + if( !encoder || LZ_compress_errno( encoder ) != LZ_ok || + !decoder || LZ_decompress_errno( decoder ) != LZ_ok ) + { fputs( "ffexample: Not enough memory.\n", stderr ); + LZ_compress_close( encoder ); LZ_decompress_close( decoder ); return 1; } + if( !infile ) + { fprintf( stderr, "ffexample: Can't open input file '%s': %s\n", + argv[2], strerror( errno ) ); return 1; } + if( !outfile ) + { fprintf( stderr, "ffexample: Can't open output file '%s': %s\n", + argv[3], strerror( errno ) ); return 1; } + + switch( argv[1][1] ) + { + case 'c': retval = ffcompress( encoder, infile, outfile ); break; + case 'd': retval = ffdecompress( decoder, infile, outfile ); break; + case 'b': retval = ffboth( encoder, decoder, infile, outfile ); break; + case 'm': retval = ffmmcompress( infile, outfile ); break; + case 'l': retval = fflfcompress( encoder, infile, outfile ); break; + case 'r': retval = ffrsdecompress( decoder, infile, outfile ); break; + default: show_help(); return ( argv[1][1] != 'h' ); + } + + if( LZ_decompress_close( decoder ) < 0 || LZ_compress_close( encoder ) < 0 || + fclose( outfile ) != 0 || fclose( infile ) != 0 ) retval = 1; + return retval; + } diff --git a/lzcheck.c b/lzcheck.c new file mode 100644 index 0000000..88dd4c9 --- /dev/null +++ b/lzcheck.c @@ -0,0 +1,367 @@ +/* Lzcheck - Test program for the library lzlib + Copyright (C) 2009-2022 Antonio Diaz Diaz. + + This program is free software: you have unlimited permission + to copy, distribute, and modify it. + + Usage: lzcheck [-m|-s] filename.txt... + + This program reads each text file specified and then compresses it, + line by line, to test the flushing mechanism and the member + restart/reset/sync functions. 
+*/ + +#define _FILE_OFFSET_BITS 64 + +#include <ctype.h> +#include <stdbool.h> +#include <stdint.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> +#include <sys/stat.h> + +#include "lzlib.h" + + +const unsigned long long member_size = INT64_MAX; +enum { buffer_size = 32768 }; +uint8_t in_buffer[buffer_size]; +uint8_t mid_buffer[buffer_size]; +uint8_t out_buffer[buffer_size]; + + +static void show_line( const uint8_t * const buffer, const int size ) + { + int i; + for( i = 0; i < size; ++i ) + fputc( isprint( buffer[i] ) ? buffer[i] : '.', stderr ); + fputc( '\n', stderr ); + } + + +static struct LZ_Encoder * xopen_encoder( const int dictionary_size ) + { + const int match_len_limit = 16; + struct LZ_Encoder * const encoder = + LZ_compress_open( dictionary_size, match_len_limit, member_size ); + if( !encoder || LZ_compress_errno( encoder ) != LZ_ok ) + { + const bool bad_arg = + encoder && ( LZ_compress_errno( encoder ) == LZ_bad_argument ); + LZ_compress_close( encoder ); + if( bad_arg ) + { + fputs( "lzcheck: internal error: Invalid argument to encoder.\n", stderr ); + exit( 3 ); + } + fputs( "lzcheck: Not enough memory.\n", stderr ); + exit( 1 ); + } + return encoder; + } + +static struct LZ_Decoder * xopen_decoder( void ) + { + struct LZ_Decoder * const decoder = LZ_decompress_open(); + if( !decoder || LZ_decompress_errno( decoder ) != LZ_ok ) + { + LZ_decompress_close( decoder ); + fputs( "lzcheck: Not enough memory.\n", stderr ); + exit( 1 ); + } + return decoder; + } + + +static void xclose_encoder( struct LZ_Encoder * const encoder, + const bool finish ) + { + if( finish ) + { + unsigned long long size = 0; + LZ_compress_finish( encoder ); + while( true ) + { + const int rd = LZ_compress_read( encoder, mid_buffer, buffer_size ); + if( rd < 0 ) + { + fprintf( stderr, "lzcheck: xclose: LZ_compress_read error: %s\n", + LZ_strerror( LZ_compress_errno( encoder ) ) ); + exit( 3 ); + } + size += rd; + if( LZ_compress_finished( 
encoder ) == 1 ) break; + } + if( size > 0 ) + { + fprintf( stderr, "lzcheck: %llu bytes remain in encoder.\n", size ); + exit( 3 ); + } + } + if( LZ_compress_close( encoder ) < 0 ) exit( 1 ); + } + + +static void xclose_decoder( struct LZ_Decoder * const decoder, + const bool finish ) + { + if( finish ) + { + unsigned long long size = 0; + LZ_decompress_finish( decoder ); + while( true ) + { + const int rd = LZ_decompress_read( decoder, out_buffer, buffer_size ); + if( rd < 0 ) + { + fprintf( stderr, "lzcheck: xclose: LZ_decompress_read error: %s\n", + LZ_strerror( LZ_decompress_errno( decoder ) ) ); + exit( 3 ); + } + size += rd; + if( LZ_decompress_finished( decoder ) == 1 ) break; + } + if( size > 0 ) + { + fprintf( stderr, "lzcheck: %llu bytes remain in decoder.\n", size ); + exit( 3 ); + } + } + if( LZ_decompress_close( decoder ) < 0 ) exit( 1 ); + } + + +/* Return the next (usually newline-terminated) chunk of data from file. + The size returned in *sizep is always <= buffer_size. + If sizep is a null pointer, rewind the file, reset state, and return. + If file is at EOF, return an empty line. 
+*/ +static const uint8_t * next_line( FILE * const file, int * const sizep ) + { + static int l = 0; + static int read_size = 0; + int r; + + if( !sizep ) { rewind( file ); l = read_size = 0; return in_buffer; } + if( l >= read_size ) + { + l = 0; read_size = fread( in_buffer, 1, buffer_size, file ); + if( l >= read_size ) { *sizep = 0; return in_buffer; } /* end of file */ + } + + for( r = l + 1; r < read_size && in_buffer[r-1] != '\n'; ++r ); + *sizep = r - l; l = r; + return in_buffer + l - *sizep; + } + + +static int check_sync_flush( FILE * const file, const int dictionary_size ) + { + struct LZ_Encoder * const encoder = xopen_encoder( dictionary_size ); + struct LZ_Decoder * const decoder = xopen_decoder(); + int retval = 0; + + while( retval <= 1 ) /* test LZ_compress_sync_flush */ + { + int in_size, mid_size, out_size; + int line_size; + const uint8_t * const line_buf = next_line( file, &line_size ); + if( line_size <= 0 ) break; /* end of file */ + + in_size = LZ_compress_write( encoder, line_buf, line_size ); + if( in_size < line_size ) + fprintf( stderr, "lzcheck: sync: LZ_compress_write only accepted %d of %d bytes\n", + in_size, line_size ); + LZ_compress_sync_flush( encoder ); + if( line_buf[0] & 1 ) /* read all data at once or byte by byte */ + mid_size = LZ_compress_read( encoder, mid_buffer, buffer_size ); + else for( mid_size = 0; mid_size < buffer_size; ) + { + const int rd = LZ_compress_read( encoder, mid_buffer + mid_size, 1 ); + if( rd > 0 ) mid_size += rd; + else { if( rd < 0 ) { mid_size = -1; } break; } + } + if( mid_size < 0 ) + { + fprintf( stderr, "lzcheck: LZ_compress_read error: %s\n", + LZ_strerror( LZ_compress_errno( encoder ) ) ); + retval = 3; break; + } + LZ_decompress_write( decoder, mid_buffer, mid_size ); + out_size = LZ_decompress_read( decoder, out_buffer, buffer_size ); + if( out_size < 0 ) + { + fprintf( stderr, "lzcheck: LZ_decompress_read error: %s\n", + LZ_strerror( LZ_decompress_errno( decoder ) ) ); + retval = 3; 
break; + } + + if( out_size != in_size || memcmp( line_buf, out_buffer, out_size ) ) + { + fprintf( stderr, "lzcheck: LZ_compress_sync_flush error: " + "in_size = %d, out_size = %d\n", in_size, out_size ); + show_line( line_buf, in_size ); + show_line( out_buffer, out_size ); + retval = 1; + } + } + + if( retval <= 1 ) + { + int rd = 0; + if( LZ_compress_finish( encoder ) < 0 || + ( rd = LZ_compress_read( encoder, mid_buffer, buffer_size ) ) < 0 ) + { + fprintf( stderr, "lzcheck: Can't drain encoder: %s\n", + LZ_strerror( LZ_compress_errno( encoder ) ) ); + retval = 3; + } + LZ_decompress_write( decoder, mid_buffer, rd ); + } + + xclose_decoder( decoder, retval == 0 ); + xclose_encoder( encoder, retval == 0 ); + return retval; + } + + +/* Test member by member decompression without calling LZ_decompress_finish, + inserting leading garbage before some members, and resetting the + decompressor sometimes. Test that the increase in total_in_size when + syncing to member is equal to the size of the leading garbage skipped. 
+*/ +static int check_members( FILE * const file, const int dictionary_size ) + { + struct LZ_Encoder * const encoder = xopen_encoder( dictionary_size ); + struct LZ_Decoder * const decoder = xopen_decoder(); + int retval = 0; + + while( retval <= 1 ) /* test LZ_compress_restart_member */ + { + unsigned long long garbage_begin = 0; /* avoid warning from gcc 3.3.6 */ + int leading_garbage, in_size, mid_size, out_size; + int line_size; + const uint8_t * const line_buf = next_line( file, &line_size ); + if( line_size <= 0 && /* end of file, write at least 1 member */ + LZ_decompress_total_in_size( decoder ) != 0 ) break; + + if( LZ_compress_finished( encoder ) == 1 ) + { + if( LZ_compress_restart_member( encoder, member_size ) < 0 ) + { + fprintf( stderr, "lzcheck: Can't restart member: %s\n", + LZ_strerror( LZ_compress_errno( encoder ) ) ); + retval = 3; break; + } + if( line_size >= 2 && line_buf[1] == 'h' ) + LZ_decompress_reset( decoder ); + } + in_size = LZ_compress_write( encoder, line_buf, line_size ); + if( in_size < line_size ) + fprintf( stderr, "lzcheck: member: LZ_compress_write only accepted %d of %d bytes\n", + in_size, line_size ); + LZ_compress_finish( encoder ); + if( line_size * 3 < buffer_size && line_buf[0] == 't' ) + { leading_garbage = line_size; + memset( mid_buffer, in_buffer[0], leading_garbage ); + garbage_begin = LZ_decompress_total_in_size( decoder ); } + else leading_garbage = 0; + mid_size = LZ_compress_read( encoder, mid_buffer + leading_garbage, + buffer_size - leading_garbage ); + if( mid_size < 0 ) + { + fprintf( stderr, "lzcheck: member: LZ_compress_read error: %s\n", + LZ_strerror( LZ_compress_errno( encoder ) ) ); + retval = 3; break; + } + LZ_decompress_write( decoder, mid_buffer, leading_garbage + mid_size ); + out_size = LZ_decompress_read( decoder, out_buffer, buffer_size ); + if( out_size < 0 ) + { + if( leading_garbage && + ( LZ_decompress_errno( decoder ) == LZ_header_error || + LZ_decompress_errno( decoder ) == 
LZ_data_error ) ) + { + LZ_decompress_sync_to_member( decoder ); /* skip leading garbage */ + const unsigned long long garbage_end = + LZ_decompress_total_in_size( decoder ); + if( garbage_end - garbage_begin != (unsigned)leading_garbage ) + { + fprintf( stderr, "lzcheck: member: LZ_decompress_sync_to_member error:\n" + " garbage_begin = %llu garbage_end = %llu " + "difference = %llu expected = %d\n", garbage_begin, + garbage_end, garbage_end - garbage_begin, leading_garbage ); + retval = 3; break; + } + out_size = LZ_decompress_read( decoder, out_buffer, buffer_size ); + } + if( out_size < 0 ) + { + fprintf( stderr, "lzcheck: member: LZ_decompress_read error: %s\n", + LZ_strerror( LZ_decompress_errno( decoder ) ) ); + retval = 3; break; + } + } + + if( out_size != in_size || memcmp( line_buf, out_buffer, out_size ) ) + { + fprintf( stderr, "lzcheck: LZ_compress_restart_member error: " + "in_size = %d, out_size = %d\n", in_size, out_size ); + show_line( line_buf, in_size ); + show_line( out_buffer, out_size ); + retval = 1; + } + } + + xclose_decoder( decoder, retval == 0 ); + xclose_encoder( encoder, retval == 0 ); + return retval; + } + + +int main( const int argc, const char * const argv[] ) + { + int retval = 0, i; + int open_failures = 0; + const char opt = ( argc > 2 && + ( strcmp( argv[1], "-m" ) == 0 || strcmp( argv[1], "-s" ) == 0 ) ) ? + argv[1][1] : 0; + const int first = opt ? 
2 : 1; + const bool verbose = ( opt != 0 || argc > first + 1 ); + + if( argc < 2 ) + { + fputs( "Usage: lzcheck [-m|-s] filename.txt...\n", stderr ); + return 1; + } + + for( i = first; i < argc && retval == 0; ++i ) + { + struct stat st; + if( stat( argv[i], &st ) != 0 || !S_ISREG( st.st_mode ) ) continue; + FILE * file = fopen( argv[i], "rb" ); + if( !file ) + { + fprintf( stderr, "lzcheck: Can't open file '%s' for reading.\n", argv[i] ); + ++open_failures; continue; + } + if( verbose ) fprintf( stderr, " Testing file '%s'\n", argv[i] ); + + /* 65535,16 chooses fast encoder */ + if( opt != 'm' ) retval = check_sync_flush( file, 65535 ); + if( retval == 0 && opt != 'm' ) + { next_line( file, 0 ); retval = check_sync_flush( file, 1 << 20 ); } + if( retval == 0 && opt != 's' ) + { next_line( file, 0 ); retval = check_members( file, 65535 ); } + if( retval == 0 && opt != 's' ) + { next_line( file, 0 ); retval = check_members( file, 1 << 20 ); } + fclose( file ); + } + if( open_failures > 0 && verbose ) + fprintf( stderr, "lzcheck: warning: %d %s failed to open.\n", + open_failures, ( open_failures == 1 ) ? "file" : "files" ); + if( retval == 0 && open_failures ) retval = 1; + return retval; + } @@ -0,0 +1,298 @@ +/* Lzlib - Compression library for the lzip format + Copyright (C) 2009-2022 Antonio Diaz Diaz. + + This library is free software. Redistribution and use in source and + binary forms, with or without modification, are permitted provided + that the following conditions are met: + + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions, and the following disclaimer. + + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions, and the following disclaimer in the + documentation and/or other materials provided with the distribution. 
+ + This library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. +*/ + +#ifndef max + #define max(x,y) ((x) >= (y) ? (x) : (y)) +#endif +#ifndef min + #define min(x,y) ((x) <= (y) ? (x) : (y)) +#endif + +typedef int State; + +enum { states = 12 }; + +static inline bool St_is_char( const State st ) { return st < 7; } + +static inline State St_set_char( const State st ) + { + static const State next[states] = { 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 4, 5 }; + return next[st]; + } + +static inline State St_set_char_rep() { return 8; } + +static inline State St_set_match( const State st ) + { return ( ( st < 7 ) ? 7 : 10 ); } + +static inline State St_set_rep( const State st ) + { return ( ( st < 7 ) ? 8 : 11 ); } + +static inline State St_set_short_rep( const State st ) + { return ( ( st < 7 ) ? 9 : 11 ); } + + +enum { + min_dictionary_bits = 12, + min_dictionary_size = 1 << min_dictionary_bits, /* >= modeled_distances */ + max_dictionary_bits = 29, + max_dictionary_size = 1 << max_dictionary_bits, + literal_context_bits = 3, + literal_pos_state_bits = 0, /* not used */ + pos_state_bits = 2, + pos_states = 1 << pos_state_bits, + pos_state_mask = pos_states - 1, + + len_states = 4, + dis_slot_bits = 6, + start_dis_model = 4, + end_dis_model = 14, + modeled_distances = 1 << (end_dis_model / 2), /* 128 */ + dis_align_bits = 4, + dis_align_size = 1 << dis_align_bits, + + len_low_bits = 3, + len_mid_bits = 3, + len_high_bits = 8, + len_low_symbols = 1 << len_low_bits, + len_mid_symbols = 1 << len_mid_bits, + len_high_symbols = 1 << len_high_bits, + max_len_symbols = len_low_symbols + len_mid_symbols + len_high_symbols, + + min_match_len = 2, /* must be 2 */ + max_match_len = min_match_len + max_len_symbols - 1, /* 273 */ + min_match_len_limit = 5 }; + +static inline int get_len_state( const int len ) + { return min( len - min_match_len, len_states - 1 ); } + 
+static inline int get_lit_state( const uint8_t prev_byte ) + { return prev_byte >> ( 8 - literal_context_bits ); } + + +enum { bit_model_move_bits = 5, + bit_model_total_bits = 11, + bit_model_total = 1 << bit_model_total_bits }; + +typedef int Bit_model; + +static inline void Bm_init( Bit_model * const probability ) + { *probability = bit_model_total / 2; } + +static inline void Bm_array_init( Bit_model bm[], const int size ) + { int i; for( i = 0; i < size; ++i ) Bm_init( &bm[i] ); } + +struct Len_model + { + Bit_model choice1; + Bit_model choice2; + Bit_model bm_low[pos_states][len_low_symbols]; + Bit_model bm_mid[pos_states][len_mid_symbols]; + Bit_model bm_high[len_high_symbols]; + }; + +static inline void Lm_init( struct Len_model * const lm ) + { + Bm_init( &lm->choice1 ); + Bm_init( &lm->choice2 ); + Bm_array_init( lm->bm_low[0], pos_states * len_low_symbols ); + Bm_array_init( lm->bm_mid[0], pos_states * len_mid_symbols ); + Bm_array_init( lm->bm_high, len_high_symbols ); + } + + +/* Table of CRCs of all 8-bit messages. 
*/ +static const uint32_t crc32[256] = + { + 0x00000000, 0x77073096, 0xEE0E612C, 0x990951BA, 0x076DC419, 0x706AF48F, + 0xE963A535, 0x9E6495A3, 0x0EDB8832, 0x79DCB8A4, 0xE0D5E91E, 0x97D2D988, + 0x09B64C2B, 0x7EB17CBD, 0xE7B82D07, 0x90BF1D91, 0x1DB71064, 0x6AB020F2, + 0xF3B97148, 0x84BE41DE, 0x1ADAD47D, 0x6DDDE4EB, 0xF4D4B551, 0x83D385C7, + 0x136C9856, 0x646BA8C0, 0xFD62F97A, 0x8A65C9EC, 0x14015C4F, 0x63066CD9, + 0xFA0F3D63, 0x8D080DF5, 0x3B6E20C8, 0x4C69105E, 0xD56041E4, 0xA2677172, + 0x3C03E4D1, 0x4B04D447, 0xD20D85FD, 0xA50AB56B, 0x35B5A8FA, 0x42B2986C, + 0xDBBBC9D6, 0xACBCF940, 0x32D86CE3, 0x45DF5C75, 0xDCD60DCF, 0xABD13D59, + 0x26D930AC, 0x51DE003A, 0xC8D75180, 0xBFD06116, 0x21B4F4B5, 0x56B3C423, + 0xCFBA9599, 0xB8BDA50F, 0x2802B89E, 0x5F058808, 0xC60CD9B2, 0xB10BE924, + 0x2F6F7C87, 0x58684C11, 0xC1611DAB, 0xB6662D3D, 0x76DC4190, 0x01DB7106, + 0x98D220BC, 0xEFD5102A, 0x71B18589, 0x06B6B51F, 0x9FBFE4A5, 0xE8B8D433, + 0x7807C9A2, 0x0F00F934, 0x9609A88E, 0xE10E9818, 0x7F6A0DBB, 0x086D3D2D, + 0x91646C97, 0xE6635C01, 0x6B6B51F4, 0x1C6C6162, 0x856530D8, 0xF262004E, + 0x6C0695ED, 0x1B01A57B, 0x8208F4C1, 0xF50FC457, 0x65B0D9C6, 0x12B7E950, + 0x8BBEB8EA, 0xFCB9887C, 0x62DD1DDF, 0x15DA2D49, 0x8CD37CF3, 0xFBD44C65, + 0x4DB26158, 0x3AB551CE, 0xA3BC0074, 0xD4BB30E2, 0x4ADFA541, 0x3DD895D7, + 0xA4D1C46D, 0xD3D6F4FB, 0x4369E96A, 0x346ED9FC, 0xAD678846, 0xDA60B8D0, + 0x44042D73, 0x33031DE5, 0xAA0A4C5F, 0xDD0D7CC9, 0x5005713C, 0x270241AA, + 0xBE0B1010, 0xC90C2086, 0x5768B525, 0x206F85B3, 0xB966D409, 0xCE61E49F, + 0x5EDEF90E, 0x29D9C998, 0xB0D09822, 0xC7D7A8B4, 0x59B33D17, 0x2EB40D81, + 0xB7BD5C3B, 0xC0BA6CAD, 0xEDB88320, 0x9ABFB3B6, 0x03B6E20C, 0x74B1D29A, + 0xEAD54739, 0x9DD277AF, 0x04DB2615, 0x73DC1683, 0xE3630B12, 0x94643B84, + 0x0D6D6A3E, 0x7A6A5AA8, 0xE40ECF0B, 0x9309FF9D, 0x0A00AE27, 0x7D079EB1, + 0xF00F9344, 0x8708A3D2, 0x1E01F268, 0x6906C2FE, 0xF762575D, 0x806567CB, + 0x196C3671, 0x6E6B06E7, 0xFED41B76, 0x89D32BE0, 0x10DA7A5A, 0x67DD4ACC, + 0xF9B9DF6F, 0x8EBEEFF9, 
0x17B7BE43, 0x60B08ED5, 0xD6D6A3E8, 0xA1D1937E, + 0x38D8C2C4, 0x4FDFF252, 0xD1BB67F1, 0xA6BC5767, 0x3FB506DD, 0x48B2364B, + 0xD80D2BDA, 0xAF0A1B4C, 0x36034AF6, 0x41047A60, 0xDF60EFC3, 0xA867DF55, + 0x316E8EEF, 0x4669BE79, 0xCB61B38C, 0xBC66831A, 0x256FD2A0, 0x5268E236, + 0xCC0C7795, 0xBB0B4703, 0x220216B9, 0x5505262F, 0xC5BA3BBE, 0xB2BD0B28, + 0x2BB45A92, 0x5CB36A04, 0xC2D7FFA7, 0xB5D0CF31, 0x2CD99E8B, 0x5BDEAE1D, + 0x9B64C2B0, 0xEC63F226, 0x756AA39C, 0x026D930A, 0x9C0906A9, 0xEB0E363F, + 0x72076785, 0x05005713, 0x95BF4A82, 0xE2B87A14, 0x7BB12BAE, 0x0CB61B38, + 0x92D28E9B, 0xE5D5BE0D, 0x7CDCEFB7, 0x0BDBDF21, 0x86D3D2D4, 0xF1D4E242, + 0x68DDB3F8, 0x1FDA836E, 0x81BE16CD, 0xF6B9265B, 0x6FB077E1, 0x18B74777, + 0x88085AE6, 0xFF0F6A70, 0x66063BCA, 0x11010B5C, 0x8F659EFF, 0xF862AE69, + 0x616BFFD3, 0x166CCF45, 0xA00AE278, 0xD70DD2EE, 0x4E048354, 0x3903B3C2, + 0xA7672661, 0xD06016F7, 0x4969474D, 0x3E6E77DB, 0xAED16A4A, 0xD9D65ADC, + 0x40DF0B66, 0x37D83BF0, 0xA9BCAE53, 0xDEBB9EC5, 0x47B2CF7F, 0x30B5FFE9, + 0xBDBDF21C, 0xCABAC28A, 0x53B39330, 0x24B4A3A6, 0xBAD03605, 0xCDD70693, + 0x54DE5729, 0x23D967BF, 0xB3667A2E, 0xC4614AB8, 0x5D681B02, 0x2A6F2B94, + 0xB40BBE37, 0xC30C8EA1, 0x5A05DF1B, 0x2D02EF8D }; + + +static inline void CRC32_update_byte( uint32_t * const crc, const uint8_t byte ) + { *crc = crc32[(*crc^byte)&0xFF] ^ ( *crc >> 8 ); } + +/* about as fast as it is possible without messing with endianness */ +static inline void CRC32_update_buf( uint32_t * const crc, + const uint8_t * const buffer, + const int size ) + { + int i; + uint32_t c = *crc; + for( i = 0; i < size; ++i ) + c = crc32[(c^buffer[i])&0xFF] ^ ( c >> 8 ); + *crc = c; + } + + +static inline bool isvalid_ds( const unsigned dictionary_size ) + { return ( dictionary_size >= min_dictionary_size && + dictionary_size <= max_dictionary_size ); } + + +static inline int real_bits( unsigned value ) + { + int bits = 0; + while( value > 0 ) { value >>= 1; ++bits; } + return bits; + } + + +static const uint8_t 
lzip_magic[4] = { 0x4C, 0x5A, 0x49, 0x50 }; /* "LZIP" */ + +typedef uint8_t Lzip_header[6]; /* 0-3 magic bytes */ + /* 4 version */ + /* 5 coded dictionary size */ +enum { Lh_size = 6 }; + +static inline void Lh_set_magic( Lzip_header data ) + { memcpy( data, lzip_magic, 4 ); data[4] = 1; } + +static inline bool Lh_verify_magic( const Lzip_header data ) + { return ( memcmp( data, lzip_magic, 4 ) == 0 ); } + +/* detect (truncated) header */ +static inline bool Lh_verify_prefix( const Lzip_header data, const int sz ) + { + int i; for( i = 0; i < sz && i < 4; ++i ) + if( data[i] != lzip_magic[i] ) return false; + return ( sz > 0 ); + } + +/* detect corrupt header */ +static inline bool Lh_verify_corrupt( const Lzip_header data ) + { + int matches = 0; + int i; for( i = 0; i < 4; ++i ) + if( data[i] == lzip_magic[i] ) ++matches; + return ( matches > 1 && matches < 4 ); + } + +static inline uint8_t Lh_version( const Lzip_header data ) + { return data[4]; } + +static inline bool Lh_verify_version( const Lzip_header data ) + { return ( data[4] == 1 ); } + +static inline unsigned Lh_get_dictionary_size( const Lzip_header data ) + { + unsigned sz = ( 1 << ( data[5] & 0x1F ) ); + if( sz > min_dictionary_size ) + sz -= ( sz / 16 ) * ( ( data[5] >> 5 ) & 7 ); + return sz; + } + +static inline bool Lh_set_dictionary_size( Lzip_header data, const unsigned sz ) + { + if( !isvalid_ds( sz ) ) return false; + data[5] = real_bits( sz - 1 ); + if( sz > min_dictionary_size ) + { + const unsigned base_size = 1 << data[5]; + const unsigned fraction = base_size / 16; + unsigned i; + for( i = 7; i >= 1; --i ) + if( base_size - ( i * fraction ) >= sz ) + { data[5] |= ( i << 5 ); break; } + } + return true; + } + +static inline bool Lh_verify( const Lzip_header data ) + { + return Lh_verify_magic( data ) && Lh_verify_version( data ) && + isvalid_ds( Lh_get_dictionary_size( data ) ); + } + + +typedef uint8_t Lzip_trailer[20]; + /* 0-3 CRC32 of the uncompressed data */ + /* 4-11 size of the 
uncompressed data */ + /* 12-19 member size including header and trailer */ +enum { Lt_size = 20 }; + +static inline unsigned Lt_get_data_crc( const Lzip_trailer data ) + { + unsigned tmp = 0; + int i; for( i = 3; i >= 0; --i ) { tmp <<= 8; tmp += data[i]; } + return tmp; + } + +static inline void Lt_set_data_crc( Lzip_trailer data, unsigned crc ) + { int i; for( i = 0; i <= 3; ++i ) { data[i] = (uint8_t)crc; crc >>= 8; } } + +static inline unsigned long long Lt_get_data_size( const Lzip_trailer data ) + { + unsigned long long tmp = 0; + int i; for( i = 11; i >= 4; --i ) { tmp <<= 8; tmp += data[i]; } + return tmp; + } + +static inline void Lt_set_data_size( Lzip_trailer data, unsigned long long sz ) + { int i; for( i = 4; i <= 11; ++i ) { data[i] = (uint8_t)sz; sz >>= 8; } } + +static inline unsigned long long Lt_get_member_size( const Lzip_trailer data ) + { + unsigned long long tmp = 0; + int i; for( i = 19; i >= 12; --i ) { tmp <<= 8; tmp += data[i]; } + return tmp; + } + +static inline void Lt_set_member_size( Lzip_trailer data, unsigned long long sz ) + { int i; for( i = 12; i <= 19; ++i ) { data[i] = (uint8_t)sz; sz >>= 8; } } @@ -0,0 +1,601 @@ +/* Lzlib - Compression library for the lzip format + Copyright (C) 2009-2022 Antonio Diaz Diaz. + + This library is free software. Redistribution and use in source and + binary forms, with or without modification, are permitted provided + that the following conditions are met: + + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions, and the following disclaimer. + + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions, and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + This library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
+*/ + +#include <stdbool.h> +#include <stdint.h> +#include <stdlib.h> +#include <string.h> + +#include "lzlib.h" +#include "lzip.h" +#include "cbuffer.c" +#include "decoder.h" +#include "decoder.c" +#include "encoder_base.h" +#include "encoder_base.c" +#include "encoder.h" +#include "encoder.c" +#include "fast_encoder.h" +#include "fast_encoder.c" + + +struct LZ_Encoder + { + unsigned long long partial_in_size; + unsigned long long partial_out_size; + struct LZ_encoder_base * lz_encoder_base; /* these 3 pointers make a */ + struct LZ_encoder * lz_encoder; /* polymorphic encoder */ + struct FLZ_encoder * flz_encoder; + enum LZ_Errno lz_errno; + bool fatal; + }; + +static void LZ_Encoder_init( struct LZ_Encoder * const e ) + { + e->partial_in_size = 0; + e->partial_out_size = 0; + e->lz_encoder_base = 0; + e->lz_encoder = 0; + e->flz_encoder = 0; + e->lz_errno = LZ_ok; + e->fatal = false; + } + + +struct LZ_Decoder + { + unsigned long long partial_in_size; + unsigned long long partial_out_size; + struct Range_decoder * rdec; + struct LZ_decoder * lz_decoder; + enum LZ_Errno lz_errno; + Lzip_header member_header; /* header of current member */ + bool fatal; + bool first_header; /* true until first header is read */ + bool seeking; + }; + +static void LZ_Decoder_init( struct LZ_Decoder * const d ) + { + int i; + d->partial_in_size = 0; + d->partial_out_size = 0; + d->rdec = 0; + d->lz_decoder = 0; + d->lz_errno = LZ_ok; + for( i = 0; i < Lh_size; ++i ) d->member_header[i] = 0; + d->fatal = false; + d->first_header = true; + d->seeking = false; + } + + +static bool verify_encoder( struct LZ_Encoder * const e ) + { + if( !e ) return false; + if( !e->lz_encoder_base || ( !e->lz_encoder && !e->flz_encoder ) || + ( e->lz_encoder && e->flz_encoder ) ) + { e->lz_errno = LZ_bad_argument; return false; } + return true; + } + + +static bool verify_decoder( struct LZ_Decoder * const d ) + { + if( !d ) return false; + if( !d->rdec ) + { d->lz_errno = LZ_bad_argument; return false; 
} + return true; + } + + +/* ------------------------- Misc Functions ------------------------- */ + +int LZ_api_version( void ) { return LZ_API_VERSION; } + +const char * LZ_version( void ) { return LZ_version_string; } + +const char * LZ_strerror( const enum LZ_Errno lz_errno ) + { + switch( lz_errno ) + { + case LZ_ok : return "ok"; + case LZ_bad_argument : return "Bad argument"; + case LZ_mem_error : return "Not enough memory"; + case LZ_sequence_error: return "Sequence error"; + case LZ_header_error : return "Header error"; + case LZ_unexpected_eof: return "Unexpected EOF"; + case LZ_data_error : return "Data error"; + case LZ_library_error : return "Library error"; + } + return "Invalid error code"; + } + + +int LZ_min_dictionary_bits( void ) { return min_dictionary_bits; } +int LZ_min_dictionary_size( void ) { return min_dictionary_size; } +int LZ_max_dictionary_bits( void ) { return max_dictionary_bits; } +int LZ_max_dictionary_size( void ) { return max_dictionary_size; } +int LZ_min_match_len_limit( void ) { return min_match_len_limit; } +int LZ_max_match_len_limit( void ) { return max_match_len; } + + +/* --------------------- Compression Functions --------------------- */ + +struct LZ_Encoder * LZ_compress_open( const int dictionary_size, + const int match_len_limit, + const unsigned long long member_size ) + { + Lzip_header header; + struct LZ_Encoder * const e = + (struct LZ_Encoder *)malloc( sizeof (struct LZ_Encoder) ); + if( !e ) return 0; + LZ_Encoder_init( e ); + if( !Lh_set_dictionary_size( header, dictionary_size ) || + match_len_limit < min_match_len_limit || + match_len_limit > max_match_len || + member_size < min_dictionary_size ) + e->lz_errno = LZ_bad_argument; + else + { + if( dictionary_size == 65535 && match_len_limit == 16 ) + { + e->flz_encoder = (struct FLZ_encoder *)malloc( sizeof (struct FLZ_encoder) ); + if( e->flz_encoder && FLZe_init( e->flz_encoder, member_size ) ) + { e->lz_encoder_base = &e->flz_encoder->eb; return e; } + 
free( e->flz_encoder ); e->flz_encoder = 0; + } + else + { + e->lz_encoder = (struct LZ_encoder *)malloc( sizeof (struct LZ_encoder) ); + if( e->lz_encoder && LZe_init( e->lz_encoder, Lh_get_dictionary_size( header ), + match_len_limit, member_size ) ) + { e->lz_encoder_base = &e->lz_encoder->eb; return e; } + free( e->lz_encoder ); e->lz_encoder = 0; + } + e->lz_errno = LZ_mem_error; + } + e->fatal = true; + return e; + } + + +int LZ_compress_close( struct LZ_Encoder * const e ) + { + if( !e ) return -1; + if( e->lz_encoder_base ) + { LZeb_free( e->lz_encoder_base ); + free( e->lz_encoder ); free( e->flz_encoder ); } + free( e ); + return 0; + } + + +int LZ_compress_finish( struct LZ_Encoder * const e ) + { + if( !verify_encoder( e ) || e->fatal ) return -1; + Mb_finish( &e->lz_encoder_base->mb ); + /* if (open --> write --> finish) use same dictionary size as lzip. */ + /* this does not save any memory. */ + if( Mb_data_position( &e->lz_encoder_base->mb ) == 0 && + Re_member_position( &e->lz_encoder_base->renc ) == Lh_size ) + { + Mb_adjust_dictionary_size( &e->lz_encoder_base->mb ); + Lh_set_dictionary_size( e->lz_encoder_base->renc.header, + e->lz_encoder_base->mb.dictionary_size ); + e->lz_encoder_base->renc.cb.buffer[5] = e->lz_encoder_base->renc.header[5]; + } + return 0; + } + + +int LZ_compress_restart_member( struct LZ_Encoder * const e, + const unsigned long long member_size ) + { + if( !verify_encoder( e ) || e->fatal ) return -1; + if( !LZeb_member_finished( e->lz_encoder_base ) ) + { e->lz_errno = LZ_sequence_error; return -1; } + if( member_size < min_dictionary_size ) + { e->lz_errno = LZ_bad_argument; return -1; } + + e->partial_in_size += Mb_data_position( &e->lz_encoder_base->mb ); + e->partial_out_size += Re_member_position( &e->lz_encoder_base->renc ); + + if( e->lz_encoder ) LZe_reset( e->lz_encoder, member_size ); + else FLZe_reset( e->flz_encoder, member_size ); + e->lz_errno = LZ_ok; + return 0; + } + + +int LZ_compress_sync_flush( struct 
LZ_Encoder * const e ) + { + if( !verify_encoder( e ) || e->fatal ) return -1; + if( !e->lz_encoder_base->mb.at_stream_end ) + e->lz_encoder_base->mb.sync_flush_pending = true; + return 0; + } + + +int LZ_compress_read( struct LZ_Encoder * const e, + uint8_t * const buffer, const int size ) + { + if( !verify_encoder( e ) || e->fatal ) return -1; + if( size < 0 ) return 0; + + { struct LZ_encoder_base * const eb = e->lz_encoder_base; + int out_size = Re_read_data( &eb->renc, buffer, size ); + /* minimize number of calls to encode_member */ + if( out_size < size || size == 0 ) + { + if( ( e->flz_encoder && !FLZe_encode_member( e->flz_encoder ) ) || + ( e->lz_encoder && !LZe_encode_member( e->lz_encoder ) ) ) + { e->lz_errno = LZ_library_error; e->fatal = true; return -1; } + if( eb->mb.sync_flush_pending && Mb_available_bytes( &eb->mb ) <= 0 ) + LZeb_try_sync_flush( eb ); + out_size += Re_read_data( &eb->renc, buffer + out_size, size - out_size ); + } + return out_size; } + } + + +int LZ_compress_write( struct LZ_Encoder * const e, + const uint8_t * const buffer, const int size ) + { + if( !verify_encoder( e ) || e->fatal ) return -1; + return Mb_write_data( &e->lz_encoder_base->mb, buffer, size ); + } + + +int LZ_compress_write_size( struct LZ_Encoder * const e ) + { + if( !verify_encoder( e ) || e->fatal ) return -1; + return Mb_free_bytes( &e->lz_encoder_base->mb ); + } + + +enum LZ_Errno LZ_compress_errno( struct LZ_Encoder * const e ) + { + if( !e ) return LZ_bad_argument; + return e->lz_errno; + } + + +int LZ_compress_finished( struct LZ_Encoder * const e ) + { + if( !verify_encoder( e ) ) return -1; + return ( Mb_data_finished( &e->lz_encoder_base->mb ) && + LZeb_member_finished( e->lz_encoder_base ) ); + } + + +int LZ_compress_member_finished( struct LZ_Encoder * const e ) + { + if( !verify_encoder( e ) ) return -1; + return LZeb_member_finished( e->lz_encoder_base ); + } + + +unsigned long long LZ_compress_data_position( struct LZ_Encoder * const e ) + { + 
if( !verify_encoder( e ) ) return 0; + return Mb_data_position( &e->lz_encoder_base->mb ); + } + + +unsigned long long LZ_compress_member_position( struct LZ_Encoder * const e ) + { + if( !verify_encoder( e ) ) return 0; + return Re_member_position( &e->lz_encoder_base->renc ); + } + + +unsigned long long LZ_compress_total_in_size( struct LZ_Encoder * const e ) + { + if( !verify_encoder( e ) ) return 0; + return e->partial_in_size + Mb_data_position( &e->lz_encoder_base->mb ); + } + + +unsigned long long LZ_compress_total_out_size( struct LZ_Encoder * const e ) + { + if( !verify_encoder( e ) ) return 0; + return e->partial_out_size + Re_member_position( &e->lz_encoder_base->renc ); + } + + +/* -------------------- Decompression Functions -------------------- */ + +struct LZ_Decoder * LZ_decompress_open( void ) + { + struct LZ_Decoder * const d = + (struct LZ_Decoder *)malloc( sizeof (struct LZ_Decoder) ); + if( !d ) return 0; + LZ_Decoder_init( d ); + + d->rdec = (struct Range_decoder *)malloc( sizeof (struct Range_decoder) ); + if( !d->rdec || !Rd_init( d->rdec ) ) + { + if( d->rdec ) { Rd_free( d->rdec ); free( d->rdec ); d->rdec = 0; } + d->lz_errno = LZ_mem_error; d->fatal = true; + } + return d; + } + + +int LZ_decompress_close( struct LZ_Decoder * const d ) + { + if( !d ) return -1; + if( d->lz_decoder ) + { LZd_free( d->lz_decoder ); free( d->lz_decoder ); } + if( d->rdec ) { Rd_free( d->rdec ); free( d->rdec ); } + free( d ); + return 0; + } + + +int LZ_decompress_finish( struct LZ_Decoder * const d ) + { + if( !verify_decoder( d ) || d->fatal ) return -1; + if( d->seeking ) + { d->seeking = false; d->partial_in_size += Rd_purge( d->rdec ); } + else Rd_finish( d->rdec ); + return 0; + } + + +int LZ_decompress_reset( struct LZ_Decoder * const d ) + { + if( !verify_decoder( d ) ) return -1; + if( d->lz_decoder ) + { LZd_free( d->lz_decoder ); free( d->lz_decoder ); d->lz_decoder = 0; } + d->partial_in_size = 0; + d->partial_out_size = 0; + Rd_reset( d->rdec 
);
+  d->lz_errno = LZ_ok;
+  d->fatal = false;
+  d->first_header = true;
+  d->seeking = false;
+  return 0;
+  }
+
+
+int LZ_decompress_sync_to_member( struct LZ_Decoder * const d )
+  {
+  unsigned skipped = 0;
+  if( !verify_decoder( d ) ) return -1;
+  if( d->lz_decoder )
+    { LZd_free( d->lz_decoder ); free( d->lz_decoder ); d->lz_decoder = 0; }
+  if( Rd_find_header( d->rdec, &skipped ) ) d->seeking = false;
+  else
+    {
+    if( !d->rdec->at_stream_end ) d->seeking = true;
+    else { d->seeking = false; d->partial_in_size += Rd_purge( d->rdec ); }
+    }
+  d->partial_in_size += skipped;
+  d->lz_errno = LZ_ok;
+  d->fatal = false;
+  return 0;
+  }
+
+
+int LZ_decompress_read( struct LZ_Decoder * const d,
+                        uint8_t * const buffer, const int size )
+  {
+  int result;
+  if( !verify_decoder( d ) ) return -1;
+  if( size < 0 ) return 0;
+  if( d->fatal )  /* don't return error until pending bytes are read */
+    { if( d->lz_decoder && !Cb_empty( &d->lz_decoder->cb ) ) goto get_data;
+      return -1; }
+  if( d->seeking ) return 0;
+
+  if( d->lz_decoder && LZd_member_finished( d->lz_decoder ) )
+    {
+    d->partial_out_size += LZd_data_position( d->lz_decoder );
+    LZd_free( d->lz_decoder ); free( d->lz_decoder ); d->lz_decoder = 0;
+    }
+  if( !d->lz_decoder )
+    {
+    int rd;
+    d->partial_in_size += d->rdec->member_position;
+    d->rdec->member_position = 0;
+    if( Rd_available_bytes( d->rdec ) < Lh_size + 5 &&
+        !d->rdec->at_stream_end ) return 0;
+    if( Rd_finished( d->rdec ) && !d->first_header ) return 0;
+    rd = Rd_read_data( d->rdec, d->member_header, Lh_size );
+    if( rd < Lh_size || Rd_finished( d->rdec ) )  /* End Of File */
+      {
+      if( rd <= 0 || Lh_verify_prefix( d->member_header, rd ) )
+        d->lz_errno = LZ_unexpected_eof;
+      else
+        d->lz_errno = LZ_header_error;
+      d->fatal = true;
+      return -1;
+      }
+    if( !Lh_verify_magic( d->member_header ) )
+      {
+      /* unreading the header prevents sync_to_member from skipping a member
+         if leading garbage is shorter than a full header; "lgLZIP\x01\x0C" */
+      if(
Rd_unread_data( d->rdec, rd ) )
+        {
+        if( d->first_header || !Lh_verify_corrupt( d->member_header ) )
+          d->lz_errno = LZ_header_error;
+        else
+          d->lz_errno = LZ_data_error;  /* corrupt header */
+        }
+      else
+        d->lz_errno = LZ_library_error;
+      d->fatal = true;
+      return -1;
+      }
+    if( !Lh_verify_version( d->member_header ) ||
+        !isvalid_ds( Lh_get_dictionary_size( d->member_header ) ) )
+      {
+      /* Skip a possible "LZIP" leading garbage; "LZIPLZIP\x01\x0C".
+         Leave member_pos pointing to the first error. */
+      if( Rd_unread_data( d->rdec, 1 + !Lh_verify_version( d->member_header ) ) )
+        d->lz_errno = LZ_data_error;  /* bad version or bad dict size */
+      else
+        d->lz_errno = LZ_library_error;
+      d->fatal = true;
+      return -1;
+      }
+    d->first_header = false;
+    if( Rd_available_bytes( d->rdec ) < 5 )
+      {
+      /* set position at EOF */
+      d->rdec->member_position += Cb_used_bytes( &d->rdec->cb );
+      Cb_reset( &d->rdec->cb );
+      d->lz_errno = LZ_unexpected_eof;
+      d->fatal = true;
+      return -1;
+      }
+    d->lz_decoder = (struct LZ_decoder *)malloc( sizeof (struct LZ_decoder) );
+    if( !d->lz_decoder || !LZd_init( d->lz_decoder, d->rdec,
+                            Lh_get_dictionary_size( d->member_header ) ) )
+      {  /* not enough free memory */
+      if( d->lz_decoder )
+        { LZd_free( d->lz_decoder ); free( d->lz_decoder ); d->lz_decoder = 0; }
+      d->lz_errno = LZ_mem_error;
+      d->fatal = true;
+      return -1;
+      }
+    d->rdec->reload_pending = true;
+    }
+  result = LZd_decode_member( d->lz_decoder );
+  if( result != 0 )
+    {
+    if( result == 2 )  /* set input position at EOF */
+      { d->rdec->member_position += Cb_used_bytes( &d->rdec->cb );
+        Cb_reset( &d->rdec->cb );
+        d->lz_errno = LZ_unexpected_eof; }
+    else if( result == 5 ) d->lz_errno = LZ_library_error;
+    else d->lz_errno = LZ_data_error;
+    d->fatal = true;
+    if( Cb_empty( &d->lz_decoder->cb ) ) return -1;
+    }
+get_data:
+  return Cb_read_data( &d->lz_decoder->cb, buffer, size );
+  }
+
+
+int LZ_decompress_write( struct LZ_Decoder * const d,
+                         const uint8_t * const buffer, const int
size )
+  {
+  int result;
+  if( !verify_decoder( d ) || d->fatal ) return -1;
+  if( size < 0 ) return 0;
+
+  result = Rd_write_data( d->rdec, buffer, size );
+  while( d->seeking )
+    {
+    int size2;
+    unsigned skipped = 0;
+    if( Rd_find_header( d->rdec, &skipped ) ) d->seeking = false;
+    d->partial_in_size += skipped;
+    if( result >= size ) break;
+    size2 = Rd_write_data( d->rdec, buffer + result, size - result );
+    if( size2 > 0 ) result += size2;
+    else break;
+    }
+  return result;
+  }
+
+
+int LZ_decompress_write_size( struct LZ_Decoder * const d )
+  {
+  if( !verify_decoder( d ) || d->fatal ) return -1;
+  return Rd_free_bytes( d->rdec );
+  }
+
+
+enum LZ_Errno LZ_decompress_errno( struct LZ_Decoder * const d )
+  {
+  if( !d ) return LZ_bad_argument;
+  return d->lz_errno;
+  }
+
+
+int LZ_decompress_finished( struct LZ_Decoder * const d )
+  {
+  if( !verify_decoder( d ) || d->fatal ) return -1;
+  return ( Rd_finished( d->rdec ) &&
+           ( !d->lz_decoder || LZd_member_finished( d->lz_decoder ) ) );
+  }
+
+
+int LZ_decompress_member_finished( struct LZ_Decoder * const d )
+  {
+  if( !verify_decoder( d ) || d->fatal ) return -1;
+  return ( d->lz_decoder && LZd_member_finished( d->lz_decoder ) );
+  }
+
+
+int LZ_decompress_member_version( struct LZ_Decoder * const d )
+  {
+  if( !verify_decoder( d ) ) return -1;
+  return Lh_version( d->member_header );
+  }
+
+
+int LZ_decompress_dictionary_size( struct LZ_Decoder * const d )
+  {
+  if( !verify_decoder( d ) ) return -1;
+  return Lh_get_dictionary_size( d->member_header );
+  }
+
+
+unsigned LZ_decompress_data_crc( struct LZ_Decoder * const d )
+  {
+  if( verify_decoder( d ) && d->lz_decoder )
+    return LZd_crc( d->lz_decoder );
+  return 0;
+  }
+
+
+unsigned long long LZ_decompress_data_position( struct LZ_Decoder * const d )
+  {
+  if( verify_decoder( d ) && d->lz_decoder )
+    return LZd_data_position( d->lz_decoder );
+  return 0;
+  }
+
+
+unsigned long long LZ_decompress_member_position( struct LZ_Decoder * const d )
+  {
+  if(
!verify_decoder( d ) ) return 0;
+  return d->rdec->member_position;
+  }
+
+
+unsigned long long LZ_decompress_total_in_size( struct LZ_Decoder * const d )
+  {
+  if( !verify_decoder( d ) ) return 0;
+  return d->partial_in_size + d->rdec->member_position;
+  }
+
+
+unsigned long long LZ_decompress_total_out_size( struct LZ_Decoder * const d )
+  {
+  if( !verify_decoder( d ) ) return 0;
+  if( d->lz_decoder )
+    return d->partial_out_size + LZd_data_position( d->lz_decoder );
+  return d->partial_out_size;
+  }
@@ -0,0 +1,110 @@
+/* Lzlib - Compression library for the lzip format
+   Copyright (C) 2009-2022 Antonio Diaz Diaz.
+
+   This library is free software. Redistribution and use in source and
+   binary forms, with or without modification, are permitted provided
+   that the following conditions are met:
+
+   1. Redistributions of source code must retain the above copyright
+   notice, this list of conditions, and the following disclaimer.
+
+   2. Redistributions in binary form must reproduce the above copyright
+   notice, this list of conditions, and the following disclaimer in the
+   documentation and/or other materials provided with the distribution.
+
+   This library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+*/
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* LZ_API_VERSION was first defined in lzlib 1.8 to 1.
+   Since lzlib 1.12, LZ_API_VERSION is defined as (major * 1000 + minor).
*/
+
+#define LZ_API_VERSION 1013
+
+static const char * const LZ_version_string = "1.13";
+
+enum LZ_Errno { LZ_ok = 0, LZ_bad_argument, LZ_mem_error,
+                LZ_sequence_error, LZ_header_error, LZ_unexpected_eof,
+                LZ_data_error, LZ_library_error };
+
+
+int LZ_api_version( void );  /* new in 1.12 */
+const char * LZ_version( void );
+const char * LZ_strerror( const enum LZ_Errno lz_errno );
+
+int LZ_min_dictionary_bits( void );
+int LZ_min_dictionary_size( void );
+int LZ_max_dictionary_bits( void );
+int LZ_max_dictionary_size( void );
+int LZ_min_match_len_limit( void );
+int LZ_max_match_len_limit( void );
+
+
+/* --------------------- Compression Functions --------------------- */
+
+struct LZ_Encoder;
+
+struct LZ_Encoder * LZ_compress_open( const int dictionary_size,
+                                      const int match_len_limit,
+                                      const unsigned long long member_size );
+int LZ_compress_close( struct LZ_Encoder * const encoder );
+
+int LZ_compress_finish( struct LZ_Encoder * const encoder );
+int LZ_compress_restart_member( struct LZ_Encoder * const encoder,
+                                const unsigned long long member_size );
+int LZ_compress_sync_flush( struct LZ_Encoder * const encoder );
+
+int LZ_compress_read( struct LZ_Encoder * const encoder,
+                      uint8_t * const buffer, const int size );
+int LZ_compress_write( struct LZ_Encoder * const encoder,
+                       const uint8_t * const buffer, const int size );
+int LZ_compress_write_size( struct LZ_Encoder * const encoder );
+
+enum LZ_Errno LZ_compress_errno( struct LZ_Encoder * const encoder );
+int LZ_compress_finished( struct LZ_Encoder * const encoder );
+int LZ_compress_member_finished( struct LZ_Encoder * const encoder );
+
+unsigned long long LZ_compress_data_position( struct LZ_Encoder * const encoder );
+unsigned long long LZ_compress_member_position( struct LZ_Encoder * const encoder );
+unsigned long long LZ_compress_total_in_size( struct LZ_Encoder * const encoder );
+unsigned long long LZ_compress_total_out_size( struct LZ_Encoder * const encoder );
+
+
+/*
-------------------- Decompression Functions -------------------- */
+
+struct LZ_Decoder;
+
+struct LZ_Decoder * LZ_decompress_open( void );
+int LZ_decompress_close( struct LZ_Decoder * const decoder );
+
+int LZ_decompress_finish( struct LZ_Decoder * const decoder );
+int LZ_decompress_reset( struct LZ_Decoder * const decoder );
+int LZ_decompress_sync_to_member( struct LZ_Decoder * const decoder );
+
+int LZ_decompress_read( struct LZ_Decoder * const decoder,
+                        uint8_t * const buffer, const int size );
+int LZ_decompress_write( struct LZ_Decoder * const decoder,
+                         const uint8_t * const buffer, const int size );
+int LZ_decompress_write_size( struct LZ_Decoder * const decoder );
+
+enum LZ_Errno LZ_decompress_errno( struct LZ_Decoder * const decoder );
+int LZ_decompress_finished( struct LZ_Decoder * const decoder );
+int LZ_decompress_member_finished( struct LZ_Decoder * const decoder );
+
+int LZ_decompress_member_version( struct LZ_Decoder * const decoder );
+int LZ_decompress_dictionary_size( struct LZ_Decoder * const decoder );
+unsigned LZ_decompress_data_crc( struct LZ_Decoder * const decoder );
+
+unsigned long long LZ_decompress_data_position( struct LZ_Decoder * const decoder );
+unsigned long long LZ_decompress_member_position( struct LZ_Decoder * const decoder );
+unsigned long long LZ_decompress_total_in_size( struct LZ_Decoder * const decoder );
+unsigned long long LZ_decompress_total_out_size( struct LZ_Decoder * const decoder );
+
+#ifdef __cplusplus
+}
+#endif
diff --git a/minilzip.c b/minilzip.c
new file mode 100644
index 0000000..f9313b2
--- /dev/null
+++ b/minilzip.c
@@ -0,0 +1,1290 @@
+/* Minilzip - Test program for the library lzlib
+   Copyright (C) 2009-2022 Antonio Diaz Diaz.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation, either version 2 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+/*
+   Exit status: 0 for a normal exit, 1 for environmental problems
+   (file not found, invalid flags, I/O errors, etc), 2 to indicate a
+   corrupt or invalid input file, 3 for an internal consistency error
+   (e.g., bug) which caused minilzip to panic.
+*/
+
+#define _FILE_OFFSET_BITS 64
+
+#include <ctype.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <limits.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <utime.h>
+#include <sys/stat.h>
+#if defined __MSVCRT__ || defined __OS2__ || defined __DJGPP__
+#include <io.h>
+#if defined __MSVCRT__
+#define fchmod(x,y) 0
+#define fchown(x,y,z) 0
+#define strtoull strtoul
+#define SIGHUP SIGTERM
+#define S_ISSOCK(x) 0
+#ifndef S_IRGRP
+#define S_IRGRP 0
+#define S_IWGRP 0
+#define S_IROTH 0
+#define S_IWOTH 0
+#endif
+#endif
+#if defined __DJGPP__
+#define S_ISSOCK(x) 0
+#define S_ISVTX 0
+#endif
+#endif
+
+#include "carg_parser.h"
+#include "lzlib.h"
+
+#ifndef O_BINARY
+#define O_BINARY 0
+#endif
+
+#if CHAR_BIT != 8
+#error "Environments where CHAR_BIT != 8 are not supported."
+#endif
+
+#if ( defined SIZE_MAX && SIZE_MAX < UINT_MAX ) || \
+    ( defined SSIZE_MAX && SSIZE_MAX < INT_MAX )
+#error "Environments where 'size_t' is narrower than 'int' are not supported."
+#endif
+
+#ifndef max
+  #define max(x,y) ((x) >= (y) ? (x) : (y))
+#endif
+#ifndef min
+  #define min(x,y) ((x) <= (y) ?
(x) : (y))
+#endif
+
+static void cleanup_and_fail( const int retval );
+static void show_error( const char * const msg, const int errcode,
+                        const bool help );
+static void show_file_error( const char * const filename,
+                             const char * const msg, const int errcode );
+static void internal_error( const char * const msg );
+static const char * const mem_msg = "Not enough memory.";
+
+int verbosity = 0;
+
+static const char * const program_name = "minilzip";
+static const char * const program_year = "2022";
+static const char * invocation_name = "minilzip";  /* default value */
+
+static const struct { const char * from; const char * to; } known_extensions[] = {
+  { ".lz", "" },
+  { ".tlz", ".tar" },
+  { 0, 0 } };
+
+struct Lzma_options
+  {
+  int dictionary_size;  /* 4 KiB .. 512 MiB */
+  int match_len_limit;  /* 5 .. 273 */
+  };
+
+enum Mode { m_compress, m_decompress, m_test };
+
+/* Variables used in signal handler context.
+   They are not declared volatile because the handler never returns. */
+static char * output_filename = 0;
+static int outfd = -1;
+static bool delete_output_on_interrupt = false;
+
+
+static void show_help( void )
+  {
+  printf( "Minilzip is a test program for the compression library lzlib, fully\n"
+          "compatible with lzip 1.4 or newer.\n"
+          "\nLzip is a lossless data compressor with a user interface similar to the one\n"
+          "of gzip or bzip2. Lzip uses a simplified form of the 'Lempel-Ziv-Markov\n"
+          "chain-Algorithm' (LZMA) stream format and provides a 3 factor integrity\n"
+          "checking to maximize interoperability and optimize safety. Lzip can compress\n"
+          "about as fast as gzip (lzip -0) or compress most files more than bzip2\n"
+          "(lzip -9). Decompression speed is intermediate between gzip and bzip2.\n"
+          "Lzip is better than gzip and bzip2 from a data recovery perspective.
Lzip\n"
+          "has been designed, written, and tested with great care to replace gzip and\n"
+          "bzip2 as the standard general-purpose compressed format for unix-like\n"
+          "systems.\n"
+          "\nUsage: %s [options] [files]\n", invocation_name );
+  printf( "\nOptions:\n"
+          "  -h, --help                     display this help and exit\n"
+          "  -V, --version                  output version information and exit\n"
+          "  -a, --trailing-error           exit with error status if trailing data\n"
+          "  -b, --member-size=<bytes>      set member size limit in bytes\n"
+          "  -c, --stdout                   write to standard output, keep input files\n"
+          "  -d, --decompress               decompress\n"
+          "  -f, --force                    overwrite existing output files\n"
+          "  -F, --recompress               force re-compression of compressed files\n"
+          "  -k, --keep                     keep (don't delete) input files\n"
+          "  -m, --match-length=<bytes>     set match length limit in bytes [36]\n"
+          "  -o, --output=<file>            write to <file>, keep input files\n"
+          "  -q, --quiet                    suppress all messages\n"
+          "  -s, --dictionary-size=<bytes>  set dictionary size limit in bytes [8 MiB]\n"
+          "  -S, --volume-size=<bytes>      set volume size limit in bytes\n"
+          "  -t, --test                     test compressed file integrity\n"
+          "  -v, --verbose                  be verbose (a 2nd -v gives more)\n"
+          "  -0 .. -9                       set compression level [default 6]\n"
+          "      --fast                     alias for -0\n"
+          "      --best                     alias for -9\n"
+          "      --loose-trailing           allow trailing data seeming corrupt header\n"
+          "      --check-lib                compare version of lzlib.h with liblz.{a,so}\n"
+          "\nIf no file names are given, or if a file is '-', minilzip compresses or\n"
+          "decompresses from standard input to standard output.\n"
+          "Numbers may be followed by a multiplier: k = kB = 10^3 = 1000,\n"
+          "Ki = KiB = 2^10 = 1024, M = 10^6, Mi = 2^20, G = 10^9, Gi = 2^30, etc...\n"
+          "Dictionary sizes 12 to 29 are interpreted as powers of two, meaning 2^12\n"
+          "to 2^29 bytes.\n"
+          "\nThe bidimensional parameter space of LZMA can't be mapped to a linear\n"
+          "scale optimal for all files.
If your files are large, very repetitive,\n"
+          "etc, you may need to use the options --dictionary-size and --match-length\n"
+          "directly to achieve optimal performance.\n"
+          "\nTo extract all the files from archive 'foo.tar.lz', use the commands\n"
+          "'tar -xf foo.tar.lz' or 'minilzip -cd foo.tar.lz | tar -xf -'.\n"
+          "\nExit status: 0 for a normal exit, 1 for environmental problems (file\n"
+          "not found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or\n"
+          "invalid input file, 3 for an internal consistency error (e.g., bug) which\n"
+          "caused minilzip to panic.\n"
+          "\nThe ideas embodied in lzlib are due to (at least) the following people:\n"
+          "Abraham Lempel and Jacob Ziv (for the LZ algorithm), Andrey Markov (for the\n"
+          "definition of Markov chains), G.N.N. Martin (for the definition of range\n"
+          "encoding), Igor Pavlov (for putting all the above together in LZMA), and\n"
+          "Julian Seward (for bzip2's CLI).\n"
+          "\nReport bugs to lzip-bug@nongnu.org\n"
+          "Lzlib home page: http://www.nongnu.org/lzip/lzlib.html\n" );
+  }
+
+
+static void show_version( void )
+  {
+  printf( "%s %s\n", program_name, PROGVERSION );
+  printf( "Copyright (C) %s Antonio Diaz Diaz.\n", program_year );
+  printf( "Using lzlib %s\n", LZ_version() );
+  printf( "License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>\n"
+          "This is free software: you are free to change and redistribute it.\n"
+          "There is NO WARRANTY, to the extent permitted by law.\n" );
+  }
+
+
+static inline void set_retval( int * retval, const int new_val )
+  { if( *retval < new_val ) *retval = new_val; }
+
+
+static int check_lzlib_ver()  /* <major>.<minor> or <major>.<minor>[a-z.-]* */
+  {
+#if defined LZ_API_VERSION && LZ_API_VERSION >= 1012
+  const unsigned char * p = (unsigned char *)LZ_version_string;
+  unsigned major = 0, minor = 0;
+  while( major < 100000 && isdigit( *p ) )
+    { major *= 10; major += *p - '0'; ++p; }
+  if( *p == '.'
) ++p;
+  else
+out: { show_error( "Invalid LZ_version_string in lzlib.h", 0, false ); return 2; }
+  while( minor < 100 && isdigit( *p ) )
+    { minor *= 10; minor += *p - '0'; ++p; }
+  if( *p && *p != '-' && *p != '.' && !islower( *p ) ) goto out;
+  const unsigned version = major * 1000 + minor;
+  if( LZ_API_VERSION != version )
+    {
+    if( verbosity >= 0 )
+      fprintf( stderr, "%s: Version mismatch in lzlib.h: "
+               "LZ_API_VERSION = %u, should be %u.\n",
+               program_name, LZ_API_VERSION, version );
+    return 2;
+    }
+#endif
+  return 0;
+  }
+
+
+static int check_lib()
+  {
+  int retval = check_lzlib_ver();
+  if( strcmp( LZ_version_string, LZ_version() ) != 0 )
+    { set_retval( &retval, 1 );
+      if( verbosity >= 0 )
+        printf( "warning: LZ_version_string != LZ_version() (%s vs %s)\n",
+                LZ_version_string, LZ_version() ); }
+#if defined LZ_API_VERSION && LZ_API_VERSION >= 1012
+  if( LZ_API_VERSION != LZ_api_version() )
+    { set_retval( &retval, 1 );
+      if( verbosity >= 0 )
+        printf( "warning: LZ_API_VERSION != LZ_api_version() (%u vs %u)\n",
+                LZ_API_VERSION, LZ_api_version() ); }
+#endif
+  if( verbosity >= 1 )
+    {
+    printf( "Using lzlib %s\n", LZ_version() );
+#if !defined LZ_API_VERSION
+    fputs( "LZ_API_VERSION is not defined.\n", stdout );
+#elif LZ_API_VERSION >= 1012
+    printf( "Using LZ_API_VERSION = %u\n", LZ_api_version() );
+#else
+    printf( "Compiled with LZ_API_VERSION = %u. "
+            "Using an unknown LZ_API_VERSION\n", LZ_API_VERSION );
+#endif
+    }
+  return retval;
+  }
+
+
+/* assure at least a minimum size for buffer 'buf' */
+static void * resize_buffer( void * buf, const unsigned min_size )
+  {
+  if( buf ) buf = realloc( buf, min_size );
+  else buf = malloc( min_size );
+  if( !buf ) { show_error( mem_msg, 0, false ); cleanup_and_fail( 1 ); }
+  return buf;
+  }
+
+
+struct Pretty_print
+  {
+  const char * name;
+  char * padded_name;
+  const char * stdin_name;
+  unsigned longest_name;
+  bool first_post;
+  };
+
+static void Pp_init( struct Pretty_print * const pp,
+                     const char * const filenames[], const int num_filenames )
+  {
+  pp->name = 0;
+  pp->padded_name = 0;
+  pp->stdin_name = "(stdin)";
+  pp->longest_name = 0;
+  pp->first_post = false;
+
+  if( verbosity <= 0 ) return;
+  const unsigned stdin_name_len = strlen( pp->stdin_name );
+  int i;
+  for( i = 0; i < num_filenames; ++i )
+    {
+    const char * const s = filenames[i];
+    const unsigned len = (strcmp( s, "-" ) == 0) ?
stdin_name_len : strlen( s );
+    if( pp->longest_name < len ) pp->longest_name = len;
+    }
+  if( pp->longest_name == 0 ) pp->longest_name = stdin_name_len;
+  }
+
+static void Pp_set_name( struct Pretty_print * const pp,
+                         const char * const filename )
+  {
+  unsigned name_len, padded_name_len, i = 0;
+
+  if( filename && filename[0] && strcmp( filename, "-" ) != 0 )
+    pp->name = filename;
+  else pp->name = pp->stdin_name;
+  name_len = strlen( pp->name );
+  padded_name_len = max( name_len, pp->longest_name ) + 4;
+  pp->padded_name = resize_buffer( pp->padded_name, padded_name_len + 1 );
+  while( i < 2 ) pp->padded_name[i++] = ' ';
+  while( i < name_len + 2 ) { pp->padded_name[i] = pp->name[i-2]; ++i; }
+  pp->padded_name[i++] = ':';
+  while( i < padded_name_len ) pp->padded_name[i++] = ' ';
+  pp->padded_name[i] = 0;
+  pp->first_post = true;
+  }
+
+static void Pp_reset( struct Pretty_print * const pp )
+  { if( pp->name && pp->name[0] ) pp->first_post = true; }
+
+static void Pp_show_msg( struct Pretty_print * const pp, const char * const msg )
+  {
+  if( verbosity < 0 ) return;
+  if( pp->first_post )
+    {
+    pp->first_post = false;
+    fputs( pp->padded_name, stderr );
+    if( !msg ) fflush( stderr );
+    }
+  if( msg ) fprintf( stderr, "%s\n", msg );
+  }
+
+
+static void show_header( const unsigned dictionary_size )
+  {
+  enum { factor = 1024 };
+  const char * const prefix[8] =
+    { "Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi", "Yi" };
+  const char * p = "";
+  const char * np = "  ";
+  unsigned num = dictionary_size;
+  bool exact = ( num % factor == 0 );
+
+  int i; for( i = 0; i < 8 && ( num > 9999 || ( exact && num >= factor ) ); ++i )
+    { num /= factor; if( num % factor != 0 ) exact = false;
+      p = prefix[i]; np = ""; }
+  fprintf( stderr, "dict %s%4u %sB, ", np, num, p );
+  }
+
+
+/* separate large numbers >= 100_000 in groups of 3 digits using '_' */
+static const char * format_num3( unsigned long long num )
+  {
+  const char * const si_prefix = "kMGTPEZY";
+  const char * const
binary_prefix = "KMGTPEZY";
+  enum { buffers = 8, bufsize = 4 * sizeof (long long) };
+  static char buffer[buffers][bufsize];  /* circle of static buffers for printf */
+  static int current = 0;
+  int i;
+  char * const buf = buffer[current++]; current %= buffers;
+  char * p = buf + bufsize - 1;  /* fill the buffer backwards */
+  *p = 0;  /* terminator */
+  if( num > 1024 )
+    {
+    char prefix = 0;  /* try binary first, then si */
+    for( i = 0; i < 8 && num >= 1024 && num % 1024 == 0; ++i )
+      { num /= 1024; prefix = binary_prefix[i]; }
+    if( prefix ) *(--p) = 'i';
+    else
+      for( i = 0; i < 8 && num >= 1000 && num % 1000 == 0; ++i )
+        { num /= 1000; prefix = si_prefix[i]; }
+    if( prefix ) *(--p) = prefix;
+    }
+  const bool split = num >= 100000;
+
+  for( i = 0; ; )
+    {
+    *(--p) = num % 10 + '0'; num /= 10; if( num == 0 ) break;
+    if( split && ++i >= 3 ) { i = 0; *(--p) = '_'; }
+    }
+  return p;
+  }
+
+
+static unsigned long long getnum( const char * const arg,
+                                  const char * const option_name,
+                                  const unsigned long long llimit,
+                                  const unsigned long long ulimit )
+  {
+  char * tail;
+  errno = 0;
+  unsigned long long result = strtoull( arg, &tail, 0 );
+  if( tail == arg )
+    {
+    if( verbosity >= 0 )
+      fprintf( stderr, "%s: Bad or missing numerical argument in "
+               "option '%s'.\n", program_name, option_name );
+    exit( 1 );
+    }
+
+  if( !errno && tail[0] )
+    {
+    const unsigned factor = ( tail[1] == 'i' ) ?
1024 : 1000;
+    int exponent = 0;  /* 0 = bad multiplier */
+    int i;
+    switch( tail[0] )
+      {
+      case 'Y': exponent = 8; break;
+      case 'Z': exponent = 7; break;
+      case 'E': exponent = 6; break;
+      case 'P': exponent = 5; break;
+      case 'T': exponent = 4; break;
+      case 'G': exponent = 3; break;
+      case 'M': exponent = 2; break;
+      case 'K': if( factor == 1024 ) exponent = 1; break;
+      case 'k': if( factor == 1000 ) exponent = 1; break;
+      }
+    if( exponent <= 0 )
+      {
+      if( verbosity >= 0 )
+        fprintf( stderr, "%s: Bad multiplier in numerical argument of "
+                 "option '%s'.\n", program_name, option_name );
+      exit( 1 );
+      }
+    for( i = 0; i < exponent; ++i )
+      {
+      if( ulimit / factor >= result ) result *= factor;
+      else { errno = ERANGE; break; }
+      }
+    }
+  if( !errno && ( result < llimit || result > ulimit ) ) errno = ERANGE;
+  if( errno )
+    {
+    if( verbosity >= 0 )
+      fprintf( stderr, "%s: Numerical argument out of limits [%s,%s] "
+               "in option '%s'.\n", program_name, format_num3( llimit ),
+               format_num3( ulimit ), option_name );
+    exit( 1 );
+    }
+  return result;
+  }
+
+
+static int get_dict_size( const char * const arg, const char * const option_name )
+  {
+  char * tail;
+  const long bits = strtol( arg, &tail, 0 );
+  if( bits >= LZ_min_dictionary_bits() &&
+      bits <= LZ_max_dictionary_bits() && *tail == 0 )
+    return 1 << bits;
+  int dictionary_size = getnum( arg, option_name, LZ_min_dictionary_size(),
+                                LZ_max_dictionary_size() );
+  if( dictionary_size == 65535 ) ++dictionary_size;  /* no fast encoder */
+  return dictionary_size;
+  }
+
+
+static void set_mode( enum Mode * const program_modep, const enum Mode new_mode )
+  {
+  if( *program_modep != m_compress && *program_modep != new_mode )
+    {
+    show_error( "Only one operation can be specified.", 0, true );
+    exit( 1 );
+    }
+  *program_modep = new_mode;
+  }
+
+
+static int extension_index( const char * const name )
+  {
+  int eindex;
+  for( eindex = 0; known_extensions[eindex].from; ++eindex )
+    {
+    const char * const ext =
known_extensions[eindex].from;
+    const unsigned name_len = strlen( name );
+    const unsigned ext_len = strlen( ext );
+    if( name_len > ext_len &&
+        strncmp( name + name_len - ext_len, ext, ext_len ) == 0 )
+      return eindex;
+    }
+  return -1;
+  }
+
+
+static void set_c_outname( const char * const name, const bool force_ext,
+                           const bool multifile )
+  {
+  output_filename = resize_buffer( output_filename, strlen( name ) + 5 +
+                                   strlen( known_extensions[0].from ) + 1 );
+  strcpy( output_filename, name );
+  if( multifile ) strcat( output_filename, "00001" );
+  if( force_ext || multifile )
+    strcat( output_filename, known_extensions[0].from );
+  }
+
+
+static void set_d_outname( const char * const name, const int eindex )
+  {
+  const unsigned name_len = strlen( name );
+  if( eindex >= 0 )
+    {
+    const char * const from = known_extensions[eindex].from;
+    const unsigned from_len = strlen( from );
+    if( name_len > from_len )
+      {
+      output_filename = resize_buffer( output_filename, name_len +
+                                       strlen( known_extensions[eindex].to ) + 1 );
+      strcpy( output_filename, name );
+      strcpy( output_filename + name_len - from_len, known_extensions[eindex].to );
+      return;
+      }
+    }
+  output_filename = resize_buffer( output_filename, name_len + 4 + 1 );
+  strcpy( output_filename, name );
+  strcat( output_filename, ".out" );
+  if( verbosity >= 1 )
+    fprintf( stderr, "%s: Can't guess original name for '%s' -- using '%s'\n",
+             program_name, name, output_filename );
+  }
+
+
+static int open_instream( const char * const name, struct stat * const in_statsp,
+                          const enum Mode program_mode, const int eindex,
+                          const bool one_to_one, const bool recompress )
+  {
+  if( program_mode == m_compress && !recompress && eindex >= 0 )
+    {
+    if( verbosity >= 0 )
+      fprintf( stderr, "%s: Input file '%s' already has '%s' suffix.\n",
+               program_name, name, known_extensions[eindex].from );
+    return -1;
+    }
+  int infd = open( name, O_RDONLY | O_BINARY );
+  if( infd < 0 )
+    show_file_error( name, "Can't open input file",
errno );
+  else
+    {
+    const int i = fstat( infd, in_statsp );
+    const mode_t mode = in_statsp->st_mode;
+    const bool can_read = ( i == 0 &&
+                            ( S_ISBLK( mode ) || S_ISCHR( mode ) ||
+                              S_ISFIFO( mode ) || S_ISSOCK( mode ) ) );
+    if( i != 0 || ( !S_ISREG( mode ) && ( !can_read || one_to_one ) ) )
+      {
+      if( verbosity >= 0 )
+        fprintf( stderr, "%s: Input file '%s' is not a regular file%s.\n",
+                 program_name, name, ( can_read && one_to_one ) ?
+                 ",\n  and neither '-c' nor '-o' were specified" : "" );
+      close( infd );
+      infd = -1;
+      }
+    }
+  return infd;
+  }
+
+
+static bool open_outstream( const bool force, const bool protect )
+  {
+  const mode_t usr_rw = S_IRUSR | S_IWUSR;
+  const mode_t all_rw = usr_rw | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH;
+  const mode_t outfd_mode = protect ? usr_rw : all_rw;
+  int flags = O_CREAT | O_WRONLY | O_BINARY;
+  if( force ) flags |= O_TRUNC; else flags |= O_EXCL;
+
+  outfd = open( output_filename, flags, outfd_mode );
+  if( outfd >= 0 ) delete_output_on_interrupt = true;
+  else if( verbosity >= 0 )
+    {
+    if( errno == EEXIST )
+      fprintf( stderr, "%s: Output file '%s' already exists, skipping.\n",
+               program_name, output_filename );
+    else
+      fprintf( stderr, "%s: Can't create output file '%s': %s\n",
+               program_name, output_filename, strerror( errno ) );
+    }
+  return ( outfd >= 0 );
+  }
+
+
+static void set_signals( void (*action)(int) )
+  {
+  signal( SIGHUP, action );
+  signal( SIGINT, action );
+  signal( SIGTERM, action );
+  }
+
+
+static void cleanup_and_fail( const int retval )
+  {
+  set_signals( SIG_IGN );  /* ignore signals */
+  if( delete_output_on_interrupt )
+    {
+    delete_output_on_interrupt = false;
+    if( verbosity >= 0 )
+      fprintf( stderr, "%s: Deleting output file '%s', if it exists.\n",
+               program_name, output_filename );
+    if( outfd >= 0 ) { close( outfd ); outfd = -1; }
+    if( remove( output_filename ) != 0 && errno != ENOENT )
+      show_error( "WARNING: deletion of output file (apparently) failed.", 0, false );
+    }
+  exit( retval );
+  }
+
+
+static void signal_handler( int sig )
+  {
+  if( sig ) {}  /* keep compiler happy */
+  show_error( "Control-C or similar caught, quitting.", 0, false );
+  cleanup_and_fail( 1 );
+  }
+
+
+static bool check_tty_in( const char * const input_filename, const int infd,
+                          const enum Mode program_mode, int * const retval )
+  {
+  if( ( program_mode == m_decompress || program_mode == m_test ) &&
+      isatty( infd ) )  /* for example /dev/tty */
+    { show_file_error( input_filename,
+                       "I won't read compressed data from a terminal.", 0 );
+      close( infd ); set_retval( retval, 2 );
+      if( program_mode != m_test ) cleanup_and_fail( *retval );
+      return false; }
+  return true;
+  }
+
+static bool check_tty_out( const enum Mode program_mode )
+  {
+  if( program_mode == m_compress && isatty( outfd ) )
+    { show_file_error( output_filename[0] ?
+                       output_filename : "(stdout)",
+                       "I won't write compressed data to a terminal.", 0 );
+      return false; }
+  return true;
+  }
+
+
+/* Set permissions, owner, and times. */
+static void close_and_set_permissions( const struct stat * const in_statsp )
+  {
+  bool warning = false;
+  if( in_statsp )
+    {
+    const mode_t mode = in_statsp->st_mode;
+    /* fchown will in many cases return with EPERM, which can be safely ignored. */
+    if( fchown( outfd, in_statsp->st_uid, in_statsp->st_gid ) == 0 )
+      { if( fchmod( outfd, mode ) != 0 ) warning = true; }
+    else
+      if( errno != EPERM ||
+          fchmod( outfd, mode & ~( S_ISUID | S_ISGID | S_ISVTX ) ) != 0 )
+        warning = true;
+    }
+  if( close( outfd ) != 0 )
+    {
+    show_error( "Error closing output file", errno, false );
+    cleanup_and_fail( 1 );
+    }
+  outfd = -1;
+  delete_output_on_interrupt = false;
+  if( in_statsp )
+    {
+    struct utimbuf t;
+    t.actime = in_statsp->st_atime;
+    t.modtime = in_statsp->st_mtime;
+    if( utime( output_filename, &t ) != 0 ) warning = true;
+    }
+  if( warning && verbosity >= 1 )
+    show_error( "Can't change output file attributes.", 0, false );
+  }
+
+
+/* Return the number of bytes really read.
+ If (value returned < size) and (errno == 0), means EOF was reached. +*/ +static int readblock( const int fd, uint8_t * const buf, const int size ) + { + int sz = 0; + errno = 0; + while( sz < size ) + { + const int n = read( fd, buf + sz, size - sz ); + if( n > 0 ) sz += n; + else if( n == 0 ) break; /* EOF */ + else if( errno != EINTR ) break; + errno = 0; + } + return sz; + } + + +/* Return the number of bytes really written. + If (value returned < size), it is always an error. +*/ +static int writeblock( const int fd, const uint8_t * const buf, const int size ) + { + int sz = 0; + errno = 0; + while( sz < size ) + { + const int n = write( fd, buf + sz, size - sz ); + if( n > 0 ) sz += n; + else if( n < 0 && errno != EINTR ) break; + errno = 0; + } + return sz; + } + + +static bool next_filename( void ) + { + const unsigned name_len = strlen( output_filename ); + const unsigned ext_len = strlen( known_extensions[0].from ); + int i, j; + if( name_len >= ext_len + 5 ) /* "*00001.lz" */ + for( i = name_len - ext_len - 1, j = 0; j < 5; --i, ++j ) + { + if( output_filename[i] < '9' ) { ++output_filename[i]; return true; } + else output_filename[i] = '0'; + } + return false; + } + + +static int do_compress( struct LZ_Encoder * const encoder, + const unsigned long long member_size, + const unsigned long long volume_size, const int infd, + struct Pretty_print * const pp, + const struct stat * const in_statsp ) + { + unsigned long long partial_volume_size = 0; + enum { buffer_size = 65536 }; + uint8_t buffer[buffer_size]; /* read/write buffer */ + if( verbosity >= 1 ) Pp_show_msg( pp, 0 ); + + while( true ) + { + int in_size = 0; + while( LZ_compress_write_size( encoder ) > 0 ) + { + const int size = min( LZ_compress_write_size( encoder ), buffer_size ); + const int rd = readblock( infd, buffer, size ); + if( rd != size && errno ) + { + Pp_show_msg( pp, 0 ); show_error( "Read error", errno, false ); + return 1; + } + if( rd > 0 && rd != LZ_compress_write( encoder, 
buffer, rd ) ) + internal_error( "library error (LZ_compress_write)." ); + if( rd < size ) LZ_compress_finish( encoder ); +/* else LZ_compress_sync_flush( encoder ); */ + in_size += rd; + } + const int out_size = LZ_compress_read( encoder, buffer, buffer_size ); + if( out_size < 0 ) + { + Pp_show_msg( pp, 0 ); + if( verbosity >= 0 ) + fprintf( stderr, "%s: LZ_compress_read error: %s\n", + program_name, LZ_strerror( LZ_compress_errno( encoder ) ) ); + return 1; + } + else if( out_size > 0 ) + { + const int wr = writeblock( outfd, buffer, out_size ); + if( wr != out_size ) + { + Pp_show_msg( pp, 0 ); show_error( "Write error", errno, false ); + return 1; + } + } + else if( in_size == 0 ) + internal_error( "library error (LZ_compress_read)." ); + if( LZ_compress_member_finished( encoder ) ) + { + unsigned long long size; + if( LZ_compress_finished( encoder ) == 1 ) break; + if( volume_size > 0 ) + { + partial_volume_size += LZ_compress_member_position( encoder ); + if( partial_volume_size >= volume_size - LZ_min_dictionary_size() ) + { + partial_volume_size = 0; + if( delete_output_on_interrupt ) + { + close_and_set_permissions( in_statsp ); + if( !next_filename() ) + { Pp_show_msg( pp, "Too many volume files." 
); return 1; } + if( !open_outstream( true, in_statsp ) ) return 1; + } + } + size = min( member_size, volume_size - partial_volume_size ); + } + else + size = member_size; + if( LZ_compress_restart_member( encoder, size ) < 0 ) + { + Pp_show_msg( pp, 0 ); + if( verbosity >= 0 ) + fprintf( stderr, "%s: LZ_compress_restart_member error: %s\n", + program_name, LZ_strerror( LZ_compress_errno( encoder ) ) ); + return 1; + } + } + } + + if( verbosity >= 1 ) + { + const unsigned long long in_size = LZ_compress_total_in_size( encoder ); + const unsigned long long out_size = LZ_compress_total_out_size( encoder ); + if( in_size == 0 || out_size == 0 ) + fputs( " no data compressed.\n", stderr ); + else + fprintf( stderr, "%6.3f:1, %5.2f%% ratio, %5.2f%% saved, " + "%llu in, %llu out.\n", + (double)in_size / out_size, + ( 100.0 * out_size ) / in_size, + 100.0 - ( ( 100.0 * out_size ) / in_size ), + in_size, out_size ); + } + return 0; + } + + +static int compress( const unsigned long long member_size, + const unsigned long long volume_size, const int infd, + const struct Lzma_options * const encoder_options, + struct Pretty_print * const pp, + const struct stat * const in_statsp ) + { + struct LZ_Encoder * const encoder = + LZ_compress_open( encoder_options->dictionary_size, + encoder_options->match_len_limit, ( volume_size > 0 ) ? + min( member_size, volume_size ) : member_size ); + int retval; + + if( !encoder || LZ_compress_errno( encoder ) != LZ_ok ) + { + if( !encoder || LZ_compress_errno( encoder ) == LZ_mem_error ) + Pp_show_msg( pp, "Not enough memory. Try a smaller dictionary size." ); + else + internal_error( "invalid argument to encoder." 
); + retval = 1; + } + else retval = do_compress( encoder, member_size, volume_size, + infd, pp, in_statsp ); + LZ_compress_close( encoder ); + return retval; + } + + +static int do_decompress( struct LZ_Decoder * const decoder, const int infd, + struct Pretty_print * const pp, const bool ignore_trailing, + const bool loose_trailing, const bool testing ) + { + enum { buffer_size = 65536 }; + uint8_t buffer[buffer_size]; /* read/write buffer */ + unsigned long long total_in = 0; /* to detect library stall */ + bool first_member; + + for( first_member = true; ; ) + { + const int max_in_size = + min( LZ_decompress_write_size( decoder ), buffer_size ); + int in_size = 0, out_size = 0; + if( max_in_size > 0 ) + { + in_size = readblock( infd, buffer, max_in_size ); + if( in_size != max_in_size && errno ) + { + Pp_show_msg( pp, 0 ); show_error( "Read error", errno, false ); + return 1; + } + if( in_size > 0 && in_size != LZ_decompress_write( decoder, buffer, in_size ) ) + internal_error( "library error (LZ_decompress_write)." ); + if( in_size < max_in_size ) LZ_decompress_finish( decoder ); + } + while( true ) + { + const int rd = + LZ_decompress_read( decoder, (outfd >= 0) ? buffer : 0, buffer_size ); + if( rd > 0 ) + { + out_size += rd; + if( outfd >= 0 ) + { + const int wr = writeblock( outfd, buffer, rd ); + if( wr != rd ) + { + Pp_show_msg( pp, 0 ); show_error( "Write error", errno, false ); + return 1; + } + } + } + else if( rd < 0 ) { out_size = rd; break; } + if( LZ_decompress_member_finished( decoder ) == 1 ) + { + if( verbosity >= 1 ) + { + const unsigned long long data_size = LZ_decompress_data_position( decoder ); + const unsigned long long member_size = LZ_decompress_member_position( decoder ); + if( verbosity >= 2 || ( verbosity == 1 && first_member ) ) + Pp_show_msg( pp, 0 ); + if( verbosity >= 2 ) + { + if( verbosity >= 4 ) + show_header( LZ_decompress_dictionary_size( decoder ) ); + if( data_size == 0 || member_size == 0 ) + fputs( "no data compressed. 
", stderr ); + else + fprintf( stderr, "%6.3f:1, %5.2f%% ratio, %5.2f%% saved. ", + (double)data_size / member_size, + ( 100.0 * member_size ) / data_size, + 100.0 - ( ( 100.0 * member_size ) / data_size ) ); + if( verbosity >= 4 ) + fprintf( stderr, "CRC %08X, ", LZ_decompress_data_crc( decoder ) ); + if( verbosity >= 3 ) + fprintf( stderr, "%9llu out, %8llu in. ", data_size, member_size ); + fputs( testing ? "ok\n" : "done\n", stderr ); Pp_reset( pp ); + } + } + first_member = false; /* member decompressed successfully */ + } + if( rd <= 0 ) break; + } + if( out_size < 0 || ( first_member && out_size == 0 ) ) + { + const unsigned long long member_pos = LZ_decompress_member_position( decoder ); + const enum LZ_Errno lz_errno = LZ_decompress_errno( decoder ); + if( lz_errno == LZ_library_error ) + internal_error( "library error (LZ_decompress_read)." ); + if( member_pos <= 6 ) + { + if( lz_errno == LZ_unexpected_eof ) + { + if( first_member ) + show_file_error( pp->name, "File ends unexpectedly at member header.", 0 ); + else + Pp_show_msg( pp, "Truncated header in multimember file." ); + return 2; + } + else if( lz_errno == LZ_data_error ) + { + if( member_pos == 4 ) + { if( verbosity >= 0 ) + { Pp_show_msg( pp, 0 ); + fprintf( stderr, "Version %d member format not supported.\n", + LZ_decompress_member_version( decoder ) ); } } + else if( member_pos == 5 ) + Pp_show_msg( pp, "Invalid dictionary size in member header." ); + else if( first_member ) /* for lzlib older than 1.10 */ + Pp_show_msg( pp, "Bad version or dictionary size in member header." ); + else if( !loose_trailing ) + Pp_show_msg( pp, "Corrupt header in multimember file." ); + else if( !ignore_trailing ) + Pp_show_msg( pp, "Trailing data not allowed." 
); + else break; /* trailing data */ + return 2; + } + } + if( lz_errno == LZ_header_error ) + { + if( first_member ) + show_file_error( pp->name, + "Bad magic number (file not in lzip format).", 0 ); + else if( !ignore_trailing ) + Pp_show_msg( pp, "Trailing data not allowed." ); + else break; /* trailing data */ + return 2; + } + if( lz_errno == LZ_mem_error ) { Pp_show_msg( pp, mem_msg ); return 1; } + if( verbosity >= 0 ) + { + Pp_show_msg( pp, 0 ); + fprintf( stderr, "%s at pos %llu\n", ( lz_errno == LZ_unexpected_eof ) ? + "File ends unexpectedly" : "Decoder error", + LZ_decompress_total_in_size( decoder ) ); + } + return 2; + } + if( LZ_decompress_finished( decoder ) == 1 ) break; + if( in_size == 0 && out_size == 0 ) + { + const unsigned long long size = LZ_decompress_total_in_size( decoder ); + if( total_in == size ) internal_error( "library error (stalled)." ); + total_in = size; + } + } + if( verbosity == 1 ) fputs( testing ? "ok\n" : "done\n", stderr ); + return 0; + } + + +static int decompress( const int infd, struct Pretty_print * const pp, + const bool ignore_trailing, + const bool loose_trailing, const bool testing ) + { + struct LZ_Decoder * const decoder = LZ_decompress_open(); + int retval; + + if( !decoder || LZ_decompress_errno( decoder ) != LZ_ok ) + { Pp_show_msg( pp, mem_msg ); retval = 1; } + else retval = do_decompress( decoder, infd, pp, ignore_trailing, + loose_trailing, testing ); + LZ_decompress_close( decoder ); + return retval; + } + + +static void show_error( const char * const msg, const int errcode, + const bool help ) + { + if( verbosity < 0 ) return; + if( msg && msg[0] ) + fprintf( stderr, "%s: %s%s%s\n", program_name, msg, + ( errcode > 0 ) ? ": " : "", + ( errcode > 0 ) ? 
strerror( errcode ) : "" ); + if( help ) + fprintf( stderr, "Try '%s --help' for more information.\n", + invocation_name ); + } + + +static void show_file_error( const char * const filename, + const char * const msg, const int errcode ) + { + if( verbosity >= 0 ) + fprintf( stderr, "%s: %s: %s%s%s\n", program_name, filename, msg, + ( errcode > 0 ) ? ": " : "", + ( errcode > 0 ) ? strerror( errcode ) : "" ); + } + + +static void internal_error( const char * const msg ) + { + if( verbosity >= 0 ) + fprintf( stderr, "%s: internal error: %s\n", program_name, msg ); + exit( 3 ); + } + + +int main( const int argc, const char * const argv[] ) + { + /* Mapping from gzip/bzip2 style 1..9 compression modes + to the corresponding LZMA compression modes. */ + const struct Lzma_options option_mapping[] = + { + { 65535, 16 }, /* -0 (65535,16 chooses fast encoder) */ + { 1 << 20, 5 }, /* -1 */ + { 3 << 19, 6 }, /* -2 */ + { 1 << 21, 8 }, /* -3 */ + { 3 << 20, 12 }, /* -4 */ + { 1 << 22, 20 }, /* -5 */ + { 1 << 23, 36 }, /* -6 */ + { 1 << 24, 68 }, /* -7 */ + { 3 << 23, 132 }, /* -8 */ + { 1 << 25, 273 } }; /* -9 */ + struct Lzma_options encoder_options = option_mapping[6]; /* default = "-6" */ + const unsigned long long max_member_size = 0x0008000000000000ULL; /* 2 PiB */ + const unsigned long long max_volume_size = 0x4000000000000000ULL; /* 4 EiB */ + unsigned long long member_size = max_member_size; + unsigned long long volume_size = 0; + const char * default_output_filename = ""; + enum Mode program_mode = m_compress; + int i; + bool force = false; + bool ignore_trailing = true; + bool keep_input_files = false; + bool loose_trailing = false; + bool recompress = false; + bool to_stdout = false; + if( argc > 0 ) invocation_name = argv[0]; + + enum { opt_chk = 256, opt_lt }; + const struct ap_Option options[] = + { + { '0', "fast", ap_no }, + { '1', 0, ap_no }, + { '2', 0, ap_no }, + { '3', 0, ap_no }, + { '4', 0, ap_no }, + { '5', 0, ap_no }, + { '6', 0, ap_no }, + { '7', 0, 
ap_no }, + { '8', 0, ap_no }, + { '9', "best", ap_no }, + { 'a', "trailing-error", ap_no }, + { 'b', "member-size", ap_yes }, + { 'c', "stdout", ap_no }, + { 'd', "decompress", ap_no }, + { 'f', "force", ap_no }, + { 'F', "recompress", ap_no }, + { 'h', "help", ap_no }, + { 'k', "keep", ap_no }, + { 'm', "match-length", ap_yes }, + { 'n', "threads", ap_yes }, + { 'o', "output", ap_yes }, + { 'q', "quiet", ap_no }, + { 's', "dictionary-size", ap_yes }, + { 'S', "volume-size", ap_yes }, + { 't', "test", ap_no }, + { 'v', "verbose", ap_no }, + { 'V', "version", ap_no }, + { opt_chk, "check-lib", ap_no }, + { opt_lt, "loose-trailing", ap_no }, + { 0, 0, ap_no } }; + + /* static because valgrind complains and memory management in C sucks */ + static struct Arg_parser parser; + if( !ap_init( &parser, argc, argv, options, 0 ) ) + { show_error( mem_msg, 0, false ); return 1; } + if( ap_error( &parser ) ) /* bad option */ + { show_error( ap_error( &parser ), 0, true ); return 1; } + + int argind = 0; + for( ; argind < ap_arguments( &parser ); ++argind ) + { + const int code = ap_code( &parser, argind ); + if( !code ) break; /* no more options */ + const char * const pn = ap_parsed_name( &parser, argind ); + const char * const arg = ap_argument( &parser, argind ); + switch( code ) + { + case '0': case '1': case '2': case '3': case '4': + case '5': case '6': case '7': case '8': case '9': + encoder_options = option_mapping[code-'0']; break; + case 'a': ignore_trailing = false; break; + case 'b': member_size = getnum( arg, pn, 100000, max_member_size ); break; + case 'c': to_stdout = true; break; + case 'd': set_mode( &program_mode, m_decompress ); break; + case 'f': force = true; break; + case 'F': recompress = true; break; + case 'h': show_help(); return 0; + case 'k': keep_input_files = true; break; + case 'm': encoder_options.match_len_limit = + getnum( arg, pn, LZ_min_match_len_limit(), + LZ_max_match_len_limit() ); break; + case 'n': break; + case 'o': if( strcmp( arg, 
"-" ) == 0 ) to_stdout = true; + else { default_output_filename = arg; } break; + case 'q': verbosity = -1; break; + case 's': encoder_options.dictionary_size = get_dict_size( arg, pn ); + break; + case 'S': volume_size = getnum( arg, pn, 100000, max_volume_size ); break; + case 't': set_mode( &program_mode, m_test ); break; + case 'v': if( verbosity < 4 ) ++verbosity; break; + case 'V': show_version(); return 0; + case opt_chk: return check_lib(); + case opt_lt: loose_trailing = true; break; + default : internal_error( "uncaught option." ); + } + } /* end process options */ + + if( strcmp( PROGVERSION, LZ_version_string ) != 0 ) + internal_error( "wrong PROGVERSION." ); +#if !defined LZ_API_VERSION || LZ_API_VERSION < 1012 +#error "lzlib 1.12 or newer needed." +#else + if( LZ_api_version() < 1012 ) /* minilzip passes null to LZ_decompress_read */ + { show_error( "lzlib 1.12 or newer needed. Try --check-lib.", 0, false ); + return 1; } + if( LZ_api_version() != LZ_API_VERSION ) show_error( + "warning: wrong library API version. Try --check-lib.", 0, false ); + else +#endif + if( strcmp( LZ_version_string, LZ_version() ) != 0 ) show_error( + "warning: wrong library version_string. 
Try --check-lib.", 0, false ); + +#if defined __MSVCRT__ || defined __OS2__ || defined __DJGPP__ + setmode( STDIN_FILENO, O_BINARY ); + setmode( STDOUT_FILENO, O_BINARY ); +#endif + + static const char ** filenames = 0; + int num_filenames = max( 1, ap_arguments( &parser ) - argind ); + filenames = resize_buffer( filenames, num_filenames * sizeof filenames[0] ); + filenames[0] = "-"; + + bool filenames_given = false; + for( i = 0; argind + i < ap_arguments( &parser ); ++i ) + { + filenames[i] = ap_argument( &parser, argind + i ); + if( strcmp( filenames[i], "-" ) != 0 ) filenames_given = true; + } + + if( program_mode == m_compress ) + { + if( volume_size > 0 && !to_stdout && default_output_filename[0] && + num_filenames > 1 ) + { show_error( "Only can compress one file when using '-o' and '-S'.", + 0, true ); return 1; } + } + else volume_size = 0; + if( program_mode == m_test ) to_stdout = false; /* apply overrides */ + if( program_mode == m_test || to_stdout ) default_output_filename = ""; + + output_filename = resize_buffer( output_filename, 1 ); + output_filename[0] = 0; + if( to_stdout && program_mode != m_test ) /* check tty only once */ + { outfd = STDOUT_FILENO; if( !check_tty_out( program_mode ) ) return 1; } + else outfd = -1; + + const bool to_file = !to_stdout && program_mode != m_test && + default_output_filename[0]; + if( !to_stdout && program_mode != m_test && ( filenames_given || to_file ) ) + set_signals( signal_handler ); + + static struct Pretty_print pp; + Pp_init( &pp, filenames, num_filenames ); + + int failed_tests = 0; + int retval = 0; + const bool one_to_one = !to_stdout && program_mode != m_test && !to_file; + bool stdin_used = false; + for( i = 0; i < num_filenames; ++i ) + { + const char * input_filename = ""; + int infd; + struct stat in_stats; + + Pp_set_name( &pp, filenames[i] ); + if( strcmp( filenames[i], "-" ) == 0 ) + { + if( stdin_used ) continue; else stdin_used = true; + infd = STDIN_FILENO; + if( !check_tty_in( pp.name, 
infd, program_mode, &retval ) ) continue; + if( one_to_one ) { outfd = STDOUT_FILENO; output_filename[0] = 0; } + } + else + { + const int eindex = extension_index( input_filename = filenames[i] ); + infd = open_instream( input_filename, &in_stats, program_mode, + eindex, one_to_one, recompress ); + if( infd < 0 ) { set_retval( &retval, 1 ); continue; } + if( !check_tty_in( pp.name, infd, program_mode, &retval ) ) continue; + if( one_to_one ) /* open outfd after verifying infd */ + { + if( program_mode == m_compress ) + set_c_outname( input_filename, true, volume_size > 0 ); + else set_d_outname( input_filename, eindex ); + if( !open_outstream( force, true ) ) + { close( infd ); set_retval( &retval, 1 ); continue; } + } + } + + if( one_to_one && !check_tty_out( program_mode ) ) + { set_retval( &retval, 1 ); return retval; } /* don't delete a tty */ + + if( to_file && outfd < 0 ) /* open outfd after verifying infd */ + { + if( program_mode == m_compress ) set_c_outname( default_output_filename, + false, volume_size > 0 ); + else + { output_filename = resize_buffer( output_filename, + strlen( default_output_filename ) + 1 ); + strcpy( output_filename, default_output_filename ); } + if( !open_outstream( force, false ) || !check_tty_out( program_mode ) ) + return 1; /* check tty only once and don't try to delete a tty */ + } + + const struct stat * const in_statsp = + ( input_filename[0] && one_to_one ) ? 
&in_stats : 0; + int tmp; + if( program_mode == m_compress ) + tmp = compress( member_size, volume_size, infd, &encoder_options, &pp, + in_statsp ); + else + tmp = decompress( infd, &pp, ignore_trailing, + loose_trailing, program_mode == m_test ); + if( close( infd ) != 0 ) + { show_file_error( pp.name, "Error closing input file", errno ); + set_retval( &tmp, 1 ); } + set_retval( &retval, tmp ); + if( tmp ) + { if( program_mode != m_test ) cleanup_and_fail( retval ); + else ++failed_tests; } + + if( delete_output_on_interrupt && one_to_one ) + close_and_set_permissions( in_statsp ); + if( input_filename[0] && !keep_input_files && one_to_one && + ( program_mode != m_compress || volume_size == 0 ) ) + remove( input_filename ); + } + if( delete_output_on_interrupt ) close_and_set_permissions( 0 ); /* -o */ + else if( outfd >= 0 && close( outfd ) != 0 ) /* -c */ + { + show_error( "Error closing stdout", errno, false ); + set_retval( &retval, 1 ); + } + if( failed_tests > 0 && verbosity >= 1 && num_filenames > 1 ) + fprintf( stderr, "%s: warning: %d %s failed the test.\n", + program_name, failed_tests, + ( failed_tests == 1 ) ? "file" : "files" ); + free( output_filename ); + free( filenames ); + ap_free( &parser ); + return retval; + } diff --git a/testsuite/check.sh b/testsuite/check.sh new file mode 100755 index 0000000..e93697e --- /dev/null +++ b/testsuite/check.sh @@ -0,0 +1,444 @@ +#! /bin/sh +# check script for Lzlib - Compression library for the lzip format +# Copyright (C) 2009-2022 Antonio Diaz Diaz. +# +# This script is free software: you have unlimited permission +# to copy, distribute, and modify it. + +LC_ALL=C +export LC_ALL +objdir=`pwd` +testdir=`cd "$1" ; pwd` +LZIP="${objdir}"/minilzip +BBEXAMPLE="${objdir}"/bbexample +FFEXAMPLE="${objdir}"/ffexample +LZCHECK="${objdir}"/lzcheck +framework_failure() { echo "failure in testing framework" ; exit 1 ; } + +if [ ! -f "${LZIP}" ] || [ ! 
-x "${LZIP}" ] ; then + echo "${LZIP}: cannot execute" + exit 1 +fi + +[ -e "${LZIP}" ] 2> /dev/null || + { + echo "$0: a POSIX shell is required to run the tests" + echo "Try bash -c \"$0 $1 $2\"" + exit 1 + } + +if [ -d tmp ] ; then rm -rf tmp ; fi +mkdir tmp +cd "${objdir}"/tmp || framework_failure + +cat "${testdir}"/test.txt > in || framework_failure +in_lz="${testdir}"/test.txt.lz +in_em="${testdir}"/test_em.txt.lz +fox_lf="${testdir}"/fox_lf +fox_lz="${testdir}"/fox.lz +fail=0 +test_failed() { fail=1 ; printf " $1" ; [ -z "$2" ] || printf "($2)" ; } + +"${LZIP}" --check-lib # just print warning +[ $? != 2 ] || { test_failed $LINENO ; exit 2 ; } # unless bad lzlib.h +printf "testing lzlib-%s..." "$2" + +"${LZIP}" -fkqm4 in +[ $? = 1 ] || test_failed $LINENO +[ ! -e in.lz ] || test_failed $LINENO +"${LZIP}" -fkqm274 in +[ $? = 1 ] || test_failed $LINENO +[ ! -e in.lz ] || test_failed $LINENO +for i in bad_size -1 0 4095 513MiB 1G 1T 1P 1E 1Z 1Y 10KB ; do + "${LZIP}" -fkqs $i in + [ $? = 1 ] || test_failed $LINENO $i + [ ! -e in.lz ] || test_failed $LINENO $i +done +"${LZIP}" -tq in +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -tq < in +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -cdq in +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -cdq < in +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -dq -o in < "${in_lz}" +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" -dq -o in "${in_lz}" +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" -dq -o out nx_file.lz +[ $? = 1 ] || test_failed $LINENO +[ ! -e out ] || test_failed $LINENO +"${LZIP}" -q -o out.lz nx_file +[ $? = 1 ] || test_failed $LINENO +[ ! -e out.lz ] || test_failed $LINENO +"${LZIP}" -qf -S100k -o out in in +[ $? = 1 ] || test_failed $LINENO +# these are for code coverage +"${LZIP}" -cdt "${in_lz}" > out 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" -t -- nx_file.lz 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" -t "" < /dev/null 2> /dev/null +[ $? 
= 1 ] || test_failed $LINENO +"${LZIP}" --help > /dev/null || test_failed $LINENO +"${LZIP}" -n1 -V > /dev/null || test_failed $LINENO +"${LZIP}" -m 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" -z 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" --bad_option 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" --t 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" --test=2 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" --output= 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" --output 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +printf "LZIP\001-.............................." | "${LZIP}" -t 2> /dev/null +printf "LZIP\002-.............................." | "${LZIP}" -t 2> /dev/null +printf "LZIP\001+.............................." | "${LZIP}" -t 2> /dev/null +rm -f out || framework_failure + +printf "\ntesting decompression..." + +for i in "${in_lz}" "${in_em}" "${testdir}"/test_sync.lz ; do + "${LZIP}" -t "$i" || test_failed $LINENO "$i" + "${LZIP}" -d "$i" -o copy || test_failed $LINENO "$i" + cmp in copy || test_failed $LINENO "$i" + "${LZIP}" -cd "$i" > copy || test_failed $LINENO "$i" + cmp in copy || test_failed $LINENO "$i" + "${LZIP}" -d "$i" -o - > copy || test_failed $LINENO "$i" + cmp in copy || test_failed $LINENO "$i" + "${LZIP}" -d < "$i" > copy || test_failed $LINENO "$i" + cmp in copy || test_failed $LINENO "$i" + rm -f copy || framework_failure +done + +lines=$("${LZIP}" -tvv "${in_em}" 2>&1 | wc -l) || test_failed $LINENO +[ "${lines}" -eq 8 ] || test_failed $LINENO "${lines}" + +"${LZIP}" -cd "${fox_lz}" > fox || test_failed $LINENO +cat "${in_lz}" > copy.lz || framework_failure +"${LZIP}" -dk copy.lz || test_failed $LINENO +cmp in copy || test_failed $LINENO +cat fox > copy || framework_failure +cat "${in_lz}" > out.lz || framework_failure +rm -f out || framework_failure +"${LZIP}" -d copy.lz out.lz 2> /dev/null # skip copy, decompress out +[ $? 
= 1 ] || test_failed $LINENO +cmp fox copy || test_failed $LINENO +cmp in out || test_failed $LINENO +"${LZIP}" -df copy.lz || test_failed $LINENO +[ ! -e copy.lz ] || test_failed $LINENO +cmp in copy || test_failed $LINENO +rm -f copy out || framework_failure + +cat "${in_lz}" > copy.lz || framework_failure +"${LZIP}" -d -S100k copy.lz || test_failed $LINENO # ignore -S +[ ! -e copy.lz ] || test_failed $LINENO +cmp in copy || test_failed $LINENO + +printf "to be overwritten" > copy || framework_failure +"${LZIP}" -df -o copy < "${in_lz}" || test_failed $LINENO +cmp in copy || test_failed $LINENO +rm -f out copy || framework_failure +"${LZIP}" -d -o ./- "${in_lz}" || test_failed $LINENO +cmp in ./- || test_failed $LINENO +rm -f ./- || framework_failure +"${LZIP}" -d -o ./- < "${in_lz}" || test_failed $LINENO +cmp in ./- || test_failed $LINENO +rm -f ./- || framework_failure + +cat "${in_lz}" > anyothername || framework_failure +"${LZIP}" -dv - anyothername - < "${in_lz}" > copy 2> /dev/null || + test_failed $LINENO +cmp in copy || test_failed $LINENO +cmp in anyothername.out || test_failed $LINENO +rm -f copy anyothername.out || framework_failure + +"${LZIP}" -tq in "${in_lz}" +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -tq nx_file.lz "${in_lz}" +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" -cdq in "${in_lz}" > copy +[ $? = 2 ] || test_failed $LINENO +cat copy in | cmp in - || test_failed $LINENO # copy must be empty +"${LZIP}" -cdq nx_file.lz "${in_lz}" > copy +[ $? = 1 ] || test_failed $LINENO +cmp in copy || test_failed $LINENO +rm -f copy || framework_failure +cat "${in_lz}" > copy.lz || framework_failure +for i in 1 2 3 4 5 6 7 ; do + printf "g" >> copy.lz || framework_failure + "${LZIP}" -atvvvv copy.lz "${in_lz}" 2> /dev/null + [ $? = 2 ] || test_failed $LINENO $i +done +"${LZIP}" -dq in copy.lz +[ $? = 2 ] || test_failed $LINENO +[ -e copy.lz ] || test_failed $LINENO +[ ! -e copy ] || test_failed $LINENO +[ ! 
-e in.out ] || test_failed $LINENO +"${LZIP}" -dq nx_file.lz copy.lz +[ $? = 1 ] || test_failed $LINENO +[ ! -e copy.lz ] || test_failed $LINENO +[ ! -e nx_file ] || test_failed $LINENO +cmp in copy || test_failed $LINENO + +cat in in > in2 || framework_failure +"${LZIP}" -t "${in_lz}" "${in_lz}" || test_failed $LINENO +"${LZIP}" -cd "${in_lz}" "${in_lz}" -o out > copy2 || test_failed $LINENO +[ ! -e out ] || test_failed $LINENO # override -o +cmp in2 copy2 || test_failed $LINENO +rm -f copy2 || framework_failure +"${LZIP}" -d "${in_lz}" "${in_lz}" -o copy2 || test_failed $LINENO +cmp in2 copy2 || test_failed $LINENO +rm -f copy2 || framework_failure + +cat "${in_lz}" "${in_lz}" > copy2.lz || framework_failure +printf "\ngarbage" >> copy2.lz || framework_failure +"${LZIP}" -tvvvv copy2.lz 2> /dev/null || test_failed $LINENO +"${LZIP}" -atq copy2.lz +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -atq < copy2.lz +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -adkq copy2.lz +[ $? = 2 ] || test_failed $LINENO +[ ! -e copy2 ] || test_failed $LINENO +"${LZIP}" -adkq -o copy2 < copy2.lz +[ $? = 2 ] || test_failed $LINENO +[ ! -e copy2 ] || test_failed $LINENO +printf "to be overwritten" > copy2 || framework_failure +"${LZIP}" -df copy2.lz || test_failed $LINENO +cmp in2 copy2 || test_failed $LINENO +rm -f copy2 || framework_failure + +printf "\ntesting compression..." + +"${LZIP}" -c -0 in in in -S100k -o out3.lz > copy2.lz || test_failed $LINENO +[ ! -e out3.lz ] || test_failed $LINENO # override -o and -S +"${LZIP}" -0f in in --output=copy2.lz || test_failed $LINENO +"${LZIP}" -d copy2.lz -o out2 || test_failed $LINENO +cmp in2 out2 || test_failed $LINENO +rm -f out2 copy2.lz || framework_failure + +"${LZIP}" -cf "${in_lz}" > out 2> /dev/null # /dev/null is a tty on OS/2 +[ $? 
= 1 ] || test_failed $LINENO +"${LZIP}" -Fvvm36 -o - -s16 "${in_lz}" > out 2> /dev/null || test_failed $LINENO +"${LZIP}" -cd out | "${LZIP}" -d > copy || test_failed $LINENO +cmp in copy || test_failed $LINENO + +"${LZIP}" -0 -o ./- in || test_failed $LINENO +"${LZIP}" -cd ./- | cmp in - || test_failed $LINENO +rm -f ./- || framework_failure +"${LZIP}" -0 -o ./- < in || test_failed $LINENO # don't add .lz +[ ! -e ./-.lz ] || test_failed $LINENO +"${LZIP}" -cd ./- | cmp in - || test_failed $LINENO +rm -f ./- || framework_failure + +for i in s4Ki 0 1 2 3 4 5 6 7 8 9 ; do + "${LZIP}" -k -$i -s16 in || test_failed $LINENO $i + mv -f in.lz copy.lz || test_failed $LINENO $i + printf "garbage" >> copy.lz || framework_failure + "${LZIP}" -df copy.lz || test_failed $LINENO $i + cmp in copy || test_failed $LINENO $i + + "${LZIP}" -$i -s16 in -c > out || test_failed $LINENO $i + "${LZIP}" -$i -s16 in -o o_out || test_failed $LINENO $i # don't add .lz + [ ! -e o_out.lz ] || test_failed $LINENO + cmp out o_out || test_failed $LINENO $i + rm -f o_out || framework_failure + printf "g" >> out || framework_failure + "${LZIP}" -cd out > copy || test_failed $LINENO $i + cmp in copy || test_failed $LINENO $i + + "${LZIP}" -$i -s16 < in > out || test_failed $LINENO $i + "${LZIP}" -d < out > copy || test_failed $LINENO $i + cmp in copy || test_failed $LINENO $i + + rm -f out.lz || framework_failure + printf "to be overwritten" > out || framework_failure # don't add .lz + "${LZIP}" -f -$i -s16 -o out < in || test_failed $LINENO $i + [ ! -e out.lz ] || test_failed $LINENO + "${LZIP}" -df -o copy < out || test_failed $LINENO $i + cmp in copy || test_failed $LINENO $i +done +rm -f out out.lz || framework_failure + +cat in in in in in in in in > in8 || framework_failure +"${LZIP}" -1s12 -S100k in8 || test_failed $LINENO +"${LZIP}" -t in800001.lz in800002.lz || test_failed $LINENO +"${LZIP}" -cd in800001.lz in800002.lz | cmp in8 - || test_failed $LINENO +[ ! 
-e in800003.lz ] || test_failed $LINENO +rm -f in800001.lz in800002.lz || framework_failure +"${LZIP}" -1s12 -S100k -o out.lz in8 || test_failed $LINENO +# ignore -S +"${LZIP}" -d out.lz00001.lz out.lz00002.lz -S100k -o out || test_failed $LINENO +cmp in8 out || test_failed $LINENO +"${LZIP}" -t out.lz00001.lz out.lz00002.lz || test_failed $LINENO +[ ! -e out.lz00003.lz ] || test_failed $LINENO +rm -f out out.lz00001.lz out.lz00002.lz || framework_failure +"${LZIP}" -1ks4Ki -b100000 in8 || test_failed $LINENO +"${LZIP}" -t in8.lz || test_failed $LINENO +"${LZIP}" -cd in8.lz -o out | cmp in8 - || test_failed $LINENO # override -o +[ ! -e out ] || test_failed $LINENO +"${LZIP}" -0 -S100k -o out < in8.lz || test_failed $LINENO +"${LZIP}" -t out00001.lz out00002.lz || test_failed $LINENO +"${LZIP}" -cd out00001.lz out00002.lz | cmp in8.lz - || test_failed $LINENO +[ ! -e out00003.lz ] || test_failed $LINENO +rm -f out00001.lz || framework_failure +"${LZIP}" -1 -S100k -o out < in8.lz || test_failed $LINENO +"${LZIP}" -t out00001.lz out00002.lz || test_failed $LINENO +"${LZIP}" -cd out00001.lz out00002.lz | cmp in8.lz - || test_failed $LINENO +[ ! -e out00003.lz ] || test_failed $LINENO +rm -f out00001.lz out00002.lz || framework_failure +"${LZIP}" -0 -F -S100k in8.lz || test_failed $LINENO +"${LZIP}" -t in8.lz00001.lz in8.lz00002.lz || test_failed $LINENO +"${LZIP}" -cd in8.lz00001.lz in8.lz00002.lz | cmp in8.lz - || test_failed $LINENO +[ ! 
-e in8.lz00003.lz ] || test_failed $LINENO +rm -f in8.lz00001.lz in8.lz00002.lz || framework_failure +"${LZIP}" -0kF -b100k in8.lz || test_failed $LINENO +"${LZIP}" -t in8.lz.lz || test_failed $LINENO +"${LZIP}" -cd in8.lz.lz | cmp in8.lz - || test_failed $LINENO +rm -f in8.lz in8.lz.lz || framework_failure + +"${BBEXAMPLE}" in || test_failed $LINENO +"${BBEXAMPLE}" "${in_lz}" || test_failed $LINENO +"${BBEXAMPLE}" "${fox_lf}" || test_failed $LINENO + +"${FFEXAMPLE}" -h > /dev/null || test_failed $LINENO +"${FFEXAMPLE}" > /dev/null && test_failed $LINENO +rm -f out || framework_failure +"${FFEXAMPLE}" -b in out || test_failed $LINENO +cmp in out || test_failed $LINENO +"${FFEXAMPLE}" -b in | cmp in - || test_failed $LINENO +"${FFEXAMPLE}" -b in8 | cmp in8 - || test_failed $LINENO +"${FFEXAMPLE}" -b "${fox_lf}" | cmp "${fox_lf}" - || test_failed $LINENO +"${FFEXAMPLE}" -d "${in_lz}" - | cmp in - || test_failed $LINENO +"${FFEXAMPLE}" -d "${in_em}" - | cmp in - || test_failed $LINENO +"${FFEXAMPLE}" -c in | "${FFEXAMPLE}" -d | cmp in - || test_failed $LINENO +"${FFEXAMPLE}" -m in | "${FFEXAMPLE}" -d | cmp in - || test_failed $LINENO +"${FFEXAMPLE}" -l in | "${FFEXAMPLE}" -d | cmp in - || test_failed $LINENO +cat "${fox_lf}" "${in_lz}" | "${FFEXAMPLE}" -r | cmp in - || test_failed $LINENO +cat in8 "${in_lz}" | "${FFEXAMPLE}" -r | cmp in - || test_failed $LINENO +cat "${in_lz}" "${fox_lf}" "${in_lz}" | "${FFEXAMPLE}" -r - | cmp in2 - || + test_failed $LINENO +cat "${in_lz}" in8 "${in_lz}" | "${FFEXAMPLE}" -r - - | cmp in2 - || + test_failed $LINENO + +"${LZCHECK}" in || test_failed $LINENO +"${LZCHECK}" "${in_lz}" || test_failed $LINENO +"${LZCHECK}" "${fox_lf}" || test_failed $LINENO +rm -f in8 || framework_failure + +printf "\ntesting bad input..." 
+ +headers='LZIp LZiP LZip LzIP LzIp LziP lZIP lZIp lZiP lzIP' +body='\001\014\000\203\377\373\377\377\300\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000$\000\000\000\000\000\000\000' +cat "${in_lz}" > int.lz +printf "LZIP${body}" >> int.lz +if "${LZIP}" -tq int.lz ; then + for header in ${headers} ; do + printf "${header}${body}" > int.lz # first member + "${LZIP}" -tq int.lz + [ $? = 2 ] || test_failed $LINENO ${header} + "${LZIP}" -tq < int.lz + [ $? = 2 ] || test_failed $LINENO ${header} + "${LZIP}" -cdq int.lz > /dev/null + [ $? = 2 ] || test_failed $LINENO ${header} + "${LZIP}" -tq --loose-trailing int.lz + [ $? = 2 ] || test_failed $LINENO ${header} + "${LZIP}" -tq --loose-trailing < int.lz + [ $? = 2 ] || test_failed $LINENO ${header} + "${LZIP}" -cdq --loose-trailing int.lz > /dev/null + [ $? = 2 ] || test_failed $LINENO ${header} + cat "${in_lz}" > int.lz + printf "${header}${body}" >> int.lz # trailing data + "${LZIP}" -tq int.lz + [ $? = 2 ] || test_failed $LINENO ${header} + "${LZIP}" -tq < int.lz + [ $? = 2 ] || test_failed $LINENO ${header} + "${LZIP}" -cdq int.lz > /dev/null + [ $? = 2 ] || test_failed $LINENO ${header} + "${LZIP}" -t --loose-trailing int.lz || + test_failed $LINENO ${header} + "${LZIP}" -t --loose-trailing < int.lz || + test_failed $LINENO ${header} + "${LZIP}" -cd --loose-trailing int.lz > /dev/null || + test_failed $LINENO ${header} + "${LZIP}" -tq --loose-trailing --trailing-error int.lz + [ $? = 2 ] || test_failed $LINENO ${header} + "${LZIP}" -tq --loose-trailing --trailing-error < int.lz + [ $? = 2 ] || test_failed $LINENO ${header} + "${LZIP}" -cdq --loose-trailing --trailing-error int.lz > /dev/null + [ $? = 2 ] || test_failed $LINENO ${header} + done +else + printf "\nwarning: skipping header test: 'printf' does not work on your system." 
+fi +rm -f int.lz || framework_failure + +for i in fox_v2.lz fox_s11.lz fox_de20.lz \ + fox_bcrc.lz fox_crc0.lz fox_das46.lz fox_mes81.lz ; do + "${LZIP}" -tq "${testdir}"/$i + [ $? = 2 ] || test_failed $LINENO $i +done + +for i in fox_bcrc.lz fox_crc0.lz fox_das46.lz fox_mes81.lz ; do + "${LZIP}" -cdq "${testdir}"/$i > out + [ $? = 2 ] || test_failed $LINENO $i + cmp fox out || test_failed $LINENO $i +done +rm -f fox out || framework_failure + +cat "${in_lz}" "${in_lz}" > in2.lz || framework_failure +cat "${in_lz}" "${in_lz}" "${in_lz}" > in3.lz || framework_failure +if dd if=in3.lz of=trunc.lz bs=14752 count=1 2> /dev/null && + [ -e trunc.lz ] && cmp in2.lz trunc.lz > /dev/null 2>&1 ; then + for i in 6 20 14734 14753 14754 14755 14756 14757 14758 ; do + dd if=in3.lz of=trunc.lz bs=$i count=1 2> /dev/null + "${LZIP}" -tq trunc.lz + [ $? = 2 ] || test_failed $LINENO $i + "${LZIP}" -tq < trunc.lz + [ $? = 2 ] || test_failed $LINENO $i + "${LZIP}" -cdq trunc.lz > out + [ $? = 2 ] || test_failed $LINENO $i + "${LZIP}" -dq < trunc.lz > out + [ $? = 2 ] || test_failed $LINENO $i + done +else + printf "\nwarning: skipping truncation test: 'dd' does not work on your system." +fi +rm -f in2.lz in3.lz trunc.lz out || framework_failure + +cat "${in_lz}" > ingin.lz || framework_failure +printf "g" >> ingin.lz || framework_failure +cat "${in_lz}" >> ingin.lz || framework_failure +"${LZIP}" -atq ingin.lz +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -atq < ingin.lz +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -acdq ingin.lz > out +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -adq < ingin.lz > out +[ $? 
= 2 ] || test_failed $LINENO +"${LZIP}" -t ingin.lz || test_failed $LINENO +"${LZIP}" -t < ingin.lz || test_failed $LINENO +"${LZIP}" -cd ingin.lz > copy || test_failed $LINENO +cmp in copy || test_failed $LINENO +"${LZIP}" -d < ingin.lz > copy || test_failed $LINENO +cmp in copy || test_failed $LINENO +"${FFEXAMPLE}" -d ingin.lz | cmp in - || test_failed $LINENO +"${FFEXAMPLE}" -r ingin.lz | cmp in2 - || test_failed $LINENO +rm -f copy ingin.lz in2 out || framework_failure + +echo +if [ ${fail} = 0 ] ; then + echo "tests completed successfully." + cd "${objdir}" && rm -r tmp +else + echo "tests failed." +fi +exit ${fail} diff --git a/testsuite/fox.lz b/testsuite/fox.lz Binary files differ new file mode 100644 index 0000000..509da82 --- /dev/null +++ b/testsuite/fox.lz diff --git a/testsuite/fox_bcrc.lz b/testsuite/fox_bcrc.lz Binary files differ new file mode 100644 index 0000000..8f6a7c4 --- /dev/null +++ b/testsuite/fox_bcrc.lz diff --git a/testsuite/fox_crc0.lz b/testsuite/fox_crc0.lz Binary files differ new file mode 100644 index 0000000..1abe926 --- /dev/null +++ b/testsuite/fox_crc0.lz diff --git a/testsuite/fox_das46.lz b/testsuite/fox_das46.lz Binary files differ new file mode 100644 index 0000000..43ed9f9 --- /dev/null +++ b/testsuite/fox_das46.lz diff --git a/testsuite/fox_de20.lz b/testsuite/fox_de20.lz Binary files differ new file mode 100644 index 0000000..10949d8 --- /dev/null +++ b/testsuite/fox_de20.lz diff --git a/testsuite/fox_lf b/testsuite/fox_lf new file mode 100644 index 0000000..a0b11b5 --- /dev/null +++ b/testsuite/fox_lf @@ -0,0 +1,9 @@ +The +quick +brown +fox +jumps +over +the +lazy +dog. 
diff --git a/testsuite/fox_mes81.lz b/testsuite/fox_mes81.lz Binary files differ new file mode 100644 index 0000000..d50ef2e --- /dev/null +++ b/testsuite/fox_mes81.lz diff --git a/testsuite/fox_s11.lz b/testsuite/fox_s11.lz Binary files differ new file mode 100644 index 0000000..dca909c --- /dev/null +++ b/testsuite/fox_s11.lz diff --git a/testsuite/fox_v2.lz b/testsuite/fox_v2.lz Binary files differ new file mode 100644 index 0000000..8620981 --- /dev/null +++ b/testsuite/fox_v2.lz diff --git a/testsuite/test.txt b/testsuite/test.txt new file mode 100644 index 0000000..9196a3a --- /dev/null +++ b/testsuite/test.txt @@ -0,0 +1,676 @@ + GNU GENERAL PUBLIC LICENSE + Version 2, June 1991 + + Copyright (C) 1989, 1991 Free Software Foundation, Inc., + 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + + Preamble + + The licenses for most software are designed to take away your +freedom to share and change it. By contrast, the GNU General Public +License is intended to guarantee your freedom to share and change free +software--to make sure the software is free for all its users. This +General Public License applies to most of the Free Software +Foundation's software and to any other program whose authors commit to +using it. (Some other Free Software Foundation software is covered by +the GNU Lesser General Public License instead.) You can apply it to +your programs, too. + + When we speak of free software, we are referring to freedom, not +price. Our General Public Licenses are designed to make sure that you +have the freedom to distribute copies of free software (and charge for +this service if you wish), that you receive source code or can get it +if you want it, that you can change the software or use pieces of it +in new free programs; and that you know you can do these things. 
+ + To protect your rights, we need to make restrictions that forbid +anyone to deny you these rights or to ask you to surrender the rights. +These restrictions translate to certain responsibilities for you if you +distribute copies of the software, or if you modify it. + + For example, if you distribute copies of such a program, whether +gratis or for a fee, you must give the recipients all the rights that +you have. You must make sure that they, too, receive or can get the +source code. And you must show them these terms so they know their +rights. + + We protect your rights with two steps: (1) copyright the software, and +(2) offer you this license which gives you legal permission to copy, +distribute and/or modify the software. + + Also, for each author's protection and ours, we want to make certain +that everyone understands that there is no warranty for this free +software. If the software is modified by someone else and passed on, we +want its recipients to know that what they have is not the original, so +that any problems introduced by others will not reflect on the original +authors' reputations. + + Finally, any free program is threatened constantly by software +patents. We wish to avoid the danger that redistributors of a free +program will individually obtain patent licenses, in effect making the +program proprietary. To prevent this, we have made it clear that any +patent must be licensed for everyone's free use or not licensed at all. + + The precise terms and conditions for copying, distribution and +modification follow. + + GNU GENERAL PUBLIC LICENSE + TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION + + 0. This License applies to any program or other work which contains +a notice placed by the copyright holder saying it may be distributed +under the terms of this General Public License. 
The "Program", below, +refers to any such program or work, and a "work based on the Program" +means either the Program or any derivative work under copyright law: +that is to say, a work containing the Program or a portion of it, +either verbatim or with modifications and/or translated into another +language. (Hereinafter, translation is included without limitation in +the term "modification".) Each licensee is addressed as "you". + +Activities other than copying, distribution and modification are not +covered by this License; they are outside its scope. The act of +running the Program is not restricted, and the output from the Program +is covered only if its contents constitute a work based on the +Program (independent of having been made by running the Program). +Whether that is true depends on what the Program does. + + 1. You may copy and distribute verbatim copies of the Program's +source code as you receive it, in any medium, provided that you +conspicuously and appropriately publish on each copy an appropriate +copyright notice and disclaimer of warranty; keep intact all the +notices that refer to this License and to the absence of any warranty; +and give any other recipients of the Program a copy of this License +along with the Program. + +You may charge a fee for the physical act of transferring a copy, and +you may at your option offer warranty protection in exchange for a fee. + + 2. You may modify your copy or copies of the Program or any portion +of it, thus forming a work based on the Program, and copy and +distribute such modifications or work under the terms of Section 1 +above, provided that you also meet all of these conditions: + + a) You must cause the modified files to carry prominent notices + stating that you changed the files and the date of any change. 
+ + b) You must cause any work that you distribute or publish, that in + whole or in part contains or is derived from the Program or any + part thereof, to be licensed as a whole at no charge to all third + parties under the terms of this License. + + c) If the modified program normally reads commands interactively + when run, you must cause it, when started running for such + interactive use in the most ordinary way, to print or display an + announcement including an appropriate copyright notice and a + notice that there is no warranty (or else, saying that you provide + a warranty) and that users may redistribute the program under + these conditions, and telling the user how to view a copy of this + License. (Exception: if the Program itself is interactive but + does not normally print such an announcement, your work based on + the Program is not required to print an announcement.) + +These requirements apply to the modified work as a whole. If +identifiable sections of that work are not derived from the Program, +and can be reasonably considered independent and separate works in +themselves, then this License, and its terms, do not apply to those +sections when you distribute them as separate works. But when you +distribute the same sections as part of a whole which is a work based +on the Program, the distribution of the whole must be on the terms of +this License, whose permissions for other licensees extend to the +entire whole, and thus to each and every part regardless of who wrote it. + +Thus, it is not the intent of this section to claim rights or contest +your rights to work written entirely by you; rather, the intent is to +exercise the right to control the distribution of derivative or +collective works based on the Program. 
+ +In addition, mere aggregation of another work not based on the Program +with the Program (or with a work based on the Program) on a volume of +a storage or distribution medium does not bring the other work under +the scope of this License. + + 3. You may copy and distribute the Program (or a work based on it, +under Section 2) in object code or executable form under the terms of +Sections 1 and 2 above provided that you also do one of the following: + + a) Accompany it with the complete corresponding machine-readable + source code, which must be distributed under the terms of Sections + 1 and 2 above on a medium customarily used for software interchange; or, + + b) Accompany it with a written offer, valid for at least three + years, to give any third party, for a charge no more than your + cost of physically performing source distribution, a complete + machine-readable copy of the corresponding source code, to be + distributed under the terms of Sections 1 and 2 above on a medium + customarily used for software interchange; or, + + c) Accompany it with the information you received as to the offer + to distribute corresponding source code. (This alternative is + allowed only for noncommercial distribution and only if you + received the program in object code or executable form with such + an offer, in accord with Subsection b above.) + +The source code for a work means the preferred form of the work for +making modifications to it. For an executable work, complete source +code means all the source code for all modules it contains, plus any +associated interface definition files, plus the scripts used to +control compilation and installation of the executable. 
However, as a +special exception, the source code distributed need not include +anything that is normally distributed (in either source or binary +form) with the major components (compiler, kernel, and so on) of the +operating system on which the executable runs, unless that component +itself accompanies the executable. + +If distribution of executable or object code is made by offering +access to copy from a designated place, then offering equivalent +access to copy the source code from the same place counts as +distribution of the source code, even though third parties are not +compelled to copy the source along with the object code. + + 4. You may not copy, modify, sublicense, or distribute the Program +except as expressly provided under this License. Any attempt +otherwise to copy, modify, sublicense or distribute the Program is +void, and will automatically terminate your rights under this License. +However, parties who have received copies, or rights, from you under +this License will not have their licenses terminated so long as such +parties remain in full compliance. + + 5. You are not required to accept this License, since you have not +signed it. However, nothing else grants you permission to modify or +distribute the Program or its derivative works. These actions are +prohibited by law if you do not accept this License. Therefore, by +modifying or distributing the Program (or any work based on the +Program), you indicate your acceptance of this License to do so, and +all its terms and conditions for copying, distributing or modifying +the Program or works based on it. + + 6. Each time you redistribute the Program (or any work based on the +Program), the recipient automatically receives a license from the +original licensor to copy, distribute or modify the Program subject to +these terms and conditions. You may not impose any further +restrictions on the recipients' exercise of the rights granted herein. 
+You are not responsible for enforcing compliance by third parties to +this License. + + 7. If, as a consequence of a court judgment or allegation of patent +infringement or for any other reason (not limited to patent issues), +conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot +distribute so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you +may not distribute the Program at all. For example, if a patent +license would not permit royalty-free redistribution of the Program by +all those who receive copies directly or indirectly through you, then +the only way you could satisfy both it and this License would be to +refrain entirely from distribution of the Program. + +If any portion of this section is held invalid or unenforceable under +any particular circumstance, the balance of the section is intended to +apply and the section as a whole is intended to apply in other +circumstances. + +It is not the purpose of this section to induce you to infringe any +patents or other property right claims or to contest validity of any +such claims; this section has the sole purpose of protecting the +integrity of the free software distribution system, which is +implemented by public license practices. Many people have made +generous contributions to the wide range of software distributed +through that system in reliance on consistent application of that +system; it is up to the author/donor to decide if he or she is willing +to distribute software through any other system and a licensee cannot +impose that choice. + +This section is intended to make thoroughly clear what is believed to +be a consequence of the rest of this License. + + 8. 
If the distribution and/or use of the Program is restricted in +certain countries either by patents or by copyrighted interfaces, the +original copyright holder who places the Program under this License +may add an explicit geographical distribution limitation excluding +those countries, so that distribution is permitted only in or among +countries not thus excluded. In such case, this License incorporates +the limitation as if written in the body of this License. + + 9. The Free Software Foundation may publish revised and/or new versions +of the General Public License from time to time. Such new versions will +be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + +Each version is given a distinguishing version number. If the Program +specifies a version number of this License which applies to it and "any +later version", you have the option of following the terms and conditions +either of that version or of any later version published by the Free +Software Foundation. If the Program does not specify a version number of +this License, you may choose any version ever published by the Free Software +Foundation. + + 10. If you wish to incorporate parts of the Program into other free +programs whose distribution conditions are different, write to the author +to ask for permission. For software which is copyrighted by the Free +Software Foundation, write to the Free Software Foundation; we sometimes +make exceptions for this. Our decision will be guided by the two goals +of preserving the free status of all derivatives of our free software and +of promoting the sharing and reuse of software generally. + + NO WARRANTY + + 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY +FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. 
EXCEPT WHEN +OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES +PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED +OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF +MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS +TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE +PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, +REPAIR OR CORRECTION. + + 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING +WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR +REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, +INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING +OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED +TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY +YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER +PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE +POSSIBILITY OF SUCH DAMAGES. + + END OF TERMS AND CONDITIONS + + How to Apply These Terms to Your New Programs + + If you develop a new program, and you want it to be of the greatest +possible use to the public, the best way to achieve this is to make it +free software which everyone can redistribute and change under these terms. + + To do so, attach the following notices to the program. It is safest +to attach them to the start of each source file to most effectively +convey the exclusion of warranty; and each file should have at least +the "copyright" line and a pointer to where the full notice is found. 
+ + <one line to give the program's name and a brief idea of what it does.> + Copyright (C) <year> <name of author> + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 2 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see <http://www.gnu.org/licenses/>. + +Also add information on how to contact you by electronic and paper mail. + +If the program is interactive, make it output a short notice like this +when it starts in an interactive mode: + + Gnomovision version 69, Copyright (C) <year> <name of author> + Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. + This is free software, and you are welcome to redistribute it + under certain conditions; type `show c' for details. + +The hypothetical commands `show w' and `show c' should show the appropriate +parts of the General Public License. Of course, the commands you use may +be called something other than `show w' and `show c'; they could even be +mouse-clicks or menu items--whatever suits your program. + +You should also get your employer (if you work as a programmer) or your +school, if any, to sign a "copyright disclaimer" for the program, if +necessary. Here is a sample; alter the names: + + Yoyodyne, Inc., hereby disclaims all copyright interest in the program + `Gnomovision' (which makes passes at compilers) written by James Hacker. + + <signature of Ty Coon>, 1 April 1989 + Ty Coon, President of Vice + +This General Public License does not permit incorporating your program into +proprietary programs. 
If your program is a subroutine library, you may +consider it more useful to permit linking proprietary applications with the +library. If this is what you want to do, use the GNU Lesser General +Public License instead of this License. + GNU GENERAL PUBLIC LICENSE
+ Version 2, June 1991
+
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
+ 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+
+ The licenses for most software are designed to take away your
+freedom to share and change it. By contrast, the GNU General Public
+License is intended to guarantee your freedom to share and change free
+software--to make sure the software is free for all its users. This
+General Public License applies to most of the Free Software
+Foundation's software and to any other program whose authors commit to
+using it. (Some other Free Software Foundation software is covered by
+the GNU Lesser General Public License instead.) You can apply it to
+your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+price. Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+this service if you wish), that you receive source code or can get it
+if you want it, that you can change the software or use pieces of it
+in new free programs; and that you know you can do these things.
+
+ To protect your rights, we need to make restrictions that forbid
+anyone to deny you these rights or to ask you to surrender the rights.
+These restrictions translate to certain responsibilities for you if you
+distribute copies of the software, or if you modify it.
+
+ For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must give the recipients all the rights that
+you have. You must make sure that they, too, receive or can get the
+source code. And you must show them these terms so they know their
+rights.
+
+ We protect your rights with two steps: (1) copyright the software, and
+(2) offer you this license which gives you legal permission to copy,
+distribute and/or modify the software.
+
+ Also, for each author's protection and ours, we want to make certain
+that everyone understands that there is no warranty for this free
+software. If the software is modified by someone else and passed on, we
+want its recipients to know that what they have is not the original, so
+that any problems introduced by others will not reflect on the original
+authors' reputations.
+
+ Finally, any free program is threatened constantly by software
+patents. We wish to avoid the danger that redistributors of a free
+program will individually obtain patent licenses, in effect making the
+program proprietary. To prevent this, we have made it clear that any
+patent must be licensed for everyone's free use or not licensed at all.
+
+ The precise terms and conditions for copying, distribution and
+modification follow.
+
+ GNU GENERAL PUBLIC LICENSE
+ TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+ 0. This License applies to any program or other work which contains
+a notice placed by the copyright holder saying it may be distributed
+under the terms of this General Public License. The "Program", below,
+refers to any such program or work, and a "work based on the Program"
+means either the Program or any derivative work under copyright law:
+that is to say, a work containing the Program or a portion of it,
+either verbatim or with modifications and/or translated into another
+language. (Hereinafter, translation is included without limitation in
+the term "modification".) Each licensee is addressed as "you".
+
+Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope. The act of
+running the Program is not restricted, and the output from the Program
+is covered only if its contents constitute a work based on the
+Program (independent of having been made by running the Program).
+Whether that is true depends on what the Program does.
+
+ 1. You may copy and distribute verbatim copies of the Program's
+source code as you receive it, in any medium, provided that you
+conspicuously and appropriately publish on each copy an appropriate
+copyright notice and disclaimer of warranty; keep intact all the
+notices that refer to this License and to the absence of any warranty;
+and give any other recipients of the Program a copy of this License
+along with the Program.
+
+You may charge a fee for the physical act of transferring a copy, and
+you may at your option offer warranty protection in exchange for a fee.
+
+ 2. You may modify your copy or copies of the Program or any portion
+of it, thus forming a work based on the Program, and copy and
+distribute such modifications or work under the terms of Section 1
+above, provided that you also meet all of these conditions:
+
+ a) You must cause the modified files to carry prominent notices
+ stating that you changed the files and the date of any change.
+
+ b) You must cause any work that you distribute or publish, that in
+ whole or in part contains or is derived from the Program or any
+ part thereof, to be licensed as a whole at no charge to all third
+ parties under the terms of this License.
+
+ c) If the modified program normally reads commands interactively
+ when run, you must cause it, when started running for such
+ interactive use in the most ordinary way, to print or display an
+ announcement including an appropriate copyright notice and a
+ notice that there is no warranty (or else, saying that you provide
+ a warranty) and that users may redistribute the program under
+ these conditions, and telling the user how to view a copy of this
+ License. (Exception: if the Program itself is interactive but
+ does not normally print such an announcement, your work based on
+ the Program is not required to print an announcement.)
+
+These requirements apply to the modified work as a whole. If
+identifiable sections of that work are not derived from the Program,
+and can be reasonably considered independent and separate works in
+themselves, then this License, and its terms, do not apply to those
+sections when you distribute them as separate works. But when you
+distribute the same sections as part of a whole which is a work based
+on the Program, the distribution of the whole must be on the terms of
+this License, whose permissions for other licensees extend to the
+entire whole, and thus to each and every part regardless of who wrote it.
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Program.
+
+In addition, mere aggregation of another work not based on the Program
+with the Program (or with a work based on the Program) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+ 3. You may copy and distribute the Program (or a work based on it,
+under Section 2) in object code or executable form under the terms of
+Sections 1 and 2 above provided that you also do one of the following:
+
+ a) Accompany it with the complete corresponding machine-readable
+ source code, which must be distributed under the terms of Sections
+ 1 and 2 above on a medium customarily used for software interchange; or,
+
+ b) Accompany it with a written offer, valid for at least three
+ years, to give any third party, for a charge no more than your
+ cost of physically performing source distribution, a complete
+ machine-readable copy of the corresponding source code, to be
+ distributed under the terms of Sections 1 and 2 above on a medium
+ customarily used for software interchange; or,
+
+ c) Accompany it with the information you received as to the offer
+ to distribute corresponding source code. (This alternative is
+ allowed only for noncommercial distribution and only if you
+ received the program in object code or executable form with such
+ an offer, in accord with Subsection b above.)
+
+The source code for a work means the preferred form of the work for
+making modifications to it. For an executable work, complete source
+code means all the source code for all modules it contains, plus any
+associated interface definition files, plus the scripts used to
+control compilation and installation of the executable. However, as a
+special exception, the source code distributed need not include
+anything that is normally distributed (in either source or binary
+form) with the major components (compiler, kernel, and so on) of the
+operating system on which the executable runs, unless that component
+itself accompanies the executable.
+
+If distribution of executable or object code is made by offering
+access to copy from a designated place, then offering equivalent
+access to copy the source code from the same place counts as
+distribution of the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+ 4. You may not copy, modify, sublicense, or distribute the Program
+except as expressly provided under this License. Any attempt
+otherwise to copy, modify, sublicense or distribute the Program is
+void, and will automatically terminate your rights under this License.
+However, parties who have received copies, or rights, from you under
+this License will not have their licenses terminated so long as such
+parties remain in full compliance.
+
+ 5. You are not required to accept this License, since you have not
+signed it. However, nothing else grants you permission to modify or
+distribute the Program or its derivative works. These actions are
+prohibited by law if you do not accept this License. Therefore, by
+modifying or distributing the Program (or any work based on the
+Program), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Program or works based on it.
+
+ 6. Each time you redistribute the Program (or any work based on the
+Program), the recipient automatically receives a license from the
+original licensor to copy, distribute or modify the Program subject to
+these terms and conditions. You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties to
+this License.
+
+ 7. If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License. If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Program at all. For example, if a patent
+license would not permit royalty-free redistribution of the Program by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Program.
+
+If any portion of this section is held invalid or unenforceable under
+any particular circumstance, the balance of the section is intended to
+apply and the section as a whole is intended to apply in other
+circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system, which is
+implemented by public license practices. Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+ 8. If the distribution and/or use of the Program is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Program under this License
+may add an explicit geographical distribution limitation excluding
+those countries, so that distribution is permitted only in or among
+countries not thus excluded. In such case, this License incorporates
+the limitation as if written in the body of this License.
+
+ 9. The Free Software Foundation may publish revised and/or new versions
+of the General Public License from time to time. Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+Each version is given a distinguishing version number. If the Program
+specifies a version number of this License which applies to it and "any
+later version", you have the option of following the terms and conditions
+either of that version or of any later version published by the Free
+Software Foundation. If the Program does not specify a version number of
+this License, you may choose any version ever published by the Free Software
+Foundation.
+
+ 10. If you wish to incorporate parts of the Program into other free
+programs whose distribution conditions are different, write to the author
+to ask for permission. For software which is copyrighted by the Free
+Software Foundation, write to the Free Software Foundation; we sometimes
+make exceptions for this. Our decision will be guided by the two goals
+of preserving the free status of all derivatives of our free software and
+of promoting the sharing and reuse of software generally.
+
+ NO WARRANTY
+
+ 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
+FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+REPAIR OR CORRECTION.
+
+ 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+ END OF TERMS AND CONDITIONS
+
+ How to Apply These Terms to Your New Programs
+
+ If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+ To do so, attach the following notices to the program. It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+ <one line to give the program's name and a brief idea of what it does.>
+ Copyright (C) <year> <name of author>
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+Also add information on how to contact you by electronic and paper mail.
+
+If the program is interactive, make it output a short notice like this
+when it starts in an interactive mode:
+
+ Gnomovision version 69, Copyright (C) <year> <name of author>
+ Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+ This is free software, and you are welcome to redistribute it
+ under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License. Of course, the commands you use may
+be called something other than `show w' and `show c'; they could even be
+mouse-clicks or menu items--whatever suits your program.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a "copyright disclaimer" for the program, if
+necessary. Here is a sample; alter the names:
+
+ Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+ `Gnomovision' (which makes passes at compilers) written by James Hacker.
+
+ <signature of Ty Coon>, 1 April 1989
+ Ty Coon, President of Vice
+
+This General Public License does not permit incorporating your program into
+proprietary programs. If your program is a subroutine library, you may
+consider it more useful to permit linking proprietary applications with the
+library. If this is what you want to do, use the GNU Lesser General
+Public License instead of this License.
diff --git a/testsuite/test.txt.lz b/testsuite/test.txt.lz
Binary files differ
new file mode 100644
index 0000000..22cea6e
--- /dev/null
+++ b/testsuite/test.txt.lz
diff --git a/testsuite/test_em.txt.lz b/testsuite/test_em.txt.lz
Binary files differ
new file mode 100644
index 0000000..7e96250
--- /dev/null
+++ b/testsuite/test_em.txt.lz
diff --git a/testsuite/test_sync.lz b/testsuite/test_sync.lz
Binary files differ
new file mode 100644
index 0000000..db680c3
--- /dev/null
+++ b/testsuite/test_sync.lz