-rw-r--r-- | AUTHORS | 1
-rw-r--r-- | COPYING | 338
-rw-r--r-- | ChangeLog | 235
-rw-r--r-- | INSTALL | 92
-rw-r--r-- | Makefile.in | 145
-rw-r--r-- | NEWS | 14
-rw-r--r-- | README | 112
-rw-r--r-- | arg_parser.cc | 197
-rw-r--r-- | arg_parser.h | 110
-rw-r--r-- | compress.cc | 558
-rwxr-xr-x | configure | 210
-rw-r--r-- | dec_stdout.cc | 337
-rw-r--r-- | dec_stream.cc | 650
-rw-r--r-- | decompress.cc | 363
-rw-r--r-- | doc/plzip.1 | 148
-rw-r--r-- | doc/plzip.info | 833
-rw-r--r-- | doc/plzip.texi | 907
-rw-r--r-- | list.cc | 114
-rw-r--r-- | lzip.h | 340
-rw-r--r-- | lzip_index.cc | 209
-rw-r--r-- | lzip_index.h | 94
-rw-r--r-- | main.cc | 1016
-rwxr-xr-x | testsuite/check.sh | 447
-rw-r--r-- | testsuite/fox.lz | bin | 0 -> 80 bytes
-rw-r--r-- | testsuite/fox_bcrc.lz | bin | 0 -> 80 bytes
-rw-r--r-- | testsuite/fox_crc0.lz | bin | 0 -> 80 bytes
-rw-r--r-- | testsuite/fox_das46.lz | bin | 0 -> 80 bytes
-rw-r--r-- | testsuite/fox_de20.lz | bin | 0 -> 80 bytes
-rw-r--r-- | testsuite/fox_mes81.lz | bin | 0 -> 80 bytes
-rw-r--r-- | testsuite/fox_s11.lz | bin | 0 -> 80 bytes
-rw-r--r-- | testsuite/fox_v2.lz | bin | 0 -> 80 bytes
-rw-r--r-- | testsuite/test.txt | 676
-rw-r--r-- | testsuite/test.txt.lz | bin | 0 -> 7376 bytes
-rw-r--r-- | testsuite/test_em.txt.lz | bin | 0 -> 14024 bytes
34 files changed, 8146 insertions, 0 deletions
@@ -0,0 +1 @@ +Plzip was written by Laszlo Ersek and Antonio Diaz Diaz. @@ -0,0 +1,338 @@ + GNU GENERAL PUBLIC LICENSE + Version 2, June 1991 + + Copyright (C) 1989, 1991 Free Software Foundation, Inc., + 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + + Preamble + + The licenses for most software are designed to take away your +freedom to share and change it. By contrast, the GNU General Public +License is intended to guarantee your freedom to share and change free +software--to make sure the software is free for all its users. This +General Public License applies to most of the Free Software +Foundation's software and to any other program whose authors commit to +using it. (Some other Free Software Foundation software is covered by +the GNU Lesser General Public License instead.) You can apply it to +your programs, too. + + When we speak of free software, we are referring to freedom, not +price. Our General Public Licenses are designed to make sure that you +have the freedom to distribute copies of free software (and charge for +this service if you wish), that you receive source code or can get it +if you want it, that you can change the software or use pieces of it +in new free programs; and that you know you can do these things. + + To protect your rights, we need to make restrictions that forbid +anyone to deny you these rights or to ask you to surrender the rights. +These restrictions translate to certain responsibilities for you if you +distribute copies of the software, or if you modify it. + + For example, if you distribute copies of such a program, whether +gratis or for a fee, you must give the recipients all the rights that +you have. You must make sure that they, too, receive or can get the +source code. And you must show them these terms so they know their +rights. 
+ + We protect your rights with two steps: (1) copyright the software, and +(2) offer you this license which gives you legal permission to copy, +distribute and/or modify the software. + + Also, for each author's protection and ours, we want to make certain +that everyone understands that there is no warranty for this free +software. If the software is modified by someone else and passed on, we +want its recipients to know that what they have is not the original, so +that any problems introduced by others will not reflect on the original +authors' reputations. + + Finally, any free program is threatened constantly by software +patents. We wish to avoid the danger that redistributors of a free +program will individually obtain patent licenses, in effect making the +program proprietary. To prevent this, we have made it clear that any +patent must be licensed for everyone's free use or not licensed at all. + + The precise terms and conditions for copying, distribution and +modification follow. + + GNU GENERAL PUBLIC LICENSE + TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION + + 0. This License applies to any program or other work which contains +a notice placed by the copyright holder saying it may be distributed +under the terms of this General Public License. The "Program", below, +refers to any such program or work, and a "work based on the Program" +means either the Program or any derivative work under copyright law: +that is to say, a work containing the Program or a portion of it, +either verbatim or with modifications and/or translated into another +language. (Hereinafter, translation is included without limitation in +the term "modification".) Each licensee is addressed as "you". + +Activities other than copying, distribution and modification are not +covered by this License; they are outside its scope. 
The act of +running the Program is not restricted, and the output from the Program +is covered only if its contents constitute a work based on the +Program (independent of having been made by running the Program). +Whether that is true depends on what the Program does. + + 1. You may copy and distribute verbatim copies of the Program's +source code as you receive it, in any medium, provided that you +conspicuously and appropriately publish on each copy an appropriate +copyright notice and disclaimer of warranty; keep intact all the +notices that refer to this License and to the absence of any warranty; +and give any other recipients of the Program a copy of this License +along with the Program. + +You may charge a fee for the physical act of transferring a copy, and +you may at your option offer warranty protection in exchange for a fee. + + 2. You may modify your copy or copies of the Program or any portion +of it, thus forming a work based on the Program, and copy and +distribute such modifications or work under the terms of Section 1 +above, provided that you also meet all of these conditions: + + a) You must cause the modified files to carry prominent notices + stating that you changed the files and the date of any change. + + b) You must cause any work that you distribute or publish, that in + whole or in part contains or is derived from the Program or any + part thereof, to be licensed as a whole at no charge to all third + parties under the terms of this License. + + c) If the modified program normally reads commands interactively + when run, you must cause it, when started running for such + interactive use in the most ordinary way, to print or display an + announcement including an appropriate copyright notice and a + notice that there is no warranty (or else, saying that you provide + a warranty) and that users may redistribute the program under + these conditions, and telling the user how to view a copy of this + License. 
(Exception: if the Program itself is interactive but + does not normally print such an announcement, your work based on + the Program is not required to print an announcement.) + +These requirements apply to the modified work as a whole. If +identifiable sections of that work are not derived from the Program, +and can be reasonably considered independent and separate works in +themselves, then this License, and its terms, do not apply to those +sections when you distribute them as separate works. But when you +distribute the same sections as part of a whole which is a work based +on the Program, the distribution of the whole must be on the terms of +this License, whose permissions for other licensees extend to the +entire whole, and thus to each and every part regardless of who wrote it. + +Thus, it is not the intent of this section to claim rights or contest +your rights to work written entirely by you; rather, the intent is to +exercise the right to control the distribution of derivative or +collective works based on the Program. + +In addition, mere aggregation of another work not based on the Program +with the Program (or with a work based on the Program) on a volume of +a storage or distribution medium does not bring the other work under +the scope of this License. + + 3. 
You may copy and distribute the Program (or a work based on it, +under Section 2) in object code or executable form under the terms of +Sections 1 and 2 above provided that you also do one of the following: + + a) Accompany it with the complete corresponding machine-readable + source code, which must be distributed under the terms of Sections + 1 and 2 above on a medium customarily used for software interchange; or, + + b) Accompany it with a written offer, valid for at least three + years, to give any third party, for a charge no more than your + cost of physically performing source distribution, a complete + machine-readable copy of the corresponding source code, to be + distributed under the terms of Sections 1 and 2 above on a medium + customarily used for software interchange; or, + + c) Accompany it with the information you received as to the offer + to distribute corresponding source code. (This alternative is + allowed only for noncommercial distribution and only if you + received the program in object code or executable form with such + an offer, in accord with Subsection b above.) + +The source code for a work means the preferred form of the work for +making modifications to it. For an executable work, complete source +code means all the source code for all modules it contains, plus any +associated interface definition files, plus the scripts used to +control compilation and installation of the executable. However, as a +special exception, the source code distributed need not include +anything that is normally distributed (in either source or binary +form) with the major components (compiler, kernel, and so on) of the +operating system on which the executable runs, unless that component +itself accompanies the executable. 
+ +If distribution of executable or object code is made by offering +access to copy from a designated place, then offering equivalent +access to copy the source code from the same place counts as +distribution of the source code, even though third parties are not +compelled to copy the source along with the object code. + + 4. You may not copy, modify, sublicense, or distribute the Program +except as expressly provided under this License. Any attempt +otherwise to copy, modify, sublicense or distribute the Program is +void, and will automatically terminate your rights under this License. +However, parties who have received copies, or rights, from you under +this License will not have their licenses terminated so long as such +parties remain in full compliance. + + 5. You are not required to accept this License, since you have not +signed it. However, nothing else grants you permission to modify or +distribute the Program or its derivative works. These actions are +prohibited by law if you do not accept this License. Therefore, by +modifying or distributing the Program (or any work based on the +Program), you indicate your acceptance of this License to do so, and +all its terms and conditions for copying, distributing or modifying +the Program or works based on it. + + 6. Each time you redistribute the Program (or any work based on the +Program), the recipient automatically receives a license from the +original licensor to copy, distribute or modify the Program subject to +these terms and conditions. You may not impose any further +restrictions on the recipients' exercise of the rights granted herein. +You are not responsible for enforcing compliance by third parties to +this License. + + 7. 
If, as a consequence of a court judgment or allegation of patent +infringement or for any other reason (not limited to patent issues), +conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot +distribute so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you +may not distribute the Program at all. For example, if a patent +license would not permit royalty-free redistribution of the Program by +all those who receive copies directly or indirectly through you, then +the only way you could satisfy both it and this License would be to +refrain entirely from distribution of the Program. + +If any portion of this section is held invalid or unenforceable under +any particular circumstance, the balance of the section is intended to +apply and the section as a whole is intended to apply in other +circumstances. + +It is not the purpose of this section to induce you to infringe any +patents or other property right claims or to contest validity of any +such claims; this section has the sole purpose of protecting the +integrity of the free software distribution system, which is +implemented by public license practices. Many people have made +generous contributions to the wide range of software distributed +through that system in reliance on consistent application of that +system; it is up to the author/donor to decide if he or she is willing +to distribute software through any other system and a licensee cannot +impose that choice. + +This section is intended to make thoroughly clear what is believed to +be a consequence of the rest of this License. + + 8. 
If the distribution and/or use of the Program is restricted in +certain countries either by patents or by copyrighted interfaces, the +original copyright holder who places the Program under this License +may add an explicit geographical distribution limitation excluding +those countries, so that distribution is permitted only in or among +countries not thus excluded. In such case, this License incorporates +the limitation as if written in the body of this License. + + 9. The Free Software Foundation may publish revised and/or new versions +of the General Public License from time to time. Such new versions will +be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + +Each version is given a distinguishing version number. If the Program +specifies a version number of this License which applies to it and "any +later version", you have the option of following the terms and conditions +either of that version or of any later version published by the Free +Software Foundation. If the Program does not specify a version number of +this License, you may choose any version ever published by the Free Software +Foundation. + + 10. If you wish to incorporate parts of the Program into other free +programs whose distribution conditions are different, write to the author +to ask for permission. For software which is copyrighted by the Free +Software Foundation, write to the Free Software Foundation; we sometimes +make exceptions for this. Our decision will be guided by the two goals +of preserving the free status of all derivatives of our free software and +of promoting the sharing and reuse of software generally. + + NO WARRANTY + + 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY +FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. 
EXCEPT WHEN +OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES +PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED +OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF +MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS +TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE +PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, +REPAIR OR CORRECTION. + + 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING +WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR +REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, +INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING +OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED +TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY +YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER +PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE +POSSIBILITY OF SUCH DAMAGES. + + END OF TERMS AND CONDITIONS + + How to Apply These Terms to Your New Programs + + If you develop a new program, and you want it to be of the greatest +possible use to the public, the best way to achieve this is to make it +free software which everyone can redistribute and change under these terms. + + To do so, attach the following notices to the program. It is safest +to attach them to the start of each source file to most effectively +convey the exclusion of warranty; and each file should have at least +the "copyright" line and a pointer to where the full notice is found. 
+ + <one line to give the program's name and a brief idea of what it does.> + Copyright (C) <year> <name of author> + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 2 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see <http://www.gnu.org/licenses/>. + +Also add information on how to contact you by electronic and paper mail. + +If the program is interactive, make it output a short notice like this +when it starts in an interactive mode: + + Gnomovision version 69, Copyright (C) <year> <name of author> + Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. + This is free software, and you are welcome to redistribute it + under certain conditions; type `show c' for details. + +The hypothetical commands `show w' and `show c' should show the appropriate +parts of the General Public License. Of course, the commands you use may +be called something other than `show w' and `show c'; they could even be +mouse-clicks or menu items--whatever suits your program. + +You should also get your employer (if you work as a programmer) or your +school, if any, to sign a "copyright disclaimer" for the program, if +necessary. Here is a sample; alter the names: + + Yoyodyne, Inc., hereby disclaims all copyright interest in the program + `Gnomovision' (which makes passes at compilers) written by James Hacker. + + <signature of Ty Coon>, 1 April 1989 + Ty Coon, President of Vice + +This General Public License does not permit incorporating your program into +proprietary programs. 
If your program is a subroutine library, you may +consider it more useful to permit linking proprietary applications with the +library. If this is what you want to do, use the GNU Lesser General +Public License instead of this License. diff --git a/ChangeLog b/ChangeLog new file mode 100644 index 0000000..ed480ff --- /dev/null +++ b/ChangeLog @@ -0,0 +1,235 @@ +2024-01-21 Antonio Diaz Diaz <antonio@gnu.org> + + * Version 1.11 released. + * main.cc: Reformat file diagnostics as 'PROGRAM: FILE: MESSAGE'. + (show_option_error): New function showing argument and option name. + (main): Make -o preserve date/mode/owner if 1 input file. + (open_outstream): Create missing intermediate directories. + * configure, Makefile.in: New variable 'MAKEINFO'. + +2022-01-24 Antonio Diaz Diaz <antonio@gnu.org> + + * Version 1.10 released. + * main.cc (getnum): Show option name and valid range if error. + (check_lib): Check that LZ_API_VERSION and LZ_version_string match. + * configure: Set variable LIBS. + * Improve several descriptions in manual, '--help', and man page. + * plzip.texi: Change GNU Texinfo category to 'Compression'. + (Reported by Alfred M. Szmidt). + +2021-01-03 Antonio Diaz Diaz <antonio@gnu.org> + + * Version 1.9 released. + * main.cc (main): Report an error if a file name is empty. + Make '-o' behave like '-c', but writing to file instead of stdout. + Make '-c' and '-o' check whether the output is a terminal only once. + Do not open output if input is a terminal. + * main.cc: New option '--check-lib'. + * Replace 'decompressed', 'compressed' with 'out', 'in' in output. + * decompress.cc, dec_stream.cc, dec_stdout.cc: + Continue testing if any input file fails the test. + Show the largest dictionary size in a multimember file. + * main.cc: Show final diagnostic when testing multiple files. + * decompress.cc, dec_stream.cc [LZ_API_VERSION >= 1012]: Avoid + copying decompressed data when testing with lzlib 1.12 or newer. 
+ * compress.cc, dec_stream.cc: Start only the worker threads required. + * dec_stream.cc: Splitter stops reading when trailing data is found. + Don't include trailing data in the compressed size shown. + Use plain comparison instead of Boyer-Moore to search for headers. + * lzip_index.cc: Improve messages for corruption in last header. + * decompress.cc: Shorten messages 'Data error' and 'Unexpected EOF'. + * main.cc: Set a valid invocation_name even if argc == 0. + * Document extraction from tar.lz in manual, '--help', and man page. + * plzip.texi (Introduction): Mention tarlz as an alternative. + * plzip.texi: Several fixes and improvements. + * testsuite: Add 8 new test files. + +2019-01-05 Antonio Diaz Diaz <antonio@gnu.org> + + * Version 1.8 released. + * Rename File_* to Lzip_*. + * main.cc: New options '--in-slots' and '--out-slots'. + * main.cc: Increase default in_slots per worker from 2 to 4. + * main.cc: Increase default out_slots per worker from 32 to 64. + * lzip.h (Lzip_trailer): New function 'verify_consistency'. + * lzip_index.cc: Detect some kinds of corrupt trailers. + * main.cc (main): Check return value of close( infd ). + * plzip.texi: Improve description of '-0..-9', '-m', and '-s'. + * configure: New option '--with-mingw'. + * configure: Accept appending to CXXFLAGS; 'CXXFLAGS+=OPTIONS'. + * INSTALL: Document use of CXXFLAGS+='-D __USE_MINGW_ANSI_STDIO'. + +2018-02-07 Antonio Diaz Diaz <antonio@gnu.org> + + * Version 1.7 released. + * compress.cc: Use 'LZ_compress_restart_member' and replace input + packet queue by a circular buffer to reduce memory fragmentation. + * compress.cc: Return one empty packet at a time to reduce mem use. + * main.cc: Reduce threads on 32 bit systems to use under 2.22 GiB. + * main.cc: New option '--loose-trailing'. + * Improve corrupt header detection to HD = 3 on seekable files. + (On all files with lzlib 1.10 or newer). + * Replace 'bits/byte' with inverse compression ratio in output. 
+ * Show progress of decompression at verbosity level 2 (-vv). + * Show progress of (de)compression only if stderr is a terminal. + * main.cc: Do not add a second .lz extension to the arg of -o. + * Show dictionary size at verbosity level 4 (-vvvv). + * main.cc (cleanup_and_fail): Suppress messages from other threads. + * list.cc: Add missing '#include <pthread.h>'. + * plzip.texi: New chapter 'Output'. + * plzip.texi (Memory requirements): Add table. + * plzip.texi (Program design): Add a block diagram. + +2017-04-12 Antonio Diaz Diaz <antonio@gnu.org> + + * Version 1.6 released. + * The option '-l, --list' has been ported from lziprecover. + * Don't allow mixing different operations (-d, -l or -t). + * main.cc: Continue testing if any input file is a terminal. + * lzip_index.cc: Improve detection of bad dict and trailing data. + * lzip.h: Unify messages for bad magic, trailing data, etc. + +2016-05-14 Antonio Diaz Diaz <antonio@gnu.org> + + * Version 1.5 released. + * main.cc: New option '-a, --trailing-error'. + * main.cc (main): Delete '--output' file if infd is a terminal. + * main.cc (main): Don't use stdin more than once. + * plzip.texi: New chapters 'Trailing data' and 'Examples'. + * configure: Avoid warning on some shells when testing for g++. + * Makefile.in: Detect the existence of install-info. + * check.sh: A POSIX shell is required to run the tests. + * check.sh: Don't check error messages. + +2015-07-09 Antonio Diaz Diaz <antonio@gnu.org> + + * Version 1.4 released. + * Option '-0' now uses the fast encoder of lzlib 1.7. + +2015-01-22 Antonio Diaz Diaz <antonio@gnu.org> + + * Version 1.3 released. + * dec_stream.cc: Don't use output packets or muxer when testing. + * Make '-dvvv' and '-tvvv' show dictionary size like lzip. + * lzip.h: Add missing 'const' to the declaration of 'compress'. + * plzip.texi: New chapters 'Memory requirements' and + 'Minimum file sizes'. + * Makefile.in: New targets 'install*-compress'. 
+ +2014-08-29 Antonio Diaz Diaz <antonio@gnu.org> + + * Version 1.2 released. + * main.cc (close_and_set_permissions): Behave like 'cp -p'. + * dec_stdout.cc, dec_stream.cc: Make 'slot_av' a vector to limit + the number of packets produced by each worker individually. + * plzip.texinfo: Rename to plzip.texi. + * plzip.texi: Document the approximate amount of memory required. + * Change license to GPL version 2 or later. + +2013-09-17 Antonio Diaz Diaz <antonio@gnu.org> + + * Version 1.1 released. + * Show progress of compression at verbosity level 2 (-vv). + * SIGUSR1 and SIGUSR2 are no longer used to signal a fatal error. + +2013-05-29 Antonio Diaz Diaz <antonio@gnu.org> + + * Version 1.0 released. + * compress.cc: Change 'deliver_packet' to 'deliver_packets'. + * Scalability of decompression from/to regular files has been + increased by removing splitter and muxer when not needed. + * The number of worker threads is now limited to the number of + members when decompressing from a regular file. + * configure: Options now accept a separate argument. + * Makefile.in: New targets 'install-as-lzip' and 'install-bin'. + * main.cc: Use 'setmode' instead of '_setmode' on Windows and OS/2. + * main.cc: Define 'strtoull' to 'std::strtoul' on Windows. + +2012-03-01 Antonio Diaz Diaz <ant_diaz@teleline.es> + + * Version 0.9 released. + * Minor fixes and cleanups. + * configure: Rename 'datadir' to 'datarootdir'. + +2012-01-17 Antonio Diaz Diaz <ant_diaz@teleline.es> + + * Version 0.8 released. + * main.cc: New option '-F, --recompress'. + * decompress.cc (decompress): Show compression ratio. + * main.cc (close_and_set_permissions): Inability to change output + file attributes has been downgraded from error to warning. + * Small change in '--help' output and man page. + * Change quote characters in messages as advised by GNU Standards. + * main.cc: Set stdin/stdout in binary mode on OS2. + * compress.cc: Reduce memory use of compressed packets. 
+ * decompress.cc: Use Boyer-Moore algorithm to search for headers. + +2010-12-03 Antonio Diaz Diaz <ant_diaz@teleline.es> + + * Version 0.7 released. + * Match length limits set by options -1 to -9 have been changed + to match those of lzip 1.11. + * decompress.cc: A limit has been set on the number of packets + produced by workers to limit the amount of memory used. + * main.cc (open_instream): Don't show the message + " and '--stdout' was not specified" for directories, etc. + Exit with status 1 if any output file exists and is skipped. + * main.cc: Fix warning about fchown return value being ignored. + * testsuite: Rename 'test1' to 'test.txt'. New tests. + +2010-03-20 Antonio Diaz Diaz <ant_diaz@teleline.es> + + * Version 0.6 released. + * Small portability fixes. + * plzip.texinfo: New chapter 'Program Design'. + Add missing description of option '-n, --threads'. + * Fix debug statistics. + +2010-02-10 Antonio Diaz Diaz <ant_diaz@teleline.es> + + * Version 0.5 released. + * Parallel decompression has been implemented. + +2010-01-31 Antonio Diaz Diaz <ant_diaz@teleline.es> + + * Version 0.4 released. + * main.cc (show_version): Show the version of lzlib being used. + * Code reorganization. Class Packet_courier now coordinates data + movement and synchronization among threads. + +2010-01-24 Antonio Diaz Diaz <ant_diaz@teleline.es> + + * Version 0.3 released. + * New option '-B, --data-size'. + * Output file is now removed if plzip is interrupted. + * This version automatically chooses the smallest possible + dictionary size for each member during compression, saving + memory during decompression. + * main.cc: New constant 'o_binary'. + +2010-01-17 Antonio Diaz Diaz <ant_diaz@teleline.es> + + * Version 0.2 released. + * New options '-s, --dictionary-size' and '-m, --match-length'. + * 'lacos_rbtree' has been replaced with a circular buffer. + +2009-12-05 Antonio Diaz Diaz <ant_diaz@teleline.es> + + * Version 0.1 released. 
+ * This version is based on llzip-0.03 (2009-11-21), written by + Laszlo Ersek <lacos@caesar.elte.hu>. Thanks Laszlo! + From llzip-0.03/README: + + llzip is a hack on my lbzip2-0.17 release. I ripped out the + decompression stuff, and replaced the bzip2 compression with + the lzma compression from lzlib-0.7. llzip is mainly meant + as an assisted fork point for the lzip developers. + Nonetheless, I tried to review the diff against lbzip2-0.17 + thoroughly, and I think llzip should be usable on its own + until something better appears on the net. + + +Copyright (C) 2009-2024 Antonio Diaz Diaz. + +This file is a collection of facts, and thus it is not copyrightable, but just +in case, you have unlimited permission to copy, distribute, and modify it. @@ -0,0 +1,92 @@ +Requirements +------------ +You will need a C++98 compiler with support for 'long long', and the +compression library lzlib installed. (gcc 3.3.6 or newer is recommended). +I use gcc 6.1.0 and 3.3.6, but the code should compile with any standards +compliant compiler. +Gcc is available at http://gcc.gnu.org. +Lzlib is available at http://www.nongnu.org/lzip/lzlib.html. + +Lzlib must be version 1.0 or newer, but the fast encoder requires lzlib 1.7 +or newer, the Hamming distance (HD) = 3 detection of corrupt headers in +non-seekable multimember files requires lzlib 1.10 or newer, and the +'no copy' optimization for testing requires lzlib 1.12 or newer. + +The operating system must allow signal handlers read access to objects with +static storage duration so that the cleanup handler for Control-C can delete +the partial output file. + + +Procedure +--------- +1. Unpack the archive if you have not done so already: + + tar -xf plzip[version].tar.lz +or + lzip -cd plzip[version].tar.lz | tar -xf - + +This creates the directory ./plzip[version] containing the source code +extracted from the archive. + +2. Change to plzip directory and run configure. + (Try 'configure --help' for usage instructions). 
+ + cd plzip[version] + ./configure + + To link against a lzlib not installed in a standard place, use: + + ./configure CPPFLAGS='-I <includedir>' LDFLAGS='-L <libdir>' + + (Replace <includedir> with the directory containing the file lzlib.h, + and <libdir> with the directory containing the file liblz.a). + + If you are compiling on MinGW, use --with-mingw (note that the Windows + I/O functions used with MinGW are not guaranteed to be thread safe): + + ./configure --with-mingw CXXFLAGS+='-D __USE_MINGW_ANSI_STDIO' + +3. Run make. + + make + +4. Optionally, type 'make check' to run the tests that come with plzip. + +5. Type 'make install' to install the program and any data files and + documentation. You need root privileges to install into a prefix owned + by root. + + Or type 'make install-compress', which additionally compresses the + info manual and the man page after installation. + (Installing compressed docs may become the default in the future). + + You can install only the program, the info manual, or the man page by + typing 'make install-bin', 'make install-info', or 'make install-man' + respectively. + + Instead of 'make install', you can type 'make install-as-lzip' to + install the program and any data files and documentation, and link + the program to the name 'lzip'. + + +Another way +----------- +You can also compile plzip into a separate directory. +To do this, you must use a version of 'make' that supports the variable +'VPATH', such as GNU 'make'. 'cd' to the directory where you want the +object files and executables to go and run the 'configure' script. +'configure' automatically checks for the source code in '.', in '..', and +in the directory that 'configure' is in. + +'configure' recognizes the option '--srcdir=DIR' to control where to look +for the source code. Usually 'configure' can determine that directory +automatically. + +After running 'configure', you can run 'make' and 'make install' as +explained above. 
+ + +Copyright (C) 2009-2024 Antonio Diaz Diaz. + +This file is free documentation: you have unlimited permission to copy, +distribute, and modify it. diff --git a/Makefile.in b/Makefile.in new file mode 100644 index 0000000..bb3afc0 --- /dev/null +++ b/Makefile.in @@ -0,0 +1,145 @@ + +DISTNAME = $(pkgname)-$(pkgversion) +INSTALL = install +INSTALL_PROGRAM = $(INSTALL) -m 755 +INSTALL_DATA = $(INSTALL) -m 644 +INSTALL_DIR = $(INSTALL) -d -m 755 +SHELL = /bin/sh +CAN_RUN_INSTALLINFO = $(SHELL) -c "install-info --version" > /dev/null 2>&1 + +objs = arg_parser.o lzip_index.o list.o compress.o dec_stdout.o \ + dec_stream.o decompress.o main.o + + +.PHONY : all install install-bin install-info install-man \ + install-strip install-compress install-strip-compress \ + install-bin-strip install-info-compress install-man-compress \ + install-as-lzip \ + uninstall uninstall-bin uninstall-info uninstall-man \ + doc info man check dist clean distclean + +all : $(progname) + +$(progname) : $(objs) + $(CXX) $(CXXFLAGS) $(LDFLAGS) -o $@ $(objs) $(LIBS) + +decompress.o : decompress.cc + $(CXX) $(CPPFLAGS) $(CXXFLAGS) $(with_mingw) -c -o $@ $< + +main.o : main.cc + $(CXX) $(CPPFLAGS) $(CXXFLAGS) -DPROGVERSION=\"$(pkgversion)\" -c -o $@ $< + +%.o : %.cc + $(CXX) $(CPPFLAGS) $(CXXFLAGS) -c -o $@ $< + +# prevent 'make' from trying to remake source files +$(VPATH)/configure $(VPATH)/Makefile.in $(VPATH)/doc/$(pkgname).texi : ; +%.h %.cc : ; + +$(objs) : Makefile +arg_parser.o : arg_parser.h +compress.o : lzip.h +dec_stdout.o : lzip.h lzip_index.h +dec_stream.o : lzip.h +decompress.o : lzip.h lzip_index.h +list.o : lzip.h lzip_index.h +lzip_index.o : lzip.h lzip_index.h +main.o : arg_parser.h lzip.h + +doc : info man + +info : $(VPATH)/doc/$(pkgname).info + +$(VPATH)/doc/$(pkgname).info : $(VPATH)/doc/$(pkgname).texi + cd $(VPATH)/doc && $(MAKEINFO) $(pkgname).texi + +man : $(VPATH)/doc/$(progname).1 + +$(VPATH)/doc/$(progname).1 : $(progname) + help2man -n 'reduces the size of files' 
-o $@ ./$(progname) + +Makefile : $(VPATH)/configure $(VPATH)/Makefile.in + ./config.status + +check : all + @$(VPATH)/testsuite/check.sh $(VPATH)/testsuite $(pkgversion) + +install : install-bin install-info install-man +install-strip : install-bin-strip install-info install-man +install-compress : install-bin install-info-compress install-man-compress +install-strip-compress : install-bin-strip install-info-compress install-man-compress + +install-bin : all + if [ ! -d "$(DESTDIR)$(bindir)" ] ; then $(INSTALL_DIR) "$(DESTDIR)$(bindir)" ; fi + $(INSTALL_PROGRAM) ./$(progname) "$(DESTDIR)$(bindir)/$(progname)" + +install-bin-strip : all + $(MAKE) INSTALL_PROGRAM='$(INSTALL_PROGRAM) -s' install-bin + +install-info : + if [ ! -d "$(DESTDIR)$(infodir)" ] ; then $(INSTALL_DIR) "$(DESTDIR)$(infodir)" ; fi + -rm -f "$(DESTDIR)$(infodir)/$(pkgname).info"* + $(INSTALL_DATA) $(VPATH)/doc/$(pkgname).info "$(DESTDIR)$(infodir)/$(pkgname).info" + -if $(CAN_RUN_INSTALLINFO) ; then \ + install-info --info-dir="$(DESTDIR)$(infodir)" "$(DESTDIR)$(infodir)/$(pkgname).info" ; \ + fi + +install-info-compress : install-info + lzip -v -9 "$(DESTDIR)$(infodir)/$(pkgname).info" + +install-man : + if [ ! 
-d "$(DESTDIR)$(mandir)/man1" ] ; then $(INSTALL_DIR) "$(DESTDIR)$(mandir)/man1" ; fi + -rm -f "$(DESTDIR)$(mandir)/man1/$(progname).1"* + $(INSTALL_DATA) $(VPATH)/doc/$(progname).1 "$(DESTDIR)$(mandir)/man1/$(progname).1" + +install-man-compress : install-man + lzip -v -9 "$(DESTDIR)$(mandir)/man1/$(progname).1" + +install-as-lzip : install + -rm -f "$(DESTDIR)$(bindir)/lzip" + cd "$(DESTDIR)$(bindir)" && ln -s $(progname) lzip + +uninstall : uninstall-man uninstall-info uninstall-bin + +uninstall-bin : + -rm -f "$(DESTDIR)$(bindir)/$(progname)" + +uninstall-info : + -if $(CAN_RUN_INSTALLINFO) ; then \ + install-info --info-dir="$(DESTDIR)$(infodir)" --remove "$(DESTDIR)$(infodir)/$(pkgname).info" ; \ + fi + -rm -f "$(DESTDIR)$(infodir)/$(pkgname).info"* + +uninstall-man : + -rm -f "$(DESTDIR)$(mandir)/man1/$(progname).1"* + +dist : doc + ln -sf $(VPATH) $(DISTNAME) + tar -Hustar --owner=root --group=root -cvf $(DISTNAME).tar \ + $(DISTNAME)/AUTHORS \ + $(DISTNAME)/COPYING \ + $(DISTNAME)/ChangeLog \ + $(DISTNAME)/INSTALL \ + $(DISTNAME)/Makefile.in \ + $(DISTNAME)/NEWS \ + $(DISTNAME)/README \ + $(DISTNAME)/configure \ + $(DISTNAME)/doc/$(progname).1 \ + $(DISTNAME)/doc/$(pkgname).info \ + $(DISTNAME)/doc/$(pkgname).texi \ + $(DISTNAME)/*.h \ + $(DISTNAME)/*.cc \ + $(DISTNAME)/testsuite/check.sh \ + $(DISTNAME)/testsuite/test.txt \ + $(DISTNAME)/testsuite/fox.lz \ + $(DISTNAME)/testsuite/fox_*.lz \ + $(DISTNAME)/testsuite/test.txt.lz \ + $(DISTNAME)/testsuite/test_em.txt.lz + rm -f $(DISTNAME) + lzip -v -9 $(DISTNAME).tar + +clean : + -rm -f $(progname) $(objs) + +distclean : clean + -rm -f Makefile config.status *.tar *.tar.lz @@ -0,0 +1,14 @@ +Changes in version 1.11: + +File diagnostics have been reformatted as 'PROGRAM: FILE: MESSAGE'. + +Diagnostics caused by invalid arguments to command-line options now show the +argument and the name of the option. 
+ +The option '-o, --output' now preserves dates, permissions, and ownership of +the file when (de)compressing exactly one file. + +The option '-o, --output' now creates missing intermediate directories when +writing to a file. + +The variable MAKEINFO has been added to configure and Makefile.in. @@ -0,0 +1,112 @@ +Description + +Plzip is a massively parallel (multi-threaded) implementation of lzip, +compatible with lzip 1.4 or newer. Plzip uses the compression library lzlib. + +Lzip is a lossless data compressor with a user interface similar to the one +of gzip or bzip2. Lzip uses a simplified form of the 'Lempel-Ziv-Markov +chain-Algorithm' (LZMA) stream format to maximize interoperability. The +maximum dictionary size is 512 MiB so that any lzip file can be decompressed +on 32-bit machines. Lzip provides accurate and robust 3-factor integrity +checking. Lzip can compress about as fast as gzip (lzip -0) or compress most +files more than bzip2 (lzip -9). Decompression speed is intermediate between +gzip and bzip2. Lzip is better than gzip and bzip2 from a data recovery +perspective. Lzip has been designed, written, and tested with great care to +replace gzip and bzip2 as the standard general-purpose compressed format for +Unix-like systems. + +Plzip can compress/decompress large files on multiprocessor machines much +faster than lzip, at the cost of a slightly reduced compression ratio (0.4 +to 2 percent larger compressed files). Note that the number of usable +threads is limited by file size; on files larger than a few GB plzip can use +hundreds of processors, but on files of only a few MB plzip is no faster +than lzip. + +For creation and manipulation of compressed tar archives tarlz can be more +efficient than using tar and plzip because tarlz is able to keep the +alignment between tar members and lzip members. 
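The README note above, that the number of usable threads is limited by file size, can be illustrated with a small sketch. It assumes the input is read in fixed-size chunks and that at most one worker is started per chunk, which is what the comment on csplitter in compress.cc describes. The function names `chunk_count` and `usable_workers` are hypothetical and are not part of plzip; `chunk_size` must be greater than zero.

```cpp
#include <algorithm>

// Hypothetical helper (not part of plzip): number of packets a splitter
// would hand out for 'file_size' bytes read in blocks of 'chunk_size'
// bytes. An empty input still yields one (empty) packet.
unsigned long long chunk_count( const unsigned long long file_size,
                                const unsigned long long chunk_size )
  {
  if( file_size == 0 ) return 1;
  return ( file_size + chunk_size - 1 ) / chunk_size;	// round up
  }

// Hypothetical helper: workers that can actually receive a packet.
// At most one worker is started per chunk, so small files cannot keep
// many workers busy no matter how many are requested.
unsigned long long usable_workers( const unsigned long long file_size,
                                   const unsigned long long chunk_size,
                                   const unsigned long long requested )
  { return std::min( requested, chunk_count( file_size, chunk_size ) ); }
```

With an illustrative chunk size of 16 MiB (an assumed value, chosen only for the example), a 4 MiB file occupies a single worker, while a 4 GiB file yields 256 chunks and can therefore keep up to 256 workers busy.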
+ +When compressing, plzip divides the input file into chunks and compresses as +many chunks simultaneously as worker threads are chosen, creating a +multimember compressed file. Each chunk is compressed in-place (using the +same buffer for input and output), reducing the amount of RAM required. + +When decompressing, plzip decompresses as many members simultaneously as +worker threads are chosen. Files that were compressed with lzip are not +decompressed faster than using lzip (unless the option '-b' was used) +because lzip usually produces single-member files, which can't be +decompressed in parallel. + +The lzip file format is designed for data sharing and long-term archiving, +taking into account both data integrity and decoder availability: + + * The lzip format provides very safe integrity checking and some data + recovery means. The program lziprecover can repair bit flip errors + (one of the most common forms of data corruption) in lzip files, and + provides data recovery capabilities, including error-checked merging + of damaged copies of a file. + + * The lzip format is as simple as possible (but not simpler). The lzip + manual provides the source code of a simple decompressor along with a + detailed explanation of how it works, so that with the only help of the + lzip manual it would be possible for a digital archaeologist to extract + the data from a lzip file long after quantum computers eventually + render LZMA obsolete. + + * Additionally the lzip reference implementation is copylefted, which + guarantees that it will remain free forever. + +A nice feature of the lzip format is that a corrupt byte is easier to repair +the nearer it is from the beginning of the file. Therefore, with the help of +lziprecover, losing an entire archive just because of a corrupt byte near +the beginning is a thing of the past. 
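The in-place compression mentioned above (same buffer for input and output) can be sketched with a toy encoder. This is only an illustration of the buffer layout, under the assumption that the input is stored after a small headroom of `offset` bytes and the output is written from the start of the buffer, as compress.cc does with `offset = data_size / 8`. The run-length encoder and the name `rle_in_place` are hypothetical stand-ins; plzip itself compresses with lzlib, not with this code.

```cpp
#include <cstdint>
#include <vector>

// Toy in-place encoder (hypothetical, for illustration only).
// Input:  'size' bytes stored at buf[offset .. offset+size).
// Output: (run length, byte) pairs written from buf[0].
// Returns the encoded size, or -1 if the output would overwrite input
// that has not been read yet (the data is not compressible enough for
// the chosen headroom).
int rle_in_place( std::vector< uint8_t > & buf, const int offset,
                  const int size )
  {
  int rd = 0;				// input bytes consumed so far
  int wr = 0;				// output bytes written so far
  while( rd < size )
    {
    const uint8_t byte = buf[offset + rd];
    int run = 1;
    while( rd + run < size && run < 255 && buf[offset + rd + run] == byte )
      ++run;
    rd += run;
    // Everything below offset + rd is now free for output.
    if( wr + 2 > offset + rd ) return -1;
    buf[wr++] = static_cast< uint8_t >( run );
    buf[wr++] = byte;
    }
  return wr;
  }
```

The check before each write plays the same role as the "packet size exceeded in worker" test in cworker: the write position must never catch up with the unread part of the input. Sharing the buffer means one allocation of offset + size bytes per chunk instead of separate input and output buffers.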
+ +Plzip uses the same well-defined exit status values used by lzip, which +makes it safer than compressors returning ambiguous warning values (like +gzip) when it is used as a back end for other programs like tar or zutils. + +Plzip automatically uses for each file the largest dictionary size that +exceeds neither the file size nor the limit given. Keep in mind that the +decompression memory requirement is affected at compression time by the +choice of dictionary size limit. + +When compressing, plzip replaces every file given in the command line +with a compressed version of itself, with the name "original_name.lz". +When decompressing, plzip attempts to guess the name for the decompressed +file from that of the compressed file as follows: + +filename.lz becomes filename +filename.tlz becomes filename.tar +anyothername becomes anyothername.out + +(De)compressing a file is much like copying or moving it. Therefore plzip +preserves the access and modification dates, permissions, and, if you have +appropriate privileges, ownership of the file just as 'cp -p' does. (If the +user ID or the group ID can't be duplicated, the file permission bits +S_ISUID and S_ISGID are cleared). + +Plzip is able to read from some types of non-regular files if either the +option '-c' or the option '-o' is specified. + +If no file names are specified, plzip compresses (or decompresses) from +standard input to standard output. Plzip refuses to read compressed data +from a terminal or write compressed data to a terminal, as this would be +entirely incomprehensible and might leave the terminal in an abnormal state. + +Plzip correctly decompresses a file which is the concatenation of two or +more compressed files. The result is the concatenation of the corresponding +decompressed files. Integrity testing of concatenated compressed files is +also supported. + +LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never have +been compressed. 
Decompressed is used to refer to data which have undergone +the process of decompression. + + +Copyright (C) 2009-2024 Antonio Diaz Diaz. + +This file is free documentation: you have unlimited permission to copy, +distribute, and modify it. + +The file Makefile.in is a data file used by configure to produce the Makefile. +It has the same copyright owner and permissions that configure itself. diff --git a/arg_parser.cc b/arg_parser.cc new file mode 100644 index 0000000..0c04d8e --- /dev/null +++ b/arg_parser.cc @@ -0,0 +1,197 @@ +/* Arg_parser - POSIX/GNU command-line argument parser. (C++ version) + Copyright (C) 2006-2024 Antonio Diaz Diaz. + + This library is free software. Redistribution and use in source and + binary forms, with or without modification, are permitted provided + that the following conditions are met: + + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions, and the following disclaimer. + + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions, and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + This library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. +*/ + +#include <cstring> +#include <string> +#include <vector> + +#include "arg_parser.h" + + +bool Arg_parser::parse_long_option( const char * const opt, const char * const arg, + const Option options[], int & argind ) + { + unsigned len; + int index = -1; + bool exact = false, ambig = false; + + for( len = 0; opt[len+2] && opt[len+2] != '='; ++len ) ; + + // Test all long options for either exact match or abbreviated matches. 
+ for( int i = 0; options[i].code != 0; ++i ) + if( options[i].long_name && + std::strncmp( options[i].long_name, &opt[2], len ) == 0 ) + { + if( std::strlen( options[i].long_name ) == len ) // Exact match found + { index = i; exact = true; break; } + else if( index < 0 ) index = i; // First nonexact match found + else if( options[index].code != options[i].code || + options[index].has_arg != options[i].has_arg ) + ambig = true; // Second or later nonexact match found + } + + if( ambig && !exact ) + { + error_ = "option '"; error_ += opt; error_ += "' is ambiguous"; + return false; + } + + if( index < 0 ) // nothing found + { + error_ = "unrecognized option '"; error_ += opt; error_ += '\''; + return false; + } + + ++argind; + data.push_back( Record( options[index].code, options[index].long_name ) ); + + if( opt[len+2] ) // '--<long_option>=<argument>' syntax + { + if( options[index].has_arg == no ) + { + error_ = "option '--"; error_ += options[index].long_name; + error_ += "' doesn't allow an argument"; + return false; + } + if( options[index].has_arg == yes && !opt[len+3] ) + { + error_ = "option '--"; error_ += options[index].long_name; + error_ += "' requires an argument"; + return false; + } + data.back().argument = &opt[len+3]; + return true; + } + + if( options[index].has_arg == yes ) + { + if( !arg || !arg[0] ) + { + error_ = "option '--"; error_ += options[index].long_name; + error_ += "' requires an argument"; + return false; + } + ++argind; data.back().argument = arg; + return true; + } + + return true; + } + + +bool Arg_parser::parse_short_option( const char * const opt, const char * const arg, + const Option options[], int & argind ) + { + int cind = 1; // character index in opt + + while( cind > 0 ) + { + int index = -1; + const unsigned char c = opt[cind]; + + if( c != 0 ) + for( int i = 0; options[i].code; ++i ) + if( c == options[i].code ) + { index = i; break; } + + if( index < 0 ) + { + error_ = "invalid option -- '"; error_ += c; error_ += '\''; 
+ return false; + } + + data.push_back( Record( c ) ); + if( opt[++cind] == 0 ) { ++argind; cind = 0; } // opt finished + + if( options[index].has_arg != no && cind > 0 && opt[cind] ) + { + data.back().argument = &opt[cind]; ++argind; cind = 0; + } + else if( options[index].has_arg == yes ) + { + if( !arg || !arg[0] ) + { + error_ = "option requires an argument -- '"; error_ += c; + error_ += '\''; + return false; + } + data.back().argument = arg; ++argind; cind = 0; + } + } + return true; + } + + +Arg_parser::Arg_parser( const int argc, const char * const argv[], + const Option options[], const bool in_order ) + { + if( argc < 2 || !argv || !options ) return; + + std::vector< const char * > non_options; // skipped non-options + int argind = 1; // index in argv + + while( argind < argc ) + { + const unsigned char ch1 = argv[argind][0]; + const unsigned char ch2 = ch1 ? argv[argind][1] : 0; + + if( ch1 == '-' && ch2 ) // we found an option + { + const char * const opt = argv[argind]; + const char * const arg = ( argind + 1 < argc ) ? 
argv[argind+1] : 0; + if( ch2 == '-' ) + { + if( !argv[argind][2] ) { ++argind; break; } // we found "--" + else if( !parse_long_option( opt, arg, options, argind ) ) break; + } + else if( !parse_short_option( opt, arg, options, argind ) ) break; + } + else + { + if( in_order ) data.push_back( Record( argv[argind++] ) ); + else non_options.push_back( argv[argind++] ); + } + } + if( !error_.empty() ) data.clear(); + else + { + for( unsigned i = 0; i < non_options.size(); ++i ) + data.push_back( Record( non_options[i] ) ); + while( argind < argc ) + data.push_back( Record( argv[argind++] ) ); + } + } + + +Arg_parser::Arg_parser( const char * const opt, const char * const arg, + const Option options[] ) + { + if( !opt || !opt[0] || !options ) return; + + if( opt[0] == '-' && opt[1] ) // we found an option + { + int argind = 1; // dummy + if( opt[1] == '-' ) + { if( opt[2] ) parse_long_option( opt, arg, options, argind ); } + else + parse_short_option( opt, arg, options, argind ); + if( !error_.empty() ) data.clear(); + } + else data.push_back( Record( opt ) ); + } diff --git a/arg_parser.h b/arg_parser.h new file mode 100644 index 0000000..1eeec9a --- /dev/null +++ b/arg_parser.h @@ -0,0 +1,110 @@ +/* Arg_parser - POSIX/GNU command-line argument parser. (C++ version) + Copyright (C) 2006-2024 Antonio Diaz Diaz. + + This library is free software. Redistribution and use in source and + binary forms, with or without modification, are permitted provided + that the following conditions are met: + + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions, and the following disclaimer. + + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions, and the following disclaimer in the + documentation and/or other materials provided with the distribution. 
+ + This library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. +*/ + +/* Arg_parser reads the arguments in 'argv' and creates a number of + option codes, option arguments, and non-option arguments. + + In case of error, 'error' returns a non-empty error message. + + 'options' is an array of 'struct Option' terminated by an element + containing a code which is zero. A null long_name means a short-only + option. A code value outside the unsigned char range means a long-only + option. + + Arg_parser normally makes it appear as if all the option arguments + were specified before all the non-option arguments for the purposes + of parsing, even if the user of your program intermixed option and + non-option arguments. If you want the arguments in the exact order + the user typed them, call 'Arg_parser' with 'in_order' = true. + + The argument '--' terminates all options; any following arguments are + treated as non-option arguments, even if they begin with a hyphen. + + The syntax for optional option arguments is '-<short_option><argument>' + (without whitespace), or '--<long_option>=<argument>'. 
+*/ + +class Arg_parser + { +public: + enum Has_arg { no, yes, maybe }; + + struct Option + { + int code; // Short option letter or code ( code != 0 ) + const char * long_name; // Long option name (maybe null) + Has_arg has_arg; + }; + +private: + struct Record + { + int code; + std::string parsed_name; + std::string argument; + explicit Record( const unsigned char c ) + : code( c ), parsed_name( "-" ) { parsed_name += c; } + Record( const int c, const char * const long_name ) + : code( c ), parsed_name( "--" ) { parsed_name += long_name; } + explicit Record( const char * const arg ) : code( 0 ), argument( arg ) {} + }; + + const std::string empty_arg; + std::string error_; + std::vector< Record > data; + + bool parse_long_option( const char * const opt, const char * const arg, + const Option options[], int & argind ); + bool parse_short_option( const char * const opt, const char * const arg, + const Option options[], int & argind ); + +public: + Arg_parser( const int argc, const char * const argv[], + const Option options[], const bool in_order = false ); + + // Restricted constructor. Parses a single token and argument (if any). + Arg_parser( const char * const opt, const char * const arg, + const Option options[] ); + + const std::string & error() const { return error_; } + + // The number of arguments parsed. May be different from argc. + int arguments() const { return data.size(); } + + /* If code( i ) is 0, argument( i ) is a non-option. + Else argument( i ) is the option's argument (or empty). */ + int code( const int i ) const + { + if( i >= 0 && i < arguments() ) return data[i].code; + else return 0; + } + + // Full name of the option parsed (short or long). 
+ const std::string & parsed_name( const int i ) const + { + if( i >= 0 && i < arguments() ) return data[i].parsed_name; + else return empty_arg; + } + + const std::string & argument( const int i ) const + { + if( i >= 0 && i < arguments() ) return data[i].argument; + else return empty_arg; + } + }; diff --git a/compress.cc b/compress.cc new file mode 100644 index 0000000..defa58d --- /dev/null +++ b/compress.cc @@ -0,0 +1,558 @@ +/* Plzip - Massively parallel implementation of lzip + Copyright (C) 2009 Laszlo Ersek. + Copyright (C) 2009-2024 Antonio Diaz Diaz. + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 2 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see <http://www.gnu.org/licenses/>. +*/ + +#define _FILE_OFFSET_BITS 64 + +#include <algorithm> +#include <cerrno> +#include <climits> +#include <csignal> +#include <cstdio> +#include <cstdlib> +#include <cstring> +#include <string> +#include <vector> +#include <stdint.h> +#include <unistd.h> +#include <lzlib.h> + +#include "lzip.h" + +#ifndef LLONG_MAX +#define LLONG_MAX 0x7FFFFFFFFFFFFFFFLL +#endif + + +/* Return the number of bytes really read. + If (value returned < size) and (errno == 0), means EOF was reached. 
+*/ +int readblock( const int fd, uint8_t * const buf, const int size ) + { + int sz = 0; + errno = 0; + while( sz < size ) + { + const int n = read( fd, buf + sz, size - sz ); + if( n > 0 ) sz += n; + else if( n == 0 ) break; // EOF + else if( errno != EINTR ) break; + errno = 0; + } + return sz; + } + + +/* Return the number of bytes really written. + If (value returned < size), it is always an error. +*/ +int writeblock( const int fd, const uint8_t * const buf, const int size ) + { + int sz = 0; + errno = 0; + while( sz < size ) + { + const int n = write( fd, buf + sz, size - sz ); + if( n > 0 ) sz += n; + else if( n < 0 && errno != EINTR ) break; + errno = 0; + } + return sz; + } + + +void xinit_mutex( pthread_mutex_t * const mutex ) + { + const int errcode = pthread_mutex_init( mutex, 0 ); + if( errcode ) + { show_error( "pthread_mutex_init", errcode ); cleanup_and_fail(); } + } + +void xinit_cond( pthread_cond_t * const cond ) + { + const int errcode = pthread_cond_init( cond, 0 ); + if( errcode ) + { show_error( "pthread_cond_init", errcode ); cleanup_and_fail(); } + } + + +void xdestroy_mutex( pthread_mutex_t * const mutex ) + { + const int errcode = pthread_mutex_destroy( mutex ); + if( errcode ) + { show_error( "pthread_mutex_destroy", errcode ); cleanup_and_fail(); } + } + +void xdestroy_cond( pthread_cond_t * const cond ) + { + const int errcode = pthread_cond_destroy( cond ); + if( errcode ) + { show_error( "pthread_cond_destroy", errcode ); cleanup_and_fail(); } + } + + +void xlock( pthread_mutex_t * const mutex ) + { + const int errcode = pthread_mutex_lock( mutex ); + if( errcode ) + { show_error( "pthread_mutex_lock", errcode ); cleanup_and_fail(); } + } + + +void xunlock( pthread_mutex_t * const mutex ) + { + const int errcode = pthread_mutex_unlock( mutex ); + if( errcode ) + { show_error( "pthread_mutex_unlock", errcode ); cleanup_and_fail(); } + } + + +void xwait( pthread_cond_t * const cond, pthread_mutex_t * const mutex ) + { + const int 
errcode = pthread_cond_wait( cond, mutex ); + if( errcode ) + { show_error( "pthread_cond_wait", errcode ); cleanup_and_fail(); } + } + + +void xsignal( pthread_cond_t * const cond ) + { + const int errcode = pthread_cond_signal( cond ); + if( errcode ) + { show_error( "pthread_cond_signal", errcode ); cleanup_and_fail(); } + } + + +void xbroadcast( pthread_cond_t * const cond ) + { + const int errcode = pthread_cond_broadcast( cond ); + if( errcode ) + { show_error( "pthread_cond_broadcast", errcode ); cleanup_and_fail(); } + } + + +namespace { + +unsigned long long in_size = 0; +unsigned long long out_size = 0; +const char * const mem_msg2 = "Not enough memory. Try a smaller dictionary size."; + + +struct Packet // data block with a serial number + { + uint8_t * data; + int size; // number of bytes in data (if any) + unsigned id; // serial number assigned as received + Packet() : data( 0 ), size( 0 ), id( 0 ) {} + void init( uint8_t * const d, const int s, const unsigned i ) + { data = d; size = s; id = i; } + }; + + +class Packet_courier // moves packets around + { +public: + unsigned icheck_counter; + unsigned iwait_counter; + unsigned ocheck_counter; + unsigned owait_counter; +private: + unsigned receive_id; // id assigned to next packet received + unsigned distrib_id; // id of next packet to be distributed + unsigned deliver_id; // id of next packet to be delivered + Slot_tally slot_tally; // limits the number of input packets + std::vector< Packet > circular_ibuffer; + std::vector< const Packet * > circular_obuffer; + int num_working; // number of workers still running + const int num_slots; // max packets in circulation + pthread_mutex_t imutex; + pthread_cond_t iav_or_eof; // input packet available or splitter done + pthread_mutex_t omutex; + pthread_cond_t oav_or_exit; // output packet available or all workers exited + bool eof; // splitter done + + Packet_courier( const Packet_courier & ); // declared as private + void operator=( const Packet_courier & 
); // declared as private + +public: + Packet_courier( const int workers, const int slots ) + : icheck_counter( 0 ), iwait_counter( 0 ), + ocheck_counter( 0 ), owait_counter( 0 ), + receive_id( 0 ), distrib_id( 0 ), deliver_id( 0 ), + slot_tally( slots ), circular_ibuffer( slots ), + circular_obuffer( slots, (const Packet *) 0 ), + num_working( workers ), num_slots( slots ), eof( false ) + { + xinit_mutex( &imutex ); xinit_cond( &iav_or_eof ); + xinit_mutex( &omutex ); xinit_cond( &oav_or_exit ); + } + + ~Packet_courier() + { + xdestroy_cond( &oav_or_exit ); xdestroy_mutex( &omutex ); + xdestroy_cond( &iav_or_eof ); xdestroy_mutex( &imutex ); + } + + // fill a packet with data received from splitter + void receive_packet( uint8_t * const data, const int size ) + { + slot_tally.get_slot(); // wait for a free slot + xlock( &imutex ); + circular_ibuffer[receive_id % num_slots].init( data, size, receive_id ); + ++receive_id; + xsignal( &iav_or_eof ); + xunlock( &imutex ); + } + + // distribute a packet to a worker + Packet * distribute_packet() + { + Packet * ipacket = 0; + xlock( &imutex ); + ++icheck_counter; + while( receive_id == distrib_id && !eof ) // no packets to distribute + { + ++iwait_counter; + xwait( &iav_or_eof, &imutex ); + } + if( receive_id != distrib_id ) + { ipacket = &circular_ibuffer[distrib_id % num_slots]; ++distrib_id; } + xunlock( &imutex ); + if( !ipacket ) // EOF + { + xlock( &omutex ); // notify muxer when last worker exits + if( --num_working == 0 ) xsignal( &oav_or_exit ); + xunlock( &omutex ); + } + return ipacket; + } + + // collect a packet from a worker + void collect_packet( const Packet * const opacket ) + { + const int i = opacket->id % num_slots; + xlock( &omutex ); + // id collision shouldn't happen + if( circular_obuffer[i] != 0 ) + internal_error( "id collision in collect_packet." 
); + // merge packet into circular buffer + circular_obuffer[i] = opacket; + if( opacket->id == deliver_id ) xsignal( &oav_or_exit ); + xunlock( &omutex ); + } + + // deliver packets to muxer + void deliver_packets( std::vector< const Packet * > & packet_vector ) + { + xlock( &omutex ); + ++ocheck_counter; + int i = deliver_id % num_slots; + while( circular_obuffer[i] == 0 && num_working > 0 ) + { + ++owait_counter; + xwait( &oav_or_exit, &omutex ); + } + packet_vector.clear(); + while( true ) + { + const Packet * const opacket = circular_obuffer[i]; + if( !opacket ) break; + packet_vector.push_back( opacket ); + circular_obuffer[i] = 0; + ++deliver_id; + i = deliver_id % num_slots; + } + xunlock( &omutex ); + } + + void return_empty_packet() // return a slot to the tally + { slot_tally.leave_slot(); } + + void finish( const int workers_spared ) + { + xlock( &imutex ); // splitter has no more packets to send + eof = true; + xbroadcast( &iav_or_eof ); + xunlock( &imutex ); + xlock( &omutex ); // notify muxer if all workers have exited + num_working -= workers_spared; + if( num_working <= 0 ) xsignal( &oav_or_exit ); + xunlock( &omutex ); + } + + bool finished() // all packets delivered to muxer + { + if( !slot_tally.all_free() || !eof || receive_id != distrib_id || + num_working != 0 ) return false; + for( int i = 0; i < num_slots; ++i ) + if( circular_obuffer[i] != 0 ) return false; + return true; + } + }; + + +struct Worker_arg + { + Packet_courier * courier; + const Pretty_print * pp; + int dictionary_size; + int match_len_limit; + int offset; + }; + +struct Splitter_arg + { + struct Worker_arg worker_arg; + pthread_t * worker_threads; + int infd; + int data_size; + int num_workers; // returned by splitter to main thread + }; + + +/* Get packets from courier, replace their contents, and return them to + courier. 
*/ +extern "C" void * cworker( void * arg ) + { + const Worker_arg & tmp = *(const Worker_arg *)arg; + Packet_courier & courier = *tmp.courier; + const Pretty_print & pp = *tmp.pp; + const int dictionary_size = tmp.dictionary_size; + const int match_len_limit = tmp.match_len_limit; + const int offset = tmp.offset; + LZ_Encoder * encoder = 0; + + while( true ) + { + Packet * const packet = courier.distribute_packet(); + if( !packet ) break; // no more packets to process + + if( !encoder ) + { + const bool fast = dictionary_size == 65535 && match_len_limit == 16; + const int dict_size = fast ? dictionary_size : + std::max( std::min( dictionary_size, packet->size ), + LZ_min_dictionary_size() ); + encoder = LZ_compress_open( dict_size, match_len_limit, LLONG_MAX ); + if( !encoder || LZ_compress_errno( encoder ) != LZ_ok ) + { + if( !encoder || LZ_compress_errno( encoder ) == LZ_mem_error ) + pp( mem_msg2 ); + else + internal_error( "invalid argument to encoder." ); + cleanup_and_fail(); + } + } + else + if( LZ_compress_restart_member( encoder, LLONG_MAX ) < 0 ) + { pp( "LZ_compress_restart_member failed." ); cleanup_and_fail(); } + + int written = 0; + int new_pos = 0; + while( true ) + { + if( written < packet->size ) + { + const int wr = LZ_compress_write( encoder, + packet->data + offset + written, + packet->size - written ); + if( wr < 0 ) internal_error( "library error (LZ_compress_write)." ); + written += wr; + } + if( written >= packet->size ) LZ_compress_finish( encoder ); + const int rd = LZ_compress_read( encoder, packet->data + new_pos, + offset + written - new_pos ); + if( rd < 0 ) + { + pp(); + if( verbosity >= 0 ) + std::fprintf( stderr, "LZ_compress_read error: %s\n", + LZ_strerror( LZ_compress_errno( encoder ) ) ); + cleanup_and_fail(); + } + new_pos += rd; + if( new_pos >= offset + written ) + internal_error( "packet size exceeded in worker." 
); + if( LZ_compress_finished( encoder ) == 1 ) break; + } + + if( packet->size > 0 ) show_progress( packet->size ); + packet->size = new_pos; + courier.collect_packet( packet ); + } + if( encoder && LZ_compress_close( encoder ) < 0 ) + { pp( "LZ_compress_close failed." ); cleanup_and_fail(); } + return 0; + } + + +/* Split data from input file into chunks and pass them to courier for + packaging and distribution to workers. + Start a worker per packet up to a maximum of num_workers. +*/ +extern "C" void * csplitter( void * arg ) + { + Splitter_arg & tmp = *(Splitter_arg *)arg; + Packet_courier & courier = *tmp.worker_arg.courier; + const Pretty_print & pp = *tmp.worker_arg.pp; + pthread_t * const worker_threads = tmp.worker_threads; + const int offset = tmp.worker_arg.offset; + const int infd = tmp.infd; + const int data_size = tmp.data_size; + int i = 0; // number of workers started + + for( bool first_post = true; ; first_post = false ) + { + uint8_t * const data = new( std::nothrow ) uint8_t[offset+data_size]; + if( !data ) { pp( mem_msg2 ); cleanup_and_fail(); } + const int size = readblock( infd, data + offset, data_size ); + if( size != data_size && errno ) + { pp(); show_error( "Read error", errno ); cleanup_and_fail(); } + + if( size > 0 || first_post ) // first packet may be empty + { + in_size += size; + courier.receive_packet( data, size ); + if( i < tmp.num_workers ) // start a new worker + { + const int errcode = + pthread_create( &worker_threads[i++], 0, cworker, &tmp.worker_arg ); + if( errcode ) { show_error( "Can't create worker threads", errcode ); + cleanup_and_fail(); } + } + if( size < data_size ) break; // EOF + } + else + { + delete[] data; + break; + } + } + courier.finish( tmp.num_workers - i ); // no more packets to send + tmp.num_workers = i; + return 0; + } + + +/* Get from courier the processed and sorted packets, and write their + contents to the output file. 
+*/ +void muxer( Packet_courier & courier, const Pretty_print & pp, const int outfd ) + { + std::vector< const Packet * > packet_vector; + while( true ) + { + courier.deliver_packets( packet_vector ); + if( packet_vector.empty() ) break; // all workers exited + + for( unsigned i = 0; i < packet_vector.size(); ++i ) + { + const Packet * const opacket = packet_vector[i]; + out_size += opacket->size; + + if( writeblock( outfd, opacket->data, opacket->size ) != opacket->size ) + { pp(); show_error( "Write error", errno ); cleanup_and_fail(); } + delete[] opacket->data; + courier.return_empty_packet(); + } + } + } + +} // end namespace + + +/* Init the courier, then start the splitter and the workers and call the + muxer. */ +int compress( const unsigned long long cfile_size, + const int data_size, const int dictionary_size, + const int match_len_limit, const int num_workers, + const int infd, const int outfd, + const Pretty_print & pp, const int debug_level ) + { + const int offset = data_size / 8; // offset for compression in-place + const int slots_per_worker = 2; + const int num_slots = + ( ( num_workers > 1 ) ? 
num_workers * slots_per_worker : 1 ); + in_size = 0; + out_size = 0; + Packet_courier courier( num_workers, num_slots ); + + if( debug_level & 2 ) std::fputs( "compress.\n", stderr ); + + pthread_t * worker_threads = new( std::nothrow ) pthread_t[num_workers]; + if( !worker_threads ) { pp( mem_msg ); return 1; } + + Splitter_arg splitter_arg; + splitter_arg.worker_arg.courier = &courier; + splitter_arg.worker_arg.pp = &pp; + splitter_arg.worker_arg.dictionary_size = dictionary_size; + splitter_arg.worker_arg.match_len_limit = match_len_limit; + splitter_arg.worker_arg.offset = offset; + splitter_arg.worker_threads = worker_threads; + splitter_arg.infd = infd; + splitter_arg.data_size = data_size; + splitter_arg.num_workers = num_workers; + + pthread_t splitter_thread; + int errcode = pthread_create( &splitter_thread, 0, csplitter, &splitter_arg ); + if( errcode ) + { show_error( "Can't create splitter thread", errcode ); + delete[] worker_threads; return 1; } + if( verbosity >= 1 ) pp(); + show_progress( 0, cfile_size, &pp ); // init + + muxer( courier, pp, outfd ); + + errcode = pthread_join( splitter_thread, 0 ); + if( errcode ) { show_error( "Can't join splitter thread", errcode ); + cleanup_and_fail(); } + + for( int i = splitter_arg.num_workers; --i >= 0; ) + { // join only the workers started + errcode = pthread_join( worker_threads[i], 0 ); + if( errcode ) { show_error( "Can't join worker threads", errcode ); + cleanup_and_fail(); } + } + delete[] worker_threads; + + if( verbosity >= 1 ) + { + if( in_size == 0 || out_size == 0 ) + std::fputs( " no data compressed.\n", stderr ); + else + std::fprintf( stderr, "%6.3f:1, %5.2f%% ratio, %5.2f%% saved, " + "%llu in, %llu out.\n", + (double)in_size / out_size, + ( 100.0 * out_size ) / in_size, + 100.0 - ( ( 100.0 * out_size ) / in_size ), + in_size, out_size ); + } + + if( debug_level & 1 ) + std::fprintf( stderr, + "workers started %8u\n" + "any worker tried to consume from splitter %8u times\n" + "any worker had 
to wait %8u times\n" + "muxer tried to consume from workers %8u times\n" + "muxer had to wait %8u times\n", + splitter_arg.num_workers, + courier.icheck_counter, courier.iwait_counter, + courier.ocheck_counter, courier.owait_counter ); + + if( !courier.finished() ) internal_error( "courier not finished." ); + return 0; + } diff --git a/configure b/configure new file mode 100755 index 0000000..4e627d5 --- /dev/null +++ b/configure @@ -0,0 +1,210 @@ +#! /bin/sh +# configure script for Plzip - Massively parallel implementation of lzip +# Copyright (C) 2009-2024 Antonio Diaz Diaz. +# +# This configure script is free software: you have unlimited permission +# to copy, distribute, and modify it. + +pkgname=plzip +pkgversion=1.11 +progname=plzip +with_mingw= +srctrigger=doc/${pkgname}.texi + +# clear some things potentially inherited from environment. +LC_ALL=C +export LC_ALL +srcdir= +prefix=/usr/local +exec_prefix='$(prefix)' +bindir='$(exec_prefix)/bin' +datarootdir='$(prefix)/share' +infodir='$(datarootdir)/info' +mandir='$(datarootdir)/man' +CXX=g++ +CPPFLAGS= +CXXFLAGS='-Wall -W -O2' +LDFLAGS= +LIBS='-llz -lpthread' +MAKEINFO=makeinfo + +# checking whether we are using GNU C++. +/bin/sh -c "${CXX} --version" > /dev/null 2>&1 || { CXX=c++ ; CXXFLAGS=-O2 ; } + +# Loop over all args +args= +no_create= +while [ $# != 0 ] ; do + + # Get the first arg, and shuffle + option=$1 ; arg2=no + shift + + # Add the argument quoted to args + if [ -z "${args}" ] ; then args="\"${option}\"" + else args="${args} \"${option}\"" ; fi + + # Split out the argument for options that take them + case ${option} in + *=*) optarg=`echo "${option}" | sed -e 's,^[^=]*=,,;s,/$,,'` ;; + esac + + # Process the options + case ${option} in + --help | -h) + echo "Usage: $0 [OPTION]... [VAR=VALUE]..." + echo + echo "To assign makefile variables (e.g., CXX, CXXFLAGS...), specify them as" + echo "arguments to configure in the form VAR=VALUE." 
+ echo + echo "Options and variables: [defaults in brackets]" + echo " -h, --help display this help and exit" + echo " -V, --version output version information and exit" + echo " --srcdir=DIR find the source code in DIR [. or ..]" + echo " --prefix=DIR install into DIR [${prefix}]" + echo " --exec-prefix=DIR base directory for arch-dependent files [${exec_prefix}]" + echo " --bindir=DIR user executables directory [${bindir}]" + echo " --datarootdir=DIR base directory for doc and data [${datarootdir}]" + echo " --infodir=DIR info files directory [${infodir}]" + echo " --mandir=DIR man pages directory [${mandir}]" + echo " --with-mingw use included pread/pwrite functions missing in MinGW" + echo " CXX=COMPILER C++ compiler to use [${CXX}]" + echo " CPPFLAGS=OPTIONS command-line options for the preprocessor [${CPPFLAGS}]" + echo " CXXFLAGS=OPTIONS command-line options for the C++ compiler [${CXXFLAGS}]" + echo " CXXFLAGS+=OPTIONS append options to the current value of CXXFLAGS" + echo " LDFLAGS=OPTIONS command-line options for the linker [${LDFLAGS}]" + echo " LIBS=OPTIONS libraries to pass to the linker [${LIBS}]" + echo " MAKEINFO=NAME makeinfo program to use [${MAKEINFO}]" + echo + exit 0 ;; + --version | -V) + echo "Configure script for ${pkgname} version ${pkgversion}" + exit 0 ;; + --srcdir) srcdir=$1 ; arg2=yes ;; + --prefix) prefix=$1 ; arg2=yes ;; + --exec-prefix) exec_prefix=$1 ; arg2=yes ;; + --bindir) bindir=$1 ; arg2=yes ;; + --datarootdir) datarootdir=$1 ; arg2=yes ;; + --infodir) infodir=$1 ; arg2=yes ;; + --mandir) mandir=$1 ; arg2=yes ;; + + --srcdir=*) srcdir=${optarg} ;; + --prefix=*) prefix=${optarg} ;; + --exec-prefix=*) exec_prefix=${optarg} ;; + --bindir=*) bindir=${optarg} ;; + --datarootdir=*) datarootdir=${optarg} ;; + --infodir=*) infodir=${optarg} ;; + --mandir=*) mandir=${optarg} ;; + --no-create) no_create=yes ;; + --with-mingw) with_mingw=-DWITH_MINGW ;; + + CXX=*) CXX=${optarg} ;; + CPPFLAGS=*) CPPFLAGS=${optarg} ;; + CXXFLAGS=*) 
CXXFLAGS=${optarg} ;; + CXXFLAGS+=*) CXXFLAGS="${CXXFLAGS} ${optarg}" ;; + LDFLAGS=*) LDFLAGS=${optarg} ;; + LIBS=*) LIBS="${optarg} ${LIBS}" ;; + MAKEINFO=*) MAKEINFO=${optarg} ;; + + --*) + echo "configure: WARNING: unrecognized option: '${option}'" 1>&2 ;; + *=* | *-*-*) ;; + *) + echo "configure: unrecognized option: '${option}'" 1>&2 + echo "Try 'configure --help' for more information." 1>&2 + exit 1 ;; + esac + + # Check if the option took a separate argument + if [ "${arg2}" = yes ] ; then + if [ $# != 0 ] ; then args="${args} \"$1\"" ; shift + else echo "configure: Missing argument to '${option}'" 1>&2 + exit 1 + fi + fi +done + +# Find the source code, if location was not specified. +srcdirtext= +if [ -z "${srcdir}" ] ; then + srcdirtext="or . or .." ; srcdir=. + if [ ! -r "${srcdir}/${srctrigger}" ] ; then srcdir=.. ; fi + if [ ! -r "${srcdir}/${srctrigger}" ] ; then + ## the sed command below emulates the dirname command + srcdir=`echo "$0" | sed -e 's,[^/]*$,,;s,/$,,;s,^$,.,'` + fi +fi + +if [ ! -r "${srcdir}/${srctrigger}" ] ; then + echo "configure: Can't find source code in ${srcdir} ${srcdirtext}" 1>&2 + echo "configure: (At least ${srctrigger} is missing)." 1>&2 + exit 1 +fi + +# Set srcdir to . if that's what it is. +if [ "`pwd`" = "`cd "${srcdir}" ; pwd`" ] ; then srcdir=. ; fi + +echo +if [ -z "${no_create}" ] ; then + echo "creating config.status" + rm -f config.status + cat > config.status << EOF +#! /bin/sh +# This file was generated automatically by configure. Don't edit. +# Run this file to recreate the current configuration. +# +# This script is free software: you have unlimited permission +# to copy, distribute, and modify it. 
+ +exec /bin/sh "$0" ${args} --no-create +EOF + chmod +x config.status +fi + +echo "creating Makefile" +if [ -n "${with_mingw}" ] ; then echo "WITH_MINGW = yes" ; fi +echo "VPATH = ${srcdir}" +echo "prefix = ${prefix}" +echo "exec_prefix = ${exec_prefix}" +echo "bindir = ${bindir}" +echo "datarootdir = ${datarootdir}" +echo "infodir = ${infodir}" +echo "mandir = ${mandir}" +echo "CXX = ${CXX}" +echo "CPPFLAGS = ${CPPFLAGS}" +echo "CXXFLAGS = ${CXXFLAGS}" +echo "LDFLAGS = ${LDFLAGS}" +echo "LIBS = ${LIBS}" +echo "MAKEINFO = ${MAKEINFO}" +rm -f Makefile +cat > Makefile << EOF +# Makefile for Plzip - Massively parallel implementation of lzip +# Copyright (C) 2009-2024 Antonio Diaz Diaz. +# This file was generated automatically by configure. Don't edit. +# +# This Makefile is free software: you have unlimited permission +# to copy, distribute, and modify it. + +pkgname = ${pkgname} +pkgversion = ${pkgversion} +progname = ${progname} +with_mingw = ${with_mingw} +VPATH = ${srcdir} +prefix = ${prefix} +exec_prefix = ${exec_prefix} +bindir = ${bindir} +datarootdir = ${datarootdir} +infodir = ${infodir} +mandir = ${mandir} +CXX = ${CXX} +CPPFLAGS = ${CPPFLAGS} +CXXFLAGS = ${CXXFLAGS} +LDFLAGS = ${LDFLAGS} +LIBS = ${LIBS} +MAKEINFO = ${MAKEINFO} +EOF +cat "${srcdir}/Makefile.in" >> Makefile + +echo "OK. Now you can run make." +echo "If make fails, check that the compression library lzlib is correctly installed" +echo "(see INSTALL)." diff --git a/dec_stdout.cc b/dec_stdout.cc new file mode 100644 index 0000000..6ffed07 --- /dev/null +++ b/dec_stdout.cc @@ -0,0 +1,337 @@ +/* Plzip - Massively parallel implementation of lzip + Copyright (C) 2009 Laszlo Ersek. + Copyright (C) 2009-2024 Antonio Diaz Diaz. + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 2 of the License, or + (at your option) any later version. 
+ + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see <http://www.gnu.org/licenses/>. +*/ + +#define _FILE_OFFSET_BITS 64 + +#include <algorithm> +#include <cerrno> +#include <climits> +#include <csignal> +#include <cstdio> +#include <cstdlib> +#include <cstring> +#include <queue> +#include <string> +#include <vector> +#include <stdint.h> +#include <unistd.h> +#include <lzlib.h> + +#include "lzip.h" +#include "lzip_index.h" + + +namespace { + +enum { max_packet_size = 1 << 20 }; + + +struct Packet // data block + { + uint8_t * data; // data may be null if size == 0 + int size; // number of bytes in data (if any) + bool eom; // end of member + Packet() : data( 0 ), size( 0 ), eom( true ) {} + Packet( uint8_t * const d, const int s, const bool e ) + : data( d ), size( s ), eom ( e ) {} + ~Packet() { if( data ) delete[] data; } + }; + + +class Packet_courier // moves packets around + { +public: + unsigned ocheck_counter; + unsigned owait_counter; +private: + int deliver_worker_id; // worker queue currently delivering packets + std::vector< std::queue< const Packet * > > opacket_queues; + int num_working; // number of workers still running + const int num_workers; // number of workers + const unsigned out_slots; // max output packets per queue + pthread_mutex_t omutex; + pthread_cond_t oav_or_exit; // output packet available or all workers exited + std::vector< pthread_cond_t > slot_av; // output slot available + const Shared_retval & shared_retval; // discard new packets on error + + Packet_courier( const Packet_courier & ); // declared as private + void operator=( const Packet_courier & ); // declared as private + +public: + Packet_courier( const Shared_retval & sh_ret, 
const int workers, + const int slots ) + : ocheck_counter( 0 ), owait_counter( 0 ), deliver_worker_id( 0 ), + opacket_queues( workers ), num_working( workers ), + num_workers( workers ), out_slots( slots ), slot_av( workers ), + shared_retval( sh_ret ) + { + xinit_mutex( &omutex ); xinit_cond( &oav_or_exit ); + for( unsigned i = 0; i < slot_av.size(); ++i ) xinit_cond( &slot_av[i] ); + } + + ~Packet_courier() + { + if( shared_retval() ) // cleanup to avoid memory leaks + for( int i = 0; i < num_workers; ++i ) + while( !opacket_queues[i].empty() ) + { delete opacket_queues[i].front(); opacket_queues[i].pop(); } + for( unsigned i = 0; i < slot_av.size(); ++i ) xdestroy_cond( &slot_av[i] ); + xdestroy_cond( &oav_or_exit ); xdestroy_mutex( &omutex ); + } + + void worker_finished() + { + // notify muxer when last worker exits + xlock( &omutex ); + if( --num_working == 0 ) xsignal( &oav_or_exit ); + xunlock( &omutex ); + } + + // collect a packet from a worker, discard packet on error + void collect_packet( const Packet * const opacket, const int worker_id ) + { + xlock( &omutex ); + if( opacket->data ) + while( opacket_queues[worker_id].size() >= out_slots ) + { + if( shared_retval() ) { delete opacket; goto done; } + xwait( &slot_av[worker_id], &omutex ); + } + opacket_queues[worker_id].push( opacket ); + if( worker_id == deliver_worker_id ) xsignal( &oav_or_exit ); +done: + xunlock( &omutex ); + } + + /* deliver a packet to muxer + if packet->eom, move to next queue + if packet data == 0, wait again */ + const Packet * deliver_packet() + { + const Packet * opacket = 0; + xlock( &omutex ); + ++ocheck_counter; + while( true ) + { + while( opacket_queues[deliver_worker_id].empty() && num_working > 0 ) + { + ++owait_counter; + xwait( &oav_or_exit, &omutex ); + } + if( opacket_queues[deliver_worker_id].empty() ) break; + opacket = opacket_queues[deliver_worker_id].front(); + opacket_queues[deliver_worker_id].pop(); + if( opacket_queues[deliver_worker_id].size() + 1 == 
out_slots ) + xsignal( &slot_av[deliver_worker_id] ); + if( opacket->eom && ++deliver_worker_id >= num_workers ) + deliver_worker_id = 0; + if( opacket->data ) break; + delete opacket; opacket = 0; + } + xunlock( &omutex ); + return opacket; + } + + bool finished() // all packets delivered to muxer + { + if( num_working != 0 ) return false; + for( int i = 0; i < num_workers; ++i ) + if( !opacket_queues[i].empty() ) return false; + return true; + } + }; + + +struct Worker_arg + { + const Lzip_index * lzip_index; + Packet_courier * courier; + const Pretty_print * pp; + Shared_retval * shared_retval; + int worker_id; + int num_workers; + int infd; + }; + + +/* Read members from file, decompress their contents, and give to courier + the packets produced. +*/ +extern "C" void * dworker_o( void * arg ) + { + const Worker_arg & tmp = *(const Worker_arg *)arg; + const Lzip_index & lzip_index = *tmp.lzip_index; + Packet_courier & courier = *tmp.courier; + const Pretty_print & pp = *tmp.pp; + Shared_retval & shared_retval = *tmp.shared_retval; + const int worker_id = tmp.worker_id; + const int num_workers = tmp.num_workers; + const int infd = tmp.infd; + const int buffer_size = 65536; + + int new_pos = 0; + uint8_t * new_data = 0; + uint8_t * const ibuffer = new( std::nothrow ) uint8_t[buffer_size]; + LZ_Decoder * const decoder = LZ_decompress_open(); + if( !ibuffer || !decoder || LZ_decompress_errno( decoder ) != LZ_ok ) + { if( shared_retval.set_value( 1 ) ) { pp( mem_msg ); } goto done; } + + for( long i = worker_id; i < lzip_index.members(); i += num_workers ) + { + long long member_pos = lzip_index.mblock( i ).pos(); + long long member_rest = lzip_index.mblock( i ).size(); + + while( member_rest > 0 ) + { + if( shared_retval() ) goto done; // other worker found a problem + while( LZ_decompress_write_size( decoder ) > 0 ) + { + const int size = std::min( LZ_decompress_write_size( decoder ), + (int)std::min( (long long)buffer_size, member_rest ) ); + if( size > 0 ) + { + 
if( preadblock( infd, ibuffer, size, member_pos ) != size ) + { if( shared_retval.set_value( 1 ) ) + { pp(); show_error( "Read error", errno ); } goto done; } + member_pos += size; + member_rest -= size; + if( LZ_decompress_write( decoder, ibuffer, size ) != size ) + internal_error( "library error (LZ_decompress_write)." ); + } + if( member_rest <= 0 ) { LZ_decompress_finish( decoder ); break; } + } + while( true ) // read and pack decompressed data + { + if( !new_data && + !( new_data = new( std::nothrow ) uint8_t[max_packet_size] ) ) + { if( shared_retval.set_value( 1 ) ) { pp( mem_msg ); } goto done; } + const int rd = LZ_decompress_read( decoder, new_data + new_pos, + max_packet_size - new_pos ); + if( rd < 0 ) + { decompress_error( decoder, pp, shared_retval, worker_id ); + goto done; } + new_pos += rd; + if( new_pos > max_packet_size ) + internal_error( "opacket size exceeded in worker." ); + const bool eom = LZ_decompress_finished( decoder ) == 1; + if( new_pos == max_packet_size || eom ) // make data packet + { + const Packet * const opacket = + new Packet( ( new_pos > 0 ) ? new_data : 0, new_pos, eom ); + courier.collect_packet( opacket, worker_id ); + if( new_pos > 0 ) { new_pos = 0; new_data = 0; } + if( eom ) + { LZ_decompress_reset( decoder ); // prepare for new member + break; } + } + if( rd == 0 ) break; + } + } + show_progress( lzip_index.mblock( i ).size() ); + } +done: + delete[] ibuffer; if( new_data ) delete[] new_data; + if( LZ_decompress_member_position( decoder ) != 0 && + shared_retval.set_value( 1 ) ) + pp( "Error, some data remains in decoder." ); + if( LZ_decompress_close( decoder ) < 0 && shared_retval.set_value( 1 ) ) + pp( "LZ_decompress_close failed." ); + courier.worker_finished(); + return 0; + } + + +/* Get from courier the processed and sorted packets, and write their + contents to the output file. Drain queue on error. 
+*/ +void muxer( Packet_courier & courier, const Pretty_print & pp, + Shared_retval & shared_retval, const int outfd ) + { + while( true ) + { + const Packet * const opacket = courier.deliver_packet(); + if( !opacket ) break; // queue is empty. all workers exited + + if( shared_retval() == 0 && + writeblock( outfd, opacket->data, opacket->size ) != opacket->size && + shared_retval.set_value( 1 ) ) + { pp(); show_error( "Write error", errno ); } + delete opacket; + } + } + +} // end namespace + + +// init the courier, then start the workers and call the muxer. +int dec_stdout( const int num_workers, const int infd, const int outfd, + const Pretty_print & pp, const int debug_level, + const int out_slots, const Lzip_index & lzip_index ) + { + Shared_retval shared_retval; + Packet_courier courier( shared_retval, num_workers, out_slots ); + + Worker_arg * worker_args = new( std::nothrow ) Worker_arg[num_workers]; + pthread_t * worker_threads = new( std::nothrow ) pthread_t[num_workers]; + if( !worker_args || !worker_threads ) + { pp( mem_msg ); delete[] worker_threads; delete[] worker_args; return 1; } + + int i = 0; // number of workers started + for( ; i < num_workers; ++i ) + { + worker_args[i].lzip_index = &lzip_index; + worker_args[i].courier = &courier; + worker_args[i].pp = &pp; + worker_args[i].shared_retval = &shared_retval; + worker_args[i].worker_id = i; + worker_args[i].num_workers = num_workers; + worker_args[i].infd = infd; + const int errcode = + pthread_create( &worker_threads[i], 0, dworker_o, &worker_args[i] ); + if( errcode ) + { if( shared_retval.set_value( 1 ) ) + { show_error( "Can't create worker threads", errcode ); } break; } + } + + muxer( courier, pp, shared_retval, outfd ); + + while( --i >= 0 ) + { + const int errcode = pthread_join( worker_threads[i], 0 ); + if( errcode && shared_retval.set_value( 1 ) ) + show_error( "Can't join worker threads", errcode ); + } + delete[] worker_threads; + delete[] worker_args; + + if( shared_retval() ) 
return shared_retval(); // some thread found a problem + + if( verbosity >= 1 ) + show_results( lzip_index.cdata_size(), lzip_index.udata_size(), + lzip_index.dictionary_size(), false ); + + if( debug_level & 1 ) + std::fprintf( stderr, + "workers started %8u\n" + "muxer tried to consume from workers %8u times\n" + "muxer had to wait %8u times\n", + num_workers, courier.ocheck_counter, courier.owait_counter ); + + if( !courier.finished() ) internal_error( "courier not finished." ); + return 0; + } diff --git a/dec_stream.cc b/dec_stream.cc new file mode 100644 index 0000000..6ea4ed7 --- /dev/null +++ b/dec_stream.cc @@ -0,0 +1,650 @@ +/* Plzip - Massively parallel implementation of lzip + Copyright (C) 2009 Laszlo Ersek. + Copyright (C) 2009-2024 Antonio Diaz Diaz. + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 2 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see <http://www.gnu.org/licenses/>. +*/ + +#define _FILE_OFFSET_BITS 64 + +#include <algorithm> +#include <cerrno> +#include <climits> +#include <csignal> +#include <cstdio> +#include <cstdlib> +#include <cstring> +#include <queue> +#include <string> +#include <vector> +#include <stdint.h> +#include <unistd.h> +#include <lzlib.h> + +#include "lzip.h" + +/* When a problem is detected by any thread: + - the thread sets shared_retval to 1 or 2. + - the splitter sets eof and returns. + - the courier discards new packets received or collected. + - the workers drain the queue and return. 
+ - the muxer drains the queue and returns. + (Draining seems to be faster than cleaning up later). */ + +namespace { + +enum { max_packet_size = 1 << 20 }; +unsigned long long in_size = 0; +unsigned long long out_size = 0; + + +struct Packet // data block + { + uint8_t * data; // data may be null if size == 0 + int size; // number of bytes in data (if any) + bool eom; // end of member + Packet() : data( 0 ), size( 0 ), eom( true ) {} + Packet( uint8_t * const d, const int s, const bool e ) + : data( d ), size( s ), eom ( e ) {} + ~Packet() { if( data ) delete[] data; } + }; + + +class Packet_courier // moves packets around + { +public: + unsigned icheck_counter; + unsigned iwait_counter; + unsigned ocheck_counter; + unsigned owait_counter; +private: + int receive_worker_id; // worker queue currently receiving packets + int deliver_worker_id; // worker queue currently delivering packets + Slot_tally slot_tally; // limits the number of input packets + std::vector< std::queue< const Packet * > > ipacket_queues; + std::vector< std::queue< const Packet * > > opacket_queues; + int num_working; // number of workers still running + const int num_workers; // number of workers + const unsigned out_slots; // max output packets per queue + pthread_mutex_t imutex; + pthread_cond_t iav_or_eof; // input packet available or splitter done + pthread_mutex_t omutex; + pthread_cond_t oav_or_exit; // output packet available or all workers exited + std::vector< pthread_cond_t > slot_av; // output slot available + const Shared_retval & shared_retval; // discard new packets on error + bool eof; // splitter done + bool trailing_data_found_; // a worker found trailing data + + Packet_courier( const Packet_courier & ); // declared as private + void operator=( const Packet_courier & ); // declared as private + +public: + Packet_courier( const Shared_retval & sh_ret, const int workers, + const int in_slots, const int oslots ) + : icheck_counter( 0 ), iwait_counter( 0 ), + ocheck_counter( 0 ), 
owait_counter( 0 ), + receive_worker_id( 0 ), deliver_worker_id( 0 ), + slot_tally( in_slots ), ipacket_queues( workers ), + opacket_queues( workers ), num_working( workers ), + num_workers( workers ), out_slots( oslots ), slot_av( workers ), + shared_retval( sh_ret ), eof( false ), trailing_data_found_( false ) + { + xinit_mutex( &imutex ); xinit_cond( &iav_or_eof ); + xinit_mutex( &omutex ); xinit_cond( &oav_or_exit ); + for( unsigned i = 0; i < slot_av.size(); ++i ) xinit_cond( &slot_av[i] ); + } + + ~Packet_courier() + { + if( shared_retval() ) // cleanup to avoid memory leaks + for( int i = 0; i < num_workers; ++i ) + { + while( !ipacket_queues[i].empty() ) + { delete ipacket_queues[i].front(); ipacket_queues[i].pop(); } + while( !opacket_queues[i].empty() ) + { delete opacket_queues[i].front(); opacket_queues[i].pop(); } + } + for( unsigned i = 0; i < slot_av.size(); ++i ) xdestroy_cond( &slot_av[i] ); + xdestroy_cond( &oav_or_exit ); xdestroy_mutex( &omutex ); + xdestroy_cond( &iav_or_eof ); xdestroy_mutex( &imutex ); + } + + /* Make a packet with data received from splitter. + If eom == true (end of member), move to next queue. 
*/ + void receive_packet( uint8_t * const data, const int size, const bool eom ) + { + if( shared_retval() ) { delete[] data; return; } // discard packet on error + const Packet * const ipacket = new Packet( data, size, eom ); + slot_tally.get_slot(); // wait for a free slot + xlock( &imutex ); + ipacket_queues[receive_worker_id].push( ipacket ); + xbroadcast( &iav_or_eof ); + xunlock( &imutex ); + if( eom && ++receive_worker_id >= num_workers ) receive_worker_id = 0; + } + + // distribute a packet to a worker + const Packet * distribute_packet( const int worker_id ) + { + const Packet * ipacket = 0; + xlock( &imutex ); + ++icheck_counter; + while( ipacket_queues[worker_id].empty() && !eof ) + { + ++iwait_counter; + xwait( &iav_or_eof, &imutex ); + } + if( !ipacket_queues[worker_id].empty() ) + { + ipacket = ipacket_queues[worker_id].front(); + ipacket_queues[worker_id].pop(); + } + xunlock( &imutex ); + if( ipacket ) slot_tally.leave_slot(); + else // no more packets + { + xlock( &omutex ); // notify muxer when last worker exits + if( --num_working == 0 ) xsignal( &oav_or_exit ); + xunlock( &omutex ); + } + return ipacket; + } + + // collect a packet from a worker, discard packet on error + void collect_packet( const Packet * const opacket, const int worker_id ) + { + xlock( &omutex ); + if( opacket->data ) + while( opacket_queues[worker_id].size() >= out_slots ) + { + if( shared_retval() ) { delete opacket; goto done; } + xwait( &slot_av[worker_id], &omutex ); + } + opacket_queues[worker_id].push( opacket ); + if( worker_id == deliver_worker_id ) xsignal( &oav_or_exit ); +done: + xunlock( &omutex ); + } + + /* deliver a packet to muxer + if packet->eom, move to next queue + if packet data == 0, wait again */ + const Packet * deliver_packet() + { + const Packet * opacket = 0; + xlock( &omutex ); + ++ocheck_counter; + while( true ) + { + while( opacket_queues[deliver_worker_id].empty() && num_working > 0 ) + { + ++owait_counter; + xwait( &oav_or_exit, &omutex ); + 
} + if( opacket_queues[deliver_worker_id].empty() ) break; + opacket = opacket_queues[deliver_worker_id].front(); + opacket_queues[deliver_worker_id].pop(); + if( opacket_queues[deliver_worker_id].size() + 1 == out_slots ) + xsignal( &slot_av[deliver_worker_id] ); + if( opacket->eom && ++deliver_worker_id >= num_workers ) + deliver_worker_id = 0; + if( opacket->data ) break; + delete opacket; opacket = 0; + } + xunlock( &omutex ); + return opacket; + } + + void add_sizes( const unsigned long long partial_in_size, + const unsigned long long partial_out_size ) + { + xlock( &imutex ); + in_size += partial_in_size; + out_size += partial_out_size; + xunlock( &imutex ); + } + + void set_trailing_flag() { trailing_data_found_ = true; } + bool trailing_data_found() { return trailing_data_found_; } + + void finish( const int workers_started ) + { + xlock( &imutex ); // splitter has no more packets to send + eof = true; + xbroadcast( &iav_or_eof ); + xunlock( &imutex ); + xlock( &omutex ); // notify muxer if all workers have exited + num_working -= num_workers - workers_started; // workers spared + if( num_working <= 0 ) xsignal( &oav_or_exit ); + xunlock( &omutex ); + } + + bool finished() // all packets delivered to muxer + { + if( !slot_tally.all_free() || !eof || num_working != 0 ) return false; + for( int i = 0; i < num_workers; ++i ) + if( !ipacket_queues[i].empty() ) return false; + for( int i = 0; i < num_workers; ++i ) + if( !opacket_queues[i].empty() ) return false; + return true; + } + }; + + +struct Worker_arg + { + Packet_courier * courier; + const Pretty_print * pp; + Shared_retval * shared_retval; + int worker_id; + bool ignore_trailing; + bool loose_trailing; + bool testing; + bool nocopy; // avoid copying decompressed data when testing + }; + +struct Splitter_arg + { + struct Worker_arg worker_arg; + Worker_arg * worker_args; + pthread_t * worker_threads; + unsigned long long cfile_size; + int infd; + unsigned dictionary_size; // returned by splitter to main 
thread + int num_workers; // returned by splitter to main thread + }; + + +/* Consume packets from courier, decompress their contents and, if not + testing, give to courier the packets produced. +*/ +extern "C" void * dworker_s( void * arg ) + { + const Worker_arg & tmp = *(const Worker_arg *)arg; + Packet_courier & courier = *tmp.courier; + const Pretty_print & pp = *tmp.pp; + Shared_retval & shared_retval = *tmp.shared_retval; + const int worker_id = tmp.worker_id; + const bool ignore_trailing = tmp.ignore_trailing; + const bool loose_trailing = tmp.loose_trailing; + const bool testing = tmp.testing; + const bool nocopy = tmp.nocopy; + + unsigned long long partial_in_size = 0, partial_out_size = 0; + int new_pos = 0; + bool draining = false; // either trailing data or an error were found + uint8_t * new_data = 0; + LZ_Decoder * const decoder = LZ_decompress_open(); + if( !decoder || LZ_decompress_errno( decoder ) != LZ_ok ) + { draining = true; if( shared_retval.set_value( 1 ) ) pp( mem_msg ); } + + while( true ) + { + const Packet * const ipacket = courier.distribute_packet( worker_id ); + if( !ipacket ) break; // no more packets to process + + int written = 0; + while( !draining ) // else discard trailing data or drain queue + { + if( LZ_decompress_write_size( decoder ) > 0 && written < ipacket->size ) + { + const int wr = LZ_decompress_write( decoder, ipacket->data + written, + ipacket->size - written ); + if( wr < 0 ) internal_error( "library error (LZ_decompress_write)." ); + written += wr; + if( written > ipacket->size ) + internal_error( "ipacket size exceeded in worker." 
); + } + if( ipacket->eom && written == ipacket->size ) + LZ_decompress_finish( decoder ); + unsigned long long total_in = 0; // detect empty member + corrupt header + while( !draining ) // read and pack decompressed data + { + if( !nocopy && !new_data && + !( new_data = new( std::nothrow ) uint8_t[max_packet_size] ) ) + { draining = true; if( shared_retval.set_value( 1 ) ) pp( mem_msg ); + break; } + const int rd = LZ_decompress_read( decoder, + nocopy ? 0 : new_data + new_pos, + max_packet_size - new_pos ); + if( rd < 0 ) // trailing data or decoder error + { + draining = true; + const enum LZ_Errno lz_errno = LZ_decompress_errno( decoder ); + if( lz_errno == LZ_header_error ) + { + courier.set_trailing_flag(); + if( !ignore_trailing ) + { if( shared_retval.set_value( 2 ) ) pp( trailing_msg ); } + } + else if( lz_errno == LZ_data_error && + LZ_decompress_member_position( decoder ) == 0 ) + { + courier.set_trailing_flag(); + if( !loose_trailing ) + { if( shared_retval.set_value( 2 ) ) pp( corrupt_mm_msg ); } + else if( !ignore_trailing ) + { if( shared_retval.set_value( 2 ) ) pp( trailing_msg ); } + } + else + decompress_error( decoder, pp, shared_retval, worker_id ); + } + else new_pos += rd; + if( new_pos > max_packet_size ) + internal_error( "opacket size exceeded in worker." ); + if( LZ_decompress_member_finished( decoder ) == 1 ) + { + partial_in_size += LZ_decompress_member_position( decoder ); + partial_out_size += LZ_decompress_data_position( decoder ); + } + const bool eom = draining || LZ_decompress_finished( decoder ) == 1; + if( new_pos == max_packet_size || eom ) + { + if( !testing ) // make data packet + { + const Packet * const opacket = + new Packet( ( new_pos > 0 ) ? 
new_data : 0, new_pos, eom ); + courier.collect_packet( opacket, worker_id ); + if( new_pos > 0 ) new_data = 0; + } + new_pos = 0; + if( eom ) + { LZ_decompress_reset( decoder ); // prepare for new member + break; } + } + if( rd == 0 ) + { + const unsigned long long size = LZ_decompress_total_in_size( decoder ); + if( total_in == size ) break; else total_in = size; + } + } + if( !ipacket->data || written == ipacket->size ) break; + } + delete ipacket; + } + + if( new_data ) delete[] new_data; + courier.add_sizes( partial_in_size, partial_out_size ); + if( LZ_decompress_member_position( decoder ) != 0 && + shared_retval.set_value( 1 ) ) + pp( "Error, some data remains in decoder." ); + if( LZ_decompress_close( decoder ) < 0 && shared_retval.set_value( 1 ) ) + pp( "LZ_decompress_close failed." ); + return 0; + } + + +bool start_worker( const Worker_arg & worker_arg, + Worker_arg * const worker_args, + pthread_t * const worker_threads, const int worker_id, + Shared_retval & shared_retval ) + { + worker_args[worker_id] = worker_arg; + worker_args[worker_id].worker_id = worker_id; + const int errcode = pthread_create( &worker_threads[worker_id], 0, + dworker_s, &worker_args[worker_id] ); + if( errcode && shared_retval.set_value( 1 ) ) + show_error( "Can't create worker threads", errcode ); + return errcode == 0; + } + + +/* Split data from input file into chunks and pass them to courier for + packaging and distribution to workers. + Start a worker per member up to a maximum of num_workers. 
+*/ +extern "C" void * dsplitter_s( void * arg ) + { + Splitter_arg & tmp = *(Splitter_arg *)arg; + const Worker_arg & worker_arg = tmp.worker_arg; + Packet_courier & courier = *worker_arg.courier; + const Pretty_print & pp = *worker_arg.pp; + Shared_retval & shared_retval = *worker_arg.shared_retval; + Worker_arg * const worker_args = tmp.worker_args; + pthread_t * const worker_threads = tmp.worker_threads; + const int infd = tmp.infd; + int worker_id = 0; // number of workers started + const int hsize = Lzip_header::size; + const int tsize = Lzip_trailer::size; + const int buffer_size = max_packet_size; + // buffer with room for trailer, header, data, and sentinel "LZIP" + const int base_buffer_size = tsize + hsize + buffer_size + 4; + uint8_t * const base_buffer = new( std::nothrow ) uint8_t[base_buffer_size]; + if( !base_buffer ) + { +mem_fail: + if( shared_retval.set_value( 1 ) ) pp( mem_msg ); +fail: + delete[] base_buffer; + courier.finish( worker_id ); // no more packets to send + tmp.num_workers = worker_id; + return 0; + } + uint8_t * const buffer = base_buffer + tsize; + + int size = readblock( infd, buffer, buffer_size + hsize ) - hsize; + bool at_stream_end = ( size < buffer_size ); + if( size != buffer_size && errno ) + { if( shared_retval.set_value( 1 ) ) + { pp(); show_error( "Read error", errno ); } goto fail; } + if( size + hsize < min_member_size ) + { if( shared_retval.set_value( 2 ) ) show_file_error( pp.name(), + ( size <= 0 ) ? "File ends unexpectedly at member header." : + "Input file is too short." 
); goto fail; } + const Lzip_header & header = *(const Lzip_header *)buffer; + if( !header.check_magic() ) + { if( shared_retval.set_value( 2 ) ) + { show_file_error( pp.name(), bad_magic_msg ); } goto fail; } + if( !header.check_version() ) + { if( shared_retval.set_value( 2 ) ) + { pp( bad_version( header.version() ) ); } goto fail; } + tmp.dictionary_size = header.dictionary_size(); + if( !isvalid_ds( tmp.dictionary_size ) ) + { if( shared_retval.set_value( 2 ) ) { pp( bad_dict_msg ); } goto fail; } + if( verbosity >= 1 ) pp(); + show_progress( 0, tmp.cfile_size, &pp ); // init + + unsigned long long partial_member_size = 0; + bool worker_pending = true; // start 1 worker per first packet of member + while( true ) + { + if( shared_retval() ) break; // stop sending packets on error + int pos = 0; // current searching position + std::memcpy( buffer + hsize + size, lzip_magic, 4 ); // sentinel + for( int newpos = 1; newpos <= size; ++newpos ) + { + while( buffer[newpos] != lzip_magic[0] || + buffer[newpos+1] != lzip_magic[1] || + buffer[newpos+2] != lzip_magic[2] || + buffer[newpos+3] != lzip_magic[3] ) ++newpos; + if( newpos <= size ) + { + const Lzip_trailer & trailer = + *(const Lzip_trailer *)(buffer + newpos - tsize); + const unsigned long long member_size = trailer.member_size(); + if( partial_member_size + newpos - pos == member_size && + trailer.check_consistency() ) + { // header found + const Lzip_header & header = *(const Lzip_header *)(buffer + newpos); + if( !header.check_version() ) + { if( shared_retval.set_value( 2 ) ) + { pp( bad_version( header.version() ) ); } goto fail; } + const unsigned dictionary_size = header.dictionary_size(); + if( !isvalid_ds( dictionary_size ) ) + { if( shared_retval.set_value( 2 ) ) pp( bad_dict_msg ); + goto fail; } + if( tmp.dictionary_size < dictionary_size ) + tmp.dictionary_size = dictionary_size; + uint8_t * const data = new( std::nothrow ) uint8_t[newpos - pos]; + if( !data ) goto mem_fail; + std::memcpy( data, 
buffer + pos, newpos - pos ); + courier.receive_packet( data, newpos - pos, true ); // eom + partial_member_size = 0; + pos = newpos; + if( worker_pending ) + { if( !start_worker( worker_arg, worker_args, worker_threads, + worker_id, shared_retval ) ) goto fail; + ++worker_id; } + worker_pending = worker_id < tmp.num_workers; + show_progress( member_size ); + } + } + } + + if( at_stream_end ) + { + uint8_t * data = new( std::nothrow ) uint8_t[size + hsize - pos]; + if( !data ) goto mem_fail; + std::memcpy( data, buffer + pos, size + hsize - pos ); + courier.receive_packet( data, size + hsize - pos, true ); // eom + if( worker_pending && + start_worker( worker_arg, worker_args, worker_threads, + worker_id, shared_retval ) ) ++worker_id; + break; + } + if( pos < buffer_size ) + { + partial_member_size += buffer_size - pos; + uint8_t * data = new( std::nothrow ) uint8_t[buffer_size - pos]; + if( !data ) goto mem_fail; + std::memcpy( data, buffer + pos, buffer_size - pos ); + courier.receive_packet( data, buffer_size - pos, false ); + if( worker_pending ) + { if( !start_worker( worker_arg, worker_args, worker_threads, + worker_id, shared_retval ) ) break; + ++worker_id; worker_pending = false; } + } + if( courier.trailing_data_found() ) break; + std::memcpy( base_buffer, base_buffer + buffer_size, tsize + hsize ); + size = readblock( infd, buffer + hsize, buffer_size ); + at_stream_end = ( size < buffer_size ); + if( size != buffer_size && errno ) + { if( shared_retval.set_value( 1 ) ) + { pp(); show_error( "Read error", errno ); } break; } + } + delete[] base_buffer; + courier.finish( worker_id ); // no more packets to send + tmp.num_workers = worker_id; + return 0; + } + + +/* Get from courier the processed and sorted packets, and write their + contents to the output file. Drain queue on error. 
+*/ +void muxer( Packet_courier & courier, const Pretty_print & pp, + Shared_retval & shared_retval, const int outfd ) + { + while( true ) + { + const Packet * const opacket = courier.deliver_packet(); + if( !opacket ) break; // queue is empty. all workers exited + + if( shared_retval() == 0 && + writeblock( outfd, opacket->data, opacket->size ) != opacket->size && + shared_retval.set_value( 1 ) ) + { pp(); show_error( "Write error", errno ); } + delete opacket; + } + } + +} // end namespace + + +/* Init the courier, then start the splitter and the workers and, if not + testing, call the muxer. +*/ +int dec_stream( const unsigned long long cfile_size, const int num_workers, + const int infd, const int outfd, const Cl_options & cl_opts, + const Pretty_print & pp, const int debug_level, + const int in_slots, const int out_slots ) + { + const int total_in_slots = ( INT_MAX / num_workers >= in_slots ) ? + num_workers * in_slots : INT_MAX; + in_size = 0; + out_size = 0; + Shared_retval shared_retval; + Packet_courier courier( shared_retval, num_workers, total_in_slots, out_slots ); + + if( debug_level & 2 ) std::fputs( "decompress stream.\n", stderr ); + + Worker_arg * worker_args = new( std::nothrow ) Worker_arg[num_workers]; + pthread_t * worker_threads = new( std::nothrow ) pthread_t[num_workers]; + if( !worker_args || !worker_threads ) + { pp( mem_msg ); delete[] worker_threads; delete[] worker_args; return 1; } + +#if defined LZ_API_VERSION && LZ_API_VERSION >= 1012 + const bool nocopy = ( outfd < 0 && LZ_api_version() >= 1012 ); +#else + const bool nocopy = false; +#endif + + Splitter_arg splitter_arg; + splitter_arg.worker_arg.courier = &courier; + splitter_arg.worker_arg.pp = &pp; + splitter_arg.worker_arg.shared_retval = &shared_retval; + splitter_arg.worker_arg.worker_id = 0; + splitter_arg.worker_arg.ignore_trailing = cl_opts.ignore_trailing; + splitter_arg.worker_arg.loose_trailing = cl_opts.loose_trailing; + splitter_arg.worker_arg.testing = ( outfd < 0 ); 
+ splitter_arg.worker_arg.nocopy = nocopy; + splitter_arg.worker_args = worker_args; + splitter_arg.worker_threads = worker_threads; + splitter_arg.cfile_size = cfile_size; + splitter_arg.infd = infd; + splitter_arg.num_workers = num_workers; + + pthread_t splitter_thread; + int errcode = pthread_create( &splitter_thread, 0, dsplitter_s, &splitter_arg ); + if( errcode ) + { show_error( "Can't create splitter thread", errcode ); + delete[] worker_threads; delete[] worker_args; return 1; } + + if( outfd >= 0 ) muxer( courier, pp, shared_retval, outfd ); + + errcode = pthread_join( splitter_thread, 0 ); + if( errcode && shared_retval.set_value( 1 ) ) + show_error( "Can't join splitter thread", errcode ); + + for( int i = splitter_arg.num_workers; --i >= 0; ) + { // join only the workers started + errcode = pthread_join( worker_threads[i], 0 ); + if( errcode && shared_retval.set_value( 1 ) ) + show_error( "Can't join worker threads", errcode ); + } + delete[] worker_threads; + delete[] worker_args; + + if( shared_retval() ) return shared_retval(); // some thread found a problem + + show_results( in_size, out_size, splitter_arg.dictionary_size, outfd < 0 ); + + if( debug_level & 1 ) + { + std::fprintf( stderr, + "workers started %8u\n" + "any worker tried to consume from splitter %8u times\n" + "any worker had to wait %8u times\n", + splitter_arg.num_workers, + courier.icheck_counter, courier.iwait_counter ); + if( outfd >= 0 ) + std::fprintf( stderr, + "muxer tried to consume from workers %8u times\n" + "muxer had to wait %8u times\n", + courier.ocheck_counter, courier.owait_counter ); + } + + if( !courier.finished() ) internal_error( "courier not finished." ); + return 0; + } diff --git a/decompress.cc b/decompress.cc new file mode 100644 index 0000000..5b0e68f --- /dev/null +++ b/decompress.cc @@ -0,0 +1,363 @@ +/* Plzip - Massively parallel implementation of lzip + Copyright (C) 2009 Laszlo Ersek. + Copyright (C) 2009-2024 Antonio Diaz Diaz. 
+ + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 2 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see <http://www.gnu.org/licenses/>. +*/ + +#define _FILE_OFFSET_BITS 64 + +#include <algorithm> +#include <cerrno> +#include <climits> +#include <csignal> +#include <cstdio> +#include <cstdlib> +#include <cstring> +#include <string> +#include <vector> +#include <stdint.h> +#include <unistd.h> +#include <sys/stat.h> +#include <lzlib.h> + +#include "lzip.h" +#include "lzip_index.h" + + +/* This code is based on a patch by Hannes Domani, <ssbssa@yahoo.de> to make + possible compiling plzip under MS Windows (with MINGW compiler). +*/ +#if defined __MSVCRT__ && defined WITH_MINGW +#include <windows.h> +#warning "Parallel I/O is not guaranteed to work on Windows." 
+ +ssize_t pread( int fd, void *buf, size_t count, uint64_t offset ) + { + OVERLAPPED o = {0,0,0,0,0}; + HANDLE fh = (HANDLE)_get_osfhandle(fd); + DWORD bytes; + BOOL ret; + + if( fh == INVALID_HANDLE_VALUE ) { errno = EBADF; return -1; } + o.Offset = offset & 0xffffffff; + o.OffsetHigh = (offset >> 32) & 0xffffffff; + ret = ReadFile( fh, buf, (DWORD)count, &bytes, &o ); + if( !ret ) { errno = EIO; return -1; } + return (ssize_t)bytes; + } + +ssize_t pwrite( int fd, const void *buf, size_t count, uint64_t offset ) + { + OVERLAPPED o = {0,0,0,0,0}; + HANDLE fh = (HANDLE)_get_osfhandle(fd); + DWORD bytes; + BOOL ret; + + if( fh == INVALID_HANDLE_VALUE ) { errno = EBADF; return -1; } + o.Offset = offset & 0xffffffff; + o.OffsetHigh = (offset >> 32) & 0xffffffff; + ret = WriteFile(fh, buf, (DWORD)count, &bytes, &o); + if( !ret ) { errno = EIO; return -1; } + return (ssize_t)bytes; + } + +#endif // __MSVCRT__ + + +/* Return the number of bytes really read. + If (value returned < size) and (errno == 0), means EOF was reached. +*/ +int preadblock( const int fd, uint8_t * const buf, const int size, + const long long pos ) + { + int sz = 0; + errno = 0; + while( sz < size ) + { + const int n = pread( fd, buf + sz, size - sz, pos + sz ); + if( n > 0 ) sz += n; + else if( n == 0 ) break; // EOF + else if( errno != EINTR ) break; + errno = 0; + } + return sz; + } + + +/* Return the number of bytes really written. + If (value returned < size), it is always an error. 
+*/ +int pwriteblock( const int fd, const uint8_t * const buf, const int size, + const long long pos ) + { + int sz = 0; + errno = 0; + while( sz < size ) + { + const int n = pwrite( fd, buf + sz, size - sz, pos + sz ); + if( n > 0 ) sz += n; + else if( n < 0 && errno != EINTR ) break; + errno = 0; + } + return sz; + } + + +void decompress_error( struct LZ_Decoder * const decoder, + const Pretty_print & pp, + Shared_retval & shared_retval, const int worker_id ) + { + const LZ_Errno errcode = LZ_decompress_errno( decoder ); + const int retval = ( errcode == LZ_header_error || errcode == LZ_data_error || + errcode == LZ_unexpected_eof ) ? 2 : 1; + if( !shared_retval.set_value( retval ) ) return; + pp(); + if( verbosity >= 0 ) + std::fprintf( stderr, "%s in worker %d\n", LZ_strerror( errcode ), + worker_id ); + } + + +void show_results( const unsigned long long in_size, + const unsigned long long out_size, + const unsigned dictionary_size, const bool testing ) + { + if( verbosity >= 2 ) + { + if( verbosity >= 4 ) show_header( dictionary_size ); + if( out_size == 0 || in_size == 0 ) + std::fputs( "no data compressed. ", stderr ); + else + std::fprintf( stderr, "%6.3f:1, %5.2f%% ratio, %5.2f%% saved. ", + (double)out_size / in_size, + ( 100.0 * in_size ) / out_size, + 100.0 - ( ( 100.0 * in_size ) / out_size ) ); + if( verbosity >= 3 ) + std::fprintf( stderr, "%9llu out, %8llu in. ", out_size, in_size ); + } + if( verbosity >= 1 ) std::fputs( testing ? "ok\n" : "done\n", stderr ); + } + + +namespace { + +struct Worker_arg + { + const Lzip_index * lzip_index; + const Pretty_print * pp; + Shared_retval * shared_retval; + int worker_id; + int num_workers; + int infd; + int outfd; + bool nocopy; // avoid copying decompressed data when testing + }; + + +/* Read members from input file, decompress their contents, and write to + output file the data produced. 
+*/ +extern "C" void * dworker( void * arg ) + { + const Worker_arg & tmp = *(const Worker_arg *)arg; + const Lzip_index & lzip_index = *tmp.lzip_index; + const Pretty_print & pp = *tmp.pp; + Shared_retval & shared_retval = *tmp.shared_retval; + const int worker_id = tmp.worker_id; + const int num_workers = tmp.num_workers; + const int infd = tmp.infd; + const int outfd = tmp.outfd; + const bool nocopy = tmp.nocopy; + const int buffer_size = 65536; + + uint8_t * const ibuffer = new( std::nothrow ) uint8_t[buffer_size]; + uint8_t * const obuffer = + nocopy ? 0 : new( std::nothrow ) uint8_t[buffer_size]; + LZ_Decoder * const decoder = LZ_decompress_open(); + if( !ibuffer || ( !nocopy && !obuffer ) || !decoder || + LZ_decompress_errno( decoder ) != LZ_ok ) + { if( shared_retval.set_value( 1 ) ) { pp( mem_msg ); } goto done; } + + for( long i = worker_id; i < lzip_index.members(); i += num_workers ) + { + long long data_pos = lzip_index.dblock( i ).pos(); + long long data_rest = lzip_index.dblock( i ).size(); + long long member_pos = lzip_index.mblock( i ).pos(); + long long member_rest = lzip_index.mblock( i ).size(); + + while( member_rest > 0 ) + { + if( shared_retval() ) goto done; // other worker found a problem + while( LZ_decompress_write_size( decoder ) > 0 ) + { + const int size = std::min( LZ_decompress_write_size( decoder ), + (int)std::min( (long long)buffer_size, member_rest ) ); + if( size > 0 ) + { + if( preadblock( infd, ibuffer, size, member_pos ) != size ) + { if( shared_retval.set_value( 1 ) ) + { pp(); show_error( "Read error", errno ); } goto done; } + member_pos += size; + member_rest -= size; + if( LZ_decompress_write( decoder, ibuffer, size ) != size ) + internal_error( "library error (LZ_decompress_write)." 
); + } + if( member_rest <= 0 ) { LZ_decompress_finish( decoder ); break; } + } + while( true ) // write decompressed data to file + { + const int rd = LZ_decompress_read( decoder, obuffer, buffer_size ); + if( rd < 0 ) + { decompress_error( decoder, pp, shared_retval, worker_id ); + goto done; } + if( rd > 0 && outfd >= 0 ) + { + const int wr = pwriteblock( outfd, obuffer, rd, data_pos ); + if( wr != rd ) + { + if( shared_retval.set_value( 1 ) ) { pp(); + if( verbosity >= 0 ) + std::fprintf( stderr, "Write error in worker %d: %s\n", + worker_id, std::strerror( errno ) ); } + goto done; + } + } + if( rd > 0 ) + { + data_pos += rd; + data_rest -= rd; + } + if( LZ_decompress_finished( decoder ) == 1 ) + { + if( data_rest != 0 ) + internal_error( "final data_rest is not zero." ); + LZ_decompress_reset( decoder ); // prepare for new member + break; + } + if( rd == 0 ) break; + } + } + show_progress( lzip_index.mblock( i ).size() ); + } +done: + if( obuffer ) { delete[] obuffer; } delete[] ibuffer; + if( LZ_decompress_member_position( decoder ) != 0 && + shared_retval.set_value( 1 ) ) + pp( "Error, some data remains in decoder." ); + if( LZ_decompress_close( decoder ) < 0 && shared_retval.set_value( 1 ) ) + pp( "LZ_decompress_close failed." ); + return 0; + } + +} // end namespace + + +// start the workers and wait for them to finish. 
+int decompress( const unsigned long long cfile_size, int num_workers, + const int infd, const int outfd, const Cl_options & cl_opts, + const Pretty_print & pp, const int debug_level, + const int in_slots, const int out_slots, + const bool infd_isreg, const bool one_to_one ) + { + if( !infd_isreg ) + return dec_stream( cfile_size, num_workers, infd, outfd, cl_opts, pp, + debug_level, in_slots, out_slots ); + + const Lzip_index lzip_index( infd, cl_opts ); + if( lzip_index.retval() == 1 ) // decompress as stream if seek fails + { + lseek( infd, 0, SEEK_SET ); + return dec_stream( cfile_size, num_workers, infd, outfd, cl_opts, pp, + debug_level, in_slots, out_slots ); + } + if( lzip_index.retval() != 0 ) // corrupt or invalid input file + { + if( lzip_index.bad_magic() ) + show_file_error( pp.name(), lzip_index.error().c_str() ); + else pp( lzip_index.error().c_str() ); + return lzip_index.retval(); + } + + if( num_workers > lzip_index.members() ) num_workers = lzip_index.members(); + + if( outfd >= 0 ) + { + struct stat st; + if( !one_to_one || fstat( outfd, &st ) != 0 || !S_ISREG( st.st_mode ) || + lseek( outfd, 0, SEEK_CUR ) < 0 ) + { + if( debug_level & 2 ) std::fputs( "decompress file to stdout.\n", stderr ); + if( verbosity >= 1 ) pp(); + show_progress( 0, cfile_size, &pp ); // init + return dec_stdout( num_workers, infd, outfd, pp, debug_level, out_slots, + lzip_index ); + } + } + + if( debug_level & 2 ) std::fputs( "decompress file to file.\n", stderr ); + if( verbosity >= 1 ) pp(); + show_progress( 0, cfile_size, &pp ); // init + + Worker_arg * worker_args = new( std::nothrow ) Worker_arg[num_workers]; + pthread_t * worker_threads = new( std::nothrow ) pthread_t[num_workers]; + if( !worker_args || !worker_threads ) + { pp( mem_msg ); delete[] worker_threads; delete[] worker_args; return 1; } + +#if defined LZ_API_VERSION && LZ_API_VERSION >= 1012 + const bool nocopy = ( outfd < 0 && LZ_api_version() >= 1012 ); +#else + const bool nocopy = false; +#endif + + 
Shared_retval shared_retval; + int i = 0; // number of workers started + for( ; i < num_workers; ++i ) + { + worker_args[i].lzip_index = &lzip_index; + worker_args[i].pp = &pp; + worker_args[i].shared_retval = &shared_retval; + worker_args[i].worker_id = i; + worker_args[i].num_workers = num_workers; + worker_args[i].infd = infd; + worker_args[i].outfd = outfd; + worker_args[i].nocopy = nocopy; + const int errcode = + pthread_create( &worker_threads[i], 0, dworker, &worker_args[i] ); + if( errcode ) + { if( shared_retval.set_value( 1 ) ) + { show_error( "Can't create worker threads", errcode ); } break; } + } + + while( --i >= 0 ) + { + const int errcode = pthread_join( worker_threads[i], 0 ); + if( errcode && shared_retval.set_value( 1 ) ) + show_error( "Can't join worker threads", errcode ); + } + delete[] worker_threads; + delete[] worker_args; + + if( shared_retval() ) return shared_retval(); // some thread found a problem + + if( verbosity >= 1 ) + show_results( lzip_index.cdata_size(), lzip_index.udata_size(), + lzip_index.dictionary_size(), outfd < 0 ); + + if( debug_level & 1 ) + std::fprintf( stderr, + "workers started %8u\n", num_workers ); + + return 0; + } diff --git a/doc/plzip.1 b/doc/plzip.1 new file mode 100644 index 0000000..3985e5b --- /dev/null +++ b/doc/plzip.1 @@ -0,0 +1,148 @@ +.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.49.2. +.TH PLZIP "1" "January 2024" "plzip 1.11" "User Commands" +.SH NAME +plzip \- reduces the size of files +.SH SYNOPSIS +.B plzip +[\fI\,options\/\fR] [\fI\,files\/\fR] +.SH DESCRIPTION +Plzip is a massively parallel (multi\-threaded) implementation of lzip, +compatible with lzip 1.4 or newer. Plzip uses the compression library lzlib. +.PP +Lzip is a lossless data compressor with a user interface similar to the one +of gzip or bzip2. Lzip uses a simplified form of the 'Lempel\-Ziv\-Markov +chain\-Algorithm' (LZMA) stream format to maximize interoperability. 
The +maximum dictionary size is 512 MiB so that any lzip file can be decompressed +on 32\-bit machines. Lzip provides accurate and robust 3\-factor integrity +checking. Lzip can compress about as fast as gzip (lzip \fB\-0\fR) or compress most +files more than bzip2 (lzip \fB\-9\fR). Decompression speed is intermediate between +gzip and bzip2. Lzip is better than gzip and bzip2 from a data recovery +perspective. Lzip has been designed, written, and tested with great care to +replace gzip and bzip2 as the standard general\-purpose compressed format for +Unix\-like systems. +.PP +Plzip can compress/decompress large files on multiprocessor machines much +faster than lzip, at the cost of a slightly reduced compression ratio (0.4 +to 2 percent larger compressed files). Note that the number of usable +threads is limited by file size; on files larger than a few GB plzip can use +hundreds of processors, but on files of only a few MB plzip is no faster +than lzip. +.SH OPTIONS +.TP +\fB\-h\fR, \fB\-\-help\fR +display this help and exit +.TP +\fB\-V\fR, \fB\-\-version\fR +output version information and exit +.TP +\fB\-a\fR, \fB\-\-trailing\-error\fR +exit with error status if trailing data +.TP +\fB\-B\fR, \fB\-\-data\-size=\fR<bytes> +set size of input data blocks [2x8=16 MiB] +.TP +\fB\-c\fR, \fB\-\-stdout\fR +write to standard output, keep input files +.TP +\fB\-d\fR, \fB\-\-decompress\fR +decompress, test compressed file integrity +.TP +\fB\-f\fR, \fB\-\-force\fR +overwrite existing output files +.TP +\fB\-F\fR, \fB\-\-recompress\fR +force re\-compression of compressed files +.TP +\fB\-k\fR, \fB\-\-keep\fR +keep (don't delete) input files +.TP +\fB\-l\fR, \fB\-\-list\fR +print (un)compressed file sizes +.TP +\fB\-m\fR, \fB\-\-match\-length=\fR<bytes> +set match length limit in bytes [36] +.TP +\fB\-n\fR, \fB\-\-threads=\fR<n> +set number of (de)compression threads [2] +.TP +\fB\-o\fR, \fB\-\-output=\fR<file> +write to <file>, keep input files +.TP +\fB\-q\fR, 
\fB\-\-quiet\fR +suppress all messages +.TP +\fB\-s\fR, \fB\-\-dictionary\-size=\fR<bytes> +set dictionary size limit in bytes [8 MiB] +.TP +\fB\-t\fR, \fB\-\-test\fR +test compressed file integrity +.TP +\fB\-v\fR, \fB\-\-verbose\fR +be verbose (a 2nd \fB\-v\fR gives more) +.TP +\fB\-0\fR .. \fB\-9\fR +set compression level [default 6] +.TP +\fB\-\-fast\fR +alias for \fB\-0\fR +.TP +\fB\-\-best\fR +alias for \fB\-9\fR +.TP +\fB\-\-loose\-trailing\fR +allow trailing data seeming corrupt header +.TP +\fB\-\-in\-slots=\fR<n> +number of 1 MiB input packets buffered [4] +.TP +\fB\-\-out\-slots=\fR<n> +number of 1 MiB output packets buffered [64] +.TP +\fB\-\-check\-lib\fR +compare version of lzlib.h with liblz.{a,so} +.PP +If no file names are given, or if a file is '\-', plzip compresses or +decompresses from standard input to standard output. +Numbers may be followed by a multiplier: k = kB = 10^3 = 1000, +Ki = KiB = 2^10 = 1024, M = 10^6, Mi = 2^20, G = 10^9, Gi = 2^30, etc... +Dictionary sizes 12 to 29 are interpreted as powers of two, meaning 2^12 to +2^29 bytes. +.PP +The bidimensional parameter space of LZMA can't be mapped to a linear scale +optimal for all files. If your files are large, very repetitive, etc, you +may need to use the options \fB\-\-dictionary\-size\fR and \fB\-\-match\-length\fR directly +to achieve optimal performance. +.PP +To extract all the files from archive 'foo.tar.lz', use the commands +\&'tar \fB\-xf\fR foo.tar.lz' or 'plzip \fB\-cd\fR foo.tar.lz | tar \fB\-xf\fR \-'. +.PP +Exit status: 0 for a normal exit, 1 for environmental problems +(file not found, invalid command\-line options, I/O errors, etc), 2 to +indicate a corrupt or invalid input file, 3 for an internal consistency +error (e.g., bug) which caused plzip to panic. +.SH "REPORTING BUGS" +Report bugs to lzip\-bug@nongnu.org +.br +Plzip home page: http://www.nongnu.org/lzip/plzip.html +.SH COPYRIGHT +Copyright \(co 2009 Laszlo Ersek. 
+.br +Copyright \(co 2024 Antonio Diaz Diaz. +License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html> +.br +This is free software: you are free to change and redistribute it. +There is NO WARRANTY, to the extent permitted by law. +Using lzlib 1.14 +Using LZ_API_VERSION = 1014 +.SH "SEE ALSO" +The full documentation for +.B plzip +is maintained as a Texinfo manual. If the +.B info +and +.B plzip +programs are properly installed at your site, the command +.IP +.B info plzip +.PP +should give you access to the complete manual. diff --git a/doc/plzip.info b/doc/plzip.info new file mode 100644 index 0000000..becd133 --- /dev/null +++ b/doc/plzip.info @@ -0,0 +1,833 @@ +This is plzip.info, produced by makeinfo version 4.13+ from plzip.texi. + +INFO-DIR-SECTION Compression +START-INFO-DIR-ENTRY +* Plzip: (plzip). Massively parallel implementation of lzip +END-INFO-DIR-ENTRY + + +File: plzip.info, Node: Top, Next: Introduction, Up: (dir) + +Plzip Manual +************ + +This manual is for Plzip (version 1.11, 21 January 2024). + +* Menu: + +* Introduction:: Purpose and features of plzip +* Output:: Meaning of plzip's output +* Invoking plzip:: Command-line interface +* Program design:: Internal structure of plzip +* Memory requirements:: Memory required to compress and decompress +* Minimum file sizes:: Minimum file sizes required for full speed +* File format:: Detailed format of the compressed file +* Trailing data:: Extra data appended to the file +* Examples:: A small tutorial with examples +* Problems:: Reporting bugs +* Concept index:: Index of concepts + + + Copyright (C) 2009-2024 Antonio Diaz Diaz. + + This manual is free documentation: you have unlimited permission to copy, +distribute, and modify it. + + +File: plzip.info, Node: Introduction, Next: Output, Prev: Top, Up: Top + +1 Introduction +************** + +Plzip is a massively parallel (multi-threaded) implementation of lzip, +compatible with lzip 1.4 or newer. 
Plzip uses the compression library lzlib. + + Lzip is a lossless data compressor with a user interface similar to the +one of gzip or bzip2. Lzip uses a simplified form of the 'Lempel-Ziv-Markov +chain-Algorithm' (LZMA) stream format to maximize interoperability. The +maximum dictionary size is 512 MiB so that any lzip file can be decompressed +on 32-bit machines. Lzip provides accurate and robust 3-factor integrity +checking. Lzip can compress about as fast as gzip (lzip -0) or compress most +files more than bzip2 (lzip -9). Decompression speed is intermediate between +gzip and bzip2. Lzip is better than gzip and bzip2 from a data recovery +perspective. Lzip has been designed, written, and tested with great care to +replace gzip and bzip2 as the standard general-purpose compressed format for +Unix-like systems. + + Plzip can compress/decompress large files on multiprocessor machines much +faster than lzip, at the cost of a slightly reduced compression ratio (0.4 +to 2 percent larger compressed files). Note that the number of usable +threads is limited by file size; on files larger than a few GB plzip can use +hundreds of processors, but on files of only a few MB plzip is no faster +than lzip. *Note Minimum file sizes::. + + For creation and manipulation of compressed tar archives tarlz can be +more efficient than using tar and plzip because tarlz is able to keep the +alignment between tar members and lzip members. *Note tarlz manual: +(tarlz)Top. + + The lzip file format is designed for data sharing and long-term +archiving, taking into account both data integrity and decoder availability: + + * The lzip format provides very safe integrity checking and some data + recovery means. The program lziprecover can repair bit flip errors + (one of the most common forms of data corruption) in lzip files, and + provides data recovery capabilities, including error-checked merging + of damaged copies of a file. *Note Data safety: (lziprecover)Data + safety. 
+ + * The lzip format is as simple as possible (but not simpler). The lzip + manual provides the source code of a simple decompressor along with a + detailed explanation of how it works, so that with only the help of the + lzip manual it would be possible for a digital archaeologist to extract + the data from a lzip file long after quantum computers eventually + render LZMA obsolete. + + * Additionally the lzip reference implementation is copylefted, which + guarantees that it will remain free forever. + + A nice feature of the lzip format is that a corrupt byte is easier to +repair the nearer it is to the beginning of the file. Therefore, with the +help of lziprecover, losing an entire archive just because of a corrupt +byte near the beginning is a thing of the past. + + Plzip uses the same well-defined exit status values used by lzip, which +makes it safer than compressors returning ambiguous warning values (like +gzip) when it is used as a back end for other programs like tar or zutils. + + Plzip automatically uses for each file the largest dictionary size that +exceeds neither the file size nor the limit given. Keep in mind +that the decompression memory requirement is affected at compression time +by the choice of dictionary size limit. *Note Memory requirements::. + + When compressing, plzip replaces every file given in the command line +with a compressed version of itself, with the name "original_name.lz". When +decompressing, plzip attempts to guess the name for the decompressed file +from that of the compressed file as follows: + +filename.lz becomes filename +filename.tlz becomes filename.tar +anyothername becomes anyothername.out + + (De)compressing a file is much like copying or moving it. Therefore plzip +preserves the access and modification dates, permissions, and, if you have +appropriate privileges, ownership of the file just as 'cp -p' does.
(If the +user ID or the group ID can't be duplicated, the file permission bits +S_ISUID and S_ISGID are cleared). + + Plzip is able to read from some types of non-regular files if either the +option '-c' or the option '-o' is specified. + + Plzip refuses to read compressed data from a terminal or write compressed +data to a terminal, as this would be entirely incomprehensible and might +leave the terminal in an abnormal state. + + Plzip correctly decompresses a file which is the concatenation of two or +more compressed files. The result is the concatenation of the corresponding +decompressed files. Integrity testing of concatenated compressed files is +also supported. + + +File: plzip.info, Node: Output, Next: Invoking plzip, Prev: Introduction, Up: Top + +2 Meaning of plzip's output +*************************** + +The output of plzip looks like this: + + plzip -v foo + foo: 6.676:1, 14.98% ratio, 85.02% saved, 450560 in, 67493 out. + + plzip -tvvv foo.lz + foo.lz: 6.676:1, 14.98% ratio, 85.02% saved. 450560 out, 67493 in. ok + + The meaning of each field is as follows: + +'N:1' + The compression ratio (uncompressed_size / compressed_size), shown as + N to 1. + +'ratio' + The inverse compression ratio (compressed_size / uncompressed_size), + shown as a percentage. A decimal ratio is easily obtained by moving the + decimal point two places to the left; 14.98% = 0.1498. + +'saved' + The space saved by compression (1 - ratio), shown as a percentage. + +'in' + Size of the input data. This is the uncompressed size when + compressing, or the compressed size when decompressing or testing. + Note that plzip always prints the uncompressed size before the + compressed size when compressing, decompressing, testing, or listing. + +'out' + Size of the output data. This is the compressed size when compressing, + or the decompressed size when decompressing or testing. 
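The relationship between these fields is plain arithmetic on the two sizes. The following Python sketch is not part of plzip; the function name `describe` is hypothetical and the formatting only imitates the sample output shown above, using the sizes from that sample.

```python
def describe(uncompressed_size: int, compressed_size: int) -> str:
    """Return the 'N:1', 'ratio', and 'saved' fields for the two sizes."""
    n_to_1 = uncompressed_size / compressed_size   # compression ratio, N to 1
    ratio = compressed_size / uncompressed_size    # inverse compression ratio
    saved = 1.0 - ratio                            # space saved by compression
    return (f"{n_to_1:.3f}:1, {ratio * 100:.2f}% ratio, "
            f"{saved * 100:.2f}% saved")


# Sizes taken from the 'plzip -v foo' sample line.
print(describe(450560, 67493))
```

With the sample sizes this yields the same three figures as the sample line (6.676:1, 14.98%, 85.02%).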
+ + + When decompressing or testing at verbosity level 4 (-vvvv), the +dictionary size used to compress the file is also shown. + + LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never +have been compressed. Decompressed is used to refer to data which have +undergone the process of decompression. + + +File: plzip.info, Node: Invoking plzip, Next: Program design, Prev: Output, Up: Top + +3 Invoking plzip +**************** + +The format for running plzip is: + + plzip [OPTIONS] [FILES] + +If no file names are specified, plzip compresses (or decompresses) from +standard input to standard output. A hyphen '-' used as a FILE argument +means standard input. It can be mixed with other FILES and is read just +once, the first time it appears in the command line. Remember to prepend +'./' to any file name beginning with a hyphen, or use '--'. + + Plzip supports the following options: *Note Argument syntax: +(arg_parser)Argument syntax. + +'-h' +'--help' + Print an informative help message describing the options and exit. + +'-V' +'--version' + Print the version number of plzip on the standard output and exit. + This version number should be included in all bug reports. + +'-a' +'--trailing-error' + Exit with error status 2 if any remaining input is detected after + decompressing the last member. Such remaining input is usually trailing + garbage that can be safely ignored. *Note concat-example::. + +'-B BYTES' +'--data-size=BYTES' + When compressing, set the size in bytes of the input data blocks. The + input file is divided into chunks of this size before compression is + performed. Valid values range from 8 KiB to 1 GiB. The default value is + twice the dictionary size, except for option '-0' where it + defaults to 1 MiB. Plzip reduces the dictionary size if it is larger + than the data size specified. *Note Minimum file sizes::. + +'-c' +'--stdout' + Compress or decompress to standard output; keep input files unchanged.
+ If compressing several files, each file is compressed independently. + (The output consists of a sequence of independently compressed + members). This option (or '-o') is needed when reading from a named + pipe (fifo) or from a device. Use 'lziprecover -cd -i' to recover as + much of the decompressed data as possible when decompressing a corrupt + file. '-c' overrides '-o'. '-c' has no effect when testing or listing. + +'-d' +'--decompress' + Decompress the files specified. The integrity of the files specified is + checked. If a file does not exist, can't be opened, or the destination + file already exists and '--force' has not been specified, plzip + continues decompressing the rest of the files and exits with error + status 1. If a file fails to decompress, or is a terminal, plzip exits + immediately with error status 2 without decompressing the rest of the + files. A terminal is considered an uncompressed file, and therefore + invalid. + +'-f' +'--force' + Force overwrite of output files. + +'-F' +'--recompress' + When compressing, force re-compression of files whose name already has + the '.lz' or '.tlz' suffix. + +'-k' +'--keep' + Keep (don't delete) input files during compression or decompression. + +'-l' +'--list' + Print the uncompressed size, compressed size, and percentage saved of + the files specified. Trailing data are ignored. The values produced + are correct even for multimember files. If more than one file is + given, a final line containing the cumulative sizes is printed. With + '-v', the dictionary size, the number of members in the file, and the + amount of trailing data (if any) are also printed. With '-vv', the + positions and sizes of each member in multimember files are also + printed. + + If any file is damaged, does not exist, can't be opened, or is not + regular, the final exit status is > 0. '-lq' can be used to check + quickly (without decompressing) the structural integrity of the files + specified. 
(Use '--test' to check the data integrity). '-alq' + additionally checks that none of the files specified contain trailing + data. + +'-m BYTES' +'--match-length=BYTES' + When compressing, set the match length limit in bytes. After a match + this long is found, the search is finished. Valid values range from 5 + to 273. Larger values usually give better compression ratios but + longer compression times. + +'-n N' +'--threads=N' + Set the maximum number of worker threads, overriding the system's + default. Valid values range from 1 to "as many as your system can + support". If this option is not used, plzip tries to detect the number + of processors in the system and use it as the default value. When + compressing on a 32-bit system, plzip tries to limit the memory use to + under 2.22 GiB (4 worker threads at level -9) by reducing the number + of threads below the system's default. 'plzip --help' shows the + system's default value. + + Plzip starts the number of threads required by each file without + exceeding the value specified. Note that the number of usable threads + is limited to ceil( file_size / data_size ) during compression (*note + Minimum file sizes::), and to the number of members in the input + during decompression. You can find the number of members in a lzip + file by running 'plzip -lv file.lz'. + +'-o FILE' +'--output=FILE' + If '-c' has not also been specified, write the (de)compressed output + to FILE, automatically creating any missing parent directories; keep + input files unchanged. If compressing several files, each file is + compressed independently. (The output consists of a sequence of + independently compressed members). This option (or '-c') is needed + when reading from a named pipe (fifo) or from a device. '-o -' is + equivalent to '-c'. '-o' has no effect when testing or listing.
+ + In order to keep backward compatibility with plzip versions prior to + 1.9, when compressing from standard input and no other file names are + given, the extension '.lz' is appended to FILE unless it already ends + in '.lz' or '.tlz'. This feature will be removed in a future version + of plzip. Meanwhile, redirection may be used instead of '-o' to write + the compressed output to a file without the extension '.lz' in its + name: 'plzip < file > foo'. + +'-q' +'--quiet' + Quiet operation. Suppress all messages. + +'-s BYTES' +'--dictionary-size=BYTES' + When compressing, set the dictionary size limit in bytes. Plzip uses + for each file the largest dictionary size that exceeds neither + the file size nor this limit. Valid values range from 4 KiB to 512 MiB. + Values 12 to 29 are interpreted as powers of two, meaning 2^12 to 2^29 + bytes. Dictionary sizes are quantized so that they can be coded in + just one byte (*note coded-dict-size::). If the size specified does + not match one of the valid sizes, it is rounded upwards by adding up + to (BYTES / 8) to it. + + For maximum compression you should use a dictionary size limit as large + as possible, but keep in mind that the decompression memory requirement + is affected at compression time by the choice of dictionary size limit. + +'-t' +'--test' + Check integrity of the files specified, but don't decompress them. This + really performs a trial decompression and throws away the result. Use + it together with '-v' to see information about the files. If a file + fails the test, does not exist, can't be opened, or is a terminal, + plzip continues testing the rest of the files. A final diagnostic is + shown at verbosity level 1 or higher if any file fails the test when + testing multiple files. + +'-v' +'--verbose' + Verbose mode. + When compressing, show the compression ratio and size for each file + processed.
+ When decompressing or testing, further -v's (up to 4) increase the + verbosity level, showing status, compression ratio, dictionary size, + decompressed size, and compressed size. + Two or more '-v' options show the progress of (de)compression, except + for single-member files. + +'-0 .. -9' + Compression level. Set the compression parameters (dictionary size and + match length limit) as shown in the table below. The default + compression level is '-6', equivalent to '-s8MiB -m36'. Note that '-9' + can be much slower than '-0'. These options have no effect when + decompressing, testing, or listing. + + The bidimensional parameter space of LZMA can't be mapped to a linear + scale optimal for all files. If your files are large, very repetitive, + etc, you may need to use the options '--dictionary-size' and + '--match-length' directly to achieve optimal performance. + + If several compression levels or '-s' or '-m' options are given, the + last setting is used. For example '-9 -s64MiB' is equivalent to + '-s64MiB -m273' + + Level Dictionary size (-s) Match length limit (-m) + -0 64 KiB 16 bytes + -1 1 MiB 5 bytes + -2 1.5 MiB 6 bytes + -3 2 MiB 8 bytes + -4 3 MiB 12 bytes + -5 4 MiB 20 bytes + -6 8 MiB 36 bytes + -7 16 MiB 68 bytes + -8 24 MiB 132 bytes + -9 32 MiB 273 bytes + +'--fast' +'--best' + Aliases for GNU gzip compatibility. + +'--loose-trailing' + When decompressing, testing, or listing, allow trailing data whose + first bytes are so similar to the magic bytes of a lzip header that + they can be confused with a corrupt header. Use this option if a file + triggers a "corrupt header" error and the cause is not indeed a + corrupt header. + +'--in-slots=N' + Number of 1 MiB input packets buffered per worker thread when + decompressing from non-seekable input. Increasing the number of packets + may increase decompression speed, but requires more memory. Valid + values range from 1 to 64. The default value is 4. 
+ +'--out-slots=N' + Number of 1 MiB output packets buffered per worker thread when + decompressing to non-seekable output. Increasing the number of packets + may increase decompression speed, but requires more memory. Valid + values range from 1 to 1024. The default value is 64. + +'--check-lib' + Compare the version of lzlib used to compile plzip with the version + actually being used at run time and exit. Report any differences + found. Exit with error status 1 if differences are found. A mismatch + may indicate that lzlib is not correctly installed or that a different + version of lzlib has been installed after compiling plzip. Exit with + error status 2 if LZ_API_VERSION and LZ_version_string don't match. + 'plzip -v --check-lib' shows the version of lzlib being used and the + value of LZ_API_VERSION (if defined). *Note Library version: + (lzlib)Library version. + + + Numbers given as arguments to options may be expressed in decimal, +hexadecimal, or octal (using the same syntax as integer constants in C++), +and may be followed by a multiplier and an optional 'B' for "byte". + + Table of SI and binary prefixes (unit multipliers): + +Prefix Value | Prefix Value +k kilobyte (10^3 = 1000) | Ki kibibyte (2^10 = 1024) +M megabyte (10^6) | Mi mebibyte (2^20) +G gigabyte (10^9) | Gi gibibyte (2^30) +T terabyte (10^12) | Ti tebibyte (2^40) +P petabyte (10^15) | Pi pebibyte (2^50) +E exabyte (10^18) | Ei exbibyte (2^60) +Z zettabyte (10^21) | Zi zebibyte (2^70) +Y yottabyte (10^24) | Yi yobibyte (2^80) +R ronnabyte (10^27) | Ri robibyte (2^90) +Q quettabyte (10^30) | Qi quebibyte (2^100) + + + Exit status: 0 for a normal exit, 1 for environmental problems (file not +found, invalid command-line options, I/O errors, etc), 2 to indicate a +corrupt or invalid input file, 3 for an internal consistency error (e.g., +bug) which caused plzip to panic. 
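The numeric argument syntax described above (decimal, hexadecimal, or octal, followed by an optional multiplier and an optional 'B') can be illustrated with a small Python sketch. This is an assumption-laden illustration, not plzip's own parser: the name `parse_size` is hypothetical, a bare 'B' with no multiplier is accepted here by assumption, and range checks such as those of '-s' or '-B' are left out.

```python
import re

# Multipliers from the table of SI and binary prefixes.
MULTIPLIERS = {"k": 10 ** 3, "Ki": 2 ** 10}
for power, letter in enumerate("MGTPEZYRQ", start=2):
    MULTIPLIERS[letter] = 10 ** (3 * power)        # M = 10^6 ... Q = 10^30
    MULTIPLIERS[letter + "i"] = 2 ** (10 * power)  # Mi = 2^20 ... Qi = 2^100


def parse_size(text: str) -> int:
    """Parse a number written like a C++ integer constant plus a multiplier."""
    match = re.fullmatch(r"(0[xX][0-9a-fA-F]+|[0-9]+)([A-Za-z]*)", text)
    if match is None:
        raise ValueError(f"bad numeric argument: {text!r}")
    digits, suffix = match.groups()
    if digits[:2] in ("0x", "0X"):
        value = int(digits, 16)        # hexadecimal constant
    elif len(digits) > 1 and digits[0] == "0":
        value = int(digits, 8)         # octal constant; rejects digits 8 and 9
    else:
        value = int(digits, 10)        # decimal constant
    if suffix.endswith("B"):
        suffix = suffix[:-1]           # the optional 'B' for "byte"
    if suffix:
        if suffix not in MULTIPLIERS:
            raise ValueError(f"bad multiplier in: {text!r}")
        value *= MULTIPLIERS[suffix]
    return value


print(parse_size("1MiB"), parse_size("8k"), parse_size("0x10"), parse_size("010"))
```

Note that in this sketch a hexadecimal number swallows any trailing 'B' or 'E' as a digit, so multipliers are only unambiguous after decimal or octal numbers.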
+ + +File: plzip.info, Node: Program design, Next: Memory requirements, Prev: Invoking plzip, Up: Top + +4 Internal structure of plzip +***************************** + +When compressing, plzip divides the input file into chunks and compresses as +many chunks simultaneously as worker threads are chosen, creating a +multimember compressed file. Each chunk is compressed in-place (using the +same buffer for input and output), reducing the amount of RAM required. + + When decompressing, plzip decompresses as many members simultaneously as +worker threads are chosen. Files that were compressed with lzip are not +decompressed faster than using lzip (unless the option '-b' was used) +because lzip usually produces single-member files, which can't be +decompressed in parallel. + + For each input file, a splitter thread and several worker threads are +created, with the main thread acting as the muxer (multiplexer) thread. A "packet +courier" takes care of data transfers among threads and limits the maximum +number of data blocks (packets) being processed simultaneously. + + The splitter reads data blocks from the input file, and distributes them +to the workers. The workers (de)compress the blocks received from the +splitter. The muxer collects processed packets from the workers, and writes +them to the output file. +
+                             .------------.
+                         ,-->| worker 0   |--,
+                         |   `------------'  |
+.-------.   .----------. |   .------------.  |   .-------.   .--------.
+| input |-->| splitter |-+-->| worker 1   |--+-->| muxer |-->| output |
+| file  |   `----------' |   `------------'  |   `-------'   | file   |
+`-------'                |        ...        |               `--------'
+                         |   .------------.  |
+                         `-->| worker N-1 |--'
+                             `------------'
+ + When decompressing from a regular file, the splitter is removed and the +workers read directly from the input file. If the output file is also a +regular file, the muxer is also removed and the workers write directly to +the output file.
With these optimizations, the use of RAM is greatly +reduced and the decompression speed of large files with many members is +only limited by the number of processors available and by I/O speed. + + +File: plzip.info, Node: Memory requirements, Next: Minimum file sizes, Prev: Program design, Up: Top + +5 Memory required to compress and decompress +******************************************** + +The amount of memory required *per worker thread* for decompression or +testing is approximately the following: + + * For decompression of a regular (seekable) file to another regular file, + or for testing of a regular file; the dictionary size. + + * For testing of a non-seekable file or of standard input; the dictionary + size plus 1 MiB plus up to the number of 1 MiB input packets buffered + (4 by default). + + * For decompression of a regular file to a non-seekable file or to + standard output; the dictionary size plus up to the number of 1 MiB + output packets buffered (64 by default). + + * For decompression of a non-seekable file or of standard input; the + dictionary size plus 1 MiB plus up to the number of 1 MiB input and + output packets buffered (68 by default). + +The amount of memory required *per worker thread* for compression is +approximately the following: + + * For compression at level -0; 1.5 MiB plus 3.375 times the data size + (*note --data-size::). Default is 4.875 MiB. + + * For compression at other levels; 11 times the dictionary size plus + 3.375 times the data size. Default is 142 MiB. 
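The two compression formulas above are easy to evaluate by hand or in code. The Python sketch below is only an illustration of that arithmetic under the stated formulas; the function name `compression_memory` is hypothetical, and the default data size (twice the dictionary size, or 1 MiB at level -0) is taken from the description of '--data-size'.

```python
MIB = 1 << 20  # one mebibyte in bytes


def compression_memory(dict_size: int, data_size: int, level0: bool = False) -> float:
    """Approximate memory in bytes needed by one worker thread to compress."""
    if level0:
        # Level -0: 1.5 MiB plus 3.375 times the data size.
        return 1.5 * MIB + 3.375 * data_size
    # Other levels: 11 times the dictionary size plus 3.375 times the data size.
    return 11 * dict_size + 3.375 * data_size


# Level -0 uses a 1 MiB data size by default.
print(compression_memory(64 * 1024, 1 * MIB, level0=True) / MIB)
# Level -6: 8 MiB dictionary, so a 16 MiB default data size.
print(compression_memory(8 * MIB, 16 * MIB) / MIB)
# Level -9: 32 MiB dictionary, so a 64 MiB default data size.
print(compression_memory(32 * MIB, 64 * MIB) / MIB)
```

Multiplying the per-thread figure by the number of worker threads gives a rough total, which is how the 2.22 GiB figure quoted for 4 threads at level -9 arises (4 times 568 MiB).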
+ +The following table shows the memory required *per thread* for compression +at a given level, using the default data size for each level: +
+Level   Memory required
+-0      4.875 MiB
+-1      17.75 MiB
+-2      26.625 MiB
+-3      35.5 MiB
+-4      53.25 MiB
+-5      71 MiB
+-6      142 MiB
+-7      284 MiB
+-8      426 MiB
+-9      568 MiB
+ + +File: plzip.info, Node: Minimum file sizes, Next: File format, Prev: Memory requirements, Up: Top + +6 Minimum file sizes required for full compression speed +******************************************************** + +When compressing, plzip divides the input file into chunks and compresses +as many chunks simultaneously as worker threads are chosen, creating a +multimember compressed file. + + For this to work as expected (and roughly multiply the compression speed +by the number of available processors), the uncompressed file must be at +least as large as the number of worker threads times the chunk size (*note +--data-size::). Else some processors do not get any data to compress, and +compression is proportionally slower. The maximum speed increase achievable +on a given file is limited by the ratio (file_size / data_size). For +example, a tarball the size of gcc or linux scales up to 10 or 14 +processors at level -9.
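The limit described above can be written down directly. The following Python sketch assumes only the two relations stated in this manual: the number of usable threads during compression is at most ceil(file_size / data_size), and full use of N threads needs a file of at least N times the data size. The function names are hypothetical.

```python
MIB = 1 << 20  # one mebibyte in bytes


def usable_threads(file_size: int, data_size: int, max_threads: int) -> int:
    """Worker threads that can receive a chunk when compressing one file."""
    chunks = -(-file_size // data_size)  # integer form of ceil(file_size / data_size)
    return min(max_threads, chunks)


def min_file_size(threads: int, data_size: int) -> int:
    """Smallest uncompressed size that gives every thread a full chunk."""
    return threads * data_size


# Level -9 defaults to a 64 MiB data size (twice the 32 MiB dictionary).
print(usable_threads(100 * MIB, 64 * MIB, max_threads=8))
print(min_file_size(8, 64 * MIB) // MIB, "MiB")
```

A smaller '--data-size' raises the number of usable threads on a given file, at the cost of a somewhat lower compression ratio.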
+ + The following table shows the minimum uncompressed file size needed for +full use of N processors at a given compression level, using the default +data size for each level: +
+Processors        2        4        8       16       64      256
+------------------------------------------------------------------
+Level
+-0            2 MiB    4 MiB    8 MiB   16 MiB   64 MiB  256 MiB
+-1            4 MiB    8 MiB   16 MiB   32 MiB  128 MiB  512 MiB
+-2            6 MiB   12 MiB   24 MiB   48 MiB  192 MiB  768 MiB
+-3            8 MiB   16 MiB   32 MiB   64 MiB  256 MiB    1 GiB
+-4           12 MiB   24 MiB   48 MiB   96 MiB  384 MiB  1.5 GiB
+-5           16 MiB   32 MiB   64 MiB  128 MiB  512 MiB    2 GiB
+-6           32 MiB   64 MiB  128 MiB  256 MiB    1 GiB    4 GiB
+-7           64 MiB  128 MiB  256 MiB  512 MiB    2 GiB    8 GiB
+-8           96 MiB  192 MiB  384 MiB  768 MiB    3 GiB   12 GiB
+-9          128 MiB  256 MiB  512 MiB    1 GiB    4 GiB   16 GiB
+ + +File: plzip.info, Node: File format, Next: Trailing data, Prev: Minimum file sizes, Up: Top + +7 File format +************* + +Perfection is reached, not when there is no longer anything to add, but +when there is no longer anything to take away. +-- Antoine de Saint-Exupery + + + In the diagram below, a box like this: +
++---+
+|   | <-- the vertical bars might be missing
++---+
+ + represents one byte; a box like this: +
++==============+
+|              |
++==============+
+ + represents a variable number of bytes. + + + A lzip file consists of one or more independent "members" (compressed +data sets). The members simply appear one after another in the file, with no +additional information before, between, or after them. Each member can +encode in compressed form up to 16 EiB - 1 byte of uncompressed data. The +size of a multimember file is unlimited. + + Each member has the following structure: +
++--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+| ID string | VN | DS | LZMA stream | CRC32 |   Data size   |  Member size  |
++--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ + All multibyte values are stored in little endian order.
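As a rough illustration of the member layout in the diagram, the Python sketch below unpacks the 6-byte header and the 20-byte trailer with the standard `struct` module, using little endian order. It is not the code used by plzip or lzlib: the function names are hypothetical, the field widths are read off the diagram (4 + 1 + 1 bytes, then 4 + 8 + 8 bytes), the dictionary size is decoded as described for the DS field in this chapter, and no validation beyond the magic bytes is attempted.

```python
import struct

HEADER_SIZE = 6    # ID string (4 bytes) + VN (1 byte) + DS (1 byte)
TRAILER_SIZE = 20  # CRC32 (4 bytes) + data size (8 bytes) + member size (8 bytes)


def decode_dict_size(ds: int) -> int:
    """Decode the one-byte coded dictionary size."""
    base = 1 << (ds & 0x1F)          # bits 4-0: log2 of the base size
    numerator = (ds >> 5) & 0x07     # bits 7-5: sixteenths to subtract
    return base - (base // 16) * numerator


def parse_header(header: bytes) -> tuple:
    """Return (version number, dictionary size) from the first 6 bytes."""
    magic, version, ds = struct.unpack("<4sBB", header[:HEADER_SIZE])
    if magic != b"LZIP":
        raise ValueError("not a lzip member: bad magic bytes")
    return version, decode_dict_size(ds)


def parse_trailer(trailer: bytes) -> tuple:
    """Return (CRC32, data size, member size) from the last 20 bytes."""
    return struct.unpack("<IQQ", trailer[-TRAILER_SIZE:])


print(decode_dict_size(0xD3) // 1024, "KiB")
print(parse_header(b"LZIP\x01\x0c"))
```

Because the member size is stored in the trailer, a reader can walk a multimember file backwards from its end, one member at a time, without decompressing anything.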
+ +'ID string (the "magic" bytes)' + A four byte string, identifying the lzip format, with the value "LZIP" + (0x4C, 0x5A, 0x49, 0x50). + +'VN (version number, 1 byte)' + Just in case something needs to be modified in the future. 1 for now. + +'DS (coded dictionary size, 1 byte)' + The dictionary size is calculated by taking a power of 2 (the base + size) and subtracting from it a fraction between 0/16 and 7/16 of the + base size. + Bits 4-0 contain the base 2 logarithm of the base size (12 to 29). + Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract + from the base size to obtain the dictionary size. + Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB + Valid values for dictionary size range from 4 KiB to 512 MiB. + +'LZMA stream' + The LZMA stream, finished by an "End Of Stream" marker. Uses default + values for encoder properties. *Note Stream format: (lzip)Stream + format, for a complete description. + +'CRC32 (4 bytes)' + Cyclic Redundancy Check (CRC) of the original uncompressed data. + +'Data size (8 bytes)' + Size of the original uncompressed data. + +'Member size (8 bytes)' + Total size of the member, including header and trailer. This field acts + as a distributed index, improves the checking of stream integrity, and + facilitates the safe recovery of undamaged members from multimember + files. Lzip limits the member size to 2 PiB to prevent the data size + field from overflowing. + + + +File: plzip.info, Node: Trailing data, Next: Examples, Prev: File format, Up: Top + +8 Extra data appended to the file +********************************* + +Sometimes extra data are found appended to a lzip file after the last +member. Such trailing data may be: + + * Padding added to make the file size a multiple of some block size, for + example when writing to a tape. It is safe to append any amount of + padding zero bytes to a lzip file. 
+ + * Useful data added by the user; an "End Of File" string (to check that + the file has not been truncated), a cryptographically secure hash, a + description of file contents, etc. It is safe to append any amount of + text to a lzip file as long as none of the first four bytes of the + text matches the corresponding byte in the string "LZIP", and the text + does not contain any zero bytes (null characters). Nonzero bytes and + zero bytes can't be safely mixed in trailing data. + + * Garbage added by some not totally successful copy operation. + + * Malicious data added to the file in order to make its total size and + hash value (for a chosen hash) coincide with those of another file. + + * In rare cases, trailing data could be the corrupt header of another + member. In multimember or concatenated files the probability of + corruption happening in the magic bytes is 5 times smaller than the + probability of getting a false positive caused by the corruption of the + integrity information itself. Therefore it can be considered to be + below the noise level. Additionally, the test used by plzip to + discriminate trailing data from a corrupt header has a Hamming + distance (HD) of 3, and the 3 bit flips must happen in different magic + bytes for the test to fail. In any case, the option '--trailing-error' + guarantees that any corrupt header is detected. + + Trailing data are in no way part of the lzip file format, but tools +reading lzip files are expected to behave as correctly and usefully as +possible in the presence of trailing data. + + Trailing data can be safely ignored in most cases. In some cases, like +that of user-added data, they are expected to be ignored. In those cases +where a file containing trailing data must be rejected, the option +'--trailing-error' can be used. *Note --trailing-error::. 
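The rules given above for data that can be appended safely (all-zero padding, or text with no zero bytes whose first four bytes each differ from the corresponding byte of "LZIP") can be checked before appending. The Python sketch below only mirrors those stated rules; it is not the test plzip itself uses to tell trailing data from a corrupt header, and the function name is hypothetical.

```python
MAGIC = b"LZIP"


def is_safe_trailing_data(data: bytes) -> bool:
    """Check candidate trailing data against the rules stated in this chapter."""
    if data.count(0) == len(data):
        return True    # empty, or padding made only of zero bytes
    if 0 in data:
        return False   # nonzero and zero bytes can't be safely mixed
    # None of the first four bytes may match the corresponding magic byte.
    return all(byte != magic for byte, magic in zip(data[:4], MAGIC))


print(is_safe_trailing_data(b"\0" * 512))        # zero padding
print(is_safe_trailing_data(b"End Of File\n"))   # plain text marker
print(is_safe_trailing_data(b"LOREM IPSUM"))     # first byte matches 'L'
```

Text that fails the check can usually be made acceptable by prefixing it with a few bytes that differ from the magic bytes, for example a newline and some spaces.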
+ + +File: plzip.info, Node: Examples, Next: Problems, Prev: Trailing data, Up: Top + +9 A small tutorial with examples +******************************** + +WARNING! Even if plzip is bug-free, other causes may result in a corrupt +compressed file (bugs in the system libraries, memory errors, etc). +Therefore, if the data you are going to compress are important, give the +option '--keep' to plzip and don't remove the original file until you check +the compressed file with a command like 'plzip -cd file.lz | cmp file -'. +Most RAM errors happening during compression can only be detected by +comparing the compressed file with the original because the corruption +happens before plzip compresses the RAM contents, resulting in a valid +compressed file containing wrong data. + + +Example 1: Extract all the files from archive 'foo.tar.lz'. + + tar -xf foo.tar.lz + or + plzip -cd foo.tar.lz | tar -xf - + + +Example 2: Replace a regular file with its compressed version 'file.lz' and +show the compression ratio. + + plzip -v file + + +Example 3: Like example 2 but the created 'file.lz' has a block size of +1 MiB. The compression ratio is not shown. + + plzip -B 1MiB file + + +Example 4: Restore a regular file from its compressed version 'file.lz'. If +the operation is successful, 'file.lz' is removed. + + plzip -d file.lz + + +Example 5: Check the integrity of the compressed file 'file.lz' and show +status. + + plzip -tv file.lz + + +Example 6: The right way of concatenating the decompressed output of two or +more compressed files. *Note Trailing data::. + + Don't do this + cat file1.lz file2.lz file3.lz | plzip -d - + Do this instead + plzip -cd file1.lz file2.lz file3.lz + + +Example 7: Decompress 'file.lz' partially until 10 KiB of decompressed data +are produced. + + plzip -cd file.lz | dd bs=1024 count=10 + + +Example 8: Decompress 'file.lz' partially from decompressed byte at offset +10000 to decompressed byte at offset 14999 (5000 bytes are produced). 
+ + plzip -cd file.lz | dd bs=1000 skip=10 count=5 + + +Example 9: Compress a whole device in /dev/sdc and send the output to +'file.lz'. + + plzip -c /dev/sdc > file.lz + or + plzip /dev/sdc -o file.lz + + +File: plzip.info, Node: Problems, Next: Concept index, Prev: Examples, Up: Top + +10 Reporting bugs +***************** + +There are probably bugs in plzip. There are certainly errors and omissions +in this manual. If you report them, they will get fixed. If you don't, no +one will ever know about them and they will remain unfixed for all +eternity, if not longer. + + If you find a bug in plzip, please send electronic mail to +<lzip-bug@nongnu.org>. Include the version number, which you can find by +running 'plzip --version' and 'plzip -v --check-lib'. + + +File: plzip.info, Node: Concept index, Prev: Problems, Up: Top + +Concept index +************* + + +* Menu: + +* bugs: Problems. (line 6) +* examples: Examples. (line 6) +* file format: File format. (line 6) +* getting help: Problems. (line 6) +* introduction: Introduction. (line 6) +* invoking: Invoking plzip. (line 6) +* memory requirements: Memory requirements. (line 6) +* minimum file sizes: Minimum file sizes. (line 6) +* options: Invoking plzip. (line 6) +* output: Output. (line 6) +* program design: Program design. (line 6) +* trailing data: Trailing data. (line 6) +* usage: Invoking plzip. (line 6) +* version: Invoking plzip. 
(line 6) + + + +Tag Table: +Node: Top217 +Node: Introduction1156 +Node: Output5934 +Node: Invoking plzip7497 +Ref: --trailing-error8372 +Ref: --data-size8610 +Node: Program design19519 +Node: Memory requirements21818 +Node: Minimum file sizes23503 +Node: File format25506 +Ref: coded-dict-size26945 +Node: Trailing data28195 +Node: Examples30531 +Ref: concat-example31964 +Node: Problems32721 +Node: Concept index33276 + +End Tag Table + + +Local Variables: +coding: iso-8859-15 +End: diff --git a/doc/plzip.texi b/doc/plzip.texi new file mode 100644 index 0000000..323fad1 --- /dev/null +++ b/doc/plzip.texi @@ -0,0 +1,907 @@ +\input texinfo @c -*-texinfo-*- +@c %**start of header +@setfilename plzip.info +@documentencoding ISO-8859-15 +@settitle Plzip Manual +@finalout +@c %**end of header + +@set UPDATED 21 January 2024 +@set VERSION 1.11 + +@dircategory Compression +@direntry +* Plzip: (plzip). Massively parallel implementation of lzip +@end direntry + + +@ifnothtml +@titlepage +@title Plzip +@subtitle Massively parallel implementation of lzip +@subtitle for Plzip version @value{VERSION}, @value{UPDATED} +@author by Antonio Diaz Diaz + +@page +@vskip 0pt plus 1filll +@end titlepage + +@contents +@end ifnothtml + +@ifnottex +@node Top +@top + +This manual is for Plzip (version @value{VERSION}, @value{UPDATED}). + +@menu +* Introduction:: Purpose and features of plzip +* Output:: Meaning of plzip's output +* Invoking plzip:: Command-line interface +* Program design:: Internal structure of plzip +* Memory requirements:: Memory required to compress and decompress +* Minimum file sizes:: Minimum file sizes required for full speed +* File format:: Detailed format of the compressed file +* Trailing data:: Extra data appended to the file +* Examples:: A small tutorial with examples +* Problems:: Reporting bugs +* Concept index:: Index of concepts +@end menu + +@sp 1 +Copyright @copyright{} 2009-2024 Antonio Diaz Diaz. 
+ +This manual is free documentation: you have unlimited permission to copy, +distribute, and modify it. +@end ifnottex + + +@node Introduction +@chapter Introduction +@cindex introduction + +@uref{http://www.nongnu.org/lzip/plzip.html,,Plzip} +is a massively parallel (multi-threaded) implementation of lzip, +compatible with lzip 1.4 or newer. Plzip uses the compression library +@uref{http://www.nongnu.org/lzip/lzlib.html,,lzlib}. + +@uref{http://www.nongnu.org/lzip/lzip.html,,Lzip} +is a lossless data compressor with a user interface similar to the one +of gzip or bzip2. Lzip uses a simplified form of the 'Lempel-Ziv-Markov +chain-Algorithm' (LZMA) stream format to maximize interoperability. The +maximum dictionary size is 512 MiB so that any lzip file can be decompressed +on 32-bit machines. Lzip provides accurate and robust 3-factor integrity +checking. Lzip can compress about as fast as gzip @w{(lzip -0)} or compress most +files more than bzip2 @w{(lzip -9)}. Decompression speed is intermediate between +gzip and bzip2. Lzip is better than gzip and bzip2 from a data recovery +perspective. Lzip has been designed, written, and tested with great care to +replace gzip and bzip2 as the standard general-purpose compressed format for +Unix-like systems. + +Plzip can compress/decompress large files on multiprocessor machines much +faster than lzip, at the cost of a slightly reduced compression ratio (0.4 +to 2 percent larger compressed files). Note that the number of usable +threads is limited by file size; on files larger than a few GB plzip can use +hundreds of processors, but on files of only a few MB plzip is no faster +than lzip. @xref{Minimum file sizes}. + +For creation and manipulation of compressed tar archives +@uref{http://www.nongnu.org/lzip/manual/tarlz_manual.html,,tarlz} can be more +efficient than using tar and plzip because tarlz is able to keep the +alignment between tar members and lzip members. +@ifnothtml +@xref{Top,tarlz manual,,tarlz}. 
+@end ifnothtml + +The lzip file format is designed for data sharing and long-term archiving, +taking into account both data integrity and decoder availability: + +@itemize @bullet +@item +The lzip format provides very safe integrity checking and some data +recovery means. The program +@uref{http://www.nongnu.org/lzip/manual/lziprecover_manual.html#Data-safety,,lziprecover} +can repair bit flip errors (one of the most common forms of data corruption) +in lzip files, and provides data recovery capabilities, including +error-checked merging of damaged copies of a file. +@ifnothtml +@xref{Data safety,,,lziprecover}. +@end ifnothtml + +@item +The lzip format is as simple as possible (but not simpler). The lzip +manual provides the source code of a simple decompressor along with a +detailed explanation of how it works, so that with only the help of the +lzip manual it would be possible for a digital archaeologist to extract +the data from a lzip file long after quantum computers eventually +render LZMA obsolete. + +@item +Additionally the lzip reference implementation is copylefted, which +guarantees that it will remain free forever. +@end itemize + +A nice feature of the lzip format is that a corrupt byte is easier to repair +the nearer it is to the beginning of the file. Therefore, with the help of +lziprecover, losing an entire archive just because of a corrupt byte near +the beginning is a thing of the past. + +Plzip uses the same well-defined exit status values used by lzip, which +makes it safer than compressors returning ambiguous warning values (like +gzip) when it is used as a back end for other programs like tar or zutils. + +Plzip automatically uses for each file the largest dictionary size that +exceeds neither the file size nor the limit given. Keep in mind that the +decompression memory requirement is affected at compression time by the +choice of dictionary size limit. @xref{Memory requirements}.
+ +When compressing, plzip replaces every file given in the command line +with a compressed version of itself, with the name "original_name.lz". +When decompressing, plzip attempts to guess the name for the decompressed +file from that of the compressed file as follows: + +@multitable {anyothername} {becomes} {anyothername.out} +@item filename.lz @tab becomes @tab filename +@item filename.tlz @tab becomes @tab filename.tar +@item anyothername @tab becomes @tab anyothername.out +@end multitable + +(De)compressing a file is much like copying or moving it. Therefore plzip +preserves the access and modification dates, permissions, and, if you have +appropriate privileges, ownership of the file just as @w{@samp{cp -p}} does. +(If the user ID or the group ID can't be duplicated, the file permission +bits S_ISUID and S_ISGID are cleared). + +Plzip is able to read from some types of non-regular files if either the +option @option{-c} or the option @option{-o} is specified. + +Plzip refuses to read compressed data from a terminal or write compressed +data to a terminal, as this would be entirely incomprehensible and might +leave the terminal in an abnormal state. + +Plzip correctly decompresses a file which is the concatenation of two or +more compressed files. The result is the concatenation of the corresponding +decompressed files. Integrity testing of concatenated compressed files is +also supported. + + +@node Output +@chapter Meaning of plzip's output +@cindex output + +The output of plzip looks like this: + +@example +plzip -v foo + foo: 6.676:1, 14.98% ratio, 85.02% saved, 450560 in, 67493 out. + +plzip -tvvv foo.lz + foo.lz: 6.676:1, 14.98% ratio, 85.02% saved. 450560 out, 67493 in. ok +@end example + +The meaning of each field is as follows: + +@table @code +@item N:1 +The compression ratio @w{(uncompressed_size / compressed_size)}, shown as +@w{N to 1}. + +@item ratio +The inverse compression ratio @w{(compressed_size / uncompressed_size)}, +shown as a percentage. 
A decimal ratio is easily obtained by moving the +decimal point two places to the left; @w{14.98% = 0.1498}. + +@item saved +The space saved by compression @w{(1 - ratio)}, shown as a percentage. + +@item in +Size of the input data. This is the uncompressed size when compressing, or +the compressed size when decompressing or testing. Note that plzip always +prints the uncompressed size before the compressed size when compressing, +decompressing, testing, or listing. + +@item out +Size of the output data. This is the compressed size when compressing, or +the decompressed size when decompressing or testing. + +@end table + +When decompressing or testing at verbosity level 4 (-vvvv), the dictionary +size used to compress the file is also shown. + +LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never have +been compressed. Decompressed is used to refer to data which have undergone +the process of decompression. + + +@node Invoking plzip +@chapter Invoking plzip +@cindex invoking +@cindex options +@cindex usage +@cindex version + +The format for running plzip is: + +@example +plzip [@var{options}] [@var{files}] +@end example + +@noindent +If no file names are specified, plzip compresses (or decompresses) from +standard input to standard output. A hyphen @samp{-} used as a @var{file} +argument means standard input. It can be mixed with other @var{files} and is +read just once, the first time it appears in the command line. Remember to +prepend @file{./} to any file name beginning with a hyphen, or use @samp{--}. + +plzip supports the following +@uref{http://www.nongnu.org/arg-parser/manual/arg_parser_manual.html#Argument-syntax,,options}: +@ifnothtml +@xref{Argument syntax,,,arg_parser}. +@end ifnothtml + +@table @code +@item -h +@itemx --help +Print an informative help message describing the options and exit. + +@item -V +@itemx --version +Print the version number of plzip on the standard output and exit. 
+This version number should be included in all bug reports. + +@anchor{--trailing-error} +@item -a +@itemx --trailing-error +Exit with error status 2 if any remaining input is detected after +decompressing the last member. Such remaining input is usually trailing +garbage that can be safely ignored. @xref{concat-example}. + +@anchor{--data-size} +@item -B @var{bytes} +@itemx --data-size=@var{bytes} +When compressing, set the size in bytes of the input data blocks. The input +file is divided in chunks of this size before compression is performed. +Valid values range from @w{8 KiB} to @w{1 GiB}. Default value is two times +the dictionary size, except for option @option{-0} where it defaults to +@w{1 MiB}. Plzip reduces the dictionary size if it is larger than the data +size specified. @xref{Minimum file sizes}. + +@item -c +@itemx --stdout +Compress or decompress to standard output; keep input files unchanged. If +compressing several files, each file is compressed independently. (The +output consists of a sequence of independently compressed members). This +option (or @option{-o}) is needed when reading from a named pipe (fifo) or +from a device. Use @w{@samp{lziprecover -cd -i}} to recover as much of the +decompressed data as possible when decompressing a corrupt file. @option{-c} +overrides @option{-o}. @option{-c} has no effect when testing or listing. + +@item -d +@itemx --decompress +Decompress the files specified. The integrity of the files specified is +checked. If a file does not exist, can't be opened, or the destination file +already exists and @option{--force} has not been specified, plzip continues +decompressing the rest of the files and exits with error status 1. If a file +fails to decompress, or is a terminal, plzip exits immediately with error +status 2 without decompressing the rest of the files. A terminal is +considered an uncompressed file, and therefore invalid. + +@item -f +@itemx --force +Force overwrite of output files. 
+ +@item -F +@itemx --recompress +When compressing, force re-compression of files whose name already has +the @samp{.lz} or @samp{.tlz} suffix. + +@item -k +@itemx --keep +Keep (don't delete) input files during compression or decompression. + +@item -l +@itemx --list +Print the uncompressed size, compressed size, and percentage saved of the +files specified. Trailing data are ignored. The values produced are correct +even for multimember files. If more than one file is given, a final line +containing the cumulative sizes is printed. With @option{-v}, the dictionary +size, the number of members in the file, and the amount of trailing data (if +any) are also printed. With @option{-vv}, the positions and sizes of each +member in multimember files are also printed. + +If any file is damaged, does not exist, can't be opened, or is not regular, +the final exit status is @w{> 0}. @option{-lq} can be used to check quickly +(without decompressing) the structural integrity of the files specified. +(Use @option{--test} to check the data integrity). @option{-alq} +additionally checks that none of the files specified contain trailing data. + +@item -m @var{bytes} +@itemx --match-length=@var{bytes} +When compressing, set the match length limit in bytes. After a match this +long is found, the search is finished. Valid values range from 5 to 273. +Larger values usually give better compression ratios but longer compression +times. + +@item -n @var{n} +@itemx --threads=@var{n} +Set the maximum number of worker threads, overriding the system's default. +Valid values range from 1 to "as many as your system can support". If this +option is not used, plzip tries to detect the number of processors in the +system and use it as default value. When compressing on a @w{32 bit} system, +plzip tries to limit the memory use to under @w{2.22 GiB} (4 worker threads +at level -9) by reducing the number of threads below the system's default. 
+@w{@samp{plzip --help}} shows the system's default value. + +Plzip starts the number of threads required by each file without exceeding +the value specified. Note that the number of usable threads is limited to +@w{ceil( file_size / data_size )} during compression (@pxref{Minimum file +sizes}), and to the number of members in the input during decompression. You +can find the number of members in a lzip file by running +@w{@samp{plzip -lv file.lz}}. + +@item -o @var{file} +@itemx --output=@var{file} +If @option{-c} has not also been specified, write the (de)compressed output +to @var{file}, automatically creating any missing parent directories; keep +input files unchanged. If compressing several files, each file is compressed +independently. (The output consists of a sequence of independently +compressed members). This option (or @option{-c}) is needed when reading +from a named pipe (fifo) or from a device. @w{@option{-o -}} is equivalent +to @option{-c}. @option{-o} has no effect when testing or listing. + +In order to keep backward compatibility with plzip versions prior to 1.9, +when compressing from standard input and no other file names are given, the +extension @samp{.lz} is appended to @var{file} unless it already ends in +@samp{.lz} or @samp{.tlz}. This feature will be removed in a future version +of plzip. Meanwhile, redirection may be used instead of @option{-o} to write +the compressed output to a file without the extension @samp{.lz} in its +name: @w{@samp{plzip < file > foo}}. + +@item -q +@itemx --quiet +Quiet operation. Suppress all messages. + +@item -s @var{bytes} +@itemx --dictionary-size=@var{bytes} +When compressing, set the dictionary size limit in bytes. Plzip uses for +each file the largest dictionary size that exceeds neither the file +size nor this limit. Valid values range from @w{4 KiB} to @w{512 MiB}. +Values 12 to 29 are interpreted as powers of two, meaning 2^12 to 2^29 +bytes. 
Dictionary sizes are quantized so that they can be coded in just one +byte (@pxref{coded-dict-size}). If the size specified does not match one of +the valid sizes, it is rounded upwards by adding up to @w{(@var{bytes} / 8)} +to it. + +For maximum compression you should use a dictionary size limit as large +as possible, but keep in mind that the decompression memory requirement +is affected at compression time by the choice of dictionary size limit. + +@item -t +@itemx --test +Check integrity of the files specified, but don't decompress them. This +really performs a trial decompression and throws away the result. Use it +together with @option{-v} to see information about the files. If a file +fails the test, does not exist, can't be opened, or is a terminal, plzip +continues testing the rest of the files. A final diagnostic is shown at +verbosity level 1 or higher if any file fails the test when testing multiple +files. + +@item -v +@itemx --verbose +Verbose mode.@* +When compressing, show the compression ratio and size for each file +processed.@* +When decompressing or testing, further -v's (up to 4) increase the +verbosity level, showing status, compression ratio, dictionary size, +decompressed size, and compressed size.@* +Two or more @option{-v} options show the progress of (de)compression, +except for single-member files. + +@item -0 .. -9 +Compression level. Set the compression parameters (dictionary size and +match length limit) as shown in the table below. The default compression +level is @option{-6}, equivalent to @w{@option{-s8MiB -m36}}. Note that +@option{-9} can be much slower than @option{-0}. These options have no +effect when decompressing, testing, or listing. + +The bidimensional parameter space of LZMA can't be mapped to a linear scale +optimal for all files. If your files are large, very repetitive, etc, you +may need to use the options @option{--dictionary-size} and +@option{--match-length} directly to achieve optimal performance. 
+ +If several compression levels or @option{-s} or @option{-m} options are +given, the last setting is used. For example, @w{@option{-9 -s64MiB}} is +equivalent to @w{@option{-s64MiB -m273}}. + +@multitable {Level} {Dictionary size (-s)} {Match length limit (-m)} +@item Level @tab Dictionary size (-s) @tab Match length limit (-m) +@item -0 @tab 64 KiB @tab 16 bytes +@item -1 @tab 1 MiB @tab 5 bytes +@item -2 @tab 1.5 MiB @tab 6 bytes +@item -3 @tab 2 MiB @tab 8 bytes +@item -4 @tab 3 MiB @tab 12 bytes +@item -5 @tab 4 MiB @tab 20 bytes +@item -6 @tab 8 MiB @tab 36 bytes +@item -7 @tab 16 MiB @tab 68 bytes +@item -8 @tab 24 MiB @tab 132 bytes +@item -9 @tab 32 MiB @tab 273 bytes +@end multitable + +@item --fast +@itemx --best +Aliases for GNU gzip compatibility. + +@item --loose-trailing +When decompressing, testing, or listing, allow trailing data whose first +bytes are so similar to the magic bytes of a lzip header that they can +be confused with a corrupt header. Use this option if a file triggers a +"corrupt header" error and the cause is not actually a corrupt header. + +@item --in-slots=@var{n} +Number of @w{1 MiB} input packets buffered per worker thread when +decompressing from non-seekable input. Increasing the number of packets +may increase decompression speed, but requires more memory. Valid values +range from 1 to 64. The default value is 4. + +@item --out-slots=@var{n} +Number of @w{1 MiB} output packets buffered per worker thread when +decompressing to non-seekable output. Increasing the number of packets +may increase decompression speed, but requires more memory. Valid values +range from 1 to 1024. The default value is 64. + +@item --check-lib +Compare the +@uref{http://www.nongnu.org/lzip/manual/lzlib_manual.html#Library-version,,version of lzlib} +used to compile plzip with the version actually being used at run time and +exit. Report any differences found. Exit with error status 1 if differences +are found. 
A mismatch may indicate that lzlib is not correctly installed or +that a different version of lzlib has been installed after compiling plzip. +Exit with error status 2 if LZ_API_VERSION and LZ_version_string don't +match. @w{@samp{plzip -v --check-lib}} shows the version of lzlib being used +and the value of LZ_API_VERSION (if defined). +@ifnothtml +@xref{Library version,,,lzlib}. +@end ifnothtml + +@end table + +Numbers given as arguments to options may be expressed in decimal, +hexadecimal, or octal (using the same syntax as integer constants in C++), +and may be followed by a multiplier and an optional @samp{B} for "byte". + +Table of SI and binary prefixes (unit multipliers): + +@multitable {Prefix} {kilobyte (10^3 = 1000)} {|} {Prefix} {kibibyte (2^10 = 1024)} +@item Prefix @tab Value @tab | @tab Prefix @tab Value +@item k @tab kilobyte (10^3 = 1000) @tab | @tab Ki @tab kibibyte (2^10 = 1024) +@item M @tab megabyte (10^6) @tab | @tab Mi @tab mebibyte (2^20) +@item G @tab gigabyte (10^9) @tab | @tab Gi @tab gibibyte (2^30) +@item T @tab terabyte (10^12) @tab | @tab Ti @tab tebibyte (2^40) +@item P @tab petabyte (10^15) @tab | @tab Pi @tab pebibyte (2^50) +@item E @tab exabyte (10^18) @tab | @tab Ei @tab exbibyte (2^60) +@item Z @tab zettabyte (10^21) @tab | @tab Zi @tab zebibyte (2^70) +@item Y @tab yottabyte (10^24) @tab | @tab Yi @tab yobibyte (2^80) +@item R @tab ronnabyte (10^27) @tab | @tab Ri @tab robibyte (2^90) +@item Q @tab quettabyte (10^30) @tab | @tab Qi @tab quebibyte (2^100) +@end multitable + +@sp 1 +Exit status: 0 for a normal exit, 1 for environmental problems +(file not found, invalid command-line options, I/O errors, etc), 2 to +indicate a corrupt or invalid input file, 3 for an internal consistency +error (e.g., bug) which caused plzip to panic. 
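As an illustration of the numeric argument syntax described before the exit status note, here is a minimal C++ sketch. It is not plzip's actual parser: the function name parse_size is hypothetical, the multipliers are limited to those that fit in 64 bits (k to E and Ki to Ei), and the sketch assumes that the optional B is accepted only after a multiplier.

```cpp
#include <cctype>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical helper, not taken from plzip: convert an option argument
// such as "64KiB", "0x10000", or "8M" into a plain number.
unsigned long long parse_size( const std::string & arg )
  {
  if( arg.empty() || !std::isdigit( static_cast< unsigned char >( arg[0] ) ) )
    throw std::invalid_argument( "argument must start with a digit" );
  errno = 0;
  char * tail = 0;
  // base 0 accepts decimal, hexadecimal (0x), and octal (0) constants
  unsigned long long value = std::strtoull( arg.c_str(), &tail, 0 );
  if( errno == ERANGE ) throw std::out_of_range( "number too large" );
  if( *tail == 0 ) return value;		// no multiplier given

  const char prefix = *tail++;
  const bool binary = ( *tail == 'i' );		// Ki, Mi, Gi, ...
  if( binary ) ++tail;
  // kilo is a lowercase 'k', but kibi is written "Ki"
  const std::string prefixes = binary ? "KMGTPE" : "kMGTPE";
  const std::string::size_type pos = prefixes.find( prefix );
  if( pos == std::string::npos )
    throw std::invalid_argument( "unknown multiplier" );
  const unsigned factor = binary ? 1024 : 1000;
  for( std::string::size_type i = 0; i <= pos; ++i )
    {
    if( value > ULLONG_MAX / factor )
      throw std::out_of_range( "number too large" );
    value *= factor;
    }
  if( *tail == 'B' ) ++tail;			// optional "byte" suffix
  if( *tail != 0 ) throw std::invalid_argument( "trailing characters" );
  return value;
  }
```

With this sketch, parse_size( "8MiB" ) and parse_size( "0x800000" ) yield the same value, while parse_size( "8MB" ) yields the decimal 8000000.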
+ + +@node Program design +@chapter Internal structure of plzip +@cindex program design + +When compressing, plzip divides the input file into chunks and compresses as +many chunks simultaneously as worker threads are chosen, creating a +multimember compressed file. Each chunk is compressed in-place (using the +same buffer for input and output), reducing the amount of RAM required. + +When decompressing, plzip decompresses as many members simultaneously as +worker threads are chosen. Files that were compressed with lzip are not +decompressed faster than with lzip (unless the option @option{-b} was used) +because lzip usually produces single-member files, which can't be +decompressed in parallel. + +For each input file, a splitter thread and several worker threads are +created, with the main thread acting as the muxer (multiplexer) thread. A "packet +courier" takes care of data transfers among threads and limits the +maximum number of data blocks (packets) being processed simultaneously. + +The splitter reads data blocks from the input file, and distributes them +to the workers. The workers (de)compress the blocks received from the +splitter. The muxer collects processed packets from the workers, and +writes them to the output file. + +@verbatim + .------------. + ,-->| worker 0 |--, + | `------------' | +.-------. .----------. | .------------. | .-------. .--------. +| input |-->| splitter |-+-->| worker 1 |--+-->| muxer |-->| output | +| file | `----------' | `------------' | `-------' | file | +`-------' | ... | `--------' + | .------------. | + `-->| worker N-1 |--' + `------------' +@end verbatim + +When decompressing from a regular file, the splitter is removed and the +workers read directly from the input file. If the output file is also a +regular file, the muxer is also removed and the workers write directly +to the output file. 
With these optimizations, the use of RAM is greatly +reduced and the decompression speed of large files with many members is +only limited by the number of processors available and by I/O speed. + + +@node Memory requirements +@chapter Memory required to compress and decompress +@cindex memory requirements + +The amount of memory required @strong{per worker thread} for decompression +or testing is approximately the following: + +@itemize @bullet +@item +For decompression of a regular (seekable) file to another regular file, +or for testing of a regular file; the dictionary size. + +@item +For testing of a non-seekable file or of standard input; the dictionary +size plus @w{1 MiB} plus up to the number of @w{1 MiB} input packets +buffered (4 by default). + +@item +For decompression of a regular file to a non-seekable file or to +standard output; the dictionary size plus up to the number of @w{1 MiB} +output packets buffered (64 by default). + +@item +For decompression of a non-seekable file or of standard input; the +dictionary size plus @w{1 MiB} plus up to the number of @w{1 MiB} input +and output packets buffered (68 by default). +@end itemize + +@noindent +The amount of memory required @strong{per worker thread} for compression +is approximately the following: + +@itemize @bullet +@item +For compression at level -0; @w{1.5 MiB} plus 3.375 times the data size +(@pxref{--data-size}). Default is @w{4.875 MiB}. + +@item +For compression at other levels; 11 times the dictionary size plus 3.375 +times the data size. Default is @w{142 MiB}. 
+@end itemize + +@noindent +The following table shows the memory required @strong{per thread} for +compression at a given level, using the default data size for each level: + +@multitable {Level} {Memory required} +@item Level @tab Memory required +@item -0 @tab 4.875 MiB +@item -1 @tab 17.75 MiB +@item -2 @tab 26.625 MiB +@item -3 @tab 35.5 MiB +@item -4 @tab 53.25 MiB +@item -5 @tab 71 MiB +@item -6 @tab 142 MiB +@item -7 @tab 284 MiB +@item -8 @tab 426 MiB +@item -9 @tab 568 MiB +@end multitable + + +@node Minimum file sizes +@chapter Minimum file sizes required for full compression speed +@cindex minimum file sizes + +When compressing, plzip divides the input file into chunks and +compresses as many chunks simultaneously as worker threads are chosen, +creating a multimember compressed file. + +For this to work as expected (and roughly multiply the compression speed by +the number of available processors), the uncompressed file must be at least +as large as the number of worker threads times the chunk size +(@pxref{--data-size}). Else some processors do not get any data to compress, +and compression is proportionally slower. The maximum speed increase +achievable on a given file is limited by the ratio +@w{(file_size / data_size)}. For example, a tarball the size of gcc or linux +scales up to 10 or 14 processors at level -9. 
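The scaling rule above, together with the per-thread memory formulas from the previous chapter, can be sketched in C++ as follows. This is an illustration under stated assumptions, not code from plzip: all function names are hypothetical, data_size is assumed to be nonzero, the memory estimate takes sizes in MiB, and the results are only as approximate as the formulas in this manual.

```cpp
#include <algorithm>

// Hypothetical helpers illustrating the formulas in this manual; they are
// not part of plzip.

// Worker threads that can get data when compressing:
// ceil( file_size / data_size ), capped by the requested maximum.
// Assumes data_size > 0.
unsigned long long usable_threads( const unsigned long long file_size,
                                   const unsigned long long data_size,
                                   const unsigned long long max_threads )
  {
  const unsigned long long chunks =
    file_size / data_size + ( file_size % data_size != 0 );
  return std::max( 1ULL, std::min( chunks, max_threads ) );
  }

// Smallest uncompressed file size that gives every one of 'threads'
// workers a full chunk of data.
unsigned long long min_file_size( const unsigned long long threads,
                                  const unsigned long long data_size )
  { return threads * data_size; }

// Approximate memory per worker thread when compressing, in MiB:
// level -0:     1.5 MiB + 3.375 * data size
// other levels: 11 * dictionary size + 3.375 * data size
double compression_memory_mib( const bool level0, const double dict_mib,
                               const double data_mib )
  { return ( level0 ? 1.5 : 11 * dict_mib ) + 3.375 * data_mib; }
```

For example, at level -9 (dictionary size 32 MiB, default data size 64 MiB) a 100 MiB file can feed only 2 worker threads, and compression_memory_mib( false, 32, 64 ) gives the 568 MiB listed for -9 in the memory table.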
+ +The following table shows the minimum uncompressed file size needed for +full use of N processors at a given compression level, using the default +data size for each level: + +@multitable {Processors} {512 MiB} {512 MiB} {512 MiB} {512 MiB} {512 MiB} {512 MiB} +@headitem Processors @tab 2 @tab 4 @tab 8 @tab 16 @tab 64 @tab 256 +@item Level +@item -0 @tab 2 MiB @tab 4 MiB @tab 8 MiB @tab 16 MiB @tab 64 MiB @tab 256 MiB +@item -1 @tab 4 MiB @tab 8 MiB @tab 16 MiB @tab 32 MiB @tab 128 MiB @tab 512 MiB +@item -2 @tab 6 MiB @tab 12 MiB @tab 24 MiB @tab 48 MiB @tab 192 MiB @tab 768 MiB +@item -3 @tab 8 MiB @tab 16 MiB @tab 32 MiB @tab 64 MiB @tab 256 MiB @tab 1 GiB +@item -4 @tab 12 MiB @tab 24 MiB @tab 48 MiB @tab 96 MiB @tab 384 MiB @tab 1.5 GiB +@item -5 @tab 16 MiB @tab 32 MiB @tab 64 MiB @tab 128 MiB @tab 512 MiB @tab 2 GiB +@item -6 @tab 32 MiB @tab 64 MiB @tab 128 MiB @tab 256 MiB @tab 1 GiB @tab 4 GiB +@item -7 @tab 64 MiB @tab 128 MiB @tab 256 MiB @tab 512 MiB @tab 2 GiB @tab 8 GiB +@item -8 @tab 96 MiB @tab 192 MiB @tab 384 MiB @tab 768 MiB @tab 3 GiB @tab 12 GiB +@item -9 @tab 128 MiB @tab 256 MiB @tab 512 MiB @tab 1 GiB @tab 4 GiB @tab 16 GiB +@end multitable + + +@node File format +@chapter File format +@cindex file format + +Perfection is reached, not when there is no longer anything to add, but +when there is no longer anything to take away.@* +--- Antoine de Saint-Exupery + +@sp 1 +In the diagram below, a box like this: + +@verbatim ++---+ +| | <-- the vertical bars might be missing ++---+ +@end verbatim + +represents one byte; a box like this: + +@verbatim ++==============+ +| | ++==============+ +@end verbatim + +represents a variable number of bytes. + +@sp 1 +A lzip file consists of one or more independent "members" (compressed data +sets). The members simply appear one after another in the file, with no +additional information before, between, or after them. Each member can +encode in compressed form up to @w{16 EiB - 1 byte} of uncompressed data. 
+The size of a multimember file is unlimited. + +Each member has the following structure: + +@verbatim ++--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| ID string | VN | DS | LZMA stream | CRC32 | Data size | Member size | ++--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +@end verbatim + +All multibyte values are stored in little endian order. + +@table @samp +@item ID string (the "magic" bytes) +A four byte string, identifying the lzip format, with the value "LZIP" +(0x4C, 0x5A, 0x49, 0x50). + +@item VN (version number, 1 byte) +Just in case something needs to be modified in the future. 1 for now. + +@anchor{coded-dict-size} +@item DS (coded dictionary size, 1 byte) +The dictionary size is calculated by taking a power of 2 (the base size) +and subtracting from it a fraction between 0/16 and 7/16 of the base size.@* +Bits 4-0 contain the base 2 logarithm of the base size (12 to 29).@* +Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract +from the base size to obtain the dictionary size.@* +Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB@* +Valid values for dictionary size range from 4 KiB to 512 MiB. + +@item LZMA stream +The LZMA stream, finished by an "End Of Stream" marker. Uses default values +for encoder properties. +@ifnothtml +@xref{Stream format,,,lzip}, +@end ifnothtml +@ifhtml +See +@uref{http://www.nongnu.org/lzip/manual/lzip_manual.html#Stream-format,,Stream format} +@end ifhtml +for a complete description. + +@item CRC32 (4 bytes) +Cyclic Redundancy Check (CRC) of the original uncompressed data. + +@item Data size (8 bytes) +Size of the original uncompressed data. + +@item Member size (8 bytes) +Total size of the member, including header and trailer. This field acts +as a distributed index, improves the checking of stream integrity, and +facilitates the safe recovery of undamaged members from multimember files. 
+Lzip limits the member size to @w{2 PiB} to prevent the data size field from +overflowing. + +@end table + + +@node Trailing data +@chapter Extra data appended to the file +@cindex trailing data + +Sometimes extra data are found appended to a lzip file after the last +member. Such trailing data may be: + +@itemize @bullet +@item +Padding added to make the file size a multiple of some block size, for +example when writing to a tape. It is safe to append any amount of +padding zero bytes to a lzip file. + +@item +Useful data added by the user; an "End Of File" string (to check that the +file has not been truncated), a cryptographically secure hash, a description +of file contents, etc. It is safe to append any amount of text to a lzip +file as long as none of the first four bytes of the text matches the +corresponding byte in the string "LZIP", and the text does not contain any +zero bytes (null characters). Nonzero bytes and zero bytes can't be safely +mixed in trailing data. + +@item +Garbage added by some not totally successful copy operation. + +@item +Malicious data added to the file in order to make its total size and +hash value (for a chosen hash) coincide with those of another file. + +@item +In rare cases, trailing data could be the corrupt header of another +member. In multimember or concatenated files the probability of +corruption happening in the magic bytes is 5 times smaller than the +probability of getting a false positive caused by the corruption of the +integrity information itself. Therefore it can be considered to be below +the noise level. Additionally, the test used by plzip to discriminate +trailing data from a corrupt header has a Hamming distance (HD) of 3, +and the 3 bit flips must happen in different magic bytes for the test to +fail. In any case, the option @option{--trailing-error} guarantees that +any corrupt header is detected. 
+@end itemize + +Trailing data are in no way part of the lzip file format, but tools +reading lzip files are expected to behave as correctly and usefully as +possible in the presence of trailing data. + +Trailing data can be safely ignored in most cases. In some cases, like +that of user-added data, they are expected to be ignored. In those cases +where a file containing trailing data must be rejected, the option +@option{--trailing-error} can be used. @xref{--trailing-error}. + + +@node Examples +@chapter A small tutorial with examples +@cindex examples + +WARNING! Even if plzip is bug-free, other causes may result in a corrupt +compressed file (bugs in the system libraries, memory errors, etc). +Therefore, if the data you are going to compress are important, give the +option @option{--keep} to plzip and don't remove the original file until you +check the compressed file with a command like +@w{@samp{plzip -cd file.lz | cmp file -}}. Most RAM errors happening during +compression can only be detected by comparing the compressed file with the +original because the corruption happens before plzip compresses the RAM +contents, resulting in a valid compressed file containing wrong data. + +@sp 1 +@noindent +Example 1: Extract all the files from archive @samp{foo.tar.lz}. + +@example + tar -xf foo.tar.lz +or + plzip -cd foo.tar.lz | tar -xf - +@end example + +@sp 1 +@noindent +Example 2: Replace a regular file with its compressed version @samp{file.lz} +and show the compression ratio. + +@example +plzip -v file +@end example + +@sp 1 +@noindent +Example 3: Like example 2 but the created @samp{file.lz} has a block size of +@w{1 MiB}. The compression ratio is not shown. + +@example +plzip -B 1MiB file +@end example + +@sp 1 +@noindent +Example 4: Restore a regular file from its compressed version +@samp{file.lz}. If the operation is successful, @samp{file.lz} is removed. 
+ +@example +plzip -d file.lz +@end example + +@sp 1 +@noindent +Example 5: Check the integrity of the compressed file @samp{file.lz} and +show status. + +@example +plzip -tv file.lz +@end example + +@sp 1 +@anchor{concat-example} +@noindent +Example 6: The right way of concatenating the decompressed output of two or +more compressed files. @xref{Trailing data}. + +@example +Don't do this + cat file1.lz file2.lz file3.lz | plzip -d - +Do this instead + plzip -cd file1.lz file2.lz file3.lz +@end example + +@sp 1 +@noindent +Example 7: Decompress @samp{file.lz} partially until @w{10 KiB} of +decompressed data are produced. + +@example +plzip -cd file.lz | dd bs=1024 count=10 +@end example + +@sp 1 +@noindent +Example 8: Decompress @samp{file.lz} partially from decompressed byte at +offset 10000 to decompressed byte at offset 14999 (5000 bytes are produced). + +@example +plzip -cd file.lz | dd bs=1000 skip=10 count=5 +@end example + +@sp 1 +@noindent +Example 9: Compress a whole device in /dev/sdc and send the output to +@samp{file.lz}. + +@example + plzip -c /dev/sdc > file.lz +or + plzip /dev/sdc -o file.lz +@end example + + +@node Problems +@chapter Reporting bugs +@cindex bugs +@cindex getting help + +There are probably bugs in plzip. There are certainly errors and +omissions in this manual. If you report them, they will get fixed. If +you don't, no one will ever know about them and they will remain unfixed +for all eternity, if not longer. + +If you find a bug in plzip, please send electronic mail to +@email{lzip-bug@@nongnu.org}. Include the version number, which you can +find by running @w{@samp{plzip --version}} and +@w{@samp{plzip -v --check-lib}}. + + +@node Concept index +@unnumbered Concept index + +@printindex cp + +@bye @@ -0,0 +1,114 @@ +/* Plzip - Massively parallel implementation of lzip + Copyright (C) 2009-2024 Antonio Diaz Diaz. 
+ + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 2 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see <http://www.gnu.org/licenses/>. +*/ + +#define _FILE_OFFSET_BITS 64 + +#include <cstdio> +#include <cstring> +#include <string> +#include <vector> +#include <stdint.h> +#include <unistd.h> +#include <sys/stat.h> + +#include "lzip.h" +#include "lzip_index.h" + + +namespace { + +void list_line( const unsigned long long uncomp_size, + const unsigned long long comp_size, + const char * const input_filename ) + { + if( uncomp_size > 0 ) + std::printf( "%14llu %14llu %6.2f%% %s\n", uncomp_size, comp_size, + 100.0 - ( ( 100.0 * comp_size ) / uncomp_size ), + input_filename ); + else + std::printf( "%14llu %14llu -INF%% %s\n", uncomp_size, comp_size, + input_filename ); + } + +} // end namespace + + +int list_files( const std::vector< std::string > & filenames, + const Cl_options & cl_opts ) + { + unsigned long long total_comp = 0, total_uncomp = 0; + int files = 0, retval = 0; + bool first_post = true; + bool stdin_used = false; + + for( unsigned i = 0; i < filenames.size(); ++i ) + { + const bool from_stdin = ( filenames[i] == "-" ); + if( from_stdin ) { if( stdin_used ) continue; else stdin_used = true; } + const char * const input_filename = + from_stdin ? "(stdin)" : filenames[i].c_str(); + struct stat in_stats; // not used + const int infd = from_stdin ? 
STDIN_FILENO : + open_instream( input_filename, &in_stats, false, true ); + if( infd < 0 ) { set_retval( retval, 1 ); continue; } + + const Lzip_index lzip_index( infd, cl_opts ); + close( infd ); + if( lzip_index.retval() != 0 ) + { + show_file_error( input_filename, lzip_index.error().c_str() ); + set_retval( retval, lzip_index.retval() ); + continue; + } + if( verbosity < 0 ) continue; + const unsigned long long udata_size = lzip_index.udata_size(); + const unsigned long long cdata_size = lzip_index.cdata_size(); + total_comp += cdata_size; total_uncomp += udata_size; ++files; + const long members = lzip_index.members(); + if( first_post ) + { + first_post = false; + if( verbosity >= 1 ) std::fputs( " dict memb trail ", stdout ); + std::fputs( " uncompressed compressed saved name\n", stdout ); + } + if( verbosity >= 1 ) + std::printf( "%s %5ld %6lld ", format_ds( lzip_index.dictionary_size() ), + members, lzip_index.file_size() - cdata_size ); + list_line( udata_size, cdata_size, input_filename ); + + if( verbosity >= 2 && members > 1 ) + { + std::fputs( " member data_pos data_size member_pos member_size\n", stdout ); + for( long i = 0; i < members; ++i ) + { + const Block & db = lzip_index.dblock( i ); + const Block & mb = lzip_index.mblock( i ); + std::printf( "%6ld %14llu %14llu %14llu %14llu\n", + i + 1, db.pos(), db.size(), mb.pos(), mb.size() ); + } + first_post = true; // reprint heading after list of members + } + std::fflush( stdout ); + } + if( verbosity >= 0 && files > 1 ) + { + if( verbosity >= 1 ) std::fputs( " ", stdout ); + list_line( total_uncomp, total_comp, "(totals)" ); + std::fflush( stdout ); + } + return retval; + } @@ -0,0 +1,340 @@ +/* Plzip - Massively parallel implementation of lzip + Copyright (C) 2009-2024 Antonio Diaz Diaz. 
+
+    This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation, either version 2 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include <pthread.h>
+
+enum {
+  min_dictionary_bits = 12,
+  min_dictionary_size = 1 << min_dictionary_bits,  // >= modeled_distances
+  max_dictionary_bits = 29,
+  max_dictionary_size = 1 << max_dictionary_bits,
+  min_member_size = 36 };
+
+
+// defined in main.cc
+extern int verbosity;
+
+class Pretty_print  // requires global var 'int verbosity'
+  {
+  std::string name_;
+  std::string padded_name;
+  const char * const stdin_name;
+  unsigned longest_name;
+  mutable bool first_post;
+
+public:
+  Pretty_print( const std::vector< std::string > & filenames )
+    : stdin_name( "(stdin)" ), longest_name( 0 ), first_post( false )
+    {
+    if( verbosity <= 0 ) return;
+    const unsigned stdin_name_len = std::strlen( stdin_name );
+    for( unsigned i = 0; i < filenames.size(); ++i )
+      {
+      const std::string & s = filenames[i];
+      const unsigned len = ( s == "-" ) ? stdin_name_len : s.size();
+      if( longest_name < len ) longest_name = len;
+      }
+    if( longest_name == 0 ) longest_name = stdin_name_len;
+    }
+
+  void set_name( const std::string & filename )
+    {
+    if( filename.size() && filename != "-" ) name_ = filename;
+    else name_ = stdin_name;
+    padded_name = " "; padded_name += name_; padded_name += ": ";
+    if( longest_name > name_.size() )
+      padded_name.append( longest_name - name_.size(), ' ' );
+    first_post = true;
+    }
+
+  void reset() const { if( name_.size() ) first_post = true; }
+  const char * name() const { return name_.c_str(); }
+  void operator()( const char * const msg = 0 ) const;
+  };
+
+
+inline bool isvalid_ds( const unsigned dictionary_size )
+  { return dictionary_size >= min_dictionary_size &&
+           dictionary_size <= max_dictionary_size; }
+
+
+inline int real_bits( unsigned value )
+  {
+  int bits = 0;
+  while( value > 0 ) { value >>= 1; ++bits; }
+  return bits;
+  }
+
+
+const uint8_t lzip_magic[4] = { 0x4C, 0x5A, 0x49, 0x50 };  // "LZIP"
+
+struct Lzip_header
+  {
+  enum { size = 6 };
+  uint8_t data[size];  // 0-3 magic bytes
+                       // 4 version
+                       // 5 coded dictionary size
+
+  void set_magic() { std::memcpy( data, lzip_magic, 4 ); data[4] = 1; }
+  bool check_magic() const { return std::memcmp( data, lzip_magic, 4 ) == 0; }
+
+  bool check_prefix( const int sz ) const  // detect (truncated) header
+    {
+    for( int i = 0; i < sz && i < 4; ++i )
+      if( data[i] != lzip_magic[i] ) return false;
+    return sz > 0;
+    }
+
+  bool check_corrupt() const  // detect corrupt header
+    {
+    int matches = 0;
+    for( int i = 0; i < 4; ++i )
+      if( data[i] == lzip_magic[i] ) ++matches;
+    return matches > 1 && matches < 4;
+    }
+
+  uint8_t version() const { return data[4]; }
+  bool check_version() const { return data[4] == 1; }
+
+  unsigned dictionary_size() const
+    {
+    unsigned sz = 1 << ( data[5] & 0x1F );
+    if( sz > min_dictionary_size )
+      sz -= ( sz / 16 ) * ( ( data[5] >> 5 ) & 7 );
+    return sz;
+    }
+
+  bool dictionary_size( const unsigned sz )
+    {
+    if( !isvalid_ds( sz ) ) return false;
+    data[5] = real_bits( sz - 1 );
+    if( sz > min_dictionary_size )
+      {
+      const unsigned base_size = 1 << data[5];
+      const unsigned fraction = base_size / 16;
+      for( unsigned i = 7; i >= 1; --i )
+        if( base_size - ( i * fraction ) >= sz )
+          { data[5] |= i << 5; break; }
+      }
+    return true;
+    }
+
+  bool check() const
+    { return check_magic() && check_version() &&
+             isvalid_ds( dictionary_size() ); }
+  };
+
+
+struct Lzip_trailer
+  {
+  enum { size = 20 };
+  uint8_t data[size];  // 0-3 CRC32 of the uncompressed data
+                       // 4-11 size of the uncompressed data
+                       // 12-19 member size including header and trailer
+
+  unsigned data_crc() const
+    {
+    unsigned tmp = 0;
+    for( int i = 3; i >= 0; --i ) { tmp <<= 8; tmp += data[i]; }
+    return tmp;
+    }
+
+  void data_crc( unsigned crc )
+    { for( int i = 0; i <= 3; ++i ) { data[i] = (uint8_t)crc; crc >>= 8; } }
+
+  unsigned long long data_size() const
+    {
+    unsigned long long tmp = 0;
+    for( int i = 11; i >= 4; --i ) { tmp <<= 8; tmp += data[i]; }
+    return tmp;
+    }
+
+  void data_size( unsigned long long sz )
+    { for( int i = 4; i <= 11; ++i ) { data[i] = (uint8_t)sz; sz >>= 8; } }
+
+  unsigned long long member_size() const
+    {
+    unsigned long long tmp = 0;
+    for( int i = 19; i >= 12; --i ) { tmp <<= 8; tmp += data[i]; }
+    return tmp;
+    }
+
+  void member_size( unsigned long long sz )
+    { for( int i = 12; i <= 19; ++i ) { data[i] = (uint8_t)sz; sz >>= 8; } }
+
+  bool check_consistency() const  // check internal consistency
+    {
+    const unsigned crc = data_crc();
+    const unsigned long long dsize = data_size();
+    if( ( crc == 0 ) != ( dsize == 0 ) ) return false;
+    const unsigned long long msize = member_size();
+    if( msize < min_member_size ) return false;
+    const unsigned long long mlimit = ( 9 * dsize + 7 ) / 8 + min_member_size;
+    if( mlimit > dsize && msize > mlimit ) return false;
+    const unsigned long long dlimit = 7090 * ( msize - 26 ) - 1;
+    if( dlimit > msize && dsize > dlimit ) return false;
+    return true;
+    }
+  };
+
+
+struct Cl_options  // command-line options
+  {
+  bool ignore_trailing;
+  bool loose_trailing;
+
+  Cl_options() : ignore_trailing( true ), loose_trailing( false ) {}
+  };
+
+
+inline void set_retval( int & retval, const int new_val )
+  { if( retval < new_val ) retval = new_val; }
+
+const char * const bad_magic_msg = "Bad magic number (file not in lzip format).";
+const char * const bad_dict_msg = "Invalid dictionary size in member header.";
+const char * const corrupt_mm_msg = "Corrupt header in multimember file.";
+const char * const trailing_msg = "Trailing data not allowed.";
+const char * const mem_msg = "Not enough memory.";
+
+// defined in compress.cc
+int readblock( const int fd, uint8_t * const buf, const int size );
+int writeblock( const int fd, const uint8_t * const buf, const int size );
+void xinit_mutex( pthread_mutex_t * const mutex );
+void xinit_cond( pthread_cond_t * const cond );
+void xdestroy_mutex( pthread_mutex_t * const mutex );
+void xdestroy_cond( pthread_cond_t * const cond );
+void xlock( pthread_mutex_t * const mutex );
+void xunlock( pthread_mutex_t * const mutex );
+void xwait( pthread_cond_t * const cond, pthread_mutex_t * const mutex );
+void xsignal( pthread_cond_t * const cond );
+void xbroadcast( pthread_cond_t * const cond );
+int compress( const unsigned long long cfile_size,
+              const int data_size, const int dictionary_size,
+              const int match_len_limit, const int num_workers,
+              const int infd, const int outfd,
+              const Pretty_print & pp, const int debug_level );
+
+// defined in lzip_index.cc
+class Lzip_index;  // forward declaration
+
+// defined in dec_stdout.cc
+int dec_stdout( const int num_workers, const int infd, const int outfd,
+                const Pretty_print & pp, const int debug_level,
+                const int out_slots, const Lzip_index & lzip_index );
+
+// defined in dec_stream.cc
+int dec_stream( const unsigned long long cfile_size, const int num_workers,
+                const int infd, const int outfd, const Cl_options & cl_opts,
+                const Pretty_print & pp, const int debug_level,
+                const int in_slots, const int out_slots );
+
+// defined in decompress.cc
+int preadblock( const int fd, uint8_t * const buf, const int size,
+                const long long pos );
+class Shared_retval;
+void decompress_error( struct LZ_Decoder * const decoder,
+                       const Pretty_print & pp,
+                       Shared_retval & shared_retval, const int worker_id );
+void show_results( const unsigned long long in_size,
+                   const unsigned long long out_size,
+                   const unsigned dictionary_size, const bool testing );
+int decompress( const unsigned long long cfile_size, int num_workers,
+                const int infd, const int outfd, const Cl_options & cl_opts,
+                const Pretty_print & pp, const int debug_level,
+                const int in_slots, const int out_slots,
+                const bool infd_isreg, const bool one_to_one );
+
+// defined in list.cc
+int list_files( const std::vector< std::string > & filenames,
+                const Cl_options & cl_opts );
+
+// defined in main.cc
+struct stat;
+const char * bad_version( const unsigned version );
+const char * format_ds( const unsigned dictionary_size );
+void show_header( const unsigned dictionary_size );
+int open_instream( const char * const name, struct stat * const in_statsp,
+                   const bool one_to_one, const bool reg_only = false );
+void cleanup_and_fail( const int retval = 1 );  // terminate the program
+void show_error( const char * const msg, const int errcode = 0,
+                 const bool help = false );
+void show_file_error( const char * const filename, const char * const msg,
+                      const int errcode = 0 );
+void internal_error( const char * const msg );
+void show_progress( const unsigned long long packet_size,
+                    const unsigned long long cfile_size = 0,
+                    const Pretty_print * const p = 0 );
+
+
+class Slot_tally
+  {
+  const int num_slots;  // total slots
+  int num_free;  // remaining free slots
+  pthread_mutex_t mutex;
+  pthread_cond_t slot_av;  // slot available
+
+  Slot_tally( const Slot_tally & );  // declared as private
+  void operator=( const Slot_tally & );  // declared as private
+
+public:
+  explicit Slot_tally( const int slots )
+    : num_slots( slots ), num_free( slots )
+    { xinit_mutex( &mutex ); xinit_cond( &slot_av ); }
+
+  ~Slot_tally() { xdestroy_cond( &slot_av ); xdestroy_mutex( &mutex ); }
+
+  bool all_free() { return num_free == num_slots; }
+
+  void get_slot()  // wait for a free slot
+    {
+    xlock( &mutex );
+    while( num_free <= 0 ) xwait( &slot_av, &mutex );
+    --num_free;
+    xunlock( &mutex );
+    }
+
+  void leave_slot()  // return a slot to the tally
+    {
+    xlock( &mutex );
+    if( ++num_free == 1 ) xsignal( &slot_av );  // num_free was 0
+    xunlock( &mutex );
+    }
+  };
+
+
+class Shared_retval  // shared return value protected by a mutex
+  {
+  int retval;
+  pthread_mutex_t mutex;
+
+  Shared_retval( const Shared_retval & );  // declared as private
+  void operator=( const Shared_retval & );  // declared as private
+
+public:
+  Shared_retval() : retval( 0 ) { xinit_mutex( &mutex ); }
+
+  bool set_value( const int val )  // only one thread can set retval > 0
+    {  // (and print an error message)
+    xlock( &mutex );
+    const bool done = ( retval == 0 && val > 0 );
+    if( done ) retval = val;
+    xunlock( &mutex );
+    return done;
+    }
+
+  int operator()() const { return retval; }
+  };
diff --git a/lzip_index.cc b/lzip_index.cc
new file mode 100644
index 0000000..8d1aa0c
--- /dev/null
+++ b/lzip_index.cc
@@ -0,0 +1,209 @@
+/* Plzip - Massively parallel implementation of lzip
+    Copyright (C) 2009-2024 Antonio Diaz Diaz.
+
+    This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation, either version 2 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#define _FILE_OFFSET_BITS 64
+
+#include <algorithm>
+#include <cerrno>
+#include <cstdio>
+#include <cstring>
+#include <string>
+#include <vector>
+#include <stdint.h>
+#include <unistd.h>
+
+#include "lzip.h"
+#include "lzip_index.h"
+
+
+namespace {
+
+int seek_read( const int fd, uint8_t * const buf, const int size,
+               const long long pos )
+  {
+  if( lseek( fd, pos, SEEK_SET ) == pos )
+    return readblock( fd, buf, size );
+  return 0;
+  }
+
+} // end namespace
+
+
+bool Lzip_index::check_header( const Lzip_header & header, const bool first )
+  {
+  if( !header.check_magic() )
+    { error_ = bad_magic_msg; retval_ = 2; if( first ) bad_magic_ = true;
+      return false; }
+  if( !header.check_version() )
+    { error_ = bad_version( header.version() ); retval_ = 2; return false; }
+  if( !isvalid_ds( header.dictionary_size() ) )
+    { error_ = bad_dict_msg; retval_ = 2; return false; }
+  return true;
+  }
+
+void Lzip_index::set_errno_error( const char * const msg )
+  {
+  error_ = msg; error_ += std::strerror( errno );
+  retval_ = 1;
+  }
+
+void Lzip_index::set_num_error( const char * const msg, unsigned long long num )
+  {
+  char buf[80];
+  snprintf( buf, sizeof buf, "%s%llu", msg, num );
+  error_ = buf;
+  retval_ = 2;
+  }
+
+
+bool Lzip_index::read_header( const int fd, Lzip_header & header,
+                              const long long pos )
+  {
+  if( seek_read( fd, header.data, header.size, pos ) != header.size )
+    { set_errno_error( "Error reading member header: " ); return false; }
+  return true;
+  }
+
+
+// If successful, push last member and set pos to member header.
+bool Lzip_index::skip_trailing_data( const int fd, unsigned long long & pos,
+                                     const Cl_options & cl_opts )
+  {
+  if( pos < min_member_size ) return false;
+  enum { block_size = 16384,
+         buffer_size = block_size + Lzip_trailer::size - 1 + Lzip_header::size };
+  uint8_t buffer[buffer_size];
+  int bsize = pos % block_size;  // total bytes in buffer
+  if( bsize <= buffer_size - block_size ) bsize += block_size;
+  int search_size = bsize;  // bytes to search for trailer
+  int rd_size = bsize;  // bytes to read from file
+  unsigned long long ipos = pos - rd_size;  // aligned to block_size
+
+  while( true )
+    {
+    if( seek_read( fd, buffer, rd_size, ipos ) != rd_size )
+      { set_errno_error( "Error seeking member trailer: " ); return false; }
+    const uint8_t max_msb = ( ipos + search_size ) >> 56;
+    for( int i = search_size; i >= Lzip_trailer::size; --i )
+      if( buffer[i-1] <= max_msb )  // most significant byte of member_size
+        {
+        const Lzip_trailer & trailer =
+          *(const Lzip_trailer *)( buffer + i - trailer.size );
+        const unsigned long long member_size = trailer.member_size();
+        if( member_size == 0 )  // skip trailing zeros
+          { while( i > trailer.size && buffer[i-9] == 0 ) --i; continue; }
+        if( member_size > ipos + i || !trailer.check_consistency() ) continue;
+        Lzip_header header;
+        if( !read_header( fd, header, ipos + i - member_size ) ) return false;
+        if( !header.check() ) continue;
+        const Lzip_header & header2 = *(const Lzip_header *)( buffer + i );
+        const bool full_h2 = bsize - i >= header.size;
+        if( header2.check_prefix( bsize - i ) )  // last member
+          {
+          if( !full_h2 ) error_ = "Last member in input file is truncated.";
+          else if( check_header( header2, false ) )
+            error_ = "Last member in input file is truncated or corrupt.";
+          retval_ = 2; return false;
+          }
+        if( !cl_opts.loose_trailing && full_h2 && header2.check_corrupt() )
+          { error_ = corrupt_mm_msg; retval_ = 2; return false; }
+        if( !cl_opts.ignore_trailing )
+          { error_ = trailing_msg; retval_ = 2; return false; }
+        pos = ipos + i - member_size;  // good member
+        const unsigned dictionary_size = header.dictionary_size();
+        if( dictionary_size_ < dictionary_size )
+          dictionary_size_ = dictionary_size;
+        member_vector.push_back( Member( 0, trailer.data_size(), pos,
+                                         member_size, dictionary_size ) );
+        return true;
+        }
+    if( ipos == 0 )
+      { set_num_error( "Bad trailer at pos ", pos - Lzip_trailer::size );
+        return false; }
+    bsize = buffer_size;
+    search_size = bsize - Lzip_header::size;
+    rd_size = block_size;
+    ipos -= rd_size;
+    std::memcpy( buffer + rd_size, buffer, buffer_size - rd_size );
+    }
+  }
+
+
+Lzip_index::Lzip_index( const int infd, const Cl_options & cl_opts )
+  : insize( lseek( infd, 0, SEEK_END ) ), retval_( 0 ), dictionary_size_( 0 ),
+    bad_magic_( false )
+  {
+  if( insize < 0 )
+    { set_errno_error( "Input file is not seekable: " ); return; }
+  if( insize < min_member_size )
+    { error_ = "Input file is too short."; retval_ = 2; return; }
+  if( insize > INT64_MAX )
+    { error_ = "Input file is too long (2^63 bytes or more).";
+      retval_ = 2; return; }
+
+  Lzip_header header;
+  if( !read_header( infd, header, 0 ) ||
+      !check_header( header, true ) ) return;
+
+  unsigned long long pos = insize;  // always points to a header or to EOF
+  while( pos >= min_member_size )
+    {
+    Lzip_trailer trailer;
+    if( seek_read( infd, trailer.data, trailer.size, pos - trailer.size ) !=
+        trailer.size )
+      { set_errno_error( "Error reading member trailer: " ); break; }
+    const unsigned long long member_size = trailer.member_size();
+    if( member_size > pos || !trailer.check_consistency() )  // bad trailer
+      {
+      if( member_vector.empty() )
+        { if( skip_trailing_data( infd, pos, cl_opts ) ) continue; return; }
+      set_num_error( "Bad trailer at pos ", pos - trailer.size ); break;
+      }
+    if( !read_header( infd, header, pos - member_size ) ) break;
+    if( !header.check() )  // bad header
+      {
+      if( member_vector.empty() )
+        { if( skip_trailing_data( infd, pos, cl_opts ) ) continue; return; }
+      set_num_error( "Bad header at pos ", pos - member_size ); break;
+      }
+    pos -= member_size;  // good member
+    const unsigned dictionary_size = header.dictionary_size();
+    if( dictionary_size_ < dictionary_size )
+      dictionary_size_ = dictionary_size;
+    member_vector.push_back( Member( 0, trailer.data_size(), pos,
+                                     member_size, dictionary_size ) );
+    }
+  if( pos != 0 || member_vector.empty() || retval_ != 0 )
+    {
+    member_vector.clear();
+    if( retval_ == 0 ) { error_ = "Can't create file index."; retval_ = 2; }
+    return;
+    }
+  std::reverse( member_vector.begin(), member_vector.end() );
+  for( unsigned long i = 0; ; ++i )
+    {
+    const long long end = member_vector[i].dblock.end();
+    if( end < 0 || end > INT64_MAX )
+      {
+      member_vector.clear();
+      error_ = "Data in input file is too long (2^63 bytes or more).";
+      retval_ = 2; return;
+      }
+    if( i + 1 >= member_vector.size() ) break;
+    member_vector[i+1].dblock.pos( end );
+    }
+  }
diff --git a/lzip_index.h b/lzip_index.h
new file mode 100644
index 0000000..a994f1c
--- /dev/null
+++ b/lzip_index.h
@@ -0,0 +1,94 @@
+/* Plzip - Massively parallel implementation of lzip
+    Copyright (C) 2009-2024 Antonio Diaz Diaz.
+
+    This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation, either version 2 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#ifndef INT64_MAX
+#define INT64_MAX 0x7FFFFFFFFFFFFFFFLL
+#endif
+
+
+class Block
+  {
+  long long pos_, size_;  // pos >= 0, size >= 0, pos + size <= INT64_MAX
+
+public:
+  Block( const long long p, const long long s ) : pos_( p ), size_( s ) {}
+
+  long long pos() const { return pos_; }
+  long long size() const { return size_; }
+  long long end() const { return pos_ + size_; }
+
+  void pos( const long long p ) { pos_ = p; }
+  void size( const long long s ) { size_ = s; }
+  };
+
+
+class Lzip_index
+  {
+  struct Member
+    {
+    Block dblock, mblock;  // data block, member block
+    unsigned dictionary_size;
+
+    Member( const long long dpos, const long long dsize,
+            const long long mpos, const long long msize,
+            const unsigned dict_size )
+      : dblock( dpos, dsize ), mblock( mpos, msize ),
+        dictionary_size( dict_size ) {}
+    };
+
+  std::vector< Member > member_vector;
+  std::string error_;
+  const long long insize;
+  int retval_;
+  unsigned dictionary_size_;  // largest dictionary size in the file
+  bool bad_magic_;  // bad magic in first header
+
+  bool check_header( const Lzip_header & header, const bool first );
+  void set_errno_error( const char * const msg );
+  void set_num_error( const char * const msg, unsigned long long num );
+  bool read_header( const int fd, Lzip_header & header, const long long pos );
+  bool skip_trailing_data( const int fd, unsigned long long & pos,
+                           const Cl_options & cl_opts );
+
+public:
+  Lzip_index( const int infd, const Cl_options & cl_opts );
+
+  long members() const { return member_vector.size(); }
+  const std::string & error() const { return error_; }
+  int retval() const { return retval_; }
+  unsigned dictionary_size() const { return dictionary_size_; }
+  bool bad_magic() const { return bad_magic_; }
+
+  long long udata_size() const
+    { if( member_vector.empty() ) return 0;
+      return member_vector.back().dblock.end(); }
+
+  long long cdata_size() const
+    { if( member_vector.empty() ) return 0;
+      return member_vector.back().mblock.end(); }
+
+  // total size including trailing data (if any)
+  long long file_size() const
+    { if( insize >= 0 ) return insize; else return 0; }
+
+  const Block & dblock( const long i ) const
+    { return member_vector[i].dblock; }
+  const Block & mblock( const long i ) const
+    { return member_vector[i].mblock; }
+  unsigned dictionary_size( const long i ) const
+    { return member_vector[i].dictionary_size; }
+  };
@@ -0,0 +1,1016 @@
+/* Plzip - Massively parallel implementation of lzip
+    Copyright (C) 2009 Laszlo Ersek.
+    Copyright (C) 2009-2024 Antonio Diaz Diaz.
+
+    This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation, either version 2 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+/*
+    Exit status: 0 for a normal exit, 1 for environmental problems
+    (file not found, invalid command-line options, I/O errors, etc), 2 to
+    indicate a corrupt or invalid input file, 3 for an internal consistency
+    error (e.g., bug) which caused plzip to panic.
+*/
+
+#define _FILE_OFFSET_BITS 64
+
+#include <algorithm>
+#include <cerrno>
+#include <climits>  // SSIZE_MAX
+#include <csignal>
+#include <cstdio>
+#include <cstdlib>
+#include <cstring>
+#include <string>
+#include <vector>
+#include <fcntl.h>
+#include <stdint.h>  // SIZE_MAX
+#include <unistd.h>
+#include <utime.h>
+#include <sys/stat.h>
+#include <lzlib.h>
+#if defined __MSVCRT__ || defined __OS2__
+#include <io.h>
+#if defined __MSVCRT__
+#define fchmod(x,y) 0
+#define fchown(x,y,z) 0
+#define strtoull std::strtoul
+#define SIGHUP SIGTERM
+#define S_ISSOCK(x) 0
+#ifndef S_IRGRP
+#define S_IRGRP 0
+#define S_IWGRP 0
+#define S_IROTH 0
+#define S_IWOTH 0
+#endif
+#endif
+#endif
+
+#include "arg_parser.h"
+#include "lzip.h"
+
+#ifndef O_BINARY
+#define O_BINARY 0
+#endif
+
+#if CHAR_BIT != 8
+#error "Environments where CHAR_BIT != 8 are not supported."
+#endif
+
+#if ( defined SIZE_MAX && SIZE_MAX < UINT_MAX ) || \
+    ( defined SSIZE_MAX && SSIZE_MAX < INT_MAX )
+#error "Environments where 'size_t' is narrower than 'int' are not supported."
+#endif
+
+int verbosity = 0;
+
+namespace {
+
+const char * const program_name = "plzip";
+const char * const program_year = "2024";
+const char * invocation_name = program_name;  // default value
+
+const struct { const char * from; const char * to; } known_extensions[] = {
+  { ".lz", "" },
+  { ".tlz", ".tar" },
+  { 0, 0 } };
+
+struct Lzma_options
+  {
+  int dictionary_size;  // 4 KiB .. 512 MiB
+  int match_len_limit;  // 5 .. 273
+  };
+
+enum Mode { m_compress, m_decompress, m_list, m_test };
+
+/* Variables used in signal handler context.
+   They are not declared volatile because the handler never returns. */
+std::string output_filename;
+int outfd = -1;
+bool delete_output_on_interrupt = false;
+
+
+void show_help( const long num_online )
+  {
+  std::printf( "Plzip is a massively parallel (multi-threaded) implementation of lzip,\n"
+               "compatible with lzip 1.4 or newer. Plzip uses the compression library lzlib.\n"
+               "\nLzip is a lossless data compressor with a user interface similar to the one\n"
+               "of gzip or bzip2. Lzip uses a simplified form of the 'Lempel-Ziv-Markov\n"
+               "chain-Algorithm' (LZMA) stream format to maximize interoperability. The\n"
+               "maximum dictionary size is 512 MiB so that any lzip file can be decompressed\n"
+               "on 32-bit machines. Lzip provides accurate and robust 3-factor integrity\n"
+               "checking. Lzip can compress about as fast as gzip (lzip -0) or compress most\n"
+               "files more than bzip2 (lzip -9). Decompression speed is intermediate between\n"
+               "gzip and bzip2. Lzip is better than gzip and bzip2 from a data recovery\n"
+               "perspective. Lzip has been designed, written, and tested with great care to\n"
+               "replace gzip and bzip2 as the standard general-purpose compressed format for\n"
+               "Unix-like systems.\n"
+               "\nPlzip can compress/decompress large files on multiprocessor machines much\n"
+               "faster than lzip, at the cost of a slightly reduced compression ratio (0.4\n"
+               "to 2 percent larger compressed files). Note that the number of usable\n"
+               "threads is limited by file size; on files larger than a few GB plzip can use\n"
+               "hundreds of processors, but on files of only a few MB plzip is no faster\n"
+               "than lzip.\n"
+               "\nUsage: %s [options] [files]\n", invocation_name );
+  std::printf( "\nOptions:\n"
+               " -h, --help display this help and exit\n"
+               " -V, --version output version information and exit\n"
+               " -a, --trailing-error exit with error status if trailing data\n"
+               " -B, --data-size=<bytes> set size of input data blocks [2x8=16 MiB]\n"
+               " -c, --stdout write to standard output, keep input files\n"
+               " -d, --decompress decompress, test compressed file integrity\n"
+               " -f, --force overwrite existing output files\n"
+               " -F, --recompress force re-compression of compressed files\n"
+               " -k, --keep keep (don't delete) input files\n"
+               " -l, --list print (un)compressed file sizes\n"
+               " -m, --match-length=<bytes> set match length limit in bytes [36]\n"
+               " -n, --threads=<n> set number of (de)compression threads [%ld]\n"
+               " -o, --output=<file> write to <file>, keep input files\n"
+               " -q, --quiet suppress all messages\n"
+               " -s, --dictionary-size=<bytes> set dictionary size limit in bytes [8 MiB]\n"
+               " -t, --test test compressed file integrity\n"
+               " -v, --verbose be verbose (a 2nd -v gives more)\n"
+               " -0 .. -9 set compression level [default 6]\n"
+               " --fast alias for -0\n"
+               " --best alias for -9\n"
+               " --loose-trailing allow trailing data seeming corrupt header\n"
+               " --in-slots=<n> number of 1 MiB input packets buffered [4]\n"
+               " --out-slots=<n> number of 1 MiB output packets buffered [64]\n"
+               " --check-lib compare version of lzlib.h with liblz.{a,so}\n",
+               num_online );
+  if( verbosity >= 1 )
+    {
+    std::printf( " --debug=<level> print mode(2), debug statistics(1) to stderr\n" );
+    }
+  std::printf( "\nIf no file names are given, or if a file is '-', plzip compresses or\n"
+               "decompresses from standard input to standard output.\n"
+               "Numbers may be followed by a multiplier: k = kB = 10^3 = 1000,\n"
+               "Ki = KiB = 2^10 = 1024, M = 10^6, Mi = 2^20, G = 10^9, Gi = 2^30, etc...\n"
+               "Dictionary sizes 12 to 29 are interpreted as powers of two, meaning 2^12 to\n"
+               "2^29 bytes.\n"
+               "\nThe bidimensional parameter space of LZMA can't be mapped to a linear scale\n"
+               "optimal for all files. If your files are large, very repetitive, etc, you\n"
+               "may need to use the options --dictionary-size and --match-length directly\n"
+               "to achieve optimal performance.\n"
+               "\nTo extract all the files from archive 'foo.tar.lz', use the commands\n"
+               "'tar -xf foo.tar.lz' or 'plzip -cd foo.tar.lz | tar -xf -'.\n"
+               "\nExit status: 0 for a normal exit, 1 for environmental problems\n"
+               "(file not found, invalid command-line options, I/O errors, etc), 2 to\n"
+               "indicate a corrupt or invalid input file, 3 for an internal consistency\n"
+               "error (e.g., bug) which caused plzip to panic.\n"
+               "\nReport bugs to lzip-bug@nongnu.org\n"
+               "Plzip home page: http://www.nongnu.org/lzip/plzip.html\n" );
+  }
+
+
+void show_lzlib_version()
+  {
+  std::printf( "Using lzlib %s\n", LZ_version() );
+#if !defined LZ_API_VERSION
+  std::fputs( "LZ_API_VERSION is not defined.\n", stdout );
+#elif LZ_API_VERSION >= 1012
+  std::printf( "Using LZ_API_VERSION = %u\n", LZ_api_version() );
+#else
+  std::printf( "Compiled with LZ_API_VERSION = %u. "
+               "Using an unknown LZ_API_VERSION\n", LZ_API_VERSION );
+#endif
+  }
+
+
+void show_version()
+  {
+  std::printf( "%s %s\n", program_name, PROGVERSION );
+  std::printf( "Copyright (C) 2009 Laszlo Ersek.\n" );
+  std::printf( "Copyright (C) %s Antonio Diaz Diaz.\n", program_year );
+  std::printf( "License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>\n"
+               "This is free software: you are free to change and redistribute it.\n"
+               "There is NO WARRANTY, to the extent permitted by law.\n" );
+  show_lzlib_version();
+  }
+
+
+int check_lzlib_ver()  // <major>.<minor> or <major>.<minor>[a-z.-]*
+  {
+#if defined LZ_API_VERSION && LZ_API_VERSION >= 1012
+  const unsigned char * p = (unsigned char *)LZ_version_string;
+  unsigned major = 0, minor = 0;
+  while( major < 100000 && isdigit( *p ) )
+    { major *= 10; major += *p - '0'; ++p; }
+  if( *p == '.' ) ++p;
+  else
+out: { show_error( "Invalid LZ_version_string in lzlib.h" ); return 2; }
+  while( minor < 100 && isdigit( *p ) )
+    { minor *= 10; minor += *p - '0'; ++p; }
+  if( *p && *p != '-' && *p != '.' && !std::islower( *p ) ) goto out;
+  const unsigned version = major * 1000 + minor;
+  if( LZ_API_VERSION != version )
+    {
+    if( verbosity >= 0 )
+      std::fprintf( stderr, "%s: Version mismatch in lzlib.h: "
+                    "LZ_API_VERSION = %u, should be %u.\n",
+                    program_name, LZ_API_VERSION, version );
+    return 2;
+    }
+#endif
+  return 0;
+  }
+
+
+int check_lib()
+  {
+  int retval = check_lzlib_ver();
+  if( std::strcmp( LZ_version_string, LZ_version() ) != 0 )
+    { set_retval( retval, 1 );
+      if( verbosity >= 0 )
+        std::printf( "warning: LZ_version_string != LZ_version() (%s vs %s)\n",
+                     LZ_version_string, LZ_version() ); }
+#if defined LZ_API_VERSION && LZ_API_VERSION >= 1012
+  if( LZ_API_VERSION != LZ_api_version() )
+    { set_retval( retval, 1 );
+      if( verbosity >= 0 )
+        std::printf( "warning: LZ_API_VERSION != LZ_api_version() (%u vs %u)\n",
+                     LZ_API_VERSION, LZ_api_version() ); }
+#endif
+  if( verbosity >= 1 ) show_lzlib_version();
+  return retval;
+  }
+
+} // end namespace
+
+void Pretty_print::operator()( const char * const msg ) const
+  {
+  if( verbosity < 0 ) return;
+  if( first_post )
+    {
+    first_post = false;
+    std::fputs( padded_name.c_str(), stderr );
+    if( !msg ) std::fflush( stderr );
+    }
+  if( msg ) std::fprintf( stderr, "%s\n", msg );
+  }
+
+
+const char * bad_version( const unsigned version )
+  {
+  static char buf[80];
+  snprintf( buf, sizeof buf, "Version %u member format not supported.",
+            version );
+  return buf;
+  }
+
+
+const char * format_ds( const unsigned dictionary_size )
+  {
+  enum { bufsize = 16, factor = 1024, n = 3 };
+  static char buf[bufsize];
+  const char * const prefix[n] = { "Ki", "Mi", "Gi" };
+  const char * p = "";
+  const char * np = " ";
+  unsigned num = dictionary_size;
+  bool exact = ( num % factor == 0 );
+
+  for( int i = 0; i < n && ( num > 9999 || ( exact && num >= factor ) ); ++i )
+    { num /= factor; if( num % factor != 0 ) exact = false;
+      p = prefix[i]; np = ""; }
+  snprintf( buf, bufsize, "%s%4u %sB", np, num, p );
+  return buf;
+  }
+
+
+void show_header( const unsigned dictionary_size )
+  {
+  std::fprintf( stderr, "dict %s, ", format_ds( dictionary_size ) );
+  }
+
+namespace {
+
+// separate numbers of 5 or more digits in groups of 3 digits using '_'
+const char * format_num3( unsigned long long num )
+  {
+  enum { buffers = 8, bufsize = 4 * sizeof num, n = 10 };
+  const char * const si_prefix = "kMGTPEZYRQ";
+  const char * const binary_prefix = "KMGTPEZYRQ";
+  static char buffer[buffers][bufsize];  // circle of static buffers for printf
+  static int current = 0;
+
+  char * const buf = buffer[current++]; current %= buffers;
+  char * p = buf + bufsize - 1;  // fill the buffer backwards
+  *p = 0;  // terminator
+  if( num > 1024 )
+    {
+    char prefix = 0;  // try binary first, then si
+    for( int i = 0; i < n && num != 0 && num % 1024 == 0; ++i )
+      { num /= 1024; prefix = binary_prefix[i]; }
+    if( prefix ) *(--p) = 'i';
+    else
+      for( int i = 0; i < n && num != 0 && num % 1000 == 0; ++i )
+        { num /= 1000; prefix = si_prefix[i]; }
+    if( prefix ) *(--p) = prefix;
+    }
+  const bool split = num >= 10000;
+
+  for( int i = 0; ; )
+    {
+    *(--p) = num % 10 + '0'; num /= 10; if( num == 0 ) break;
+    if( split && ++i >= 3 ) { i = 0; *(--p) = '_'; }
+    }
+  return p;
+  }
+
+
+void show_option_error( const char * const arg, const char * const msg,
+                        const char * const option_name )
+  {
+  if( verbosity >= 0 )
+    std::fprintf( stderr, "%s: '%s': %s option '%s'.\n",
+                  program_name, arg, msg, option_name );
+  }
+
+
+// Recognized formats: <num>k, <num>Ki, <num>[MGTPEZYRQ][i]
+unsigned long long getnum( const char * const arg,
+                           const char * const option_name,
+                           const unsigned long long llimit,
+                           const unsigned long long ulimit )
+  {
+  char * tail;
+  errno = 0;
+  unsigned long long result = strtoull( arg, &tail, 0 );
+  if( tail == arg )
+    { show_option_error( arg, "Bad or missing numerical argument in",
+                         option_name ); std::exit( 1 ); }
+
+  if( !errno && tail[0] )
+    {
+    const unsigned factor = ( tail[1] == 'i' ) ? 1024 : 1000;
+    int exponent = 0;  // 0 = bad multiplier
+    switch( tail[0] )
+      {
+      case 'Q': exponent = 10; break;
+      case 'R': exponent = 9; break;
+      case 'Y': exponent = 8; break;
+      case 'Z': exponent = 7; break;
+      case 'E': exponent = 6; break;
+      case 'P': exponent = 5; break;
+      case 'T': exponent = 4; break;
+      case 'G': exponent = 3; break;
+      case 'M': exponent = 2; break;
+      case 'K': if( factor == 1024 ) exponent = 1; break;
+      case 'k': if( factor == 1000 ) exponent = 1; break;
+      }
+    if( exponent <= 0 )
+      { show_option_error( arg, "Bad multiplier in numerical argument of",
+                           option_name ); std::exit( 1 ); }
+    for( int i = 0; i < exponent; ++i )
+      {
+      if( ulimit / factor >= result ) result *= factor;
+      else { errno = ERANGE; break; }
+      }
+    }
+  if( !errno && ( result < llimit || result > ulimit ) ) errno = ERANGE;
+  if( errno )
+    {
+    if( verbosity >= 0 )
+      std::fprintf( stderr, "%s: '%s': Value out of limits [%s,%s] in "
+                    "option '%s'.\n", program_name, arg, format_num3( llimit ),
+                    format_num3( ulimit ), option_name );
+    std::exit( 1 );
+    }
+  return result;
+  }
+
+
+int get_dict_size( const char * const arg, const char * const option_name )
+  {
+  char * tail;
+  const long bits = std::strtol( arg, &tail, 0 );
+  if( bits >= LZ_min_dictionary_bits() &&
+      bits <= LZ_max_dictionary_bits() && *tail == 0 )
+    return 1 << bits;
+  int dictionary_size = getnum( arg, option_name, LZ_min_dictionary_size(),
+                                LZ_max_dictionary_size() );
+  if( dictionary_size == 65535 ) ++dictionary_size;  // no fast encoder
+  return dictionary_size;
+  }
+
+
+void set_mode( Mode & program_mode, const Mode new_mode )
+  {
+  if( program_mode != m_compress && program_mode != new_mode )
+    {
+    show_error( "Only one operation can be specified.", 0, true );
+    std::exit( 1 );
+    }
+  program_mode = new_mode;
+  }
+
+
+int extension_index( const std::string & name )
+  {
+  for( int eindex = 0; known_extensions[eindex].from; ++eindex )
+    {
+    const std::string ext( known_extensions[eindex].from );
+    if( name.size() > ext.size() &&
+        name.compare( name.size() - ext.size(), ext.size(), ext ) == 0 )
+      return eindex;
+    }
+  return -1;
+  }
+
+
+void set_c_outname( const std::string & name, const bool filenames_given,
+                    const bool force_ext )
+  {
+  /* zupdate < 1.9 depends on lzip adding the extension '.lz' to name when
+     reading from standard input. */
+  output_filename = name;
+  if( force_ext ||
+      ( !filenames_given && extension_index( output_filename ) < 0 ) )
+    output_filename += known_extensions[0].from;
+  }
+
+
+void set_d_outname( const std::string & name, const int eindex )
+  {
+  if( eindex >= 0 )
+    {
+    const std::string from( known_extensions[eindex].from );
+    if( name.size() > from.size() )
+      {
+      output_filename.assign( name, 0, name.size() - from.size() );
+      output_filename += known_extensions[eindex].to;
+      return;
+      }
+    }
+  output_filename = name; output_filename += ".out";
+  if( verbosity >= 1 )
+    std::fprintf( stderr, "%s: %s: Can't guess original name -- using '%s'\n",
+                  program_name, name.c_str(), output_filename.c_str() );
+  }
+
+} // end namespace
+
+int open_instream( const char * const name, struct stat * const in_statsp,
+                   const bool one_to_one, const bool reg_only )
+  {
+  int infd = open( name, O_RDONLY | O_BINARY );
+  if( infd < 0 )
+    show_file_error( name, "Can't open input file", errno );
+  else
+    {
+    const int i = fstat( infd, in_statsp );
+    const mode_t mode = in_statsp->st_mode;
+    const bool can_read = ( i == 0 && !reg_only &&
+                            ( S_ISBLK( mode ) || S_ISCHR( mode ) ||
+                              S_ISFIFO( mode ) || S_ISSOCK( mode ) ) );
+    if( i != 0 || ( !S_ISREG( mode ) && ( !can_read || one_to_one ) ) )
+      {
+      if( verbosity >= 0 )
+        std::fprintf( stderr, "%s: %s: Input file is not a regular file%s.\n",
+                      program_name, name, ( can_read && one_to_one ) ?
+ ",\n and neither '-c' nor '-o' were specified" : "" ); + close( infd ); + infd = -1; + } + } + return infd; + } + +namespace { + +int open_instream2( const char * const name, struct stat * const in_statsp, + const Mode program_mode, const int eindex, + const bool one_to_one, const bool recompress ) + { + if( program_mode == m_compress && !recompress && eindex >= 0 ) + { + if( verbosity >= 0 ) + std::fprintf( stderr, "%s: %s: Input file already has '%s' suffix.\n", + program_name, name, known_extensions[eindex].from ); + return -1; + } + return open_instream( name, in_statsp, one_to_one, false ); + } + + +bool make_dirs( const std::string & name ) + { + int i = name.size(); + while( i > 0 && name[i-1] != '/' ) --i; // remove last component + while( i > 0 && name[i-1] == '/' ) --i; // remove slash(es) + const int dirsize = i; // size of dirname without trailing slash(es) + + for( i = 0; i < dirsize; ) // if dirsize == 0, dirname is '/' or empty + { + while( i < dirsize && name[i] == '/' ) ++i; + const int first = i; + while( i < dirsize && name[i] != '/' ) ++i; + if( first < i ) + { + const std::string partial( name, 0, i ); + const mode_t mode = S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH; + struct stat st; + if( stat( partial.c_str(), &st ) == 0 ) + { if( !S_ISDIR( st.st_mode ) ) { errno = ENOTDIR; return false; } } + else if( mkdir( partial.c_str(), mode ) != 0 && errno != EEXIST ) + return false; // if EEXIST, another process created the dir + } + } + return true; + } + + +bool open_outstream( const bool force, const bool protect ) + { + const mode_t usr_rw = S_IRUSR | S_IWUSR; + const mode_t all_rw = usr_rw | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH; + const mode_t outfd_mode = protect ? 
usr_rw : all_rw; + int flags = O_CREAT | O_WRONLY | O_BINARY; + if( force ) flags |= O_TRUNC; else flags |= O_EXCL; + + outfd = -1; + if( output_filename.size() && + output_filename[output_filename.size()-1] == '/' ) errno = EISDIR; + else { + if( !protect && !make_dirs( output_filename ) ) + { show_file_error( output_filename.c_str(), + "Error creating intermediate directory", errno ); return false; } + outfd = open( output_filename.c_str(), flags, outfd_mode ); + if( outfd >= 0 ) { delete_output_on_interrupt = true; return true; } + if( errno == EEXIST ) + { show_file_error( output_filename.c_str(), + "Output file already exists, skipping." ); return false; } + } + show_file_error( output_filename.c_str(), "Can't create output file", errno ); + return false; + } + + +void set_signals( void (*action)(int) ) + { + std::signal( SIGHUP, action ); + std::signal( SIGINT, action ); + std::signal( SIGTERM, action ); + } + +} // end namespace + +/* This can be called from any thread, main thread or sub-threads alike, + since they all call common helper functions like 'xlock' that call + cleanup_and_fail() in case of an error. 
+*/ +void cleanup_and_fail( const int retval ) + { + // only one thread can delete and exit + static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; + + set_signals( SIG_IGN ); // ignore signals + pthread_mutex_lock( &mutex ); // ignore errors to avoid loop + const int saved_verbosity = verbosity; + verbosity = -1; // suppress messages from other threads + if( delete_output_on_interrupt ) + { + delete_output_on_interrupt = false; + if( saved_verbosity >= 0 ) + std::fprintf( stderr, "%s: %s: Deleting output file, if it exists.\n", + program_name, output_filename.c_str() ); + if( outfd >= 0 ) { close( outfd ); outfd = -1; } + if( std::remove( output_filename.c_str() ) != 0 && errno != ENOENT && + saved_verbosity >= 0 ) + std::fprintf( stderr, "%s: warning: deletion of output file failed: %s\n", + program_name, std::strerror( errno ) ); + } + std::exit( retval ); + } + +namespace { + +extern "C" void signal_handler( int ) + { + show_error( "Control-C or similar caught, quitting." ); + cleanup_and_fail( 1 ); + } + + +bool check_tty_in( const char * const input_filename, const int infd, + const Mode program_mode, int & retval ) + { + if( ( program_mode == m_decompress || program_mode == m_test ) && + isatty( infd ) ) // for example /dev/tty + { show_file_error( input_filename, + "I won't read compressed data from a terminal." ); + close( infd ); set_retval( retval, 2 ); + if( program_mode != m_test ) cleanup_and_fail( retval ); + return false; } + return true; + } + +bool check_tty_out( const Mode program_mode ) + { + if( program_mode == m_compress && isatty( outfd ) ) + { show_file_error( output_filename.size() ? + output_filename.c_str() : "(stdout)", + "I won't write compressed data to a terminal." ); + return false; } + return true; + } + + +// Set permissions, owner, and times. 
+void close_and_set_permissions( const struct stat * const in_statsp ) + { + bool warning = false; + if( in_statsp ) + { + const mode_t mode = in_statsp->st_mode; + // fchown in many cases returns with EPERM, which can be safely ignored. + if( fchown( outfd, in_statsp->st_uid, in_statsp->st_gid ) == 0 ) + { if( fchmod( outfd, mode ) != 0 ) warning = true; } + else + if( errno != EPERM || + fchmod( outfd, mode & ~( S_ISUID | S_ISGID | S_ISVTX ) ) != 0 ) + warning = true; + } + if( close( outfd ) != 0 ) + { show_file_error( output_filename.c_str(), "Error closing output file", + errno ); cleanup_and_fail( 1 ); } + outfd = -1; + delete_output_on_interrupt = false; + if( in_statsp ) + { + struct utimbuf t; + t.actime = in_statsp->st_atime; + t.modtime = in_statsp->st_mtime; + if( utime( output_filename.c_str(), &t ) != 0 ) warning = true; + } + if( warning && verbosity >= 1 ) + show_file_error( output_filename.c_str(), + "warning: can't change output file attributes", errno ); + } + +} // end namespace + + +void show_error( const char * const msg, const int errcode, const bool help ) + { + if( verbosity < 0 ) return; + if( msg && msg[0] ) + std::fprintf( stderr, "%s: %s%s%s\n", program_name, msg, + ( errcode > 0 ) ? ": " : "", + ( errcode > 0 ) ? std::strerror( errcode ) : "" ); + if( help ) + std::fprintf( stderr, "Try '%s --help' for more information.\n", + invocation_name ); + } + + +void show_file_error( const char * const filename, const char * const msg, + const int errcode ) + { + if( verbosity >= 0 ) + std::fprintf( stderr, "%s: %s: %s%s%s\n", program_name, filename, msg, + ( errcode > 0 ) ? ": " : "", + ( errcode > 0 ) ? 
std::strerror( errcode ) : "" ); + } + + +void internal_error( const char * const msg ) + { + if( verbosity >= 0 ) + std::fprintf( stderr, "%s: internal error: %s\n", program_name, msg ); + std::exit( 3 ); + } + + +void show_progress( const unsigned long long packet_size, + const unsigned long long cfile_size, + const Pretty_print * const p ) + { + static unsigned long long csize = 0; // file_size / 100 + static unsigned long long pos = 0; + static const Pretty_print * pp = 0; + static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; + static bool enabled = true; + + if( !enabled ) return; + if( p ) // initialize static vars + { + if( verbosity < 2 || !isatty( STDERR_FILENO ) ) { enabled = false; return; } + csize = cfile_size; pos = 0; pp = p; + } + if( pp ) + { + xlock( &mutex ); + pos += packet_size; + if( csize > 0 ) + std::fprintf( stderr, "%4llu%% %.1f MB\r", pos / csize, pos / 1000000.0 ); + else + std::fprintf( stderr, " %.1f MB\r", pos / 1000000.0 ); + pp->reset(); (*pp)(); // restore cursor position + xunlock( &mutex ); + } + } + + +#if defined __MSVCRT__ +#include <windows.h> +#define _SC_NPROCESSORS_ONLN 1 +#define _SC_THREAD_THREADS_MAX 2 + +long sysconf( int flag ) + { + if( flag == _SC_NPROCESSORS_ONLN ) + { + SYSTEM_INFO si; + GetSystemInfo( &si ); + return si.dwNumberOfProcessors; + } + if( flag != _SC_THREAD_THREADS_MAX ) errno = EINVAL; + return -1; // unlimited threads or error + } + +#endif // __MSVCRT__ + + +int main( const int argc, const char * const argv[] ) + { + /* Mapping from gzip/bzip2 style 0..9 compression levels to the + corresponding LZMA compression parameters. 
*/ + const Lzma_options option_mapping[] = + { + { 65535, 16 }, // -0 (65535,16 chooses fast encoder) + { 1 << 20, 5 }, // -1 + { 3 << 19, 6 }, // -2 + { 1 << 21, 8 }, // -3 + { 3 << 20, 12 }, // -4 + { 1 << 22, 20 }, // -5 + { 1 << 23, 36 }, // -6 + { 1 << 24, 68 }, // -7 + { 3 << 23, 132 }, // -8 + { 1 << 25, 273 } }; // -9 + Lzma_options encoder_options = option_mapping[6]; // default = "-6" + std::string default_output_filename; + int data_size = 0; + int debug_level = 0; + int num_workers = 0; // start this many worker threads + int in_slots = 4; + int out_slots = 64; + Mode program_mode = m_compress; + Cl_options cl_opts; // command-line options + bool force = false; + bool keep_input_files = false; + bool recompress = false; + bool to_stdout = false; + if( argc > 0 ) invocation_name = argv[0]; + + enum { opt_chk = 256, opt_dbg, opt_in, opt_lt, opt_out }; + const Arg_parser::Option options[] = + { + { '0', "fast", Arg_parser::no }, + { '1', 0, Arg_parser::no }, + { '2', 0, Arg_parser::no }, + { '3', 0, Arg_parser::no }, + { '4', 0, Arg_parser::no }, + { '5', 0, Arg_parser::no }, + { '6', 0, Arg_parser::no }, + { '7', 0, Arg_parser::no }, + { '8', 0, Arg_parser::no }, + { '9', "best", Arg_parser::no }, + { 'a', "trailing-error", Arg_parser::no }, + { 'b', "member-size", Arg_parser::yes }, + { 'B', "data-size", Arg_parser::yes }, + { 'c', "stdout", Arg_parser::no }, + { 'd', "decompress", Arg_parser::no }, + { 'f', "force", Arg_parser::no }, + { 'F', "recompress", Arg_parser::no }, + { 'h', "help", Arg_parser::no }, + { 'k', "keep", Arg_parser::no }, + { 'l', "list", Arg_parser::no }, + { 'm', "match-length", Arg_parser::yes }, + { 'n', "threads", Arg_parser::yes }, + { 'o', "output", Arg_parser::yes }, + { 'q', "quiet", Arg_parser::no }, + { 's', "dictionary-size", Arg_parser::yes }, + { 'S', "volume-size", Arg_parser::yes }, + { 't', "test", Arg_parser::no }, + { 'v', "verbose", Arg_parser::no }, + { 'V', "version", Arg_parser::no }, + { opt_chk, "check-lib", 
Arg_parser::no }, + { opt_dbg, "debug", Arg_parser::yes }, + { opt_in, "in-slots", Arg_parser::yes }, + { opt_lt, "loose-trailing", Arg_parser::no }, + { opt_out, "out-slots", Arg_parser::yes }, + { 0, 0, Arg_parser::no } }; + + const Arg_parser parser( argc, argv, options ); + if( parser.error().size() ) // bad option + { show_error( parser.error().c_str(), 0, true ); return 1; } + + const long num_online = std::max( 1L, sysconf( _SC_NPROCESSORS_ONLN ) ); + long max_workers = sysconf( _SC_THREAD_THREADS_MAX ); + if( max_workers < 1 || max_workers > INT_MAX / (int)sizeof (pthread_t) ) + max_workers = INT_MAX / sizeof (pthread_t); + + int argind = 0; + for( ; argind < parser.arguments(); ++argind ) + { + const int code = parser.code( argind ); + if( !code ) break; // no more options + const char * const pn = parser.parsed_name( argind ).c_str(); + const std::string & sarg = parser.argument( argind ); + const char * const arg = sarg.c_str(); + switch( code ) + { + case '0': case '1': case '2': case '3': case '4': + case '5': case '6': case '7': case '8': case '9': + encoder_options = option_mapping[code-'0']; break; + case 'a': cl_opts.ignore_trailing = false; break; + case 'b': break; + case 'B': data_size = getnum( arg, pn, 2 * LZ_min_dictionary_size(), + 2 * LZ_max_dictionary_size() ); break; + case 'c': to_stdout = true; break; + case 'd': set_mode( program_mode, m_decompress ); break; + case 'f': force = true; break; + case 'F': recompress = true; break; + case 'h': show_help( num_online ); return 0; + case 'k': keep_input_files = true; break; + case 'l': set_mode( program_mode, m_list ); break; + case 'm': encoder_options.match_len_limit = + getnum( arg, pn, LZ_min_match_len_limit(), + LZ_max_match_len_limit() ); break; + case 'n': num_workers = getnum( arg, pn, 1, max_workers ); break; + case 'o': if( sarg == "-" ) to_stdout = true; + else { default_output_filename = sarg; } break; + case 'q': verbosity = -1; break; + case 's': encoder_options.dictionary_size 
= get_dict_size( arg, pn ); + break; + case 'S': break; + case 't': set_mode( program_mode, m_test ); break; + case 'v': if( verbosity < 4 ) ++verbosity; break; + case 'V': show_version(); return 0; + case opt_chk: return check_lib(); + case opt_dbg: debug_level = getnum( arg, pn, 0, 3 ); break; + case opt_in: in_slots = getnum( arg, pn, 1, 64 ); break; + case opt_lt: cl_opts.loose_trailing = true; break; + case opt_out: out_slots = getnum( arg, pn, 1, 1024 ); break; + default: internal_error( "uncaught option." ); + } + } // end process options + + if( LZ_version()[0] < '1' ) + { show_error( "Wrong library version. At least lzlib 1.0 is required." ); + return 1; } + +#if defined __MSVCRT__ || defined __OS2__ + setmode( STDIN_FILENO, O_BINARY ); + setmode( STDOUT_FILENO, O_BINARY ); +#endif + + std::vector< std::string > filenames; + bool filenames_given = false; + for( ; argind < parser.arguments(); ++argind ) + { + filenames.push_back( parser.argument( argind ) ); + if( filenames.back() != "-" ) filenames_given = true; + } + if( filenames.empty() ) filenames.push_back("-"); + + if( program_mode == m_list ) return list_files( filenames, cl_opts ); + + const bool fast = encoder_options.dictionary_size == 65535 && + encoder_options.match_len_limit == 16; + if( data_size <= 0 ) + { + if( fast ) data_size = 1 << 20; + else data_size = 2 * std::max( 65536, encoder_options.dictionary_size ); + } + else if( !fast && data_size < encoder_options.dictionary_size ) + encoder_options.dictionary_size = + std::max( data_size, LZ_min_dictionary_size() ); + + if( num_workers <= 0 ) + { + if( program_mode == m_compress && sizeof (void *) <= 4 ) + { + // use less than 2.22 GiB on 32 bit systems + const long long limit = ( 27LL << 25 ) + ( 11LL << 27 ); // 4 * 568 MiB + const long long mem = ( 27LL * data_size ) / 8 + + ( fast ? 
3LL << 19 : 11LL * encoder_options.dictionary_size ); + const int nmax32 = std::max( limit / mem, 1LL ); + if( max_workers > nmax32 ) max_workers = nmax32; + } + num_workers = std::min( num_online, max_workers ); + } + + if( program_mode == m_test ) to_stdout = false; // apply overrides + if( program_mode == m_test || to_stdout ) default_output_filename.clear(); + + if( to_stdout && program_mode != m_test ) // check tty only once + { outfd = STDOUT_FILENO; if( !check_tty_out( program_mode ) ) return 1; } + else outfd = -1; + + const bool to_file = !to_stdout && program_mode != m_test && + default_output_filename.size(); + if( !to_stdout && program_mode != m_test && ( filenames_given || to_file ) ) + set_signals( signal_handler ); + + Pretty_print pp( filenames ); + + int failed_tests = 0; + int retval = 0; + const bool one_to_one = !to_stdout && program_mode != m_test && !to_file; + bool stdin_used = false; + struct stat in_stats; + for( unsigned i = 0; i < filenames.size(); ++i ) + { + std::string input_filename; + int infd; + + pp.set_name( filenames[i] ); + if( filenames[i] == "-" ) + { + if( stdin_used ) continue; else stdin_used = true; + infd = STDIN_FILENO; + if( !check_tty_in( pp.name(), infd, program_mode, retval ) ) continue; + if( one_to_one ) { outfd = STDOUT_FILENO; output_filename.clear(); } + } + else + { + const int eindex = extension_index( input_filename = filenames[i] ); + infd = open_instream2( input_filename.c_str(), &in_stats, program_mode, + eindex, one_to_one, recompress ); + if( infd < 0 ) { set_retval( retval, 1 ); continue; } + if( !check_tty_in( pp.name(), infd, program_mode, retval ) ) continue; + if( one_to_one ) // open outfd after checking infd + { + if( program_mode == m_compress ) + set_c_outname( input_filename, true, true ); + else set_d_outname( input_filename, eindex ); + if( !open_outstream( force, true ) ) + { close( infd ); set_retval( retval, 1 ); continue; } + } + } + + if( one_to_one && !check_tty_out( program_mode ) ) + 
{ set_retval( retval, 1 ); return retval; } // don't delete a tty + + if( to_file && outfd < 0 ) // open outfd after checking infd + { + if( program_mode == m_compress ) set_c_outname( default_output_filename, + filenames_given, false ); + else output_filename = default_output_filename; + if( !open_outstream( force, false ) || !check_tty_out( program_mode ) ) + return 1; // check tty only once and don't try to delete a tty + } + + const struct stat * const in_statsp = + ( input_filename.size() && one_to_one ) ? &in_stats : 0; + const bool infd_isreg = input_filename.size() && S_ISREG( in_stats.st_mode ); + const unsigned long long cfile_size = + infd_isreg ? ( in_stats.st_size + 99 ) / 100 : 0; + int tmp; + if( program_mode == m_compress ) + tmp = compress( cfile_size, data_size, encoder_options.dictionary_size, + encoder_options.match_len_limit, num_workers, + infd, outfd, pp, debug_level ); + else + tmp = decompress( cfile_size, num_workers, infd, outfd, cl_opts, pp, + debug_level, in_slots, out_slots, infd_isreg, + one_to_one ); + if( close( infd ) != 0 ) + { show_file_error( pp.name(), "Error closing input file", errno ); + set_retval( tmp, 1 ); } + set_retval( retval, tmp ); + if( tmp ) + { if( program_mode != m_test ) cleanup_and_fail( retval ); + else ++failed_tests; } + + if( delete_output_on_interrupt && one_to_one ) + close_and_set_permissions( in_statsp ); + if( input_filename.size() && !keep_input_files && one_to_one ) + std::remove( input_filename.c_str() ); + } + if( delete_output_on_interrupt ) // -o + close_and_set_permissions( ( retval == 0 && !stdin_used && + filenames_given && filenames.size() == 1 ) ? &in_stats : 0 ); + else if( outfd >= 0 && close( outfd ) != 0 ) // -c + { + show_error( "Error closing stdout", errno ); + set_retval( retval, 1 ); + } + if( failed_tests > 0 && verbosity >= 1 && filenames.size() > 1 ) + std::fprintf( stderr, "%s: warning: %d %s failed the test.\n", + program_name, failed_tests, + ( failed_tests == 1 ) ? 
"file" : "files" ); + return retval; + } diff --git a/testsuite/check.sh b/testsuite/check.sh new file mode 100755 index 0000000..7ec899e --- /dev/null +++ b/testsuite/check.sh @@ -0,0 +1,447 @@ +#! /bin/sh +# check script for Plzip - Massively parallel implementation of lzip +# Copyright (C) 2009-2024 Antonio Diaz Diaz. +# +# This script is free software: you have unlimited permission +# to copy, distribute, and modify it. + +LC_ALL=C +export LC_ALL +objdir=`pwd` +testdir=`cd "$1" ; pwd` +LZIP="${objdir}"/plzip +framework_failure() { echo "failure in testing framework" ; exit 1 ; } + +if [ ! -f "${LZIP}" ] || [ ! -x "${LZIP}" ] ; then + echo "${LZIP}: cannot execute" + exit 1 +fi + +[ -e "${LZIP}" ] 2> /dev/null || + { + echo "$0: a POSIX shell is required to run the tests" + echo "Try bash -c \"$0 $1 $2\"" + exit 1 + } + +if [ -d tmp ] ; then rm -rf tmp ; fi +mkdir tmp +cd "${objdir}"/tmp || framework_failure + +cat "${testdir}"/test.txt > in || framework_failure +in_lz="${testdir}"/test.txt.lz +in_em="${testdir}"/test_em.txt.lz +fox_lz="${testdir}"/fox.lz +fail=0 +lwarn8=0 +lwarn10=0 +test_failed() { fail=1 ; printf " $1" ; [ -z "$2" ] || printf "($2)" ; } +lzlib_1_8() { [ ${lwarn8} = 0 ] && + printf "\nwarning: header truncation detection requires lzlib 1.8 or newer" + lwarn8=1 ; } +lzlib_1_10() { [ ${lwarn10} = 0 ] && + printf "\nwarning: header HD=3 detection requires lzlib 1.10 or newer" + lwarn10=1 ; } + +"${LZIP}" --check-lib # just print warning +[ $? != 2 ] || test_failed $LINENO # unless bad lzlib.h + +printf "testing plzip-%s..." "$2" + +"${LZIP}" -fkqm4 in +[ $? = 1 ] || test_failed $LINENO +[ ! -e in.lz ] || test_failed $LINENO +"${LZIP}" -fkqm274 in +[ $? = 1 ] || test_failed $LINENO +[ ! -e in.lz ] || test_failed $LINENO +for i in bad_size -1 0 4095 513MiB 1G 1T 1P 1E 1Z 1Y 10KB ; do + "${LZIP}" -fkqs $i in + [ $? = 1 ] || test_failed $LINENO $i + [ ! -e in.lz ] || test_failed $LINENO $i +done +"${LZIP}" -lq in +[ $? 
= 2 ] || test_failed $LINENO +"${LZIP}" -tq in +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -tq < in +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -cdq in +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -cdq < in +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -dq -o in < "${in_lz}" +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" -dq -o in "${in_lz}" +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" -dq -o out nx_file.lz +[ $? = 1 ] || test_failed $LINENO +[ ! -e out ] || test_failed $LINENO +"${LZIP}" -q -o out.lz nx_file +[ $? = 1 ] || test_failed $LINENO +[ ! -e out.lz ] || test_failed $LINENO +# these are for code coverage +"${LZIP}" -lt "${in_lz}" 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" -cdl "${in_lz}" 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" -cdt "${in_lz}" 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" -t -- nx_file.lz 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" -t "" < /dev/null 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" --help > /dev/null || test_failed $LINENO +"${LZIP}" -n1 -V > /dev/null || test_failed $LINENO +"${LZIP}" -m 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" -z 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" --bad_option 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" --t 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" --test=2 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" --output= 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" --output 2> /dev/null +[ $? = 1 ] || test_failed $LINENO +printf "LZIP\001-.............................." | "${LZIP}" -t 2> /dev/null +printf "LZIP\002-.............................." | "${LZIP}" -t 2> /dev/null +printf "LZIP\001+.............................." | "${LZIP}" -t 2> /dev/null + +printf "\ntesting decompression..." 
+ +for i in "${in_lz}" "${in_em}" ; do + "${LZIP}" -lq "$i" || test_failed $LINENO "$i" + "${LZIP}" -t "$i" || test_failed $LINENO "$i" + "${LZIP}" -d "$i" -o out || test_failed $LINENO "$i" + cmp in out || test_failed $LINENO "$i" + "${LZIP}" -cd "$i" > out || test_failed $LINENO "$i" + cmp in out || test_failed $LINENO "$i" + "${LZIP}" -d "$i" -o - > out || test_failed $LINENO "$i" + cmp in out || test_failed $LINENO "$i" + "${LZIP}" -d < "$i" > out || test_failed $LINENO "$i" + cmp in out || test_failed $LINENO "$i" + rm -f out || framework_failure +done + +lines=`"${LZIP}" -tvv "${in_em}" 2>&1 | wc -l` || test_failed $LINENO +[ "${lines}" -eq 1 ] || test_failed $LINENO "${lines}" + +lines=`"${LZIP}" -lvv "${in_em}" | wc -l` || test_failed $LINENO +[ "${lines}" -eq 11 ] || test_failed $LINENO "${lines}" + +cat "${in_lz}" > out.lz || framework_failure +"${LZIP}" -dk out.lz || test_failed $LINENO +cmp in out || test_failed $LINENO +rm -f out || framework_failure +"${LZIP}" -cd "${fox_lz}" > fox || test_failed $LINENO +cat fox > copy || framework_failure +cat "${in_lz}" > copy.lz || framework_failure +"${LZIP}" -d copy.lz out.lz 2> /dev/null # skip copy, decompress out +[ $? = 1 ] || test_failed $LINENO +[ ! -e out.lz ] || test_failed $LINENO +cmp fox copy || test_failed $LINENO +cmp in out || test_failed $LINENO +"${LZIP}" -df copy.lz || test_failed $LINENO +[ ! 
-e copy.lz ] || test_failed $LINENO +cmp in copy || test_failed $LINENO +rm -f copy out || framework_failure + +printf "to be overwritten" > out || framework_failure +"${LZIP}" -df -o out < "${in_lz}" || test_failed $LINENO +cmp in out || test_failed $LINENO +rm -f out || framework_failure +"${LZIP}" -d -o ./- "${in_lz}" || test_failed $LINENO +cmp in ./- || test_failed $LINENO +rm -f ./- || framework_failure +"${LZIP}" -d -o ./- < "${in_lz}" || test_failed $LINENO +cmp in ./- || test_failed $LINENO +rm -f ./- || framework_failure + +cat "${in_lz}" > anyothername || framework_failure +"${LZIP}" -dv - anyothername - < "${in_lz}" > out 2> /dev/null || + test_failed $LINENO +cmp in out || test_failed $LINENO +cmp in anyothername.out || test_failed $LINENO +rm -f out anyothername.out || framework_failure + +"${LZIP}" -lq in "${in_lz}" +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -lq nx_file.lz "${in_lz}" +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" -tq in "${in_lz}" +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -tq nx_file.lz "${in_lz}" +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" -cdq in "${in_lz}" > out +[ $? = 2 ] || test_failed $LINENO +cat out in | cmp in - || test_failed $LINENO # out must be empty +"${LZIP}" -cdq nx_file.lz "${in_lz}" > out # skip nx_file, decompress in +[ $? = 1 ] || test_failed $LINENO +cmp in out || test_failed $LINENO +rm -f out || framework_failure +cat "${in_lz}" > out.lz || framework_failure +for i in 1 2 3 4 5 6 7 ; do + printf "g" >> out.lz || framework_failure + "${LZIP}" -alvv out.lz "${in_lz}" > /dev/null 2>&1 + [ $? = 2 ] || test_failed $LINENO $i + "${LZIP}" -atvvvv out.lz "${in_lz}" 2> /dev/null + [ $? = 2 ] || test_failed $LINENO $i +done +"${LZIP}" -dq in out.lz +[ $? = 2 ] || test_failed $LINENO +[ -e out.lz ] || test_failed $LINENO +[ ! -e out ] || test_failed $LINENO +[ ! -e in.out ] || test_failed $LINENO +"${LZIP}" -dq nx_file.lz out.lz +[ $? = 1 ] || test_failed $LINENO +[ ! -e out.lz ] || test_failed $LINENO +[ ! 
-e nx_file ] || test_failed $LINENO +cmp in out || test_failed $LINENO +rm -f out || framework_failure + +cat in in > in2 || framework_failure +"${LZIP}" -lq "${in_lz}" "${in_lz}" || test_failed $LINENO +"${LZIP}" -t "${in_lz}" "${in_lz}" || test_failed $LINENO +"${LZIP}" -cd "${in_lz}" "${in_lz}" -o out > out2 || test_failed $LINENO +[ ! -e out ] || test_failed $LINENO # override -o +cmp in2 out2 || test_failed $LINENO +rm -f out2 || framework_failure +"${LZIP}" -d "${in_lz}" "${in_lz}" -o out2 || test_failed $LINENO +cmp in2 out2 || test_failed $LINENO +rm -f out2 || framework_failure + +cat "${in_lz}" "${in_lz}" > out2.lz || framework_failure +printf "\ngarbage" >> out2.lz || framework_failure +"${LZIP}" -tvvvv out2.lz 2> /dev/null || test_failed $LINENO +"${LZIP}" -alq out2.lz +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -atq out2.lz +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -atq < out2.lz +[ $? = 2 ] || test_failed $LINENO +"${LZIP}" -adkq out2.lz +[ $? = 2 ] || test_failed $LINENO +[ ! -e out2 ] || test_failed $LINENO +"${LZIP}" -adkq -o out2 < out2.lz +[ $? = 2 ] || test_failed $LINENO +[ ! -e out2 ] || test_failed $LINENO +printf "to be overwritten" > out2 || framework_failure +"${LZIP}" -df out2.lz || test_failed $LINENO +cmp in2 out2 || test_failed $LINENO +rm -f out2 || framework_failure + +"${LZIP}" -d "${fox_lz}" -o a/b/c/fox || test_failed $LINENO +cmp fox a/b/c/fox || test_failed $LINENO +rm -rf a || framework_failure +"${LZIP}" -d -o a/b/c/fox < "${fox_lz}" || test_failed $LINENO +cmp fox a/b/c/fox || test_failed $LINENO +rm -rf a || framework_failure +"${LZIP}" -dq "${fox_lz}" -o a/b/c/ +[ $? = 1 ] || test_failed $LINENO +[ ! -e a ] || test_failed $LINENO + +printf "\ntesting compression..." + +"${LZIP}" -c -0 in in in -o out3.lz > copy2.lz || test_failed $LINENO +[ ! 
-e out3.lz ] || test_failed $LINENO # override -o +"${LZIP}" -0f in in --output=copy2.lz || test_failed $LINENO +"${LZIP}" -d copy2.lz -o out2 || test_failed $LINENO +[ -e copy2.lz ] || test_failed $LINENO +cmp in2 out2 || test_failed $LINENO +rm -f in2 out2 copy2.lz || framework_failure + +"${LZIP}" -cf "${in_lz}" > lzlz 2> /dev/null # /dev/null is a tty on OS/2 +[ $? = 1 ] || test_failed $LINENO +"${LZIP}" -Fvvm36 -o - "${in_lz}" > lzlz 2> /dev/null || test_failed $LINENO +"${LZIP}" -cd lzlz | "${LZIP}" -d > out || test_failed $LINENO +cmp in out || test_failed $LINENO +rm -f lzlz out || framework_failure + +"${LZIP}" -0 -o ./- in || test_failed $LINENO +"${LZIP}" -cd ./- | cmp in - || test_failed $LINENO +rm -f ./- || framework_failure +"${LZIP}" -0 -o ./- < in || test_failed $LINENO # add .lz +[ ! -e ./- ] || test_failed $LINENO +"${LZIP}" -cd -- -.lz | cmp in - || test_failed $LINENO +rm -f ./-.lz || framework_failure + +for i in s4Ki 0 1 2 3 4 5 6 7 8 9 ; do + "${LZIP}" -k -$i in || test_failed $LINENO $i + mv in.lz out.lz || test_failed $LINENO $i + printf "garbage" >> out.lz || framework_failure + "${LZIP}" -df out.lz || test_failed $LINENO $i + cmp in out || test_failed $LINENO $i + + "${LZIP}" -$i in -c > out || test_failed $LINENO $i + "${LZIP}" -$i in -o o_out || test_failed $LINENO $i # don't add .lz + [ ! -e o_out.lz ] || test_failed $LINENO + cmp out o_out || test_failed $LINENO $i + rm -f o_out || framework_failure + printf "g" >> out || framework_failure + "${LZIP}" -cd out > copy || test_failed $LINENO $i + cmp in copy || test_failed $LINENO $i + + "${LZIP}" -$i < in > out || test_failed $LINENO $i + "${LZIP}" -d < out > copy || test_failed $LINENO $i + cmp in copy || test_failed $LINENO $i + + rm -f out || framework_failure + printf "to be overwritten" > out.lz || framework_failure + "${LZIP}" -f -$i -o out < in || test_failed $LINENO $i # add .lz + [ ! 
-e out ] || test_failed $LINENO
+ "${LZIP}" -df -o copy < out.lz || test_failed $LINENO $i
+ cmp in copy || test_failed $LINENO $i
+done
+rm -f copy out.lz || framework_failure
+
+cat in in in in > in4 || framework_failure
+for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 ; do
+ "${LZIP}" -s4Ki -B8Ki -n$i < in4 > out4.lz || test_failed $LINENO $i
+ printf "g" >> out4.lz || framework_failure
+ "${LZIP}" -d -n$i < out4.lz > out4 || test_failed $LINENO $i
+ cmp in4 out4 || test_failed $LINENO $i
+ "${LZIP}" -d --in-slots=$i < out4.lz > out4 || test_failed $LINENO $i
+ cmp in4 out4 || test_failed $LINENO $i
+ "${LZIP}" -d --out-slots=$i < out4.lz > out4 || test_failed $LINENO $i
+ cmp in4 out4 || test_failed $LINENO $i
+
+ "${LZIP}" -c -s4Ki -B8Ki -n$i in4 > out4.lz || test_failed $LINENO $i
+ printf "g" >> out4.lz || framework_failure
+ "${LZIP}" -cd -n$i out4.lz > out4 || test_failed $LINENO $i
+ cmp in4 out4 || test_failed $LINENO $i
+ "${LZIP}" -cd --out-slots=$i out4.lz > out4 || test_failed $LINENO $i
+ cmp in4 out4 || test_failed $LINENO $i
+ rm -f out4 || framework_failure
+ "${LZIP}" -d -n$i out4.lz || test_failed $LINENO $i
+ cmp in4 out4 || test_failed $LINENO $i
+done
+rm -f in4 out4 || framework_failure
+
+cat in in in in in in in in | "${LZIP}" -1s4Ki | "${LZIP}" -t ||
+ test_failed $LINENO
+
+"${LZIP}" fox -o a/b/c/fox.lz || test_failed $LINENO
+cmp "${fox_lz}" a/b/c/fox.lz || test_failed $LINENO
+rm -rf a || framework_failure
+"${LZIP}" -o a/b/c/fox.lz < fox || test_failed $LINENO
+cmp "${fox_lz}" a/b/c/fox.lz || test_failed $LINENO
+rm -rf a fox || framework_failure
+
+printf "\ntesting bad input..."
+
+headers='LZIp LZiP LZip LzIP LzIp LziP lZIP lZIp lZiP lzIP'
+body='\001\014\000\203\377\373\377\377\300\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000$\000\000\000\000\000\000\000'
+cat "${in_lz}" > int.lz || framework_failure
+printf "LZIP${body}" >> int.lz || framework_failure
+if "${LZIP}" -tq int.lz ; then
+ for header in ${headers} ; do
+ printf "${header}${body}" > int.lz || framework_failure
+ "${LZIP}" -lq int.lz # first member
+ [ $? = 2 ] || test_failed $LINENO ${header}
+ "${LZIP}" -tq int.lz
+ [ $? = 2 ] || test_failed $LINENO ${header}
+ "${LZIP}" -tq < int.lz
+ [ $? = 2 ] || test_failed $LINENO ${header}
+ "${LZIP}" -cdq int.lz > /dev/null
+ [ $? = 2 ] || test_failed $LINENO ${header}
+ "${LZIP}" -lq --loose-trailing int.lz
+ [ $? = 2 ] || test_failed $LINENO ${header}
+ "${LZIP}" -tq --loose-trailing int.lz
+ [ $? = 2 ] || test_failed $LINENO ${header}
+ "${LZIP}" -tq --loose-trailing < int.lz
+ [ $? = 2 ] || test_failed $LINENO ${header}
+ "${LZIP}" -cdq --loose-trailing int.lz > /dev/null
+ [ $? = 2 ] || test_failed $LINENO ${header}
+ cat "${in_lz}" > int.lz || framework_failure
+ printf "${header}${body}" >> int.lz || framework_failure
+ "${LZIP}" -lq int.lz # trailing data
+ [ $? = 2 ] || test_failed $LINENO ${header}
+ "${LZIP}" -tq int.lz
+ [ $? = 2 ] || test_failed $LINENO ${header}
+ "${LZIP}" -tq < int.lz
+ [ $? = 2 ] || lzlib_1_10 # requires lzlib 1.10
+ "${LZIP}" -cdq int.lz > /dev/null
+ [ $? = 2 ] || test_failed $LINENO ${header}
+ "${LZIP}" -lq --loose-trailing int.lz ||
+ test_failed $LINENO ${header}
+ "${LZIP}" -t --loose-trailing int.lz ||
+ test_failed $LINENO ${header}
+ "${LZIP}" -t --loose-trailing < int.lz ||
+ test_failed $LINENO ${header}
+ "${LZIP}" -cd --loose-trailing int.lz > /dev/null ||
+ test_failed $LINENO ${header}
+ "${LZIP}" -lq --loose-trailing --trailing-error int.lz
+ [ $? = 2 ] || test_failed $LINENO ${header}
+ "${LZIP}" -tq --loose-trailing --trailing-error int.lz
+ [ $?
= 2 ] || test_failed $LINENO ${header}
+ "${LZIP}" -tq --loose-trailing --trailing-error < int.lz
+ [ $? = 2 ] || test_failed $LINENO ${header}
+ "${LZIP}" -cdq --loose-trailing --trailing-error int.lz > /dev/null
+ [ $? = 2 ] || test_failed $LINENO ${header}
+ done
+else
+ printf "\nwarning: skipping header test: 'printf' does not work on your system."
+fi
+rm -f int.lz || framework_failure
+
+for i in fox_v2.lz fox_s11.lz fox_de20.lz \
+ fox_bcrc.lz fox_crc0.lz fox_das46.lz fox_mes81.lz ; do
+ "${LZIP}" -tq "${testdir}"/$i
+ [ $? = 2 ] || test_failed $LINENO $i
+done
+
+cat "${in_lz}" "${in_lz}" > in2.lz || framework_failure
+cat "${in_lz}" "${in_lz}" "${in_lz}" > in3.lz || framework_failure
+if dd if=in3.lz of=trunc.lz bs=14752 count=1 2> /dev/null &&
+ [ -e trunc.lz ] && cmp in2.lz trunc.lz > /dev/null 2>&1 ; then
+ for i in 6 20 14734 14753 14754 14755 14756 14757 14758 ; do
+ dd if=in3.lz of=trunc.lz bs=$i count=1 2> /dev/null
+ "${LZIP}" -lq trunc.lz
+ [ $? = 2 ] || test_failed $LINENO $i
+ "${LZIP}" -tq trunc.lz
+ [ $? = 2 ] || test_failed $LINENO $i
+ "${LZIP}" -tq < trunc.lz
+ [ $? = 2 ] || lzlib_1_8 # requires lzlib 1.8
+ "${LZIP}" -cdq trunc.lz > /dev/null
+ [ $? = 2 ] || test_failed $LINENO $i
+ "${LZIP}" -dq < trunc.lz > /dev/null
+ [ $? = 2 ] || lzlib_1_8 # requires lzlib 1.8
+ done
+else
+ printf "\nwarning: skipping truncation test: 'dd' does not work on your system."
+fi
+rm -f in2.lz in3.lz trunc.lz || framework_failure
+
+cat "${in_lz}" > ingin.lz || framework_failure
+printf "g" >> ingin.lz || framework_failure
+cat "${in_lz}" >> ingin.lz || framework_failure
+"${LZIP}" -lq ingin.lz
+[ $? = 2 ] || test_failed $LINENO
+"${LZIP}" -atq ingin.lz
+[ $? = 2 ] || test_failed $LINENO
+"${LZIP}" -atq < ingin.lz
+[ $? = 2 ] || test_failed $LINENO
+"${LZIP}" -acdq ingin.lz > /dev/null
+[ $? = 2 ] || test_failed $LINENO
+"${LZIP}" -adq < ingin.lz > /dev/null
+[ $? = 2 ] || test_failed $LINENO
+"${LZIP}" -tq ingin.lz
+[ $?
= 2 ] || test_failed $LINENO
+"${LZIP}" -t < ingin.lz || test_failed $LINENO
+"${LZIP}" -cdq ingin.lz > out
+[ $? = 2 ] || test_failed $LINENO
+"${LZIP}" -d < ingin.lz > out || test_failed $LINENO
+cmp in out || test_failed $LINENO
+rm -f out ingin.lz || framework_failure
+
+echo
+if [ ${fail} = 0 ] ; then
+ echo "tests completed successfully."
+ cd "${objdir}" && rm -r tmp
+else
+ echo "tests failed."
+fi
+exit ${fail}
diff --git a/testsuite/fox.lz b/testsuite/fox.lz
Binary files differ
new file mode 100644
index 0000000..509da82
--- /dev/null
+++ b/testsuite/fox.lz
diff --git a/testsuite/fox_bcrc.lz b/testsuite/fox_bcrc.lz
Binary files differ
new file mode 100644
index 0000000..8f6a7c4
--- /dev/null
+++ b/testsuite/fox_bcrc.lz
diff --git a/testsuite/fox_crc0.lz b/testsuite/fox_crc0.lz
Binary files differ
new file mode 100644
index 0000000..1abe926
--- /dev/null
+++ b/testsuite/fox_crc0.lz
diff --git a/testsuite/fox_das46.lz b/testsuite/fox_das46.lz
Binary files differ
new file mode 100644
index 0000000..43ed9f9
--- /dev/null
+++ b/testsuite/fox_das46.lz
diff --git a/testsuite/fox_de20.lz b/testsuite/fox_de20.lz
Binary files differ
new file mode 100644
index 0000000..10949d8
--- /dev/null
+++ b/testsuite/fox_de20.lz
diff --git a/testsuite/fox_mes81.lz b/testsuite/fox_mes81.lz
Binary files differ
new file mode 100644
index 0000000..d50ef2e
--- /dev/null
+++ b/testsuite/fox_mes81.lz
diff --git a/testsuite/fox_s11.lz b/testsuite/fox_s11.lz
Binary files differ
new file mode 100644
index 0000000..dca909c
--- /dev/null
+++ b/testsuite/fox_s11.lz
diff --git a/testsuite/fox_v2.lz b/testsuite/fox_v2.lz
Binary files differ
new file mode 100644
index 0000000..8620981
--- /dev/null
+++ b/testsuite/fox_v2.lz
diff --git a/testsuite/test.txt b/testsuite/test.txt
new file mode 100644
index 0000000..9196a3a
--- /dev/null
+++ b/testsuite/test.txt
@@ -0,0 +1,676 @@
+ GNU GENERAL PUBLIC LICENSE
+ Version 2, June 1991
+
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
+ 51 Franklin
Street, Fifth Floor, Boston, MA 02110-1301 USA
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+
+ The licenses for most software are designed to take away your
+freedom to share and change it. By contrast, the GNU General Public
+License is intended to guarantee your freedom to share and change free
+software--to make sure the software is free for all its users. This
+General Public License applies to most of the Free Software
+Foundation's software and to any other program whose authors commit to
+using it. (Some other Free Software Foundation software is covered by
+the GNU Lesser General Public License instead.) You can apply it to
+your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+price. Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+this service if you wish), that you receive source code or can get it
+if you want it, that you can change the software or use pieces of it
+in new free programs; and that you know you can do these things.
+
+ To protect your rights, we need to make restrictions that forbid
+anyone to deny you these rights or to ask you to surrender the rights.
+These restrictions translate to certain responsibilities for you if you
+distribute copies of the software, or if you modify it.
+
+ For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must give the recipients all the rights that
+you have. You must make sure that they, too, receive or can get the
+source code. And you must show them these terms so they know their
+rights.
+
+ We protect your rights with two steps: (1) copyright the software, and
+(2) offer you this license which gives you legal permission to copy,
+distribute and/or modify the software.
+
+ Also, for each author's protection and ours, we want to make certain
+that everyone understands that there is no warranty for this free
+software. If the software is modified by someone else and passed on, we
+want its recipients to know that what they have is not the original, so
+that any problems introduced by others will not reflect on the original
+authors' reputations.
+
+ Finally, any free program is threatened constantly by software
+patents. We wish to avoid the danger that redistributors of a free
+program will individually obtain patent licenses, in effect making the
+program proprietary. To prevent this, we have made it clear that any
+patent must be licensed for everyone's free use or not licensed at all.
+
+ The precise terms and conditions for copying, distribution and
+modification follow.
+
+ GNU GENERAL PUBLIC LICENSE
+ TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+ 0. This License applies to any program or other work which contains
+a notice placed by the copyright holder saying it may be distributed
+under the terms of this General Public License. The "Program", below,
+refers to any such program or work, and a "work based on the Program"
+means either the Program or any derivative work under copyright law:
+that is to say, a work containing the Program or a portion of it,
+either verbatim or with modifications and/or translated into another
+language. (Hereinafter, translation is included without limitation in
+the term "modification".) Each licensee is addressed as "you".
+
+Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope. The act of
+running the Program is not restricted, and the output from the Program
+is covered only if its contents constitute a work based on the
+Program (independent of having been made by running the Program).
+Whether that is true depends on what the Program does.
+
+ 1.
You may copy and distribute verbatim copies of the Program's
+source code as you receive it, in any medium, provided that you
+conspicuously and appropriately publish on each copy an appropriate
+copyright notice and disclaimer of warranty; keep intact all the
+notices that refer to this License and to the absence of any warranty;
+and give any other recipients of the Program a copy of this License
+along with the Program.
+
+You may charge a fee for the physical act of transferring a copy, and
+you may at your option offer warranty protection in exchange for a fee.
+
+ 2. You may modify your copy or copies of the Program or any portion
+of it, thus forming a work based on the Program, and copy and
+distribute such modifications or work under the terms of Section 1
+above, provided that you also meet all of these conditions:
+
+ a) You must cause the modified files to carry prominent notices
+ stating that you changed the files and the date of any change.
+
+ b) You must cause any work that you distribute or publish, that in
+ whole or in part contains or is derived from the Program or any
+ part thereof, to be licensed as a whole at no charge to all third
+ parties under the terms of this License.
+
+ c) If the modified program normally reads commands interactively
+ when run, you must cause it, when started running for such
+ interactive use in the most ordinary way, to print or display an
+ announcement including an appropriate copyright notice and a
+ notice that there is no warranty (or else, saying that you provide
+ a warranty) and that users may redistribute the program under
+ these conditions, and telling the user how to view a copy of this
+ License. (Exception: if the Program itself is interactive but
+ does not normally print such an announcement, your work based on
+ the Program is not required to print an announcement.)
+
+These requirements apply to the modified work as a whole.
If
+identifiable sections of that work are not derived from the Program,
+and can be reasonably considered independent and separate works in
+themselves, then this License, and its terms, do not apply to those
+sections when you distribute them as separate works. But when you
+distribute the same sections as part of a whole which is a work based
+on the Program, the distribution of the whole must be on the terms of
+this License, whose permissions for other licensees extend to the
+entire whole, and thus to each and every part regardless of who wrote it.
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Program.
+
+In addition, mere aggregation of another work not based on the Program
+with the Program (or with a work based on the Program) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+ 3. You may copy and distribute the Program (or a work based on it,
+under Section 2) in object code or executable form under the terms of
+Sections 1 and 2 above provided that you also do one of the following:
+
+ a) Accompany it with the complete corresponding machine-readable
+ source code, which must be distributed under the terms of Sections
+ 1 and 2 above on a medium customarily used for software interchange; or,
+
+ b) Accompany it with a written offer, valid for at least three
+ years, to give any third party, for a charge no more than your
+ cost of physically performing source distribution, a complete
+ machine-readable copy of the corresponding source code, to be
+ distributed under the terms of Sections 1 and 2 above on a medium
+ customarily used for software interchange; or,
+
+ c) Accompany it with the information you received as to the offer
+ to distribute corresponding source code.
(This alternative is
+ allowed only for noncommercial distribution and only if you
+ received the program in object code or executable form with such
+ an offer, in accord with Subsection b above.)
+
+The source code for a work means the preferred form of the work for
+making modifications to it. For an executable work, complete source
+code means all the source code for all modules it contains, plus any
+associated interface definition files, plus the scripts used to
+control compilation and installation of the executable. However, as a
+special exception, the source code distributed need not include
+anything that is normally distributed (in either source or binary
+form) with the major components (compiler, kernel, and so on) of the
+operating system on which the executable runs, unless that component
+itself accompanies the executable.
+
+If distribution of executable or object code is made by offering
+access to copy from a designated place, then offering equivalent
+access to copy the source code from the same place counts as
+distribution of the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+ 4. You may not copy, modify, sublicense, or distribute the Program
+except as expressly provided under this License. Any attempt
+otherwise to copy, modify, sublicense or distribute the Program is
+void, and will automatically terminate your rights under this License.
+However, parties who have received copies, or rights, from you under
+this License will not have their licenses terminated so long as such
+parties remain in full compliance.
+
+ 5. You are not required to accept this License, since you have not
+signed it. However, nothing else grants you permission to modify or
+distribute the Program or its derivative works. These actions are
+prohibited by law if you do not accept this License.
Therefore, by
+modifying or distributing the Program (or any work based on the
+Program), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Program or works based on it.
+
+ 6. Each time you redistribute the Program (or any work based on the
+Program), the recipient automatically receives a license from the
+original licensor to copy, distribute or modify the Program subject to
+these terms and conditions. You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties to
+this License.
+
+ 7. If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License. If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Program at all. For example, if a patent
+license would not permit royalty-free redistribution of the Program by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Program.
+
+If any portion of this section is held invalid or unenforceable under
+any particular circumstance, the balance of the section is intended to
+apply and the section as a whole is intended to apply in other
+circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system, which is
+implemented by public license practices. Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+ 8. If the distribution and/or use of the Program is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Program under this License
+may add an explicit geographical distribution limitation excluding
+those countries, so that distribution is permitted only in or among
+countries not thus excluded. In such case, this License incorporates
+the limitation as if written in the body of this License.
+
+ 9. The Free Software Foundation may publish revised and/or new versions
+of the General Public License from time to time. Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+Each version is given a distinguishing version number. If the Program
+specifies a version number of this License which applies to it and "any
+later version", you have the option of following the terms and conditions
+either of that version or of any later version published by the Free
+Software Foundation. If the Program does not specify a version number of
+this License, you may choose any version ever published by the Free Software
+Foundation.
+
+ 10.
If you wish to incorporate parts of the Program into other free
+programs whose distribution conditions are different, write to the author
+to ask for permission. For software which is copyrighted by the Free
+Software Foundation, write to the Free Software Foundation; we sometimes
+make exceptions for this. Our decision will be guided by the two goals
+of preserving the free status of all derivatives of our free software and
+of promoting the sharing and reuse of software generally.
+
+ NO WARRANTY
+
+ 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
+FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+REPAIR OR CORRECTION.
+
+ 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+ END OF TERMS AND CONDITIONS
+
+ How to Apply These Terms to Your New Programs
+
+ If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+ To do so, attach the following notices to the program. It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+ <one line to give the program's name and a brief idea of what it does.>
+ Copyright (C) <year> <name of author>
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+Also add information on how to contact you by electronic and paper mail.
+
+If the program is interactive, make it output a short notice like this
+when it starts in an interactive mode:
+
+ Gnomovision version 69, Copyright (C) <year> <name of author>
+ Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+ This is free software, and you are welcome to redistribute it
+ under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License.
Of course, the commands you use may
+be called something other than `show w' and `show c'; they could even be
+mouse-clicks or menu items--whatever suits your program.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a "copyright disclaimer" for the program, if
+necessary. Here is a sample; alter the names:
+
+ Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+ `Gnomovision' (which makes passes at compilers) written by James Hacker.
+
+ <signature of Ty Coon>, 1 April 1989
+ Ty Coon, President of Vice
+
+This General Public License does not permit incorporating your program into
+proprietary programs. If your program is a subroutine library, you may
+consider it more useful to permit linking proprietary applications with the
+library. If this is what you want to do, use the GNU Lesser General
+Public License instead of this License.
+ GNU GENERAL PUBLIC LICENSE
+ Version 2, June 1991
+
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
+ 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+
+ The licenses for most software are designed to take away your
+freedom to share and change it. By contrast, the GNU General Public
+License is intended to guarantee your freedom to share and change free
+software--to make sure the software is free for all its users. This
+General Public License applies to most of the Free Software
+Foundation's software and to any other program whose authors commit to
+using it. (Some other Free Software Foundation software is covered by
+the GNU Lesser General Public License instead.) You can apply it to
+your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+price. Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+this service if you wish), that you receive source code or can get it
+if you want it, that you can change the software or use pieces of it
+in new free programs; and that you know you can do these things.
+
+ To protect your rights, we need to make restrictions that forbid
+anyone to deny you these rights or to ask you to surrender the rights.
+These restrictions translate to certain responsibilities for you if you
+distribute copies of the software, or if you modify it.
+
+ For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must give the recipients all the rights that
+you have. You must make sure that they, too, receive or can get the
+source code. And you must show them these terms so they know their
+rights.
+
+ We protect your rights with two steps: (1) copyright the software, and
+(2) offer you this license which gives you legal permission to copy,
+distribute and/or modify the software.
+
+ Also, for each author's protection and ours, we want to make certain
+that everyone understands that there is no warranty for this free
+software. If the software is modified by someone else and passed on, we
+want its recipients to know that what they have is not the original, so
+that any problems introduced by others will not reflect on the original
+authors' reputations.
+
+ Finally, any free program is threatened constantly by software
+patents. We wish to avoid the danger that redistributors of a free
+program will individually obtain patent licenses, in effect making the
+program proprietary. To prevent this, we have made it clear that any
+patent must be licensed for everyone's free use or not licensed at all.
+
+ The precise terms and conditions for copying, distribution and
+modification follow.
+
+ GNU GENERAL PUBLIC LICENSE
+ TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+ 0. This License applies to any program or other work which contains
+a notice placed by the copyright holder saying it may be distributed
+under the terms of this General Public License. The "Program", below,
+refers to any such program or work, and a "work based on the Program"
+means either the Program or any derivative work under copyright law:
+that is to say, a work containing the Program or a portion of it,
+either verbatim or with modifications and/or translated into another
+language. (Hereinafter, translation is included without limitation in
+the term "modification".) Each licensee is addressed as "you".
+
+Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope. The act of
+running the Program is not restricted, and the output from the Program
+is covered only if its contents constitute a work based on the
+Program (independent of having been made by running the Program).
+Whether that is true depends on what the Program does.
+
+ 1. You may copy and distribute verbatim copies of the Program's
+source code as you receive it, in any medium, provided that you
+conspicuously and appropriately publish on each copy an appropriate
+copyright notice and disclaimer of warranty; keep intact all the
+notices that refer to this License and to the absence of any warranty;
+and give any other recipients of the Program a copy of this License
+along with the Program.
+
+You may charge a fee for the physical act of transferring a copy, and
+you may at your option offer warranty protection in exchange for a fee.
+
+ 2. You may modify your copy or copies of the Program or any portion
+of it, thus forming a work based on the Program, and copy and
+distribute such modifications or work under the terms of Section 1
+above, provided that you also meet all of these conditions:
+
+ a) You must cause the modified files to carry prominent notices
+ stating that you changed the files and the date of any change.
+
+ b) You must cause any work that you distribute or publish, that in
+ whole or in part contains or is derived from the Program or any
+ part thereof, to be licensed as a whole at no charge to all third
+ parties under the terms of this License.
+
+ c) If the modified program normally reads commands interactively
+ when run, you must cause it, when started running for such
+ interactive use in the most ordinary way, to print or display an
+ announcement including an appropriate copyright notice and a
+ notice that there is no warranty (or else, saying that you provide
+ a warranty) and that users may redistribute the program under
+ these conditions, and telling the user how to view a copy of this
+ License. (Exception: if the Program itself is interactive but
+ does not normally print such an announcement, your work based on
+ the Program is not required to print an announcement.)
+
+These requirements apply to the modified work as a whole. If
+identifiable sections of that work are not derived from the Program,
+and can be reasonably considered independent and separate works in
+themselves, then this License, and its terms, do not apply to those
+sections when you distribute them as separate works. But when you
+distribute the same sections as part of a whole which is a work based
+on the Program, the distribution of the whole must be on the terms of
+this License, whose permissions for other licensees extend to the
+entire whole, and thus to each and every part regardless of who wrote it.
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Program.
+
+In addition, mere aggregation of another work not based on the Program
+with the Program (or with a work based on the Program) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+ 3. You may copy and distribute the Program (or a work based on it,
+under Section 2) in object code or executable form under the terms of
+Sections 1 and 2 above provided that you also do one of the following:
+
+ a) Accompany it with the complete corresponding machine-readable
+ source code, which must be distributed under the terms of Sections
+ 1 and 2 above on a medium customarily used for software interchange; or,
+
+ b) Accompany it with a written offer, valid for at least three
+ years, to give any third party, for a charge no more than your
+ cost of physically performing source distribution, a complete
+ machine-readable copy of the corresponding source code, to be
+ distributed under the terms of Sections 1 and 2 above on a medium
+ customarily used for software interchange; or,
+
+ c) Accompany it with the information you received as to the offer
+ to distribute corresponding source code. (This alternative is
+ allowed only for noncommercial distribution and only if you
+ received the program in object code or executable form with such
+ an offer, in accord with Subsection b above.)
+
+The source code for a work means the preferred form of the work for
+making modifications to it. For an executable work, complete source
+code means all the source code for all modules it contains, plus any
+associated interface definition files, plus the scripts used to
+control compilation and installation of the executable. However, as a
+special exception, the source code distributed need not include
+anything that is normally distributed (in either source or binary
+form) with the major components (compiler, kernel, and so on) of the
+operating system on which the executable runs, unless that component
+itself accompanies the executable.
+
+If distribution of executable or object code is made by offering
+access to copy from a designated place, then offering equivalent
+access to copy the source code from the same place counts as
+distribution of the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+ 4. You may not copy, modify, sublicense, or distribute the Program
+except as expressly provided under this License. Any attempt
+otherwise to copy, modify, sublicense or distribute the Program is
+void, and will automatically terminate your rights under this License.
+However, parties who have received copies, or rights, from you under
+this License will not have their licenses terminated so long as such
+parties remain in full compliance.
+
+ 5. You are not required to accept this License, since you have not
+signed it. However, nothing else grants you permission to modify or
+distribute the Program or its derivative works. These actions are
+prohibited by law if you do not accept this License. Therefore, by
+modifying or distributing the Program (or any work based on the
+Program), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Program or works based on it.
+
+ 6. Each time you redistribute the Program (or any work based on the
+Program), the recipient automatically receives a license from the
+original licensor to copy, distribute or modify the Program subject to
+these terms and conditions. You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties to
+this License.
+
+ 7. If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License. If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Program at all. For example, if a patent
+license would not permit royalty-free redistribution of the Program by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Program.
+
+If any portion of this section is held invalid or unenforceable under
+any particular circumstance, the balance of the section is intended to
+apply and the section as a whole is intended to apply in other
+circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system, which is
+implemented by public license practices. Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+ 8. If the distribution and/or use of the Program is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Program under this License
+may add an explicit geographical distribution limitation excluding
+those countries, so that distribution is permitted only in or among
+countries not thus excluded. In such case, this License incorporates
+the limitation as if written in the body of this License.
+
+ 9. The Free Software Foundation may publish revised and/or new versions
+of the General Public License from time to time. Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+Each version is given a distinguishing version number. If the Program
+specifies a version number of this License which applies to it and "any
+later version", you have the option of following the terms and conditions
+either of that version or of any later version published by the Free
+Software Foundation. If the Program does not specify a version number of
+this License, you may choose any version ever published by the Free Software
+Foundation.
+
+ 10. If you wish to incorporate parts of the Program into other free
+programs whose distribution conditions are different, write to the author
+to ask for permission. For software which is copyrighted by the Free
+Software Foundation, write to the Free Software Foundation; we sometimes
+make exceptions for this. Our decision will be guided by the two goals
+of preserving the free status of all derivatives of our free software and
+of promoting the sharing and reuse of software generally.
+
+ NO WARRANTY
+
+ 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
+FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+REPAIR OR CORRECTION.
+
+ 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+ END OF TERMS AND CONDITIONS
+
+ How to Apply These Terms to Your New Programs
+
+ If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+ To do so, attach the following notices to the program. It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+ <one line to give the program's name and a brief idea of what it does.>
+ Copyright (C) <year> <name of author>
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+Also add information on how to contact you by electronic and paper mail.
+
+If the program is interactive, make it output a short notice like this
+when it starts in an interactive mode:
+
+ Gnomovision version 69, Copyright (C) <year> <name of author>
+ Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+ This is free software, and you are welcome to redistribute it
+ under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License. Of course, the commands you use may
+be called something other than `show w' and `show c'; they could even be
+mouse-clicks or menu items--whatever suits your program.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a "copyright disclaimer" for the program, if
+necessary. Here is a sample; alter the names:
+
+ Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+ `Gnomovision' (which makes passes at compilers) written by James Hacker.
+
+ <signature of Ty Coon>, 1 April 1989
+ Ty Coon, President of Vice
+
+This General Public License does not permit incorporating your program into
+proprietary programs. If your program is a subroutine library, you may
+consider it more useful to permit linking proprietary applications with the
+library. If this is what you want to do, use the GNU Lesser General
+Public License instead of this License.
diff --git a/testsuite/test.txt.lz b/testsuite/test.txt.lz
Binary files differ
new file mode 100644
index 0000000..22cea6e
--- /dev/null
+++ b/testsuite/test.txt.lz
diff --git a/testsuite/test_em.txt.lz b/testsuite/test_em.txt.lz
Binary files differ
new file mode 100644
index 0000000..7e96250
--- /dev/null
+++ b/testsuite/test_em.txt.lz