author     Daniel Baumann <daniel.baumann@progress-linux.org>  2024-04-14 12:57:09 +0000
committer  Daniel Baumann <daniel.baumann@progress-linux.org>  2024-04-14 12:57:09 +0000
commit     ae36dc179f7f9c26dd9e1463b9eb90fbc804d4d3
tree       3346607a82ffe01ef0a4a19bed6c8b8b6f96864e
parent     Initial commit.

Adding upstream version 1.11.
(refs: upstream/1.11, upstream)

Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>

Diffstat
-rw-r--r--  AUTHORS                      1
-rw-r--r--  COPYING                    338
-rw-r--r--  ChangeLog                  235
-rw-r--r--  INSTALL                     92
-rw-r--r--  Makefile.in                145
-rw-r--r--  NEWS                        14
-rw-r--r--  README                     112
-rw-r--r--  arg_parser.cc              197
-rw-r--r--  arg_parser.h               110
-rw-r--r--  compress.cc                558
-rwxr-xr-x  configure                  210
-rw-r--r--  dec_stdout.cc              337
-rw-r--r--  dec_stream.cc              650
-rw-r--r--  decompress.cc              363
-rw-r--r--  doc/plzip.1                148
-rw-r--r--  doc/plzip.info             833
-rw-r--r--  doc/plzip.texi             907
-rw-r--r--  list.cc                    114
-rw-r--r--  lzip.h                     340
-rw-r--r--  lzip_index.cc              209
-rw-r--r--  lzip_index.h                94
-rw-r--r--  main.cc                   1016
-rwxr-xr-x  testsuite/check.sh         447
-rw-r--r--  testsuite/fox.lz          bin 0 -> 80 bytes
-rw-r--r--  testsuite/fox_bcrc.lz     bin 0 -> 80 bytes
-rw-r--r--  testsuite/fox_crc0.lz     bin 0 -> 80 bytes
-rw-r--r--  testsuite/fox_das46.lz    bin 0 -> 80 bytes
-rw-r--r--  testsuite/fox_de20.lz     bin 0 -> 80 bytes
-rw-r--r--  testsuite/fox_mes81.lz    bin 0 -> 80 bytes
-rw-r--r--  testsuite/fox_s11.lz      bin 0 -> 80 bytes
-rw-r--r--  testsuite/fox_v2.lz       bin 0 -> 80 bytes
-rw-r--r--  testsuite/test.txt         676
-rw-r--r--  testsuite/test.txt.lz     bin 0 -> 7376 bytes
-rw-r--r--  testsuite/test_em.txt.lz  bin 0 -> 14024 bytes
34 files changed, 8146 insertions, 0 deletions
diff --git a/AUTHORS b/AUTHORS
new file mode 100644
index 0000000..ad5ddb5
--- /dev/null
+++ b/AUTHORS
@@ -0,0 +1 @@
+Plzip was written by Laszlo Ersek and Antonio Diaz Diaz.
diff --git a/COPYING b/COPYING
new file mode 100644
index 0000000..4ad17ae
--- /dev/null
+++ b/COPYING
@@ -0,0 +1,338 @@
+ GNU GENERAL PUBLIC LICENSE
+ Version 2, June 1991
+
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
+ 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+
+ The licenses for most software are designed to take away your
+freedom to share and change it. By contrast, the GNU General Public
+License is intended to guarantee your freedom to share and change free
+software--to make sure the software is free for all its users. This
+General Public License applies to most of the Free Software
+Foundation's software and to any other program whose authors commit to
+using it. (Some other Free Software Foundation software is covered by
+the GNU Lesser General Public License instead.) You can apply it to
+your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+price. Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+this service if you wish), that you receive source code or can get it
+if you want it, that you can change the software or use pieces of it
+in new free programs; and that you know you can do these things.
+
+ To protect your rights, we need to make restrictions that forbid
+anyone to deny you these rights or to ask you to surrender the rights.
+These restrictions translate to certain responsibilities for you if you
+distribute copies of the software, or if you modify it.
+
+ For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must give the recipients all the rights that
+you have. You must make sure that they, too, receive or can get the
+source code. And you must show them these terms so they know their
+rights.
+
+ We protect your rights with two steps: (1) copyright the software, and
+(2) offer you this license which gives you legal permission to copy,
+distribute and/or modify the software.
+
+ Also, for each author's protection and ours, we want to make certain
+that everyone understands that there is no warranty for this free
+software. If the software is modified by someone else and passed on, we
+want its recipients to know that what they have is not the original, so
+that any problems introduced by others will not reflect on the original
+authors' reputations.
+
+ Finally, any free program is threatened constantly by software
+patents. We wish to avoid the danger that redistributors of a free
+program will individually obtain patent licenses, in effect making the
+program proprietary. To prevent this, we have made it clear that any
+patent must be licensed for everyone's free use or not licensed at all.
+
+ The precise terms and conditions for copying, distribution and
+modification follow.
+
+ GNU GENERAL PUBLIC LICENSE
+ TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+ 0. This License applies to any program or other work which contains
+a notice placed by the copyright holder saying it may be distributed
+under the terms of this General Public License. The "Program", below,
+refers to any such program or work, and a "work based on the Program"
+means either the Program or any derivative work under copyright law:
+that is to say, a work containing the Program or a portion of it,
+either verbatim or with modifications and/or translated into another
+language. (Hereinafter, translation is included without limitation in
+the term "modification".) Each licensee is addressed as "you".
+
+Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope. The act of
+running the Program is not restricted, and the output from the Program
+is covered only if its contents constitute a work based on the
+Program (independent of having been made by running the Program).
+Whether that is true depends on what the Program does.
+
+ 1. You may copy and distribute verbatim copies of the Program's
+source code as you receive it, in any medium, provided that you
+conspicuously and appropriately publish on each copy an appropriate
+copyright notice and disclaimer of warranty; keep intact all the
+notices that refer to this License and to the absence of any warranty;
+and give any other recipients of the Program a copy of this License
+along with the Program.
+
+You may charge a fee for the physical act of transferring a copy, and
+you may at your option offer warranty protection in exchange for a fee.
+
+ 2. You may modify your copy or copies of the Program or any portion
+of it, thus forming a work based on the Program, and copy and
+distribute such modifications or work under the terms of Section 1
+above, provided that you also meet all of these conditions:
+
+ a) You must cause the modified files to carry prominent notices
+ stating that you changed the files and the date of any change.
+
+ b) You must cause any work that you distribute or publish, that in
+ whole or in part contains or is derived from the Program or any
+ part thereof, to be licensed as a whole at no charge to all third
+ parties under the terms of this License.
+
+ c) If the modified program normally reads commands interactively
+ when run, you must cause it, when started running for such
+ interactive use in the most ordinary way, to print or display an
+ announcement including an appropriate copyright notice and a
+ notice that there is no warranty (or else, saying that you provide
+ a warranty) and that users may redistribute the program under
+ these conditions, and telling the user how to view a copy of this
+ License. (Exception: if the Program itself is interactive but
+ does not normally print such an announcement, your work based on
+ the Program is not required to print an announcement.)
+
+These requirements apply to the modified work as a whole. If
+identifiable sections of that work are not derived from the Program,
+and can be reasonably considered independent and separate works in
+themselves, then this License, and its terms, do not apply to those
+sections when you distribute them as separate works. But when you
+distribute the same sections as part of a whole which is a work based
+on the Program, the distribution of the whole must be on the terms of
+this License, whose permissions for other licensees extend to the
+entire whole, and thus to each and every part regardless of who wrote it.
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Program.
+
+In addition, mere aggregation of another work not based on the Program
+with the Program (or with a work based on the Program) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+ 3. You may copy and distribute the Program (or a work based on it,
+under Section 2) in object code or executable form under the terms of
+Sections 1 and 2 above provided that you also do one of the following:
+
+ a) Accompany it with the complete corresponding machine-readable
+ source code, which must be distributed under the terms of Sections
+ 1 and 2 above on a medium customarily used for software interchange; or,
+
+ b) Accompany it with a written offer, valid for at least three
+ years, to give any third party, for a charge no more than your
+ cost of physically performing source distribution, a complete
+ machine-readable copy of the corresponding source code, to be
+ distributed under the terms of Sections 1 and 2 above on a medium
+ customarily used for software interchange; or,
+
+ c) Accompany it with the information you received as to the offer
+ to distribute corresponding source code. (This alternative is
+ allowed only for noncommercial distribution and only if you
+ received the program in object code or executable form with such
+ an offer, in accord with Subsection b above.)
+
+The source code for a work means the preferred form of the work for
+making modifications to it. For an executable work, complete source
+code means all the source code for all modules it contains, plus any
+associated interface definition files, plus the scripts used to
+control compilation and installation of the executable. However, as a
+special exception, the source code distributed need not include
+anything that is normally distributed (in either source or binary
+form) with the major components (compiler, kernel, and so on) of the
+operating system on which the executable runs, unless that component
+itself accompanies the executable.
+
+If distribution of executable or object code is made by offering
+access to copy from a designated place, then offering equivalent
+access to copy the source code from the same place counts as
+distribution of the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+ 4. You may not copy, modify, sublicense, or distribute the Program
+except as expressly provided under this License. Any attempt
+otherwise to copy, modify, sublicense or distribute the Program is
+void, and will automatically terminate your rights under this License.
+However, parties who have received copies, or rights, from you under
+this License will not have their licenses terminated so long as such
+parties remain in full compliance.
+
+ 5. You are not required to accept this License, since you have not
+signed it. However, nothing else grants you permission to modify or
+distribute the Program or its derivative works. These actions are
+prohibited by law if you do not accept this License. Therefore, by
+modifying or distributing the Program (or any work based on the
+Program), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Program or works based on it.
+
+ 6. Each time you redistribute the Program (or any work based on the
+Program), the recipient automatically receives a license from the
+original licensor to copy, distribute or modify the Program subject to
+these terms and conditions. You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties to
+this License.
+
+ 7. If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License. If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Program at all. For example, if a patent
+license would not permit royalty-free redistribution of the Program by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Program.
+
+If any portion of this section is held invalid or unenforceable under
+any particular circumstance, the balance of the section is intended to
+apply and the section as a whole is intended to apply in other
+circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system, which is
+implemented by public license practices. Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+ 8. If the distribution and/or use of the Program is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Program under this License
+may add an explicit geographical distribution limitation excluding
+those countries, so that distribution is permitted only in or among
+countries not thus excluded. In such case, this License incorporates
+the limitation as if written in the body of this License.
+
+ 9. The Free Software Foundation may publish revised and/or new versions
+of the General Public License from time to time. Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+Each version is given a distinguishing version number. If the Program
+specifies a version number of this License which applies to it and "any
+later version", you have the option of following the terms and conditions
+either of that version or of any later version published by the Free
+Software Foundation. If the Program does not specify a version number of
+this License, you may choose any version ever published by the Free Software
+Foundation.
+
+ 10. If you wish to incorporate parts of the Program into other free
+programs whose distribution conditions are different, write to the author
+to ask for permission. For software which is copyrighted by the Free
+Software Foundation, write to the Free Software Foundation; we sometimes
+make exceptions for this. Our decision will be guided by the two goals
+of preserving the free status of all derivatives of our free software and
+of promoting the sharing and reuse of software generally.
+
+ NO WARRANTY
+
+ 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
+FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+REPAIR OR CORRECTION.
+
+ 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+ END OF TERMS AND CONDITIONS
+
+ How to Apply These Terms to Your New Programs
+
+ If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+ To do so, attach the following notices to the program. It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+ <one line to give the program's name and a brief idea of what it does.>
+ Copyright (C) <year> <name of author>
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+Also add information on how to contact you by electronic and paper mail.
+
+If the program is interactive, make it output a short notice like this
+when it starts in an interactive mode:
+
+ Gnomovision version 69, Copyright (C) <year> <name of author>
+ Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+ This is free software, and you are welcome to redistribute it
+ under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License. Of course, the commands you use may
+be called something other than `show w' and `show c'; they could even be
+mouse-clicks or menu items--whatever suits your program.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a "copyright disclaimer" for the program, if
+necessary. Here is a sample; alter the names:
+
+ Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+ `Gnomovision' (which makes passes at compilers) written by James Hacker.
+
+ <signature of Ty Coon>, 1 April 1989
+ Ty Coon, President of Vice
+
+This General Public License does not permit incorporating your program into
+proprietary programs. If your program is a subroutine library, you may
+consider it more useful to permit linking proprietary applications with the
+library. If this is what you want to do, use the GNU Lesser General
+Public License instead of this License.
diff --git a/ChangeLog b/ChangeLog
new file mode 100644
index 0000000..ed480ff
--- /dev/null
+++ b/ChangeLog
@@ -0,0 +1,235 @@
+2024-01-21 Antonio Diaz Diaz <antonio@gnu.org>
+
+ * Version 1.11 released.
+ * main.cc: Reformat file diagnostics as 'PROGRAM: FILE: MESSAGE'.
+ (show_option_error): New function showing argument and option name.
+ (main): Make -o preserve date/mode/owner if 1 input file.
+ (open_outstream): Create missing intermediate directories.
+ * configure, Makefile.in: New variable 'MAKEINFO'.
+
+2022-01-24 Antonio Diaz Diaz <antonio@gnu.org>
+
+ * Version 1.10 released.
+ * main.cc (getnum): Show option name and valid range if error.
+ (check_lib): Check that LZ_API_VERSION and LZ_version_string match.
+ * configure: Set variable LIBS.
+ * Improve several descriptions in manual, '--help', and man page.
+ * plzip.texi: Change GNU Texinfo category to 'Compression'.
+ (Reported by Alfred M. Szmidt).
+
+2021-01-03 Antonio Diaz Diaz <antonio@gnu.org>
+
+ * Version 1.9 released.
+ * main.cc (main): Report an error if a file name is empty.
+ Make '-o' behave like '-c', but writing to file instead of stdout.
+ Make '-c' and '-o' check whether the output is a terminal only once.
+ Do not open output if input is a terminal.
+ * main.cc: New option '--check-lib'.
+ * Replace 'decompressed', 'compressed' with 'out', 'in' in output.
+ * decompress.cc, dec_stream.cc, dec_stdout.cc:
+ Continue testing if any input file fails the test.
+ Show the largest dictionary size in a multimember file.
+ * main.cc: Show final diagnostic when testing multiple files.
+ * decompress.cc, dec_stream.cc [LZ_API_VERSION >= 1012]: Avoid
+ copying decompressed data when testing with lzlib 1.12 or newer.
+ * compress.cc, dec_stream.cc: Start only the worker threads required.
+ * dec_stream.cc: Splitter stops reading when trailing data is found.
+ Don't include trailing data in the compressed size shown.
+ Use plain comparison instead of Boyer-Moore to search for headers.
+ * lzip_index.cc: Improve messages for corruption in last header.
+ * decompress.cc: Shorten messages 'Data error' and 'Unexpected EOF'.
+ * main.cc: Set a valid invocation_name even if argc == 0.
+ * Document extraction from tar.lz in manual, '--help', and man page.
+ * plzip.texi (Introduction): Mention tarlz as an alternative.
+ * plzip.texi: Several fixes and improvements.
+ * testsuite: Add 8 new test files.
+
+2019-01-05 Antonio Diaz Diaz <antonio@gnu.org>
+
+ * Version 1.8 released.
+ * Rename File_* to Lzip_*.
+ * main.cc: New options '--in-slots' and '--out-slots'.
+ * main.cc: Increase default in_slots per worker from 2 to 4.
+ * main.cc: Increase default out_slots per worker from 32 to 64.
+ * lzip.h (Lzip_trailer): New function 'verify_consistency'.
+ * lzip_index.cc: Detect some kinds of corrupt trailers.
+ * main.cc (main): Check return value of close( infd ).
+ * plzip.texi: Improve description of '-0..-9', '-m', and '-s'.
+ * configure: New option '--with-mingw'.
+ * configure: Accept appending to CXXFLAGS; 'CXXFLAGS+=OPTIONS'.
+ * INSTALL: Document use of CXXFLAGS+='-D __USE_MINGW_ANSI_STDIO'.
+
+2018-02-07 Antonio Diaz Diaz <antonio@gnu.org>
+
+ * Version 1.7 released.
+ * compress.cc: Use 'LZ_compress_restart_member' and replace input
+ packet queue by a circular buffer to reduce memory fragmentation.
+ * compress.cc: Return one empty packet at a time to reduce mem use.
+ * main.cc: Reduce threads on 32 bit systems to use under 2.22 GiB.
+ * main.cc: New option '--loose-trailing'.
+ * Improve corrupt header detection to HD = 3 on seekable files.
+ (On all files with lzlib 1.10 or newer).
+ * Replace 'bits/byte' with inverse compression ratio in output.
+ * Show progress of decompression at verbosity level 2 (-vv).
+ * Show progress of (de)compression only if stderr is a terminal.
+ * main.cc: Do not add a second .lz extension to the arg of -o.
+ * Show dictionary size at verbosity level 4 (-vvvv).
+ * main.cc (cleanup_and_fail): Suppress messages from other threads.
+ * list.cc: Add missing '#include <pthread.h>'.
+ * plzip.texi: New chapter 'Output'.
+ * plzip.texi (Memory requirements): Add table.
+ * plzip.texi (Program design): Add a block diagram.
+
+2017-04-12 Antonio Diaz Diaz <antonio@gnu.org>
+
+ * Version 1.6 released.
+ * The option '-l, --list' has been ported from lziprecover.
+ * Don't allow mixing different operations (-d, -l or -t).
+ * main.cc: Continue testing if any input file is a terminal.
+ * lzip_index.cc: Improve detection of bad dict and trailing data.
+ * lzip.h: Unify messages for bad magic, trailing data, etc.
+
+2016-05-14 Antonio Diaz Diaz <antonio@gnu.org>
+
+ * Version 1.5 released.
+ * main.cc: New option '-a, --trailing-error'.
+ * main.cc (main): Delete '--output' file if infd is a terminal.
+ * main.cc (main): Don't use stdin more than once.
+ * plzip.texi: New chapters 'Trailing data' and 'Examples'.
+ * configure: Avoid warning on some shells when testing for g++.
+ * Makefile.in: Detect the existence of install-info.
+ * check.sh: A POSIX shell is required to run the tests.
+ * check.sh: Don't check error messages.
+
+2015-07-09 Antonio Diaz Diaz <antonio@gnu.org>
+
+ * Version 1.4 released.
+ * Option '-0' now uses the fast encoder of lzlib 1.7.
+
+2015-01-22 Antonio Diaz Diaz <antonio@gnu.org>
+
+ * Version 1.3 released.
+ * dec_stream.cc: Don't use output packets or muxer when testing.
+ * Make '-dvvv' and '-tvvv' show dictionary size like lzip.
+ * lzip.h: Add missing 'const' to the declaration of 'compress'.
+ * plzip.texi: New chapters 'Memory requirements' and
+ 'Minimum file sizes'.
+ * Makefile.in: New targets 'install*-compress'.
+
+2014-08-29 Antonio Diaz Diaz <antonio@gnu.org>
+
+ * Version 1.2 released.
+ * main.cc (close_and_set_permissions): Behave like 'cp -p'.
+ * dec_stdout.cc, dec_stream.cc: Make 'slot_av' a vector to limit
+ the number of packets produced by each worker individually.
+ * plzip.texinfo: Rename to plzip.texi.
+ * plzip.texi: Document the approximate amount of memory required.
+ * Change license to GPL version 2 or later.
+
+2013-09-17 Antonio Diaz Diaz <antonio@gnu.org>
+
+ * Version 1.1 released.
+ * Show progress of compression at verbosity level 2 (-vv).
+ * SIGUSR1 and SIGUSR2 are no longer used to signal a fatal error.
+
+2013-05-29 Antonio Diaz Diaz <antonio@gnu.org>
+
+ * Version 1.0 released.
+ * compress.cc: Change 'deliver_packet' to 'deliver_packets'.
+ * Scalability of decompression from/to regular files has been
+ increased by removing splitter and muxer when not needed.
+ * The number of worker threads is now limited to the number of
+ members when decompressing from a regular file.
+ * configure: Options now accept a separate argument.
+ * Makefile.in: New targets 'install-as-lzip' and 'install-bin'.
+ * main.cc: Use 'setmode' instead of '_setmode' on Windows and OS/2.
+ * main.cc: Define 'strtoull' to 'std::strtoul' on Windows.
+
+2012-03-01 Antonio Diaz Diaz <ant_diaz@teleline.es>
+
+ * Version 0.9 released.
+ * Minor fixes and cleanups.
+ * configure: Rename 'datadir' to 'datarootdir'.
+
+2012-01-17 Antonio Diaz Diaz <ant_diaz@teleline.es>
+
+ * Version 0.8 released.
+ * main.cc: New option '-F, --recompress'.
+ * decompress.cc (decompress): Show compression ratio.
+ * main.cc (close_and_set_permissions): Inability to change output
+ file attributes has been downgraded from error to warning.
+ * Small change in '--help' output and man page.
+ * Change quote characters in messages as advised by GNU Standards.
+ * main.cc: Set stdin/stdout in binary mode on OS2.
+ * compress.cc: Reduce memory use of compressed packets.
+ * decompress.cc: Use Boyer-Moore algorithm to search for headers.
+
+2010-12-03 Antonio Diaz Diaz <ant_diaz@teleline.es>
+
+ * Version 0.7 released.
+ * Match length limits set by options -1 to -9 have been changed
+ to match those of lzip 1.11.
+ * decompress.cc: A limit has been set on the number of packets
+ produced by workers to limit the amount of memory used.
+ * main.cc (open_instream): Don't show the message
+ " and '--stdout' was not specified" for directories, etc.
+ Exit with status 1 if any output file exists and is skipped.
+ * main.cc: Fix warning about fchown return value being ignored.
+ * testsuite: Rename 'test1' to 'test.txt'. New tests.
+
+2010-03-20 Antonio Diaz Diaz <ant_diaz@teleline.es>
+
+ * Version 0.6 released.
+ * Small portability fixes.
+ * plzip.texinfo: New chapter 'Program Design'.
+ Add missing description of option '-n, --threads'.
+ * Fix debug statistics.
+
+2010-02-10 Antonio Diaz Diaz <ant_diaz@teleline.es>
+
+ * Version 0.5 released.
+ * Parallel decompression has been implemented.
+
+2010-01-31 Antonio Diaz Diaz <ant_diaz@teleline.es>
+
+ * Version 0.4 released.
+ * main.cc (show_version): Show the version of lzlib being used.
+ * Code reorganization. Class Packet_courier now coordinates data
+ movement and synchronization among threads.
+
+2010-01-24 Antonio Diaz Diaz <ant_diaz@teleline.es>
+
+ * Version 0.3 released.
+ * New option '-B, --data-size'.
+ * Output file is now removed if plzip is interrupted.
+ * This version automatically chooses the smallest possible
+ dictionary size for each member during compression, saving
+ memory during decompression.
+ * main.cc: New constant 'o_binary'.
+
+2010-01-17 Antonio Diaz Diaz <ant_diaz@teleline.es>
+
+ * Version 0.2 released.
+ * New options '-s, --dictionary-size' and '-m, --match-length'.
+ * 'lacos_rbtree' has been replaced with a circular buffer.
+
+2009-12-05 Antonio Diaz Diaz <ant_diaz@teleline.es>
+
+ * Version 0.1 released.
+ * This version is based on llzip-0.03 (2009-11-21), written by
+ Laszlo Ersek <lacos@caesar.elte.hu>. Thanks Laszlo!
+ From llzip-0.03/README:
+
+ llzip is a hack on my lbzip2-0.17 release. I ripped out the
+ decompression stuff, and replaced the bzip2 compression with
+ the lzma compression from lzlib-0.7. llzip is mainly meant
+ as an assisted fork point for the lzip developers.
+ Nonetheless, I tried to review the diff against lbzip2-0.17
+ thoroughly, and I think llzip should be usable on its own
+ until something better appears on the net.
+
+
+Copyright (C) 2009-2024 Antonio Diaz Diaz.
+
+This file is a collection of facts, and thus it is not copyrightable, but just
+in case, you have unlimited permission to copy, distribute, and modify it.
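
The 1.9 entries in the ChangeLog above say that the splitter in dec_stream.cc now searches for member headers with plain comparison instead of Boyer-Moore, and that plzip shows the largest dictionary size in a multimember file. A minimal C++ sketch of both ideas follows. It assumes the member header layout given in the lzip file format description (the 4-byte magic "LZIP", a version byte equal to 1, and one coded dictionary-size byte). The names find_header, decode_dict_size and largest_dict_size are hypothetical; this is not plzip's own code.

```cpp
#include <cassert>   // for the checks in the usage example
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed lzip member header layout: bytes 0-3 hold the magic "LZIP",
// byte 4 the format version (1), byte 5 the coded dictionary size.
const std::size_t header_size = 6;
const std::uint8_t lzip_magic[4] = { 'L', 'Z', 'I', 'P' };

// Decode the coded dictionary size: bits 4-0 are the base-2 logarithm of a
// base size, bits 7-5 say how many sixteenths of that base to subtract.
// Returns 0 if the result lies outside the valid range (4 KiB to 512 MiB).
std::uint32_t decode_dict_size(std::uint8_t coded) {
  const unsigned log2_size = coded & 0x1F;
  if (log2_size < 12 || log2_size > 29) return 0;
  std::uint32_t size = std::uint32_t(1) << log2_size;
  size -= (size / 16) * ((coded >> 5) & 7);
  return (size >= (std::uint32_t(1) << 12)) ? size : 0;
}

// Hypothetical helper: return the offset of the first plausible member
// header at or after 'from', or -1 if there is none. Each position is
// checked with plain byte comparisons; no Boyer-Moore skip table is built.
long find_header(const std::vector<std::uint8_t>& buf, std::size_t from) {
  for (std::size_t i = from; i + header_size <= buf.size(); ++i) {
    if (buf[i] == lzip_magic[0] && buf[i + 1] == lzip_magic[1] &&
        buf[i + 2] == lzip_magic[2] && buf[i + 3] == lzip_magic[3] &&
        buf[i + 4] == 1 && decode_dict_size(buf[i + 5]) != 0)
      return static_cast<long>(i);
  }
  return -1;
}

// Hypothetical helper: visit every header found in 'buf' and return the
// largest dictionary size declared by any of them (0 if none is found).
std::uint32_t largest_dict_size(const std::vector<std::uint8_t>& buf) {
  std::uint32_t largest = 0;
  for (long pos = find_header(buf, 0); pos >= 0;
       pos = find_header(buf, static_cast<std::size_t>(pos) + header_size)) {
    const std::uint32_t size =
        decode_dict_size(buf[static_cast<std::size_t>(pos) + 5]);
    if (size > largest) largest = size;
  }
  return largest;
}
```

For example, the coded byte 0xD3 decodes to 2^19 - 6 * 2^15 bytes, that is 320 KiB. A naive scan like this can also match the bytes "LZIP" where they happen to occur inside compressed data, so a real decoder has to validate each candidate further, for example against the member size stored in the member trailer.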
diff --git a/INSTALL b/INSTALL
new file mode 100644
index 0000000..fe514a0
--- /dev/null
+++ b/INSTALL
@@ -0,0 +1,92 @@
+Requirements
+------------
+You will need a C++98 compiler with support for 'long long', and the
+compression library lzlib installed. (gcc 3.3.6 or newer is recommended).
+I use gcc 6.1.0 and 3.3.6, but the code should compile with any standards
+compliant compiler.
+Gcc is available at http://gcc.gnu.org.
+Lzlib is available at http://www.nongnu.org/lzip/lzlib.html.
+
+Lzlib must be version 1.0 or newer, but the fast encoder requires lzlib 1.7
+or newer, the Hamming distance (HD) = 3 detection of corrupt headers in
+non-seekable multimember files requires lzlib 1.10 or newer, and the
+'no copy' optimization for testing requires lzlib 1.12 or newer.
+
+The operating system must allow signal handlers read access to objects with
+static storage duration so that the cleanup handler for Control-C can delete
+the partial output file.
+
+
+Procedure
+---------
+1. Unpack the archive if you have not done so already:
+
+ tar -xf plzip[version].tar.lz
+or
+ lzip -cd plzip[version].tar.lz | tar -xf -
+
+This creates the directory ./plzip[version] containing the source code
+extracted from the archive.
+
+2. Change to plzip directory and run configure.
+ (Try 'configure --help' for usage instructions).
+
+ cd plzip[version]
+ ./configure
+
+ To link against a lzlib not installed in a standard place, use:
+
+ ./configure CPPFLAGS='-I <includedir>' LDFLAGS='-L <libdir>'
+
+ (Replace <includedir> with the directory containing the file lzlib.h,
+ and <libdir> with the directory containing the file liblz.a).
+
+ If you are compiling on MinGW, use --with-mingw (note that the Windows
+ I/O functions used with MinGW are not guaranteed to be thread safe):
+
+ ./configure --with-mingw CXXFLAGS+='-D __USE_MINGW_ANSI_STDIO'
+
+3. Run make.
+
+ make
+
+4. Optionally, type 'make check' to run the tests that come with plzip.
+
+5. Type 'make install' to install the program and any data files and
+ documentation. You need root privileges to install into a prefix owned
+ by root.
+
+ Or type 'make install-compress', which additionally compresses the
+ info manual and the man page after installation.
+ (Installing compressed docs may become the default in the future).
+
+ You can install only the program, the info manual, or the man page by
+ typing 'make install-bin', 'make install-info', or 'make install-man'
+ respectively.
+
+ Instead of 'make install', you can type 'make install-as-lzip' to
+ install the program and any data files and documentation, and link
+ the program to the name 'lzip'.
+
+
+Another way
+-----------
+You can also compile plzip into a separate directory.
+To do this, you must use a version of 'make' that supports the variable
+'VPATH', such as GNU 'make'. 'cd' to the directory where you want the
+object files and executables to go and run the 'configure' script.
+'configure' automatically checks for the source code in '.', in '..', and
+in the directory that 'configure' is in.
+
+'configure' recognizes the option '--srcdir=DIR' to control where to look
+for the source code. Usually 'configure' can determine that directory
+automatically.
+
+After running 'configure', you can run 'make' and 'make install' as
+explained above.
+
+
+Copyright (C) 2009-2024 Antonio Diaz Diaz.
+
+This file is free documentation: you have unlimited permission to copy,
+distribute, and modify it.
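The lzlib version requirements listed in INSTALL can be checked mechanically. The C++ sketch below is illustrative only. `version_number`, `features_for`, and the `Lzlib_features` struct are hypothetical names that are not part of plzip or lzlib, and the version string is passed in as a literal rather than obtained from an installed lzlib. Versions must be compared numerically, because a plain string comparison would rank "1.10" below "1.7".

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical summary of the optional features that INSTALL ties to lzlib versions.
struct Lzlib_features
  {
  bool usable;        // lzlib 1.0 or newer (minimum requirement)
  bool fast_encoder;  // lzlib 1.7 or newer
  bool hd3_headers;   // lzlib 1.10 or newer (HD = 3 header check, non-seekable input)
  bool no_copy_test;  // lzlib 1.12 or newer ('no copy' optimization for testing)
  };

// Encode "major.minor" as major * 1000 + minor so that 1.10 > 1.7.
long version_number( const char * const text )
  {
  assert( text != 0 );
  char * end = 0;
  const long major = std::strtol( text, &end, 10 );
  long minor = 0;
  if( *end == '.' ) minor = std::strtol( end + 1, 0, 10 );
  return major * 1000 + minor;
  }

Lzlib_features features_for( const char * const version )
  {
  const long v = version_number( version );
  Lzlib_features f;
  f.usable = v >= 1000;
  f.fast_encoder = v >= 1007;
  f.hd3_headers = v >= 1010;
  f.no_copy_test = v >= 1012;
  return f;
  }
```

For example, `features_for( "1.9" )` reports the fast encoder as available but not the HD = 3 header detection, which needs lzlib 1.10.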
diff --git a/Makefile.in b/Makefile.in
new file mode 100644
index 0000000..bb3afc0
--- /dev/null
+++ b/Makefile.in
@@ -0,0 +1,145 @@
+
+DISTNAME = $(pkgname)-$(pkgversion)
+INSTALL = install
+INSTALL_PROGRAM = $(INSTALL) -m 755
+INSTALL_DATA = $(INSTALL) -m 644
+INSTALL_DIR = $(INSTALL) -d -m 755
+SHELL = /bin/sh
+CAN_RUN_INSTALLINFO = $(SHELL) -c "install-info --version" > /dev/null 2>&1
+
+objs = arg_parser.o lzip_index.o list.o compress.o dec_stdout.o \
+ dec_stream.o decompress.o main.o
+
+
+.PHONY : all install install-bin install-info install-man \
+ install-strip install-compress install-strip-compress \
+ install-bin-strip install-info-compress install-man-compress \
+ install-as-lzip \
+ uninstall uninstall-bin uninstall-info uninstall-man \
+ doc info man check dist clean distclean
+
+all : $(progname)
+
+$(progname) : $(objs)
+ $(CXX) $(CXXFLAGS) $(LDFLAGS) -o $@ $(objs) $(LIBS)
+
+decompress.o : decompress.cc
+ $(CXX) $(CPPFLAGS) $(CXXFLAGS) $(with_mingw) -c -o $@ $<
+
+main.o : main.cc
+ $(CXX) $(CPPFLAGS) $(CXXFLAGS) -DPROGVERSION=\"$(pkgversion)\" -c -o $@ $<
+
+%.o : %.cc
+ $(CXX) $(CPPFLAGS) $(CXXFLAGS) -c -o $@ $<
+
+# prevent 'make' from trying to remake source files
+$(VPATH)/configure $(VPATH)/Makefile.in $(VPATH)/doc/$(pkgname).texi : ;
+%.h %.cc : ;
+
+$(objs) : Makefile
+arg_parser.o : arg_parser.h
+compress.o : lzip.h
+dec_stdout.o : lzip.h lzip_index.h
+dec_stream.o : lzip.h
+decompress.o : lzip.h lzip_index.h
+list.o : lzip.h lzip_index.h
+lzip_index.o : lzip.h lzip_index.h
+main.o : arg_parser.h lzip.h
+
+doc : info man
+
+info : $(VPATH)/doc/$(pkgname).info
+
+$(VPATH)/doc/$(pkgname).info : $(VPATH)/doc/$(pkgname).texi
+ cd $(VPATH)/doc && $(MAKEINFO) $(pkgname).texi
+
+man : $(VPATH)/doc/$(progname).1
+
+$(VPATH)/doc/$(progname).1 : $(progname)
+ help2man -n 'reduces the size of files' -o $@ ./$(progname)
+
+Makefile : $(VPATH)/configure $(VPATH)/Makefile.in
+ ./config.status
+
+check : all
+ @$(VPATH)/testsuite/check.sh $(VPATH)/testsuite $(pkgversion)
+
+install : install-bin install-info install-man
+install-strip : install-bin-strip install-info install-man
+install-compress : install-bin install-info-compress install-man-compress
+install-strip-compress : install-bin-strip install-info-compress install-man-compress
+
+install-bin : all
+ if [ ! -d "$(DESTDIR)$(bindir)" ] ; then $(INSTALL_DIR) "$(DESTDIR)$(bindir)" ; fi
+ $(INSTALL_PROGRAM) ./$(progname) "$(DESTDIR)$(bindir)/$(progname)"
+
+install-bin-strip : all
+ $(MAKE) INSTALL_PROGRAM='$(INSTALL_PROGRAM) -s' install-bin
+
+install-info :
+ if [ ! -d "$(DESTDIR)$(infodir)" ] ; then $(INSTALL_DIR) "$(DESTDIR)$(infodir)" ; fi
+ -rm -f "$(DESTDIR)$(infodir)/$(pkgname).info"*
+ $(INSTALL_DATA) $(VPATH)/doc/$(pkgname).info "$(DESTDIR)$(infodir)/$(pkgname).info"
+ -if $(CAN_RUN_INSTALLINFO) ; then \
+ install-info --info-dir="$(DESTDIR)$(infodir)" "$(DESTDIR)$(infodir)/$(pkgname).info" ; \
+ fi
+
+install-info-compress : install-info
+ lzip -v -9 "$(DESTDIR)$(infodir)/$(pkgname).info"
+
+install-man :
+ if [ ! -d "$(DESTDIR)$(mandir)/man1" ] ; then $(INSTALL_DIR) "$(DESTDIR)$(mandir)/man1" ; fi
+ -rm -f "$(DESTDIR)$(mandir)/man1/$(progname).1"*
+ $(INSTALL_DATA) $(VPATH)/doc/$(progname).1 "$(DESTDIR)$(mandir)/man1/$(progname).1"
+
+install-man-compress : install-man
+ lzip -v -9 "$(DESTDIR)$(mandir)/man1/$(progname).1"
+
+install-as-lzip : install
+ -rm -f "$(DESTDIR)$(bindir)/lzip"
+ cd "$(DESTDIR)$(bindir)" && ln -s $(progname) lzip
+
+uninstall : uninstall-man uninstall-info uninstall-bin
+
+uninstall-bin :
+ -rm -f "$(DESTDIR)$(bindir)/$(progname)"
+
+uninstall-info :
+ -if $(CAN_RUN_INSTALLINFO) ; then \
+ install-info --info-dir="$(DESTDIR)$(infodir)" --remove "$(DESTDIR)$(infodir)/$(pkgname).info" ; \
+ fi
+ -rm -f "$(DESTDIR)$(infodir)/$(pkgname).info"*
+
+uninstall-man :
+ -rm -f "$(DESTDIR)$(mandir)/man1/$(progname).1"*
+
+dist : doc
+ ln -sf $(VPATH) $(DISTNAME)
+ tar -Hustar --owner=root --group=root -cvf $(DISTNAME).tar \
+ $(DISTNAME)/AUTHORS \
+ $(DISTNAME)/COPYING \
+ $(DISTNAME)/ChangeLog \
+ $(DISTNAME)/INSTALL \
+ $(DISTNAME)/Makefile.in \
+ $(DISTNAME)/NEWS \
+ $(DISTNAME)/README \
+ $(DISTNAME)/configure \
+ $(DISTNAME)/doc/$(progname).1 \
+ $(DISTNAME)/doc/$(pkgname).info \
+ $(DISTNAME)/doc/$(pkgname).texi \
+ $(DISTNAME)/*.h \
+ $(DISTNAME)/*.cc \
+ $(DISTNAME)/testsuite/check.sh \
+ $(DISTNAME)/testsuite/test.txt \
+ $(DISTNAME)/testsuite/fox.lz \
+ $(DISTNAME)/testsuite/fox_*.lz \
+ $(DISTNAME)/testsuite/test.txt.lz \
+ $(DISTNAME)/testsuite/test_em.txt.lz
+ rm -f $(DISTNAME)
+ lzip -v -9 $(DISTNAME).tar
+
+clean :
+ -rm -f $(progname) $(objs)
+
+distclean : clean
+ -rm -f Makefile config.status *.tar *.tar.lz
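The rule for main.o passes the package version to the compiler as `-DPROGVERSION=\"$(pkgversion)\"`. The escaped quotes survive the shell, so the macro expands to a string literal such as "1.11". The following minimal sketch shows how such a macro might be consumed. main.cc is not reproduced here, so the fallback definition and the names `version_line` and `show_version` are assumptions for illustration.

```cpp
#include <cstdio>

// Fallback for compiling by hand without the Makefile's -DPROGVERSION flag
// (hypothetical; not taken from main.cc).
#ifndef PROGVERSION
#define PROGVERSION "unknown"
#endif

// This works only because PROGVERSION is a string literal: the compiler
// concatenates adjacent literals.
const char * const version_line = "plzip " PROGVERSION;

void show_version()
  { std::printf( "%s\n", version_line ); }
```

Compiling a file like this with `-DPROGVERSION=\"1.11\"` would embed "plzip 1.11". Without the flag, the fallback "unknown" is used.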
diff --git a/NEWS b/NEWS
new file mode 100644
index 0000000..3a2ef98
--- /dev/null
+++ b/NEWS
@@ -0,0 +1,14 @@
+Changes in version 1.11:
+
+File diagnostics have been reformatted as 'PROGRAM: FILE: MESSAGE'.
+
+Diagnostics caused by invalid arguments to command-line options now show the
+argument and the name of the option.
+
+The option '-o, --output' now preserves dates, permissions, and ownership of
+the file when (de)compressing exactly one file.
+
+The option '-o, --output' now creates missing intermediate directories when
+writing to a file.
+
+The variable MAKEINFO has been added to configure and Makefile.in.
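According to NEWS, '-o, --output' now creates missing intermediate directories. The POSIX-only sketch below shows one way to do that for a path such as 'out/2024/data.lz'. It is not plzip's own code, `make_parent_dirs` is a hypothetical name, and a MinGW build would need a different mkdir call.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <sys/stat.h>
#include <sys/types.h>

// Create every directory component of 'path' before its last '/'.
// Returns false if a component cannot be created or exists as a non-directory.
bool make_parent_dirs( const std::string & path )
  {
  assert( !path.empty() );
  // Start at 1 so that the root of an absolute path ("/") is never created.
  for( std::string::size_type i = 1; i < path.size(); ++i )
    {
    if( path[i] != '/' || path[i-1] == '/' ) continue;  // skip runs of '/'
    const std::string dir = path.substr( 0, i );
    if( mkdir( dir.c_str(), 0777 ) == 0 ) continue;      // created (umask applies)
    struct stat st;
    if( errno != EEXIST || stat( dir.c_str(), &st ) != 0 || !S_ISDIR( st.st_mode ) )
      return false;
    }
  return true;
  }
```

Only the directory components are created. The final component is left for the caller to open as the output file.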
diff --git a/README b/README
new file mode 100644
index 0000000..90d1fc9
--- /dev/null
+++ b/README
@@ -0,0 +1,112 @@
+Description
+
+Plzip is a massively parallel (multi-threaded) implementation of lzip,
+compatible with lzip 1.4 or newer. Plzip uses the compression library lzlib.
+
+Lzip is a lossless data compressor with a user interface similar to the one
+of gzip or bzip2. Lzip uses a simplified form of the 'Lempel-Ziv-Markov
+chain-Algorithm' (LZMA) stream format to maximize interoperability. The
+maximum dictionary size is 512 MiB so that any lzip file can be decompressed
+on 32-bit machines. Lzip provides accurate and robust 3-factor integrity
+checking. Lzip can compress about as fast as gzip (lzip -0) or compress most
+files more than bzip2 (lzip -9). Decompression speed is intermediate between
+gzip and bzip2. Lzip is better than gzip and bzip2 from a data recovery
+perspective. Lzip has been designed, written, and tested with great care to
+replace gzip and bzip2 as the standard general-purpose compressed format for
+Unix-like systems.
+
+Plzip can compress/decompress large files on multiprocessor machines much
+faster than lzip, at the cost of a slightly reduced compression ratio (0.4
+to 2 percent larger compressed files). Note that the number of usable
+threads is limited by file size; on files larger than a few GB plzip can use
+hundreds of processors, but on files of only a few MB plzip is no faster
+than lzip.
+
+For creation and manipulation of compressed tar archives, tarlz can be more
+efficient than using tar and plzip because tarlz is able to keep the
+alignment between tar members and lzip members.
+
+When compressing, plzip divides the input file into chunks and compresses as
+many chunks simultaneously as worker threads are chosen, creating a
+multimember compressed file. Each chunk is compressed in-place (using the
+same buffer for input and output), reducing the amount of RAM required.
+
+When decompressing, plzip decompresses as many members simultaneously as
+worker threads are chosen. Files that were compressed with lzip are not
+decompressed faster than using lzip (unless the option '-b' was used)
+because lzip usually produces single-member files, which can't be
+decompressed in parallel.
+
+The lzip file format is designed for data sharing and long-term archiving,
+taking into account both data integrity and decoder availability:
+
+ * The lzip format provides very safe integrity checking and some data
+ recovery means. The program lziprecover can repair bit flip errors
+ (one of the most common forms of data corruption) in lzip files, and
+ provides data recovery capabilities, including error-checked merging
+ of damaged copies of a file.
+
+ * The lzip format is as simple as possible (but not simpler). The lzip
+ manual provides the source code of a simple decompressor along with a
+ detailed explanation of how it works, so that with only the help of the
+ lzip manual it would be possible for a digital archaeologist to extract
+ the data from a lzip file long after quantum computers eventually
+ render LZMA obsolete.
+
+ * Additionally the lzip reference implementation is copylefted, which
+ guarantees that it will remain free forever.
+
+A nice feature of the lzip format is that a corrupt byte is easier to repair
+the nearer it is to the beginning of the file. Therefore, with the help of
+lziprecover, losing an entire archive just because of a corrupt byte near
+the beginning is a thing of the past.
+
+Plzip uses the same well-defined exit status values used by lzip, which
+makes it safer than compressors returning ambiguous warning values (like
+gzip) when it is used as a back end for other programs like tar or zutils.
+
+Plzip automatically uses for each file the largest dictionary size that does
+not exceed either the file size or the limit given. Keep in mind that the
+decompression memory requirement is affected at compression time by the
+choice of dictionary size limit.
+
+When compressing, plzip replaces every file given in the command line
+with a compressed version of itself, with the name "original_name.lz".
+When decompressing, plzip attempts to guess the name for the decompressed
+file from that of the compressed file as follows:
+
+filename.lz becomes filename
+filename.tlz becomes filename.tar
+anyothername becomes anyothername.out
+
+(De)compressing a file is much like copying or moving it. Therefore plzip
+preserves the access and modification dates, permissions, and, if you have
+appropriate privileges, ownership of the file just as 'cp -p' does. (If the
+user ID or the group ID can't be duplicated, the file permission bits
+S_ISUID and S_ISGID are cleared).
+
+Plzip is able to read from some types of non-regular files if either the
+option '-c' or the option '-o' is specified.
+
+If no file names are specified, plzip compresses (or decompresses) from
+standard input to standard output. Plzip refuses to read compressed data
+from a terminal or write compressed data to a terminal, as this would be
+entirely incomprehensible and might leave the terminal in an abnormal state.
+
+Plzip correctly decompresses a file which is the concatenation of two or
+more compressed files. The result is the concatenation of the corresponding
+decompressed files. Integrity testing of concatenated compressed files is
+also supported.
+
+LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never have
+been compressed. Decompressed is used to refer to data which have undergone
+the process of decompression.
+
+
+Copyright (C) 2009-2024 Antonio Diaz Diaz.
+
+This file is free documentation: you have unlimited permission to copy,
+distribute, and modify it.
+
+The file Makefile.in is a data file used by configure to produce the Makefile.
+It has the same copyright owner and permissions as configure itself.
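Two rules from the README lend themselves to a short illustration: the table that maps compressed names to decompressed names, and the choice of dictionary size. The helpers below are hypothetical sketches, not plzip functions. The dictionary clamp mirrors the expression used in cworker in compress.cc (`std::max( std::min( dictionary_size, packet->size ), LZ_min_dictionary_size() )`). That expression is applied per chunk and skipped for the fast encoder settings. The 4 KiB minimum below stands in for calling `LZ_min_dictionary_size()`, and the sketch does not model how the encoder stores the size in the member header.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Naming rule from the README:
//   filename.lz  -> filename
//   filename.tlz -> filename.tar
//   anyothername -> anyothername.out
std::string decompressed_name( const std::string & name )
  {
  // True if 'name' ends with 'suffix' and has something before it.
  const auto ends_with = [&name]( const std::string & suffix )
    { return name.size() > suffix.size() &&
             name.compare( name.size() - suffix.size(), suffix.size(), suffix ) == 0; };
  if( ends_with( ".tlz" ) ) return name.substr( 0, name.size() - 4 ) + ".tar";
  if( ends_with( ".lz" ) ) return name.substr( 0, name.size() - 3 );
  return name + ".out";
  }

// Largest dictionary size exceeding neither the data size nor the limit,
// but never below the assumed 4 KiB minimum.
long long choose_dictionary_size( const long long limit, const long long data_size )
  {
  assert( limit > 0 && data_size >= 0 );
  const long long min_size = 4096;
  return std::max( std::min( limit, data_size ), min_size );
  }
```

With these helpers, 'notes.txt.lz' becomes 'notes.txt', 'backup.tlz' becomes 'backup.tar', and 'data.bin' becomes 'data.bin.out'. A 100000-byte file compressed with an 8 MiB limit gets a 100000-byte dictionary.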
diff --git a/arg_parser.cc b/arg_parser.cc
new file mode 100644
index 0000000..0c04d8e
--- /dev/null
+++ b/arg_parser.cc
@@ -0,0 +1,197 @@
+/* Arg_parser - POSIX/GNU command-line argument parser. (C++ version)
+ Copyright (C) 2006-2024 Antonio Diaz Diaz.
+
+ This library is free software. Redistribution and use in source and
+ binary forms, with or without modification, are permitted provided
+ that the following conditions are met:
+
+ 1. Redistributions of source code must retain the above copyright
+ notice, this list of conditions, and the following disclaimer.
+
+ 2. Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions, and the following disclaimer in the
+ documentation and/or other materials provided with the distribution.
+
+ This library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+*/
+
+#include <cstring>
+#include <string>
+#include <vector>
+
+#include "arg_parser.h"
+
+
+bool Arg_parser::parse_long_option( const char * const opt, const char * const arg,
+ const Option options[], int & argind )
+ {
+ unsigned len;
+ int index = -1;
+ bool exact = false, ambig = false;
+
+ for( len = 0; opt[len+2] && opt[len+2] != '='; ++len ) ;
+
+ // Test all long options for either exact match or abbreviated matches.
+ for( int i = 0; options[i].code != 0; ++i )
+ if( options[i].long_name &&
+ std::strncmp( options[i].long_name, &opt[2], len ) == 0 )
+ {
+ if( std::strlen( options[i].long_name ) == len ) // Exact match found
+ { index = i; exact = true; break; }
+ else if( index < 0 ) index = i; // First nonexact match found
+ else if( options[index].code != options[i].code ||
+ options[index].has_arg != options[i].has_arg )
+ ambig = true; // Second or later nonexact match found
+ }
+
+ if( ambig && !exact )
+ {
+ error_ = "option '"; error_ += opt; error_ += "' is ambiguous";
+ return false;
+ }
+
+ if( index < 0 ) // nothing found
+ {
+ error_ = "unrecognized option '"; error_ += opt; error_ += '\'';
+ return false;
+ }
+
+ ++argind;
+ data.push_back( Record( options[index].code, options[index].long_name ) );
+
+ if( opt[len+2] ) // '--<long_option>=<argument>' syntax
+ {
+ if( options[index].has_arg == no )
+ {
+ error_ = "option '--"; error_ += options[index].long_name;
+ error_ += "' doesn't allow an argument";
+ return false;
+ }
+ if( options[index].has_arg == yes && !opt[len+3] )
+ {
+ error_ = "option '--"; error_ += options[index].long_name;
+ error_ += "' requires an argument";
+ return false;
+ }
+ data.back().argument = &opt[len+3];
+ return true;
+ }
+
+ if( options[index].has_arg == yes )
+ {
+ if( !arg || !arg[0] )
+ {
+ error_ = "option '--"; error_ += options[index].long_name;
+ error_ += "' requires an argument";
+ return false;
+ }
+ ++argind; data.back().argument = arg;
+ return true;
+ }
+
+ return true;
+ }
+
+
+bool Arg_parser::parse_short_option( const char * const opt, const char * const arg,
+ const Option options[], int & argind )
+ {
+ int cind = 1; // character index in opt
+
+ while( cind > 0 )
+ {
+ int index = -1;
+ const unsigned char c = opt[cind];
+
+ if( c != 0 )
+ for( int i = 0; options[i].code; ++i )
+ if( c == options[i].code )
+ { index = i; break; }
+
+ if( index < 0 )
+ {
+ error_ = "invalid option -- '"; error_ += c; error_ += '\'';
+ return false;
+ }
+
+ data.push_back( Record( c ) );
+ if( opt[++cind] == 0 ) { ++argind; cind = 0; } // opt finished
+
+ if( options[index].has_arg != no && cind > 0 && opt[cind] )
+ {
+ data.back().argument = &opt[cind]; ++argind; cind = 0;
+ }
+ else if( options[index].has_arg == yes )
+ {
+ if( !arg || !arg[0] )
+ {
+ error_ = "option requires an argument -- '"; error_ += c;
+ error_ += '\'';
+ return false;
+ }
+ data.back().argument = arg; ++argind; cind = 0;
+ }
+ }
+ return true;
+ }
+
+
+Arg_parser::Arg_parser( const int argc, const char * const argv[],
+ const Option options[], const bool in_order )
+ {
+ if( argc < 2 || !argv || !options ) return;
+
+ std::vector< const char * > non_options; // skipped non-options
+ int argind = 1; // index in argv
+
+ while( argind < argc )
+ {
+ const unsigned char ch1 = argv[argind][0];
+ const unsigned char ch2 = ch1 ? argv[argind][1] : 0;
+
+ if( ch1 == '-' && ch2 ) // we found an option
+ {
+ const char * const opt = argv[argind];
+ const char * const arg = ( argind + 1 < argc ) ? argv[argind+1] : 0;
+ if( ch2 == '-' )
+ {
+ if( !argv[argind][2] ) { ++argind; break; } // we found "--"
+ else if( !parse_long_option( opt, arg, options, argind ) ) break;
+ }
+ else if( !parse_short_option( opt, arg, options, argind ) ) break;
+ }
+ else
+ {
+ if( in_order ) data.push_back( Record( argv[argind++] ) );
+ else non_options.push_back( argv[argind++] );
+ }
+ }
+ if( !error_.empty() ) data.clear();
+ else
+ {
+ for( unsigned i = 0; i < non_options.size(); ++i )
+ data.push_back( Record( non_options[i] ) );
+ while( argind < argc )
+ data.push_back( Record( argv[argind++] ) );
+ }
+ }
+
+
+Arg_parser::Arg_parser( const char * const opt, const char * const arg,
+ const Option options[] )
+ {
+ if( !opt || !opt[0] || !options ) return;
+
+ if( opt[0] == '-' && opt[1] ) // we found an option
+ {
+ int argind = 1; // dummy
+ if( opt[1] == '-' )
+ { if( opt[2] ) parse_long_option( opt, arg, options, argind ); }
+ else
+ parse_short_option( opt, arg, options, argind );
+ if( !error_.empty() ) data.clear();
+ }
+ else data.push_back( Record( opt ) );
+ }
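parse_long_option accepts any unambiguous abbreviation of a long option. The standalone sketch below reproduces only that matching step, so it can be tried in isolation. An exact match always wins, the first abbreviated match is remembered, and a second abbreviated match with a different code makes the abbreviation ambiguous. Unlike the real function, it ignores `has_arg` when judging ambiguity and returns integer codes instead of setting error_. `Long_opt` and `match_long_option` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Minimal option record for this sketch (Arg_parser::Option also has has_arg).
struct Long_opt { int code; const char * long_name; };

// Match 'name' (the text after "--", possibly followed by "=<argument>")
// against a table terminated by code == 0.
// Returns the matching index, -1 if nothing matches, or -2 if ambiguous.
int match_long_option( const Long_opt options[], const char * const name )
  {
  assert( options != 0 && name != 0 );
  std::size_t len = 0;
  while( name[len] && name[len] != '=' ) ++len;    // stop before "=<argument>"
  int index = -1;
  bool ambig = false;
  for( int i = 0; options[i].code != 0; ++i )
    {
    if( !options[i].long_name ||
        std::strncmp( options[i].long_name, name, len ) != 0 ) continue;
    if( std::strlen( options[i].long_name ) == len ) return i;  // exact match wins
    if( index < 0 ) index = i;                     // first abbreviated match
    else if( options[index].code != options[i].code ) ambig = true;
    }
  return ambig ? -2 : index;
  }
```

Given a table containing 'data-size', 'decompress', and 'dictionary-size', '--dec' selects 'decompress' and '--dic=64KiB' selects 'dictionary-size' (the text after '=' is ignored while matching). '--d' is reported as ambiguous.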
diff --git a/arg_parser.h b/arg_parser.h
new file mode 100644
index 0000000..1eeec9a
--- /dev/null
+++ b/arg_parser.h
@@ -0,0 +1,110 @@
+/* Arg_parser - POSIX/GNU command-line argument parser. (C++ version)
+ Copyright (C) 2006-2024 Antonio Diaz Diaz.
+
+ This library is free software. Redistribution and use in source and
+ binary forms, with or without modification, are permitted provided
+ that the following conditions are met:
+
+ 1. Redistributions of source code must retain the above copyright
+ notice, this list of conditions, and the following disclaimer.
+
+ 2. Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions, and the following disclaimer in the
+ documentation and/or other materials provided with the distribution.
+
+ This library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+*/
+
+/* Arg_parser reads the arguments in 'argv' and creates a number of
+ option codes, option arguments, and non-option arguments.
+
+ In case of error, 'error' returns a non-empty error message.
+
+ 'options' is an array of 'struct Option' terminated by an element
+ containing a code which is zero. A null long_name means a short-only
+ option. A code value outside the unsigned char range means a long-only
+ option.
+
+ Arg_parser normally makes it appear as if all the option arguments
+ were specified before all the non-option arguments for the purposes
+ of parsing, even if the user of your program intermixed option and
+ non-option arguments. If you want the arguments in the exact order
+ the user typed them, call 'Arg_parser' with 'in_order' = true.
+
+ The argument '--' terminates all options; any following arguments are
+ treated as non-option arguments, even if they begin with a hyphen.
+
+ The syntax for optional option arguments is '-<short_option><argument>'
+ (without whitespace), or '--<long_option>=<argument>'.
+*/
+
+class Arg_parser
+ {
+public:
+ enum Has_arg { no, yes, maybe };
+
+ struct Option
+ {
+ int code; // Short option letter or code ( code != 0 )
+ const char * long_name; // Long option name (maybe null)
+ Has_arg has_arg;
+ };
+
+private:
+ struct Record
+ {
+ int code;
+ std::string parsed_name;
+ std::string argument;
+ explicit Record( const unsigned char c )
+ : code( c ), parsed_name( "-" ) { parsed_name += c; }
+ Record( const int c, const char * const long_name )
+ : code( c ), parsed_name( "--" ) { parsed_name += long_name; }
+ explicit Record( const char * const arg ) : code( 0 ), argument( arg ) {}
+ };
+
+ const std::string empty_arg;
+ std::string error_;
+ std::vector< Record > data;
+
+ bool parse_long_option( const char * const opt, const char * const arg,
+ const Option options[], int & argind );
+ bool parse_short_option( const char * const opt, const char * const arg,
+ const Option options[], int & argind );
+
+public:
+ Arg_parser( const int argc, const char * const argv[],
+ const Option options[], const bool in_order = false );
+
+ // Restricted constructor. Parses a single token and argument (if any).
+ Arg_parser( const char * const opt, const char * const arg,
+ const Option options[] );
+
+ const std::string & error() const { return error_; }
+
+ // The number of arguments parsed. May be different from argc.
+ int arguments() const { return data.size(); }
+
+ /* If code( i ) is 0, argument( i ) is a non-option.
+ Else argument( i ) is the option's argument (or empty). */
+ int code( const int i ) const
+ {
+ if( i >= 0 && i < arguments() ) return data[i].code;
+ else return 0;
+ }
+
+ // Full name of the option parsed (short or long).
+ const std::string & parsed_name( const int i ) const
+ {
+ if( i >= 0 && i < arguments() ) return data[i].parsed_name;
+ else return empty_arg;
+ }
+
+ const std::string & argument( const int i ) const
+ {
+ if( i >= 0 && i < arguments() ) return data[i].argument;
+ else return empty_arg;
+ }
+ };
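The header comment describes how Arg_parser reorders arguments when in_order is false. Options are reported first, non-options follow in the order typed, and everything after '--' is treated as a non-option. The C++ sketch below models only that reordering. For brevity, it assumes that no option takes its argument from the next token, although Arg_parser does handle that case. `permute_args` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Options first, then non-options in the order typed; "--" ends the options.
std::vector<std::string> permute_args( const std::vector<std::string> & args )
  {
  std::vector<std::string> result, non_options;
  bool saw_terminator = false;
  for( std::size_t i = 0; i < args.size(); ++i )
    {
    const std::string & a = args[i];
    if( !saw_terminator && a == "--" ) { saw_terminator = true; continue; }
    if( !saw_terminator && a.size() > 1 && a[0] == '-' )
      result.push_back( a );              // option: reported first
    else
      non_options.push_back( a );         // non-option (a lone "-" counts as one)
    }
  result.insert( result.end(), non_options.begin(), non_options.end() );
  // Sanity check: every token except the "--" terminator is kept.
  assert( result.size() + ( saw_terminator ? 1 : 0 ) == args.size() );
  return result;
  }
```

For example, the tokens 'file1 -v file2 --keep -- -odd' come out as '-v --keep file1 file2 -odd'. A lone '-' is kept as a non-option, as in Arg_parser.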
diff --git a/compress.cc b/compress.cc
new file mode 100644
index 0000000..defa58d
--- /dev/null
+++ b/compress.cc
@@ -0,0 +1,558 @@
+/* Plzip - Massively parallel implementation of lzip
+ Copyright (C) 2009 Laszlo Ersek.
+ Copyright (C) 2009-2024 Antonio Diaz Diaz.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#define _FILE_OFFSET_BITS 64
+
+#include <algorithm>
+#include <cerrno>
+#include <climits>
+#include <csignal>
+#include <cstdio>
+#include <cstdlib>
+#include <cstring>
+#include <string>
+#include <vector>
+#include <stdint.h>
+#include <unistd.h>
+#include <lzlib.h>
+
+#include "lzip.h"
+
+#ifndef LLONG_MAX
+#define LLONG_MAX 0x7FFFFFFFFFFFFFFFLL
+#endif
+
+
+/* Return the number of bytes really read.
+ If (value returned < size) and (errno == 0), it means EOF was reached.
+*/
+int readblock( const int fd, uint8_t * const buf, const int size )
+ {
+ int sz = 0;
+ errno = 0;
+ while( sz < size )
+ {
+ const int n = read( fd, buf + sz, size - sz );
+ if( n > 0 ) sz += n;
+ else if( n == 0 ) break; // EOF
+ else if( errno != EINTR ) break;
+ errno = 0;
+ }
+ return sz;
+ }
+
+
+/* Return the number of bytes really written.
+ If (value returned < size), it is always an error.
+*/
+int writeblock( const int fd, const uint8_t * const buf, const int size )
+ {
+ int sz = 0;
+ errno = 0;
+ while( sz < size )
+ {
+ const int n = write( fd, buf + sz, size - sz );
+ if( n > 0 ) sz += n;
+ else if( n < 0 && errno != EINTR ) break;
+ errno = 0;
+ }
+ return sz;
+ }
+
+
+void xinit_mutex( pthread_mutex_t * const mutex )
+ {
+ const int errcode = pthread_mutex_init( mutex, 0 );
+ if( errcode )
+ { show_error( "pthread_mutex_init", errcode ); cleanup_and_fail(); }
+ }
+
+void xinit_cond( pthread_cond_t * const cond )
+ {
+ const int errcode = pthread_cond_init( cond, 0 );
+ if( errcode )
+ { show_error( "pthread_cond_init", errcode ); cleanup_and_fail(); }
+ }
+
+
+void xdestroy_mutex( pthread_mutex_t * const mutex )
+ {
+ const int errcode = pthread_mutex_destroy( mutex );
+ if( errcode )
+ { show_error( "pthread_mutex_destroy", errcode ); cleanup_and_fail(); }
+ }
+
+void xdestroy_cond( pthread_cond_t * const cond )
+ {
+ const int errcode = pthread_cond_destroy( cond );
+ if( errcode )
+ { show_error( "pthread_cond_destroy", errcode ); cleanup_and_fail(); }
+ }
+
+
+void xlock( pthread_mutex_t * const mutex )
+ {
+ const int errcode = pthread_mutex_lock( mutex );
+ if( errcode )
+ { show_error( "pthread_mutex_lock", errcode ); cleanup_and_fail(); }
+ }
+
+
+void xunlock( pthread_mutex_t * const mutex )
+ {
+ const int errcode = pthread_mutex_unlock( mutex );
+ if( errcode )
+ { show_error( "pthread_mutex_unlock", errcode ); cleanup_and_fail(); }
+ }
+
+
+void xwait( pthread_cond_t * const cond, pthread_mutex_t * const mutex )
+ {
+ const int errcode = pthread_cond_wait( cond, mutex );
+ if( errcode )
+ { show_error( "pthread_cond_wait", errcode ); cleanup_and_fail(); }
+ }
+
+
+void xsignal( pthread_cond_t * const cond )
+ {
+ const int errcode = pthread_cond_signal( cond );
+ if( errcode )
+ { show_error( "pthread_cond_signal", errcode ); cleanup_and_fail(); }
+ }
+
+
+void xbroadcast( pthread_cond_t * const cond )
+ {
+ const int errcode = pthread_cond_broadcast( cond );
+ if( errcode )
+ { show_error( "pthread_cond_broadcast", errcode ); cleanup_and_fail(); }
+ }
+
+
+namespace {
+
+unsigned long long in_size = 0;
+unsigned long long out_size = 0;
+const char * const mem_msg2 = "Not enough memory. Try a smaller dictionary size.";
+
+
+struct Packet // data block with a serial number
+ {
+ uint8_t * data;
+ int size; // number of bytes in data (if any)
+ unsigned id; // serial number assigned as received
+ Packet() : data( 0 ), size( 0 ), id( 0 ) {}
+ void init( uint8_t * const d, const int s, const unsigned i )
+ { data = d; size = s; id = i; }
+ };
+
+
+class Packet_courier // moves packets around
+ {
+public:
+ unsigned icheck_counter;
+ unsigned iwait_counter;
+ unsigned ocheck_counter;
+ unsigned owait_counter;
+private:
+ unsigned receive_id; // id assigned to next packet received
+ unsigned distrib_id; // id of next packet to be distributed
+ unsigned deliver_id; // id of next packet to be delivered
+ Slot_tally slot_tally; // limits the number of input packets
+ std::vector< Packet > circular_ibuffer;
+ std::vector< const Packet * > circular_obuffer;
+ int num_working; // number of workers still running
+ const int num_slots; // max packets in circulation
+ pthread_mutex_t imutex;
+ pthread_cond_t iav_or_eof; // input packet available or splitter done
+ pthread_mutex_t omutex;
+ pthread_cond_t oav_or_exit; // output packet available or all workers exited
+ bool eof; // splitter done
+
+ Packet_courier( const Packet_courier & ); // declared as private
+ void operator=( const Packet_courier & ); // declared as private
+
+public:
+ Packet_courier( const int workers, const int slots )
+ : icheck_counter( 0 ), iwait_counter( 0 ),
+ ocheck_counter( 0 ), owait_counter( 0 ),
+ receive_id( 0 ), distrib_id( 0 ), deliver_id( 0 ),
+ slot_tally( slots ), circular_ibuffer( slots ),
+ circular_obuffer( slots, (const Packet *) 0 ),
+ num_working( workers ), num_slots( slots ), eof( false )
+ {
+ xinit_mutex( &imutex ); xinit_cond( &iav_or_eof );
+ xinit_mutex( &omutex ); xinit_cond( &oav_or_exit );
+ }
+
+ ~Packet_courier()
+ {
+ xdestroy_cond( &oav_or_exit ); xdestroy_mutex( &omutex );
+ xdestroy_cond( &iav_or_eof ); xdestroy_mutex( &imutex );
+ }
+
+ // fill a packet with data received from splitter
+ void receive_packet( uint8_t * const data, const int size )
+ {
+ slot_tally.get_slot(); // wait for a free slot
+ xlock( &imutex );
+ circular_ibuffer[receive_id % num_slots].init( data, size, receive_id );
+ ++receive_id;
+ xsignal( &iav_or_eof );
+ xunlock( &imutex );
+ }
+
+ // distribute a packet to a worker
+ Packet * distribute_packet()
+ {
+ Packet * ipacket = 0;
+ xlock( &imutex );
+ ++icheck_counter;
+ while( receive_id == distrib_id && !eof ) // no packets to distribute
+ {
+ ++iwait_counter;
+ xwait( &iav_or_eof, &imutex );
+ }
+ if( receive_id != distrib_id )
+ { ipacket = &circular_ibuffer[distrib_id % num_slots]; ++distrib_id; }
+ xunlock( &imutex );
+ if( !ipacket ) // EOF
+ {
+ xlock( &omutex ); // notify muxer when last worker exits
+ if( --num_working == 0 ) xsignal( &oav_or_exit );
+ xunlock( &omutex );
+ }
+ return ipacket;
+ }
+
+ // collect a packet from a worker
+ void collect_packet( const Packet * const opacket )
+ {
+ const int i = opacket->id % num_slots;
+ xlock( &omutex );
+ // id collision shouldn't happen
+ if( circular_obuffer[i] != 0 )
+ internal_error( "id collision in collect_packet." );
+ // merge packet into circular buffer
+ circular_obuffer[i] = opacket;
+ if( opacket->id == deliver_id ) xsignal( &oav_or_exit );
+ xunlock( &omutex );
+ }
+
+ // deliver packets to muxer
+ void deliver_packets( std::vector< const Packet * > & packet_vector )
+ {
+ xlock( &omutex );
+ ++ocheck_counter;
+ int i = deliver_id % num_slots;
+ while( circular_obuffer[i] == 0 && num_working > 0 )
+ {
+ ++owait_counter;
+ xwait( &oav_or_exit, &omutex );
+ }
+ packet_vector.clear();
+ while( true )
+ {
+ const Packet * const opacket = circular_obuffer[i];
+ if( !opacket ) break;
+ packet_vector.push_back( opacket );
+ circular_obuffer[i] = 0;
+ ++deliver_id;
+ i = deliver_id % num_slots;
+ }
+ xunlock( &omutex );
+ }
+
+ void return_empty_packet() // return a slot to the tally
+ { slot_tally.leave_slot(); }
+
+ void finish( const int workers_spared )
+ {
+ xlock( &imutex ); // splitter has no more packets to send
+ eof = true;
+ xbroadcast( &iav_or_eof );
+ xunlock( &imutex );
+ xlock( &omutex ); // notify muxer if all workers have exited
+ num_working -= workers_spared;
+ if( num_working <= 0 ) xsignal( &oav_or_exit );
+ xunlock( &omutex );
+ }
+
+ bool finished() // all packets delivered to muxer
+ {
+ if( !slot_tally.all_free() || !eof || receive_id != distrib_id ||
+ num_working != 0 ) return false;
+ for( int i = 0; i < num_slots; ++i )
+ if( circular_obuffer[i] != 0 ) return false;
+ return true;
+ }
+ };
+
+
+struct Worker_arg
+ {
+ Packet_courier * courier;
+ const Pretty_print * pp;
+ int dictionary_size;
+ int match_len_limit;
+ int offset;
+ };
+
+struct Splitter_arg
+ {
+ struct Worker_arg worker_arg;
+ pthread_t * worker_threads;
+ int infd;
+ int data_size;
+ int num_workers; // returned by splitter to main thread
+ };
+
+
+/* Get packets from courier, replace their contents, and return them to
+ courier. */
+extern "C" void * cworker( void * arg )
+ {
+ const Worker_arg & tmp = *(const Worker_arg *)arg;
+ Packet_courier & courier = *tmp.courier;
+ const Pretty_print & pp = *tmp.pp;
+ const int dictionary_size = tmp.dictionary_size;
+ const int match_len_limit = tmp.match_len_limit;
+ const int offset = tmp.offset;
+ LZ_Encoder * encoder = 0;
+
+ while( true )
+ {
+ Packet * const packet = courier.distribute_packet();
+ if( !packet ) break; // no more packets to process
+
+ if( !encoder )
+ {
+ const bool fast = dictionary_size == 65535 && match_len_limit == 16;
+ const int dict_size = fast ? dictionary_size :
+ std::max( std::min( dictionary_size, packet->size ),
+ LZ_min_dictionary_size() );
+ encoder = LZ_compress_open( dict_size, match_len_limit, LLONG_MAX );
+ if( !encoder || LZ_compress_errno( encoder ) != LZ_ok )
+ {
+ if( !encoder || LZ_compress_errno( encoder ) == LZ_mem_error )
+ pp( mem_msg2 );
+ else
+ internal_error( "invalid argument to encoder." );
+ cleanup_and_fail();
+ }
+ }
+ else
+ if( LZ_compress_restart_member( encoder, LLONG_MAX ) < 0 )
+ { pp( "LZ_compress_restart_member failed." ); cleanup_and_fail(); }
+
+ int written = 0;
+ int new_pos = 0;
+ while( true )
+ {
+ if( written < packet->size )
+ {
+ const int wr = LZ_compress_write( encoder,
+ packet->data + offset + written,
+ packet->size - written );
+ if( wr < 0 ) internal_error( "library error (LZ_compress_write)." );
+ written += wr;
+ }
+ if( written >= packet->size ) LZ_compress_finish( encoder );
+ const int rd = LZ_compress_read( encoder, packet->data + new_pos,
+ offset + written - new_pos );
+ if( rd < 0 )
+ {
+ pp();
+ if( verbosity >= 0 )
+ std::fprintf( stderr, "LZ_compress_read error: %s\n",
+ LZ_strerror( LZ_compress_errno( encoder ) ) );
+ cleanup_and_fail();
+ }
+ new_pos += rd;
+ if( new_pos >= offset + written )
+ internal_error( "packet size exceeded in worker." );
+ if( LZ_compress_finished( encoder ) == 1 ) break;
+ }
+
+ if( packet->size > 0 ) show_progress( packet->size );
+ packet->size = new_pos;
+ courier.collect_packet( packet );
+ }
+ if( encoder && LZ_compress_close( encoder ) < 0 )
+ { pp( "LZ_compress_close failed." ); cleanup_and_fail(); }
+ return 0;
+ }
+
+
+/* Split data from input file into chunks and pass them to courier for
+ packaging and distribution to workers.
+ Start a worker per packet up to a maximum of num_workers.
+*/
+extern "C" void * csplitter( void * arg )
+ {
+ Splitter_arg & tmp = *(Splitter_arg *)arg;
+ Packet_courier & courier = *tmp.worker_arg.courier;
+ const Pretty_print & pp = *tmp.worker_arg.pp;
+ pthread_t * const worker_threads = tmp.worker_threads;
+ const int offset = tmp.worker_arg.offset;
+ const int infd = tmp.infd;
+ const int data_size = tmp.data_size;
+ int i = 0; // number of workers started
+
+ for( bool first_post = true; ; first_post = false )
+ {
+ uint8_t * const data = new( std::nothrow ) uint8_t[offset+data_size];
+ if( !data ) { pp( mem_msg2 ); cleanup_and_fail(); }
+ const int size = readblock( infd, data + offset, data_size );
+ if( size != data_size && errno )
+ { pp(); show_error( "Read error", errno ); cleanup_and_fail(); }
+
+ if( size > 0 || first_post ) // first packet may be empty
+ {
+ in_size += size;
+ courier.receive_packet( data, size );
+ if( i < tmp.num_workers ) // start a new worker
+ {
+ const int errcode =
+ pthread_create( &worker_threads[i++], 0, cworker, &tmp.worker_arg );
+ if( errcode ) { show_error( "Can't create worker threads", errcode );
+ cleanup_and_fail(); }
+ }
+ if( size < data_size ) break; // EOF
+ }
+ else
+ {
+ delete[] data;
+ break;
+ }
+ }
+ courier.finish( tmp.num_workers - i ); // no more packets to send
+ tmp.num_workers = i;
+ return 0;
+ }
+
+
+/* Get from courier the processed and sorted packets, and write their
+ contents to the output file.
+*/
+void muxer( Packet_courier & courier, const Pretty_print & pp, const int outfd )
+ {
+ std::vector< const Packet * > packet_vector;
+ while( true )
+ {
+ courier.deliver_packets( packet_vector );
+ if( packet_vector.empty() ) break; // all workers exited
+
+ for( unsigned i = 0; i < packet_vector.size(); ++i )
+ {
+ const Packet * const opacket = packet_vector[i];
+ out_size += opacket->size;
+
+ if( writeblock( outfd, opacket->data, opacket->size ) != opacket->size )
+ { pp(); show_error( "Write error", errno ); cleanup_and_fail(); }
+ delete[] opacket->data;
+ courier.return_empty_packet();
+ }
+ }
+ }
+
+} // end namespace
+
+
+/* Init the courier, then start the splitter and the workers and call the
+ muxer. */
+int compress( const unsigned long long cfile_size,
+ const int data_size, const int dictionary_size,
+ const int match_len_limit, const int num_workers,
+ const int infd, const int outfd,
+ const Pretty_print & pp, const int debug_level )
+ {
+ const int offset = data_size / 8; // offset for compression in-place
+ const int slots_per_worker = 2;
+ const int num_slots =
+ ( ( num_workers > 1 ) ? num_workers * slots_per_worker : 1 );
+ in_size = 0;
+ out_size = 0;
+ Packet_courier courier( num_workers, num_slots );
+
+ if( debug_level & 2 ) std::fputs( "compress.\n", stderr );
+
+ pthread_t * worker_threads = new( std::nothrow ) pthread_t[num_workers];
+ if( !worker_threads ) { pp( mem_msg ); return 1; }
+
+ Splitter_arg splitter_arg;
+ splitter_arg.worker_arg.courier = &courier;
+ splitter_arg.worker_arg.pp = &pp;
+ splitter_arg.worker_arg.dictionary_size = dictionary_size;
+ splitter_arg.worker_arg.match_len_limit = match_len_limit;
+ splitter_arg.worker_arg.offset = offset;
+ splitter_arg.worker_threads = worker_threads;
+ splitter_arg.infd = infd;
+ splitter_arg.data_size = data_size;
+ splitter_arg.num_workers = num_workers;
+
+ pthread_t splitter_thread;
+ int errcode = pthread_create( &splitter_thread, 0, csplitter, &splitter_arg );
+ if( errcode )
+ { show_error( "Can't create splitter thread", errcode );
+ delete[] worker_threads; return 1; }
+ if( verbosity >= 1 ) pp();
+ show_progress( 0, cfile_size, &pp ); // init
+
+ muxer( courier, pp, outfd );
+
+ errcode = pthread_join( splitter_thread, 0 );
+ if( errcode ) { show_error( "Can't join splitter thread", errcode );
+ cleanup_and_fail(); }
+
+ for( int i = splitter_arg.num_workers; --i >= 0; )
+ { // join only the workers started
+ errcode = pthread_join( worker_threads[i], 0 );
+ if( errcode ) { show_error( "Can't join worker threads", errcode );
+ cleanup_and_fail(); }
+ }
+ delete[] worker_threads;
+
+ if( verbosity >= 1 )
+ {
+ if( in_size == 0 || out_size == 0 )
+ std::fputs( " no data compressed.\n", stderr );
+ else
+ std::fprintf( stderr, "%6.3f:1, %5.2f%% ratio, %5.2f%% saved, "
+ "%llu in, %llu out.\n",
+ (double)in_size / out_size,
+ ( 100.0 * out_size ) / in_size,
+ 100.0 - ( ( 100.0 * out_size ) / in_size ),
+ in_size, out_size );
+ }
+
+ if( debug_level & 1 )
+ std::fprintf( stderr,
+ "workers started %8u\n"
+ "any worker tried to consume from splitter %8u times\n"
+ "any worker had to wait %8u times\n"
+ "muxer tried to consume from workers %8u times\n"
+ "muxer had to wait %8u times\n",
+ splitter_arg.num_workers,
+ courier.icheck_counter, courier.iwait_counter,
+ courier.ocheck_counter, courier.owait_counter );
+
+ if( !courier.finished() ) internal_error( "courier not finished." );
+ return 0;
+ }
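In compress.cc, `deliver_packets` hands the muxer every packet that is ready in sequence. It starts at slot `deliver_id % num_slots` and stops at the first empty slot, which restores input order even though workers finish packets in any order. The standalone sketch below models the same idea with `std::mutex` and `std::condition_variable` in place of plzip's `xlock`/`xwait` wrappers.

The names `Reorder_buffer` and `reorder_demo`, and the string payload, are all hypothetical. The sketch assumes that no more than `num_slots` items are ever in flight, so a new id never lands on an occupied slot. In plzip, that bound appears to be the job of `slot_tally`.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical reorder buffer modeled on circular_obuffer / deliver_id.
// Assumption: at most num_slots items are in flight at any time (plzip
// bounds this with Slot_tally), so the slot for a new id is always free.
class Reorder_buffer
  {
  std::vector< const std::string * > slots;
  unsigned long deliver_id;            // id of the next item the consumer needs
  int num_working;                     // producers that have not finished yet
  std::mutex mtx;
  std::condition_variable av_or_exit;  // next item available or all exited

public:
  Reorder_buffer( const int num_slots, const int producers )
    : slots( num_slots, nullptr ), deliver_id( 0 ), num_working( producers ) {}

  void collect( const unsigned long id, const std::string * const item )
    {
    std::lock_guard< std::mutex > lock( mtx );
    assert( slots[id % slots.size()] == nullptr );  // in-flight limit respected
    slots[id % slots.size()] = item;
    if( id == deliver_id ) av_or_exit.notify_one();  // consumer waits for this id
    }

  void producer_finished()
    {
    std::lock_guard< std::mutex > lock( mtx );
    if( --num_working == 0 ) av_or_exit.notify_one();
    }

  // Move every consecutive ready item into 'out', in id order. 'out' comes
  // back empty only when nothing is ready and all producers have exited.
  void deliver( std::vector< const std::string * > & out )
    {
    std::unique_lock< std::mutex > lock( mtx );
    std::size_t i = deliver_id % slots.size();
    av_or_exit.wait( lock, [&]{ return slots[i] != nullptr || num_working <= 0; } );
    out.clear();
    while( slots[i] != nullptr )
      {
      out.push_back( slots[i] );
      slots[i] = nullptr;
      ++deliver_id;
      i = deliver_id % slots.size();
      }
    }
  };

// Two producers finish six items in reverse order; the consumer still
// receives them as item0 ... item5.
std::string reorder_demo()
  {
  const int num_items = 6;
  std::vector< std::string > items;
  for( int id = 0; id < num_items; ++id )
    items.push_back( "item" + std::to_string( id ) );
  Reorder_buffer buffer( 8, 2 );       // 8 slots >= 6 items, so no slot reuse
  auto producer = [&]( const int first )
    {
    for( int id = num_items - 1 - first; id >= 0; id -= 2 )
      buffer.collect( id, &items[id] );
    buffer.producer_finished();
    };
  std::thread a( producer, 0 ), b( producer, 1 );
  std::string joined;
  std::vector< const std::string * > batch;
  while( true )
    {
    buffer.deliver( batch );
    if( batch.empty() ) break;         // all producers exited, nothing pending
    for( const std::string * const s : batch )
      { if( !joined.empty() ) joined += ' '; joined += *s; }
    }
  a.join(); b.join();
  return joined;
  }
```

`deliver` drains a whole run of ready items per call, like `packet_vector` in the original. The consumer therefore takes the mutex once per batch instead of once per packet.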
diff --git a/configure b/configure
new file mode 100755
index 0000000..4e627d5
--- /dev/null
+++ b/configure
@@ -0,0 +1,210 @@
+#! /bin/sh
+# configure script for Plzip - Massively parallel implementation of lzip
+# Copyright (C) 2009-2024 Antonio Diaz Diaz.
+#
+# This configure script is free software: you have unlimited permission
+# to copy, distribute, and modify it.
+
+pkgname=plzip
+pkgversion=1.11
+progname=plzip
+with_mingw=
+srctrigger=doc/${pkgname}.texi
+
+# clear some things potentially inherited from environment.
+LC_ALL=C
+export LC_ALL
+srcdir=
+prefix=/usr/local
+exec_prefix='$(prefix)'
+bindir='$(exec_prefix)/bin'
+datarootdir='$(prefix)/share'
+infodir='$(datarootdir)/info'
+mandir='$(datarootdir)/man'
+CXX=g++
+CPPFLAGS=
+CXXFLAGS='-Wall -W -O2'
+LDFLAGS=
+LIBS='-llz -lpthread'
+MAKEINFO=makeinfo
+
+# checking whether we are using GNU C++.
+/bin/sh -c "${CXX} --version" > /dev/null 2>&1 || { CXX=c++ ; CXXFLAGS=-O2 ; }
+
+# Loop over all args
+args=
+no_create=
+while [ $# != 0 ] ; do
+
+ # Get the first arg, and shuffle
+ option=$1 ; arg2=no
+ shift
+
+ # Add the argument quoted to args
+ if [ -z "${args}" ] ; then args="\"${option}\""
+ else args="${args} \"${option}\"" ; fi
+
+ # Split out the argument for options that take them
+ case ${option} in
+ *=*) optarg=`echo "${option}" | sed -e 's,^[^=]*=,,;s,/$,,'` ;;
+ esac
+
+ # Process the options
+ case ${option} in
+ --help | -h)
+ echo "Usage: $0 [OPTION]... [VAR=VALUE]..."
+ echo
+ echo "To assign makefile variables (e.g., CXX, CXXFLAGS...), specify them as"
+ echo "arguments to configure in the form VAR=VALUE."
+ echo
+ echo "Options and variables: [defaults in brackets]"
+ echo " -h, --help display this help and exit"
+ echo " -V, --version output version information and exit"
+ echo " --srcdir=DIR find the source code in DIR [. or ..]"
+ echo " --prefix=DIR install into DIR [${prefix}]"
+ echo " --exec-prefix=DIR base directory for arch-dependent files [${exec_prefix}]"
+ echo " --bindir=DIR user executables directory [${bindir}]"
+ echo " --datarootdir=DIR base directory for doc and data [${datarootdir}]"
+ echo " --infodir=DIR info files directory [${infodir}]"
+ echo " --mandir=DIR man pages directory [${mandir}]"
+ echo " --with-mingw use included pread/pwrite functions missing in MinGW"
+ echo " CXX=COMPILER C++ compiler to use [${CXX}]"
+ echo " CPPFLAGS=OPTIONS command-line options for the preprocessor [${CPPFLAGS}]"
+ echo " CXXFLAGS=OPTIONS command-line options for the C++ compiler [${CXXFLAGS}]"
+ echo " CXXFLAGS+=OPTIONS append options to the current value of CXXFLAGS"
+ echo " LDFLAGS=OPTIONS command-line options for the linker [${LDFLAGS}]"
+ echo " LIBS=OPTIONS libraries to pass to the linker [${LIBS}]"
+ echo " MAKEINFO=NAME makeinfo program to use [${MAKEINFO}]"
+ echo
+ exit 0 ;;
+ --version | -V)
+ echo "Configure script for ${pkgname} version ${pkgversion}"
+ exit 0 ;;
+ --srcdir) srcdir=$1 ; arg2=yes ;;
+ --prefix) prefix=$1 ; arg2=yes ;;
+ --exec-prefix) exec_prefix=$1 ; arg2=yes ;;
+ --bindir) bindir=$1 ; arg2=yes ;;
+ --datarootdir) datarootdir=$1 ; arg2=yes ;;
+ --infodir) infodir=$1 ; arg2=yes ;;
+ --mandir) mandir=$1 ; arg2=yes ;;
+
+ --srcdir=*) srcdir=${optarg} ;;
+ --prefix=*) prefix=${optarg} ;;
+ --exec-prefix=*) exec_prefix=${optarg} ;;
+ --bindir=*) bindir=${optarg} ;;
+ --datarootdir=*) datarootdir=${optarg} ;;
+ --infodir=*) infodir=${optarg} ;;
+ --mandir=*) mandir=${optarg} ;;
+ --no-create) no_create=yes ;;
+ --with-mingw) with_mingw=-DWITH_MINGW ;;
+
+ CXX=*) CXX=${optarg} ;;
+ CPPFLAGS=*) CPPFLAGS=${optarg} ;;
+ CXXFLAGS=*) CXXFLAGS=${optarg} ;;
+ CXXFLAGS+=*) CXXFLAGS="${CXXFLAGS} ${optarg}" ;;
+ LDFLAGS=*) LDFLAGS=${optarg} ;;
+ LIBS=*) LIBS="${optarg} ${LIBS}" ;;
+ MAKEINFO=*) MAKEINFO=${optarg} ;;
+
+ --*)
+ echo "configure: WARNING: unrecognized option: '${option}'" 1>&2 ;;
+ *=* | *-*-*) ;;
+ *)
+ echo "configure: unrecognized option: '${option}'" 1>&2
+ echo "Try 'configure --help' for more information." 1>&2
+ exit 1 ;;
+ esac
+
+ # Check if the option took a separate argument
+ if [ "${arg2}" = yes ] ; then
+ if [ $# != 0 ] ; then args="${args} \"$1\"" ; shift
+ else echo "configure: Missing argument to '${option}'" 1>&2
+ exit 1
+ fi
+ fi
+done
+
+# Find the source code, if location was not specified.
+srcdirtext=
+if [ -z "${srcdir}" ] ; then
+ srcdirtext="or . or .." ; srcdir=.
+ if [ ! -r "${srcdir}/${srctrigger}" ] ; then srcdir=.. ; fi
+ if [ ! -r "${srcdir}/${srctrigger}" ] ; then
+ ## the sed command below emulates the dirname command
+ srcdir=`echo "$0" | sed -e 's,[^/]*$,,;s,/$,,;s,^$,.,'`
+ fi
+fi
+
+if [ ! -r "${srcdir}/${srctrigger}" ] ; then
+ echo "configure: Can't find source code in ${srcdir} ${srcdirtext}" 1>&2
+ echo "configure: (At least ${srctrigger} is missing)." 1>&2
+ exit 1
+fi
+
+# Set srcdir to . if that's what it is.
+if [ "`pwd`" = "`cd "${srcdir}" ; pwd`" ] ; then srcdir=. ; fi
+
+echo
+if [ -z "${no_create}" ] ; then
+ echo "creating config.status"
+ rm -f config.status
+ cat > config.status << EOF
+#! /bin/sh
+# This file was generated automatically by configure. Don't edit.
+# Run this file to recreate the current configuration.
+#
+# This script is free software: you have unlimited permission
+# to copy, distribute, and modify it.
+
+exec /bin/sh "$0" ${args} --no-create
+EOF
+ chmod +x config.status
+fi
+
+echo "creating Makefile"
+if [ -n "${with_mingw}" ] ; then echo "WITH_MINGW = yes" ; fi
+echo "VPATH = ${srcdir}"
+echo "prefix = ${prefix}"
+echo "exec_prefix = ${exec_prefix}"
+echo "bindir = ${bindir}"
+echo "datarootdir = ${datarootdir}"
+echo "infodir = ${infodir}"
+echo "mandir = ${mandir}"
+echo "CXX = ${CXX}"
+echo "CPPFLAGS = ${CPPFLAGS}"
+echo "CXXFLAGS = ${CXXFLAGS}"
+echo "LDFLAGS = ${LDFLAGS}"
+echo "LIBS = ${LIBS}"
+echo "MAKEINFO = ${MAKEINFO}"
+rm -f Makefile
+cat > Makefile << EOF
+# Makefile for Plzip - Massively parallel implementation of lzip
+# Copyright (C) 2009-2024 Antonio Diaz Diaz.
+# This file was generated automatically by configure. Don't edit.
+#
+# This Makefile is free software: you have unlimited permission
+# to copy, distribute, and modify it.
+
+pkgname = ${pkgname}
+pkgversion = ${pkgversion}
+progname = ${progname}
+with_mingw = ${with_mingw}
+VPATH = ${srcdir}
+prefix = ${prefix}
+exec_prefix = ${exec_prefix}
+bindir = ${bindir}
+datarootdir = ${datarootdir}
+infodir = ${infodir}
+mandir = ${mandir}
+CXX = ${CXX}
+CPPFLAGS = ${CPPFLAGS}
+CXXFLAGS = ${CXXFLAGS}
+LDFLAGS = ${LDFLAGS}
+LIBS = ${LIBS}
+MAKEINFO = ${MAKEINFO}
+EOF
+cat "${srcdir}/Makefile.in" >> Makefile
+
+echo "OK. Now you can run make."
+echo "If make fails, check that the compression library lzlib is correctly installed"
+echo "(see INSTALL)."
diff --git a/dec_stdout.cc b/dec_stdout.cc
new file mode 100644
index 0000000..6ffed07
--- /dev/null
+++ b/dec_stdout.cc
@@ -0,0 +1,337 @@
+/* Plzip - Massively parallel implementation of lzip
+ Copyright (C) 2009 Laszlo Ersek.
+ Copyright (C) 2009-2024 Antonio Diaz Diaz.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#define _FILE_OFFSET_BITS 64
+
+#include <algorithm>
+#include <cerrno>
+#include <climits>
+#include <csignal>
+#include <cstdio>
+#include <cstdlib>
+#include <cstring>
+#include <queue>
+#include <string>
+#include <vector>
+#include <stdint.h>
+#include <unistd.h>
+#include <lzlib.h>
+
+#include "lzip.h"
+#include "lzip_index.h"
+
+
+namespace {
+
+enum { max_packet_size = 1 << 20 };
+
+
+struct Packet // data block
+ {
+ uint8_t * data; // data may be null if size == 0
+ int size; // number of bytes in data (if any)
+ bool eom; // end of member
+ Packet() : data( 0 ), size( 0 ), eom( true ) {}
+ Packet( uint8_t * const d, const int s, const bool e )
+ : data( d ), size( s ), eom ( e ) {}
+ ~Packet() { if( data ) delete[] data; }
+ };
+
+
+class Packet_courier // moves packets around
+ {
+public:
+ unsigned ocheck_counter;
+ unsigned owait_counter;
+private:
+ int deliver_worker_id; // worker queue currently delivering packets
+ std::vector< std::queue< const Packet * > > opacket_queues;
+ int num_working; // number of workers still running
+ const int num_workers; // number of workers
+ const unsigned out_slots; // max output packets per queue
+ pthread_mutex_t omutex;
+ pthread_cond_t oav_or_exit; // output packet available or all workers exited
+ std::vector< pthread_cond_t > slot_av; // output slot available
+ const Shared_retval & shared_retval; // discard new packets on error
+
+ Packet_courier( const Packet_courier & ); // declared as private
+ void operator=( const Packet_courier & ); // declared as private
+
+public:
+ Packet_courier( const Shared_retval & sh_ret, const int workers,
+ const int slots )
+ : ocheck_counter( 0 ), owait_counter( 0 ), deliver_worker_id( 0 ),
+ opacket_queues( workers ), num_working( workers ),
+ num_workers( workers ), out_slots( slots ), slot_av( workers ),
+ shared_retval( sh_ret )
+ {
+ xinit_mutex( &omutex ); xinit_cond( &oav_or_exit );
+ for( unsigned i = 0; i < slot_av.size(); ++i ) xinit_cond( &slot_av[i] );
+ }
+
+ ~Packet_courier()
+ {
+ if( shared_retval() ) // cleanup to avoid memory leaks
+ for( int i = 0; i < num_workers; ++i )
+ while( !opacket_queues[i].empty() )
+ { delete opacket_queues[i].front(); opacket_queues[i].pop(); }
+ for( unsigned i = 0; i < slot_av.size(); ++i ) xdestroy_cond( &slot_av[i] );
+ xdestroy_cond( &oav_or_exit ); xdestroy_mutex( &omutex );
+ }
+
+ void worker_finished()
+ {
+ // notify muxer when last worker exits
+ xlock( &omutex );
+ if( --num_working == 0 ) xsignal( &oav_or_exit );
+ xunlock( &omutex );
+ }
+
+ // collect a packet from a worker, discard packet on error
+ void collect_packet( const Packet * const opacket, const int worker_id )
+ {
+ xlock( &omutex );
+ if( opacket->data )
+ while( opacket_queues[worker_id].size() >= out_slots )
+ {
+ if( shared_retval() ) { delete opacket; goto done; }
+ xwait( &slot_av[worker_id], &omutex );
+ }
+ opacket_queues[worker_id].push( opacket );
+ if( worker_id == deliver_worker_id ) xsignal( &oav_or_exit );
+done:
+ xunlock( &omutex );
+ }
+
+ /* deliver a packet to muxer
+ if packet->eom, move to next queue
+ if packet data == 0, wait again */
+ const Packet * deliver_packet()
+ {
+ const Packet * opacket = 0;
+ xlock( &omutex );
+ ++ocheck_counter;
+ while( true )
+ {
+ while( opacket_queues[deliver_worker_id].empty() && num_working > 0 )
+ {
+ ++owait_counter;
+ xwait( &oav_or_exit, &omutex );
+ }
+ if( opacket_queues[deliver_worker_id].empty() ) break;
+ opacket = opacket_queues[deliver_worker_id].front();
+ opacket_queues[deliver_worker_id].pop();
+ if( opacket_queues[deliver_worker_id].size() + 1 == out_slots )
+ xsignal( &slot_av[deliver_worker_id] );
+ if( opacket->eom && ++deliver_worker_id >= num_workers )
+ deliver_worker_id = 0;
+ if( opacket->data ) break;
+ delete opacket; opacket = 0;
+ }
+ xunlock( &omutex );
+ return opacket;
+ }
+
+ bool finished() // all packets delivered to muxer
+ {
+ if( num_working != 0 ) return false;
+ for( int i = 0; i < num_workers; ++i )
+ if( !opacket_queues[i].empty() ) return false;
+ return true;
+ }
+ };
+
+
+struct Worker_arg
+ {
+ const Lzip_index * lzip_index;
+ Packet_courier * courier;
+ const Pretty_print * pp;
+ Shared_retval * shared_retval;
+ int worker_id;
+ int num_workers;
+ int infd;
+ };
+
+
+/* Read members from file, decompress their contents, and give to courier
+ the packets produced.
+*/
+extern "C" void * dworker_o( void * arg )
+ {
+ const Worker_arg & tmp = *(const Worker_arg *)arg;
+ const Lzip_index & lzip_index = *tmp.lzip_index;
+ Packet_courier & courier = *tmp.courier;
+ const Pretty_print & pp = *tmp.pp;
+ Shared_retval & shared_retval = *tmp.shared_retval;
+ const int worker_id = tmp.worker_id;
+ const int num_workers = tmp.num_workers;
+ const int infd = tmp.infd;
+ const int buffer_size = 65536;
+
+ int new_pos = 0;
+ uint8_t * new_data = 0;
+ uint8_t * const ibuffer = new( std::nothrow ) uint8_t[buffer_size];
+ LZ_Decoder * const decoder = LZ_decompress_open();
+ if( !ibuffer || !decoder || LZ_decompress_errno( decoder ) != LZ_ok )
+ { if( shared_retval.set_value( 1 ) ) { pp( mem_msg ); } goto done; }
+
+ for( long i = worker_id; i < lzip_index.members(); i += num_workers )
+ {
+ long long member_pos = lzip_index.mblock( i ).pos();
+ long long member_rest = lzip_index.mblock( i ).size();
+
+ while( member_rest > 0 )
+ {
+ if( shared_retval() ) goto done; // other worker found a problem
+ while( LZ_decompress_write_size( decoder ) > 0 )
+ {
+ const int size = std::min( LZ_decompress_write_size( decoder ),
+ (int)std::min( (long long)buffer_size, member_rest ) );
+ if( size > 0 )
+ {
+ if( preadblock( infd, ibuffer, size, member_pos ) != size )
+ { if( shared_retval.set_value( 1 ) )
+ { pp(); show_error( "Read error", errno ); } goto done; }
+ member_pos += size;
+ member_rest -= size;
+ if( LZ_decompress_write( decoder, ibuffer, size ) != size )
+ internal_error( "library error (LZ_decompress_write)." );
+ }
+ if( member_rest <= 0 ) { LZ_decompress_finish( decoder ); break; }
+ }
+ while( true ) // read and pack decompressed data
+ {
+ if( !new_data &&
+ !( new_data = new( std::nothrow ) uint8_t[max_packet_size] ) )
+ { if( shared_retval.set_value( 1 ) ) { pp( mem_msg ); } goto done; }
+ const int rd = LZ_decompress_read( decoder, new_data + new_pos,
+ max_packet_size - new_pos );
+ if( rd < 0 )
+ { decompress_error( decoder, pp, shared_retval, worker_id );
+ goto done; }
+ new_pos += rd;
+ if( new_pos > max_packet_size )
+ internal_error( "opacket size exceeded in worker." );
+ const bool eom = LZ_decompress_finished( decoder ) == 1;
+ if( new_pos == max_packet_size || eom ) // make data packet
+ {
+ const Packet * const opacket =
+ new Packet( ( new_pos > 0 ) ? new_data : 0, new_pos, eom );
+ courier.collect_packet( opacket, worker_id );
+ if( new_pos > 0 ) { new_pos = 0; new_data = 0; }
+ if( eom )
+ { LZ_decompress_reset( decoder ); // prepare for new member
+ break; }
+ }
+ if( rd == 0 ) break;
+ }
+ }
+ show_progress( lzip_index.mblock( i ).size() );
+ }
+done:
+ delete[] ibuffer; if( new_data ) delete[] new_data;
+ if( LZ_decompress_member_position( decoder ) != 0 &&
+ shared_retval.set_value( 1 ) )
+ pp( "Error, some data remains in decoder." );
+ if( LZ_decompress_close( decoder ) < 0 && shared_retval.set_value( 1 ) )
+ pp( "LZ_decompress_close failed." );
+ courier.worker_finished();
+ return 0;
+ }
+
+
+/* Get from courier the processed and sorted packets, and write their
+ contents to the output file. Drain queue on error.
+*/
+void muxer( Packet_courier & courier, const Pretty_print & pp,
+ Shared_retval & shared_retval, const int outfd )
+ {
+ while( true )
+ {
+ const Packet * const opacket = courier.deliver_packet();
+ if( !opacket ) break; // queue is empty and all workers exited
+
+ if( shared_retval() == 0 &&
+ writeblock( outfd, opacket->data, opacket->size ) != opacket->size &&
+ shared_retval.set_value( 1 ) )
+ { pp(); show_error( "Write error", errno ); }
+ delete opacket;
+ }
+ }
+
+} // end namespace
+
+
+// init the courier, then start the workers and call the muxer.
+int dec_stdout( const int num_workers, const int infd, const int outfd,
+ const Pretty_print & pp, const int debug_level,
+ const int out_slots, const Lzip_index & lzip_index )
+ {
+ Shared_retval shared_retval;
+ Packet_courier courier( shared_retval, num_workers, out_slots );
+
+ Worker_arg * worker_args = new( std::nothrow ) Worker_arg[num_workers];
+ pthread_t * worker_threads = new( std::nothrow ) pthread_t[num_workers];
+ if( !worker_args || !worker_threads )
+ { pp( mem_msg ); delete[] worker_threads; delete[] worker_args; return 1; }
+
+ int i = 0; // number of workers started
+ for( ; i < num_workers; ++i )
+ {
+ worker_args[i].lzip_index = &lzip_index;
+ worker_args[i].courier = &courier;
+ worker_args[i].pp = &pp;
+ worker_args[i].shared_retval = &shared_retval;
+ worker_args[i].worker_id = i;
+ worker_args[i].num_workers = num_workers;
+ worker_args[i].infd = infd;
+ const int errcode =
+ pthread_create( &worker_threads[i], 0, dworker_o, &worker_args[i] );
+ if( errcode )
+ { if( shared_retval.set_value( 1 ) )
+ { show_error( "Can't create worker threads", errcode ); } break; }
+ }
+
+ muxer( courier, pp, shared_retval, outfd );
+
+ while( --i >= 0 )
+ {
+ const int errcode = pthread_join( worker_threads[i], 0 );
+ if( errcode && shared_retval.set_value( 1 ) )
+ show_error( "Can't join worker threads", errcode );
+ }
+ delete[] worker_threads;
+ delete[] worker_args;
+
+ if( shared_retval() ) return shared_retval(); // some thread found a problem
+
+ if( verbosity >= 1 )
+ show_results( lzip_index.cdata_size(), lzip_index.udata_size(),
+ lzip_index.dictionary_size(), false );
+
+ if( debug_level & 1 )
+ std::fprintf( stderr,
+ "workers started %8u\n"
+ "muxer tried to consume from workers %8u times\n"
+ "muxer had to wait %8u times\n",
+ num_workers, courier.ocheck_counter, courier.owait_counter );
+
+ if( !courier.finished() ) internal_error( "courier not finished." );
+ return 0;
+ }
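dec_stdout.cc restores member order without sequence numbers. Worker `worker_id` decompresses members `worker_id`, `worker_id + num_workers`, and so on, pushing its packets into its own queue. `deliver_packet` then reads `opacket_queues[deliver_worker_id]` until it sees a packet with `eom` set, and only then advances to the next worker.

The single-threaded model below shows why that round-robin rule reproduces the original order. Its names (`Sim_packet`, `split_members`, `mux`) are hypothetical, and threads, locking, and the `out_slots` limit are omitted. `dworker_o` emits an `eom` packet even when a member yields no data, and `split_members` likewise gives every member at least one `eom` packet.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// Hypothetical, single-threaded model of the delivery order in dec_stdout.cc.
struct Sim_packet
  {
  std::string data;  // decompressed bytes (may be empty)
  bool eom;          // last packet of a member
  };

// Member i goes to worker i % num_workers; each member ends with an eom
// packet, which is empty if the member produced no data.
std::vector< std::queue< Sim_packet > >
split_members( const std::vector< std::vector< std::string > > & members,
               const int num_workers )
  {
  assert( num_workers > 0 );
  std::vector< std::queue< Sim_packet > > queues( num_workers );
  for( std::size_t i = 0; i < members.size(); ++i )
    {
    std::queue< Sim_packet > & q = queues[i % num_workers];
    const std::vector< std::string > & chunks = members[i];
    if( chunks.empty() ) { q.push( Sim_packet{ "", true } ); continue; }
    for( std::size_t j = 0; j < chunks.size(); ++j )
      q.push( Sim_packet{ chunks[j], j + 1 == chunks.size() } );
    }
  return queues;
  }

// Drain the current worker's queue until an eom packet, then move on.
// Because every member ends with eom, finding the current queue empty
// means the last member has been delivered (the threaded code would wait
// instead while workers are still running).
std::string mux( std::vector< std::queue< Sim_packet > > queues )
  {
  std::string out;
  std::size_t worker = 0;
  while( !queues[worker].empty() )
    {
    const Sim_packet p = queues[worker].front();
    queues[worker].pop();
    out += p.data;  // an empty packet only carries the eom mark
    if( p.eom ) worker = ( worker + 1 ) % queues.size();
    }
  return out;
  }
```

If a member ever lacked its `eom` packet, the muxer would stay on the same worker and emit member `k + num_workers` before member `k + 1`. This is why the empty end-of-member packet in `dworker_o` matters for correct output order.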
diff --git a/dec_stream.cc b/dec_stream.cc
new file mode 100644
index 0000000..6ea4ed7
--- /dev/null
+++ b/dec_stream.cc
@@ -0,0 +1,650 @@
+/* Plzip - Massively parallel implementation of lzip
+ Copyright (C) 2009 Laszlo Ersek.
+ Copyright (C) 2009-2024 Antonio Diaz Diaz.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#define _FILE_OFFSET_BITS 64
+
+#include <algorithm>
+#include <cerrno>
+#include <climits>
+#include <csignal>
+#include <cstdio>
+#include <cstdlib>
+#include <cstring>
+#include <queue>
+#include <string>
+#include <vector>
+#include <stdint.h>
+#include <unistd.h>
+#include <lzlib.h>
+
+#include "lzip.h"
+
+/* When a problem is detected by any thread:
+ - the thread sets shared_retval to 1 or 2.
+ - the splitter sets eof and returns.
+ - the courier discards new packets received or collected.
+ - the workers drain the queue and return.
+ - the muxer drains the queue and returns.
+ (Draining seems to be faster than cleaning up later). */
+
+namespace {
+
+enum { max_packet_size = 1 << 20 };
+unsigned long long in_size = 0;
+unsigned long long out_size = 0;
+
+
+struct Packet // data block
+ {
+ uint8_t * data; // data may be null if size == 0
+ int size; // number of bytes in data (if any)
+ bool eom; // end of member
+ Packet() : data( 0 ), size( 0 ), eom( true ) {}
+ Packet( uint8_t * const d, const int s, const bool e )
+ : data( d ), size( s ), eom ( e ) {}
+ ~Packet() { if( data ) delete[] data; }
+ };
+
+
+class Packet_courier // moves packets around
+ {
+public:
+ unsigned icheck_counter;
+ unsigned iwait_counter;
+ unsigned ocheck_counter;
+ unsigned owait_counter;
+private:
+ int receive_worker_id; // worker queue currently receiving packets
+ int deliver_worker_id; // worker queue currently delivering packets
+ Slot_tally slot_tally; // limits the number of input packets
+ std::vector< std::queue< const Packet * > > ipacket_queues;
+ std::vector< std::queue< const Packet * > > opacket_queues;
+ int num_working; // number of workers still running
+ const int num_workers; // number of workers
+ const unsigned out_slots; // max output packets per queue
+ pthread_mutex_t imutex;
+ pthread_cond_t iav_or_eof; // input packet available or splitter done
+ pthread_mutex_t omutex;
+ pthread_cond_t oav_or_exit; // output packet available or all workers exited
+ std::vector< pthread_cond_t > slot_av; // output slot available
+ const Shared_retval & shared_retval; // discard new packets on error
+ bool eof; // splitter done
+ bool trailing_data_found_; // a worker found trailing data
+
+ Packet_courier( const Packet_courier & ); // declared as private
+ void operator=( const Packet_courier & ); // declared as private
+
+public:
+ Packet_courier( const Shared_retval & sh_ret, const int workers,
+ const int in_slots, const int oslots )
+ : icheck_counter( 0 ), iwait_counter( 0 ),
+ ocheck_counter( 0 ), owait_counter( 0 ),
+ receive_worker_id( 0 ), deliver_worker_id( 0 ),
+ slot_tally( in_slots ), ipacket_queues( workers ),
+ opacket_queues( workers ), num_working( workers ),
+ num_workers( workers ), out_slots( oslots ), slot_av( workers ),
+ shared_retval( sh_ret ), eof( false ), trailing_data_found_( false )
+ {
+ xinit_mutex( &imutex ); xinit_cond( &iav_or_eof );
+ xinit_mutex( &omutex ); xinit_cond( &oav_or_exit );
+ for( unsigned i = 0; i < slot_av.size(); ++i ) xinit_cond( &slot_av[i] );
+ }
+
+ ~Packet_courier()
+ {
+ if( shared_retval() ) // cleanup to avoid memory leaks
+ for( int i = 0; i < num_workers; ++i )
+ {
+ while( !ipacket_queues[i].empty() )
+ { delete ipacket_queues[i].front(); ipacket_queues[i].pop(); }
+ while( !opacket_queues[i].empty() )
+ { delete opacket_queues[i].front(); opacket_queues[i].pop(); }
+ }
+ for( unsigned i = 0; i < slot_av.size(); ++i ) xdestroy_cond( &slot_av[i] );
+ xdestroy_cond( &oav_or_exit ); xdestroy_mutex( &omutex );
+ xdestroy_cond( &iav_or_eof ); xdestroy_mutex( &imutex );
+ }
+
+ /* Make a packet with data received from splitter.
+ If eom == true (end of member), move to next queue. */
+ void receive_packet( uint8_t * const data, const int size, const bool eom )
+ {
+ if( shared_retval() ) { delete[] data; return; } // discard packet on error
+ const Packet * const ipacket = new Packet( data, size, eom );
+ slot_tally.get_slot(); // wait for a free slot
+ xlock( &imutex );
+ ipacket_queues[receive_worker_id].push( ipacket );
+ xbroadcast( &iav_or_eof );
+ xunlock( &imutex );
+ if( eom && ++receive_worker_id >= num_workers ) receive_worker_id = 0;
+ }
+
+ // distribute a packet to a worker
+ const Packet * distribute_packet( const int worker_id )
+ {
+ const Packet * ipacket = 0;
+ xlock( &imutex );
+ ++icheck_counter;
+ while( ipacket_queues[worker_id].empty() && !eof )
+ {
+ ++iwait_counter;
+ xwait( &iav_or_eof, &imutex );
+ }
+ if( !ipacket_queues[worker_id].empty() )
+ {
+ ipacket = ipacket_queues[worker_id].front();
+ ipacket_queues[worker_id].pop();
+ }
+ xunlock( &imutex );
+ if( ipacket ) slot_tally.leave_slot();
+ else // no more packets
+ {
+ xlock( &omutex ); // notify muxer when last worker exits
+ if( --num_working == 0 ) xsignal( &oav_or_exit );
+ xunlock( &omutex );
+ }
+ return ipacket;
+ }
+
+ // collect a packet from a worker, discard packet on error
+ void collect_packet( const Packet * const opacket, const int worker_id )
+ {
+ xlock( &omutex );
+ if( opacket->data )
+ while( opacket_queues[worker_id].size() >= out_slots )
+ {
+ if( shared_retval() ) { delete opacket; goto done; }
+ xwait( &slot_av[worker_id], &omutex );
+ }
+ opacket_queues[worker_id].push( opacket );
+ if( worker_id == deliver_worker_id ) xsignal( &oav_or_exit );
+done:
+ xunlock( &omutex );
+ }
+
+ /* deliver a packet to muxer
+ if packet->eom, move to next queue
+ if packet data == 0, wait again */
+ const Packet * deliver_packet()
+ {
+ const Packet * opacket = 0;
+ xlock( &omutex );
+ ++ocheck_counter;
+ while( true )
+ {
+ while( opacket_queues[deliver_worker_id].empty() && num_working > 0 )
+ {
+ ++owait_counter;
+ xwait( &oav_or_exit, &omutex );
+ }
+ if( opacket_queues[deliver_worker_id].empty() ) break;
+ opacket = opacket_queues[deliver_worker_id].front();
+ opacket_queues[deliver_worker_id].pop();
+ if( opacket_queues[deliver_worker_id].size() + 1 == out_slots )
+ xsignal( &slot_av[deliver_worker_id] );
+ if( opacket->eom && ++deliver_worker_id >= num_workers )
+ deliver_worker_id = 0;
+ if( opacket->data ) break;
+ delete opacket; opacket = 0;
+ }
+ xunlock( &omutex );
+ return opacket;
+ }
+
+ void add_sizes( const unsigned long long partial_in_size,
+ const unsigned long long partial_out_size )
+ {
+ xlock( &imutex );
+ in_size += partial_in_size;
+ out_size += partial_out_size;
+ xunlock( &imutex );
+ }
+
+ void set_trailing_flag() { trailing_data_found_ = true; }
+ bool trailing_data_found() { return trailing_data_found_; }
+
+ void finish( const int workers_started )
+ {
+ xlock( &imutex ); // splitter has no more packets to send
+ eof = true;
+ xbroadcast( &iav_or_eof );
+ xunlock( &imutex );
+ xlock( &omutex ); // notify muxer if all workers have exited
+ num_working -= num_workers - workers_started; // workers spared
+ if( num_working <= 0 ) xsignal( &oav_or_exit );
+ xunlock( &omutex );
+ }
+
+ bool finished() // all packets delivered to muxer
+ {
+ if( !slot_tally.all_free() || !eof || num_working != 0 ) return false;
+ for( int i = 0; i < num_workers; ++i )
+ if( !ipacket_queues[i].empty() ) return false;
+ for( int i = 0; i < num_workers; ++i )
+ if( !opacket_queues[i].empty() ) return false;
+ return true;
+ }
+ };
+
+
+struct Worker_arg
+ {
+ Packet_courier * courier;
+ const Pretty_print * pp;
+ Shared_retval * shared_retval;
+ int worker_id;
+ bool ignore_trailing;
+ bool loose_trailing;
+ bool testing;
+ bool nocopy; // avoid copying decompressed data when testing
+ };
+
+struct Splitter_arg
+ {
+ struct Worker_arg worker_arg;
+ Worker_arg * worker_args;
+ pthread_t * worker_threads;
+ unsigned long long cfile_size;
+ int infd;
+ unsigned dictionary_size; // returned by splitter to main thread
+ int num_workers; // returned by splitter to main thread
+ };
+
+
+/* Consume packets from courier, decompress their contents and, if not
+ testing, give to courier the packets produced.
+*/
+extern "C" void * dworker_s( void * arg )
+ {
+ const Worker_arg & tmp = *(const Worker_arg *)arg;
+ Packet_courier & courier = *tmp.courier;
+ const Pretty_print & pp = *tmp.pp;
+ Shared_retval & shared_retval = *tmp.shared_retval;
+ const int worker_id = tmp.worker_id;
+ const bool ignore_trailing = tmp.ignore_trailing;
+ const bool loose_trailing = tmp.loose_trailing;
+ const bool testing = tmp.testing;
+ const bool nocopy = tmp.nocopy;
+
+ unsigned long long partial_in_size = 0, partial_out_size = 0;
+ int new_pos = 0;
+ bool draining = false; // set when trailing data or an error is found
+ uint8_t * new_data = 0;
+ LZ_Decoder * const decoder = LZ_decompress_open();
+ if( !decoder || LZ_decompress_errno( decoder ) != LZ_ok )
+ { draining = true; if( shared_retval.set_value( 1 ) ) pp( mem_msg ); }
+
+ while( true )
+ {
+ const Packet * const ipacket = courier.distribute_packet( worker_id );
+ if( !ipacket ) break; // no more packets to process
+
+ int written = 0;
+ while( !draining ) // else discard trailing data or drain queue
+ {
+ if( LZ_decompress_write_size( decoder ) > 0 && written < ipacket->size )
+ {
+ const int wr = LZ_decompress_write( decoder, ipacket->data + written,
+ ipacket->size - written );
+ if( wr < 0 ) internal_error( "library error (LZ_decompress_write)." );
+ written += wr;
+ if( written > ipacket->size )
+ internal_error( "ipacket size exceeded in worker." );
+ }
+ if( ipacket->eom && written == ipacket->size )
+ LZ_decompress_finish( decoder );
+ unsigned long long total_in = 0; // detect empty member + corrupt header
+ while( !draining ) // read and pack decompressed data
+ {
+ if( !nocopy && !new_data &&
+ !( new_data = new( std::nothrow ) uint8_t[max_packet_size] ) )
+ { draining = true; if( shared_retval.set_value( 1 ) ) pp( mem_msg );
+ break; }
+ const int rd = LZ_decompress_read( decoder,
+ nocopy ? 0 : new_data + new_pos,
+ max_packet_size - new_pos );
+ if( rd < 0 ) // trailing data or decoder error
+ {
+ draining = true;
+ const enum LZ_Errno lz_errno = LZ_decompress_errno( decoder );
+ if( lz_errno == LZ_header_error )
+ {
+ courier.set_trailing_flag();
+ if( !ignore_trailing )
+ { if( shared_retval.set_value( 2 ) ) pp( trailing_msg ); }
+ }
+ else if( lz_errno == LZ_data_error &&
+ LZ_decompress_member_position( decoder ) == 0 )
+ {
+ courier.set_trailing_flag();
+ if( !loose_trailing )
+ { if( shared_retval.set_value( 2 ) ) pp( corrupt_mm_msg ); }
+ else if( !ignore_trailing )
+ { if( shared_retval.set_value( 2 ) ) pp( trailing_msg ); }
+ }
+ else
+ decompress_error( decoder, pp, shared_retval, worker_id );
+ }
+ else new_pos += rd;
+ if( new_pos > max_packet_size )
+ internal_error( "opacket size exceeded in worker." );
+ if( LZ_decompress_member_finished( decoder ) == 1 )
+ {
+ partial_in_size += LZ_decompress_member_position( decoder );
+ partial_out_size += LZ_decompress_data_position( decoder );
+ }
+ const bool eom = draining || LZ_decompress_finished( decoder ) == 1;
+ if( new_pos == max_packet_size || eom )
+ {
+ if( !testing ) // make data packet
+ {
+ const Packet * const opacket =
+ new Packet( ( new_pos > 0 ) ? new_data : 0, new_pos, eom );
+ courier.collect_packet( opacket, worker_id );
+ if( new_pos > 0 ) new_data = 0;
+ }
+ new_pos = 0;
+ if( eom )
+ { LZ_decompress_reset( decoder ); // prepare for new member
+ break; }
+ }
+ if( rd == 0 )
+ {
+ const unsigned long long size = LZ_decompress_total_in_size( decoder );
+ if( total_in == size ) break; else total_in = size;
+ }
+ }
+ if( !ipacket->data || written == ipacket->size ) break;
+ }
+ delete ipacket;
+ }
+
+ if( new_data ) delete[] new_data;
+ courier.add_sizes( partial_in_size, partial_out_size );
+ if( LZ_decompress_member_position( decoder ) != 0 &&
+ shared_retval.set_value( 1 ) )
+ pp( "Error, some data remains in decoder." );
+ if( LZ_decompress_close( decoder ) < 0 && shared_retval.set_value( 1 ) )
+ pp( "LZ_decompress_close failed." );
+ return 0;
+ }
+
+
+bool start_worker( const Worker_arg & worker_arg,
+ Worker_arg * const worker_args,
+ pthread_t * const worker_threads, const int worker_id,
+ Shared_retval & shared_retval )
+ {
+ worker_args[worker_id] = worker_arg;
+ worker_args[worker_id].worker_id = worker_id;
+ const int errcode = pthread_create( &worker_threads[worker_id], 0,
+ dworker_s, &worker_args[worker_id] );
+ if( errcode && shared_retval.set_value( 1 ) )
+ show_error( "Can't create worker threads", errcode );
+ return errcode == 0;
+ }
+
+
+/* Split data from input file into chunks and pass them to courier for
+ packaging and distribution to workers.
+ Start a worker per member up to a maximum of num_workers.
+*/
+extern "C" void * dsplitter_s( void * arg )
+ {
+ Splitter_arg & tmp = *(Splitter_arg *)arg;
+ const Worker_arg & worker_arg = tmp.worker_arg;
+ Packet_courier & courier = *worker_arg.courier;
+ const Pretty_print & pp = *worker_arg.pp;
+ Shared_retval & shared_retval = *worker_arg.shared_retval;
+ Worker_arg * const worker_args = tmp.worker_args;
+ pthread_t * const worker_threads = tmp.worker_threads;
+ const int infd = tmp.infd;
+ int worker_id = 0; // number of workers started
+ const int hsize = Lzip_header::size;
+ const int tsize = Lzip_trailer::size;
+ const int buffer_size = max_packet_size;
+ // buffer with room for trailer, header, data, and sentinel "LZIP"
+ const int base_buffer_size = tsize + hsize + buffer_size + 4;
+ uint8_t * const base_buffer = new( std::nothrow ) uint8_t[base_buffer_size];
+ if( !base_buffer )
+ {
+mem_fail:
+ if( shared_retval.set_value( 1 ) ) pp( mem_msg );
+fail:
+ delete[] base_buffer;
+ courier.finish( worker_id ); // no more packets to send
+ tmp.num_workers = worker_id;
+ return 0;
+ }
+ uint8_t * const buffer = base_buffer + tsize;
+
+ int size = readblock( infd, buffer, buffer_size + hsize ) - hsize;
+ bool at_stream_end = ( size < buffer_size );
+ if( size != buffer_size && errno )
+ { if( shared_retval.set_value( 1 ) )
+ { pp(); show_error( "Read error", errno ); } goto fail; }
+ if( size + hsize < min_member_size )
+ { if( shared_retval.set_value( 2 ) ) show_file_error( pp.name(),
+ ( size <= 0 ) ? "File ends unexpectedly at member header." :
+ "Input file is too short." ); goto fail; }
+ const Lzip_header & header = *(const Lzip_header *)buffer;
+ if( !header.check_magic() )
+ { if( shared_retval.set_value( 2 ) )
+ { show_file_error( pp.name(), bad_magic_msg ); } goto fail; }
+ if( !header.check_version() )
+ { if( shared_retval.set_value( 2 ) )
+ { pp( bad_version( header.version() ) ); } goto fail; }
+ tmp.dictionary_size = header.dictionary_size();
+ if( !isvalid_ds( tmp.dictionary_size ) )
+ { if( shared_retval.set_value( 2 ) ) { pp( bad_dict_msg ); } goto fail; }
+ if( verbosity >= 1 ) pp();
+ show_progress( 0, tmp.cfile_size, &pp ); // init
+
+ unsigned long long partial_member_size = 0;
+ bool worker_pending = true; // start 1 worker per first packet of member
+ while( true )
+ {
+ if( shared_retval() ) break; // stop sending packets on error
+ int pos = 0; // current searching position
+ std::memcpy( buffer + hsize + size, lzip_magic, 4 ); // sentinel
+ for( int newpos = 1; newpos <= size; ++newpos )
+ {
+ while( buffer[newpos] != lzip_magic[0] ||
+ buffer[newpos+1] != lzip_magic[1] ||
+ buffer[newpos+2] != lzip_magic[2] ||
+ buffer[newpos+3] != lzip_magic[3] ) ++newpos;
+ if( newpos <= size )
+ {
+ const Lzip_trailer & trailer =
+ *(const Lzip_trailer *)(buffer + newpos - tsize);
+ const unsigned long long member_size = trailer.member_size();
+ if( partial_member_size + newpos - pos == member_size &&
+ trailer.check_consistency() )
+ { // header found
+ const Lzip_header & header = *(const Lzip_header *)(buffer + newpos);
+ if( !header.check_version() )
+ { if( shared_retval.set_value( 2 ) )
+ { pp( bad_version( header.version() ) ); } goto fail; }
+ const unsigned dictionary_size = header.dictionary_size();
+ if( !isvalid_ds( dictionary_size ) )
+ { if( shared_retval.set_value( 2 ) ) pp( bad_dict_msg );
+ goto fail; }
+ if( tmp.dictionary_size < dictionary_size )
+ tmp.dictionary_size = dictionary_size;
+ uint8_t * const data = new( std::nothrow ) uint8_t[newpos - pos];
+ if( !data ) goto mem_fail;
+ std::memcpy( data, buffer + pos, newpos - pos );
+ courier.receive_packet( data, newpos - pos, true ); // eom
+ partial_member_size = 0;
+ pos = newpos;
+ if( worker_pending )
+ { if( !start_worker( worker_arg, worker_args, worker_threads,
+ worker_id, shared_retval ) ) goto fail;
+ ++worker_id; }
+ worker_pending = worker_id < tmp.num_workers;
+ show_progress( member_size );
+ }
+ }
+ }
+
+ if( at_stream_end )
+ {
+ uint8_t * data = new( std::nothrow ) uint8_t[size + hsize - pos];
+ if( !data ) goto mem_fail;
+ std::memcpy( data, buffer + pos, size + hsize - pos );
+ courier.receive_packet( data, size + hsize - pos, true ); // eom
+ if( worker_pending &&
+ start_worker( worker_arg, worker_args, worker_threads,
+ worker_id, shared_retval ) ) ++worker_id;
+ break;
+ }
+ if( pos < buffer_size )
+ {
+ partial_member_size += buffer_size - pos;
+ uint8_t * data = new( std::nothrow ) uint8_t[buffer_size - pos];
+ if( !data ) goto mem_fail;
+ std::memcpy( data, buffer + pos, buffer_size - pos );
+ courier.receive_packet( data, buffer_size - pos, false );
+ if( worker_pending )
+ { if( !start_worker( worker_arg, worker_args, worker_threads,
+ worker_id, shared_retval ) ) break;
+ ++worker_id; worker_pending = false; }
+ }
+ if( courier.trailing_data_found() ) break;
+ std::memcpy( base_buffer, base_buffer + buffer_size, tsize + hsize );
+ size = readblock( infd, buffer + hsize, buffer_size );
+ at_stream_end = ( size < buffer_size );
+ if( size != buffer_size && errno )
+ { if( shared_retval.set_value( 1 ) )
+ { pp(); show_error( "Read error", errno ); } break; }
+ }
+ delete[] base_buffer;
+ courier.finish( worker_id ); // no more packets to send
+ tmp.num_workers = worker_id;
+ return 0;
+ }
+
+
+/* Get from courier the processed and sorted packets, and write their
+ contents to the output file. Drain queue on error.
+*/
+void muxer( Packet_courier & courier, const Pretty_print & pp,
+ Shared_retval & shared_retval, const int outfd )
+ {
+ while( true )
+ {
+ const Packet * const opacket = courier.deliver_packet();
+ if( !opacket ) break; // queue is empty and all workers have exited
+
+ if( shared_retval() == 0 &&
+ writeblock( outfd, opacket->data, opacket->size ) != opacket->size &&
+ shared_retval.set_value( 1 ) )
+ { pp(); show_error( "Write error", errno ); }
+ delete opacket;
+ }
+ }
+
+} // end namespace
+
+
+/* Init the courier, then start the splitter (which starts the workers) and,
+ if not testing, call the muxer.
+*/
+int dec_stream( const unsigned long long cfile_size, const int num_workers,
+ const int infd, const int outfd, const Cl_options & cl_opts,
+ const Pretty_print & pp, const int debug_level,
+ const int in_slots, const int out_slots )
+ {
+ const int total_in_slots = ( INT_MAX / num_workers >= in_slots ) ?
+ num_workers * in_slots : INT_MAX;
+ in_size = 0;
+ out_size = 0;
+ Shared_retval shared_retval;
+ Packet_courier courier( shared_retval, num_workers, total_in_slots, out_slots );
+
+ if( debug_level & 2 ) std::fputs( "decompress stream.\n", stderr );
+
+ Worker_arg * worker_args = new( std::nothrow ) Worker_arg[num_workers];
+ pthread_t * worker_threads = new( std::nothrow ) pthread_t[num_workers];
+ if( !worker_args || !worker_threads )
+ { pp( mem_msg ); delete[] worker_threads; delete[] worker_args; return 1; }
+
+#if defined LZ_API_VERSION && LZ_API_VERSION >= 1012
+ const bool nocopy = ( outfd < 0 && LZ_api_version() >= 1012 );
+#else
+ const bool nocopy = false;
+#endif
+
+ Splitter_arg splitter_arg;
+ splitter_arg.worker_arg.courier = &courier;
+ splitter_arg.worker_arg.pp = &pp;
+ splitter_arg.worker_arg.shared_retval = &shared_retval;
+ splitter_arg.worker_arg.worker_id = 0;
+ splitter_arg.worker_arg.ignore_trailing = cl_opts.ignore_trailing;
+ splitter_arg.worker_arg.loose_trailing = cl_opts.loose_trailing;
+ splitter_arg.worker_arg.testing = ( outfd < 0 );
+ splitter_arg.worker_arg.nocopy = nocopy;
+ splitter_arg.worker_args = worker_args;
+ splitter_arg.worker_threads = worker_threads;
+ splitter_arg.cfile_size = cfile_size;
+ splitter_arg.infd = infd;
+ splitter_arg.num_workers = num_workers;
+
+ pthread_t splitter_thread;
+ int errcode = pthread_create( &splitter_thread, 0, dsplitter_s, &splitter_arg );
+ if( errcode )
+ { show_error( "Can't create splitter thread", errcode );
+ delete[] worker_threads; delete[] worker_args; return 1; }
+
+ if( outfd >= 0 ) muxer( courier, pp, shared_retval, outfd );
+
+ errcode = pthread_join( splitter_thread, 0 );
+ if( errcode && shared_retval.set_value( 1 ) )
+ show_error( "Can't join splitter thread", errcode );
+
+ for( int i = splitter_arg.num_workers; --i >= 0; )
+ { // join only the workers started
+ errcode = pthread_join( worker_threads[i], 0 );
+ if( errcode && shared_retval.set_value( 1 ) )
+ show_error( "Can't join worker threads", errcode );
+ }
+ delete[] worker_threads;
+ delete[] worker_args;
+
+ if( shared_retval() ) return shared_retval(); // some thread found a problem
+
+ show_results( in_size, out_size, splitter_arg.dictionary_size, outfd < 0 );
+
+ if( debug_level & 1 )
+ {
+ std::fprintf( stderr,
+ "workers started %8u\n"
+ "any worker tried to consume from splitter %8u times\n"
+ "any worker had to wait %8u times\n",
+ splitter_arg.num_workers,
+ courier.icheck_counter, courier.iwait_counter );
+ if( outfd >= 0 )
+ std::fprintf( stderr,
+ "muxer tried to consume from workers %8u times\n"
+ "muxer had to wait %8u times\n",
+ courier.ocheck_counter, courier.owait_counter );
+ }
+
+ if( !courier.finished() ) internal_error( "courier not finished." );
+ return 0;
+ }
diff --git a/decompress.cc b/decompress.cc
new file mode 100644
index 0000000..5b0e68f
--- /dev/null
+++ b/decompress.cc
@@ -0,0 +1,363 @@
+/* Plzip - Massively parallel implementation of lzip
+ Copyright (C) 2009 Laszlo Ersek.
+ Copyright (C) 2009-2024 Antonio Diaz Diaz.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#define _FILE_OFFSET_BITS 64
+
+#include <algorithm>
+#include <cerrno>
+#include <climits>
+#include <csignal>
+#include <cstdio>
+#include <cstdlib>
+#include <cstring>
+#include <string>
+#include <vector>
+#include <stdint.h>
+#include <unistd.h>
+#include <sys/stat.h>
+#include <lzlib.h>
+
+#include "lzip.h"
+#include "lzip_index.h"
+
+
+/* This code is based on a patch by Hannes Domani <ssbssa@yahoo.de> that makes
+ it possible to compile plzip under MS Windows (with the MinGW compiler).
+*/
+#if defined __MSVCRT__ && defined WITH_MINGW
+#include <windows.h>
+#warning "Parallel I/O is not guaranteed to work on Windows."
+
+ssize_t pread( int fd, void *buf, size_t count, uint64_t offset )
+ {
+ OVERLAPPED o = {0,0,0,0,0};
+ HANDLE fh = (HANDLE)_get_osfhandle(fd);
+ DWORD bytes;
+ BOOL ret;
+
+ if( fh == INVALID_HANDLE_VALUE ) { errno = EBADF; return -1; }
+ o.Offset = offset & 0xffffffff;
+ o.OffsetHigh = (offset >> 32) & 0xffffffff;
+ ret = ReadFile( fh, buf, (DWORD)count, &bytes, &o );
+ if( !ret ) { errno = EIO; return -1; }
+ return (ssize_t)bytes;
+ }
+
+ssize_t pwrite( int fd, const void *buf, size_t count, uint64_t offset )
+ {
+ OVERLAPPED o = {0,0,0,0,0};
+ HANDLE fh = (HANDLE)_get_osfhandle(fd);
+ DWORD bytes;
+ BOOL ret;
+
+ if( fh == INVALID_HANDLE_VALUE ) { errno = EBADF; return -1; }
+ o.Offset = offset & 0xffffffff;
+ o.OffsetHigh = (offset >> 32) & 0xffffffff;
+ ret = WriteFile(fh, buf, (DWORD)count, &bytes, &o);
+ if( !ret ) { errno = EIO; return -1; }
+ return (ssize_t)bytes;
+ }
+
+#endif // __MSVCRT__
+
+
+/* Return the number of bytes actually read.
+ If (value returned < size) and (errno == 0), EOF was reached.
+*/
+int preadblock( const int fd, uint8_t * const buf, const int size,
+ const long long pos )
+ {
+ int sz = 0;
+ errno = 0;
+ while( sz < size )
+ {
+ const int n = pread( fd, buf + sz, size - sz, pos + sz );
+ if( n > 0 ) sz += n;
+ else if( n == 0 ) break; // EOF
+ else if( errno != EINTR ) break;
+ errno = 0;
+ }
+ return sz;
+ }
+
+
+/* Return the number of bytes actually written.
+ If (value returned < size), it is always an error.
+*/
+int pwriteblock( const int fd, const uint8_t * const buf, const int size,
+ const long long pos )
+ {
+ int sz = 0;
+ errno = 0;
+ while( sz < size )
+ {
+ const int n = pwrite( fd, buf + sz, size - sz, pos + sz );
+ if( n > 0 ) sz += n;
+ else if( n < 0 && errno != EINTR ) break;
+ errno = 0;
+ }
+ return sz;
+ }
+
+
+void decompress_error( struct LZ_Decoder * const decoder,
+ const Pretty_print & pp,
+ Shared_retval & shared_retval, const int worker_id )
+ {
+ const LZ_Errno errcode = LZ_decompress_errno( decoder );
+ const int retval = ( errcode == LZ_header_error || errcode == LZ_data_error ||
+ errcode == LZ_unexpected_eof ) ? 2 : 1;
+ if( !shared_retval.set_value( retval ) ) return;
+ pp();
+ if( verbosity >= 0 )
+ std::fprintf( stderr, "%s in worker %d\n", LZ_strerror( errcode ),
+ worker_id );
+ }
+
+
+void show_results( const unsigned long long in_size,
+ const unsigned long long out_size,
+ const unsigned dictionary_size, const bool testing )
+ {
+ if( verbosity >= 2 )
+ {
+ if( verbosity >= 4 ) show_header( dictionary_size );
+ if( out_size == 0 || in_size == 0 )
+ std::fputs( "no data compressed. ", stderr );
+ else
+ std::fprintf( stderr, "%6.3f:1, %5.2f%% ratio, %5.2f%% saved. ",
+ (double)out_size / in_size,
+ ( 100.0 * in_size ) / out_size,
+ 100.0 - ( ( 100.0 * in_size ) / out_size ) );
+ if( verbosity >= 3 )
+ std::fprintf( stderr, "%9llu out, %8llu in. ", out_size, in_size );
+ }
+ if( verbosity >= 1 ) std::fputs( testing ? "ok\n" : "done\n", stderr );
+ }
+
+
+namespace {
+
+struct Worker_arg
+ {
+ const Lzip_index * lzip_index;
+ const Pretty_print * pp;
+ Shared_retval * shared_retval;
+ int worker_id;
+ int num_workers;
+ int infd;
+ int outfd;
+ bool nocopy; // avoid copying decompressed data when testing
+ };
+
+
+/* Read members from input file, decompress their contents, and write to
+ output file the data produced.
+*/
+extern "C" void * dworker( void * arg )
+ {
+ const Worker_arg & tmp = *(const Worker_arg *)arg;
+ const Lzip_index & lzip_index = *tmp.lzip_index;
+ const Pretty_print & pp = *tmp.pp;
+ Shared_retval & shared_retval = *tmp.shared_retval;
+ const int worker_id = tmp.worker_id;
+ const int num_workers = tmp.num_workers;
+ const int infd = tmp.infd;
+ const int outfd = tmp.outfd;
+ const bool nocopy = tmp.nocopy;
+ const int buffer_size = 65536;
+
+ uint8_t * const ibuffer = new( std::nothrow ) uint8_t[buffer_size];
+ uint8_t * const obuffer =
+ nocopy ? 0 : new( std::nothrow ) uint8_t[buffer_size];
+ LZ_Decoder * const decoder = LZ_decompress_open();
+ if( !ibuffer || ( !nocopy && !obuffer ) || !decoder ||
+ LZ_decompress_errno( decoder ) != LZ_ok )
+ { if( shared_retval.set_value( 1 ) ) { pp( mem_msg ); } goto done; }
+
+ for( long i = worker_id; i < lzip_index.members(); i += num_workers )
+ {
+ long long data_pos = lzip_index.dblock( i ).pos();
+ long long data_rest = lzip_index.dblock( i ).size();
+ long long member_pos = lzip_index.mblock( i ).pos();
+ long long member_rest = lzip_index.mblock( i ).size();
+
+ while( member_rest > 0 )
+ {
+ if( shared_retval() ) goto done; // other worker found a problem
+ while( LZ_decompress_write_size( decoder ) > 0 )
+ {
+ const int size = std::min( LZ_decompress_write_size( decoder ),
+ (int)std::min( (long long)buffer_size, member_rest ) );
+ if( size > 0 )
+ {
+ if( preadblock( infd, ibuffer, size, member_pos ) != size )
+ { if( shared_retval.set_value( 1 ) )
+ { pp(); show_error( "Read error", errno ); } goto done; }
+ member_pos += size;
+ member_rest -= size;
+ if( LZ_decompress_write( decoder, ibuffer, size ) != size )
+ internal_error( "library error (LZ_decompress_write)." );
+ }
+ if( member_rest <= 0 ) { LZ_decompress_finish( decoder ); break; }
+ }
+ while( true ) // write decompressed data to file
+ {
+ const int rd = LZ_decompress_read( decoder, obuffer, buffer_size );
+ if( rd < 0 )
+ { decompress_error( decoder, pp, shared_retval, worker_id );
+ goto done; }
+ if( rd > 0 && outfd >= 0 )
+ {
+ const int wr = pwriteblock( outfd, obuffer, rd, data_pos );
+ if( wr != rd )
+ {
+ if( shared_retval.set_value( 1 ) ) { pp();
+ if( verbosity >= 0 )
+ std::fprintf( stderr, "Write error in worker %d: %s\n",
+ worker_id, std::strerror( errno ) ); }
+ goto done;
+ }
+ }
+ if( rd > 0 )
+ {
+ data_pos += rd;
+ data_rest -= rd;
+ }
+ if( LZ_decompress_finished( decoder ) == 1 )
+ {
+ if( data_rest != 0 )
+ internal_error( "final data_rest is not zero." );
+ LZ_decompress_reset( decoder ); // prepare for new member
+ break;
+ }
+ if( rd == 0 ) break;
+ }
+ }
+ show_progress( lzip_index.mblock( i ).size() );
+ }
+done:
+ if( obuffer ) { delete[] obuffer; } delete[] ibuffer;
+ if( LZ_decompress_member_position( decoder ) != 0 &&
+ shared_retval.set_value( 1 ) )
+ pp( "Error, some data remains in decoder." );
+ if( LZ_decompress_close( decoder ) < 0 && shared_retval.set_value( 1 ) )
+ pp( "LZ_decompress_close failed." );
+ return 0;
+ }
+
+} // end namespace
+
+
+// start the workers and wait for them to finish.
+int decompress( const unsigned long long cfile_size, int num_workers,
+ const int infd, const int outfd, const Cl_options & cl_opts,
+ const Pretty_print & pp, const int debug_level,
+ const int in_slots, const int out_slots,
+ const bool infd_isreg, const bool one_to_one )
+ {
+ if( !infd_isreg )
+ return dec_stream( cfile_size, num_workers, infd, outfd, cl_opts, pp,
+ debug_level, in_slots, out_slots );
+
+ const Lzip_index lzip_index( infd, cl_opts );
+ if( lzip_index.retval() == 1 ) // decompress as stream if seek fails
+ {
+ lseek( infd, 0, SEEK_SET );
+ return dec_stream( cfile_size, num_workers, infd, outfd, cl_opts, pp,
+ debug_level, in_slots, out_slots );
+ }
+ if( lzip_index.retval() != 0 ) // corrupt or invalid input file
+ {
+ if( lzip_index.bad_magic() )
+ show_file_error( pp.name(), lzip_index.error().c_str() );
+ else pp( lzip_index.error().c_str() );
+ return lzip_index.retval();
+ }
+
+ if( num_workers > lzip_index.members() ) num_workers = lzip_index.members();
+
+ if( outfd >= 0 )
+ {
+ struct stat st;
+ if( !one_to_one || fstat( outfd, &st ) != 0 || !S_ISREG( st.st_mode ) ||
+ lseek( outfd, 0, SEEK_CUR ) < 0 )
+ {
+ if( debug_level & 2 ) std::fputs( "decompress file to stdout.\n", stderr );
+ if( verbosity >= 1 ) pp();
+ show_progress( 0, cfile_size, &pp ); // init
+ return dec_stdout( num_workers, infd, outfd, pp, debug_level, out_slots,
+ lzip_index );
+ }
+ }
+
+ if( debug_level & 2 ) std::fputs( "decompress file to file.\n", stderr );
+ if( verbosity >= 1 ) pp();
+ show_progress( 0, cfile_size, &pp ); // init
+
+ Worker_arg * worker_args = new( std::nothrow ) Worker_arg[num_workers];
+ pthread_t * worker_threads = new( std::nothrow ) pthread_t[num_workers];
+ if( !worker_args || !worker_threads )
+ { pp( mem_msg ); delete[] worker_threads; delete[] worker_args; return 1; }
+
+#if defined LZ_API_VERSION && LZ_API_VERSION >= 1012
+ const bool nocopy = ( outfd < 0 && LZ_api_version() >= 1012 );
+#else
+ const bool nocopy = false;
+#endif
+
+ Shared_retval shared_retval;
+ int i = 0; // number of workers started
+ for( ; i < num_workers; ++i )
+ {
+ worker_args[i].lzip_index = &lzip_index;
+ worker_args[i].pp = &pp;
+ worker_args[i].shared_retval = &shared_retval;
+ worker_args[i].worker_id = i;
+ worker_args[i].num_workers = num_workers;
+ worker_args[i].infd = infd;
+ worker_args[i].outfd = outfd;
+ worker_args[i].nocopy = nocopy;
+ const int errcode =
+ pthread_create( &worker_threads[i], 0, dworker, &worker_args[i] );
+ if( errcode )
+ { if( shared_retval.set_value( 1 ) )
+ { show_error( "Can't create worker threads", errcode ); } break; }
+ }
+
+ while( --i >= 0 )
+ {
+ const int errcode = pthread_join( worker_threads[i], 0 );
+ if( errcode && shared_retval.set_value( 1 ) )
+ show_error( "Can't join worker threads", errcode );
+ }
+ delete[] worker_threads;
+ delete[] worker_args;
+
+ if( shared_retval() ) return shared_retval(); // some thread found a problem
+
+ if( verbosity >= 1 )
+ show_results( lzip_index.cdata_size(), lzip_index.udata_size(),
+ lzip_index.dictionary_size(), outfd < 0 );
+
+ if( debug_level & 1 )
+ std::fprintf( stderr,
+ "workers started %8u\n", num_workers );
+
+ return 0;
+ }
diff --git a/doc/plzip.1 b/doc/plzip.1
new file mode 100644
index 0000000..3985e5b
--- /dev/null
+++ b/doc/plzip.1
@@ -0,0 +1,148 @@
+.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.49.2.
+.TH PLZIP "1" "January 2024" "plzip 1.11" "User Commands"
+.SH NAME
+plzip \- reduces the size of files
+.SH SYNOPSIS
+.B plzip
+[\fI\,options\/\fR] [\fI\,files\/\fR]
+.SH DESCRIPTION
+Plzip is a massively parallel (multi\-threaded) implementation of lzip,
+compatible with lzip 1.4 or newer. Plzip uses the compression library lzlib.
+.PP
+Lzip is a lossless data compressor with a user interface similar to the one
+of gzip or bzip2. Lzip uses a simplified form of the 'Lempel\-Ziv\-Markov
+chain\-Algorithm' (LZMA) stream format to maximize interoperability. The
+maximum dictionary size is 512 MiB so that any lzip file can be decompressed
+on 32\-bit machines. Lzip provides accurate and robust 3\-factor integrity
+checking. Lzip can compress about as fast as gzip (lzip \fB\-0\fR) or compress most
+files more than bzip2 (lzip \fB\-9\fR). Decompression speed is intermediate between
+gzip and bzip2. Lzip is better than gzip and bzip2 from a data recovery
+perspective. Lzip has been designed, written, and tested with great care to
+replace gzip and bzip2 as the standard general\-purpose compressed format for
+Unix\-like systems.
+.PP
+Plzip can compress/decompress large files on multiprocessor machines much
+faster than lzip, at the cost of a slightly reduced compression ratio (0.4
+to 2 percent larger compressed files). Note that the number of usable
+threads is limited by file size; on files larger than a few GB plzip can use
+hundreds of processors, but on files of only a few MB plzip is no faster
+than lzip.
+.SH OPTIONS
+.TP
+\fB\-h\fR, \fB\-\-help\fR
+display this help and exit
+.TP
+\fB\-V\fR, \fB\-\-version\fR
+output version information and exit
+.TP
+\fB\-a\fR, \fB\-\-trailing\-error\fR
+exit with error status if trailing data
+.TP
+\fB\-B\fR, \fB\-\-data\-size=\fR<bytes>
+set size of input data blocks [2x8=16 MiB]
+.TP
+\fB\-c\fR, \fB\-\-stdout\fR
+write to standard output, keep input files
+.TP
+\fB\-d\fR, \fB\-\-decompress\fR
+decompress, test compressed file integrity
+.TP
+\fB\-f\fR, \fB\-\-force\fR
+overwrite existing output files
+.TP
+\fB\-F\fR, \fB\-\-recompress\fR
+force re\-compression of compressed files
+.TP
+\fB\-k\fR, \fB\-\-keep\fR
+keep (don't delete) input files
+.TP
+\fB\-l\fR, \fB\-\-list\fR
+print (un)compressed file sizes
+.TP
+\fB\-m\fR, \fB\-\-match\-length=\fR<bytes>
+set match length limit in bytes [36]
+.TP
+\fB\-n\fR, \fB\-\-threads=\fR<n>
+set number of (de)compression threads [2]
+.TP
+\fB\-o\fR, \fB\-\-output=\fR<file>
+write to <file>, keep input files
+.TP
+\fB\-q\fR, \fB\-\-quiet\fR
+suppress all messages
+.TP
+\fB\-s\fR, \fB\-\-dictionary\-size=\fR<bytes>
+set dictionary size limit in bytes [8 MiB]
+.TP
+\fB\-t\fR, \fB\-\-test\fR
+test compressed file integrity
+.TP
+\fB\-v\fR, \fB\-\-verbose\fR
+be verbose (a 2nd \fB\-v\fR gives more)
+.TP
+\fB\-0\fR .. \fB\-9\fR
+set compression level [default 6]
+.TP
+\fB\-\-fast\fR
+alias for \fB\-0\fR
+.TP
+\fB\-\-best\fR
+alias for \fB\-9\fR
+.TP
+\fB\-\-loose\-trailing\fR
+allow trailing data seeming corrupt header
+.TP
+\fB\-\-in\-slots=\fR<n>
+number of 1 MiB input packets buffered [4]
+.TP
+\fB\-\-out\-slots=\fR<n>
+number of 1 MiB output packets buffered [64]
+.TP
+\fB\-\-check\-lib\fR
+compare version of lzlib.h with liblz.{a,so}
+.PP
+If no file names are given, or if a file is '\-', plzip compresses or
+decompresses from standard input to standard output.
+Numbers may be followed by a multiplier: k = kB = 10^3 = 1000,
+Ki = KiB = 2^10 = 1024, M = 10^6, Mi = 2^20, G = 10^9, Gi = 2^30, etc...
+Dictionary sizes 12 to 29 are interpreted as powers of two, meaning 2^12 to
+2^29 bytes.
+.PP
+The bidimensional parameter space of LZMA can't be mapped to a linear scale
+optimal for all files. If your files are large, very repetitive, etc, you
+may need to use the options \fB\-\-dictionary\-size\fR and \fB\-\-match\-length\fR directly
+to achieve optimal performance.
+.PP
+To extract all the files from archive 'foo.tar.lz', use the commands
+\&'tar \fB\-xf\fR foo.tar.lz' or 'plzip \fB\-cd\fR foo.tar.lz | tar \fB\-xf\fR \-'.
+.PP
+Exit status: 0 for a normal exit, 1 for environmental problems
+(file not found, invalid command\-line options, I/O errors, etc), 2 to
+indicate a corrupt or invalid input file, 3 for an internal consistency
+error (e.g., bug) which caused plzip to panic.
+.SH "REPORTING BUGS"
+Report bugs to lzip\-bug@nongnu.org
+.br
+Plzip home page: http://www.nongnu.org/lzip/plzip.html
+.SH COPYRIGHT
+Copyright \(co 2009 Laszlo Ersek.
+.br
+Copyright \(co 2024 Antonio Diaz Diaz.
+License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>
+.br
+This is free software: you are free to change and redistribute it.
+There is NO WARRANTY, to the extent permitted by law.
+Using lzlib 1.14
+Using LZ_API_VERSION = 1014
+.SH "SEE ALSO"
+The full documentation for
+.B plzip
+is maintained as a Texinfo manual. If the
+.B info
+and
+.B plzip
+programs are properly installed at your site, the command
+.IP
+.B info plzip
+.PP
+should give you access to the complete manual.
diff --git a/doc/plzip.info b/doc/plzip.info
new file mode 100644
index 0000000..becd133
--- /dev/null
+++ b/doc/plzip.info
@@ -0,0 +1,833 @@
+This is plzip.info, produced by makeinfo version 4.13+ from plzip.texi.
+
+INFO-DIR-SECTION Compression
+START-INFO-DIR-ENTRY
+* Plzip: (plzip). Massively parallel implementation of lzip
+END-INFO-DIR-ENTRY
+
+
+File: plzip.info, Node: Top, Next: Introduction, Up: (dir)
+
+Plzip Manual
+************
+
+This manual is for Plzip (version 1.11, 21 January 2024).
+
+* Menu:
+
+* Introduction:: Purpose and features of plzip
+* Output:: Meaning of plzip's output
+* Invoking plzip:: Command-line interface
+* Program design:: Internal structure of plzip
+* Memory requirements:: Memory required to compress and decompress
+* Minimum file sizes:: Minimum file sizes required for full speed
+* File format:: Detailed format of the compressed file
+* Trailing data:: Extra data appended to the file
+* Examples:: A small tutorial with examples
+* Problems:: Reporting bugs
+* Concept index:: Index of concepts
+
+
+ Copyright (C) 2009-2024 Antonio Diaz Diaz.
+
+ This manual is free documentation: you have unlimited permission to copy,
+distribute, and modify it.
+
+
+File: plzip.info, Node: Introduction, Next: Output, Prev: Top, Up: Top
+
+1 Introduction
+**************
+
+Plzip is a massively parallel (multi-threaded) implementation of lzip,
+compatible with lzip 1.4 or newer. Plzip uses the compression library lzlib.
+
+ Lzip is a lossless data compressor with a user interface similar to that of
+gzip or bzip2. Lzip uses a simplified form of the 'Lempel-Ziv-Markov
+chain-Algorithm' (LZMA) stream format to maximize interoperability. The
+maximum dictionary size is 512 MiB so that any lzip file can be decompressed
+on 32-bit machines. Lzip provides accurate and robust 3-factor integrity
+checking. Lzip can compress about as fast as gzip (lzip -0) or compress most
+files more than bzip2 (lzip -9). Decompression speed is intermediate between
+gzip and bzip2. Lzip is better than gzip and bzip2 from a data recovery
+perspective. Lzip has been designed, written, and tested with great care to
+replace gzip and bzip2 as the standard general-purpose compressed format for
+Unix-like systems.
+
+ Plzip can compress/decompress large files on multiprocessor machines much
+faster than lzip, at the cost of a slightly reduced compression ratio (0.4
+to 2 percent larger compressed files). Note that the number of usable
+threads is limited by file size; on files larger than a few GB plzip can use
+hundreds of processors, but on files of only a few MB plzip is no faster
+than lzip. *Note Minimum file sizes::.
+
+ For creation and manipulation of compressed tar archives, tarlz can be
+more efficient than using tar and plzip because tarlz is able to keep the
+alignment between tar members and lzip members. *Note tarlz manual:
+(tarlz)Top.
+
+ The lzip file format is designed for data sharing and long-term
+archiving, taking into account both data integrity and decoder availability:
+
+ * The lzip format provides very safe integrity checking and some data
+ recovery means. The program lziprecover can repair bit flip errors
+ (one of the most common forms of data corruption) in lzip files, and
+ provides data recovery capabilities, including error-checked merging
+ of damaged copies of a file. *Note Data safety: (lziprecover)Data
+ safety.
+
+ * The lzip format is as simple as possible (but not simpler). The lzip
+ manual provides the source code of a simple decompressor along with a
+ detailed explanation of how it works, so that, with only the help of the
+ lzip manual, a digital archaeologist could extract the data from a lzip
+ file long after quantum computers eventually render LZMA obsolete.
+
+ * Additionally the lzip reference implementation is copylefted, which
+ guarantees that it will remain free forever.
+
+ A nice feature of the lzip format is that a corrupt byte is easier to
+repair the nearer it is to the beginning of the file. Therefore, with the
+help of lziprecover, losing an entire archive just because of a corrupt
+byte near the beginning is a thing of the past.
+
+ Plzip uses the same well-defined exit status values used by lzip, which
+makes it safer than compressors returning ambiguous warning values (like
+gzip) when it is used as a back end for other programs like tar or zutils.
+
+ For each file, plzip automatically uses the largest dictionary size that
+exceeds neither the file size nor the limit given. Keep in mind
+that the decompression memory requirement is affected at compression time
+by the choice of dictionary size limit. *Note Memory requirements::.
+
+ When compressing, plzip replaces every file given in the command line
+with a compressed version of itself, with the name "original_name.lz". When
+decompressing, plzip attempts to guess the name for the decompressed file
+from that of the compressed file as follows:
+
+filename.lz becomes filename
+filename.tlz becomes filename.tar
+anyothername becomes anyothername.out
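+
+ The three naming rules above can be expressed as a small C++ function.
+This is a hypothetical helper written for illustration, not plzip's own
+code. It covers only the cases listed: a recognized suffix must be
+preceded by at least one character, and any other name gets '.out'.
+

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not plzip's own code): derive the output name for a
// decompressed file from the name of the compressed file.
std::string decompressed_name(const std::string& name) {
  struct Rule { const char* from; const char* to; };
  static const Rule rules[] = { { ".lz", "" }, { ".tlz", ".tar" } };
  for (const Rule& r : rules) {
    const std::string from = r.from;
    // The suffix must be preceded by at least one character.
    if (name.size() > from.size() &&
        name.compare(name.size() - from.size(), from.size(), from) == 0)
      return name.substr(0, name.size() - from.size()) + r.to;
  }
  return name + ".out";  // unknown suffix
}
```
+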
+
+ (De)compressing a file is much like copying or moving it. Therefore plzip
+preserves the access and modification dates, permissions, and, if you have
+appropriate privileges, ownership of the file just as 'cp -p' does. (If the
+user ID or the group ID can't be duplicated, the file permission bits
+S_ISUID and S_ISGID are cleared).
+
+ Plzip is able to read from some types of non-regular files if either the
+option '-c' or the option '-o' is specified.
+
+ Plzip refuses to read compressed data from a terminal or write compressed
+data to a terminal, as this would be entirely incomprehensible and might
+leave the terminal in an abnormal state.
+
+ Plzip correctly decompresses a file which is the concatenation of two or
+more compressed files. The result is the concatenation of the corresponding
+decompressed files. Integrity testing of concatenated compressed files is
+also supported.
+
+
+File: plzip.info, Node: Output, Next: Invoking plzip, Prev: Introduction, Up: Top
+
+2 Meaning of plzip's output
+***************************
+
+The output of plzip looks like this:
+
+ plzip -v foo
+ foo: 6.676:1, 14.98% ratio, 85.02% saved, 450560 in, 67493 out.
+
+ plzip -tvvv foo.lz
+ foo.lz: 6.676:1, 14.98% ratio, 85.02% saved. 450560 out, 67493 in. ok
+
+ The meaning of each field is as follows:
+
+'N:1'
+ The compression ratio (uncompressed_size / compressed_size), shown as
+ N to 1.
+
+'ratio'
+ The inverse compression ratio (compressed_size / uncompressed_size),
+ shown as a percentage. A decimal ratio is easily obtained by moving the
+ decimal point two places to the left; 14.98% = 0.1498.
+
+'saved'
+ The space saved by compression (1 - ratio), shown as a percentage.
+
+'in'
+ Size of the input data. This is the uncompressed size when
+ compressing, or the compressed size when decompressing or testing.
+ Note that plzip always prints the uncompressed size before the
+ compressed size when compressing, decompressing, testing, or listing.
+
+'out'
+ Size of the output data. This is the compressed size when compressing,
+ or the decompressed size when decompressing or testing.
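+
+ The 'N:1', 'ratio', and 'saved' fields are simple arithmetic on the two
+sizes. The sketch below reproduces them. The function name and the exact
+printf formats are assumptions chosen to match the sample output above;
+plzip's own formatting code may differ.
+

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not plzip's own code): format the ratio fields for
// a file from its uncompressed and compressed sizes in bytes.
std::string ratio_fields(unsigned long long uncompressed,
                         unsigned long long compressed) {
  assert(uncompressed > 0 && compressed > 0);
  const double n_to_1 = double(uncompressed) / double(compressed);        // 'N:1'
  const double ratio = 100.0 * double(compressed) / double(uncompressed); // 'ratio'
  const double saved = 100.0 - ratio;                                     // 'saved'
  char buf[96];
  std::snprintf(buf, sizeof buf, "%.3f:1, %.2f%% ratio, %.2f%% saved",
                n_to_1, ratio, saved);
  return buf;
}
```
+
+ With the sizes from the example above, ratio_fields(450560, 67493)
+yields "6.676:1, 14.98% ratio, 85.02% saved".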
+
+
+ When decompressing or testing at verbosity level 4 (-vvvv), the
+dictionary size used to compress the file is also shown.
+
+ LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never
+have been compressed. Decompressed is used to refer to data which have
+undergone the process of decompression.
+
+
+File: plzip.info, Node: Invoking plzip, Next: Program design, Prev: Output, Up: Top
+
+3 Invoking plzip
+****************
+
+The format for running plzip is:
+
+ plzip [OPTIONS] [FILES]
+
+If no file names are specified, plzip compresses (or decompresses) from
+standard input to standard output. A hyphen '-' used as a FILE argument
+means standard input. It can be mixed with other FILES and is read just
+once, the first time it appears in the command line. Remember to prepend
+'./' to any file name beginning with a hyphen, or use '--'.
+
+ plzip supports the following options: *Note Argument syntax:
+(arg_parser)Argument syntax.
+
+'-h'
+'--help'
+ Print an informative help message describing the options and exit.
+
+'-V'
+'--version'
+ Print the version number of plzip on the standard output and exit.
+ This version number should be included in all bug reports.
+
+'-a'
+'--trailing-error'
+ Exit with error status 2 if any remaining input is detected after
+ decompressing the last member. Such remaining input is usually trailing
+ garbage that can be safely ignored. *Note concat-example::.
+
+'-B BYTES'
+'--data-size=BYTES'
+ When compressing, set the size in bytes of the input data blocks. The
+ input file is divided into chunks of this size before compression is
+ performed. Valid values range from 8 KiB to 1 GiB. Default value is
+ two times the dictionary size, except for option '-0' where it
+ defaults to 1 MiB. Plzip reduces the dictionary size if it is larger
+ than the data size specified. *Note Minimum file sizes::.
+
+'-c'
+'--stdout'
+ Compress or decompress to standard output; keep input files unchanged.
+ If compressing several files, each file is compressed independently.
+ (The output consists of a sequence of independently compressed
+ members). This option (or '-o') is needed when reading from a named
+ pipe (fifo) or from a device. Use 'lziprecover -cd -i' to recover as
+ much of the decompressed data as possible when decompressing a corrupt
+ file. '-c' overrides '-o'. '-c' has no effect when testing or listing.
+
+'-d'
+'--decompress'
+ Decompress the files specified. The integrity of the files specified is
+ checked. If a file does not exist, can't be opened, or the destination
+ file already exists and '--force' has not been specified, plzip
+ continues decompressing the rest of the files and exits with error
+ status 1. If a file fails to decompress, or is a terminal, plzip exits
+ immediately with error status 2 without decompressing the rest of the
+ files. A terminal is considered an uncompressed file, and therefore
+ invalid.
+
+'-f'
+'--force'
+ Force overwrite of output files.
+
+'-F'
+'--recompress'
+ When compressing, force re-compression of files whose name already has
+ the '.lz' or '.tlz' suffix.
+
+'-k'
+'--keep'
+ Keep (don't delete) input files during compression or decompression.
+
+'-l'
+'--list'
+ Print the uncompressed size, compressed size, and percentage saved of
+ the files specified. Trailing data are ignored. The values produced
+ are correct even for multimember files. If more than one file is
+ given, a final line containing the cumulative sizes is printed. With
+ '-v', the dictionary size, the number of members in the file, and the
+ amount of trailing data (if any) are also printed. With '-vv', the
+ positions and sizes of each member in multimember files are also
+ printed.
+
+ If any file is damaged, does not exist, can't be opened, or is not
+ regular, the final exit status is > 0. '-lq' can be used to check
+ quickly (without decompressing) the structural integrity of the files
+ specified. (Use '--test' to check the data integrity). '-alq'
+ additionally checks that none of the files specified contain trailing
+ data.
+
+'-m BYTES'
+'--match-length=BYTES'
+ When compressing, set the match length limit in bytes. After a match
+ this long is found, the search is finished. Valid values range from 5
+ to 273. Larger values usually give better compression ratios but
+ longer compression times.
+
+'-n N'
+'--threads=N'
+ Set the maximum number of worker threads, overriding the system's
+ default. Valid values range from 1 to "as many as your system can
+ support". If this option is not used, plzip tries to detect the number
+ of processors in the system and use it as default value. When
+ compressing on a 32-bit system, plzip tries to limit the memory use to
+ under 2.22 GiB (4 worker threads at level -9) by reducing the number
+ of threads below the system's default. 'plzip --help' shows the
+ system's default value.
+
+ Plzip starts the number of threads required by each file without
+ exceeding the value specified. Note that the number of usable threads
+ is limited to ceil( file_size / data_size ) during compression (*note
+ Minimum file sizes::), and to the number of members in the input
+ during decompression. You can find the number of members in a lzip
+ file by running 'plzip -lv file.lz'.
+
+'-o FILE'
+'--output=FILE'
+ If '-c' has not also been specified, write the (de)compressed output
+ to FILE, automatically creating any missing parent directories; keep
+ input files unchanged. If compressing several files, each file is
+ compressed independently. (The output consists of a sequence of
+ independently compressed members). This option (or '-c') is needed
+ when reading from a named pipe (fifo) or from a device. '-o -' is
+ equivalent to '-c'. '-o' has no effect when testing or listing.
+
+ In order to keep backward compatibility with plzip versions prior to
+ 1.9, when compressing from standard input and no other file names are
+ given, the extension '.lz' is appended to FILE unless it already ends
+ in '.lz' or '.tlz'. This feature will be removed in a future version
+ of plzip. Meanwhile, redirection may be used instead of '-o' to write
+ the compressed output to a file without the extension '.lz' in its
+ name: 'plzip < file > foo'.
+
+'-q'
+'--quiet'
+ Quiet operation. Suppress all messages.
+
+'-s BYTES'
+'--dictionary-size=BYTES'
+ When compressing, set the dictionary size limit in bytes. For each file,
+ plzip uses the largest dictionary size that exceeds neither the file
+ size nor this limit. Valid values range from 4 KiB to 512 MiB.
+ Values 12 to 29 are interpreted as powers of two, meaning 2^12 to 2^29
+ bytes. Dictionary sizes are quantized so that they can be coded in
+ just one byte (*note coded-dict-size::). If the size specified does
+ not match one of the valid sizes, it is rounded upwards by adding up
+ to (BYTES / 8) to it.
+
+ For maximum compression you should use a dictionary size limit as large
+ as possible, but keep in mind that the decompression memory requirement
+ is affected at compression time by the choice of dictionary size limit.
+
+'-t'
+'--test'
+ Check integrity of the files specified, but don't decompress them. This
+ really performs a trial decompression and throws away the result. Use
+ it together with '-v' to see information about the files. If a file
+ fails the test, does not exist, can't be opened, or is a terminal,
+ plzip continues testing the rest of the files. A final diagnostic is
+ shown at verbosity level 1 or higher if any file fails the test when
+ testing multiple files.
+
+'-v'
+'--verbose'
+ Verbose mode.
+ When compressing, show the compression ratio and size for each file
+ processed.
+ When decompressing or testing, further -v's (up to 4) increase the
+ verbosity level, showing status, compression ratio, dictionary size,
+ decompressed size, and compressed size.
+ Two or more '-v' options show the progress of (de)compression, except
+ for single-member files.
+
+'-0 .. -9'
+ Compression level. Set the compression parameters (dictionary size and
+ match length limit) as shown in the table below. The default
+ compression level is '-6', equivalent to '-s8MiB -m36'. Note that '-9'
+ can be much slower than '-0'. These options have no effect when
+ decompressing, testing, or listing.
+
+ The bidimensional parameter space of LZMA can't be mapped to a linear
+ scale optimal for all files. If your files are large, very repetitive,
+ etc., you may need to use the options '--dictionary-size' and
+ '--match-length' directly to achieve optimal performance.
+
+ If several compression levels or '-s' or '-m' options are given, the
+ last setting is used. For example, '-9 -s64MiB' is equivalent to
+ '-s64MiB -m273'.
+
+ Level Dictionary size (-s) Match length limit (-m)
+ -0 64 KiB 16 bytes
+ -1 1 MiB 5 bytes
+ -2 1.5 MiB 6 bytes
+ -3 2 MiB 8 bytes
+ -4 3 MiB 12 bytes
+ -5 4 MiB 20 bytes
+ -6 8 MiB 36 bytes
+ -7 16 MiB 68 bytes
+ -8 24 MiB 132 bytes
+ -9 32 MiB 273 bytes
+
+'--fast'
+'--best'
+ Aliases for GNU gzip compatibility.
+
+'--loose-trailing'
+ When decompressing, testing, or listing, allow trailing data whose
+ first bytes are so similar to the magic bytes of a lzip header that
+ they can be confused with a corrupt header. Use this option if a file
+ triggers a "corrupt header" error and the cause is not actually a
+ corrupt header.
+
+'--in-slots=N'
+ Number of 1 MiB input packets buffered per worker thread when
+ decompressing from non-seekable input. Increasing the number of packets
+ may increase decompression speed, but requires more memory. Valid
+ values range from 1 to 64. The default value is 4.
+
+'--out-slots=N'
+ Number of 1 MiB output packets buffered per worker thread when
+ decompressing to non-seekable output. Increasing the number of packets
+ may increase decompression speed, but requires more memory. Valid
+ values range from 1 to 1024. The default value is 64.
+
+'--check-lib'
+ Compare the version of lzlib used to compile plzip with the version
+ actually being used at run time and exit. Report any differences
+ found. Exit with error status 1 if differences are found. A mismatch
+ may indicate that lzlib is not correctly installed or that a different
+ version of lzlib has been installed after compiling plzip. Exit with
+ error status 2 if LZ_API_VERSION and LZ_version_string don't match.
+ 'plzip -v --check-lib' shows the version of lzlib being used and the
+ value of LZ_API_VERSION (if defined). *Note Library version:
+ (lzlib)Library version.
+
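+ The upward rounding described for '--dictionary-size' can be modeled as
+follows. A size codable in the DS byte is a power of two minus 0 to 7
+sixteenths of it. Take the smallest power of two not below the request,
+then subtract the largest such fraction that keeps the result at or above
+the request. This is a sketch written from the coded-dict-size
+description in this manual, not plzip's source. The function name is
+hypothetical.
+

```cpp
#include <cassert>

// Hypothetical sketch: round a requested dictionary size (4 KiB .. 512 MiB)
// up to the nearest value the DS byte can represent.
unsigned round_dictionary_size(unsigned size) {
  assert(size >= (1u << 12) && size <= (1u << 29));
  unsigned bits = 0;                      // number of bits needed for size - 1
  for (unsigned v = size - 1; v != 0; v >>= 1) ++bits;
  const unsigned base = 1u << bits;       // smallest power of 2 >= size
  const unsigned fraction = base / 16;
  // Try the largest subtraction first; stop at the first one that still fits.
  for (unsigned i = 7; i >= 1; --i)
    if (base - i * fraction >= size) return base - i * fraction;
  return base;
}
```
+
+ For example, a request of 300000 bytes becomes 327680 (320 KiB), which
+adds 27680 bytes, within the (BYTES / 8) bound stated above.
+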
+
+ Numbers given as arguments to options may be expressed in decimal,
+hexadecimal, or octal (using the same syntax as integer constants in C++),
+and may be followed by a multiplier and an optional 'B' for "byte".
+
+ Table of SI and binary prefixes (unit multipliers):
+
+Prefix Value | Prefix Value
+k kilobyte (10^3 = 1000) | Ki kibibyte (2^10 = 1024)
+M megabyte (10^6) | Mi mebibyte (2^20)
+G gigabyte (10^9) | Gi gibibyte (2^30)
+T terabyte (10^12) | Ti tebibyte (2^40)
+P petabyte (10^15) | Pi pebibyte (2^50)
+E exabyte (10^18) | Ei exbibyte (2^60)
+Z zettabyte (10^21) | Zi zebibyte (2^70)
+Y yottabyte (10^24) | Yi yobibyte (2^80)
+R ronnabyte (10^27) | Ri robibyte (2^90)
+Q quettabyte (10^30) | Qi quebibyte (2^100)
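+
+ A number with an optional multiplier and 'B' could be parsed as shown
+below. This is a hypothetical sketch, not plzip's argument parser. It
+assumes 'k' is valid only in SI form and 'K' only in the binary form
+"Ki", as in the table above. It relies on strtoull with base 0 for the
+C++-style decimal, hex, and octal syntax, and it rejects values that
+overflow unsigned long long (so the R, Q, Ri, and Qi prefixes always
+fail here).
+

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical sketch: parse "8MiB", "64k", "0x1000", "010", etc.
std::optional<unsigned long long> parse_size(const std::string& arg) {
  if (arg.empty() || !std::isdigit(static_cast<unsigned char>(arg[0])))
    return std::nullopt;  // reject signs and blanks that strtoull would accept
  errno = 0;
  char* tail = nullptr;
  // Base 0: decimal, 0x for hexadecimal, leading 0 for octal.
  unsigned long long value = std::strtoull(arg.c_str(), &tail, 0);
  if (errno == ERANGE) return std::nullopt;

  std::string suffix(tail);
  if (!suffix.empty() && suffix.back() == 'B') suffix.pop_back();  // optional 'B'
  if (suffix.empty()) return value;

  const bool binary = suffix.size() == 2 && suffix[1] == 'i';
  if (suffix.size() != (binary ? 2u : 1u)) return std::nullopt;
  char prefix = suffix[0];
  if (prefix == 'k' && binary) return std::nullopt;  // binary form is "Ki"
  if (prefix == 'K') {
    if (!binary) return std::nullopt;                // SI form is "k"
    prefix = 'k';
  }
  static const std::string prefixes = "kMGTPEZYRQ";  // 10^3 .. 10^30
  const std::size_t index = prefixes.find(prefix);
  if (index == std::string::npos) return std::nullopt;

  const unsigned long long factor = binary ? 1024 : 1000;
  for (std::size_t i = 0; i <= index; ++i) {         // apply factor index+1 times
    if (value > ULLONG_MAX / factor) return std::nullopt;  // overflow
    value *= factor;
  }
  return value;
}
```
+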
+
+
+ Exit status: 0 for a normal exit, 1 for environmental problems (file not
+found, invalid command-line options, I/O errors, etc.), 2 to indicate a
+corrupt or invalid input file, 3 for an internal consistency error (e.g.,
+bug) which caused plzip to panic.
+
+
+File: plzip.info, Node: Program design, Next: Memory requirements, Prev: Invoking plzip, Up: Top
+
+4 Internal structure of plzip
+*****************************
+
+When compressing, plzip divides the input file into chunks and compresses as
+many chunks simultaneously as worker threads are chosen, creating a
+multimember compressed file. Each chunk is compressed in-place (using the
+same buffer for input and output), reducing the amount of RAM required.
+
+ When decompressing, plzip decompresses as many members simultaneously as
+worker threads are chosen. Files that were compressed with lzip are not
+decompressed faster than using lzip (unless the option '-b' was used)
+because lzip usually produces single-member files, which can't be
+decompressed in parallel.
+
+ For each input file, a splitter thread and several worker threads are
+created, with the main thread acting as the muxer (multiplexer) thread. A
+"packet courier" takes care of data transfers among threads and limits the
+maximum number of data blocks (packets) being processed simultaneously.
+
+ The splitter reads data blocks from the input file, and distributes them
+to the workers. The workers (de)compress the blocks received from the
+splitter. The muxer collects processed packets from the workers, and writes
+them to the output file.
+
+ .------------.
+ ,-->| worker 0 |--,
+ | `------------' |
+.-------. .----------. | .------------. | .-------. .--------.
+| input |-->| splitter |-+-->| worker 1 |--+-->| muxer |-->| output |
+| file | `----------' | `------------' | `-------' | file |
+`-------' | ... | `--------'
+ | .------------. |
+ `-->| worker N-1 |--'
+ `------------'
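+
+ The splitter/worker/muxer arrangement in the diagram above can be
+sketched with standard C++ threads. This is a simplified model, not
+plzip's implementation. The "packet courier" is reduced to one mutex, one
+condition variable, and a cap on packets in flight. The (de)compression
+step is replaced by a hypothetical process_block() that just uppercases
+text. The muxer restores input order by packet number.
+

```cpp
#include <cassert>
#include <cctype>
#include <condition_variable>
#include <cstddef>
#include <map>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the packet courier.
struct Courier {
  std::mutex m;
  std::condition_variable cv;
  std::queue<std::pair<std::size_t, std::string>> to_workers;  // from splitter
  std::map<std::size_t, std::string> to_muxer;  // keyed by packet number
  std::size_t in_flight = 0;                    // split but not yet written
  std::size_t max_packets;
  bool splitter_done = false;
  explicit Courier(std::size_t max) : max_packets(max) {}
};

// Placeholder for (de)compressing one block: here, just uppercase it.
std::string process_block(std::string block) {
  for (char& ch : block)
    ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
  return block;
}

std::string run_pipeline(const std::string& input, std::size_t chunk_size,
                         int n_workers, std::size_t max_packets) {
  assert(chunk_size > 0 && n_workers > 0 && max_packets > 0);
  Courier c(max_packets);
  const std::size_t n_chunks = (input.size() + chunk_size - 1) / chunk_size;

  std::thread splitter([&] {  // cuts the input into numbered packets
    for (std::size_t i = 0; i < n_chunks; ++i) {
      std::unique_lock<std::mutex> lock(c.m);
      c.cv.wait(lock, [&] { return c.in_flight < c.max_packets; });
      c.to_workers.emplace(i, input.substr(i * chunk_size, chunk_size));
      ++c.in_flight;
      c.cv.notify_all();
    }
    std::lock_guard<std::mutex> lock(c.m);
    c.splitter_done = true;
    c.cv.notify_all();
  });

  std::vector<std::thread> workers;
  for (int w = 0; w < n_workers; ++w)
    workers.emplace_back([&] {
      for (;;) {
        std::pair<std::size_t, std::string> packet;
        {
          std::unique_lock<std::mutex> lock(c.m);
          c.cv.wait(lock, [&] { return !c.to_workers.empty() || c.splitter_done; });
          if (c.to_workers.empty()) return;  // splitter finished, queue drained
          packet = std::move(c.to_workers.front());
          c.to_workers.pop();
        }
        std::string out = process_block(std::move(packet.second));  // unlocked
        std::lock_guard<std::mutex> lock(c.m);
        c.to_muxer.emplace(packet.first, std::move(out));
        c.cv.notify_all();
      }
    });

  // The calling thread acts as the muxer, writing packets in input order.
  std::string output;
  for (std::size_t next = 0; next < n_chunks; ++next) {
    std::unique_lock<std::mutex> lock(c.m);
    c.cv.wait(lock, [&] { return c.to_muxer.count(next) != 0; });
    output += c.to_muxer[next];
    c.to_muxer.erase(next);
    --c.in_flight;  // frees a slot for the splitter
    c.cv.notify_all();
  }
  splitter.join();
  for (std::thread& t : workers) t.join();
  return output;
}
```
+
+ The per-block work runs outside the lock, so the workers proceed in
+parallel. The in-flight cap bounds memory in the spirit of the courier's
+packet limit. Because the splitter numbers packets in order, the lowest
+unwritten packet is always in flight, so the muxer cannot deadlock.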
+
+ When decompressing from a regular file, the splitter is removed and the
+workers read directly from the input file. If the output file is also a
+regular file, the muxer is also removed and the workers write directly to
+the output file. With these optimizations, the use of RAM is greatly
+reduced and the decompression speed of large files with many members is
+only limited by the number of processors available and by I/O speed.
+
+
+File: plzip.info, Node: Memory requirements, Next: Minimum file sizes, Prev: Program design, Up: Top
+
+5 Memory required to compress and decompress
+********************************************
+
+The amount of memory required *per worker thread* for decompression or
+testing is approximately the following:
+
+ * For decompression of a regular (seekable) file to another regular file,
+ or for testing of a regular file: the dictionary size.
+
+ * For testing of a non-seekable file or of standard input: the dictionary
+ size plus 1 MiB plus up to the number of 1 MiB input packets buffered
+ (4 by default).
+
+ * For decompression of a regular file to a non-seekable file or to
+ standard output: the dictionary size plus up to the number of 1 MiB
+ output packets buffered (64 by default).
+
+ * For decompression of a non-seekable file or of standard input: the
+ dictionary size plus 1 MiB plus up to the number of 1 MiB input and
+ output packets buffered (68 by default).
+
+The amount of memory required *per worker thread* for compression is
+approximately the following:
+
+ * For compression at level -0: 1.5 MiB plus 3.375 times the data size
+ (*note --data-size::). With the default data size this is 4.875 MiB.
+
+ * For compression at other levels: 11 times the dictionary size plus
+ 3.375 times the data size. At the default level (-6) this is 142 MiB.
+
+The following table shows the memory required *per thread* for compression
+at a given level, using the default data size for each level:
+
+Level Memory required
+-0 4.875 MiB
+-1 17.75 MiB
+-2 26.625 MiB
+-3 35.5 MiB
+-4 53.25 MiB
+-5 71 MiB
+-6 142 MiB
+-7 284 MiB
+-8 426 MiB
+-9 568 MiB
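+
+ The table can be reproduced from the two approximations above. With the
+default data size (twice the dictionary size), levels -1 to -9 reduce to
+17.75 times the dictionary size. The function names below are
+hypothetical, and the results are the manual's approximations, not
+measurements.
+

```cpp
#include <cassert>

// Approximate per-thread compression memory in MiB, using the default
// data size for each level.
double compress_memory_mib(int level) {
  assert(level >= 0 && level <= 9);
  // Dictionary sizes in MiB for levels -0 .. -9 (index 0 = 64 KiB, unused).
  static const double dict_mib[10] = { 0.0625, 1, 1.5, 2, 3, 4, 8, 16, 24, 32 };
  if (level == 0)
    return 1.5 + 3.375 * 1.0;                   // data size defaults to 1 MiB
  const double data_mib = 2 * dict_mib[level];  // default: twice the dictionary
  return 11 * dict_mib[level] + 3.375 * data_mib;  // == 17.75 * dictionary
}

// Total for several worker threads.
double total_compress_memory_mib(int level, int threads) {
  return threads * compress_memory_mib(level);
}
```
+
+ For example, 4 threads at -9 need about 2272 MiB (about 2.22 GiB). This
+matches the 32-bit limit mentioned under '--threads'.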
+
+
+File: plzip.info, Node: Minimum file sizes, Next: File format, Prev: Memory requirements, Up: Top
+
+6 Minimum file sizes required for full compression speed
+********************************************************
+
+When compressing, plzip divides the input file into chunks and compresses
+as many chunks simultaneously as worker threads are chosen, creating a
+multimember compressed file.
+
+ For this to work as expected (and roughly multiply the compression speed
+by the number of available processors), the uncompressed file must be at
+least as large as the number of worker threads times the chunk size (*note
+--data-size::). Else some processors do not get any data to compress, and
+compression is proportionally slower. The maximum speed increase achievable
+on a given file is limited by the ratio (file_size / data_size). For
+example, a tarball the size of gcc or linux scales up to 10 or 14
+processors at level -9.
+
+ The following table shows the minimum uncompressed file size needed for
+full use of N processors at a given compression level, using the default
+data size for each level:
+
+Processors 2 4 8 16 64 256
+------------------------------------------------------------------
+Level
+-0 2 MiB 4 MiB 8 MiB 16 MiB 64 MiB 256 MiB
+-1 4 MiB 8 MiB 16 MiB 32 MiB 128 MiB 512 MiB
+-2 6 MiB 12 MiB 24 MiB 48 MiB 192 MiB 768 MiB
+-3 8 MiB 16 MiB 32 MiB 64 MiB 256 MiB 1 GiB
+-4 12 MiB 24 MiB 48 MiB 96 MiB 384 MiB 1.5 GiB
+-5 16 MiB 32 MiB 64 MiB 128 MiB 512 MiB 2 GiB
+-6 32 MiB 64 MiB 128 MiB 256 MiB 1 GiB 4 GiB
+-7 64 MiB 128 MiB 256 MiB 512 MiB 2 GiB 8 GiB
+-8 96 MiB 192 MiB 384 MiB 768 MiB 3 GiB 12 GiB
+-9 128 MiB 256 MiB 512 MiB 1 GiB 4 GiB 16 GiB
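+
+ Each table entry is simply N times the default data size. The number of
+threads that receive data is ceil(file_size / data_size), capped by the
+number requested. A sketch of both calculations follows. The function
+names are hypothetical.
+

```cpp
#include <cassert>
#include <cstdint>

// Default data size (-B) in bytes: 1 MiB at -0, else twice the dictionary size.
std::uint64_t default_data_size(int level) {
  assert(level >= 0 && level <= 9);
  static const std::uint64_t dict_kib[10] =
    { 64, 1024, 1536, 2048, 3072, 4096, 8192, 16384, 24576, 32768 };
  return level == 0 ? (std::uint64_t(1) << 20) : 2 * dict_kib[level] * 1024;
}

// Minimum uncompressed size so that every one of the processors gets data.
std::uint64_t min_file_size(int level, std::uint64_t processors) {
  return processors * default_data_size(level);
}

// Worker threads that actually receive data: ceil(file_size / data_size),
// but never more than the number of threads requested.
std::uint64_t usable_threads(std::uint64_t file_size, std::uint64_t data_size,
                             std::uint64_t requested) {
  const std::uint64_t chunks = (file_size + data_size - 1) / data_size;
  return chunks < requested ? chunks : requested;
}
```
+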
+
+
+File: plzip.info, Node: File format, Next: Trailing data, Prev: Minimum file sizes, Up: Top
+
+7 File format
+*************
+
+Perfection is reached, not when there is no longer anything to add, but
+when there is no longer anything to take away.
+-- Antoine de Saint-Exupery
+
+
+ In the diagram below, a box like this:
+
++---+
+| | <-- the vertical bars might be missing
++---+
+
+ represents one byte; a box like this:
+
++==============+
+| |
++==============+
+
+ represents a variable number of bytes.
+
+
+ A lzip file consists of one or more independent "members" (compressed
+data sets). The members simply appear one after another in the file, with no
+additional information before, between, or after them. Each member can
+encode in compressed form up to 16 EiB - 1 byte of uncompressed data. The
+size of a multimember file is unlimited.
+
+ Each member has the following structure:
+
++--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+| ID string | VN | DS | LZMA stream | CRC32 | Data size | Member size |
++--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ All multibyte values are stored in little endian order.
+
+'ID string (the "magic" bytes)'
+ A four byte string, identifying the lzip format, with the value "LZIP"
+ (0x4C, 0x5A, 0x49, 0x50).
+
+'VN (version number, 1 byte)'
+ Just in case something needs to be modified in the future. 1 for now.
+
+'DS (coded dictionary size, 1 byte)'
+ The dictionary size is calculated by taking a power of 2 (the base
+ size) and subtracting from it a fraction between 0/16 and 7/16 of the
+ base size.
+ Bits 4-0 contain the base 2 logarithm of the base size (12 to 29).
+ Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract
+ from the base size to obtain the dictionary size.
+ Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB
+ Valid values for dictionary size range from 4 KiB to 512 MiB.
+
+'LZMA stream'
+ The LZMA stream, finished by an "End Of Stream" marker. Uses default
+ values for encoder properties. *Note Stream format: (lzip)Stream
+ format, for a complete description.
+
+'CRC32 (4 bytes)'
+ Cyclic Redundancy Check (CRC) of the original uncompressed data.
+
+'Data size (8 bytes)'
+ Size of the original uncompressed data.
+
+'Member size (8 bytes)'
+ Total size of the member, including header and trailer. This field acts
+ as a distributed index, improves the checking of stream integrity, and
+ facilitates the safe recovery of undamaged members from multimember
+ files. Lzip limits the member size to 2 PiB to prevent the data size
+ field from overflowing.
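+
+ The member layout above can be read with a few lines of C++. The sketch
+assumes the whole member is in memory. It checks the magic bytes and
+version, decodes the DS byte as described (4 KiB minimum), and reads the
+little-endian trailer fields. It does not decode the LZMA stream or verify
+the CRC. The names parse_member, MemberInfo, read_le, and decode_ds are
+hypothetical.
+

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Read an n-byte little-endian unsigned integer (least significant byte first).
std::uint64_t read_le(const std::uint8_t* p, int n) {
  std::uint64_t value = 0;
  for (int i = n - 1; i >= 0; --i) value = (value << 8) | p[i];
  return value;
}

// Decode the coded dictionary size (DS) byte.
unsigned decode_ds(std::uint8_t ds) {
  unsigned size = 1u << (ds & 0x1F);       // bits 4-0: log2 of the base size
  if (size > 4096)                         // 4 KiB is the minimum size
    size -= (size / 16) * ((ds >> 5) & 7); // bits 7-5: sixteenths to subtract
  return size;
}

struct MemberInfo {
  unsigned dictionary_size;
  std::uint32_t crc32;       // CRC of the uncompressed data
  std::uint64_t data_size;   // uncompressed size
  std::uint64_t member_size; // whole member, header and trailer included
};

// Decode the 6-byte header and 20-byte trailer of one in-memory member.
// Returns false if the header is invalid or the member size does not match.
bool parse_member(const std::uint8_t* m, std::size_t size, MemberInfo& info) {
  if (size < 26 || std::memcmp(m, "LZIP", 4) != 0 || m[4] != 1) return false;
  info.dictionary_size = decode_ds(m[5]);
  const std::uint8_t* t = m + size - 20;   // trailer is the last 20 bytes
  info.crc32 = static_cast<std::uint32_t>(read_le(t, 4));
  info.data_size = read_le(t + 4, 8);
  info.member_size = read_le(t + 12, 8);
  return info.member_size == size;
}
```
+
+ Because each trailer ends with the member size, a tool can start at the
+end of a multimember file and step backward one member at a time. That
+is how the field serves as a distributed index.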
+
+
+
+File: plzip.info, Node: Trailing data, Next: Examples, Prev: File format, Up: Top
+
+8 Extra data appended to the file
+*********************************
+
+Sometimes extra data are found appended to a lzip file after the last
+member. Such trailing data may be:
+
+ * Padding added to make the file size a multiple of some block size, for
+ example when writing to a tape. It is safe to append any amount of
+ padding zero bytes to a lzip file.
+
+ * Useful data added by the user; an "End Of File" string (to check that
+ the file has not been truncated), a cryptographically secure hash, a
+ description of file contents, etc. It is safe to append any amount of
+ text to a lzip file as long as none of the first four bytes of the
+ text matches the corresponding byte in the string "LZIP", and the text
+ does not contain any zero bytes (null characters). Nonzero bytes and
+ zero bytes can't be safely mixed in trailing data.
+
+ * Garbage added by some not totally successful copy operation.
+
+ * Malicious data added to the file in order to make its total size and
+ hash value (for a chosen hash) coincide with those of another file.
+
+ * In rare cases, trailing data could be the corrupt header of another
+ member. In multimember or concatenated files the probability of
+ corruption happening in the magic bytes is 5 times smaller than the
+ probability of getting a false positive caused by the corruption of the
+ integrity information itself. Therefore it can be considered to be
+ below the noise level. Additionally, the test used by plzip to
+ discriminate trailing data from a corrupt header has a Hamming
+ distance (HD) of 3, and the 3 bit flips must happen in different magic
+ bytes for the test to fail. In any case, the option '--trailing-error'
+ guarantees that any corrupt header is detected.
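+
+ The rules given above for data that is safe to append can be written as
+a predicate. The data must be either all zero bytes (padding), or contain
+no zero bytes and have none of its first four bytes match the
+corresponding byte of "LZIP". This mirrors the manual's rule only. It is a
+hypothetical helper, not the test plzip uses to tell trailing data from a
+corrupt header.
+

```cpp
#include <cassert>
#include <string>

// True if appending 'data' to a lzip file is safe according to the rules
// above for padding and user-added text.
bool safe_trailing_data(const std::string& data) {
  if (data.find_first_not_of('\0') == std::string::npos)
    return true;                                  // all zeros (or empty): padding
  if (data.find('\0') != std::string::npos)
    return false;                                 // zero and nonzero bytes mixed
  static const char magic[] = "LZIP";
  for (std::string::size_type i = 0; i < 4 && i < data.size(); ++i)
    if (data[i] == magic[i]) return false;        // could resemble a header
  return true;
}
```
+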
+
+ Trailing data are in no way part of the lzip file format, but tools
+reading lzip files are expected to behave as correctly and usefully as
+possible in the presence of trailing data.
+
+ Trailing data can be safely ignored in most cases. In some cases, like
+that of user-added data, they are expected to be ignored. In those cases
+where a file containing trailing data must be rejected, the option
+'--trailing-error' can be used. *Note --trailing-error::.
+
+
+File: plzip.info, Node: Examples, Next: Problems, Prev: Trailing data, Up: Top
+
+9 A small tutorial with examples
+********************************
+
+WARNING! Even if plzip is bug-free, other causes may result in a corrupt
+compressed file (bugs in the system libraries, memory errors, etc.).
+Therefore, if the data you are going to compress are important, give the
+option '--keep' to plzip and don't remove the original file until you check
+the compressed file with a command like 'plzip -cd file.lz | cmp file -'.
+Most RAM errors happening during compression can only be detected by
+comparing the compressed file with the original because the corruption
+happens before plzip compresses the RAM contents, resulting in a valid
+compressed file containing wrong data.
+
+
+Example 1: Extract all the files from archive 'foo.tar.lz'.
+
+ tar -xf foo.tar.lz
+ or
+ plzip -cd foo.tar.lz | tar -xf -
+
+
+Example 2: Replace a regular file with its compressed version 'file.lz' and
+show the compression ratio.
+
+ plzip -v file
+
+
+Example 3: Like example 2 but the created 'file.lz' has a block size of
+1 MiB. The compression ratio is not shown.
+
+ plzip -B 1MiB file
+
+
+Example 4: Restore a regular file from its compressed version 'file.lz'. If
+the operation is successful, 'file.lz' is removed.
+
+ plzip -d file.lz
+
+
+Example 5: Check the integrity of the compressed file 'file.lz' and show
+status.
+
+ plzip -tv file.lz
+
+
+Example 6: The right way of concatenating the decompressed output of two or
+more compressed files. *Note Trailing data::.
+
+ Don't do this
+ cat file1.lz file2.lz file3.lz | plzip -d -
+ Do this instead
+ plzip -cd file1.lz file2.lz file3.lz
+
+
+Example 7: Decompress 'file.lz' partially until 10 KiB of decompressed data
+are produced.
+
+ plzip -cd file.lz | dd bs=1024 count=10
+
+
+Example 8: Decompress 'file.lz' partially from decompressed byte at offset
+10000 to decompressed byte at offset 14999 (5000 bytes are produced).
+
+ plzip -cd file.lz | dd bs=1000 skip=10 count=5
+
+
+Example 9: Compress a whole device in /dev/sdc and send the output to
+'file.lz'.
+
+ plzip -c /dev/sdc > file.lz
+ or
+ plzip /dev/sdc -o file.lz
+
+
+File: plzip.info, Node: Problems, Next: Concept index, Prev: Examples, Up: Top
+
+10 Reporting bugs
+*****************
+
+There are probably bugs in plzip. There are certainly errors and omissions
+in this manual. If you report them, they will get fixed. If you don't, no
+one will ever know about them and they will remain unfixed for all
+eternity, if not longer.
+
+ If you find a bug in plzip, please send electronic mail to
+<lzip-bug@nongnu.org>. Include the version number, which you can find by
+running 'plzip --version' and 'plzip -v --check-lib'.
+
+
+File: plzip.info, Node: Concept index, Prev: Problems, Up: Top
+
+Concept index
+*************
+
+
+* Menu:
+
+* bugs: Problems. (line 6)
+* examples: Examples. (line 6)
+* file format: File format. (line 6)
+* getting help: Problems. (line 6)
+* introduction: Introduction. (line 6)
+* invoking: Invoking plzip. (line 6)
+* memory requirements: Memory requirements. (line 6)
+* minimum file sizes: Minimum file sizes. (line 6)
+* options: Invoking plzip. (line 6)
+* output: Output. (line 6)
+* program design: Program design. (line 6)
+* trailing data: Trailing data. (line 6)
+* usage: Invoking plzip. (line 6)
+* version: Invoking plzip. (line 6)
+
+
+
+Tag Table:
+Node: Top217
+Node: Introduction1156
+Node: Output5934
+Node: Invoking plzip7497
+Ref: --trailing-error8372
+Ref: --data-size8610
+Node: Program design19519
+Node: Memory requirements21818
+Node: Minimum file sizes23503
+Node: File format25506
+Ref: coded-dict-size26945
+Node: Trailing data28195
+Node: Examples30531
+Ref: concat-example31964
+Node: Problems32721
+Node: Concept index33276
+
+End Tag Table
+
+
+Local Variables:
+coding: iso-8859-15
+End:
diff --git a/doc/plzip.texi b/doc/plzip.texi
new file mode 100644
index 0000000..323fad1
--- /dev/null
+++ b/doc/plzip.texi
@@ -0,0 +1,907 @@
+\input texinfo @c -*-texinfo-*-
+@c %**start of header
+@setfilename plzip.info
+@documentencoding ISO-8859-15
+@settitle Plzip Manual
+@finalout
+@c %**end of header
+
+@set UPDATED 21 January 2024
+@set VERSION 1.11
+
+@dircategory Compression
+@direntry
+* Plzip: (plzip). Massively parallel implementation of lzip
+@end direntry
+
+
+@ifnothtml
+@titlepage
+@title Plzip
+@subtitle Massively parallel implementation of lzip
+@subtitle for Plzip version @value{VERSION}, @value{UPDATED}
+@author by Antonio Diaz Diaz
+
+@page
+@vskip 0pt plus 1filll
+@end titlepage
+
+@contents
+@end ifnothtml
+
+@ifnottex
+@node Top
+@top
+
+This manual is for Plzip (version @value{VERSION}, @value{UPDATED}).
+
+@menu
+* Introduction:: Purpose and features of plzip
+* Output:: Meaning of plzip's output
+* Invoking plzip:: Command-line interface
+* Program design:: Internal structure of plzip
+* Memory requirements:: Memory required to compress and decompress
+* Minimum file sizes:: Minimum file sizes required for full speed
+* File format:: Detailed format of the compressed file
+* Trailing data:: Extra data appended to the file
+* Examples:: A small tutorial with examples
+* Problems:: Reporting bugs
+* Concept index:: Index of concepts
+@end menu
+
+@sp 1
+Copyright @copyright{} 2009-2024 Antonio Diaz Diaz.
+
+This manual is free documentation: you have unlimited permission to copy,
+distribute, and modify it.
+@end ifnottex
+
+
+@node Introduction
+@chapter Introduction
+@cindex introduction
+
+@uref{http://www.nongnu.org/lzip/plzip.html,,Plzip}
+is a massively parallel (multi-threaded) implementation of lzip,
+compatible with lzip 1.4 or newer. Plzip uses the compression library
+@uref{http://www.nongnu.org/lzip/lzlib.html,,lzlib}.
+
+@uref{http://www.nongnu.org/lzip/lzip.html,,Lzip}
+is a lossless data compressor with a user interface similar to that
+of gzip or bzip2. Lzip uses a simplified form of the 'Lempel-Ziv-Markov
+chain-Algorithm' (LZMA) stream format to maximize interoperability. The
+maximum dictionary size is 512 MiB so that any lzip file can be decompressed
+on 32-bit machines. Lzip provides accurate and robust 3-factor integrity
+checking. Lzip can compress about as fast as gzip @w{(lzip -0)} or compress most
+files more than bzip2 @w{(lzip -9)}. Decompression speed is intermediate between
+gzip and bzip2. Lzip is better than gzip and bzip2 from a data recovery
+perspective. Lzip has been designed, written, and tested with great care to
+replace gzip and bzip2 as the standard general-purpose compressed format for
+Unix-like systems.
+
+Plzip can compress/decompress large files on multiprocessor machines much
+faster than lzip, at the cost of a slightly reduced compression ratio (0.4
+to 2 percent larger compressed files). Note that the number of usable
+threads is limited by file size; on files larger than a few GB plzip can use
+hundreds of processors, but on files of only a few MB plzip is no faster
+than lzip. @xref{Minimum file sizes}.
+
+For creation and manipulation of compressed tar archives
+@uref{http://www.nongnu.org/lzip/manual/tarlz_manual.html,,tarlz} can be more
+efficient than using tar and plzip because tarlz is able to keep the
+alignment between tar members and lzip members.
+@ifnothtml
+@xref{Top,tarlz manual,,tarlz}.
+@end ifnothtml
+
+The lzip file format is designed for data sharing and long-term archiving,
+taking into account both data integrity and decoder availability:
+
+@itemize @bullet
+@item
+The lzip format provides very safe integrity checking and some data
+recovery means. The program
+@uref{http://www.nongnu.org/lzip/manual/lziprecover_manual.html#Data-safety,,lziprecover}
+can repair bit flip errors (one of the most common forms of data corruption)
+in lzip files, and provides data recovery capabilities, including
+error-checked merging of damaged copies of a file.
+@ifnothtml
+@xref{Data safety,,,lziprecover}.
+@end ifnothtml
+
+@item
+The lzip format is as simple as possible (but not simpler). The lzip
+manual provides the source code of a simple decompressor along with a
+detailed explanation of how it works, so that, with only the help of the
+lzip manual, it would be possible for a digital archaeologist to extract
+the data from a lzip file long after quantum computers eventually
+render LZMA obsolete.
+
+@item
+Additionally, the lzip reference implementation is copylefted, which
+guarantees that it will remain free forever.
+@end itemize
+
+A nice feature of the lzip format is that a corrupt byte is easier to repair
+the nearer it is to the beginning of the file. Therefore, with the help of
+lziprecover, losing an entire archive just because of a corrupt byte near
+the beginning is a thing of the past.
+
+Plzip uses the same well-defined exit status values used by lzip, which
+makes it safer than compressors returning ambiguous warning values (like
+gzip) when it is used as a back end for other programs like tar or zutils.
+
+Plzip automatically uses for each file the largest dictionary size that
+exceeds neither the file size nor the limit given. Keep in mind that the
+decompression memory requirement is affected at compression time by the
+choice of dictionary size limit. @xref{Memory requirements}.
+
+When compressing, plzip replaces every file given in the command line
+with a compressed version of itself, with the name "original_name.lz".
+When decompressing, plzip attempts to guess the name for the decompressed
+file from that of the compressed file as follows:
+
+@multitable {anyothername} {becomes} {anyothername.out}
+@item filename.lz @tab becomes @tab filename
+@item filename.tlz @tab becomes @tab filename.tar
+@item anyothername @tab becomes @tab anyothername.out
+@end multitable
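The naming rules in the table above can be expressed as a small C++ sketch. The function names are hypothetical and this is not plzip's own code; in particular, treating a name that consists only of a suffix (for example a bare ".lz") as "any other name" is an assumption of the sketch.

```cpp
#include <cassert>
#include <string>

// True if `name` ends with `suffix` and has at least one character before it.
static bool has_suffix(const std::string& name, const std::string& suffix) {
  return name.size() > suffix.size() &&
         name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Name given to the compressed version of a file.
std::string compressed_name(const std::string& name) { return name + ".lz"; }

// Name guessed for the decompressed file, following the table above.
std::string decompressed_name(const std::string& name) {
  if (has_suffix(name, ".lz")) return name.substr(0, name.size() - 3);
  if (has_suffix(name, ".tlz")) return name.substr(0, name.size() - 4) + ".tar";
  return name + ".out";
}
```

For example, decompressed_name("archive.tlz") yields "archive.tar", while a file without a recognized suffix gets ".out" appended.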
+
+(De)compressing a file is much like copying or moving it. Therefore plzip
+preserves the access and modification dates, permissions, and, if you have
+appropriate privileges, ownership of the file just as @w{@samp{cp -p}} does.
+(If the user ID or the group ID can't be duplicated, the file permission
+bits S_ISUID and S_ISGID are cleared).
+
+Plzip is able to read from some types of non-regular files if either the
+option @option{-c} or the option @option{-o} is specified.
+
+Plzip refuses to read compressed data from a terminal or write compressed
+data to a terminal, as this would be entirely incomprehensible and might
+leave the terminal in an abnormal state.
+
+Plzip correctly decompresses a file which is the concatenation of two or
+more compressed files. The result is the concatenation of the corresponding
+decompressed files. Integrity testing of concatenated compressed files is
+also supported.
+
+
+@node Output
+@chapter Meaning of plzip's output
+@cindex output
+
+The output of plzip looks like this:
+
+@example
+plzip -v foo
+ foo: 6.676:1, 14.98% ratio, 85.02% saved, 450560 in, 67493 out.
+
+plzip -tvvv foo.lz
+ foo.lz: 6.676:1, 14.98% ratio, 85.02% saved. 450560 out, 67493 in. ok
+@end example
+
+The meaning of each field is as follows:
+
+@table @code
+@item N:1
+The compression ratio @w{(uncompressed_size / compressed_size)}, shown as
+@w{N to 1}.
+
+@item ratio
+The inverse compression ratio @w{(compressed_size / uncompressed_size)},
+shown as a percentage. A decimal ratio is easily obtained by moving the
+decimal point two places to the left; @w{14.98% = 0.1498}.
+
+@item saved
+The space saved by compression @w{(1 - ratio)}, shown as a percentage.
+
+@item in
+Size of the input data. This is the uncompressed size when compressing, or
+the compressed size when decompressing or testing. Note that plzip always
+prints the uncompressed size before the compressed size when compressing,
+decompressing, testing, or listing.
+
+@item out
+Size of the output data. This is the compressed size when compressing, or
+the decompressed size when decompressing or testing.
+
+@end table
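The three figures are simple functions of the two sizes. The following C++ sketch (hypothetical names, not plzip's source; field widths are omitted from the format string) reproduces the first example line above from 450560 uncompressed bytes and 67493 compressed bytes.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Figures printed by 'plzip -v', computed from the two sizes.
struct Ratios {
  double n_to_1;     // uncompressed_size / compressed_size
  double ratio_pct;  // compressed_size / uncompressed_size, as a percentage
  double saved_pct;  // (1 - ratio), as a percentage
};

Ratios compute_ratios(unsigned long long uncompressed, unsigned long long compressed) {
  assert(uncompressed > 0 && compressed > 0);  // avoid division by zero
  const double r = double(compressed) / double(uncompressed);
  return {double(uncompressed) / double(compressed), 100.0 * r, 100.0 * (1.0 - r)};
}

// Formats the figures in the same order as the example output above.
std::string format_ratios(const Ratios& r) {
  char buf[80];
  std::snprintf(buf, sizeof buf, "%.3f:1, %.2f%% ratio, %.2f%% saved",
                r.n_to_1, r.ratio_pct, r.saved_pct);
  return buf;
}
```

format_ratios(compute_ratios(450560, 67493)) gives "6.676:1, 14.98% ratio, 85.02% saved".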
+
+When decompressing or testing at verbosity level 4 (-vvvv), the dictionary
+size used to compress the file is also shown.
+
+LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never have
+been compressed. Decompressed is used to refer to data which have undergone
+the process of decompression.
+
+
+@node Invoking plzip
+@chapter Invoking plzip
+@cindex invoking
+@cindex options
+@cindex usage
+@cindex version
+
+The format for running plzip is:
+
+@example
+plzip [@var{options}] [@var{files}]
+@end example
+
+@noindent
+If no file names are specified, plzip compresses (or decompresses) from
+standard input to standard output. A hyphen @samp{-} used as a @var{file}
+argument means standard input. It can be mixed with other @var{files} and is
+read just once, the first time it appears in the command line. Remember to
+prepend @file{./} to any file name beginning with a hyphen, or use @samp{--}.
+
+plzip supports the following
+@uref{http://www.nongnu.org/arg-parser/manual/arg_parser_manual.html#Argument-syntax,,options}:
+@ifnothtml
+@xref{Argument syntax,,,arg_parser}.
+@end ifnothtml
+
+@table @code
+@item -h
+@itemx --help
+Print an informative help message describing the options and exit.
+
+@item -V
+@itemx --version
+Print the version number of plzip on the standard output and exit.
+This version number should be included in all bug reports.
+
+@anchor{--trailing-error}
+@item -a
+@itemx --trailing-error
+Exit with error status 2 if any remaining input is detected after
+decompressing the last member. Such remaining input is usually trailing
+garbage that can be safely ignored. @xref{concat-example}.
+
+@anchor{--data-size}
+@item -B @var{bytes}
+@itemx --data-size=@var{bytes}
+When compressing, set the size in bytes of the input data blocks. The input
+file is divided into chunks of this size before compression is performed.
+Valid values range from @w{8 KiB} to @w{1 GiB}. Default value is two times
+the dictionary size, except for option @option{-0} where it defaults to
+@w{1 MiB}. Plzip reduces the dictionary size if it is larger than the data
+size specified. @xref{Minimum file sizes}.
+
+@item -c
+@itemx --stdout
+Compress or decompress to standard output; keep input files unchanged. If
+compressing several files, each file is compressed independently. (The
+output consists of a sequence of independently compressed members). This
+option (or @option{-o}) is needed when reading from a named pipe (fifo) or
+from a device. Use @w{@samp{lziprecover -cd -i}} to recover as much of the
+decompressed data as possible when decompressing a corrupt file. @option{-c}
+overrides @option{-o}. @option{-c} has no effect when testing or listing.
+
+@item -d
+@itemx --decompress
+Decompress the files specified. The integrity of the files specified is
+checked. If a file does not exist, can't be opened, or the destination file
+already exists and @option{--force} has not been specified, plzip continues
+decompressing the rest of the files and exits with error status 1. If a file
+fails to decompress, or is a terminal, plzip exits immediately with error
+status 2 without decompressing the rest of the files. A terminal is
+considered an uncompressed file, and therefore invalid.
+
+@item -f
+@itemx --force
+Force overwrite of output files.
+
+@item -F
+@itemx --recompress
+When compressing, force re-compression of files whose name already has
+the @samp{.lz} or @samp{.tlz} suffix.
+
+@item -k
+@itemx --keep
+Keep (don't delete) input files during compression or decompression.
+
+@item -l
+@itemx --list
+Print the uncompressed size, compressed size, and percentage saved of the
+files specified. Trailing data are ignored. The values produced are correct
+even for multimember files. If more than one file is given, a final line
+containing the cumulative sizes is printed. With @option{-v}, the dictionary
+size, the number of members in the file, and the amount of trailing data (if
+any) are also printed. With @option{-vv}, the positions and sizes of each
+member in multimember files are also printed.
+
+If any file is damaged, does not exist, can't be opened, or is not regular,
+the final exit status is @w{> 0}. @option{-lq} can be used to check quickly
+(without decompressing) the structural integrity of the files specified.
+(Use @option{--test} to check the data integrity). @option{-alq}
+additionally checks that none of the files specified contain trailing data.
+
+@item -m @var{bytes}
+@itemx --match-length=@var{bytes}
+When compressing, set the match length limit in bytes. After a match this
+long is found, the search is finished. Valid values range from 5 to 273.
+Larger values usually give better compression ratios but longer compression
+times.
+
+@item -n @var{n}
+@itemx --threads=@var{n}
+Set the maximum number of worker threads, overriding the system's default.
+Valid values range from 1 to "as many as your system can support". If this
+option is not used, plzip tries to detect the number of processors in the
+system and use it as default value. When compressing on a @w{32 bit} system,
+plzip tries to limit the memory use to under @w{2.22 GiB} (4 worker threads
+at level -9) by reducing the number of threads below the system's default.
+@w{@samp{plzip --help}} shows the system's default value.
+
+Plzip starts the number of threads required by each file without exceeding
+the value specified. Note that the number of usable threads is limited to
+@w{ceil( file_size / data_size )} during compression (@pxref{Minimum file
+sizes}), and to the number of members in the input during decompression. You
+can find the number of members in a lzip file by running
+@w{@samp{plzip -lv file.lz}}.
+
+@item -o @var{file}
+@itemx --output=@var{file}
+If @option{-c} has not also been specified, write the (de)compressed output
+to @var{file}, automatically creating any missing parent directories; keep
+input files unchanged. If compressing several files, each file is compressed
+independently. (The output consists of a sequence of independently
+compressed members). This option (or @option{-c}) is needed when reading
+from a named pipe (fifo) or from a device. @w{@option{-o -}} is equivalent
+to @option{-c}. @option{-o} has no effect when testing or listing.
+
+In order to keep backward compatibility with plzip versions prior to 1.9,
+when compressing from standard input and no other file names are given, the
+extension @samp{.lz} is appended to @var{file} unless it already ends in
+@samp{.lz} or @samp{.tlz}. This feature will be removed in a future version
+of plzip. Meanwhile, redirection may be used instead of @option{-o} to write
+the compressed output to a file without the extension @samp{.lz} in its
+name: @w{@samp{plzip < file > foo}}.
+
+@item -q
+@itemx --quiet
+Quiet operation. Suppress all messages.
+
+@item -s @var{bytes}
+@itemx --dictionary-size=@var{bytes}
+When compressing, set the dictionary size limit in bytes. Plzip uses for
+each file the largest dictionary size that exceeds neither the file size
+nor this limit. Valid values range from @w{4 KiB} to @w{512 MiB}.
+Values 12 to 29 are interpreted as powers of two, meaning 2^12 to 2^29
+bytes. Dictionary sizes are quantized so that they can be coded in just one
+byte (@pxref{coded-dict-size}). If the size specified does not match one of
+the valid sizes, it is rounded upwards by adding up to @w{(@var{bytes} / 8)}
+to it.
+
+For maximum compression you should use a dictionary size limit as large
+as possible, but keep in mind that the decompression memory requirement
+is affected at compression time by the choice of dictionary size limit.
+
+@item -t
+@itemx --test
+Check integrity of the files specified, but don't decompress them. This
+really performs a trial decompression and throws away the result. Use it
+together with @option{-v} to see information about the files. If a file
+fails the test, does not exist, can't be opened, or is a terminal, plzip
+continues testing the rest of the files. A final diagnostic is shown at
+verbosity level 1 or higher if any file fails the test when testing multiple
+files.
+
+@item -v
+@itemx --verbose
+Verbose mode.@*
+When compressing, show the compression ratio and size for each file
+processed.@*
+When decompressing or testing, further -v's (up to 4) increase the
+verbosity level, showing status, compression ratio, dictionary size,
+decompressed size, and compressed size.@*
+Two or more @option{-v} options show the progress of (de)compression,
+except for single-member files.
+
+@item -0 .. -9
+Compression level. Set the compression parameters (dictionary size and
+match length limit) as shown in the table below. The default compression
+level is @option{-6}, equivalent to @w{@option{-s8MiB -m36}}. Note that
+@option{-9} can be much slower than @option{-0}. These options have no
+effect when decompressing, testing, or listing.
+
+The bidimensional parameter space of LZMA can't be mapped to a linear scale
+optimal for all files. If your files are large, very repetitive, etc., you
+may need to use the options @option{--dictionary-size} and
+@option{--match-length} directly to achieve optimal performance.
+
+If several compression levels or @option{-s} or @option{-m} options are
+given, the last setting is used. For example @w{@option{-9 -s64MiB}} is
+equivalent to @w{@option{-s64MiB -m273}}.
+
+@multitable {Level} {Dictionary size (-s)} {Match length limit (-m)}
+@item Level @tab Dictionary size (-s) @tab Match length limit (-m)
+@item -0 @tab 64 KiB @tab 16 bytes
+@item -1 @tab 1 MiB @tab 5 bytes
+@item -2 @tab 1.5 MiB @tab 6 bytes
+@item -3 @tab 2 MiB @tab 8 bytes
+@item -4 @tab 3 MiB @tab 12 bytes
+@item -5 @tab 4 MiB @tab 20 bytes
+@item -6 @tab 8 MiB @tab 36 bytes
+@item -7 @tab 16 MiB @tab 68 bytes
+@item -8 @tab 24 MiB @tab 132 bytes
+@item -9 @tab 32 MiB @tab 273 bytes
+@end multitable
+
+@item --fast
+@itemx --best
+Aliases for GNU gzip compatibility.
+
+@item --loose-trailing
+When decompressing, testing, or listing, allow trailing data whose first
+bytes are so similar to the magic bytes of a lzip header that they can
+be confused with a corrupt header. Use this option if a file triggers a
+"corrupt header" error and the cause is not actually a corrupt header.
+
+@item --in-slots=@var{n}
+Number of @w{1 MiB} input packets buffered per worker thread when
+decompressing from non-seekable input. Increasing the number of packets
+may increase decompression speed, but requires more memory. Valid values
+range from 1 to 64. The default value is 4.
+
+@item --out-slots=@var{n}
+Number of @w{1 MiB} output packets buffered per worker thread when
+decompressing to non-seekable output. Increasing the number of packets
+may increase decompression speed, but requires more memory. Valid values
+range from 1 to 1024. The default value is 64.
+
+@item --check-lib
+Compare the
+@uref{http://www.nongnu.org/lzip/manual/lzlib_manual.html#Library-version,,version of lzlib}
+used to compile plzip with the version actually being used at run time and
+exit. Report any differences found. Exit with error status 1 if differences
+are found. A mismatch may indicate that lzlib is not correctly installed or
+that a different version of lzlib has been installed after compiling plzip.
+Exit with error status 2 if LZ_API_VERSION and LZ_version_string don't
+match. @w{@samp{plzip -v --check-lib}} shows the version of lzlib being used
+and the value of LZ_API_VERSION (if defined).
+@ifnothtml
+@xref{Library version,,,lzlib}.
+@end ifnothtml
+
+@end table
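The quantization described for --dictionary-size (a valid size is a power of two minus 0 to 7 sixteenths of it, and any other size is rounded upwards) can be sketched in C++ as follows. This is an illustrative reimplementation under the coded-dict-size layout given in the File format chapter, not plzip's or lzlib's source; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t min_dict = 1u << 12;  // 4 KiB
constexpr uint32_t max_dict = 1u << 29;  // 512 MiB

// Number of bits needed to represent `v` (0 for v == 0).
static unsigned bit_length(uint32_t v) {
  unsigned bits = 0;
  while (v > 0) { ++bits; v >>= 1; }
  return bits;
}

// Coded byte (DS) of the smallest valid dictionary size >= sz.
uint8_t encode_dict_size(uint32_t sz) {
  if (sz >= 12 && sz <= 29) sz = 1u << sz;  // 12..29 mean powers of two
  assert(sz >= min_dict && sz <= max_dict);
  uint8_t ds = uint8_t(bit_length(sz - 1));  // log2 of base size, rounded up
  if (sz > min_dict) {
    const uint32_t base = 1u << ds, fraction = base / 16;
    for (uint32_t i = 7; i >= 1; --i)  // subtract the largest fraction that
      if (base - i * fraction >= sz) {  // still leaves at least sz bytes
        ds |= uint8_t(i << 5);
        break;
      }
  }
  return ds;
}

// Dictionary size represented by a coded byte.
uint32_t decode_dict_size(uint8_t ds) {
  const uint32_t base = 1u << (ds & 0x1F);
  return base - (base / 16) * (ds >> 5);
}
```

With this sketch, a request for 4097 bytes is rounded up to 4608 bytes (511 bytes more, within the bytes / 8 bound), and 8 MiB, being a power of two, is coded exactly as 0x17.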
+
+Numbers given as arguments to options may be expressed in decimal,
+hexadecimal, or octal (using the same syntax as integer constants in C++),
+and may be followed by a multiplier and an optional @samp{B} for "byte".
+
+Table of SI and binary prefixes (unit multipliers):
+
+@multitable {Prefix} {kilobyte (10^3 = 1000)} {|} {Prefix} {kibibyte (2^10 = 1024)}
+@item Prefix @tab Value @tab | @tab Prefix @tab Value
+@item k @tab kilobyte (10^3 = 1000) @tab | @tab Ki @tab kibibyte (2^10 = 1024)
+@item M @tab megabyte (10^6) @tab | @tab Mi @tab mebibyte (2^20)
+@item G @tab gigabyte (10^9) @tab | @tab Gi @tab gibibyte (2^30)
+@item T @tab terabyte (10^12) @tab | @tab Ti @tab tebibyte (2^40)
+@item P @tab petabyte (10^15) @tab | @tab Pi @tab pebibyte (2^50)
+@item E @tab exabyte (10^18) @tab | @tab Ei @tab exbibyte (2^60)
+@item Z @tab zettabyte (10^21) @tab | @tab Zi @tab zebibyte (2^70)
+@item Y @tab yottabyte (10^24) @tab | @tab Yi @tab yobibyte (2^80)
+@item R @tab ronnabyte (10^27) @tab | @tab Ri @tab robibyte (2^90)
+@item Q @tab quettabyte (10^30) @tab | @tab Qi @tab quebibyte (2^100)
+@end multitable
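A minimal C++ sketch of these rules, assuming that SI kilo is only accepted as lowercase "k" and binary kibi only as "Ki" (as the table shows). The parser is hypothetical, not plzip's own argument code, and it stops at the E/Ei prefixes because larger multipliers do not fit in 64 bits.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical parser for size arguments such as "64MiB", "0x8000" or "2k".
unsigned long long parse_size(const std::string& arg) {
  const char* const s = arg.c_str();
  char* tail = nullptr;
  errno = 0;
  // Base 0 accepts decimal, hexadecimal ("0x...") and octal ("0..."), as in C++.
  const unsigned long long value = std::strtoull(s, &tail, 0);
  if (tail == s || errno == ERANGE) throw std::invalid_argument("bad number: " + arg);

  std::string unit(tail);
  if (!unit.empty() && unit.back() == 'B') unit.pop_back();  // optional "B"
  if (unit.empty()) return value;

  const bool binary = unit.size() == 2 && unit[1] == 'i';
  if (unit.size() != (binary ? 2u : 1u)) throw std::invalid_argument("bad unit: " + arg);

  // Power of the multiplier minus one: k/Ki -> 0, M/Mi -> 1, ..., E/Ei -> 5.
  static const std::string si = "kMGTPE", bin = "KMGTPE";
  const std::size_t exponent = (binary ? bin : si).find(unit[0]);
  if (exponent == std::string::npos) throw std::invalid_argument("bad unit: " + arg);

  const unsigned long long factor = binary ? 1024 : 1000;
  unsigned long long result = value;
  for (std::size_t i = 0; i <= exponent; ++i) {
    if (result > ULLONG_MAX / factor) throw std::out_of_range("too large: " + arg);
    result *= factor;
  }
  return result;
}
```

For example, "64MiB" yields 67108864, "0x10" yields 16, and "1K" is rejected because uppercase K is only valid as part of "Ki".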
+
+@sp 1
+Exit status: 0 for a normal exit, 1 for environmental problems
+(file not found, invalid command-line options, I/O errors, etc), 2 to
+indicate a corrupt or invalid input file, 3 for an internal consistency
+error (e.g., bug) which caused plzip to panic.
+
+
+@node Program design
+@chapter Internal structure of plzip
+@cindex program design
+
+When compressing, plzip divides the input file into chunks and compresses as
+many chunks simultaneously as worker threads are chosen, creating a
+multimember compressed file. Each chunk is compressed in-place (using the
+same buffer for input and output), reducing the amount of RAM required.
+
+When decompressing, plzip decompresses as many members simultaneously as
+worker threads are chosen. Files that were compressed with lzip are not
+decompressed faster than with lzip (unless lzip's option @option{-b} was
+used) because lzip usually produces single-member files, which can't be
+decompressed in parallel.
+
+For each input file, a splitter thread and several worker threads are
+created, with the main thread acting as muxer (multiplexer) thread. A "packet
+courier" takes care of data transfers among threads and limits the
+maximum number of data blocks (packets) being processed simultaneously.
+
+The splitter reads data blocks from the input file, and distributes them
+to the workers. The workers (de)compress the blocks received from the
+splitter. The muxer collects processed packets from the workers, and
+writes them to the output file.
+
+@verbatim
+                             .------------.
+                         ,-->|  worker 0  |--,
+                         |   `------------'  |
+.-------.   .----------. |   .------------.  |   .-------.   .--------.
+| input |-->| splitter |-+-->|  worker 1  |--+-->| muxer |-->| output |
+| file  |   `----------' |   `------------'  |   `-------'   |  file  |
+`-------'                |        ...        |               `--------'
+                         |   .------------.  |
+                         `-->| worker N-1 |--'
+                             `------------'
+@end verbatim
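The division of labor in the diagram can be imitated with standard C++ futures. This is a simplified sketch, not plzip's implementation: plzip uses dedicated splitter, worker, and muxer threads coordinated by the packet courier, whereas here std::async launches one task per chunk, a bounded queue of pending futures stands in for the courier's limit on packets in flight, and the hypothetical process_chunk() (which merely upper-cases text) stands in for lzlib compression.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <deque>
#include <future>
#include <string>

// Hypothetical stand-in for per-chunk (de)compression.
std::string process_chunk(std::string chunk) {
  for (char& c : chunk) c = char(std::toupper(static_cast<unsigned char>(c)));
  return chunk;
}

// Cuts `input` into chunks of `chunk_size` bytes, processes at most
// `max_in_flight` chunks concurrently, and appends results in input order.
std::string run_pipeline(const std::string& input, std::size_t chunk_size,
                         std::size_t max_in_flight) {
  assert(chunk_size > 0 && max_in_flight > 0);
  std::deque<std::future<std::string>> in_flight;  // oldest chunk at front
  std::string output;
  for (std::size_t pos = 0; pos < input.size(); pos += chunk_size) {
    if (in_flight.size() == max_in_flight) {  // limit packets in flight
      output += in_flight.front().get();      // "muxer": write in order
      in_flight.pop_front();
    }
    // "splitter": hand the next chunk to a worker task
    in_flight.push_back(std::async(std::launch::async, process_chunk,
                                   input.substr(pos, chunk_size)));
  }
  while (!in_flight.empty()) {  // drain the remaining workers
    output += in_flight.front().get();
    in_flight.pop_front();
  }
  return output;
}
```

Because every chunk is processed independently and results are written in input order, the output is just the concatenation of the processed chunks; in plzip each processed chunk is a complete lzip member, which is why the result is a valid multimember file. Building the sketch may require linking with the platform's thread library (for example -pthread).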
+
+When decompressing from a regular file, the splitter is removed and the
+workers read directly from the input file. If the output file is also a
+regular file, the muxer is also removed and the workers write directly
+to the output file. With these optimizations, the use of RAM is greatly
+reduced and the decompression speed of large files with many members is
+only limited by the number of processors available and by I/O speed.
+
+
+@node Memory requirements
+@chapter Memory required to compress and decompress
+@cindex memory requirements
+
+The amount of memory required @strong{per worker thread} for decompression
+or testing is approximately the following:
+
+@itemize @bullet
+@item
+For decompression of a regular (seekable) file to another regular file,
+or for testing of a regular file: the dictionary size.
+
+@item
+For testing of a non-seekable file or of standard input: the dictionary
+size plus @w{1 MiB} plus up to the number of @w{1 MiB} input packets
+buffered (4 by default).
+
+@item
+For decompression of a regular file to a non-seekable file or to
+standard output: the dictionary size plus up to the number of @w{1 MiB}
+output packets buffered (64 by default).
+
+@item
+For decompression of a non-seekable file or of standard input: the
+dictionary size plus @w{1 MiB} plus up to the number of @w{1 MiB} input
+and output packets buffered (68 by default).
+@end itemize
+
+@noindent
+The amount of memory required @strong{per worker thread} for compression
+is approximately the following:
+
+@itemize @bullet
+@item
+For compression at level -0: @w{1.5 MiB} plus 3.375 times the data size
+(@pxref{--data-size}). Default is @w{4.875 MiB}.
+
+@item
+For compression at other levels: 11 times the dictionary size plus 3.375
+times the data size. Default is @w{142 MiB}.
+@end itemize
+
+@noindent
+The following table shows the memory required @strong{per thread} for
+compression at a given level, using the default data size for each level:
+
+@multitable {Level} {Memory required}
+@item Level @tab Memory required
+@item -0 @tab 4.875 MiB
+@item -1 @tab 17.75 MiB
+@item -2 @tab 26.625 MiB
+@item -3 @tab 35.5 MiB
+@item -4 @tab 53.25 MiB
+@item -5 @tab 71 MiB
+@item -6 @tab 142 MiB
+@item -7 @tab 284 MiB
+@item -8 @tab 426 MiB
+@item -9 @tab 568 MiB
+@end multitable
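The table follows from the two compression formulas in the list above, using the default data size (1 MiB at -0, otherwise twice the dictionary size). For example, at -6 the dictionary is 8 MiB and the data size 16 MiB, giving 11 * 8 + 3.375 * 16 = 142 MiB. A small C++ sketch of the estimate, with hypothetical function names and all sizes in MiB:

```cpp
#include <cassert>

// Default data size (-B) in MiB: 1 MiB at -0, otherwise twice the dictionary.
double default_data_mib(int level, double dict_mib) {
  return level == 0 ? 1.0 : 2 * dict_mib;
}

// Approximate memory per worker thread for compression, in MiB.
double compress_memory_mib(int level, double dict_mib, double data_mib) {
  if (level == 0) return 1.5 + 3.375 * data_mib;
  return 11 * dict_mib + 3.375 * data_mib;
}
```

Multiplying by the number of threads gives the total: 4 threads at -9 need about 4 * 568 = 2272 MiB, roughly 2.22 GiB, which matches the limit mentioned for --threads on 32-bit systems.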
+
+
+@node Minimum file sizes
+@chapter Minimum file sizes required for full compression speed
+@cindex minimum file sizes
+
+When compressing, plzip divides the input file into chunks and
+compresses as many chunks simultaneously as worker threads are chosen,
+creating a multimember compressed file.
+
+For this to work as expected (and roughly multiply the compression speed by
+the number of available processors), the uncompressed file must be at least
+as large as the number of worker threads times the chunk size
+(@pxref{--data-size}). Otherwise, some processors do not get any data to compress,
+and compression is proportionally slower. The maximum speed increase
+achievable on a given file is limited by the ratio
+@w{(file_size / data_size)}. For example, a tarball the size of gcc or linux
+scales up to 10 or 14 processors at level -9.
+
+The following table shows the minimum uncompressed file size needed for
+full use of N processors at a given compression level, using the default
+data size for each level:
+
+@multitable {Processors} {512 MiB} {512 MiB} {512 MiB} {512 MiB} {512 MiB} {512 MiB}
+@headitem Processors @tab 2 @tab 4 @tab 8 @tab 16 @tab 64 @tab 256
+@item Level
+@item -0 @tab 2 MiB @tab 4 MiB @tab 8 MiB @tab 16 MiB @tab 64 MiB @tab 256 MiB
+@item -1 @tab 4 MiB @tab 8 MiB @tab 16 MiB @tab 32 MiB @tab 128 MiB @tab 512 MiB
+@item -2 @tab 6 MiB @tab 12 MiB @tab 24 MiB @tab 48 MiB @tab 192 MiB @tab 768 MiB
+@item -3 @tab 8 MiB @tab 16 MiB @tab 32 MiB @tab 64 MiB @tab 256 MiB @tab 1 GiB
+@item -4 @tab 12 MiB @tab 24 MiB @tab 48 MiB @tab 96 MiB @tab 384 MiB @tab 1.5 GiB
+@item -5 @tab 16 MiB @tab 32 MiB @tab 64 MiB @tab 128 MiB @tab 512 MiB @tab 2 GiB
+@item -6 @tab 32 MiB @tab 64 MiB @tab 128 MiB @tab 256 MiB @tab 1 GiB @tab 4 GiB
+@item -7 @tab 64 MiB @tab 128 MiB @tab 256 MiB @tab 512 MiB @tab 2 GiB @tab 8 GiB
+@item -8 @tab 96 MiB @tab 192 MiB @tab 384 MiB @tab 768 MiB @tab 3 GiB @tab 12 GiB
+@item -9 @tab 128 MiB @tab 256 MiB @tab 512 MiB @tab 1 GiB @tab 4 GiB @tab 16 GiB
+@end multitable
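Each entry in the table is the default data size multiplied by the number of processors, and the thread limit mentioned for --threads is ceil(file_size / data_size). A minimal C++ sketch of both calculations (hypothetical function names; counting an empty file as one chunk is an assumption of the sketch):

```cpp
#include <cassert>
#include <cstdint>

// Chunks of `data_size` bytes needed to cover `file_size` bytes, i.e.
// ceil(file_size / data_size); an empty file is counted as one chunk here.
uint64_t chunk_count(uint64_t file_size, uint64_t data_size) {
  assert(data_size > 0);
  const uint64_t chunks = (file_size + data_size - 1) / data_size;
  return chunks == 0 ? 1 : chunks;
}

// Worker threads that can actually be kept busy: at most one per chunk.
uint64_t usable_threads(uint64_t file_size, uint64_t data_size, uint64_t max_threads) {
  const uint64_t chunks = chunk_count(file_size, data_size);
  return chunks < max_threads ? chunks : max_threads;
}

// Smallest file that gives every one of `processors` threads a full chunk.
uint64_t min_file_size(uint64_t data_size, uint64_t processors) {
  return data_size * processors;
}
```

For instance, at -9 (64 MiB data size) a 100 MiB file keeps only 2 threads busy, and 16 processors need at least 1 GiB of input, as the table shows.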
+
+
+@node File format
+@chapter File format
+@cindex file format
+
+Perfection is reached, not when there is no longer anything to add, but
+when there is no longer anything to take away.@*
+--- Antoine de Saint-Exupery
+
+@sp 1
+In the diagram below, a box like this:
+
+@verbatim
++---+
+|   | <-- the vertical bars might be missing
++---+
+@end verbatim
+
+represents one byte; a box like this:
+
+@verbatim
++==============+
+|              |
++==============+
+@end verbatim
+
+represents a variable number of bytes.
+
+@sp 1
+A lzip file consists of one or more independent "members" (compressed data
+sets). The members simply appear one after another in the file, with no
+additional information before, between, or after them. Each member can
+encode in compressed form up to @w{16 EiB - 1 byte} of uncompressed data.
+The size of a multimember file is unlimited.
+
+Each member has the following structure:
+
+@verbatim
++--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+| ID string | VN | DS | LZMA stream | CRC32 |   Data size   |  Member size  |
++--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+@end verbatim
+
+All multibyte values are stored in little endian order.
+
+@table @samp
+@item ID string (the "magic" bytes)
+A four byte string, identifying the lzip format, with the value "LZIP"
+(0x4C, 0x5A, 0x49, 0x50).
+
+@item VN (version number, 1 byte)
+Just in case something needs to be modified in the future. 1 for now.
+
+@anchor{coded-dict-size}
+@item DS (coded dictionary size, 1 byte)
+The dictionary size is calculated by taking a power of 2 (the base size)
+and subtracting from it a fraction between 0/16 and 7/16 of the base size.@*
+Bits 4-0 contain the base 2 logarithm of the base size (12 to 29).@*
+Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract
+from the base size to obtain the dictionary size.@*
+Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB@*
+Valid values for dictionary size range from 4 KiB to 512 MiB.
+
+@item LZMA stream
+The LZMA stream, finished by an "End Of Stream" marker. Uses default values
+for encoder properties.
+@ifnothtml
+@xref{Stream format,,,lzip},
+@end ifnothtml
+@ifhtml
+See
+@uref{http://www.nongnu.org/lzip/manual/lzip_manual.html#Stream-format,,Stream format}
+@end ifhtml
+for a complete description.
+
+@item CRC32 (4 bytes)
+Cyclic Redundancy Check (CRC) of the original uncompressed data.
+
+@item Data size (8 bytes)
+Size of the original uncompressed data.
+
+@item Member size (8 bytes)
+Total size of the member, including header and trailer. This field acts
+as a distributed index, improves the checking of stream integrity, and
+facilitates the safe recovery of undamaged members from multimember files.
+Lzip limits the member size to @w{2 PiB} to prevent the data size field from
+overflowing.
+
+@end table
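The fixed parts of a member can be checked without decoding the LZMA stream. The following C++ sketch, assuming one complete member is already held in memory, validates the 6-byte header (magic, version, DS) and reads the 20-byte little-endian trailer. It is an illustration with hypothetical names, not plzip's code, and it only reads the CRC32; verifying it would require decompressing the data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct MemberInfo {
  uint32_t dict_size;    // decoded from DS
  uint32_t crc32;        // CRC of the uncompressed data
  uint64_t data_size;    // uncompressed size
  uint64_t member_size;  // whole member, header and trailer included
};

// Reads `n` bytes starting at `p` as a little-endian unsigned integer.
static uint64_t read_le(const uint8_t* p, int n) {
  uint64_t v = 0;
  for (int i = n - 1; i >= 0; --i) v = (v << 8) | p[i];
  return v;
}

// Checks the header and decodes the trailer of one complete member in `m`.
bool parse_member(const std::vector<uint8_t>& m, MemberInfo& info) {
  const std::size_t header_size = 6, trailer_size = 20;
  if (m.size() < header_size + trailer_size) return false;
  if (std::memcmp(m.data(), "LZIP", 4) != 0 || m[4] != 1) return false;
  const uint32_t base = 1u << (m[5] & 0x1F);         // bits 4-0: log2 of base
  const uint32_t dict = base - (base / 16) * (m[5] >> 5);  // bits 7-5: fraction
  if (dict < (1u << 12) || dict > (1u << 29)) return false;  // 4 KiB .. 512 MiB
  const uint8_t* t = m.data() + m.size() - trailer_size;
  info = MemberInfo{dict, uint32_t(read_le(t, 4)), read_le(t + 4, 8), read_le(t + 12, 8)};
  return info.member_size == m.size();  // trailer must describe this member
}
```

In a multimember file, a reader can apply the same trailer decoding from the end of the file backwards, using each Member size field to locate the previous member; this is what makes the field act as a distributed index.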
+
+
+@node Trailing data
+@chapter Extra data appended to the file
+@cindex trailing data
+
+Sometimes extra data are found appended to a lzip file after the last
+member. Such trailing data may be:
+
+@itemize @bullet
+@item
+Padding added to make the file size a multiple of some block size, for
+example when writing to a tape. It is safe to append any amount of
+padding zero bytes to a lzip file.
+
+@item
+Useful data added by the user; an "End Of File" string (to check that the
+file has not been truncated), a cryptographically secure hash, a description
+of file contents, etc. It is safe to append any amount of text to a lzip
+file as long as none of the first four bytes of the text matches the
+corresponding byte in the string "LZIP", and the text does not contain any
+zero bytes (null characters). Nonzero bytes and zero bytes can't be safely
+mixed in trailing data.
+
+@item
+Garbage added by some not totally successful copy operation.
+
+@item
+Malicious data added to the file in order to make its total size and
+hash value (for a chosen hash) coincide with those of another file.
+
+@item
+In rare cases, trailing data could be the corrupt header of another
+member. In multimember or concatenated files the probability of
+corruption happening in the magic bytes is 5 times smaller than the
+probability of getting a false positive caused by the corruption of the
+integrity information itself. Therefore it can be considered to be below
+the noise level. Additionally, the test used by plzip to discriminate
+trailing data from a corrupt header has a Hamming distance (HD) of 3,
+and the 3 bit flips must happen in different magic bytes for the test to
+fail. In any case, the option @option{--trailing-error} guarantees that
+any corrupt header is detected.
+@end itemize
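The rule given above for appending user text can be checked mechanically. A minimal C++ sketch (hypothetical function name); pure zero padding, the other safe case listed above, is deliberately not covered by this check.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if `text` can be appended to a lzip file as trailing data: none of
// its first four bytes equals the corresponding byte of "LZIP", and it
// contains no zero bytes.
bool safe_trailing_text(const std::string& text) {
  static const char magic[4] = {'L', 'Z', 'I', 'P'};
  for (std::size_t i = 0; i < 4 && i < text.size(); ++i)
    if (text[i] == magic[i]) return false;  // could be confused with a header
  return text.find('\0') == std::string::npos;  // no null characters allowed
}
```

For example, "End Of File\n" passes, while "LZ-notes" fails because its first byte matches the magic string.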
+
+Trailing data are in no way part of the lzip file format, but tools
+reading lzip files are expected to behave as correctly and usefully as
+possible in the presence of trailing data.
+
+Trailing data can be safely ignored in most cases. In some cases, like
+that of user-added data, they are expected to be ignored. In those cases
+where a file containing trailing data must be rejected, the option
+@option{--trailing-error} can be used. @xref{--trailing-error}.
+
+
+@node Examples
+@chapter A small tutorial with examples
+@cindex examples
+
+WARNING! Even if plzip is bug-free, other causes may result in a corrupt
+compressed file (bugs in the system libraries, memory errors, etc.).
+Therefore, if the data you are going to compress are important, give the
+option @option{--keep} to plzip and don't remove the original file until you
+check the compressed file with a command like
+@w{@samp{plzip -cd file.lz | cmp file -}}. Most RAM errors happening during
+compression can only be detected by comparing the compressed file with the
+original because the corruption happens before plzip compresses the RAM
+contents, resulting in a valid compressed file containing wrong data.
+
+@sp 1
+@noindent
+Example 1: Extract all the files from archive @samp{foo.tar.lz}.
+
+@example
+ tar -xf foo.tar.lz
+or
+ plzip -cd foo.tar.lz | tar -xf -
+@end example
+
+@sp 1
+@noindent
+Example 2: Replace a regular file with its compressed version @samp{file.lz}
+and show the compression ratio.
+
+@example
+plzip -v file
+@end example
+
+@sp 1
+@noindent
+Example 3: Like example 2 but the created @samp{file.lz} has a block size of
+@w{1 MiB}. The compression ratio is not shown.
+
+@example
+plzip -B 1MiB file
+@end example
+
+@sp 1
+@noindent
+Example 4: Restore a regular file from its compressed version
+@samp{file.lz}. If the operation is successful, @samp{file.lz} is removed.
+
+@example
+plzip -d file.lz
+@end example
+
+@sp 1
+@noindent
+Example 5: Check the integrity of the compressed file @samp{file.lz} and
+show status.
+
+@example
+plzip -tv file.lz
+@end example
+
+@sp 1
+@anchor{concat-example}
+@noindent
+Example 6: The right way of concatenating the decompressed output of two or
+more compressed files. @xref{Trailing data}.
+
+@example
+Don't do this
+ cat file1.lz file2.lz file3.lz | plzip -d -
+Do this instead
+ plzip -cd file1.lz file2.lz file3.lz
+@end example
+
+@sp 1
+@noindent
+Example 7: Decompress @samp{file.lz} partially until @w{10 KiB} of
+decompressed data are produced.
+
+@example
+plzip -cd file.lz | dd bs=1024 count=10
+@end example
+
+@sp 1
+@noindent
+Example 8: Decompress @samp{file.lz} partially from decompressed byte at
+offset 10000 to decompressed byte at offset 14999 (5000 bytes are produced).
+
+@example
+plzip -cd file.lz | dd bs=1000 skip=10 count=5
+@end example
+
+@sp 1
+@noindent
+Example 9: Compress the whole device @file{/dev/sdc} and send the output to
+@samp{file.lz}.
+
+@example
+ plzip -c /dev/sdc > file.lz
+or
+ plzip /dev/sdc -o file.lz
+@end example
+
+
+@node Problems
+@chapter Reporting bugs
+@cindex bugs
+@cindex getting help
+
+There are probably bugs in plzip. There are certainly errors and
+omissions in this manual. If you report them, they will get fixed. If
+you don't, no one will ever know about them and they will remain unfixed
+for all eternity, if not longer.
+
+If you find a bug in plzip, please send electronic mail to
+@email{lzip-bug@@nongnu.org}. Include the version numbers of plzip and
+lzlib, which you can find by running @w{@samp{plzip --version}} and
+@w{@samp{plzip -v --check-lib}}.
+
+
+@node Concept index
+@unnumbered Concept index
+
+@printindex cp
+
+@bye
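The manual's rule for user-added trailing text (no zero bytes, and none of the first four bytes equal to the corresponding byte of "LZIP") can be written as a small predicate, together with the zero-padding case. This is a minimal sketch; 'is_safe_trailing_text' and 'is_safe_padding' are hypothetical helpers, not functions provided by plzip or lzlib.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not in plzip): true if 'text' may be appended after
// the last member as user data, per the rule in the manual: it contains no
// zero bytes, and none of its first four bytes equals the corresponding
// byte of the magic string "LZIP".
bool is_safe_trailing_text( const std::string & text )
  {
  static const char magic[4] = { 'L', 'Z', 'I', 'P' };
  for( std::size_t i = 0; i < text.size(); ++i )
    {
    if( text[i] == '\0' ) return false;                // null character
    if( i < 4 && text[i] == magic[i] ) return false;   // magic byte match
    }
  return true;
  }

// Hypothetical helper: true if 'data' is padding made only of zero bytes,
// the other kind of trailing data the manual describes as safe.
bool is_safe_padding( const std::string & data )
  { return data.find_first_not_of( '\0' ) == std::string::npos; }
```

For example, an "End Of File" line preceded by a newline passes the check, while any text beginning with "L" does not, and padding stops being safe as soon as a nonzero byte is mixed in.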
diff --git a/list.cc b/list.cc
new file mode 100644
index 0000000..4b212fc
--- /dev/null
+++ b/list.cc
@@ -0,0 +1,114 @@
+/* Plzip - Massively parallel implementation of lzip
+ Copyright (C) 2009-2024 Antonio Diaz Diaz.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#define _FILE_OFFSET_BITS 64
+
+#include <cstdio>
+#include <cstring>
+#include <string>
+#include <vector>
+#include <stdint.h>
+#include <unistd.h>
+#include <sys/stat.h>
+
+#include "lzip.h"
+#include "lzip_index.h"
+
+
+namespace {
+
+void list_line( const unsigned long long uncomp_size,
+ const unsigned long long comp_size,
+ const char * const input_filename )
+ {
+ if( uncomp_size > 0 )
+ std::printf( "%14llu %14llu %6.2f%% %s\n", uncomp_size, comp_size,
+ 100.0 - ( ( 100.0 * comp_size ) / uncomp_size ),
+ input_filename );
+ else
+ std::printf( "%14llu %14llu -INF%% %s\n", uncomp_size, comp_size,
+ input_filename );
+ }
+
+} // end namespace
+
+
+int list_files( const std::vector< std::string > & filenames,
+ const Cl_options & cl_opts )
+ {
+ unsigned long long total_comp = 0, total_uncomp = 0;
+ int files = 0, retval = 0;
+ bool first_post = true;
+ bool stdin_used = false;
+
+ for( unsigned i = 0; i < filenames.size(); ++i )
+ {
+ const bool from_stdin = ( filenames[i] == "-" );
+ if( from_stdin ) { if( stdin_used ) continue; else stdin_used = true; }
+ const char * const input_filename =
+ from_stdin ? "(stdin)" : filenames[i].c_str();
+ struct stat in_stats; // not used
+ const int infd = from_stdin ? STDIN_FILENO :
+ open_instream( input_filename, &in_stats, false, true );
+ if( infd < 0 ) { set_retval( retval, 1 ); continue; }
+
+ const Lzip_index lzip_index( infd, cl_opts );
+ close( infd );
+ if( lzip_index.retval() != 0 )
+ {
+ show_file_error( input_filename, lzip_index.error().c_str() );
+ set_retval( retval, lzip_index.retval() );
+ continue;
+ }
+ if( verbosity < 0 ) continue;
+ const unsigned long long udata_size = lzip_index.udata_size();
+ const unsigned long long cdata_size = lzip_index.cdata_size();
+ total_comp += cdata_size; total_uncomp += udata_size; ++files;
+ const long members = lzip_index.members();
+ if( first_post )
+ {
+ first_post = false;
+ if( verbosity >= 1 ) std::fputs( " dict memb trail ", stdout );
+ std::fputs( " uncompressed compressed saved name\n", stdout );
+ }
+ if( verbosity >= 1 )
+ std::printf( "%s %5ld %6lld ", format_ds( lzip_index.dictionary_size() ),
+ members, lzip_index.file_size() - cdata_size );
+ list_line( udata_size, cdata_size, input_filename );
+
+ if( verbosity >= 2 && members > 1 )
+ {
+ std::fputs( " member data_pos data_size member_pos member_size\n", stdout );
+ for( long i = 0; i < members; ++i )
+ {
+ const Block & db = lzip_index.dblock( i );
+ const Block & mb = lzip_index.mblock( i );
+ std::printf( "%6ld %14llu %14llu %14llu %14llu\n",
+ i + 1, db.pos(), db.size(), mb.pos(), mb.size() );
+ }
+ first_post = true; // reprint heading after list of members
+ }
+ std::fflush( stdout );
+ }
+ if( verbosity >= 0 && files > 1 )
+ {
+ if( verbosity >= 1 ) std::fputs( " ", stdout );
+ list_line( total_uncomp, total_comp, "(totals)" );
+ std::fflush( stdout );
+ }
+ return retval;
+ }
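The "saved" column printed by list_line() is 100 - 100 * compressed / uncompressed, shown as "-INF" when the uncompressed size is zero. The sketch below restates that formula as a standalone function; 'format_saved' is a hypothetical name, and the field width used by plzip is left out.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper restating the "saved" column computed by list_line():
// 100 - 100 * comp_size / uncomp_size, or "-INF%" when the uncompressed
// size is zero. plzip prints the value directly with printf instead.
std::string format_saved( const unsigned long long uncomp_size,
                          const unsigned long long comp_size )
  {
  if( uncomp_size == 0 ) return "-INF%";
  char buf[32];
  std::snprintf( buf, sizeof buf, "%.2f%%",
                 100.0 - ( ( 100.0 * comp_size ) / uncomp_size ) );
  return std::string( buf );
  }
```

The value becomes negative whenever the compressed size exceeds the uncompressed size, which can happen with small or incompressible inputs.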
diff --git a/lzip.h b/lzip.h
new file mode 100644
index 0000000..8007fbf
--- /dev/null
+++ b/lzip.h
@@ -0,0 +1,340 @@
+/* Plzip - Massively parallel implementation of lzip
+ Copyright (C) 2009-2024 Antonio Diaz Diaz.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include <pthread.h>
+
+enum {
+ min_dictionary_bits = 12,
+ min_dictionary_size = 1 << min_dictionary_bits, // >= modeled_distances
+ max_dictionary_bits = 29,
+ max_dictionary_size = 1 << max_dictionary_bits,
+ min_member_size = 36 };
+
+
+// defined in main.cc
+extern int verbosity;
+
+class Pretty_print // requires global var 'int verbosity'
+ {
+ std::string name_;
+ std::string padded_name;
+ const char * const stdin_name;
+ unsigned longest_name;
+ mutable bool first_post;
+
+public:
+ Pretty_print( const std::vector< std::string > & filenames )
+ : stdin_name( "(stdin)" ), longest_name( 0 ), first_post( false )
+ {
+ if( verbosity <= 0 ) return;
+ const unsigned stdin_name_len = std::strlen( stdin_name );
+ for( unsigned i = 0; i < filenames.size(); ++i )
+ {
+ const std::string & s = filenames[i];
+ const unsigned len = ( s == "-" ) ? stdin_name_len : s.size();
+ if( longest_name < len ) longest_name = len;
+ }
+ if( longest_name == 0 ) longest_name = stdin_name_len;
+ }
+
+ void set_name( const std::string & filename )
+ {
+ if( filename.size() && filename != "-" ) name_ = filename;
+ else name_ = stdin_name;
+ padded_name = " "; padded_name += name_; padded_name += ": ";
+ if( longest_name > name_.size() )
+ padded_name.append( longest_name - name_.size(), ' ' );
+ first_post = true;
+ }
+
+ void reset() const { if( name_.size() ) first_post = true; }
+ const char * name() const { return name_.c_str(); }
+ void operator()( const char * const msg = 0 ) const;
+ };
+
+
+inline bool isvalid_ds( const unsigned dictionary_size )
+ { return dictionary_size >= min_dictionary_size &&
+ dictionary_size <= max_dictionary_size; }
+
+
+inline int real_bits( unsigned value )
+ {
+ int bits = 0;
+ while( value > 0 ) { value >>= 1; ++bits; }
+ return bits;
+ }
+
+
+const uint8_t lzip_magic[4] = { 0x4C, 0x5A, 0x49, 0x50 }; // "LZIP"
+
+struct Lzip_header
+ {
+ enum { size = 6 };
+ uint8_t data[size]; // 0-3 magic bytes
+ // 4 version
+ // 5 coded dictionary size
+
+ void set_magic() { std::memcpy( data, lzip_magic, 4 ); data[4] = 1; }
+ bool check_magic() const { return std::memcmp( data, lzip_magic, 4 ) == 0; }
+
+ bool check_prefix( const int sz ) const // detect (truncated) header
+ {
+ for( int i = 0; i < sz && i < 4; ++i )
+ if( data[i] != lzip_magic[i] ) return false;
+ return sz > 0;
+ }
+
+ bool check_corrupt() const // detect corrupt header
+ {
+ int matches = 0;
+ for( int i = 0; i < 4; ++i )
+ if( data[i] == lzip_magic[i] ) ++matches;
+ return matches > 1 && matches < 4;
+ }
+
+ uint8_t version() const { return data[4]; }
+ bool check_version() const { return data[4] == 1; }
+
+ unsigned dictionary_size() const
+ {
+ unsigned sz = 1 << ( data[5] & 0x1F );
+ if( sz > min_dictionary_size )
+ sz -= ( sz / 16 ) * ( ( data[5] >> 5 ) & 7 );
+ return sz;
+ }
+
+ bool dictionary_size( const unsigned sz )
+ {
+ if( !isvalid_ds( sz ) ) return false;
+ data[5] = real_bits( sz - 1 );
+ if( sz > min_dictionary_size )
+ {
+ const unsigned base_size = 1 << data[5];
+ const unsigned fraction = base_size / 16;
+ for( unsigned i = 7; i >= 1; --i )
+ if( base_size - ( i * fraction ) >= sz )
+ { data[5] |= i << 5; break; }
+ }
+ return true;
+ }
+
+ bool check() const
+ { return check_magic() && check_version() &&
+ isvalid_ds( dictionary_size() ); }
+ };
+
+
+struct Lzip_trailer
+ {
+ enum { size = 20 };
+ uint8_t data[size]; // 0-3 CRC32 of the uncompressed data
+ // 4-11 size of the uncompressed data
+ // 12-19 member size including header and trailer
+
+ unsigned data_crc() const
+ {
+ unsigned tmp = 0;
+ for( int i = 3; i >= 0; --i ) { tmp <<= 8; tmp += data[i]; }
+ return tmp;
+ }
+
+ void data_crc( unsigned crc )
+ { for( int i = 0; i <= 3; ++i ) { data[i] = (uint8_t)crc; crc >>= 8; } }
+
+ unsigned long long data_size() const
+ {
+ unsigned long long tmp = 0;
+ for( int i = 11; i >= 4; --i ) { tmp <<= 8; tmp += data[i]; }
+ return tmp;
+ }
+
+ void data_size( unsigned long long sz )
+ { for( int i = 4; i <= 11; ++i ) { data[i] = (uint8_t)sz; sz >>= 8; } }
+
+ unsigned long long member_size() const
+ {
+ unsigned long long tmp = 0;
+ for( int i = 19; i >= 12; --i ) { tmp <<= 8; tmp += data[i]; }
+ return tmp;
+ }
+
+ void member_size( unsigned long long sz )
+ { for( int i = 12; i <= 19; ++i ) { data[i] = (uint8_t)sz; sz >>= 8; } }
+
+ bool check_consistency() const // check internal consistency
+ {
+ const unsigned crc = data_crc();
+ const unsigned long long dsize = data_size();
+ if( ( crc == 0 ) != ( dsize == 0 ) ) return false;
+ const unsigned long long msize = member_size();
+ if( msize < min_member_size ) return false;
+ const unsigned long long mlimit = ( 9 * dsize + 7 ) / 8 + min_member_size;
+ if( mlimit > dsize && msize > mlimit ) return false;
+ const unsigned long long dlimit = 7090 * ( msize - 26 ) - 1;
+ if( dlimit > msize && dsize > dlimit ) return false;
+ return true;
+ }
+ };
+
+
+struct Cl_options // command-line options
+ {
+ bool ignore_trailing;
+ bool loose_trailing;
+
+ Cl_options() : ignore_trailing( true ), loose_trailing( false ) {}
+ };
+
+
+inline void set_retval( int & retval, const int new_val )
+ { if( retval < new_val ) retval = new_val; }
+
+const char * const bad_magic_msg = "Bad magic number (file not in lzip format).";
+const char * const bad_dict_msg = "Invalid dictionary size in member header.";
+const char * const corrupt_mm_msg = "Corrupt header in multimember file.";
+const char * const trailing_msg = "Trailing data not allowed.";
+const char * const mem_msg = "Not enough memory.";
+
+// defined in compress.cc
+int readblock( const int fd, uint8_t * const buf, const int size );
+int writeblock( const int fd, const uint8_t * const buf, const int size );
+void xinit_mutex( pthread_mutex_t * const mutex );
+void xinit_cond( pthread_cond_t * const cond );
+void xdestroy_mutex( pthread_mutex_t * const mutex );
+void xdestroy_cond( pthread_cond_t * const cond );
+void xlock( pthread_mutex_t * const mutex );
+void xunlock( pthread_mutex_t * const mutex );
+void xwait( pthread_cond_t * const cond, pthread_mutex_t * const mutex );
+void xsignal( pthread_cond_t * const cond );
+void xbroadcast( pthread_cond_t * const cond );
+int compress( const unsigned long long cfile_size,
+ const int data_size, const int dictionary_size,
+ const int match_len_limit, const int num_workers,
+ const int infd, const int outfd,
+ const Pretty_print & pp, const int debug_level );
+
+// defined in lzip_index.cc
+class Lzip_index; // forward declaration
+
+// defined in dec_stdout.cc
+int dec_stdout( const int num_workers, const int infd, const int outfd,
+ const Pretty_print & pp, const int debug_level,
+ const int out_slots, const Lzip_index & lzip_index );
+
+// defined in dec_stream.cc
+int dec_stream( const unsigned long long cfile_size, const int num_workers,
+ const int infd, const int outfd, const Cl_options & cl_opts,
+ const Pretty_print & pp, const int debug_level,
+ const int in_slots, const int out_slots );
+
+// defined in decompress.cc
+int preadblock( const int fd, uint8_t * const buf, const int size,
+ const long long pos );
+class Shared_retval;
+void decompress_error( struct LZ_Decoder * const decoder,
+ const Pretty_print & pp,
+ Shared_retval & shared_retval, const int worker_id );
+void show_results( const unsigned long long in_size,
+ const unsigned long long out_size,
+ const unsigned dictionary_size, const bool testing );
+int decompress( const unsigned long long cfile_size, int num_workers,
+ const int infd, const int outfd, const Cl_options & cl_opts,
+ const Pretty_print & pp, const int debug_level,
+ const int in_slots, const int out_slots,
+ const bool infd_isreg, const bool one_to_one );
+
+// defined in list.cc
+int list_files( const std::vector< std::string > & filenames,
+ const Cl_options & cl_opts );
+
+// defined in main.cc
+struct stat;
+const char * bad_version( const unsigned version );
+const char * format_ds( const unsigned dictionary_size );
+void show_header( const unsigned dictionary_size );
+int open_instream( const char * const name, struct stat * const in_statsp,
+ const bool one_to_one, const bool reg_only = false );
+void cleanup_and_fail( const int retval = 1 ); // terminate the program
+void show_error( const char * const msg, const int errcode = 0,
+ const bool help = false );
+void show_file_error( const char * const filename, const char * const msg,
+ const int errcode = 0 );
+void internal_error( const char * const msg );
+void show_progress( const unsigned long long packet_size,
+ const unsigned long long cfile_size = 0,
+ const Pretty_print * const p = 0 );
+
+
+class Slot_tally
+ {
+ const int num_slots; // total slots
+ int num_free; // remaining free slots
+ pthread_mutex_t mutex;
+ pthread_cond_t slot_av; // slot available
+
+ Slot_tally( const Slot_tally & ); // declared as private
+ void operator=( const Slot_tally & ); // declared as private
+
+public:
+ explicit Slot_tally( const int slots )
+ : num_slots( slots ), num_free( slots )
+ { xinit_mutex( &mutex ); xinit_cond( &slot_av ); }
+
+ ~Slot_tally() { xdestroy_cond( &slot_av ); xdestroy_mutex( &mutex ); }
+
+ bool all_free() { return num_free == num_slots; }
+
+ void get_slot() // wait for a free slot
+ {
+ xlock( &mutex );
+ while( num_free <= 0 ) xwait( &slot_av, &mutex );
+ --num_free;
+ xunlock( &mutex );
+ }
+
+ void leave_slot() // return a slot to the tally
+ {
+ xlock( &mutex );
+ if( ++num_free == 1 ) xsignal( &slot_av ); // num_free was 0
+ xunlock( &mutex );
+ }
+ };
+
+
+class Shared_retval // shared return value protected by a mutex
+ {
+ int retval;
+ pthread_mutex_t mutex;
+
+ Shared_retval( const Shared_retval & ); // declared as private
+ void operator=( const Shared_retval & ); // declared as private
+
+public:
+ Shared_retval() : retval( 0 ) { xinit_mutex( &mutex ); }
+
+ bool set_value( const int val ) // only one thread can set retval > 0
+ { // (and print an error message)
+ xlock( &mutex );
+ const bool done = ( retval == 0 && val > 0 );
+ if( done ) retval = val;
+ xunlock( &mutex );
+ return done;
+ }
+
+ int operator()() const { return retval; }
+ };
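Byte 5 of the member header codes the dictionary size as a power of two (bits 0-4) minus 0 to 7 sixteenths of it (bits 5-7). The sketch below restates the two Lzip_header::dictionary_size() overloads as free functions so the round trip can be checked in isolation; 'encode_ds', 'decode_ds', 'min_ds', and 'max_ds' are hypothetical names mirroring the constants in lzip.h.

```cpp
#include <cassert>
#include <stdint.h>

// Hypothetical free-function restatement of the two
// Lzip_header::dictionary_size() overloads in lzip.h.
const unsigned min_ds = 1U << 12;   // 4 KiB, like min_dictionary_size
const unsigned max_ds = 1U << 29;   // 512 MiB, like max_dictionary_size

// Decode: base size 2^(bits 0-4), minus (bits 5-7) sixteenths of the base.
unsigned decode_ds( const uint8_t coded )
  {
  unsigned sz = 1U << ( coded & 0x1F );
  if( sz > min_ds ) sz -= ( sz / 16 ) * ( ( coded >> 5 ) & 7 );
  return sz;
  }

// Encode: choose the smallest codable size not smaller than 'sz'.
// Returns false if 'sz' is outside [min_ds, max_ds].
bool encode_ds( const unsigned sz, uint8_t & coded )
  {
  if( sz < min_ds || sz > max_ds ) return false;
  unsigned bits = 0;                          // bits needed for sz - 1
  for( unsigned v = sz - 1; v > 0; v >>= 1 ) ++bits;
  coded = static_cast< uint8_t >( bits );
  if( sz > min_ds )
    {
    const unsigned base = 1U << bits;         // smallest power of 2 >= sz
    const unsigned fraction = base / 16;
    for( unsigned i = 7; i >= 1; --i )        // largest reduction that fits
      if( base - i * fraction >= sz )
        { coded |= static_cast< uint8_t >( i << 5 ); break; }
    }
  return true;
  }
```

For instance, 3 MiB is coded as 150 (base 2^22 minus 4/16 of it), and a request for 3000000 bytes is rounded up to that same 3 MiB, the smallest codable size that is not smaller than the request.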
diff --git a/lzip_index.cc b/lzip_index.cc
new file mode 100644
index 0000000..8d1aa0c
--- /dev/null
+++ b/lzip_index.cc
@@ -0,0 +1,209 @@
+/* Plzip - Massively parallel implementation of lzip
+ Copyright (C) 2009-2024 Antonio Diaz Diaz.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#define _FILE_OFFSET_BITS 64
+
+#include <algorithm>
+#include <cerrno>
+#include <cstdio>
+#include <cstring>
+#include <string>
+#include <vector>
+#include <stdint.h>
+#include <unistd.h>
+
+#include "lzip.h"
+#include "lzip_index.h"
+
+
+namespace {
+
+int seek_read( const int fd, uint8_t * const buf, const int size,
+ const long long pos )
+ {
+ if( lseek( fd, pos, SEEK_SET ) == pos )
+ return readblock( fd, buf, size );
+ return 0;
+ }
+
+} // end namespace
+
+
+bool Lzip_index::check_header( const Lzip_header & header, const bool first )
+ {
+ if( !header.check_magic() )
+ { error_ = bad_magic_msg; retval_ = 2; if( first ) bad_magic_ = true;
+ return false; }
+ if( !header.check_version() )
+ { error_ = bad_version( header.version() ); retval_ = 2; return false; }
+ if( !isvalid_ds( header.dictionary_size() ) )
+ { error_ = bad_dict_msg; retval_ = 2; return false; }
+ return true;
+ }
+
+void Lzip_index::set_errno_error( const char * const msg )
+ {
+ error_ = msg; error_ += std::strerror( errno );
+ retval_ = 1;
+ }
+
+void Lzip_index::set_num_error( const char * const msg, unsigned long long num )
+ {
+ char buf[80];
+ snprintf( buf, sizeof buf, "%s%llu", msg, num );
+ error_ = buf;
+ retval_ = 2;
+ }
+
+
+bool Lzip_index::read_header( const int fd, Lzip_header & header,
+ const long long pos )
+ {
+ if( seek_read( fd, header.data, header.size, pos ) != header.size )
+ { set_errno_error( "Error reading member header: " ); return false; }
+ return true;
+ }
+
+
+// If successful, push last member and set pos to member header.
+bool Lzip_index::skip_trailing_data( const int fd, unsigned long long & pos,
+ const Cl_options & cl_opts )
+ {
+ if( pos < min_member_size ) return false;
+ enum { block_size = 16384,
+ buffer_size = block_size + Lzip_trailer::size - 1 + Lzip_header::size };
+ uint8_t buffer[buffer_size];
+ int bsize = pos % block_size; // total bytes in buffer
+ if( bsize <= buffer_size - block_size ) bsize += block_size;
+ int search_size = bsize; // bytes to search for trailer
+ int rd_size = bsize; // bytes to read from file
+ unsigned long long ipos = pos - rd_size; // aligned to block_size
+
+ while( true )
+ {
+ if( seek_read( fd, buffer, rd_size, ipos ) != rd_size )
+ { set_errno_error( "Error seeking member trailer: " ); return false; }
+ const uint8_t max_msb = ( ipos + search_size ) >> 56;
+ for( int i = search_size; i >= Lzip_trailer::size; --i )
+ if( buffer[i-1] <= max_msb ) // most significant byte of member_size
+ {
+ const Lzip_trailer & trailer =
+ *(const Lzip_trailer *)( buffer + i - trailer.size );
+ const unsigned long long member_size = trailer.member_size();
+ if( member_size == 0 ) // skip trailing zeros
+ { while( i > trailer.size && buffer[i-9] == 0 ) --i; continue; }
+ if( member_size > ipos + i || !trailer.check_consistency() ) continue;
+ Lzip_header header;
+ if( !read_header( fd, header, ipos + i - member_size ) ) return false;
+ if( !header.check() ) continue;
+ const Lzip_header & header2 = *(const Lzip_header *)( buffer + i );
+ const bool full_h2 = bsize - i >= header.size;
+ if( header2.check_prefix( bsize - i ) ) // last member
+ {
+ if( !full_h2 ) error_ = "Last member in input file is truncated.";
+ else if( check_header( header2, false ) )
+ error_ = "Last member in input file is truncated or corrupt.";
+ retval_ = 2; return false;
+ }
+ if( !cl_opts.loose_trailing && full_h2 && header2.check_corrupt() )
+ { error_ = corrupt_mm_msg; retval_ = 2; return false; }
+ if( !cl_opts.ignore_trailing )
+ { error_ = trailing_msg; retval_ = 2; return false; }
+ pos = ipos + i - member_size; // good member
+ const unsigned dictionary_size = header.dictionary_size();
+ if( dictionary_size_ < dictionary_size )
+ dictionary_size_ = dictionary_size;
+ member_vector.push_back( Member( 0, trailer.data_size(), pos,
+ member_size, dictionary_size ) );
+ return true;
+ }
+ if( ipos == 0 )
+ { set_num_error( "Bad trailer at pos ", pos - Lzip_trailer::size );
+ return false; }
+ bsize = buffer_size;
+ search_size = bsize - Lzip_header::size;
+ rd_size = block_size;
+ ipos -= rd_size;
+ std::memcpy( buffer + rd_size, buffer, buffer_size - rd_size );
+ }
+ }
+
+
+Lzip_index::Lzip_index( const int infd, const Cl_options & cl_opts )
+ : insize( lseek( infd, 0, SEEK_END ) ), retval_( 0 ), dictionary_size_( 0 ),
+ bad_magic_( false )
+ {
+ if( insize < 0 )
+ { set_errno_error( "Input file is not seekable: " ); return; }
+ if( insize < min_member_size )
+ { error_ = "Input file is too short."; retval_ = 2; return; }
+ if( insize > INT64_MAX )
+ { error_ = "Input file is too long (2^63 bytes or more).";
+ retval_ = 2; return; }
+
+ Lzip_header header;
+ if( !read_header( infd, header, 0 ) ||
+ !check_header( header, true ) ) return;
+
+ unsigned long long pos = insize; // always points to a header or to EOF
+ while( pos >= min_member_size )
+ {
+ Lzip_trailer trailer;
+ if( seek_read( infd, trailer.data, trailer.size, pos - trailer.size ) !=
+ trailer.size )
+ { set_errno_error( "Error reading member trailer: " ); break; }
+ const unsigned long long member_size = trailer.member_size();
+ if( member_size > pos || !trailer.check_consistency() ) // bad trailer
+ {
+ if( member_vector.empty() )
+ { if( skip_trailing_data( infd, pos, cl_opts ) ) continue; return; }
+ set_num_error( "Bad trailer at pos ", pos - trailer.size ); break;
+ }
+ if( !read_header( infd, header, pos - member_size ) ) break;
+ if( !header.check() ) // bad header
+ {
+ if( member_vector.empty() )
+ { if( skip_trailing_data( infd, pos, cl_opts ) ) continue; return; }
+ set_num_error( "Bad header at pos ", pos - member_size ); break;
+ }
+ pos -= member_size; // good member
+ const unsigned dictionary_size = header.dictionary_size();
+ if( dictionary_size_ < dictionary_size )
+ dictionary_size_ = dictionary_size;
+ member_vector.push_back( Member( 0, trailer.data_size(), pos,
+ member_size, dictionary_size ) );
+ }
+ if( pos != 0 || member_vector.empty() || retval_ != 0 )
+ {
+ member_vector.clear();
+ if( retval_ == 0 ) { error_ = "Can't create file index."; retval_ = 2; }
+ return;
+ }
+ std::reverse( member_vector.begin(), member_vector.end() );
+ for( unsigned long i = 0; ; ++i )
+ {
+ const long long end = member_vector[i].dblock.end();
+ if( end < 0 || end > INT64_MAX )
+ {
+ member_vector.clear();
+ error_ = "Data in input file is too long (2^63 bytes or more).";
+ retval_ = 2; return;
+ }
+ if( i + 1 >= member_vector.size() ) break;
+ member_vector[i+1].dblock.pos( end );
+ }
+ }
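The Lzip_index constructor builds the member list by walking backwards from the end of the file: each trailer stores the member size, which gives the position of that member's header. A simplified in-memory sketch of the same walk follows. It assumes a file with no trailing data and skips the CRC, version, dictionary size, and consistency checks that the real code performs; 'Entry' and 'build_index' are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>
#include <stdint.h>

// Hypothetical, simplified restatement of the backward scan performed by
// the Lzip_index constructor. Assumes no trailing data; performs none of
// the CRC, version, dictionary size, or consistency checks of plzip.
struct Entry
  {
  unsigned long long dpos, dsize;   // decompressed position and size
  unsigned long long mpos, msize;   // member position and size in the file
  };

// Reads a 64-bit little-endian value, as Lzip_trailer does.
unsigned long long get_le64( const uint8_t * const p )
  {
  unsigned long long v = 0;
  for( int i = 7; i >= 0; --i ) { v <<= 8; v += p[i]; }
  return v;
  }

// Returns false if 'file' cannot be split into whole members.
bool build_index( const std::vector< uint8_t > & file,
                  std::vector< Entry > & index )
  {
  static const uint8_t magic[4] = { 0x4C, 0x5A, 0x49, 0x50 };  // "LZIP"
  const unsigned long long min_member_size = 36, trailer_size = 20;
  index.clear();
  unsigned long long pos = file.size();   // always at a header or at EOF
  while( pos > 0 )
    {
    if( pos < min_member_size ) return false;
    const uint8_t * const trailer = &file[pos - trailer_size];
    const unsigned long long msize = get_le64( trailer + 12 );  // bytes 12-19
    if( msize < min_member_size || msize > pos ) return false;
    if( std::memcmp( &file[pos - msize], magic, 4 ) != 0 ) return false;
    pos -= msize;                           // now at this member's header
    const Entry e = { 0, get_le64( trailer + 4 ), pos, msize };  // bytes 4-11
    index.push_back( e );
    }
  if( index.empty() ) return false;
  std::reverse( index.begin(), index.end() );       // first member first
  for( std::size_t i = 1; i < index.size(); ++i )   // running data position
    index[i].dpos = index[i-1].dpos + index[i-1].dsize;
  return true;
  }
```

A single byte appended after the last member is enough to make this simplified walk fail, which is why the real constructor falls back to skip_trailing_data() when the first trailer read from the end is not valid.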
diff --git a/lzip_index.h b/lzip_index.h
new file mode 100644
index 0000000..a994f1c
--- /dev/null
+++ b/lzip_index.h
@@ -0,0 +1,94 @@
+/* Plzip - Massively parallel implementation of lzip
+ Copyright (C) 2009-2024 Antonio Diaz Diaz.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+#ifndef INT64_MAX
+#define INT64_MAX 0x7FFFFFFFFFFFFFFFLL
+#endif
+
+
+class Block
+ {
+ long long pos_, size_; // pos >= 0, size >= 0, pos + size <= INT64_MAX
+
+public:
+ Block( const long long p, const long long s ) : pos_( p ), size_( s ) {}
+
+ long long pos() const { return pos_; }
+ long long size() const { return size_; }
+ long long end() const { return pos_ + size_; }
+
+ void pos( const long long p ) { pos_ = p; }
+ void size( const long long s ) { size_ = s; }
+ };
+
+
+class Lzip_index
+ {
+ struct Member
+ {
+ Block dblock, mblock; // data block, member block
+ unsigned dictionary_size;
+
+ Member( const long long dpos, const long long dsize,
+ const long long mpos, const long long msize,
+ const unsigned dict_size )
+ : dblock( dpos, dsize ), mblock( mpos, msize ),
+ dictionary_size( dict_size ) {}
+ };
+
+ std::vector< Member > member_vector;
+ std::string error_;
+ const long long insize;
+ int retval_;
+ unsigned dictionary_size_; // largest dictionary size in the file
+ bool bad_magic_; // bad magic in first header
+
+ bool check_header( const Lzip_header & header, const bool first );
+ void set_errno_error( const char * const msg );
+ void set_num_error( const char * const msg, unsigned long long num );
+ bool read_header( const int fd, Lzip_header & header, const long long pos );
+ bool skip_trailing_data( const int fd, unsigned long long & pos,
+ const Cl_options & cl_opts );
+
+public:
+ Lzip_index( const int infd, const Cl_options & cl_opts );
+
+ long members() const { return member_vector.size(); }
+ const std::string & error() const { return error_; }
+ int retval() const { return retval_; }
+ unsigned dictionary_size() const { return dictionary_size_; }
+ bool bad_magic() const { return bad_magic_; }
+
+ long long udata_size() const
+ { if( member_vector.empty() ) return 0;
+ return member_vector.back().dblock.end(); }
+
+ long long cdata_size() const
+ { if( member_vector.empty() ) return 0;
+ return member_vector.back().mblock.end(); }
+
+ // total size including trailing data (if any)
+ long long file_size() const
+ { if( insize >= 0 ) return insize; else return 0; }
+
+ const Block & dblock( const long i ) const
+ { return member_vector[i].dblock; }
+ const Block & mblock( const long i ) const
+ { return member_vector[i].mblock; }
+ unsigned dictionary_size( const long i ) const
+ { return member_vector[i].dictionary_size; }
+ };
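Block documents the invariant pos >= 0, size >= 0, pos + size <= INT64_MAX, and the index assigns each member's decompressed position from the end of the previous data block. The sketch below shows one way to enforce that invariant before adding, so the sum can never overflow; in standard C++ signed overflow is undefined, which makes checking before the addition the portable form. 'block_fits' and 'next_data_pos' are hypothetical helpers, not part of plzip.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if a block at 'pos' with 'size' bytes satisfies
// Block's invariant (pos >= 0, size >= 0, pos + size <= INT64_MAX).
// The subtraction cannot overflow because pos >= 0 is checked first.
bool block_fits( const long long pos, const long long size )
  { return pos >= 0 && size >= 0 && size <= INT64_MAX - pos; }

// Hypothetical helper: position of the data block that follows a block at
// 'pos' with 'size' bytes, or -1 if the end would exceed INT64_MAX.
long long next_data_pos( const long long pos, const long long size )
  {
  if( !block_fits( pos, size ) ) return -1;
  return pos + size;
  }
```

With this check, a chain of per-member data sizes whose running total would pass 2^63 - 1 is rejected before the addition takes place.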
diff --git a/main.cc b/main.cc
new file mode 100644
index 0000000..548b58f
--- /dev/null
+++ b/main.cc
@@ -0,0 +1,1016 @@
+/* Plzip - Massively parallel implementation of lzip
+ Copyright (C) 2009 Laszlo Ersek.
+ Copyright (C) 2009-2024 Antonio Diaz Diaz.
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+/*
+ Exit status: 0 for a normal exit, 1 for environmental problems
+ (file not found, invalid command-line options, I/O errors, etc.), 2 to
+ indicate a corrupt or invalid input file, 3 for an internal consistency
+ error (e.g., bug) which caused plzip to panic.
+*/
+
+#define _FILE_OFFSET_BITS 64
+
+#include <algorithm>
+#include <cerrno>
+#include <climits> // SSIZE_MAX
+#include <csignal>
+#include <cstdio>
+#include <cstdlib>
+#include <cstring>
+#include <string>
+#include <vector>
+#include <fcntl.h>
+#include <stdint.h> // SIZE_MAX
+#include <unistd.h>
+#include <utime.h>
+#include <sys/stat.h>
+#include <lzlib.h>
+#if defined __MSVCRT__ || defined __OS2__
+#include <io.h>
+#if defined __MSVCRT__
+#define fchmod(x,y) 0
+#define fchown(x,y,z) 0
+#define strtoull std::strtoul
+#define SIGHUP SIGTERM
+#define S_ISSOCK(x) 0
+#ifndef S_IRGRP
+#define S_IRGRP 0
+#define S_IWGRP 0
+#define S_IROTH 0
+#define S_IWOTH 0
+#endif
+#endif
+#endif
+
+#include "arg_parser.h"
+#include "lzip.h"
+
+#ifndef O_BINARY
+#define O_BINARY 0
+#endif
+
+#if CHAR_BIT != 8
+#error "Environments where CHAR_BIT != 8 are not supported."
+#endif
+
+#if ( defined SIZE_MAX && SIZE_MAX < UINT_MAX ) || \
+ ( defined SSIZE_MAX && SSIZE_MAX < INT_MAX )
+#error "Environments where 'size_t' is narrower than 'int' are not supported."
+#endif
+
+int verbosity = 0;
+
+namespace {
+
+const char * const program_name = "plzip";
+const char * const program_year = "2024";
+const char * invocation_name = program_name; // default value
+
+const struct { const char * from; const char * to; } known_extensions[] = {
+ { ".lz", "" },
+ { ".tlz", ".tar" },
+ { 0, 0 } };
+
+struct Lzma_options
+ {
+ int dictionary_size; // 4 KiB .. 512 MiB
+ int match_len_limit; // 5 .. 273
+ };
+
+enum Mode { m_compress, m_decompress, m_list, m_test };
+
+/* Variables used in signal handler context.
+ They are not declared volatile because the handler never returns. */
+std::string output_filename;
+int outfd = -1;
+bool delete_output_on_interrupt = false;
+
+
+void show_help( const long num_online )
+ {
+ std::printf( "Plzip is a massively parallel (multi-threaded) implementation of lzip,\n"
+ "compatible with lzip 1.4 or newer. Plzip uses the compression library lzlib.\n"
+ "\nLzip is a lossless data compressor with a user interface similar to the one\n"
+ "of gzip or bzip2. Lzip uses a simplified form of the 'Lempel-Ziv-Markov\n"
+ "chain-Algorithm' (LZMA) stream format to maximize interoperability. The\n"
+ "maximum dictionary size is 512 MiB so that any lzip file can be decompressed\n"
+ "on 32-bit machines. Lzip provides accurate and robust 3-factor integrity\n"
+ "checking. Lzip can compress about as fast as gzip (lzip -0) or compress most\n"
+ "files more than bzip2 (lzip -9). Decompression speed is intermediate between\n"
+ "gzip and bzip2. Lzip is better than gzip and bzip2 from a data recovery\n"
+ "perspective. Lzip has been designed, written, and tested with great care to\n"
+ "replace gzip and bzip2 as the standard general-purpose compressed format for\n"
+ "Unix-like systems.\n"
+ "\nPlzip can compress/decompress large files on multiprocessor machines much\n"
+ "faster than lzip, at the cost of a slightly reduced compression ratio (0.4\n"
+ "to 2 percent larger compressed files). Note that the number of usable\n"
+ "threads is limited by file size; on files larger than a few GB plzip can use\n"
+ "hundreds of processors, but on files of only a few MB plzip is no faster\n"
+ "than lzip.\n"
+ "\nUsage: %s [options] [files]\n", invocation_name );
+ std::printf( "\nOptions:\n"
+ " -h, --help display this help and exit\n"
+ " -V, --version output version information and exit\n"
+ " -a, --trailing-error exit with error status if trailing data\n"
+ " -B, --data-size=<bytes> set size of input data blocks [2x8=16 MiB]\n"
+ " -c, --stdout write to standard output, keep input files\n"
+ " -d, --decompress decompress, test compressed file integrity\n"
+ " -f, --force overwrite existing output files\n"
+ " -F, --recompress force re-compression of compressed files\n"
+ " -k, --keep keep (don't delete) input files\n"
+ " -l, --list print (un)compressed file sizes\n"
+ " -m, --match-length=<bytes> set match length limit in bytes [36]\n"
+ " -n, --threads=<n> set number of (de)compression threads [%ld]\n"
+ " -o, --output=<file> write to <file>, keep input files\n"
+ " -q, --quiet suppress all messages\n"
+ " -s, --dictionary-size=<bytes> set dictionary size limit in bytes [8 MiB]\n"
+ " -t, --test test compressed file integrity\n"
+ " -v, --verbose be verbose (a 2nd -v gives more)\n"
+ " -0 .. -9 set compression level [default 6]\n"
+ " --fast alias for -0\n"
+ " --best alias for -9\n"
+ " --loose-trailing allow trailing data seeming corrupt header\n"
+ " --in-slots=<n> number of 1 MiB input packets buffered [4]\n"
+ " --out-slots=<n> number of 1 MiB output packets buffered [64]\n"
+ " --check-lib compare version of lzlib.h with liblz.{a,so}\n",
+ num_online );
+ if( verbosity >= 1 )
+ {
+ std::printf( " --debug=<level> print mode(2), debug statistics(1) to stderr\n" );
+ }
+ std::printf( "\nIf no file names are given, or if a file is '-', plzip compresses or\n"
+ "decompresses from standard input to standard output.\n"
+ "Numbers may be followed by a multiplier: k = kB = 10^3 = 1000,\n"
+ "Ki = KiB = 2^10 = 1024, M = 10^6, Mi = 2^20, G = 10^9, Gi = 2^30, etc...\n"
+ "Dictionary sizes 12 to 29 are interpreted as powers of two, meaning 2^12 to\n"
+ "2^29 bytes.\n"
+ "\nThe bidimensional parameter space of LZMA can't be mapped to a linear scale\n"
+ "optimal for all files. If your files are large, very repetitive, etc, you\n"
+ "may need to use the options --dictionary-size and --match-length directly\n"
+ "to achieve optimal performance.\n"
+ "\nTo extract all the files from archive 'foo.tar.lz', use the commands\n"
+ "'tar -xf foo.tar.lz' or 'plzip -cd foo.tar.lz | tar -xf -'.\n"
+ "\nExit status: 0 for a normal exit, 1 for environmental problems\n"
+ "(file not found, invalid command-line options, I/O errors, etc), 2 to\n"
+ "indicate a corrupt or invalid input file, 3 for an internal consistency\n"
+ "error (e.g., bug) which caused plzip to panic.\n"
+ "\nReport bugs to lzip-bug@nongnu.org\n"
+ "Plzip home page: http://www.nongnu.org/lzip/plzip.html\n" );
+ }
+
+
+void show_lzlib_version()
+ {
+ std::printf( "Using lzlib %s\n", LZ_version() );
+#if !defined LZ_API_VERSION
+ std::fputs( "LZ_API_VERSION is not defined.\n", stdout );
+#elif LZ_API_VERSION >= 1012
+ std::printf( "Using LZ_API_VERSION = %u\n", LZ_api_version() );
+#else
+ std::printf( "Compiled with LZ_API_VERSION = %u. "
+ "Using an unknown LZ_API_VERSION\n", LZ_API_VERSION );
+#endif
+ }
+
+
+void show_version()
+ {
+ std::printf( "%s %s\n", program_name, PROGVERSION );
+ std::printf( "Copyright (C) 2009 Laszlo Ersek.\n" );
+ std::printf( "Copyright (C) %s Antonio Diaz Diaz.\n", program_year );
+ std::printf( "License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>\n"
+ "This is free software: you are free to change and redistribute it.\n"
+ "There is NO WARRANTY, to the extent permitted by law.\n" );
+ show_lzlib_version();
+ }
+
+
+int check_lzlib_ver() // <major>.<minor> or <major>.<minor>[a-z.-]*
+ {
+#if defined LZ_API_VERSION && LZ_API_VERSION >= 1012
+ const unsigned char * p = (unsigned char *)LZ_version_string;
+ unsigned major = 0, minor = 0;
+ while( major < 100000 && isdigit( *p ) )
+ { major *= 10; major += *p - '0'; ++p; }
+ if( *p == '.' ) ++p;
+ else
+out: { show_error( "Invalid LZ_version_string in lzlib.h" ); return 2; }
+ while( minor < 100 && isdigit( *p ) )
+ { minor *= 10; minor += *p - '0'; ++p; }
+ if( *p && *p != '-' && *p != '.' && !std::islower( *p ) ) goto out;
+ const unsigned version = major * 1000 + minor;
+ if( LZ_API_VERSION != version )
+ {
+ if( verbosity >= 0 )
+ std::fprintf( stderr, "%s: Version mismatch in lzlib.h: "
+ "LZ_API_VERSION = %u, should be %u.\n",
+ program_name, LZ_API_VERSION, version );
+ return 2;
+ }
+#endif
+ return 0;
+ }
+
+
+int check_lib()
+ {
+ int retval = check_lzlib_ver();
+ if( std::strcmp( LZ_version_string, LZ_version() ) != 0 )
+ { set_retval( retval, 1 );
+ if( verbosity >= 0 )
+ std::printf( "warning: LZ_version_string != LZ_version() (%s vs %s)\n",
+ LZ_version_string, LZ_version() ); }
+#if defined LZ_API_VERSION && LZ_API_VERSION >= 1012
+ if( LZ_API_VERSION != LZ_api_version() )
+ { set_retval( retval, 1 );
+ if( verbosity >= 0 )
+ std::printf( "warning: LZ_API_VERSION != LZ_api_version() (%u vs %u)\n",
+ LZ_API_VERSION, LZ_api_version() ); }
+#endif
+ if( verbosity >= 1 ) show_lzlib_version();
+ return retval;
+ }
+
+} // end namespace
+
+void Pretty_print::operator()( const char * const msg ) const
+ {
+ if( verbosity < 0 ) return;
+ if( first_post )
+ {
+ first_post = false;
+ std::fputs( padded_name.c_str(), stderr );
+ if( !msg ) std::fflush( stderr );
+ }
+ if( msg ) std::fprintf( stderr, "%s\n", msg );
+ }
+
+
+const char * bad_version( const unsigned version )
+ {
+ static char buf[80];
+ snprintf( buf, sizeof buf, "Version %u member format not supported.",
+ version );
+ return buf;
+ }
+
+
+const char * format_ds( const unsigned dictionary_size )
+ {
+ enum { bufsize = 16, factor = 1024, n = 3 };
+ static char buf[bufsize];
+ const char * const prefix[n] = { "Ki", "Mi", "Gi" };
+ const char * p = "";
+ const char * np = " ";
+ unsigned num = dictionary_size;
+ bool exact = ( num % factor == 0 );
+
+ for( int i = 0; i < n && ( num > 9999 || ( exact && num >= factor ) ); ++i )
+ { num /= factor; if( num % factor != 0 ) exact = false;
+ p = prefix[i]; np = ""; }
+ snprintf( buf, bufsize, "%s%4u %sB", np, num, p );
+ return buf;
+ }
+
+
+void show_header( const unsigned dictionary_size )
+ {
+ std::fprintf( stderr, "dict %s, ", format_ds( dictionary_size ) );
+ }
+
+namespace {
+
+// separate numbers of 5 or more digits in groups of 3 digits using '_'
+const char * format_num3( unsigned long long num )
+ {
+ enum { buffers = 8, bufsize = 4 * sizeof num, n = 10 };
+ const char * const si_prefix = "kMGTPEZYRQ";
+ const char * const binary_prefix = "KMGTPEZYRQ";
+ static char buffer[buffers][bufsize]; // circle of static buffers for printf
+ static int current = 0;
+
+ char * const buf = buffer[current++]; current %= buffers;
+ char * p = buf + bufsize - 1; // fill the buffer backwards
+ *p = 0; // terminator
+ if( num > 1024 )
+ {
+ char prefix = 0; // try binary first, then si
+ for( int i = 0; i < n && num != 0 && num % 1024 == 0; ++i )
+ { num /= 1024; prefix = binary_prefix[i]; }
+ if( prefix ) *(--p) = 'i';
+ else
+ for( int i = 0; i < n && num != 0 && num % 1000 == 0; ++i )
+ { num /= 1000; prefix = si_prefix[i]; }
+ if( prefix ) *(--p) = prefix;
+ }
+ const bool split = num >= 10000;
+
+ for( int i = 0; ; )
+ {
+ *(--p) = num % 10 + '0'; num /= 10; if( num == 0 ) break;
+ if( split && ++i >= 3 ) { i = 0; *(--p) = '_'; }
+ }
+ return p;
+ }
+
+
+void show_option_error( const char * const arg, const char * const msg,
+ const char * const option_name )
+ {
+ if( verbosity >= 0 )
+ std::fprintf( stderr, "%s: '%s': %s option '%s'.\n",
+ program_name, arg, msg, option_name );
+ }
+
+
+// Recognized formats: <num>k, <num>Ki, <num>[MGTPEZYRQ][i]
+unsigned long long getnum( const char * const arg,
+ const char * const option_name,
+ const unsigned long long llimit,
+ const unsigned long long ulimit )
+ {
+ char * tail;
+ errno = 0;
+ unsigned long long result = strtoull( arg, &tail, 0 );
+ if( tail == arg )
+ { show_option_error( arg, "Bad or missing numerical argument in",
+ option_name ); std::exit( 1 ); }
+
+ if( !errno && tail[0] )
+ {
+ const unsigned factor = ( tail[1] == 'i' ) ? 1024 : 1000;
+ int exponent = 0; // 0 = bad multiplier
+ switch( tail[0] )
+ {
+ case 'Q': exponent = 10; break;
+ case 'R': exponent = 9; break;
+ case 'Y': exponent = 8; break;
+ case 'Z': exponent = 7; break;
+ case 'E': exponent = 6; break;
+ case 'P': exponent = 5; break;
+ case 'T': exponent = 4; break;
+ case 'G': exponent = 3; break;
+ case 'M': exponent = 2; break;
+ case 'K': if( factor == 1024 ) exponent = 1; break;
+ case 'k': if( factor == 1000 ) exponent = 1; break;
+ }
+ if( exponent <= 0 )
+ { show_option_error( arg, "Bad multiplier in numerical argument of",
+ option_name ); std::exit( 1 ); }
+ for( int i = 0; i < exponent; ++i )
+ {
+ if( ulimit / factor >= result ) result *= factor;
+ else { errno = ERANGE; break; }
+ }
+ }
+ if( !errno && ( result < llimit || result > ulimit ) ) errno = ERANGE;
+ if( errno )
+ {
+ if( verbosity >= 0 )
+ std::fprintf( stderr, "%s: '%s': Value out of limits [%s,%s] in "
+ "option '%s'.\n", program_name, arg, format_num3( llimit ),
+ format_num3( ulimit ), option_name );
+ std::exit( 1 );
+ }
+ return result;
+ }
+
+
+int get_dict_size( const char * const arg, const char * const option_name )
+ {
+ char * tail;
+ const long bits = std::strtol( arg, &tail, 0 );
+ if( bits >= LZ_min_dictionary_bits() &&
+ bits <= LZ_max_dictionary_bits() && *tail == 0 )
+ return 1 << bits;
+ int dictionary_size = getnum( arg, option_name, LZ_min_dictionary_size(),
+ LZ_max_dictionary_size() );
+ if( dictionary_size == 65535 ) ++dictionary_size; // no fast encoder
+ return dictionary_size;
+ }
+
+
+void set_mode( Mode & program_mode, const Mode new_mode )
+ {
+ if( program_mode != m_compress && program_mode != new_mode )
+ {
+ show_error( "Only one operation can be specified.", 0, true );
+ std::exit( 1 );
+ }
+ program_mode = new_mode;
+ }
+
+
+int extension_index( const std::string & name )
+ {
+ for( int eindex = 0; known_extensions[eindex].from; ++eindex )
+ {
+ const std::string ext( known_extensions[eindex].from );
+ if( name.size() > ext.size() &&
+ name.compare( name.size() - ext.size(), ext.size(), ext ) == 0 )
+ return eindex;
+ }
+ return -1;
+ }
+
+
+void set_c_outname( const std::string & name, const bool filenames_given,
+ const bool force_ext )
+ {
+ /* zupdate < 1.9 depends on lzip adding the extension '.lz' to name when
+ reading from standard input. */
+ output_filename = name;
+ if( force_ext ||
+ ( !filenames_given && extension_index( output_filename ) < 0 ) )
+ output_filename += known_extensions[0].from;
+ }
+
+
+void set_d_outname( const std::string & name, const int eindex )
+ {
+ if( eindex >= 0 )
+ {
+ const std::string from( known_extensions[eindex].from );
+ if( name.size() > from.size() )
+ {
+ output_filename.assign( name, 0, name.size() - from.size() );
+ output_filename += known_extensions[eindex].to;
+ return;
+ }
+ }
+ output_filename = name; output_filename += ".out";
+ if( verbosity >= 1 )
+ std::fprintf( stderr, "%s: %s: Can't guess original name -- using '%s'\n",
+ program_name, name.c_str(), output_filename.c_str() );
+ }
+
+} // end namespace
+
+int open_instream( const char * const name, struct stat * const in_statsp,
+ const bool one_to_one, const bool reg_only )
+ {
+ int infd = open( name, O_RDONLY | O_BINARY );
+ if( infd < 0 )
+ show_file_error( name, "Can't open input file", errno );
+ else
+ {
+ const int i = fstat( infd, in_statsp );
+ const mode_t mode = in_statsp->st_mode;
+ const bool can_read = ( i == 0 && !reg_only &&
+ ( S_ISBLK( mode ) || S_ISCHR( mode ) ||
+ S_ISFIFO( mode ) || S_ISSOCK( mode ) ) );
+ if( i != 0 || ( !S_ISREG( mode ) && ( !can_read || one_to_one ) ) )
+ {
+ if( verbosity >= 0 )
+ std::fprintf( stderr, "%s: %s: Input file is not a regular file%s.\n",
+ program_name, name, ( can_read && one_to_one ) ?
+ ",\n and neither '-c' nor '-o' were specified" : "" );
+ close( infd );
+ infd = -1;
+ }
+ }
+ return infd;
+ }
+
+namespace {
+
+int open_instream2( const char * const name, struct stat * const in_statsp,
+ const Mode program_mode, const int eindex,
+ const bool one_to_one, const bool recompress )
+ {
+ if( program_mode == m_compress && !recompress && eindex >= 0 )
+ {
+ if( verbosity >= 0 )
+ std::fprintf( stderr, "%s: %s: Input file already has '%s' suffix.\n",
+ program_name, name, known_extensions[eindex].from );
+ return -1;
+ }
+ return open_instream( name, in_statsp, one_to_one, false );
+ }
+
+
+bool make_dirs( const std::string & name )
+ {
+ int i = name.size();
+ while( i > 0 && name[i-1] != '/' ) --i; // remove last component
+ while( i > 0 && name[i-1] == '/' ) --i; // remove slash(es)
+ const int dirsize = i; // size of dirname without trailing slash(es)
+
+ for( i = 0; i < dirsize; ) // if dirsize == 0, dirname is '/' or empty
+ {
+ while( i < dirsize && name[i] == '/' ) ++i;
+ const int first = i;
+ while( i < dirsize && name[i] != '/' ) ++i;
+ if( first < i )
+ {
+ const std::string partial( name, 0, i );
+ const mode_t mode = S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH;
+ struct stat st;
+ if( stat( partial.c_str(), &st ) == 0 )
+ { if( !S_ISDIR( st.st_mode ) ) { errno = ENOTDIR; return false; } }
+ else if( mkdir( partial.c_str(), mode ) != 0 && errno != EEXIST )
+ return false; // if EEXIST, another process created the dir
+ }
+ }
+ return true;
+ }
+
+
+bool open_outstream( const bool force, const bool protect )
+ {
+ const mode_t usr_rw = S_IRUSR | S_IWUSR;
+ const mode_t all_rw = usr_rw | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH;
+ const mode_t outfd_mode = protect ? usr_rw : all_rw;
+ int flags = O_CREAT | O_WRONLY | O_BINARY;
+ if( force ) flags |= O_TRUNC; else flags |= O_EXCL;
+
+ outfd = -1;
+ if( output_filename.size() &&
+ output_filename[output_filename.size()-1] == '/' ) errno = EISDIR;
+ else {
+ if( !protect && !make_dirs( output_filename ) )
+ { show_file_error( output_filename.c_str(),
+ "Error creating intermediate directory", errno ); return false; }
+ outfd = open( output_filename.c_str(), flags, outfd_mode );
+ if( outfd >= 0 ) { delete_output_on_interrupt = true; return true; }
+ if( errno == EEXIST )
+ { show_file_error( output_filename.c_str(),
+ "Output file already exists, skipping." ); return false; }
+ }
+ show_file_error( output_filename.c_str(), "Can't create output file", errno );
+ return false;
+ }
+
+
+void set_signals( void (*action)(int) )
+ {
+ std::signal( SIGHUP, action );
+ std::signal( SIGINT, action );
+ std::signal( SIGTERM, action );
+ }
+
+} // end namespace
+
+/* This can be called from any thread, main thread or sub-threads alike,
+ since they all call common helper functions like 'xlock' that call
+ cleanup_and_fail() in case of an error.
+*/
+void cleanup_and_fail( const int retval )
+ {
+ // only one thread can delete and exit
+ static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
+
+ set_signals( SIG_IGN ); // ignore signals
+ pthread_mutex_lock( &mutex ); // ignore errors to avoid loop
+ const int saved_verbosity = verbosity;
+ verbosity = -1; // suppress messages from other threads
+ if( delete_output_on_interrupt )
+ {
+ delete_output_on_interrupt = false;
+ if( saved_verbosity >= 0 )
+ std::fprintf( stderr, "%s: %s: Deleting output file, if it exists.\n",
+ program_name, output_filename.c_str() );
+ if( outfd >= 0 ) { close( outfd ); outfd = -1; }
+ if( std::remove( output_filename.c_str() ) != 0 && errno != ENOENT &&
+ saved_verbosity >= 0 )
+ std::fprintf( stderr, "%s: warning: deletion of output file failed: %s\n",
+ program_name, std::strerror( errno ) );
+ }
+ std::exit( retval );
+ }
+
+namespace {
+
+extern "C" void signal_handler( int )
+ {
+ show_error( "Control-C or similar caught, quitting." );
+ cleanup_and_fail( 1 );
+ }
+
+
+bool check_tty_in( const char * const input_filename, const int infd,
+ const Mode program_mode, int & retval )
+ {
+ if( ( program_mode == m_decompress || program_mode == m_test ) &&
+ isatty( infd ) ) // for example /dev/tty
+ { show_file_error( input_filename,
+ "I won't read compressed data from a terminal." );
+ close( infd ); set_retval( retval, 2 );
+ if( program_mode != m_test ) cleanup_and_fail( retval );
+ return false; }
+ return true;
+ }
+
+bool check_tty_out( const Mode program_mode )
+ {
+ if( program_mode == m_compress && isatty( outfd ) )
+ { show_file_error( output_filename.size() ?
+ output_filename.c_str() : "(stdout)",
+ "I won't write compressed data to a terminal." );
+ return false; }
+ return true;
+ }
+
+
+// Set permissions, owner, and times.
+void close_and_set_permissions( const struct stat * const in_statsp )
+ {
+ bool warning = false;
+ if( in_statsp )
+ {
+ const mode_t mode = in_statsp->st_mode;
+ // fchown in many cases returns with EPERM, which can be safely ignored.
+ if( fchown( outfd, in_statsp->st_uid, in_statsp->st_gid ) == 0 )
+ { if( fchmod( outfd, mode ) != 0 ) warning = true; }
+ else
+ if( errno != EPERM ||
+ fchmod( outfd, mode & ~( S_ISUID | S_ISGID | S_ISVTX ) ) != 0 )
+ warning = true;
+ }
+ if( close( outfd ) != 0 )
+ { show_file_error( output_filename.c_str(), "Error closing output file",
+ errno ); cleanup_and_fail( 1 ); }
+ outfd = -1;
+ delete_output_on_interrupt = false;
+ if( in_statsp )
+ {
+ struct utimbuf t;
+ t.actime = in_statsp->st_atime;
+ t.modtime = in_statsp->st_mtime;
+ if( utime( output_filename.c_str(), &t ) != 0 ) warning = true;
+ }
+ if( warning && verbosity >= 1 )
+ show_file_error( output_filename.c_str(),
+ "warning: can't change output file attributes", errno );
+ }
+
+} // end namespace
+
+
+void show_error( const char * const msg, const int errcode, const bool help )
+ {
+ if( verbosity < 0 ) return;
+ if( msg && msg[0] )
+ std::fprintf( stderr, "%s: %s%s%s\n", program_name, msg,
+ ( errcode > 0 ) ? ": " : "",
+ ( errcode > 0 ) ? std::strerror( errcode ) : "" );
+ if( help )
+ std::fprintf( stderr, "Try '%s --help' for more information.\n",
+ invocation_name );
+ }
+
+
+void show_file_error( const char * const filename, const char * const msg,
+ const int errcode )
+ {
+ if( verbosity >= 0 )
+ std::fprintf( stderr, "%s: %s: %s%s%s\n", program_name, filename, msg,
+ ( errcode > 0 ) ? ": " : "",
+ ( errcode > 0 ) ? std::strerror( errcode ) : "" );
+ }
+
+
+void internal_error( const char * const msg )
+ {
+ if( verbosity >= 0 )
+ std::fprintf( stderr, "%s: internal error: %s\n", program_name, msg );
+ std::exit( 3 );
+ }
+
+
+void show_progress( const unsigned long long packet_size,
+ const unsigned long long cfile_size,
+ const Pretty_print * const p )
+ {
+ static unsigned long long csize = 0; // file_size / 100
+ static unsigned long long pos = 0;
+ static const Pretty_print * pp = 0;
+ static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
+ static bool enabled = true;
+
+ if( !enabled ) return;
+ if( p ) // initialize static vars
+ {
+ if( verbosity < 2 || !isatty( STDERR_FILENO ) ) { enabled = false; return; }
+ csize = cfile_size; pos = 0; pp = p;
+ }
+ if( pp )
+ {
+ xlock( &mutex );
+ pos += packet_size;
+ if( csize > 0 )
+ std::fprintf( stderr, "%4llu%% %.1f MB\r", pos / csize, pos / 1000000.0 );
+ else
+ std::fprintf( stderr, " %.1f MB\r", pos / 1000000.0 );
+ pp->reset(); (*pp)(); // restore cursor position
+ xunlock( &mutex );
+ }
+ }
+
+
+#if defined __MSVCRT__
+#include <windows.h>
+#define _SC_NPROCESSORS_ONLN 1
+#define _SC_THREAD_THREADS_MAX 2
+
+long sysconf( int flag )
+ {
+ if( flag == _SC_NPROCESSORS_ONLN )
+ {
+ SYSTEM_INFO si;
+ GetSystemInfo( &si );
+ return si.dwNumberOfProcessors;
+ }
+ if( flag != _SC_THREAD_THREADS_MAX ) errno = EINVAL;
+ return -1; // unlimited threads or error
+ }
+
+#endif // __MSVCRT__
+
+
+int main( const int argc, const char * const argv[] )
+ {
+ /* Mapping from gzip/bzip2 style 0..9 compression levels to the
+ corresponding LZMA compression parameters. */
+ const Lzma_options option_mapping[] =
+ {
+ { 65535, 16 }, // -0 (65535,16 chooses fast encoder)
+ { 1 << 20, 5 }, // -1
+ { 3 << 19, 6 }, // -2
+ { 1 << 21, 8 }, // -3
+ { 3 << 20, 12 }, // -4
+ { 1 << 22, 20 }, // -5
+ { 1 << 23, 36 }, // -6
+ { 1 << 24, 68 }, // -7
+ { 3 << 23, 132 }, // -8
+ { 1 << 25, 273 } }; // -9
+ Lzma_options encoder_options = option_mapping[6]; // default = "-6"
+ std::string default_output_filename;
+ int data_size = 0;
+ int debug_level = 0;
+ int num_workers = 0; // start this many worker threads
+ int in_slots = 4;
+ int out_slots = 64;
+ Mode program_mode = m_compress;
+ Cl_options cl_opts; // command-line options
+ bool force = false;
+ bool keep_input_files = false;
+ bool recompress = false;
+ bool to_stdout = false;
+ if( argc > 0 ) invocation_name = argv[0];
+
+ enum { opt_chk = 256, opt_dbg, opt_in, opt_lt, opt_out };
+ const Arg_parser::Option options[] =
+ {
+ { '0', "fast", Arg_parser::no },
+ { '1', 0, Arg_parser::no },
+ { '2', 0, Arg_parser::no },
+ { '3', 0, Arg_parser::no },
+ { '4', 0, Arg_parser::no },
+ { '5', 0, Arg_parser::no },
+ { '6', 0, Arg_parser::no },
+ { '7', 0, Arg_parser::no },
+ { '8', 0, Arg_parser::no },
+ { '9', "best", Arg_parser::no },
+ { 'a', "trailing-error", Arg_parser::no },
+ { 'b', "member-size", Arg_parser::yes },
+ { 'B', "data-size", Arg_parser::yes },
+ { 'c', "stdout", Arg_parser::no },
+ { 'd', "decompress", Arg_parser::no },
+ { 'f', "force", Arg_parser::no },
+ { 'F', "recompress", Arg_parser::no },
+ { 'h', "help", Arg_parser::no },
+ { 'k', "keep", Arg_parser::no },
+ { 'l', "list", Arg_parser::no },
+ { 'm', "match-length", Arg_parser::yes },
+ { 'n', "threads", Arg_parser::yes },
+ { 'o', "output", Arg_parser::yes },
+ { 'q', "quiet", Arg_parser::no },
+ { 's', "dictionary-size", Arg_parser::yes },
+ { 'S', "volume-size", Arg_parser::yes },
+ { 't', "test", Arg_parser::no },
+ { 'v', "verbose", Arg_parser::no },
+ { 'V', "version", Arg_parser::no },
+ { opt_chk, "check-lib", Arg_parser::no },
+ { opt_dbg, "debug", Arg_parser::yes },
+ { opt_in, "in-slots", Arg_parser::yes },
+ { opt_lt, "loose-trailing", Arg_parser::no },
+ { opt_out, "out-slots", Arg_parser::yes },
+ { 0, 0, Arg_parser::no } };
+
+ const Arg_parser parser( argc, argv, options );
+ if( parser.error().size() ) // bad option
+ { show_error( parser.error().c_str(), 0, true ); return 1; }
+
+ const long num_online = std::max( 1L, sysconf( _SC_NPROCESSORS_ONLN ) );
+ long max_workers = sysconf( _SC_THREAD_THREADS_MAX );
+ if( max_workers < 1 || max_workers > INT_MAX / (int)sizeof (pthread_t) )
+ max_workers = INT_MAX / sizeof (pthread_t);
+
+ int argind = 0;
+ for( ; argind < parser.arguments(); ++argind )
+ {
+ const int code = parser.code( argind );
+ if( !code ) break; // no more options
+ const char * const pn = parser.parsed_name( argind ).c_str();
+ const std::string & sarg = parser.argument( argind );
+ const char * const arg = sarg.c_str();
+ switch( code )
+ {
+ case '0': case '1': case '2': case '3': case '4':
+ case '5': case '6': case '7': case '8': case '9':
+ encoder_options = option_mapping[code-'0']; break;
+ case 'a': cl_opts.ignore_trailing = false; break;
+ case 'b': break;
+ case 'B': data_size = getnum( arg, pn, 2 * LZ_min_dictionary_size(),
+ 2 * LZ_max_dictionary_size() ); break;
+ case 'c': to_stdout = true; break;
+ case 'd': set_mode( program_mode, m_decompress ); break;
+ case 'f': force = true; break;
+ case 'F': recompress = true; break;
+ case 'h': show_help( num_online ); return 0;
+ case 'k': keep_input_files = true; break;
+ case 'l': set_mode( program_mode, m_list ); break;
+ case 'm': encoder_options.match_len_limit =
+ getnum( arg, pn, LZ_min_match_len_limit(),
+ LZ_max_match_len_limit() ); break;
+ case 'n': num_workers = getnum( arg, pn, 1, max_workers ); break;
+ case 'o': if( sarg == "-" ) to_stdout = true;
+ else { default_output_filename = sarg; } break;
+ case 'q': verbosity = -1; break;
+ case 's': encoder_options.dictionary_size = get_dict_size( arg, pn );
+ break;
+ case 'S': break;
+ case 't': set_mode( program_mode, m_test ); break;
+ case 'v': if( verbosity < 4 ) ++verbosity; break;
+ case 'V': show_version(); return 0;
+ case opt_chk: return check_lib();
+ case opt_dbg: debug_level = getnum( arg, pn, 0, 3 ); break;
+ case opt_in: in_slots = getnum( arg, pn, 1, 64 ); break;
+ case opt_lt: cl_opts.loose_trailing = true; break;
+ case opt_out: out_slots = getnum( arg, pn, 1, 1024 ); break;
+ default: internal_error( "uncaught option." );
+ }
+ } // end process options
+
+ if( LZ_version()[0] < '1' )
+ { show_error( "Wrong library version. At least lzlib 1.0 is required." );
+ return 1; }
+
+#if defined __MSVCRT__ || defined __OS2__
+ setmode( STDIN_FILENO, O_BINARY );
+ setmode( STDOUT_FILENO, O_BINARY );
+#endif
+
+ std::vector< std::string > filenames;
+ bool filenames_given = false;
+ for( ; argind < parser.arguments(); ++argind )
+ {
+ filenames.push_back( parser.argument( argind ) );
+ if( filenames.back() != "-" ) filenames_given = true;
+ }
+ if( filenames.empty() ) filenames.push_back("-");
+
+ if( program_mode == m_list ) return list_files( filenames, cl_opts );
+
+ const bool fast = encoder_options.dictionary_size == 65535 &&
+ encoder_options.match_len_limit == 16;
+ if( data_size <= 0 )
+ {
+ if( fast ) data_size = 1 << 20;
+ else data_size = 2 * std::max( 65536, encoder_options.dictionary_size );
+ }
+ else if( !fast && data_size < encoder_options.dictionary_size )
+ encoder_options.dictionary_size =
+ std::max( data_size, LZ_min_dictionary_size() );
+
+ if( num_workers <= 0 )
+ {
+ if( program_mode == m_compress && sizeof (void *) <= 4 )
+ {
+ // use less than 2.22 GiB on 32 bit systems
+ const long long limit = ( 27LL << 25 ) + ( 11LL << 27 ); // 4 * 568 MiB
+ const long long mem = ( 27LL * data_size ) / 8 +
+ ( fast ? 3LL << 19 : 11LL * encoder_options.dictionary_size );
+ const int nmax32 = std::max( limit / mem, 1LL );
+ if( max_workers > nmax32 ) max_workers = nmax32;
+ }
+ num_workers = std::min( num_online, max_workers );
+ }
+
+ if( program_mode == m_test ) to_stdout = false; // apply overrides
+ if( program_mode == m_test || to_stdout ) default_output_filename.clear();
+
+ if( to_stdout && program_mode != m_test ) // check tty only once
+ { outfd = STDOUT_FILENO; if( !check_tty_out( program_mode ) ) return 1; }
+ else outfd = -1;
+
+ const bool to_file = !to_stdout && program_mode != m_test &&
+ default_output_filename.size();
+ if( !to_stdout && program_mode != m_test && ( filenames_given || to_file ) )
+ set_signals( signal_handler );
+
+ Pretty_print pp( filenames );
+
+ int failed_tests = 0;
+ int retval = 0;
+ const bool one_to_one = !to_stdout && program_mode != m_test && !to_file;
+ bool stdin_used = false;
+ struct stat in_stats;
+ for( unsigned i = 0; i < filenames.size(); ++i )
+ {
+ std::string input_filename;
+ int infd;
+
+ pp.set_name( filenames[i] );
+ if( filenames[i] == "-" )
+ {
+ if( stdin_used ) continue; else stdin_used = true;
+ infd = STDIN_FILENO;
+ if( !check_tty_in( pp.name(), infd, program_mode, retval ) ) continue;
+ if( one_to_one ) { outfd = STDOUT_FILENO; output_filename.clear(); }
+ }
+ else
+ {
+ const int eindex = extension_index( input_filename = filenames[i] );
+ infd = open_instream2( input_filename.c_str(), &in_stats, program_mode,
+ eindex, one_to_one, recompress );
+ if( infd < 0 ) { set_retval( retval, 1 ); continue; }
+ if( !check_tty_in( pp.name(), infd, program_mode, retval ) ) continue;
+ if( one_to_one ) // open outfd after checking infd
+ {
+ if( program_mode == m_compress )
+ set_c_outname( input_filename, true, true );
+ else set_d_outname( input_filename, eindex );
+ if( !open_outstream( force, true ) )
+ { close( infd ); set_retval( retval, 1 ); continue; }
+ }
+ }
+
+ if( one_to_one && !check_tty_out( program_mode ) )
+ { set_retval( retval, 1 ); return retval; } // don't delete a tty
+
+ if( to_file && outfd < 0 ) // open outfd after checking infd
+ {
+ if( program_mode == m_compress ) set_c_outname( default_output_filename,
+ filenames_given, false );
+ else output_filename = default_output_filename;
+ if( !open_outstream( force, false ) || !check_tty_out( program_mode ) )
+ return 1; // check tty only once and don't try to delete a tty
+ }
+
+ const struct stat * const in_statsp =
+ ( input_filename.size() && one_to_one ) ? &in_stats : 0;
+ const bool infd_isreg = input_filename.size() && S_ISREG( in_stats.st_mode );
+ const unsigned long long cfile_size =
+ infd_isreg ? ( in_stats.st_size + 99 ) / 100 : 0;
+ int tmp;
+ if( program_mode == m_compress )
+ tmp = compress( cfile_size, data_size, encoder_options.dictionary_size,
+ encoder_options.match_len_limit, num_workers,
+ infd, outfd, pp, debug_level );
+ else
+ tmp = decompress( cfile_size, num_workers, infd, outfd, cl_opts, pp,
+ debug_level, in_slots, out_slots, infd_isreg,
+ one_to_one );
+ if( close( infd ) != 0 )
+ { show_file_error( pp.name(), "Error closing input file", errno );
+ set_retval( tmp, 1 ); }
+ set_retval( retval, tmp );
+ if( tmp )
+ { if( program_mode != m_test ) cleanup_and_fail( retval );
+ else ++failed_tests; }
+
+ if( delete_output_on_interrupt && one_to_one )
+ close_and_set_permissions( in_statsp );
+ if( input_filename.size() && !keep_input_files && one_to_one )
+ std::remove( input_filename.c_str() );
+ }
+ if( delete_output_on_interrupt ) // -o
+ close_and_set_permissions( ( retval == 0 && !stdin_used &&
+ filenames_given && filenames.size() == 1 ) ? &in_stats : 0 );
+ else if( outfd >= 0 && close( outfd ) != 0 ) // -c
+ {
+ show_error( "Error closing stdout", errno );
+ set_retval( retval, 1 );
+ }
+ if( failed_tests > 0 && verbosity >= 1 && filenames.size() > 1 )
+ std::fprintf( stderr, "%s: warning: %d %s failed the test.\n",
+ program_name, failed_tests,
+ ( failed_tests == 1 ) ? "file" : "files" );
+ return retval;
+ }
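The help text in main.cc describes how numeric arguments are read. A number may carry a multiplier such as k, Ki, M or Mi, and a bare dictionary size of 12 to 29 is taken as a power of two. The standalone C++ sketch below, modeled on getnum() and get_dict_size(), illustrates those rules. It is not part of plzip, and several details are assumptions:

- The names parse_num and parse_dict_size are hypothetical.
- The 4 KiB and 512 MiB limits are copied from the Lzma_options comment instead of being queried from lzlib.
- Errors are signaled by returning false, not by printing a message and calling std::exit().

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical sketch modeled on plzip's getnum(); not plzip's API.
// Parses "<num>[multiplier]" where the multiplier is k (10^3), Ki (2^10),
// or one of M G T P E Z Y R Q, optionally followed by 'i' (powers of 1024).
// Characters after the multiplier (e.g. the 'B' in "MiB") are ignored, as
// in getnum(). Returns false on a bad number, bad multiplier, or result > ulimit.
bool parse_num( const char * const arg, const unsigned long long ulimit,
                unsigned long long & result )
  {
  char * tail;
  errno = 0;
  result = std::strtoull( arg, &tail, 0 );
  if( tail == arg || errno ) return false;     // no digits, or overflow
  if( tail[0] )
    {
    const unsigned factor = ( tail[1] == 'i' ) ? 1024 : 1000;
    int exponent = 0;                          // 0 means bad multiplier
    switch( tail[0] )
      {
      case 'Q': exponent = 10; break;
      case 'R': exponent = 9; break;
      case 'Y': exponent = 8; break;
      case 'Z': exponent = 7; break;
      case 'E': exponent = 6; break;
      case 'P': exponent = 5; break;
      case 'T': exponent = 4; break;
      case 'G': exponent = 3; break;
      case 'M': exponent = 2; break;
      case 'K': if( factor == 1024 ) exponent = 1; break;   // only "Ki"
      case 'k': if( factor == 1000 ) exponent = 1; break;   // only "k"
      }
    if( exponent <= 0 ) return false;
    for( int i = 0; i < exponent; ++i )
      {
      if( result > ulimit / factor ) return false;   // product would exceed ulimit
      result *= factor;
      }
    }
  return result <= ulimit;
  }

// Dictionary size: a bare 12..29 is an exponent (2^12 .. 2^29 bytes);
// anything else is a byte count, assumed here to be limited to
// [4 KiB, 512 MiB] (the range noted in the Lzma_options comment).
bool parse_dict_size( const char * const arg, int & size )
  {
  char * tail;
  const long bits = std::strtol( arg, &tail, 0 );
  if( bits >= 12 && bits <= 29 && *tail == 0 )
    { size = 1 << bits; return true; }
  unsigned long long n;
  if( !parse_num( arg, 1ULL << 29, n ) || n < 4096 ) return false;
  if( n == 65535 ) ++n;   // as in get_dict_size(): avoid 65535, which -0 uses
                          // (with match length 16) to select the fast encoder
  size = static_cast<int>( n );
  return true;
  }
```

Under these rules, "20" and "1MiB" both yield a 1 MiB dictionary. "10KB" is rejected because an uppercase K is accepted only as part of "Ki". Every -s value that check.sh expects plzip to reject is also rejected by this sketch: bad_size, -1, 0, 4095, 513MiB, 1G through 1Y, and 10KB.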
diff --git a/testsuite/check.sh b/testsuite/check.sh
new file mode 100755
index 0000000..7ec899e
--- /dev/null
+++ b/testsuite/check.sh
@@ -0,0 +1,447 @@
+#! /bin/sh
+# check script for Plzip - Massively parallel implementation of lzip
+# Copyright (C) 2009-2024 Antonio Diaz Diaz.
+#
+# This script is free software: you have unlimited permission
+# to copy, distribute, and modify it.
+
+LC_ALL=C
+export LC_ALL
+objdir=`pwd`
+testdir=`cd "$1" ; pwd`
+LZIP="${objdir}"/plzip
+framework_failure() { echo "failure in testing framework" ; exit 1 ; }
+
+if [ ! -f "${LZIP}" ] || [ ! -x "${LZIP}" ] ; then
+ echo "${LZIP}: cannot execute"
+ exit 1
+fi
+
+[ -e "${LZIP}" ] 2> /dev/null ||
+ {
+ echo "$0: a POSIX shell is required to run the tests"
+ echo "Try bash -c \"$0 $1 $2\""
+ exit 1
+ }
+
+if [ -d tmp ] ; then rm -rf tmp ; fi
+mkdir tmp
+cd "${objdir}"/tmp || framework_failure
+
+cat "${testdir}"/test.txt > in || framework_failure
+in_lz="${testdir}"/test.txt.lz
+in_em="${testdir}"/test_em.txt.lz
+fox_lz="${testdir}"/fox.lz
+fail=0
+lwarn8=0
+lwarn10=0
+test_failed() { fail=1 ; printf " $1" ; [ -z "$2" ] || printf "($2)" ; }
+lzlib_1_8() { [ ${lwarn8} = 0 ] &&
+ printf "\nwarning: header truncation detection requires lzlib 1.8 or newer"
+ lwarn8=1 ; }
+lzlib_1_10() { [ ${lwarn10} = 0 ] &&
+ printf "\nwarning: header HD=3 detection requires lzlib 1.10 or newer"
+ lwarn10=1 ; }
+
+"${LZIP}" --check-lib # just print warning
+[ $? != 2 ] || test_failed $LINENO # unless bad lzlib.h
+
+printf "testing plzip-%s..." "$2"
+
+"${LZIP}" -fkqm4 in
+[ $? = 1 ] || test_failed $LINENO
+[ ! -e in.lz ] || test_failed $LINENO
+"${LZIP}" -fkqm274 in
+[ $? = 1 ] || test_failed $LINENO
+[ ! -e in.lz ] || test_failed $LINENO
+for i in bad_size -1 0 4095 513MiB 1G 1T 1P 1E 1Z 1Y 10KB ; do
+ "${LZIP}" -fkqs $i in
+ [ $? = 1 ] || test_failed $LINENO $i
+ [ ! -e in.lz ] || test_failed $LINENO $i
+done
+"${LZIP}" -lq in
+[ $? = 2 ] || test_failed $LINENO
+"${LZIP}" -tq in
+[ $? = 2 ] || test_failed $LINENO
+"${LZIP}" -tq < in
+[ $? = 2 ] || test_failed $LINENO
+"${LZIP}" -cdq in
+[ $? = 2 ] || test_failed $LINENO
+"${LZIP}" -cdq < in
+[ $? = 2 ] || test_failed $LINENO
+"${LZIP}" -dq -o in < "${in_lz}"
+[ $? = 1 ] || test_failed $LINENO
+"${LZIP}" -dq -o in "${in_lz}"
+[ $? = 1 ] || test_failed $LINENO
+"${LZIP}" -dq -o out nx_file.lz
+[ $? = 1 ] || test_failed $LINENO
+[ ! -e out ] || test_failed $LINENO
+"${LZIP}" -q -o out.lz nx_file
+[ $? = 1 ] || test_failed $LINENO
+[ ! -e out.lz ] || test_failed $LINENO
+# these are for code coverage
+"${LZIP}" -lt "${in_lz}" 2> /dev/null
+[ $? = 1 ] || test_failed $LINENO
+"${LZIP}" -cdl "${in_lz}" 2> /dev/null
+[ $? = 1 ] || test_failed $LINENO
+"${LZIP}" -cdt "${in_lz}" 2> /dev/null
+[ $? = 1 ] || test_failed $LINENO
+"${LZIP}" -t -- nx_file.lz 2> /dev/null
+[ $? = 1 ] || test_failed $LINENO
+"${LZIP}" -t "" < /dev/null 2> /dev/null
+[ $? = 1 ] || test_failed $LINENO
+"${LZIP}" --help > /dev/null || test_failed $LINENO
+"${LZIP}" -n1 -V > /dev/null || test_failed $LINENO
+"${LZIP}" -m 2> /dev/null
+[ $? = 1 ] || test_failed $LINENO
+"${LZIP}" -z 2> /dev/null
+[ $? = 1 ] || test_failed $LINENO
+"${LZIP}" --bad_option 2> /dev/null
+[ $? = 1 ] || test_failed $LINENO
+"${LZIP}" --t 2> /dev/null
+[ $? = 1 ] || test_failed $LINENO
+"${LZIP}" --test=2 2> /dev/null
+[ $? = 1 ] || test_failed $LINENO
+"${LZIP}" --output= 2> /dev/null
+[ $? = 1 ] || test_failed $LINENO
+"${LZIP}" --output 2> /dev/null
+[ $? = 1 ] || test_failed $LINENO
+printf "LZIP\001-.............................." | "${LZIP}" -t 2> /dev/null
+printf "LZIP\002-.............................." | "${LZIP}" -t 2> /dev/null
+printf "LZIP\001+.............................." | "${LZIP}" -t 2> /dev/null
+
+printf "\ntesting decompression..."
+
+for i in "${in_lz}" "${in_em}" ; do
+ "${LZIP}" -lq "$i" || test_failed $LINENO "$i"
+ "${LZIP}" -t "$i" || test_failed $LINENO "$i"
+ "${LZIP}" -d "$i" -o out || test_failed $LINENO "$i"
+ cmp in out || test_failed $LINENO "$i"
+ "${LZIP}" -cd "$i" > out || test_failed $LINENO "$i"
+ cmp in out || test_failed $LINENO "$i"
+ "${LZIP}" -d "$i" -o - > out || test_failed $LINENO "$i"
+ cmp in out || test_failed $LINENO "$i"
+ "${LZIP}" -d < "$i" > out || test_failed $LINENO "$i"
+ cmp in out || test_failed $LINENO "$i"
+ rm -f out || framework_failure
+done
+
+lines=`"${LZIP}" -tvv "${in_em}" 2>&1 | wc -l` || test_failed $LINENO
+[ "${lines}" -eq 1 ] || test_failed $LINENO "${lines}"
+
+lines=`"${LZIP}" -lvv "${in_em}" | wc -l` || test_failed $LINENO
+[ "${lines}" -eq 11 ] || test_failed $LINENO "${lines}"
+
+cat "${in_lz}" > out.lz || framework_failure
+"${LZIP}" -dk out.lz || test_failed $LINENO
+cmp in out || test_failed $LINENO
+rm -f out || framework_failure
+"${LZIP}" -cd "${fox_lz}" > fox || test_failed $LINENO
+cat fox > copy || framework_failure
+cat "${in_lz}" > copy.lz || framework_failure
+"${LZIP}" -d copy.lz out.lz 2> /dev/null # skip copy, decompress out
+[ $? = 1 ] || test_failed $LINENO
+[ ! -e out.lz ] || test_failed $LINENO
+cmp fox copy || test_failed $LINENO
+cmp in out || test_failed $LINENO
+"${LZIP}" -df copy.lz || test_failed $LINENO
+[ ! -e copy.lz ] || test_failed $LINENO
+cmp in copy || test_failed $LINENO
+rm -f copy out || framework_failure
+
+printf "to be overwritten" > out || framework_failure
+"${LZIP}" -df -o out < "${in_lz}" || test_failed $LINENO
+cmp in out || test_failed $LINENO
+rm -f out || framework_failure
+"${LZIP}" -d -o ./- "${in_lz}" || test_failed $LINENO
+cmp in ./- || test_failed $LINENO
+rm -f ./- || framework_failure
+"${LZIP}" -d -o ./- < "${in_lz}" || test_failed $LINENO
+cmp in ./- || test_failed $LINENO
+rm -f ./- || framework_failure
+
+cat "${in_lz}" > anyothername || framework_failure
+"${LZIP}" -dv - anyothername - < "${in_lz}" > out 2> /dev/null ||
+ test_failed $LINENO
+cmp in out || test_failed $LINENO
+cmp in anyothername.out || test_failed $LINENO
+rm -f out anyothername.out || framework_failure
+
+"${LZIP}" -lq in "${in_lz}"
+[ $? = 2 ] || test_failed $LINENO
+"${LZIP}" -lq nx_file.lz "${in_lz}"
+[ $? = 1 ] || test_failed $LINENO
+"${LZIP}" -tq in "${in_lz}"
+[ $? = 2 ] || test_failed $LINENO
+"${LZIP}" -tq nx_file.lz "${in_lz}"
+[ $? = 1 ] || test_failed $LINENO
+"${LZIP}" -cdq in "${in_lz}" > out
+[ $? = 2 ] || test_failed $LINENO
+cat out in | cmp in - || test_failed $LINENO # out must be empty
+"${LZIP}" -cdq nx_file.lz "${in_lz}" > out # skip nx_file, decompress in
+[ $? = 1 ] || test_failed $LINENO
+cmp in out || test_failed $LINENO
+rm -f out || framework_failure
+cat "${in_lz}" > out.lz || framework_failure
+for i in 1 2 3 4 5 6 7 ; do
+ printf "g" >> out.lz || framework_failure
+ "${LZIP}" -alvv out.lz "${in_lz}" > /dev/null 2>&1
+ [ $? = 2 ] || test_failed $LINENO $i
+ "${LZIP}" -atvvvv out.lz "${in_lz}" 2> /dev/null
+ [ $? = 2 ] || test_failed $LINENO $i
+done
+"${LZIP}" -dq in out.lz
+[ $? = 2 ] || test_failed $LINENO
+[ -e out.lz ] || test_failed $LINENO
+[ ! -e out ] || test_failed $LINENO
+[ ! -e in.out ] || test_failed $LINENO
+"${LZIP}" -dq nx_file.lz out.lz
+[ $? = 1 ] || test_failed $LINENO
+[ ! -e out.lz ] || test_failed $LINENO
+[ ! -e nx_file ] || test_failed $LINENO
+cmp in out || test_failed $LINENO
+rm -f out || framework_failure
+
+cat in in > in2 || framework_failure
+"${LZIP}" -lq "${in_lz}" "${in_lz}" || test_failed $LINENO
+"${LZIP}" -t "${in_lz}" "${in_lz}" || test_failed $LINENO
+"${LZIP}" -cd "${in_lz}" "${in_lz}" -o out > out2 || test_failed $LINENO
+[ ! -e out ] || test_failed $LINENO # override -o
+cmp in2 out2 || test_failed $LINENO
+rm -f out2 || framework_failure
+"${LZIP}" -d "${in_lz}" "${in_lz}" -o out2 || test_failed $LINENO
+cmp in2 out2 || test_failed $LINENO
+rm -f out2 || framework_failure
+
+cat "${in_lz}" "${in_lz}" > out2.lz || framework_failure
+printf "\ngarbage" >> out2.lz || framework_failure
+"${LZIP}" -tvvvv out2.lz 2> /dev/null || test_failed $LINENO
+"${LZIP}" -alq out2.lz
+[ $? = 2 ] || test_failed $LINENO
+"${LZIP}" -atq out2.lz
+[ $? = 2 ] || test_failed $LINENO
+"${LZIP}" -atq < out2.lz
+[ $? = 2 ] || test_failed $LINENO
+"${LZIP}" -adkq out2.lz
+[ $? = 2 ] || test_failed $LINENO
+[ ! -e out2 ] || test_failed $LINENO
+"${LZIP}" -adkq -o out2 < out2.lz
+[ $? = 2 ] || test_failed $LINENO
+[ ! -e out2 ] || test_failed $LINENO
+printf "to be overwritten" > out2 || framework_failure
+"${LZIP}" -df out2.lz || test_failed $LINENO
+cmp in2 out2 || test_failed $LINENO
+rm -f out2 || framework_failure
+
+"${LZIP}" -d "${fox_lz}" -o a/b/c/fox || test_failed $LINENO
+cmp fox a/b/c/fox || test_failed $LINENO
+rm -rf a || framework_failure
+"${LZIP}" -d -o a/b/c/fox < "${fox_lz}" || test_failed $LINENO
+cmp fox a/b/c/fox || test_failed $LINENO
+rm -rf a || framework_failure
+"${LZIP}" -dq "${fox_lz}" -o a/b/c/
+[ $? = 1 ] || test_failed $LINENO
+[ ! -e a ] || test_failed $LINENO
+
+printf "\ntesting compression..."
+
+"${LZIP}" -c -0 in in in -o out3.lz > copy2.lz || test_failed $LINENO
+[ ! -e out3.lz ] || test_failed $LINENO # override -o
+"${LZIP}" -0f in in --output=copy2.lz || test_failed $LINENO
+"${LZIP}" -d copy2.lz -o out2 || test_failed $LINENO
+[ -e copy2.lz ] || test_failed $LINENO
+cmp in2 out2 || test_failed $LINENO
+rm -f in2 out2 copy2.lz || framework_failure
+
+"${LZIP}" -cf "${in_lz}" > lzlz 2> /dev/null # /dev/null is a tty on OS/2
+[ $? = 1 ] || test_failed $LINENO
+"${LZIP}" -Fvvm36 -o - "${in_lz}" > lzlz 2> /dev/null || test_failed $LINENO
+"${LZIP}" -cd lzlz | "${LZIP}" -d > out || test_failed $LINENO
+cmp in out || test_failed $LINENO
+rm -f lzlz out || framework_failure
+
+"${LZIP}" -0 -o ./- in || test_failed $LINENO
+"${LZIP}" -cd ./- | cmp in - || test_failed $LINENO
+rm -f ./- || framework_failure
+"${LZIP}" -0 -o ./- < in || test_failed $LINENO # add .lz
+[ ! -e ./- ] || test_failed $LINENO
+"${LZIP}" -cd -- -.lz | cmp in - || test_failed $LINENO
+rm -f ./-.lz || framework_failure
+
+for i in s4Ki 0 1 2 3 4 5 6 7 8 9 ; do
+ "${LZIP}" -k -$i in || test_failed $LINENO $i
+ mv in.lz out.lz || test_failed $LINENO $i
+ printf "garbage" >> out.lz || framework_failure
+ "${LZIP}" -df out.lz || test_failed $LINENO $i
+ cmp in out || test_failed $LINENO $i
+
+ "${LZIP}" -$i in -c > out || test_failed $LINENO $i
+ "${LZIP}" -$i in -o o_out || test_failed $LINENO $i # don't add .lz
+ [ ! -e o_out.lz ] || test_failed $LINENO
+ cmp out o_out || test_failed $LINENO $i
+ rm -f o_out || framework_failure
+ printf "g" >> out || framework_failure
+ "${LZIP}" -cd out > copy || test_failed $LINENO $i
+ cmp in copy || test_failed $LINENO $i
+
+ "${LZIP}" -$i < in > out || test_failed $LINENO $i
+ "${LZIP}" -d < out > copy || test_failed $LINENO $i
+ cmp in copy || test_failed $LINENO $i
+
+ rm -f out || framework_failure
+ printf "to be overwritten" > out.lz || framework_failure
+ "${LZIP}" -f -$i -o out < in || test_failed $LINENO $i # add .lz
+ [ ! -e out ] || test_failed $LINENO
+ "${LZIP}" -df -o copy < out.lz || test_failed $LINENO $i
+ cmp in copy || test_failed $LINENO $i
+done
+rm -f copy out.lz || framework_failure
+
+cat in in in in > in4 || framework_failure
+for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 ; do
+ "${LZIP}" -s4Ki -B8Ki -n$i < in4 > out4.lz || test_failed $LINENO $i
+ printf "g" >> out4.lz || framework_failure
+ "${LZIP}" -d -n$i < out4.lz > out4 || test_failed $LINENO $i
+ cmp in4 out4 || test_failed $LINENO $i
+ "${LZIP}" -d --in-slots=$i < out4.lz > out4 || test_failed $LINENO $i
+ cmp in4 out4 || test_failed $LINENO $i
+ "${LZIP}" -d --out-slots=$i < out4.lz > out4 || test_failed $LINENO $i
+ cmp in4 out4 || test_failed $LINENO $i
+
+ "${LZIP}" -c -s4Ki -B8Ki -n$i in4 > out4.lz || test_failed $LINENO $i
+ printf "g" >> out4.lz || framework_failure
+ "${LZIP}" -cd -n$i out4.lz > out4 || test_failed $LINENO $i
+ cmp in4 out4 || test_failed $LINENO $i
+ "${LZIP}" -cd --out-slots=$i out4.lz > out4 || test_failed $LINENO $i
+ cmp in4 out4 || test_failed $LINENO $i
+ rm -f out4 || framework_failure
+ "${LZIP}" -d -n$i out4.lz || test_failed $LINENO $i
+ cmp in4 out4 || test_failed $LINENO $i
+done
+rm -f in4 out4 || framework_failure
+
+cat in in in in in in in in | "${LZIP}" -1s4Ki | "${LZIP}" -t ||
+ test_failed $LINENO
+
+"${LZIP}" fox -o a/b/c/fox.lz || test_failed $LINENO
+cmp "${fox_lz}" a/b/c/fox.lz || test_failed $LINENO
+rm -rf a || framework_failure
+"${LZIP}" -o a/b/c/fox.lz < fox || test_failed $LINENO
+cmp "${fox_lz}" a/b/c/fox.lz || test_failed $LINENO
+rm -rf a fox || framework_failure
+
+printf "\ntesting bad input..."
+
+headers='LZIp LZiP LZip LzIP LzIp LziP lZIP lZIp lZiP lzIP'
+body='\001\014\000\203\377\373\377\377\300\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000$\000\000\000\000\000\000\000'
+cat "${in_lz}" > int.lz || framework_failure
+printf "LZIP${body}" >> int.lz || framework_failure
+if "${LZIP}" -tq int.lz ; then
+ for header in ${headers} ; do
+ printf "${header}${body}" > int.lz || framework_failure
+ "${LZIP}" -lq int.lz # first member
+ [ $? = 2 ] || test_failed $LINENO ${header}
+ "${LZIP}" -tq int.lz
+ [ $? = 2 ] || test_failed $LINENO ${header}
+ "${LZIP}" -tq < int.lz
+ [ $? = 2 ] || test_failed $LINENO ${header}
+ "${LZIP}" -cdq int.lz > /dev/null
+ [ $? = 2 ] || test_failed $LINENO ${header}
+ "${LZIP}" -lq --loose-trailing int.lz
+ [ $? = 2 ] || test_failed $LINENO ${header}
+ "${LZIP}" -tq --loose-trailing int.lz
+ [ $? = 2 ] || test_failed $LINENO ${header}
+ "${LZIP}" -tq --loose-trailing < int.lz
+ [ $? = 2 ] || test_failed $LINENO ${header}
+ "${LZIP}" -cdq --loose-trailing int.lz > /dev/null
+ [ $? = 2 ] || test_failed $LINENO ${header}
+ cat "${in_lz}" > int.lz || framework_failure
+ printf "${header}${body}" >> int.lz || framework_failure
+ "${LZIP}" -lq int.lz # trailing data
+ [ $? = 2 ] || test_failed $LINENO ${header}
+ "${LZIP}" -tq int.lz
+ [ $? = 2 ] || test_failed $LINENO ${header}
+ "${LZIP}" -tq < int.lz
+ [ $? = 2 ] || lzlib_1_10 # requires lzlib 1.10
+ "${LZIP}" -cdq int.lz > /dev/null
+ [ $? = 2 ] || test_failed $LINENO ${header}
+ "${LZIP}" -lq --loose-trailing int.lz ||
+ test_failed $LINENO ${header}
+ "${LZIP}" -t --loose-trailing int.lz ||
+ test_failed $LINENO ${header}
+ "${LZIP}" -t --loose-trailing < int.lz ||
+ test_failed $LINENO ${header}
+ "${LZIP}" -cd --loose-trailing int.lz > /dev/null ||
+ test_failed $LINENO ${header}
+ "${LZIP}" -lq --loose-trailing --trailing-error int.lz
+ [ $? = 2 ] || test_failed $LINENO ${header}
+ "${LZIP}" -tq --loose-trailing --trailing-error int.lz
+ [ $? = 2 ] || test_failed $LINENO ${header}
+ "${LZIP}" -tq --loose-trailing --trailing-error < int.lz
+ [ $? = 2 ] || test_failed $LINENO ${header}
+ "${LZIP}" -cdq --loose-trailing --trailing-error int.lz > /dev/null
+ [ $? = 2 ] || test_failed $LINENO ${header}
+ done
+else
+ printf "\nwarning: skipping header test: 'printf' does not work on your system."
+fi
+rm -f int.lz || framework_failure
+
+for i in fox_v2.lz fox_s11.lz fox_de20.lz \
+ fox_bcrc.lz fox_crc0.lz fox_das46.lz fox_mes81.lz ; do
+ "${LZIP}" -tq "${testdir}"/$i
+ [ $? = 2 ] || test_failed $LINENO $i
+done
+
+cat "${in_lz}" "${in_lz}" > in2.lz || framework_failure
+cat "${in_lz}" "${in_lz}" "${in_lz}" > in3.lz || framework_failure
+if dd if=in3.lz of=trunc.lz bs=14752 count=1 2> /dev/null &&
+ [ -e trunc.lz ] && cmp in2.lz trunc.lz > /dev/null 2>&1 ; then
+ for i in 6 20 14734 14753 14754 14755 14756 14757 14758 ; do
+ dd if=in3.lz of=trunc.lz bs=$i count=1 2> /dev/null
+ "${LZIP}" -lq trunc.lz
+ [ $? = 2 ] || test_failed $LINENO $i
+ "${LZIP}" -tq trunc.lz
+ [ $? = 2 ] || test_failed $LINENO $i
+ "${LZIP}" -tq < trunc.lz
+ [ $? = 2 ] || lzlib_1_8 # requires lzlib 1.8
+ "${LZIP}" -cdq trunc.lz > /dev/null
+ [ $? = 2 ] || test_failed $LINENO $i
+ "${LZIP}" -dq < trunc.lz > /dev/null
+ [ $? = 2 ] || lzlib_1_8 # requires lzlib 1.8
+ done
+else
+ printf "\nwarning: skipping truncation test: 'dd' does not work on your system."
+fi
+rm -f in2.lz in3.lz trunc.lz || framework_failure
+
+cat "${in_lz}" > ingin.lz || framework_failure
+printf "g" >> ingin.lz || framework_failure
+cat "${in_lz}" >> ingin.lz || framework_failure
+"${LZIP}" -lq ingin.lz
+[ $? = 2 ] || test_failed $LINENO
+"${LZIP}" -atq ingin.lz
+[ $? = 2 ] || test_failed $LINENO
+"${LZIP}" -atq < ingin.lz
+[ $? = 2 ] || test_failed $LINENO
+"${LZIP}" -acdq ingin.lz > /dev/null
+[ $? = 2 ] || test_failed $LINENO
+"${LZIP}" -adq < ingin.lz > /dev/null
+[ $? = 2 ] || test_failed $LINENO
+"${LZIP}" -tq ingin.lz
+[ $? = 2 ] || test_failed $LINENO
+"${LZIP}" -t < ingin.lz || test_failed $LINENO
+"${LZIP}" -cdq ingin.lz > out
+[ $? = 2 ] || test_failed $LINENO
+"${LZIP}" -d < ingin.lz > out || test_failed $LINENO
+cmp in out || test_failed $LINENO
+rm -f out ingin.lz || framework_failure
+
+echo
+if [ ${fail} = 0 ] ; then
+ echo "tests completed successfully."
+ cd "${objdir}" && rm -r tmp
+else
+ echo "tests failed."
+fi
+exit ${fail}
diff --git a/testsuite/fox.lz b/testsuite/fox.lz
new file mode 100644
index 0000000..509da82
--- /dev/null
+++ b/testsuite/fox.lz
Binary files differ
diff --git a/testsuite/fox_bcrc.lz b/testsuite/fox_bcrc.lz
new file mode 100644
index 0000000..8f6a7c4
--- /dev/null
+++ b/testsuite/fox_bcrc.lz
Binary files differ
diff --git a/testsuite/fox_crc0.lz b/testsuite/fox_crc0.lz
new file mode 100644
index 0000000..1abe926
--- /dev/null
+++ b/testsuite/fox_crc0.lz
Binary files differ
diff --git a/testsuite/fox_das46.lz b/testsuite/fox_das46.lz
new file mode 100644
index 0000000..43ed9f9
--- /dev/null
+++ b/testsuite/fox_das46.lz
Binary files differ
diff --git a/testsuite/fox_de20.lz b/testsuite/fox_de20.lz
new file mode 100644
index 0000000..10949d8
--- /dev/null
+++ b/testsuite/fox_de20.lz
Binary files differ
diff --git a/testsuite/fox_mes81.lz b/testsuite/fox_mes81.lz
new file mode 100644
index 0000000..d50ef2e
--- /dev/null
+++ b/testsuite/fox_mes81.lz
Binary files differ
diff --git a/testsuite/fox_s11.lz b/testsuite/fox_s11.lz
new file mode 100644
index 0000000..dca909c
--- /dev/null
+++ b/testsuite/fox_s11.lz
Binary files differ
diff --git a/testsuite/fox_v2.lz b/testsuite/fox_v2.lz
new file mode 100644
index 0000000..8620981
--- /dev/null
+++ b/testsuite/fox_v2.lz
Binary files differ
diff --git a/testsuite/test.txt b/testsuite/test.txt
new file mode 100644
index 0000000..9196a3a
--- /dev/null
+++ b/testsuite/test.txt
@@ -0,0 +1,676 @@
+ GNU GENERAL PUBLIC LICENSE
+ Version 2, June 1991
+
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
+ 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+
+ The licenses for most software are designed to take away your
+freedom to share and change it. By contrast, the GNU General Public
+License is intended to guarantee your freedom to share and change free
+software--to make sure the software is free for all its users. This
+General Public License applies to most of the Free Software
+Foundation's software and to any other program whose authors commit to
+using it. (Some other Free Software Foundation software is covered by
+the GNU Lesser General Public License instead.) You can apply it to
+your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+price. Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+this service if you wish), that you receive source code or can get it
+if you want it, that you can change the software or use pieces of it
+in new free programs; and that you know you can do these things.
+
+ To protect your rights, we need to make restrictions that forbid
+anyone to deny you these rights or to ask you to surrender the rights.
+These restrictions translate to certain responsibilities for you if you
+distribute copies of the software, or if you modify it.
+
+ For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must give the recipients all the rights that
+you have. You must make sure that they, too, receive or can get the
+source code. And you must show them these terms so they know their
+rights.
+
+ We protect your rights with two steps: (1) copyright the software, and
+(2) offer you this license which gives you legal permission to copy,
+distribute and/or modify the software.
+
+ Also, for each author's protection and ours, we want to make certain
+that everyone understands that there is no warranty for this free
+software. If the software is modified by someone else and passed on, we
+want its recipients to know that what they have is not the original, so
+that any problems introduced by others will not reflect on the original
+authors' reputations.
+
+ Finally, any free program is threatened constantly by software
+patents. We wish to avoid the danger that redistributors of a free
+program will individually obtain patent licenses, in effect making the
+program proprietary. To prevent this, we have made it clear that any
+patent must be licensed for everyone's free use or not licensed at all.
+
+ The precise terms and conditions for copying, distribution and
+modification follow.
+
+ GNU GENERAL PUBLIC LICENSE
+ TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+ 0. This License applies to any program or other work which contains
+a notice placed by the copyright holder saying it may be distributed
+under the terms of this General Public License. The "Program", below,
+refers to any such program or work, and a "work based on the Program"
+means either the Program or any derivative work under copyright law:
+that is to say, a work containing the Program or a portion of it,
+either verbatim or with modifications and/or translated into another
+language. (Hereinafter, translation is included without limitation in
+the term "modification".) Each licensee is addressed as "you".
+
+Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope. The act of
+running the Program is not restricted, and the output from the Program
+is covered only if its contents constitute a work based on the
+Program (independent of having been made by running the Program).
+Whether that is true depends on what the Program does.
+
+ 1. You may copy and distribute verbatim copies of the Program's
+source code as you receive it, in any medium, provided that you
+conspicuously and appropriately publish on each copy an appropriate
+copyright notice and disclaimer of warranty; keep intact all the
+notices that refer to this License and to the absence of any warranty;
+and give any other recipients of the Program a copy of this License
+along with the Program.
+
+You may charge a fee for the physical act of transferring a copy, and
+you may at your option offer warranty protection in exchange for a fee.
+
+ 2. You may modify your copy or copies of the Program or any portion
+of it, thus forming a work based on the Program, and copy and
+distribute such modifications or work under the terms of Section 1
+above, provided that you also meet all of these conditions:
+
+ a) You must cause the modified files to carry prominent notices
+ stating that you changed the files and the date of any change.
+
+ b) You must cause any work that you distribute or publish, that in
+ whole or in part contains or is derived from the Program or any
+ part thereof, to be licensed as a whole at no charge to all third
+ parties under the terms of this License.
+
+ c) If the modified program normally reads commands interactively
+ when run, you must cause it, when started running for such
+ interactive use in the most ordinary way, to print or display an
+ announcement including an appropriate copyright notice and a
+ notice that there is no warranty (or else, saying that you provide
+ a warranty) and that users may redistribute the program under
+ these conditions, and telling the user how to view a copy of this
+ License. (Exception: if the Program itself is interactive but
+ does not normally print such an announcement, your work based on
+ the Program is not required to print an announcement.)
+
+These requirements apply to the modified work as a whole. If
+identifiable sections of that work are not derived from the Program,
+and can be reasonably considered independent and separate works in
+themselves, then this License, and its terms, do not apply to those
+sections when you distribute them as separate works. But when you
+distribute the same sections as part of a whole which is a work based
+on the Program, the distribution of the whole must be on the terms of
+this License, whose permissions for other licensees extend to the
+entire whole, and thus to each and every part regardless of who wrote it.
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Program.
+
+In addition, mere aggregation of another work not based on the Program
+with the Program (or with a work based on the Program) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+ 3. You may copy and distribute the Program (or a work based on it,
+under Section 2) in object code or executable form under the terms of
+Sections 1 and 2 above provided that you also do one of the following:
+
+ a) Accompany it with the complete corresponding machine-readable
+ source code, which must be distributed under the terms of Sections
+ 1 and 2 above on a medium customarily used for software interchange; or,
+
+ b) Accompany it with a written offer, valid for at least three
+ years, to give any third party, for a charge no more than your
+ cost of physically performing source distribution, a complete
+ machine-readable copy of the corresponding source code, to be
+ distributed under the terms of Sections 1 and 2 above on a medium
+ customarily used for software interchange; or,
+
+ c) Accompany it with the information you received as to the offer
+ to distribute corresponding source code. (This alternative is
+ allowed only for noncommercial distribution and only if you
+ received the program in object code or executable form with such
+ an offer, in accord with Subsection b above.)
+
+The source code for a work means the preferred form of the work for
+making modifications to it. For an executable work, complete source
+code means all the source code for all modules it contains, plus any
+associated interface definition files, plus the scripts used to
+control compilation and installation of the executable. However, as a
+special exception, the source code distributed need not include
+anything that is normally distributed (in either source or binary
+form) with the major components (compiler, kernel, and so on) of the
+operating system on which the executable runs, unless that component
+itself accompanies the executable.
+
+If distribution of executable or object code is made by offering
+access to copy from a designated place, then offering equivalent
+access to copy the source code from the same place counts as
+distribution of the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+ 4. You may not copy, modify, sublicense, or distribute the Program
+except as expressly provided under this License. Any attempt
+otherwise to copy, modify, sublicense or distribute the Program is
+void, and will automatically terminate your rights under this License.
+However, parties who have received copies, or rights, from you under
+this License will not have their licenses terminated so long as such
+parties remain in full compliance.
+
+ 5. You are not required to accept this License, since you have not
+signed it. However, nothing else grants you permission to modify or
+distribute the Program or its derivative works. These actions are
+prohibited by law if you do not accept this License. Therefore, by
+modifying or distributing the Program (or any work based on the
+Program), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Program or works based on it.
+
+ 6. Each time you redistribute the Program (or any work based on the
+Program), the recipient automatically receives a license from the
+original licensor to copy, distribute or modify the Program subject to
+these terms and conditions. You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties to
+this License.
+
+ 7. If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License. If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Program at all. For example, if a patent
+license would not permit royalty-free redistribution of the Program by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Program.
+
+If any portion of this section is held invalid or unenforceable under
+any particular circumstance, the balance of the section is intended to
+apply and the section as a whole is intended to apply in other
+circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system, which is
+implemented by public license practices. Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+ 8. If the distribution and/or use of the Program is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Program under this License
+may add an explicit geographical distribution limitation excluding
+those countries, so that distribution is permitted only in or among
+countries not thus excluded. In such case, this License incorporates
+the limitation as if written in the body of this License.
+
+ 9. The Free Software Foundation may publish revised and/or new versions
+of the General Public License from time to time. Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+Each version is given a distinguishing version number. If the Program
+specifies a version number of this License which applies to it and "any
+later version", you have the option of following the terms and conditions
+either of that version or of any later version published by the Free
+Software Foundation. If the Program does not specify a version number of
+this License, you may choose any version ever published by the Free Software
+Foundation.
+
+ 10. If you wish to incorporate parts of the Program into other free
+programs whose distribution conditions are different, write to the author
+to ask for permission. For software which is copyrighted by the Free
+Software Foundation, write to the Free Software Foundation; we sometimes
+make exceptions for this. Our decision will be guided by the two goals
+of preserving the free status of all derivatives of our free software and
+of promoting the sharing and reuse of software generally.
+
+ NO WARRANTY
+
+ 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
+FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+REPAIR OR CORRECTION.
+
+ 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+ END OF TERMS AND CONDITIONS
+
+ How to Apply These Terms to Your New Programs
+
+ If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+ To do so, attach the following notices to the program. It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+ <one line to give the program's name and a brief idea of what it does.>
+ Copyright (C) <year> <name of author>
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+Also add information on how to contact you by electronic and paper mail.
+
+If the program is interactive, make it output a short notice like this
+when it starts in an interactive mode:
+
+ Gnomovision version 69, Copyright (C) <year> <name of author>
+ Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+ This is free software, and you are welcome to redistribute it
+ under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License. Of course, the commands you use may
+be called something other than `show w' and `show c'; they could even be
+mouse-clicks or menu items--whatever suits your program.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a "copyright disclaimer" for the program, if
+necessary. Here is a sample; alter the names:
+
+ Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+ `Gnomovision' (which makes passes at compilers) written by James Hacker.
+
+ <signature of Ty Coon>, 1 April 1989
+ Ty Coon, President of Vice
+
+This General Public License does not permit incorporating your program into
+proprietary programs. If your program is a subroutine library, you may
+consider it more useful to permit linking proprietary applications with the
+library. If this is what you want to do, use the GNU Lesser General
+Public License instead of this License.
+ GNU GENERAL PUBLIC LICENSE
+ Version 2, June 1991
+
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
+ 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+
+ The licenses for most software are designed to take away your
+freedom to share and change it. By contrast, the GNU General Public
+License is intended to guarantee your freedom to share and change free
+software--to make sure the software is free for all its users. This
+General Public License applies to most of the Free Software
+Foundation's software and to any other program whose authors commit to
+using it. (Some other Free Software Foundation software is covered by
+the GNU Lesser General Public License instead.) You can apply it to
+your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+price. Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+this service if you wish), that you receive source code or can get it
+if you want it, that you can change the software or use pieces of it
+in new free programs; and that you know you can do these things.
+
+ To protect your rights, we need to make restrictions that forbid
+anyone to deny you these rights or to ask you to surrender the rights.
+These restrictions translate to certain responsibilities for you if you
+distribute copies of the software, or if you modify it.
+
+ For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must give the recipients all the rights that
+you have. You must make sure that they, too, receive or can get the
+source code. And you must show them these terms so they know their
+rights.
+
+ We protect your rights with two steps: (1) copyright the software, and
+(2) offer you this license which gives you legal permission to copy,
+distribute and/or modify the software.
+
+ Also, for each author's protection and ours, we want to make certain
+that everyone understands that there is no warranty for this free
+software. If the software is modified by someone else and passed on, we
+want its recipients to know that what they have is not the original, so
+that any problems introduced by others will not reflect on the original
+authors' reputations.
+
+ Finally, any free program is threatened constantly by software
+patents. We wish to avoid the danger that redistributors of a free
+program will individually obtain patent licenses, in effect making the
+program proprietary. To prevent this, we have made it clear that any
+patent must be licensed for everyone's free use or not licensed at all.
+
+ The precise terms and conditions for copying, distribution and
+modification follow.
+
+ GNU GENERAL PUBLIC LICENSE
+ TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+ 0. This License applies to any program or other work which contains
+a notice placed by the copyright holder saying it may be distributed
+under the terms of this General Public License. The "Program", below,
+refers to any such program or work, and a "work based on the Program"
+means either the Program or any derivative work under copyright law:
+that is to say, a work containing the Program or a portion of it,
+either verbatim or with modifications and/or translated into another
+language. (Hereinafter, translation is included without limitation in
+the term "modification".) Each licensee is addressed as "you".
+
+Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope. The act of
+running the Program is not restricted, and the output from the Program
+is covered only if its contents constitute a work based on the
+Program (independent of having been made by running the Program).
+Whether that is true depends on what the Program does.
+
+ 1. You may copy and distribute verbatim copies of the Program's
+source code as you receive it, in any medium, provided that you
+conspicuously and appropriately publish on each copy an appropriate
+copyright notice and disclaimer of warranty; keep intact all the
+notices that refer to this License and to the absence of any warranty;
+and give any other recipients of the Program a copy of this License
+along with the Program.
+
+You may charge a fee for the physical act of transferring a copy, and
+you may at your option offer warranty protection in exchange for a fee.
+
+ 2. You may modify your copy or copies of the Program or any portion
+of it, thus forming a work based on the Program, and copy and
+distribute such modifications or work under the terms of Section 1
+above, provided that you also meet all of these conditions:
+
+ a) You must cause the modified files to carry prominent notices
+ stating that you changed the files and the date of any change.
+
+ b) You must cause any work that you distribute or publish, that in
+ whole or in part contains or is derived from the Program or any
+ part thereof, to be licensed as a whole at no charge to all third
+ parties under the terms of this License.
+
+ c) If the modified program normally reads commands interactively
+ when run, you must cause it, when started running for such
+ interactive use in the most ordinary way, to print or display an
+ announcement including an appropriate copyright notice and a
+ notice that there is no warranty (or else, saying that you provide
+ a warranty) and that users may redistribute the program under
+ these conditions, and telling the user how to view a copy of this
+ License. (Exception: if the Program itself is interactive but
+ does not normally print such an announcement, your work based on
+ the Program is not required to print an announcement.)
+
+These requirements apply to the modified work as a whole. If
+identifiable sections of that work are not derived from the Program,
+and can be reasonably considered independent and separate works in
+themselves, then this License, and its terms, do not apply to those
+sections when you distribute them as separate works. But when you
+distribute the same sections as part of a whole which is a work based
+on the Program, the distribution of the whole must be on the terms of
+this License, whose permissions for other licensees extend to the
+entire whole, and thus to each and every part regardless of who wrote it.
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Program.
+
+In addition, mere aggregation of another work not based on the Program
+with the Program (or with a work based on the Program) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+ 3. You may copy and distribute the Program (or a work based on it,
+under Section 2) in object code or executable form under the terms of
+Sections 1 and 2 above provided that you also do one of the following:
+
+ a) Accompany it with the complete corresponding machine-readable
+ source code, which must be distributed under the terms of Sections
+ 1 and 2 above on a medium customarily used for software interchange; or,
+
+ b) Accompany it with a written offer, valid for at least three
+ years, to give any third party, for a charge no more than your
+ cost of physically performing source distribution, a complete
+ machine-readable copy of the corresponding source code, to be
+ distributed under the terms of Sections 1 and 2 above on a medium
+ customarily used for software interchange; or,
+
+ c) Accompany it with the information you received as to the offer
+ to distribute corresponding source code. (This alternative is
+ allowed only for noncommercial distribution and only if you
+ received the program in object code or executable form with such
+ an offer, in accord with Subsection b above.)
+
+The source code for a work means the preferred form of the work for
+making modifications to it. For an executable work, complete source
+code means all the source code for all modules it contains, plus any
+associated interface definition files, plus the scripts used to
+control compilation and installation of the executable. However, as a
+special exception, the source code distributed need not include
+anything that is normally distributed (in either source or binary
+form) with the major components (compiler, kernel, and so on) of the
+operating system on which the executable runs, unless that component
+itself accompanies the executable.
+
+If distribution of executable or object code is made by offering
+access to copy from a designated place, then offering equivalent
+access to copy the source code from the same place counts as
+distribution of the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+ 4. You may not copy, modify, sublicense, or distribute the Program
+except as expressly provided under this License. Any attempt
+otherwise to copy, modify, sublicense or distribute the Program is
+void, and will automatically terminate your rights under this License.
+However, parties who have received copies, or rights, from you under
+this License will not have their licenses terminated so long as such
+parties remain in full compliance.
+
+ 5. You are not required to accept this License, since you have not
+signed it. However, nothing else grants you permission to modify or
+distribute the Program or its derivative works. These actions are
+prohibited by law if you do not accept this License. Therefore, by
+modifying or distributing the Program (or any work based on the
+Program), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Program or works based on it.
+
+ 6. Each time you redistribute the Program (or any work based on the
+Program), the recipient automatically receives a license from the
+original licensor to copy, distribute or modify the Program subject to
+these terms and conditions. You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties to
+this License.
+
+ 7. If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License. If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Program at all. For example, if a patent
+license would not permit royalty-free redistribution of the Program by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Program.
+
+If any portion of this section is held invalid or unenforceable under
+any particular circumstance, the balance of the section is intended to
+apply and the section as a whole is intended to apply in other
+circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system, which is
+implemented by public license practices. Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+ 8. If the distribution and/or use of the Program is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Program under this License
+may add an explicit geographical distribution limitation excluding
+those countries, so that distribution is permitted only in or among
+countries not thus excluded. In such case, this License incorporates
+the limitation as if written in the body of this License.
+
+ 9. The Free Software Foundation may publish revised and/or new versions
+of the General Public License from time to time. Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+Each version is given a distinguishing version number. If the Program
+specifies a version number of this License which applies to it and "any
+later version", you have the option of following the terms and conditions
+either of that version or of any later version published by the Free
+Software Foundation. If the Program does not specify a version number of
+this License, you may choose any version ever published by the Free Software
+Foundation.
+
+ 10. If you wish to incorporate parts of the Program into other free
+programs whose distribution conditions are different, write to the author
+to ask for permission. For software which is copyrighted by the Free
+Software Foundation, write to the Free Software Foundation; we sometimes
+make exceptions for this. Our decision will be guided by the two goals
+of preserving the free status of all derivatives of our free software and
+of promoting the sharing and reuse of software generally.
+
+ NO WARRANTY
+
+ 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
+FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+REPAIR OR CORRECTION.
+
+ 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+ END OF TERMS AND CONDITIONS
+
+ How to Apply These Terms to Your New Programs
+
+ If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+ To do so, attach the following notices to the program. It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+ <one line to give the program's name and a brief idea of what it does.>
+ Copyright (C) <year> <name of author>
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+Also add information on how to contact you by electronic and paper mail.
+
+If the program is interactive, make it output a short notice like this
+when it starts in an interactive mode:
+
+ Gnomovision version 69, Copyright (C) <year> <name of author>
+ Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+ This is free software, and you are welcome to redistribute it
+ under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License. Of course, the commands you use may
+be called something other than `show w' and `show c'; they could even be
+mouse-clicks or menu items--whatever suits your program.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a "copyright disclaimer" for the program, if
+necessary. Here is a sample; alter the names:
+
+ Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+ `Gnomovision' (which makes passes at compilers) written by James Hacker.
+
+ <signature of Ty Coon>, 1 April 1989
+ Ty Coon, President of Vice
+
+This General Public License does not permit incorporating your program into
+proprietary programs. If your program is a subroutine library, you may
+consider it more useful to permit linking proprietary applications with the
+library. If this is what you want to do, use the GNU Lesser General
+Public License instead of this License.
diff --git a/testsuite/test.txt.lz b/testsuite/test.txt.lz
new file mode 100644
index 0000000..22cea6e
--- /dev/null
+++ b/testsuite/test.txt.lz
Binary files differ
diff --git a/testsuite/test_em.txt.lz b/testsuite/test_em.txt.lz
new file mode 100644
index 0000000..7e96250
--- /dev/null
+++ b/testsuite/test_em.txt.lz
Binary files differ