-rw-r--r-- | ChangeLog | 6
-rw-r--r-- | INSTALL | 6
-rw-r--r-- | NEWS | 16
-rw-r--r-- | README | 49
-rw-r--r-- | compress.cc | 54
-rwxr-xr-x | configure | 18
-rw-r--r-- | dec_stdout.cc | 22
-rw-r--r-- | dec_stream.cc | 71
-rw-r--r-- | decompress.cc | 22
-rw-r--r-- | doc/plzip.1 | 4
-rw-r--r-- | doc/plzip.info | 105
-rw-r--r-- | doc/plzip.texinfo | 80
-rw-r--r-- | file_index.cc | 20
-rw-r--r-- | file_index.h | 2
-rw-r--r-- | lzip.h | 7
-rw-r--r-- | main.cc | 106
-rwxr-xr-x | testsuite/check.sh | 45
17 files changed, 356 insertions, 277 deletions
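Note on the dec_stream.cc change in this patch: the removed loop over `base_buffer[tsize+newpos-i]` reads the last 8 bytes of the 20-byte member trailer, most significant byte last, and the patch replaces it with `trailer.member_size()`. The definition of `File_trailer` is not shown in this part of the diff, so the following is only a minimal sketch of the equivalent decoding, assuming a 20-byte trailer whose final 8 bytes hold the member size in little-endian order. The name `trailer_member_size` is hypothetical and is not a plzip function.

```cpp
#include <cstdint>

// Hypothetical helper (not part of plzip): decode the member size from the
// last 8 bytes of a member trailer. 'trailer_end' points one byte past the
// end of the trailer, which is where the next member header would begin.
// Bytes are consumed from the last one backwards, so the byte closest to
// the end of the trailer is the most significant (little-endian storage).
unsigned long long trailer_member_size( const uint8_t * const trailer_end )
  {
  unsigned long long member_size = 0;
  for( int i = 1; i <= 8; ++i )
    { member_size <<= 8; member_size += trailer_end[-i]; }
  return member_size;
  }
```

With a zeroed 20-byte trailer whose bytes 12 and 13 are set to 0x24 and 0x01, calling the helper on the address just past the trailer yields 0x0124 (292).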
@@ -1,3 +1,9 @@
+2013-07-20 Antonio Diaz Diaz <antonio@gnu.org>
+
+ * Version 1.1-pre1 released.
+ * Show progress of compression at verbosity level 2 (-vv).
+ * SIGUSR1 and SIGUSR2 are no more used to signal a fatal error.
+
 2013-05-29 Antonio Diaz Diaz <antonio@gnu.org>

 * Version 1.0 released.
@@ -1,7 +1,7 @@
 Requirements
 ------------
 You will need a C++ compiler and the lzlib compression library installed.
-I use gcc 4.8.0 and 3.3.6, but the code should compile with any
+I use gcc 4.8.1 and 3.3.6, but the code should compile with any
 standards compliant compiler.
 Lzlib must be version 1.0 or newer.
 Gcc is available at http://gcc.gnu.org.
@@ -12,9 +12,9 @@ Procedure
 ---------
 1. Unpack the archive if you have not done so already:

-	lzip -cd plzip[version].tar.lz | tar -xf -
+	tar -xf plzip[version].tar.lz
 or
-	gzip -cd plzip[version].tar.gz | tar -xf -
+	lzip -cd plzip[version].tar.lz | tar -xf -

 This creates the directory ./plzip[version] containing the source from
 the main archive.
@@ -1,15 +1,5 @@
-Changes in version 1.0:
+Changes in version 1.1:

-Scalability of compression (max number of useful worker threads) has
-been increased.
+Plzip now shows the progress of compression at verbosity level 2 (-vv).

-Scalability when decompressing from/to regular files has been increased.
-
-The number of worker threads is now limited to the number of members in
-the input file when decompressing from a regular file.
-
-"configure" now accepts options with a separate argument.
-
-The target "install-as-lzip" has been added to the Makefile.
-
-The target "install-bin" has been added to the Makefile.
+Signals "SIGUSR1" and "SIGUSR2" are no more used to signal a fatal error.
@@ -1,21 +1,40 @@
 Description

 Plzip is a massively parallel (multi-threaded), lossless data compressor
-based on the lzlib compression library, with very safe integrity
-checking and a user interface similar to the one of bzip2, gzip or lzip.
+based on the lzlib compression library, with a user interface similar to
+the one of lzip, bzip2 or gzip.

-Plzip is intended for faster compression/decompression of big files on
-multiprocessor machines, which makes it specially well suited for
-distribution of big software files and large scale data archiving. On
-files big enough (several GB), plzip can use hundreds of processors.
-
-Plzip uses the lzip file format; the files produced by plzip are fully
-compatible with lzip-1.4 or newer, and can be rescued with lziprecover.
+Plzip can compress/decompress large files on multiprocessor machines
+much faster than lzip, at the cost of a slightly reduced compression
+ratio. On files large enough (several GB), plzip can use hundreds of
+processors. On files of only a few MB it is better to use lzip.

 Plzip uses the same well-defined exit status values used by lzip and
 bzip2, which makes it safer when used in pipes or scripts than
 compressors returning ambiguous warning values, like gzip.

+Plzip uses the lzip file format; the files produced by plzip are fully
+compatible with lzip-1.4 or newer, and can be rescued with lziprecover.
+
+The lzip file format is designed for long-term data archiving and
+provides very safe integrity checking. The member trailer stores the
+32-bit CRC of the original data, the size of the original data and the
+size of the member. These values, together with the value remaining in
+the range decoder and the end-of-stream marker, provide a 4 factor
+integrity checking which guarantees that the decompressed version of the
+data is identical to the original. This guards against corruption of the
+compressed data, and against undetected bugs in plzip (hopefully very
+unlikely). The chances of data corruption going undetected are
+microscopic. Be aware, though, that the check occurs upon decompression,
+so it can only tell you that something is wrong. It can't help you
+recover the original uncompressed data.
+
+If you ever need to recover data from a damaged lzip file, try the
+lziprecover program. Lziprecover makes lzip files resistant to bit-flip
+(one of the most common forms of data corruption), and provides data
+recovery capabilities, including error-checked merging of damaged copies
+of a file.
+
 Plzip replaces every file given in the command line with a compressed
 version of itself, with the name "original_name.lz". Each compressed
 file has the same modification date, permissions, and, when possible,
@@ -33,18 +52,6 @@ or more compressed files. The result is the concatenation of the
 corresponding uncompressed files. Integrity testing of concatenated
 compressed files is also supported.

-As a self-check for your protection, plzip stores in the member trailer
-the 32-bit CRC of the original data, the size of the original data and
-the size of the member. These values, together with the value remaining
-in the range decoder and the end-of-stream marker, provide a very safe 4
-factor integrity checking which guarantees that the decompressed version
-of the data is identical to the original. This guards against corruption
-of the compressed data, and against undetected bugs in plzip (hopefully
-very unlikely). The chances of data corruption going undetected are
-microscopic. Be aware, though, that the check occurs upon decompression,
-so it can only tell you that something is wrong. It can't help you
-recover the original uncompressed data.
-

 Copyright (C) 2009, 2010, 2011, 2012, 2013 Antonio Diaz Diaz.

diff --git a/compress.cc b/compress.cc
index c4428ea..050fdc1 100644
--- a/compress.cc
+++ b/compress.cc
@@ -1,4 +1,4 @@
-/* Plzip - A parallel compressor compatible with lzip
+/* Plzip - Parallel compressor compatible with lzip
     Copyright (C) 2009 Laszlo Ersek.
     Copyright (C) 2009, 2010, 2011, 2012, 2013 Antonio Diaz Diaz.

@@ -80,61 +80,70 @@ int writeblock( const int fd, const uint8_t * const buf, const int size )
 void xinit( pthread_mutex_t * const mutex )
   {
   const int errcode = pthread_mutex_init( mutex, 0 );
-  if( errcode ) { show_error( "pthread_mutex_init", errcode ); fatal(); }
+  if( errcode )
+    { show_error( "pthread_mutex_init", errcode ); cleanup_and_fail(); }
   }

 void xinit( pthread_cond_t * const cond )
   {
   const int errcode = pthread_cond_init( cond, 0 );
-  if( errcode ) { show_error( "pthread_cond_init", errcode ); fatal(); }
+  if( errcode )
+    { show_error( "pthread_cond_init", errcode ); cleanup_and_fail(); }
   }


 void xdestroy( pthread_mutex_t * const mutex )
   {
   const int errcode = pthread_mutex_destroy( mutex );
-  if( errcode ) { show_error( "pthread_mutex_destroy", errcode ); fatal(); }
+  if( errcode )
+    { show_error( "pthread_mutex_destroy", errcode ); cleanup_and_fail(); }
   }

 void xdestroy( pthread_cond_t * const cond )
   {
   const int errcode = pthread_cond_destroy( cond );
-  if( errcode ) { show_error( "pthread_cond_destroy", errcode ); fatal(); }
+  if( errcode )
+    { show_error( "pthread_cond_destroy", errcode ); cleanup_and_fail(); }
   }


 void xlock( pthread_mutex_t * const mutex )
   {
   const int errcode = pthread_mutex_lock( mutex );
-  if( errcode ) { show_error( "pthread_mutex_lock", errcode ); fatal(); }
+  if( errcode )
+    { show_error( "pthread_mutex_lock", errcode ); cleanup_and_fail(); }
   }


 void xunlock( pthread_mutex_t * const mutex )
   {
   const int errcode = pthread_mutex_unlock( mutex );
-  if( errcode ) { show_error( "pthread_mutex_unlock", errcode ); fatal(); }
+  if( errcode )
+    { show_error( "pthread_mutex_unlock", errcode ); cleanup_and_fail(); }
   }


 void xwait( pthread_cond_t * const cond, pthread_mutex_t * const mutex )
   {
   const int errcode = pthread_cond_wait( cond, mutex );
-  if( errcode ) { show_error( "pthread_cond_wait", errcode ); fatal(); }
+  if( errcode )
+    { show_error( "pthread_cond_wait", errcode ); cleanup_and_fail(); }
   }


 void xsignal( pthread_cond_t * const cond )
   {
   const int errcode = pthread_cond_signal( cond );
-  if( errcode ) { show_error( "pthread_cond_signal", errcode ); fatal(); }
+  if( errcode )
+    { show_error( "pthread_cond_signal", errcode ); cleanup_and_fail(); }
   }


 void xbroadcast( pthread_cond_t * const cond )
   {
   const int errcode = pthread_cond_broadcast( cond );
-  if( errcode ) { show_error( "pthread_cond_broadcast", errcode ); fatal(); }
+  if( errcode )
+    { show_error( "pthread_cond_broadcast", errcode ); cleanup_and_fail(); }
   }


@@ -317,10 +326,10 @@ extern "C" void * csplitter( void * arg )
   for( bool first_post = true; ; first_post = false )
     {
     uint8_t * const data = new( std::nothrow ) uint8_t[data_size];
-    if( !data ) { pp( mem_msg ); fatal(); }
+    if( !data ) { pp( mem_msg ); cleanup_and_fail(); }
     const int size = readblock( infd, data, data_size );
     if( size != data_size && errno )
-      { pp(); show_error( "Read error", errno ); fatal(); }
+      { pp(); show_error( "Read error", errno ); cleanup_and_fail(); }

     if( size > 0 || first_post )  // first packet may be empty
       {
@@ -365,7 +374,7 @@ extern "C" void * cworker( void * arg )
     const int max_compr_size = 42 + packet->size + ( ( packet->size + 7 ) / 8 );
     uint8_t * const new_data = new( std::nothrow ) uint8_t[max_compr_size];
-    if( !new_data ) { pp( mem_msg ); fatal(); }
+    if( !new_data ) { pp( mem_msg ); cleanup_and_fail(); }
     const int dict_size = std::max( LZ_min_dictionary_size(),
                                     std::min( dictionary_size, packet->size ) );
     LZ_Encoder * const encoder =
@@ -376,7 +385,7 @@ extern "C" void * cworker( void * arg )
         pp( mem_msg );
       else
         internal_error( "invalid argument to encoder" );
-      fatal();
+      cleanup_and_fail();
       }

     int written = 0;
@@ -403,7 +412,7 @@ extern "C" void * cworker( void * arg )
           if( verbosity >= 0 )
             std::fprintf( stderr, "LZ_compress_read error: %s.\n",
                           LZ_strerror( LZ_compress_errno( encoder ) ) );
-          fatal();
+          cleanup_and_fail();
           }
         new_pos += rd;
         if( new_pos > max_compr_size )
@@ -412,8 +421,9 @@ extern "C" void * cworker( void * arg )
       }

     if( LZ_compress_close( encoder ) < 0 )
-      { pp( "LZ_compress_close failed" ); fatal(); }
+      { pp( "LZ_compress_close failed" ); cleanup_and_fail(); }

+    if( verbosity >= 2 && packet->size > 0 ) show_progress( packet->size );
     packet->data = new_data;
     packet->size = new_pos;
     courier.collect_packet( packet );
@@ -441,7 +451,7 @@ void muxer( Packet_courier & courier, const Pretty_print & pp, const int outfd )
       {
       const int wr = writeblock( outfd, opacket->data, opacket->size );
       if( wr != opacket->size )
-        { pp(); show_error( "Write error", errno ); fatal(); }
+        { pp(); show_error( "Write error", errno ); cleanup_and_fail(); }
       }
     delete[] opacket->data;
     delete opacket;
@@ -475,7 +485,7 @@ int compress( const int data_size, const int dictionary_size,
   pthread_t splitter_thread;
   int errcode = pthread_create( &splitter_thread, 0, csplitter, &splitter_arg );
   if( errcode )
-    { show_error( "Can't create splitter thread", errcode ); fatal(); }
+    { show_error( "Can't create splitter thread", errcode ); cleanup_and_fail(); }

   Worker_arg worker_arg;
   worker_arg.courier = &courier;
@@ -484,12 +494,12 @@ int compress( const int data_size, const int dictionary_size,
   worker_arg.match_len_limit = match_len_limit;

   pthread_t * worker_threads = new( std::nothrow ) pthread_t[num_workers];
-  if( !worker_threads ) { pp( mem_msg ); fatal(); }
+  if( !worker_threads ) { pp( mem_msg ); cleanup_and_fail(); }
   for( int i = 0; i < num_workers; ++i )
     {
     errcode = pthread_create( worker_threads + i, 0, cworker, &worker_arg );
     if( errcode )
-      { show_error( "Can't create worker threads", errcode ); fatal(); }
+      { show_error( "Can't create worker threads", errcode ); cleanup_and_fail(); }
     }

   muxer( courier, pp, outfd );
@@ -498,13 +508,13 @@ int compress( const int data_size, const int dictionary_size,
     {
     errcode = pthread_join( worker_threads[i], 0 );
     if( errcode )
-      { show_error( "Can't join worker threads", errcode ); fatal(); }
+      { show_error( "Can't join worker threads", errcode ); cleanup_and_fail(); }
     }
   delete[] worker_threads;

   errcode = pthread_join( splitter_thread, 0 );
   if( errcode )
-    { show_error( "Can't join splitter thread", errcode ); fatal(); }
+    { show_error( "Can't join splitter thread", errcode ); cleanup_and_fail(); }

   if( verbosity >= 1 )
     {
@@ -1,14 +1,14 @@
 #! /bin/sh
-# configure script for Plzip - A parallel compressor compatible with lzip
+# configure script for Plzip - Parallel compressor compatible with lzip
 # Copyright (C) 2009, 2010, 2011, 2012, 2013 Antonio Diaz Diaz.
 #
 # This configure script is free software: you have unlimited permission
 # to copy, distribute and modify it.

 pkgname=plzip
-pkgversion=1.0
+pkgversion=1.1-pre1
 progname=plzip
-srctrigger=doc/plzip.texinfo
+srctrigger=doc/${pkgname}.texinfo

 # clear some things potentially inherited from environment.
 LC_ALL=C
@@ -100,14 +100,14 @@ while [ $# != 0 ] ; do
 	*=* | *-*-*) ;;
 	*)
 		echo "configure: unrecognized option: '${option}'" 1>&2
-		echo "Try 'configure --help' for more information."
+		echo "Try 'configure --help' for more information." 1>&2
 		exit 1 ;;
 	esac

 	# Check if the option took a separate argument
 	if [ "${arg2}" = yes ] ; then
 		if [ $# != 0 ] ; then args="${args} \"$1\"" ; shift
-		else echo "configure: Missing argument to \"${option}\"" 1>&2
+		else echo "configure: Missing argument to '${option}'" 1>&2
 			exit 1
 		fi
 	fi
@@ -125,10 +125,8 @@ if [ -z "${srcdir}" ] ; then
 fi

 if [ ! -r "${srcdir}/${srctrigger}" ] ; then
-	exec 1>&2
-	echo
-	echo "configure: Can't find sources in ${srcdir} ${srcdirtext}"
-	echo "configure: (At least ${srctrigger} is missing)."
+	echo "configure: Can't find sources in ${srcdir} ${srcdirtext}" 1>&2
+	echo "configure: (At least ${srctrigger} is missing)." 1>&2
 	exit 1
 fi

@@ -166,7 +164,7 @@ echo "CXXFLAGS = ${CXXFLAGS}"
 echo "LDFLAGS = ${LDFLAGS}"
 rm -f Makefile
 cat > Makefile << EOF
-# Makefile for Plzip - A parallel compressor compatible with lzip
+# Makefile for Plzip - Parallel compressor compatible with lzip
 # Copyright (C) 2009, 2010, 2011, 2012, 2013 Antonio Diaz Diaz.
 # This file was generated automatically by configure. Do not edit.
 #
diff --git a/dec_stdout.cc b/dec_stdout.cc
index 36be19b..fda4e9e 100644
--- a/dec_stdout.cc
+++ b/dec_stdout.cc
@@ -1,4 +1,4 @@
-/* Plzip - A parallel compressor compatible with lzip
+/* Plzip - Parallel compressor compatible with lzip
     Copyright (C) 2009 Laszlo Ersek.
     Copyright (C) 2009, 2010, 2011, 2012, 2013 Antonio Diaz Diaz.

@@ -171,7 +171,7 @@ extern "C" void * dworker_o( void * arg )
   LZ_Decoder * const decoder = LZ_decompress_open();
   if( !new_data || !ibuffer || !decoder ||
       LZ_decompress_errno( decoder ) != LZ_ok )
-    { pp( "Not enough memory" ); fatal(); }
+    { pp( "Not enough memory" ); cleanup_and_fail(); }
   int new_pos = 0;

   for( int i = worker_id; i < file_index.members(); i += num_workers )
@@ -188,7 +188,7 @@ extern "C" void * dworker_o( void * arg )
       if( size > 0 )
         {
         if( preadblock( infd, ibuffer, size, member_pos ) != size )
-          { pp(); show_error( "Read error", errno ); fatal(); }
+          { pp(); show_error( "Read error", errno ); cleanup_and_fail(); }
         member_pos += size;
         member_rest -= size;
         if( LZ_decompress_write( decoder, ibuffer, size ) != size )
@@ -201,7 +201,7 @@ extern "C" void * dworker_o( void * arg )
         const int rd = LZ_decompress_read( decoder, new_data + new_pos,
                                            max_packet_size - new_pos );
         if( rd < 0 )
-          fatal( decompress_read_error( decoder, pp, worker_id ) );
+          cleanup_and_fail( decompress_read_error( decoder, pp, worker_id ) );
         new_pos += rd;
         if( new_pos > max_packet_size )
           internal_error( "opacket size exceeded in worker" );
@@ -216,7 +216,7 @@ extern "C" void * dworker_o( void * arg )
           courier.collect_packet( opacket, worker_id );
           new_pos = 0;
           new_data = new( std::nothrow ) uint8_t[max_packet_size];
-          if( !new_data ) { pp( "Not enough memory" ); fatal(); }
+          if( !new_data ) { pp( "Not enough memory" ); cleanup_and_fail(); }
           }
         if( LZ_decompress_finished( decoder ) == 1 )
           {
@@ -235,9 +235,9 @@ extern "C" void * dworker_o( void * arg )

   delete[] ibuffer; delete[] new_data;
   if( LZ_decompress_member_position( decoder ) != 0 )
-    { pp( "Error, some data remains in decoder" ); fatal(); }
+    { pp( "Error, some data remains in decoder" ); cleanup_and_fail(); }
   if( LZ_decompress_close( decoder ) < 0 )
-    { pp( "LZ_decompress_close failed" ); fatal(); }
+    { pp( "LZ_decompress_close failed" ); cleanup_and_fail(); }
   courier.worker_finished();
   return 0;
   }
@@ -256,7 +256,7 @@ void muxer( Packet_courier & courier, const Pretty_print & pp, const int outfd )
       {
       const int wr = writeblock( outfd, opacket->data, opacket->size );
       if( wr != opacket->size )
-        { pp(); show_error( "Write error", errno ); fatal(); }
+        { pp(); show_error( "Write error", errno ); cleanup_and_fail(); }
       }
     delete[] opacket->data;
     delete opacket;
@@ -280,7 +280,7 @@ int dec_stdout( const int num_workers, const int infd, const int outfd,
   Worker_arg * worker_args = new( std::nothrow ) Worker_arg[num_workers];
   pthread_t * worker_threads = new( std::nothrow ) pthread_t[num_workers];
   if( !worker_args || !worker_threads )
-    { pp( "Not enough memory" ); fatal(); }
+    { pp( "Not enough memory" ); cleanup_and_fail(); }
   for( int i = 0; i < num_workers; ++i )
     {
     worker_args[i].file_index = &file_index;
@@ -292,7 +292,7 @@ int dec_stdout( const int num_workers, const int infd, const int outfd,
     const int errcode =
       pthread_create( &worker_threads[i], 0, dworker_o, &worker_args[i] );
     if( errcode )
-      { show_error( "Can't create worker threads", errcode ); fatal(); }
+      { show_error( "Can't create worker threads", errcode ); cleanup_and_fail(); }
     }

   muxer( courier, pp, outfd );
@@ -301,7 +301,7 @@ int dec_stdout( const int num_workers, const int infd, const int outfd,
     {
     const int errcode = pthread_join( worker_threads[i], 0 );
     if( errcode )
-      { show_error( "Can't join worker threads", errcode ); fatal(); }
+      { show_error( "Can't join worker threads", errcode ); cleanup_and_fail(); }
     }
   delete[] worker_threads;
   delete[] worker_args;
diff --git a/dec_stream.cc b/dec_stream.cc
index 91659da..64dcce3 100644
--- a/dec_stream.cc
+++ b/dec_stream.cc
@@ -1,4 +1,4 @@
-/* Plzip - A parallel compressor compatible with lzip
+/* Plzip - Parallel compressor compatible with lzip
     Copyright (C) 2009 Laszlo Ersek.
     Copyright (C) 2009, 2010, 2011, 2012, 2013 Antonio Diaz Diaz.

@@ -248,22 +248,31 @@ extern "C" void * dsplitter_s( void * arg )
   Packet_courier & courier = *tmp.courier;
   const Pretty_print & pp = *tmp.pp;
   const int infd = tmp.infd;
-  const int hsize = 6;  // header size
-  const int tsize = 20;  // trailer size
+  const int hsize = File_header::size;
+  const int tsize = File_trailer::size;
   const int buffer_size = max_packet_size;
   const int base_buffer_size = tsize + buffer_size + hsize;
   uint8_t * const base_buffer = new( std::nothrow ) uint8_t[base_buffer_size];
-  if( !base_buffer ) { pp( "Not enough memory" ); fatal(); }
+  if( !base_buffer ) { pp( "Not enough memory" ); cleanup_and_fail(); }
   uint8_t * const buffer = base_buffer + tsize;

   int size = readblock( infd, buffer, buffer_size + hsize ) - hsize;
   bool at_stream_end = ( size < buffer_size );
   if( size != buffer_size && errno )
-    { pp(); show_error( "Read error", errno ); fatal(); }
-  if( size <= tsize )
-    { pp( "Error reading member header" ); fatal(); }
-  if( find_magic( buffer, 0, 4 ) != 0 )
-    { pp( "Bad magic number (file not in lzip format)" ); fatal(); }
+    { pp(); show_error( "Read error", errno ); cleanup_and_fail(); }
+  if( size + hsize < min_member_size )
+    { pp( "Input file is too short" ); cleanup_and_fail( 2 ); }
+  const File_header & header = *(File_header *)buffer;
+  if( !header.verify_magic() )
+    { pp( "Bad magic number (file not in lzip format)" ); cleanup_and_fail( 2 ); }
+  if( !header.verify_version() )
+    {
+    if( verbosity >= 0 )
+      { pp();
+        std::fprintf( stderr, "Version %d member format not supported.\n",
+                      header.version() ); }
+    cleanup_and_fail( 2 );
+    }

   unsigned long long partial_member_size = 0;
   while( true )
@@ -274,13 +283,21 @@ extern "C" void * dsplitter_s( void * arg )
       newpos = find_magic( buffer, newpos, size + 4 - newpos );
       if( newpos <= size )
         {
-        unsigned long long member_size = 0;
-        for( int i = 1; i <= 8; ++i )
-          { member_size <<= 8; member_size += base_buffer[tsize+newpos-i]; }
+        const File_trailer & trailer = *(File_trailer *)(buffer + newpos - tsize);
+        const unsigned long long member_size = trailer.member_size();
         if( partial_member_size + newpos - pos == member_size )
           {  // header found
+          const File_header & header = *(File_header *)(buffer + newpos);
+          if( !header.verify_version() )
+            {
+            if( verbosity >= 0 )
+              { pp();
+                std::fprintf( stderr, "Version %d member format not supported.\n",
+                              header.version() ); }
+            cleanup_and_fail( 2 );
+            }
           uint8_t * const data = new( std::nothrow ) uint8_t[newpos - pos];
-          if( !data ) { pp( "Not enough memory" ); fatal(); }
+          if( !data ) { pp( "Not enough memory" ); cleanup_and_fail(); }
           std::memcpy( data, buffer + pos, newpos - pos );
           courier.receive_packet( data, newpos - pos );
           courier.receive_packet( 0, 0 );  // end of member token
@@ -293,7 +310,7 @@ extern "C" void * dsplitter_s( void * arg )
     if( at_stream_end )
       {
       uint8_t * data = new( std::nothrow ) uint8_t[size + hsize - pos];
-      if( !data ) { pp( "Not enough memory" ); fatal(); }
+      if( !data ) { pp( "Not enough memory" ); cleanup_and_fail(); }
       std::memcpy( data, buffer + pos, size + hsize - pos );
       courier.receive_packet( data, size + hsize - pos );
       courier.receive_packet( 0, 0 );  // end of member token
@@ -303,7 +320,7 @@ extern "C" void * dsplitter_s( void * arg )
       {
       partial_member_size += buffer_size - pos;
       uint8_t * data = new( std::nothrow ) uint8_t[buffer_size - pos];
-      if( !data ) { pp( "Not enough memory" ); fatal(); }
+      if( !data ) { pp( "Not enough memory" ); cleanup_and_fail(); }
       std::memcpy( data, buffer + pos, buffer_size - pos );
       courier.receive_packet( data, buffer_size - pos );
       }
@@ -311,7 +328,7 @@ extern "C" void * dsplitter_s( void * arg )
     size = readblock( infd, buffer + hsize, buffer_size );
     at_stream_end = ( size < buffer_size );
     if( size != buffer_size && errno )
-      { pp(); show_error( "Read error", errno ); fatal(); }
+      { pp(); show_error( "Read error", errno ); cleanup_and_fail(); }
     }
   delete[] base_buffer;
   courier.finish();  // no more packets to send
@@ -339,7 +356,7 @@ extern "C" void * dworker_s( void * arg )
   uint8_t * new_data = new( std::nothrow ) uint8_t[max_packet_size];
   LZ_Decoder * const decoder = LZ_decompress_open();
   if( !new_data || !decoder || LZ_decompress_errno( decoder ) != LZ_ok )
-    { pp( "Not enough memory" ); fatal(); }
+    { pp( "Not enough memory" ); cleanup_and_fail(); }
   int new_pos = 0;
   bool trailing_garbage_found = false;

@@ -370,7 +387,7 @@ extern "C" void * dworker_s( void * arg )
           if( LZ_decompress_errno( decoder ) == LZ_header_error )
             trailing_garbage_found = true;
           else
-            fatal( decompress_read_error( decoder, pp, worker_id ) );
+            cleanup_and_fail( decompress_read_error( decoder, pp, worker_id ) );
           }
         else new_pos += rd;
         if( new_pos > max_packet_size )
@@ -386,7 +403,7 @@ extern "C" void * dworker_s( void * arg )
           courier.collect_packet( opacket, worker_id );
           new_pos = 0;
           new_data = new( std::nothrow ) uint8_t[max_packet_size];
-          if( !new_data ) { pp( "Not enough memory" ); fatal(); }
+          if( !new_data ) { pp( "Not enough memory" ); cleanup_and_fail(); }
           }
         if( trailing_garbage_found ||
             LZ_decompress_finished( decoder ) == 1 )
@@ -409,9 +426,9 @@ extern "C" void * dworker_s( void * arg )

   delete[] new_data;
   if( LZ_decompress_member_position( decoder ) != 0 )
-    { pp( "Error, some data remains in decoder" ); fatal(); }
+    { pp( "Error, some data remains in decoder" ); cleanup_and_fail(); }
   if( LZ_decompress_close( decoder ) < 0 )
-    { pp( "LZ_decompress_close failed" ); fatal(); }
+    { pp( "LZ_decompress_close failed" ); cleanup_and_fail(); }
   return 0;
   }

@@ -431,7 +448,7 @@ void muxer( Packet_courier & courier, const Pretty_print & pp, const int outfd )
       {
       const int wr = writeblock( outfd, opacket->data, opacket->size );
       if( wr != opacket->size )
-        { pp(); show_error( "Write error", errno ); fatal(); }
+        { pp(); show_error( "Write error", errno ); cleanup_and_fail(); }
       }
     delete[] opacket->data;
     delete opacket;
@@ -462,12 +479,12 @@ int dec_stream( const int num_workers, const int infd, const int outfd,
   pthread_t splitter_thread;
   int errcode = pthread_create( &splitter_thread, 0, dsplitter_s, &splitter_arg );
   if( errcode )
-    { show_error( "Can't create splitter thread", errcode ); fatal(); }
+    { show_error( "Can't create splitter thread", errcode ); cleanup_and_fail(); }

   Worker_arg * worker_args = new( std::nothrow ) Worker_arg[num_workers];
   pthread_t * worker_threads = new( std::nothrow ) pthread_t[num_workers];
   if( !worker_args || !worker_threads )
-    { pp( "Not enough memory" ); fatal(); }
+    { pp( "Not enough memory" ); cleanup_and_fail(); }
   for( int i = 0; i < num_workers; ++i )
     {
     worker_args[i].courier = &courier;
@@ -475,7 +492,7 @@ int dec_stream( const int num_workers, const int infd, const int outfd,
     worker_args[i].worker_id = i;
     errcode = pthread_create( &worker_threads[i], 0, dworker_s, &worker_args[i] );
     if( errcode )
-      { show_error( "Can't create worker threads", errcode ); fatal(); }
+      { show_error( "Can't create worker threads", errcode ); cleanup_and_fail(); }
     }

   muxer( courier, pp, outfd );
@@ -484,14 +501,14 @@ int dec_stream( const int num_workers, const int infd, const int outfd,
     {
     errcode = pthread_join( worker_threads[i], 0 );
     if( errcode )
-      { show_error( "Can't join worker threads", errcode ); fatal(); }
+      { show_error( "Can't join worker threads", errcode ); cleanup_and_fail(); }
     }
   delete[] worker_threads;
   delete[] worker_args;

   errcode = pthread_join( splitter_thread, 0 );
   if( errcode )
-    { show_error( "Can't join splitter thread", errcode ); fatal(); }
+    { show_error( "Can't join splitter thread", errcode ); cleanup_and_fail(); }

   if( verbosity >= 2 && out_size > 0 && in_size > 0 )
     std::fprintf( stderr, "%6.3f:1, %6.3f bits/byte, %5.2f%% saved. ",
diff --git a/decompress.cc b/decompress.cc
index c861b4d..d008d1c 100644
--- a/decompress.cc
+++ b/decompress.cc
@@ -1,4 +1,4 @@
-/* Plzip - A parallel compressor compatible with lzip
+/* Plzip - Parallel compressor compatible with lzip
     Copyright (C) 2009 Laszlo Ersek.
     Copyright (C) 2009, 2010, 2011, 2012, 2013 Antonio Diaz Diaz.

@@ -122,7 +122,7 @@ extern "C" void * dworker( void * arg )
   LZ_Decoder * const decoder = LZ_decompress_open();
   if( !ibuffer || !obuffer || !decoder ||
       LZ_decompress_errno( decoder ) != LZ_ok )
-    { pp( "Not enough memory" ); fatal(); }
+    { pp( "Not enough memory" ); cleanup_and_fail(); }

   for( int i = worker_id; i < file_index.members(); i += num_workers )
     {
@@ -140,7 +140,7 @@ extern "C" void * dworker( void * arg )
       if( size > 0 )
         {
         if( preadblock( infd, ibuffer, size, member_pos ) != size )
-          { pp(); show_error( "Read error", errno ); fatal(); }
+          { pp(); show_error( "Read error", errno ); cleanup_and_fail(); }
         member_pos += size;
         member_rest -= size;
         if( LZ_decompress_write( decoder, ibuffer, size ) != size )
@@ -152,7 +152,7 @@ extern "C" void * dworker( void * arg )
         {
         const int rd = LZ_decompress_read( decoder, obuffer, buffer_size );
         if( rd < 0 )
-          fatal( decompress_read_error( decoder, pp, worker_id ) );
+          cleanup_and_fail( decompress_read_error( decoder, pp, worker_id ) );
         if( rd > 0 && outfd >= 0 )
           {
           const int wr = pwriteblock( outfd, obuffer, rd, data_pos );
@@ -162,7 +162,7 @@ extern "C" void * dworker( void * arg )
             if( verbosity >= 0 )
               std::fprintf( stderr, "Write error in worker %d: %s\n",
                             worker_id, std::strerror( errno ) );
-            fatal();
+            cleanup_and_fail();
             }
           }
         if( rd > 0 )
@@ -184,9 +184,9 @@ extern "C" void * dworker( void * arg )

   delete[] obuffer; delete[] ibuffer;
   if( LZ_decompress_member_position( decoder ) != 0 )
-    { pp( "Error, some data remains in decoder" ); fatal(); }
+    { pp( "Error, some data remains in decoder" ); cleanup_and_fail(); }
   if( LZ_decompress_close( decoder ) < 0 )
-    { pp( "LZ_decompress_close failed" ); fatal(); }
+    { pp( "LZ_decompress_close failed" ); cleanup_and_fail(); }
   return 0;
   }

@@ -208,7 +208,7 @@ int decompress( int num_workers, const int infd, const int outfd,
     return dec_stream( num_workers, infd, outfd, pp, debug_level, testing );
     }
   if( file_index.retval() != 0 )
-    { show_error( file_index.error().c_str() ); return file_index.retval(); }
+    { pp( file_index.error().c_str() ); return file_index.retval(); }

   if( num_workers > file_index.members() )
     num_workers = file_index.members();
@@ -224,7 +224,7 @@ int decompress( int num_workers, const int infd, const int outfd,
   Worker_arg * worker_args = new( std::nothrow ) Worker_arg[num_workers];
   pthread_t * worker_threads = new( std::nothrow ) pthread_t[num_workers];
   if( !worker_args || !worker_threads )
-    { pp( "Not enough memory" ); fatal(); }
+    { pp( "Not enough memory" ); cleanup_and_fail(); }
   for( int i = 0; i < num_workers; ++i )
     {
     worker_args[i].file_index = &file_index;
@@ -236,14 +236,14 @@ int decompress( int num_workers, const int infd, const int outfd,
     const int errcode =
       pthread_create( &worker_threads[i], 0, dworker, &worker_args[i] );
     if( errcode )
-      { show_error( "Can't create worker threads", errcode ); fatal(); }
+      { show_error( "Can't create worker threads", errcode ); cleanup_and_fail(); }
     }

   for( int i = num_workers - 1; i >= 0; --i )
     {
     const int errcode = pthread_join( worker_threads[i], 0 );
     if( errcode )
-      { show_error( "Can't join worker threads", errcode ); fatal(); }
+      { show_error( "Can't join worker threads", errcode ); cleanup_and_fail(); }
     }
   delete[] worker_threads;
   delete[] worker_args;
diff --git a/doc/plzip.1 b/doc/plzip.1
index 5b91105..781baf9 100644
--- a/doc/plzip.1
+++ b/doc/plzip.1
@@ -1,12 +1,12 @@
 .\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.37.1.
-.TH PLZIP "1" "May 2013" "Plzip 1.0" "User Commands"
+.TH PLZIP "1" "July 2013" "Plzip 1.1-pre1" "User Commands"
 .SH NAME
 Plzip \- reduces the size of files
 .SH SYNOPSIS
 .B plzip
 [\fIoptions\fR] [\fIfiles\fR]
 .SH DESCRIPTION
-Plzip \- A parallel compressor compatible with lzip.
+Plzip \- Parallel compressor compatible with lzip.
 .SH OPTIONS
 .TP
 \fB\-h\fR, \fB\-\-help\fR
diff --git a/doc/plzip.info b/doc/plzip.info
index 2070f4f..72e9511 100644
--- a/doc/plzip.info
+++ b/doc/plzip.info
@@ -12,16 +12,16 @@ File: plzip.info, Node: Top, Next: Introduction, Up: (dir)
 Plzip Manual
 ************

-This manual is for Plzip (version 1.0, 29 May 2013).
+This manual is for Plzip (version 1.1-pre1, 20 July 2013).

 * Menu:

 * Introduction:: Purpose and features of plzip
-* Program Design:: Internal structure of plzip
-* Invoking Plzip:: Command line interface
-* File Format:: Detailed format of the compressed file
+* Program design:: Internal structure of plzip
+* Invoking plzip:: Command line interface
+* File format:: Detailed format of the compressed file
 * Problems:: Reporting bugs
-* Concept Index:: Index of concepts
+* Concept index:: Index of concepts


    Copyright (C) 2009, 2010, 2011, 2012, 2013 Antonio Diaz Diaz.
@@ -30,27 +30,46 @@ This manual is for Plzip (version 1.0, 29 May 2013).
 copy, distribute and modify it.


-File: plzip.info, Node: Introduction, Next: Program Design, Prev: Top, Up: Top
+File: plzip.info, Node: Introduction, Next: Program design, Prev: Top, Up: Top

 1 Introduction
 **************

 Plzip is a massively parallel (multi-threaded), lossless data compressor
-based on the lzlib compression library, with very safe integrity
-checking and a user interface similar to the one of bzip2, gzip or lzip.
+based on the lzlib compression library, with a user interface similar to
+the one of lzip, bzip2 or gzip.

-   Plzip is intended for faster compression/decompression of big files
-on multiprocessor machines, which makes it specially well suited for
-distribution of big software files and large scale data archiving. On
-files big enough (several GB), plzip can use hundreds of processors.
+   Plzip can compress/decompress large files on multiprocessor machines
+much faster than lzip, at the cost of a slightly reduced compression
+ratio. On files large enough (several GB), plzip can use hundreds of
+processors. On files of only a few MB it is better to use lzip.
+
+   Plzip uses the same well-defined exit status values used by lzip and
+bzip2, which makes it safer when used in pipes or scripts than
+compressors returning ambiguous warning values, like gzip.

    Plzip uses the lzip file format; the files produced by plzip are fully
 compatible with lzip-1.4 or newer, and can be rescued with lziprecover.

-   Plzip uses the same well-defined exit status values used by lzip and
-bzip2, which makes it safer when used in pipes or scripts than
-compressors returning ambiguous warning values, like gzip.
+   The lzip file format is designed for long-term data archiving and
+provides very safe integrity checking. The member trailer stores the
+32-bit CRC of the original data, the size of the original data and the
+size of the member. These values, together with the value remaining in
+the range decoder and the end-of-stream marker, provide a 4 factor
+integrity checking which guarantees that the decompressed version of the
+data is identical to the original. This guards against corruption of the
+compressed data, and against undetected bugs in plzip (hopefully very
+unlikely). The chances of data corruption going undetected are
+microscopic. Be aware, though, that the check occurs upon decompression,
+so it can only tell you that something is wrong. It can't help you
+recover the original uncompressed data.
+
+   If you ever need to recover data from a damaged lzip file, try the
+lziprecover program. Lziprecover makes lzip files resistant to bit-flip
+(one of the most common forms of data corruption), and provides data
+recovery capabilities, including error-checked merging of damaged copies
+of a file.
 
    Plzip replaces every file given in the command line with a compressed
 version of itself, with the name "original_name.lz". Each compressed
@@ -76,18 +95,6 @@ filename.lz becomes filename
 filename.tlz becomes filename.tar
 anyothername becomes anyothername.out
 
-   As a self-check for your protection, plzip stores in the member
-trailer the 32-bit CRC of the original data, the size of the original
-data and the size of the member. These values, together with the value
-remaining in the range decoder and the end-of-stream marker, provide a
-very safe 4 factor integrity checking which guarantees that the
-decompressed version of the data is identical to the original. This
-guards against corruption of the compressed data, and against
-undetected bugs in plzip (hopefully very unlikely). The chances of data
-corruption going undetected are microscopic. Be aware, though, that the
-check occurs upon decompression, so it can only tell you that something
-is wrong. It can't help you recover the original uncompressed data.
-
    WARNING! Even if plzip is bug-free, other causes may result in a
 corrupt compressed file (bugs in the system libraries, memory errors,
 etc). Therefore, if the data you are going to compress is important,
@@ -96,9 +103,9 @@ until you verify the compressed file with a command like
 `plzip -cd file.lz | cmp file -'.
-File: plzip.info, Node: Program Design, Next: Invoking Plzip, Prev: Introduction, Up: Top
+File: plzip.info, Node: Program design, Next: Invoking plzip, Prev: Introduction, Up: Top
 
-2 Program Design
+2 Program design
 ****************
 
 For each input file, a splitter thread and several worker threads are
@@ -119,9 +126,9 @@ speed of large files with many members is only limited by the number of
 processors available and by I/O speed.
 
 
-File: plzip.info, Node: Invoking Plzip, Next: File Format, Prev: Program Design, Up: Top
+File: plzip.info, Node: Invoking plzip, Next: File format, Prev: Program design, Up: Top
 
-3 Invoking Plzip
+3 Invoking plzip
 ****************
 
 The format for running plzip is:
@@ -220,7 +227,7 @@ The format for running plzip is:
 `--verbose'
      Verbose mode.
      When compressing, show the compression ratio for each file
-     processed.
+     processed. A second -v shows the progress of compression.
      When decompressing or testing, further -v's (up to 4) increase the
      verbosity level, showing status, compression ratio, decompressed
      size, and compressed size.
@@ -275,9 +282,9 @@ invalid input file, 3 for an internal consistency error (eg, bug) which
 caused plzip to panic.
 
 
-File: plzip.info, Node: File Format, Next: Problems, Prev: Invoking Plzip, Up: Top
+File: plzip.info, Node: File format, Next: Problems, Prev: Invoking plzip, Up: Top
 
-4 File Format
+4 File format
 *************
 
 Perfection is reached, not when there is no longer anything to add, but
@@ -348,7 +355,7 @@ additional information before, between, or after them.
 
 
 
-File: plzip.info, Node: Problems, Next: Concept Index, Prev: File Format, Up: Top
+File: plzip.info, Node: Problems, Next: Concept index, Prev: File format, Up: Top
 
 5 Reporting Bugs
 ****************
@@ -363,34 +370,34 @@ for all eternity, if not longer.
 by running `plzip --version'.
-File: plzip.info, Node: Concept Index, Prev: Problems, Up: Top
+File: plzip.info, Node: Concept index, Prev: Problems, Up: Top
 
-Concept Index
+Concept index
 *************
 
 * Menu:
 
 * bugs: Problems. (line 6)
-* file format: File Format. (line 6)
+* file format: File format. (line 6)
 * getting help: Problems. (line 6)
 * introduction: Introduction. (line 6)
-* invoking: Invoking Plzip. (line 6)
-* options: Invoking Plzip. (line 6)
-* program design: Program Design. (line 6)
-* usage: Invoking Plzip. (line 6)
-* version: Invoking Plzip. (line 6)
+* invoking: Invoking plzip. (line 6)
+* options: Invoking plzip. (line 6)
+* program design: Program design. (line 6)
+* usage: Invoking plzip. (line 6)
+* version: Invoking plzip. (line 6)
 
 
 
 Tag Table:
 Node: Top223
-Node: Introduction865
-Node: Program Design4113
-Node: Invoking Plzip5167
-Node: File Format10416
-Node: Problems12895
-Node: Concept Index13424
+Node: Introduction871
+Node: Program design4426
+Node: Invoking plzip5480
+Node: File format10776
+Node: Problems13255
+Node: Concept index13784
 
 End Tag Table
diff --git a/doc/plzip.texinfo b/doc/plzip.texinfo
index b832884..c3b0613 100644
--- a/doc/plzip.texinfo
+++ b/doc/plzip.texinfo
@@ -6,8 +6,8 @@
 @finalout
 @c %**end of header
 
-@set UPDATED 29 May 2013
-@set VERSION 1.0
+@set UPDATED 20 July 2013
+@set VERSION 1.1-pre1
 
 @dircategory Data Compression
 @direntry
@@ -36,11 +36,11 @@ This manual is for Plzip (version @value{VERSION}, @value{UPDATED}).
 
 @menu
 * Introduction:: Purpose and features of plzip
-* Program Design:: Internal structure of plzip
-* Invoking Plzip:: Command line interface
-* File Format:: Detailed format of the compressed file
+* Program design:: Internal structure of plzip
+* Invoking plzip:: Command line interface
+* File format:: Detailed format of the compressed file
 * Problems:: Reporting bugs
-* Concept Index:: Index of concepts
+* Concept index:: Index of concepts
 @end menu
 
 @sp 1
@@ -55,21 +55,40 @@ to copy, distribute and modify it.
 @cindex introduction
 
 Plzip is a massively parallel (multi-threaded), lossless data compressor
-based on the lzlib compression library, with very safe integrity
-checking and a user interface similar to the one of bzip2, gzip or lzip.
+based on the lzlib compression library, with a user interface similar to
+the one of lzip, bzip2 or gzip.
 
-Plzip is intended for faster compression/decompression of big files on
-multiprocessor machines, which makes it specially well suited for
-distribution of big software files and large scale data archiving. On
-files big enough (several GB), plzip can use hundreds of processors.
-
-Plzip uses the lzip file format; the files produced by plzip are fully
-compatible with lzip-1.4 or newer, and can be rescued with lziprecover.
+Plzip can compress/decompress large files on multiprocessor machines
+much faster than lzip, at the cost of a slightly reduced compression
+ratio. On files large enough (several GB), plzip can use hundreds of
+processors. On files of only a few MB it is better to use lzip.
 
 Plzip uses the same well-defined exit status values used by lzip and
 bzip2, which makes it safer when used in pipes or scripts than
 compressors returning ambiguous warning values, like gzip.
 
+Plzip uses the lzip file format; the files produced by plzip are fully
+compatible with lzip-1.4 or newer, and can be rescued with lziprecover.
+
+The lzip file format is designed for long-term data archiving and
+provides very safe integrity checking. The member trailer stores the
+32-bit CRC of the original data, the size of the original data and the
+size of the member. These values, together with the value remaining in
+the range decoder and the end-of-stream marker, provide a 4 factor
+integrity checking which guarantees that the decompressed version of the
+data is identical to the original. This guards against corruption of the
+compressed data, and against undetected bugs in plzip (hopefully very
+unlikely). The chances of data corruption going undetected are
+microscopic. Be aware, though, that the check occurs upon decompression,
+so it can only tell you that something is wrong. It can't help you
+recover the original uncompressed data.
+
+If you ever need to recover data from a damaged lzip file, try the
+lziprecover program. Lziprecover makes lzip files resistant to bit-flip
+(one of the most common forms of data corruption), and provides data
+recovery capabilities, including error-checked merging of damaged copies
+of a file.
+
 Plzip replaces every file given in the command line with a compressed
 version of itself, with the name "original_name.lz". Each compressed
 file has the same modification date, permissions, and, when possible,
@@ -96,18 +115,6 @@ file from that of the compressed file as follows:
 @item anyothername @tab becomes @tab anyothername.out
 @end multitable
 
-As a self-check for your protection, plzip stores in the member trailer
-the 32-bit CRC of the original data, the size of the original data and
-the size of the member. These values, together with the value remaining
-in the range decoder and the end-of-stream marker, provide a very safe 4
-factor integrity checking which guarantees that the decompressed version
-of the data is identical to the original. This guards against corruption
-of the compressed data, and against undetected bugs in plzip (hopefully
-very unlikely). The chances of data corruption going undetected are
-microscopic. Be aware, though, that the check occurs upon decompression,
-so it can only tell you that something is wrong. It can't help you
-recover the original uncompressed data.
-
 WARNING! Even if plzip is bug-free, other causes may result in a corrupt
 compressed file (bugs in the system libraries, memory errors, etc).
 Therefore, if the data you are going to compress is important, give the
@@ -116,8 +123,8 @@ you verify the compressed file with a command like
 @w{@samp{plzip -cd file.lz | cmp file -}}.
-@node Program Design
-@chapter Program Design
+@node Program design
+@chapter Program design
 @cindex program design
 
 For each input file, a splitter thread and several worker threads are
@@ -138,8 +145,8 @@ large files with many members is only limited by the number of
 processors available and by I/O speed.
 
 
-@node Invoking Plzip
-@chapter Invoking Plzip
+@node Invoking plzip
+@chapter Invoking plzip
 @cindex invoking
 @cindex options
 @cindex usage
@@ -237,7 +244,8 @@ Use it together with @samp{-v} to see information about the file.
 @item -v
 @itemx --verbose
 Verbose mode.@*
-When compressing, show the compression ratio for each file processed.@*
+When compressing, show the compression ratio for each file processed. A
+second -v shows the progress of compression.@*
 When decompressing or testing, further -v's (up to 4) increase the
 verbosity level, showing status, compression ratio, decompressed size,
 and compressed size.
@@ -297,8 +305,8 @@ invalid input file, 3 for an internal consistency error (eg, bug) which
 caused plzip to panic.
 
 
-@node File Format
-@chapter File Format
+@node File format
+@chapter File format
 @cindex file format
 
 Perfection is reached, not when there is no longer anything to add, but
@@ -387,8 +395,8 @@ If you find a bug in plzip, please send electronic mail to
 find by running @w{@samp{plzip --version}}.
 
 
-@node Concept Index
-@unnumbered Concept Index
+@node Concept index
+@unnumbered Concept index
 
 @printindex cp
 
diff --git a/file_index.cc b/file_index.cc
index 5cdba46..452d0ab 100644
--- a/file_index.cc
+++ b/file_index.cc
@@ -1,4 +1,4 @@
-/* Plzip - A parallel compressor compatible with lzip
+/* Plzip - Parallel compressor compatible with lzip
     Copyright (C) 2009, 2010, 2011, 2012, 2013 Antonio Diaz Diaz.
     This program is free software: you can redistribute it and/or modify
@@ -68,23 +68,23 @@ File_index::File_index( const int infd ) : retval_( 0 )
     { error_ = "Input file is not seekable :";
       error_ += std::strerror( errno ); retval_ = 1; return; }
   if( isize > INT64_MAX )
-    { error_ = "Input file is too long (2^63 bytes or more).";
+    { error_ = "Input file is too long (2^63 bytes or more)";
       retval_ = 2; return; }
   long long pos = isize; // always points to a header or EOF
   File_header header;
   File_trailer trailer;
 
   if( isize < min_member_size )
-    { error_ = "Input file is too short."; retval_ = 2; return; }
+    { error_ = "Input file is too short"; retval_ = 2; return; }
   if( seek_read( infd, header.data, File_header::size, 0 ) != File_header::size )
     { error_ = "Error reading member header :";
       error_ += std::strerror( errno ); retval_ = 1; return; }
   if( !header.verify_magic() )
-    { error_ = "Bad magic number (file not in lzip format).";
+    { error_ = "Bad magic number (file not in lzip format)";
       retval_ = 2; return; }
   if( !header.verify_version() )
     { error_ = "Version "; error_ += format_num( header.version() );
-      error_ += "member format not supported."; retval_ = 2; return; }
+      error_ += "member format not supported"; retval_ = 2; return; }
 
   while( pos >= min_member_size )
     {
@@ -114,9 +114,9 @@ File_index::File_index( const int infd ) : retval_( 0 )
       if( member_vector.size() == 0 && isize - pos > File_header::size &&
           seek_read( infd, header.data, File_header::size, pos ) == File_header::size &&
           header.verify_magic() && header.verify_version() )
-        { // last trailer is corrupt
-        error_ = "Member size in trailer is corrupt at pos ";
-        error_ += format_num( isize - 8 ); retval_ = 2; break;
+        {
+        error_ = "Last member in input file is truncated or corrupt";
+        retval_ = 2; break;
         }
       pos -= member_size;
       member_vector.push_back( Member( 0, trailer.data_size(),
@@ -125,7 +125,7 @@ File_index::File_index( const int infd ) : retval_( 0 )
   if( pos != 0 || member_vector.size() == 0 )
     {
     member_vector.clear();
-    if( retval_ == 0 ) { error_ = "Can't create file index."; retval_ = 2; }
+    if( retval_ == 0 ) { error_ = "Can't create file index"; retval_ = 2; }
     return;
     }
   std::reverse( member_vector.begin(), member_vector.end() );
@@ -135,7 +135,7 @@ File_index::File_index( const int infd ) : retval_( 0 )
     if( end < 0 || end > INT64_MAX )
       {
       member_vector.clear();
-      error_ = "Data in input file is too long (2^63 bytes or more).";
+      error_ = "Data in input file is too long (2^63 bytes or more)";
       retval_ = 2; return;
       }
     member_vector[i+1].dblock.pos( end );
diff --git a/file_index.h b/file_index.h
index 1dfbcf4..5493ffa 100644
--- a/file_index.h
+++ b/file_index.h
@@ -1,4 +1,4 @@
-/* Plzip - A parallel compressor compatible with lzip
+/* Plzip - Parallel compressor compatible with lzip
     Copyright (C) 2009, 2010, 2011, 2012, 2013 Antonio Diaz Diaz.
 
     This program is free software: you can redistribute it and/or modify
@@ -1,4 +1,4 @@
-/* Plzip - A parallel compressor compatible with lzip
+/* Plzip - Parallel compressor compatible with lzip
     Copyright (C) 2009, 2010, 2011, 2012, 2013 Antonio Diaz Diaz.
 
     This program is free software: you can redistribute it and/or modify
@@ -196,10 +196,13 @@ int decompress( int num_workers, const int infd, const int outfd,
 // defined in main.cc
 extern int verbosity;
 
-void fatal( const int retval = 1 ); // terminate the program
+void cleanup_and_fail( const int retval = 1 ); // terminate the program
 void show_error( const char * const msg, const int errcode = 0,
                  const bool help = false );
 void internal_error( const char * const msg );
+void show_progress( const int packet_size,
+                    const Pretty_print * const p = 0,
+                    const struct stat * const in_statsp = 0 );
 
 
 class Slot_tally
@@ -1,4 +1,4 @@
-/* Plzip - A parallel compressor compatible with lzip
+/* Plzip - Parallel compressor compatible with lzip
     Copyright (C) 2009 Laszlo Ersek.
     Copyright (C) 2009, 2010, 2011, 2012, 2013 Antonio Diaz Diaz.
@@ -96,13 +96,11 @@ const mode_t usr_rw = S_IRUSR | S_IWUSR;
 const mode_t all_rw = usr_rw | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH;
 mode_t outfd_mode = usr_rw;
 bool delete_output_on_interrupt = false;
-pthread_t main_thread;
-pid_t main_thread_pid;
 
 
 void show_help( const long num_online )
   {
-  std::printf( "%s - A parallel compressor compatible with lzip.\n", Program_name );
+  std::printf( "%s - Parallel compressor compatible with lzip.\n", Program_name );
   std::printf( "\nUsage: %s [options] [files]\n", invocation_name );
   std::printf( "\nOptions:\n"
                " -h, --help display this help and exit\n"
@@ -262,12 +260,13 @@ int open_instream( const char * const name, struct stat * const in_statsp,
     const bool can_read = ( i == 0 &&
                             ( S_ISBLK( mode ) || S_ISCHR( mode ) ||
                               S_ISFIFO( mode ) || S_ISSOCK( mode ) ) );
-    if( i != 0 || ( !S_ISREG( mode ) && ( !to_stdout || !can_read ) ) )
+    const bool no_ofile = to_stdout || ( program_mode == m_test );
+    if( i != 0 || ( !S_ISREG( mode ) && ( !can_read || !no_ofile ) ) )
       {
       if( verbosity >= 0 )
         std::fprintf( stderr, "%s: Input file '%s' is not a regular file%s.\n",
                       program_name, name,
-                      ( can_read && !to_stdout ) ?
+                      ( can_read && !no_ofile ) ?
                       " and '--stdout' was not specified" : "" );
       close( infd );
       infd = -1;
@@ -340,22 +339,6 @@ bool check_tty( const int infd, const Mode program_mode )
   }
 
 
-void cleanup_and_fail( const int retval )
-  {
-  if( delete_output_on_interrupt )
-    {
-    delete_output_on_interrupt = false;
-    if( verbosity >= 0 )
-      std::fprintf( stderr, "%s: Deleting output file '%s', if it exists.\n",
-                    program_name, output_filename.c_str() );
-    if( outfd >= 0 ) { close( outfd ); outfd = -1; }
-    if( std::remove( output_filename.c_str() ) != 0 && errno != ENOENT )
-      show_error( "WARNING: deletion of output file (apparently) failed." );
-    }
-  std::exit( retval );
-  }
-
-
 // Set permissions, owner and times.
 void close_and_set_permissions( const struct stat * const in_statsp )
   {
@@ -382,13 +365,10 @@ void close_and_set_permissions( const struct stat * const in_statsp )
   }
 
 
-extern "C" void signal_handler( int sig )
+extern "C" void signal_handler( int )
   {
-  if( !pthread_equal( pthread_self(), main_thread ) )
-    kill( main_thread_pid, sig );
-  if( sig != SIGUSR1 && sig != SIGUSR2 )
-    show_error( "Control-C or similar caught, quitting." );
-  cleanup_and_fail( ( sig != SIGUSR2 ) ? 1 : 2 );
+  show_error( "Control-C or similar caught, quitting." );
+  cleanup_and_fail( 1 );
   }
 
 
@@ -405,14 +385,6 @@ void set_signals()
 int verbosity = 0;
 
 
-// This can be called from any thread, main thread or sub-threads alike,
-// since they all call common helper functions that call fatal() in case
-// of an error.
-//
-void fatal( const int retval )
-  { signal_handler( ( retval != 2 ) ? SIGUSR1 : SIGUSR2 ); }
-
-
 void Pretty_print::operator()( const char * const msg ) const
   {
   if( verbosity >= 0 )
@@ -456,6 +428,60 @@ void internal_error( const char * const msg )
   }
 
 
+// This can be called from any thread, main thread or sub-threads alike,
+// since they all call common helper functions that call cleanup_and_fail()
+// in case of an error.
+//
+void cleanup_and_fail( const int retval )
+  {
+  // only one thread can delete and exit
+  static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
+
+  pthread_mutex_lock( &mutex ); // ignore errors to avoid loop
+  if( delete_output_on_interrupt )
+    {
+    delete_output_on_interrupt = false;
+    if( verbosity >= 0 )
+      std::fprintf( stderr, "%s: Deleting output file '%s', if it exists.\n",
+                    program_name, output_filename.c_str() );
+    if( outfd >= 0 ) { close( outfd ); outfd = -1; }
+    if( std::remove( output_filename.c_str() ) != 0 && errno != ENOENT )
+      show_error( "WARNING: deletion of output file (apparently) failed." );
+    }
+  std::exit( retval );
+  }
+
+
+void show_progress( const int packet_size,
+                    const Pretty_print * const p,
+                    const struct stat * const in_statsp )
+  {
+  static unsigned long long cfile_size = 0; // file_size / 100
+  static unsigned long long pos = 0;
+  static const Pretty_print * pp = 0;
+  static pthread_mutex_t mutex;
+
+  if( p ) // initialize static vars
+    {
+    if( !pp ) xinit( &mutex ); // init mutex only once
+    pos = 0; pp = p;
+    cfile_size = ( in_statsp && S_ISREG( in_statsp->st_mode ) ) ?
+                 in_statsp->st_size / 100 : 0;
+    return;
+    }
+  if( pp )
+    {
+    xlock( &mutex );
+    pos += packet_size;
+    if( cfile_size > 0 )
+      std::fprintf( stderr, "%4llu%%", pos / cfile_size );
+    std::fprintf( stderr, " %.1f MB\r", pos / 1000000.0 );
+    pp->reset(); (*pp)(); // restore cursor position
+    xunlock( &mutex );
+    }
+  }
+
+
 int main( const int argc, const char * const argv[] )
   {
   // Mapping from gzip/bzip2 style 1..9 compression modes
@@ -486,8 +512,6 @@ int main( const int argc, const char * const argv[] )
   bool recompress = false;
   bool to_stdout = false;
   invocation_name = argv[0];
-  main_thread = pthread_self();
-  main_thread_pid = getpid();
 
   if( LZ_version()[0] != LZ_version_string[0] )
     internal_error( "bad library version" );
@@ -598,8 +622,6 @@ int main( const int argc, const char * const argv[] )
   if( !to_stdout && program_mode != m_test &&
       ( filenames_given || default_output_filename.size() ) )
     set_signals();
-  std::signal( SIGUSR1, signal_handler );
-  std::signal( SIGUSR2, signal_handler );
 
   Pretty_print pp( filenames );
 
@@ -668,9 +690,13 @@ int main( const int argc, const char * const argv[] )
     if( verbosity >= 1 ) pp();
     int tmp;
     if( program_mode == m_compress )
+      {
+      show_progress( 0, &pp, in_statsp ); // initialize static vars
+      if( verbosity >= 2 ) show_progress( 0 ); // show initial zero size
       tmp = compress( data_size, encoder_options.dictionary_size,
                       encoder_options.match_len_limit, num_workers,
                       infd, outfd, pp, debug_level );
+      }
     else
       tmp = decompress( num_workers, infd, outfd, pp, debug_level,
                         program_mode == m_test, infd_isreg );
diff --git a/testsuite/check.sh b/testsuite/check.sh
index 031fe00..05ab346 100755
--- a/testsuite/check.sh
+++ b/testsuite/check.sh
@@ -1,5 +1,5 @@
 #! /bin/sh
-# check script for Plzip - A parallel compressor compatible with lzip
+# check script for Plzip - Parallel compressor compatible with lzip
 # Copyright (C) 2009, 2010, 2011, 2012, 2013 Antonio Diaz Diaz.
 #
 # This script is free software: you have unlimited permission
@@ -28,13 +28,21 @@ fail=0
 printf "testing plzip-%s..." "$2"
 
 "${LZIP}" -cqs-1 in > /dev/null
-if [ $? != 1 ] ; then fail=1 ; printf - ; else printf . ; fi
+if [ $? = 1 ] ; then printf . ; else fail=1 ; printf - ; fi
 "${LZIP}" -cqs0 in > /dev/null
-if [ $? != 1 ] ; then fail=1 ; printf - ; else printf . ; fi
+if [ $? = 1 ] ; then printf . ; else fail=1 ; printf - ; fi
 "${LZIP}" -cqs4095 in > /dev/null
-if [ $? != 1 ] ; then fail=1 ; printf - ; else printf . ; fi
+if [ $? = 1 ] ; then printf . ; else fail=1 ; printf - ; fi
 "${LZIP}" -cqm274 in > /dev/null
-if [ $? != 1 ] ; then fail=1 ; printf - ; else printf . ; fi
+if [ $? = 1 ] ; then printf . ; else fail=1 ; printf - ; fi
+"${LZIP}" -tq in
+if [ $? = 2 ] ; then printf . ; else fail=1 ; printf - ; fi
+"${LZIP}" -tq < in
+if [ $? = 2 ] ; then printf . ; else fail=1 ; printf - ; fi
+"${LZIP}" -cdq in
+if [ $? = 2 ] ; then printf . ; else fail=1 ; printf - ; fi
+"${LZIP}" -cdq < in
+if [ $? = 2 ] ; then printf . ; else fail=1 ; printf - ; fi
 
 "${LZIP}" -t "${in_lz}" || fail=1
 "${LZIP}" -cd "${in_lz}" > copy || fail=1
@@ -42,7 +50,7 @@ cmp in copy || fail=1
 printf .
 
 "${LZIP}" -cfq "${in_lz}" > out
-if [ $? != 1 ] ; then fail=1 ; printf - ; else printf . ; fi
+if [ $? = 1 ] ; then printf . ; else fail=1 ; printf - ; fi
 "${LZIP}" -cF "${in_lz}" > out || fail=1
 "${LZIP}" -cd out | "${LZIP}" -d > copy || fail=1
 cmp in copy || fail=1
@@ -54,32 +62,32 @@ for i in s4Ki 0 1 2 3 4 5 6 7 8 9 ; do
 	printf "garbage" >> copy.lz || fail=1
 	"${LZIP}" -df copy.lz || fail=1
 	cmp in copy || fail=1
-	printf .
 done
+printf .
 
 for i in s4Ki 0 1 2 3 4 5 6 7 8 9 ; do
 	"${LZIP}" -c -$i in > out || fail=1
 	printf "g" >> out || fail=1
 	"${LZIP}" -cd out > copy || fail=1
 	cmp in copy || fail=1
-	printf .
 done
+printf .
 
 for i in s4Ki 0 1 2 3 4 5 6 7 8 9 ; do
 	"${LZIP}" -$i < in > out || fail=1
 	printf "garbage" >> out || fail=1
 	"${LZIP}" -d < out > copy || fail=1
 	cmp in copy || fail=1
-	printf .
 done
+printf .
 
 for i in s4Ki 0 1 2 3 4 5 6 7 8 9 ; do
 	"${LZIP}" -f -$i -o out < in || fail=1
 	printf "g" >> out.lz || fail=1
 	"${LZIP}" -df -o copy < out.lz || fail=1
 	cmp in copy || fail=1
-	printf .
 done
+printf .
 
 "${LZIP}" < in > anyothername || fail=1
 "${LZIP}" -d anyothername || fail=1
@@ -95,39 +103,38 @@ for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 ; do
 	"${LZIP}" -d -n$i out4.lz || fail=1
 	cmp in4 out4 || fail=1
 	rm -f out4
-	printf .
 done
+printf .
 
 for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 ; do
 	"${LZIP}" -s4Ki -B8Ki -n$i < in4 > out4 || fail=1
 	printf "g" >> out4 || fail=1
 	"${LZIP}" -d -n$i < out4 > copy4 || fail=1
 	cmp in4 copy4 || fail=1
-	printf .
 done
+printf .
 
 cat "${in_lz}" > ingin.lz || framework_failure
 printf "g" >> ingin.lz || framework_failure
 cat "${in_lz}" >> ingin.lz || framework_failure
 "${LZIP}" -tq ingin.lz
-if [ $? != 2 ] ; then fail=1 ; printf - ; else printf . ; fi
+if [ $? = 2 ] ; then printf . ; else fail=1 ; printf - ; fi
 "${LZIP}" -cdq ingin.lz > out
-if [ $? != 2 ] ; then fail=1 ; printf - ; else printf . ; fi
+if [ $? = 2 ] ; then printf . ; else fail=1 ; printf - ; fi
 "${LZIP}" -t < ingin.lz || fail=1
-printf .
 "${LZIP}" -d < ingin.lz > copy || fail=1
 cmp in copy || fail=1
 printf .
 dd if="${in_lz}" bs=1024 count=10 > trunc.lz 2> /dev/null || framework_failure
 "${LZIP}" -tq trunc.lz
-if [ $? != 2 ] ; then fail=1 ; printf - ; else printf . ; fi
+if [ $? = 2 ] ; then printf . ; else fail=1 ; printf - ; fi
 "${LZIP}" -cdq trunc.lz > out
-if [ $? != 2 ] ; then fail=1 ; printf - ; else printf . ; fi
+if [ $? = 2 ] ; then printf . ; else fail=1 ; printf - ; fi
 "${LZIP}" -tq < trunc.lz
-if [ $? != 2 ] ; then fail=1 ; printf - ; else printf . ; fi
+if [ $? = 2 ] ; then printf . ; else fail=1 ; printf - ; fi
 "${LZIP}" -dq < trunc.lz > out
-if [ $? != 2 ] ; then fail=1 ; printf - ; else printf . ; fi
 
 echo
 if [ ${fail} = 0 ] ; then
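
Note on the documentation hunks above: the manual text added in this change says that the member trailer stores the 32-bit CRC of the original data, the size of the original data and the size of the member. The following is a minimal sketch of how those three values could be read back and compared after decompression. It is not code from plzip. The names `Trailer_fields`, `get_le`, `parse_trailer` and `trailer_matches` are hypothetical, and the 20-byte little-endian layout (4 + 8 + 8 bytes) is an assumption based on the lzip file format description, which is not part of this diff. The other two factors mentioned in the manual (the value remaining in the range decoder and the end-of-stream marker) are not covered here.

```cpp
#include <stddef.h>
#include <stdint.h>

// Illustrative type, not taken from plzip: the three values that the
// manual says are stored in the member trailer.
struct Trailer_fields
  {
  uint32_t data_crc;      // CRC32 of the original (uncompressed) data
  uint64_t data_size;     // size of the original data
  uint64_t member_size;   // size of the member
  };

// Assumed layout: 4 + 8 + 8 bytes, all little-endian.
const size_t trailer_size = 20;

// Decode an unsigned little-endian value of 'bytes' bytes starting at 'p'.
uint64_t get_le( const uint8_t * const p, const int bytes )
  {
  uint64_t value = 0;
  for( int i = bytes - 1; i >= 0; --i ) value = ( value << 8 ) | p[i];
  return value;
  }

// Split a raw trailer of 'trailer_size' bytes into its three fields.
Trailer_fields parse_trailer( const uint8_t * const buf )
  {
  Trailer_fields t;
  t.data_crc = static_cast< uint32_t >( get_le( buf, 4 ) );
  t.data_size = get_le( buf + 4, 8 );
  t.member_size = get_le( buf + 12, 8 );
  return t;
  }

// Return true only if every value recomputed while decompressing agrees
// with the value stored in the trailer.
bool trailer_matches( const Trailer_fields & stored, const uint32_t crc,
                      const uint64_t data_size, const uint64_t member_size )
  {
  return stored.data_crc == crc && stored.data_size == data_size &&
         stored.member_size == member_size;
  }
```

A decompressor following this sketch would call `parse_trailer` on the last 20 bytes of a member and report a corrupt file when `trailer_matches` returns false. As the manual notes, such a check can only detect damage; it cannot recover the original data.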