Diffstat (limited to 'doc/zutils.texi')
-rw-r--r-- doc/zutils.texi 745
1 files changed, 745 insertions, 0 deletions
diff --git a/doc/zutils.texi b/doc/zutils.texi
new file mode 100644
index 0000000..09237d2
--- /dev/null
+++ b/doc/zutils.texi
@@ -0,0 +1,745 @@
+\input texinfo @c -*-texinfo-*-
+@c %**start of header
+@setfilename zutils.info
+@documentencoding ISO-8859-15
+@settitle Zutils Manual
+@finalout
+@c %**end of header
+
+@set UPDATED 1 February 2014
+@set VERSION 1.2
+
+@dircategory Data Compression
+@direntry
+* Zutils: (zutils). Utilities dealing with compressed files
+@end direntry
+
+
+@ifnothtml
+@titlepage
+@title Zutils
+@subtitle Utilities dealing with compressed files
+@subtitle for Zutils version @value{VERSION}, @value{UPDATED}
+@author by Antonio Diaz Diaz
+
+@page
+@vskip 0pt plus 1filll
+@end titlepage
+
+@contents
+@end ifnothtml
+
+@node Top
+@top
+
+This manual is for Zutils (version @value{VERSION}, @value{UPDATED}).
+
+@menu
+* Introduction:: Purpose and features of zutils
+* Common options:: Common options
+* The zutilsrc file:: The zutils configuration file
+* Zcat:: Concatenating compressed files
+* Zcmp:: Comparing compressed files byte by byte
+* Zdiff:: Comparing compressed files line by line
+* Zgrep:: Searching inside compressed files
+* Ztest:: Testing integrity of compressed files
+* Zupdate:: Recompressing files to lzip format
+* Problems:: Reporting bugs
+* Concept index:: Index of concepts
+@end menu
+
+@sp 1
+Copyright @copyright{} 2009, 2010, 2011, 2012, 2013, 2014
+Antonio Diaz Diaz.
+
+This manual is free documentation: you have unlimited permission
+to copy, distribute and modify it.
+
+
+@node Introduction
+@chapter Introduction
+@cindex introduction
+
+Zutils is a collection of utilities able to deal with any combination of
+compressed and uncompressed files transparently. If any given file,
+including standard input, is compressed, its decompressed content is
+used. Compressed files are decompressed on the fly; no temporary files
+are created.
+
+These utilities are not wrapper scripts but safer and more efficient C++
+programs. In particular the @samp{--recursive} option is very efficient
+in those utilities supporting it.
+
+@noindent
+The provided utilities are zcat, zcmp, zdiff, zgrep, ztest and zupdate.@*
+The supported formats are bzip2, gzip, lzip and xz.@*
+The compressor to be used for each format is configurable at runtime.
+
+Zcat, zcmp, zdiff, and zgrep are improved replacements for the shell
+scripts provided with GNU gzip. Ztest is unique to zutils. Zupdate is
+similar to gzip's znew.
+
+NOTE: Bzip2 and lzip provide well-defined values of exit status, which
+makes them safe to use with zutils. Gzip and xz may return ambiguous
+warning values, making them less reliable back ends for zutils.
+
+LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never
+have been compressed. Decompressed is used to refer to data which has
+undergone the process of decompression.
+
+@sp 1
+Numbers given as arguments to options (positions, sizes) may be followed
+by a multiplier and an optional @samp{B} for "byte".
+
+Table of SI and binary prefixes (unit multipliers):
+
+@multitable {Prefix} {kilobyte (10^3 = 1000)} {|} {Prefix} {kibibyte (2^10 = 1024)}
+@item Prefix @tab Value @tab | @tab Prefix @tab Value
+@item k @tab kilobyte (10^3 = 1000) @tab | @tab Ki @tab kibibyte (2^10 = 1024)
+@item M @tab megabyte (10^6) @tab | @tab Mi @tab mebibyte (2^20)
+@item G @tab gigabyte (10^9) @tab | @tab Gi @tab gibibyte (2^30)
+@item T @tab terabyte (10^12) @tab | @tab Ti @tab tebibyte (2^40)
+@item P @tab petabyte (10^15) @tab | @tab Pi @tab pebibyte (2^50)
+@item E @tab exabyte (10^18) @tab | @tab Ei @tab exbibyte (2^60)
+@item Z @tab zettabyte (10^21) @tab | @tab Zi @tab zebibyte (2^70)
+@item Y @tab yottabyte (10^24) @tab | @tab Yi @tab yobibyte (2^80)
+@end multitable
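+
+As an illustration of the table above (the file names and values are
+only examples), @samp{4KiB} means 4 * 1024 = 4096 bytes, whereas
+@samp{4kB} means 4 * 1000 = 4000 bytes. Since the @samp{B} is optional,
+@samp{4Ki} and @samp{4KiB} should denote the same value:
+
+@example
+zcmp --bytes=4KiB file1.lz file2.gz    # compare at most 4096 bytes
+zcmp --bytes=4k file1.lz file2.gz      # compare at most 4000 bytes
+@end example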
+
+
+@node Common options
+@chapter Common options
+@cindex common options
+
+The following options are available in all the utilities. Rather than
+writing identical descriptions for each of the programs, they are
+described here.
+
+@table @samp
+@item -h
+@itemx --help
+Print an informative help message describing the options and exit. Zgrep
+only supports the @samp{--help} form of this option.
+
+@item -V
+@itemx --version
+Print the version number on the standard output and exit.
+
+@item -N
+@itemx --no-rcfile
+Don't read the runtime configuration file @file{zutilsrc}.
+
+@item --bz2=@var{command}
+@itemx --gz=@var{command}
+@itemx --lz=@var{command}
+@itemx --xz=@var{command}
+Set the program (which may include arguments) to be used as the
+(de)compressor for the given format. These options override the values
+set in @file{zutilsrc}. The program must meet three requirements:
+
+@enumerate
+@item
+When called with the @samp{-d} option, it must read compressed data from
+the standard input and produce decompressed data on the standard output.
+@item
+If the @samp{-q} option is passed to zutils, the compression program
+must also accept it.
+@item
+It must return 0 if no errors occurred, and a non-zero value otherwise.
+@end enumerate
+
+@end table
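+
+For instance, assuming a parallel compressor such as @samp{plzip} is
+installed (an assumption made only for this example, as is the file
+name), it could be selected for a single run without editing
+@file{zutilsrc}:
+
+@example
+zcat --lz=plzip archive.tar.lz | tar -tf -
+@end example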
+
+
+@node The zutilsrc file
+@chapter The zutilsrc file
+@cindex the zutilsrc file
+
+@file{zutilsrc} is the runtime configuration file for zutils. In it you
+may define the compressor name and options to be used for each format.
+The @file{zutilsrc} file is optional; you do not need to install it in
+order to run zutils.
+
+The compressors specified in the command line override those specified
+in the @file{zutilsrc} file.
+
+You may copy the system @file{zutilsrc} file
+@file{$@{sysconfdir@}/zutilsrc} to @file{$HOME/.zutilsrc} and customize
+these options as you like. The file syntax is fairly obvious (and there
+are further instructions in it):
+
+@enumerate
+@item
+Any line beginning with @samp{#} is a comment line.
+@item
+Each non-comment line defines the command to be used for the given
+format, with the syntax:
+@example
+<format> = <compressor> [options]
+@end example
+where <format> is one of @samp{bz2}, @samp{gz}, @samp{lz} or @samp{xz}.
+@end enumerate
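+
+As a sketch of the syntax described above, a customized
+@file{$HOME/.zutilsrc} might look like this. The choice of @samp{plzip}
+and of its @samp{-n 2} argument is only an assumption made for
+illustration; any program meeting the three requirements for
+compression programs (@pxref{Common options}) may be used.
+
+@example
+# Hypothetical per-user configuration.
+bz2 = bzip2
+gz = gzip
+lz = plzip -n 2
+xz = xz
+@end example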
+
+
+@node Zcat
+@chapter Zcat
+@cindex zcat
+
+Zcat copies each given file (@samp{-} means standard input) to standard
+output. If any given file is compressed, its decompressed content is
+used. If a given file does not exist, and its name does not end with one
+of the known extensions, zcat tries the compressed file names
+corresponding to the supported formats.
+
+If no files are specified, data is read from standard input,
+decompressed if needed, and sent to standard output. Data read from
+standard input must be of the same type: all uncompressed or all in the
+same compression format.
+
+The format for running zcat is:
+
+@example
+zcat [@var{options}] [@var{files}]
+@end example
+
+@noindent
+Exit status is 0 if no errors occurred, non-zero otherwise.
+
+Zcat supports the following options:
+
+@table @samp
+@item -A
+@itemx --show-all
+Equivalent to @samp{-vET}.
+
+@item -b
+@itemx --number-nonblank
+Number all nonblank output lines, starting with 1. The line count is
+unlimited.
+
+@item -e
+Equivalent to @samp{-vE}.
+
+@item -E
+@itemx --show-ends
+Print a @samp{$} after the end of each line.
+
+@item --format=@var{fmt}
+Force the given compression format. Valid values for @var{fmt} are
+@samp{bz2}, @samp{gz}, @samp{lz} and @samp{xz}. If this option is used,
+the exact file name must be given. Other names won't be tried.
+
+@item -n
+@itemx --number
+Number all output lines, starting with 1. The line count is unlimited.
+
+@item -q
+@itemx --quiet
+Quiet operation. Suppress all messages.
+
+@item -r
+@itemx --recursive
+Operate recursively on directories.
+
+@item -s
+@itemx --squeeze-blank
+Replace multiple adjacent blank lines with a single blank line.
+
+@item -t
+Equivalent to @samp{-vT}.
+
+@item -T
+@itemx --show-tabs
+Print TAB characters as @samp{^I}.
+
+@item -v
+@itemx --show-nonprinting
+Print control characters except for LF (newline) and TAB using @samp{^}
+notation and precede characters larger than 127 with @samp{M-} (which
+stands for "meta").
+
+@item --verbose
+Verbose mode. Show error messages.
+
+@end table
+
+
+@node Zcmp
+@chapter Zcmp
+@cindex zcmp
+
+Zcmp compares two files (@samp{-} means standard input) and, if they
+differ, reports the byte and line number of the first difference. Bytes
+and lines are numbered starting with 1. If any given file is compressed,
+its decompressed content is used. Compressed files are decompressed on
+the fly; no temporary files are created.
+
+The format for running zcmp is:
+
+@example
+zcmp [@var{options}] @var{file1} [@var{file2}]
+@end example
+
+@noindent
+This compares @var{file1} to @var{file2}. If @var{file2} is omitted,
+zcmp tries the following:
+
+@enumerate
+@item
+If @var{file1} is compressed, compares its decompressed contents with
+the corresponding uncompressed file (the name of @var{file1} with the
+extension removed).
+@item
+If @var{file1} is uncompressed, compares it with the decompressed
+contents of @var{file1}.[lz|bz2|gz|xz] (the first one that is found).
+@item
+If no suitable file is found, compares @var{file1} with data read from
+standard input.
+@end enumerate
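+
+As a hypothetical illustration of these rules (the file names are
+invented), the first command below would compare the decompressed
+contents of @file{foo.tar.lz} with @file{foo.tar}, while the second
+would compare @file{foo.tar} with the first compressed version found:
+
+@example
+zcmp foo.tar.lz
+zcmp foo.tar
+@end example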
+
+@noindent
+An exit status of 0 means no differences were found, 1 means some
+differences were found, and 2 means trouble.
+
+Zcmp supports the following options:
+
+@table @samp
+@item -b
+@itemx --print-bytes
+Print the differing bytes. Print control bytes as a @samp{^} followed by
+a letter, and precede bytes larger than 127 with @samp{M-} (which stands
+for "meta").
+
+@item --format=[@var{fmt1}][,@var{fmt2}]
+Force the given compression formats. Either @var{fmt1} or @var{fmt2} may
+be omitted, and the corresponding format will be automatically detected.
+Valid values for each format are @samp{bz2}, @samp{gz}, @samp{lz} and
+@samp{xz}. If at least one format is specified with this option, the
+exact file names of both @var{file1} and @var{file2} must be given.
+Other names won't be tried.
+
+@item -i @var{size}
+@itemx --ignore-initial=@var{size}
+Ignore any differences in the first @var{size} bytes of the input files.
+Treat files with fewer than @var{size} bytes as if they were empty. If
+@var{size} is in the form @samp{@var{size1},@var{size2}}, ignore the
+first @var{size1} bytes of the first input file and the first
+@var{size2} bytes of the second input file.
+
+@item -l
+@itemx -v
+@itemx --list
+@itemx --verbose
+Print the byte numbers (in decimal) and values (in octal) of all
+differing bytes.
+
+@item -n @var{count}
+@itemx --bytes=@var{count}
+Compare at most @var{count} input bytes.
+
+@item -q
+@itemx -s
+@itemx --quiet
+@itemx --silent
+Do not print anything; only return an exit status indicating whether the
+files differ.
+
+@end table
+
+
+@node Zdiff
+@chapter Zdiff
+@cindex zdiff
+
+Zdiff compares two files (@samp{-} means standard input), and if they
+differ, shows the differences line by line. If any given file is
+compressed, its decompressed content is used. Zdiff is a front end to
+the diff program and has the limitation that messages from diff refer to
+temporary filenames instead of those specified.
+
+The format for running zdiff is:
+
+@example
+zdiff [@var{options}] @var{file1} [@var{file2}]
+@end example
+
+@noindent
+This compares @var{file1} to @var{file2}. If @var{file2} is omitted,
+zdiff tries the following:
+
+@enumerate
+@item
+If @var{file1} is compressed, compares its decompressed contents with
+the corresponding uncompressed file (the name of @var{file1} with the
+extension removed).
+@item
+If @var{file1} is uncompressed, compares it with the decompressed
+contents of @var{file1}.[lz|bz2|gz|xz] (the first one that is found).
+@item
+If no suitable file is found, compares @var{file1} with data read from
+standard input.
+@end enumerate
+
+@noindent
+An exit status of 0 means no differences were found, 1 means some
+differences were found, and 2 means trouble.
+
+Zdiff supports the following options:
+
+@table @samp
+@item -a
+@itemx --text
+Treat all files as text.
+
+@item -b
+@itemx --ignore-space-change
+Ignore changes in the amount of white space.
+
+@item -B
+@itemx --ignore-blank-lines
+Ignore changes whose lines are all blank.
+
+@item -c
+Use the context output format.
+
+@item -C @var{n}
+@itemx --context=@var{n}
+Same as @samp{-c} but use @var{n} lines of context.
+
+@item -d
+@itemx --minimal
+Try hard to find a smaller set of changes.
+
+@item -E
+@itemx --ignore-tab-expansion
+Ignore changes due to tab expansion.
+
+@item --format=[@var{fmt1}][,@var{fmt2}]
+Force the given compression formats. Either @var{fmt1} or @var{fmt2} may
+be omitted, and the corresponding format will be automatically detected.
+Valid values for each format are @samp{bz2}, @samp{gz}, @samp{lz} and
+@samp{xz}. If at least one format is specified with this option, the
+exact file names of both @var{file1} and @var{file2} must be given.
+Other names won't be tried.
+
+@item -i
+@itemx --ignore-case
+Ignore case differences in file contents.
+
+@item -p
+@itemx --show-c-function
+Show which C function each change is in.
+
+@item -q
+@itemx --brief
+Output only whether files differ.
+
+@item -s
+@itemx --report-identical-files
+Report when two files are identical.
+
+@item -t
+@itemx --expand-tabs
+Expand tabs to spaces in output.
+
+@item -T
+@itemx --initial-tab
+Make tabs line up by prepending a tab.
+
+@item -u
+Use the unified output format.
+
+@item -U @var{n}
+@itemx --unified=@var{n}
+Same as @samp{-u} but use @var{n} lines of context.
+
+@item -w
+@itemx --ignore-all-space
+Ignore all white space.
+
+@end table
+
+
+@node Zgrep
+@chapter Zgrep
+@cindex zgrep
+
+Zgrep is a front end to the grep program that allows transparent search
+on any combination of compressed and uncompressed files. If any given
+file is compressed, its decompressed content is used. If a given file
+does not exist, and its name does not end with one of the known
+extensions, zgrep tries the compressed file names corresponding to the
+supported formats.
+
+If no files are specified, data is read from standard input,
+decompressed if needed, and fed to grep. Data read from standard input
+must be of the same type: all uncompressed or all in the same
+compression format.
+
+The format for running zgrep is:
+
+@example
+zgrep [@var{options}] @var{pattern} [@var{files}]
+@end example
+
+@noindent
+An exit status of 0 means at least one match was found, 1 means no
+matches were found, and 2 means trouble.
+
+Zgrep supports the following options:
+
+@table @samp
+@item -a
+@itemx --text
+Treat all files as text.
+
+@item -A @var{n}
+@itemx --after-context=@var{n}
+Print @var{n} lines of trailing context.
+
+@item -b
+@itemx --byte-offset
+Print the byte offset of each line.
+
+@item -B @var{n}
+@itemx --before-context=@var{n}
+Print @var{n} lines of leading context.
+
+@item -c
+@itemx --count
+Only print a count of matching lines per file.
+
+@item -C @var{n}
+@itemx --context=@var{n}
+Print @var{n} lines of output context.
+
+@item -e @var{pattern}
+@itemx --regexp=@var{pattern}
+Use @var{pattern} as the pattern to match.
+
+@item -E
+@itemx --extended-regexp
+Treat @var{pattern} as an extended regular expression.
+
+@item -f @var{file}
+@itemx --file=@var{file}
+Obtain patterns from @var{file}, one per line.
+
+@item -F
+@itemx --fixed-strings
+Treat @var{pattern} as a set of newline-separated strings.
+
+@item --format=@var{fmt}
+Force the given compression format. Valid values for @var{fmt} are
+@samp{bz2}, @samp{gz}, @samp{lz} and @samp{xz}. If this option is used,
+the exact file name must be given. Other names won't be tried.
+
+@item -h
+@itemx --no-filename
+Suppress the prefixing of filenames on output when multiple files are
+searched.
+
+@item -H
+@itemx --with-filename
+Print the filename for each match.
+
+@item -i
+@itemx --ignore-case
+Ignore case distinctions.
+
+@item -I
+Ignore binary files.
+
+@item -l
+@itemx --files-with-matches
+Only print names of files containing at least one match.
+
+@item -L
+@itemx --files-without-match
+Only print names of files not containing any matches.
+
+@item -m @var{n}
+@itemx --max-count=@var{n}
+Stop after @var{n} matches.
+
+@item -n
+@itemx --line-number
+Prefix each matched line with its line number in the input file.
+
+@item -o
+@itemx --only-matching
+Show only the part of matching lines that actually matches @var{pattern}.
+
+@item -q
+@itemx --quiet
+Suppress all messages. Exit immediately with zero status if any match is
+found, even if an error was detected.
+
+@item -r
+@itemx --recursive
+Operate recursively on directories.
+
+@item -s
+@itemx --no-messages
+Suppress error messages about nonexistent or unreadable files.
+
+@item -v
+@itemx --invert-match
+Select non-matching lines.
+
+@item --verbose
+Verbose mode. Show error messages.
+
+@item -w
+@itemx --word-regexp
+Match only whole words.
+
+@item -x
+@itemx --line-regexp
+Match only whole lines.
+
+@end table
+
+
+@node Ztest
+@chapter Ztest
+@cindex ztest
+
+Ztest verifies the integrity of the specified compressed files.
+Uncompressed files are ignored. If no files are specified, the integrity
+of compressed data read from standard input is verified. Data read from
+standard input must be all in the same compression format.
+
+Note that some xz files lack integrity information, and therefore can't
+be verified as reliably as the other formats can.
+
+The format for running ztest is:
+
+@example
+ztest [@var{options}] [@var{files}]
+@end example
+
+@noindent
+The exit status is 0 if all compressed files verify OK, 1 if
+environmental problems (file not found, invalid flags, I/O errors, etc.)
+occurred, and 2 if any compressed file is corrupt or invalid.
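+
+A shell script might act on these values as follows. This is only a
+sketch; the file name and the messages are hypothetical:
+
+@example
+ztest -q backup.tar.lz
+case $? in
+  0) echo "file is OK" ;;
+  1) echo "file could not be tested" ;;
+  2) echo "file is corrupt or invalid" ;;
+esac
+@end example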
+
+Ztest supports the following options:
+
+@table @samp
+@item --format=@var{fmt}
+Force the given compression format. Valid values for @var{fmt} are
+@samp{bz2}, @samp{gz}, @samp{lz} and @samp{xz}. If this option is used,
+all files not in the given format will fail.
+
+@item -q
+@itemx --quiet
+Quiet operation. Suppress all messages.
+
+@item -r
+@itemx --recursive
+Operate recursively on directories.
+
+@item -v
+@itemx --verbose
+Verbose mode. Show the verification status for each file processed.@*
+Further @samp{-v} options increase the verbosity level.
+
+@end table
+
+
+@node Zupdate
+@chapter Zupdate
+@cindex zupdate
+
+Zupdate recompresses files from bzip2, gzip, and xz formats to lzip
+format. The originals are compared with the new files and then deleted.
+Only regular files with standard file name extensions are recompressed;
+other files are ignored. Compressed files are decompressed and then
+recompressed on the fly; no temporary files are created. The lzip format
+is chosen as destination because it is by far the most appropriate for
+long-term data archiving.
+
+If the lzip compressed version of a file already exists, the file is
+skipped unless the @samp{--force} option is given. With @samp{--force},
+if the comparison with the existing lzip version fails, an error is
+returned and the original file is not deleted. Zupdate is meant to be
+safe and not to cause any data loss. Therefore, existing lzip
+compressed files are never overwritten nor deleted.
+
+The names of the original files must have one of the following
+extensions: @samp{.bz2}, @samp{.tbz}, @samp{.tbz2}, @samp{.gz},
+@samp{.tgz}, @samp{.xz}, @samp{.txz}. The files produced have the
+extensions @samp{.lz} or @samp{.tar.lz}.
+
+The format for running zupdate is:
+
+@example
+zupdate [@var{options}] [@var{files}]
+@end example
+
+@noindent
+Exit status is 0 if all the compressed files were successfully
+recompressed (if needed), compared and deleted, and non-zero otherwise.
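+
+For example, assuming a directory named @file{backups} (a hypothetical
+name), the following would recompress every recognized file below it
+while keeping the originals and showing the files being processed:
+
+@example
+zupdate -r -k -v backups
+@end example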
+
+Zupdate supports the following options:
+
+@table @samp
+@item -f
+@itemx --force
+Do not skip a file for which a lzip compressed version already exists.
+@samp{--force} compares the content of the input file with the content
+of the existing lzip file and deletes the input file if both contents
+are identical.
+
+@item -k
+@itemx --keep
+Keep (don't delete) the input file after comparing it with the lzip file.
+
+@item -l
+@itemx --lzip-verbose
+Pass a @samp{-v} option to the lzip compressor so that it shows the
+compression ratio for each file processed. With lzip 1.15 or newer, a
+second @samp{-l} shows the progress of compression. Use it together with
+@samp{-v} to see the name of the file.
+
+@item -q
+@itemx --quiet
+Quiet operation. Suppress all messages.
+
+@item -r
+@itemx --recursive
+Operate recursively on directories.
+
+@item -v
+@itemx --verbose
+Verbose mode. Show the files being processed. A second @samp{-v} also
+shows the files being ignored.
+
+@item -0 .. -9
+Set the compression level of lzip. By default zupdate passes @samp{-9}
+to lzip.
+
+@end table
+
+
+@node Problems
+@chapter Reporting bugs
+@cindex bugs
+@cindex getting help
+
+There are probably bugs in zutils. There are certainly errors and
+omissions in this manual. If you report them, they will get fixed. If
+you don't, no one will ever know about them and they will remain unfixed
+for all eternity, if not longer.
+
+If you find a bug in zutils, please send electronic mail to
+@email{zutils-bug@@nongnu.org}. Include the version number, which you can
+find by running @w{@samp{zupdate --version}}.
+
+
+@node Concept index
+@unnumbered Concept index
+
+@printindex cp
+
+@bye