Diffstat (limited to 'doc/zutils.texi')
-rw-r--r-- doc/zutils.texi | 155
1 files changed, 91 insertions, 64 deletions
diff --git a/doc/zutils.texi b/doc/zutils.texi
index c494185..34a3128 100644
--- a/doc/zutils.texi
+++ b/doc/zutils.texi
@@ -6,10 +6,10 @@
@finalout
@c %**end of header
-@set UPDATED 5 January 2021
-@set VERSION 1.10
+@set UPDATED 25 January 2022
+@set VERSION 1.11
-@dircategory Data Compression
+@dircategory Compression
@direntry
* Zutils: (zutils). Utilities dealing with compressed files
@end direntry
@@ -50,7 +50,7 @@ This manual is for Zutils (version @value{VERSION}, @value{UPDATED}).
@end menu
@sp 1
-Copyright @copyright{} 2009-2021 Antonio Diaz Diaz.
+Copyright @copyright{} 2009-2022 Antonio Diaz Diaz.
This manual is free documentation: you have unlimited permission to copy,
distribute, and modify it.
@@ -74,7 +74,7 @@ those utilities supporting it.
@noindent
The utilities provided are zcat, zcmp, zdiff, zgrep, ztest, and zupdate.@*
-The formats supported are bzip2, gzip, lzip, and xz.@*
+The formats supported are bzip2, gzip, lzip, xz, and zstd.@*
Zutils uses external compressors. The compressor to be used for each format
is configurable at runtime.
@@ -84,12 +84,15 @@ gzip's znew.
NOTE: Bzip2 and lzip provide well-defined values of exit status, which makes
them safe to use with zutils. Gzip and xz may return ambiguous warning
-values, making them less reliable back ends for zutils.
+values, making them less reliable back ends for zutils. Zstd currently does
+not even document its exit status in its man page.
@xref{compressor-requirements}.
FORMAT NOTE 1: The option @samp{--format} allows the processing of a subset
-of formats in recursive mode and when trying compressed file names:
-@w{@samp{zgrep foo -r --format=bz2,lz somedir somefile.tar}}.
+of formats in recursive mode and when trying compressed file names. For
+example, use the following command to search for the string @samp{foo} in
+gzip and lzip files only:
+@w{@samp{zgrep foo -r --format=gz,lz somedir somefile.tar}}.
FORMAT NOTE 2: If the option @samp{--force-format} is given, the files are
passed to the corresponding decompressor without verifying their format,
@@ -141,17 +144,19 @@ only supports the @samp{--help} form of this option.
@itemx --version
Print the version number on the standard output and exit.
This version number should be included in all bug reports.
+In verbose mode, zdiff and zgrep also print the version of the diff or grep
+program used, respectively.
@item -M @var{format_list}
@itemx --format=@var{format_list}
-Process only the formats listed in the comma-separated
-@var{format_list}. Valid formats are @samp{bz2}, @samp{gz}, @samp{lz},
-@samp{xz}, and @samp{un} for @samp{uncompressed}, meaning "any file name
-without a known extension". This option excludes files based on
-extension, instead of format, because it is more efficient. The
-exclusion only applies to names generated automatically (for example
-when adding extensions to a file name or when operating recursively on
-directories). Files given in the command line are always processed.
+Process only the formats listed in the comma-separated @var{format_list}.
+Valid formats are @samp{bz2}, @samp{gz}, @samp{lz}, @samp{xz}, @samp{zst},
+and @samp{un} for @samp{uncompressed}, meaning "any file name without a
+known extension". This option excludes files based on extension, instead of
+format, because it is more efficient. The exclusion only applies to names
+generated automatically (for example when adding extensions to a file name
+or when operating recursively on directories). Files given in the command
+line are always processed.
Each format in @var{format_list} enables file names with the following
extensions:
@@ -161,6 +166,7 @@ extensions:
@item gz @tab enables @tab .gz .tgz
@item lz @tab enables @tab .lz .tlz
@item xz @tab enables @tab .xz .txz
+@item zst @tab enables @tab .zst .tzst
@item un @tab enables @tab any other file name
@end multitable
@@ -172,19 +178,21 @@ Don't read the runtime configuration file @samp{zutilsrc}.
@itemx --gz=@var{command}
@itemx --lz=@var{command}
@itemx --xz=@var{command}
+@itemx --zst=@var{command}
Set program to be used as (de)compressor for the corresponding format.
@var{command} may include arguments. For example
@w{@samp{--lz='plzip --threads=2'}}. The program set with @samp{--lz} is
-used for both compression and decompression. The other three are used only
-for decompression. The name of the program can't begin with @samp{-}. These
+used for both compression and decompression. The others are used only for
+decompression. The name of the program can't begin with @samp{-}. These
options override the values set in @file{zutilsrc}. The compression program
used must meet three requirements:
@anchor{compressor-requirements}
@enumerate
@item
-When called with the option @samp{-d}, it must read compressed data from
-the standard input and produce decompressed data on the standard output.
+When called with the option @samp{-d} and without file names, it must read
+compressed data from the standard input and produce decompressed data on the
+standard output.
@item
If the option @samp{-q} is passed to zutils, the compression program must
also accept it.
@@ -220,7 +228,8 @@ format, with the syntax:
@example
<format> = <compressor> [options]
@end example
-where <format> is one of @samp{bz2}, @samp{gz}, @samp{lz}, or @samp{xz}.
+where <format> is one of @samp{bz2}, @samp{gz}, @samp{lz}, @samp{xz}, or
+@samp{zst}.
@end enumerate
@@ -278,10 +287,10 @@ Number all output lines, starting with 1. The line count is unlimited.
@item -O @var{format}
@itemx --force-format=@var{format}
Force the compressed format given. Valid values for @var{format} are
-@samp{bz2}, @samp{gz}, @samp{lz}, and @samp{xz}. If this option is used,
-the files are passed to the corresponding decompressor without verifying
-their format, and the exact file name must be given. Other names won't
-be tried.
+@samp{bz2}, @samp{gz}, @samp{lz}, @samp{xz}, and @samp{zst}. If this option
+is used, the files are passed to the corresponding decompressor without
+verifying their format, and the exact file name must be given. Other names
+won't be tried.
@item -q
@itemx --quiet
@@ -350,7 +359,7 @@ the corresponding uncompressed file (the name of @var{file1} with the
extension removed).
@item
If @var{file1} is uncompressed, compares it with the decompressed
-contents of @var{file1}.[lz|bz2|gz|xz] (the first one that is found).
+contents of @var{file1}.[lz|bz2|gz|zst|xz] (the first one that is found).
@end itemize
@noindent
@@ -387,13 +396,13 @@ Compare at most @var{count} input bytes.
@item -O [@var{format1}][,@var{format2}]
@itemx --force-format=[@var{format1}][,@var{format2}]
-Force the compressed formats given. Any of @var{format1} or
-@var{format2} may be omitted and the corresponding format will be
-automatically detected. Valid values for @var{format} are @samp{bz2},
-@samp{gz}, @samp{lz}, and @samp{xz}. If at least one format is specified
-with this option, the file is passed to the corresponding decompressor
-without verifying its format, and the exact file names of both
-@var{file1} and @var{file2} must be given. Other names won't be tried.
+Force the compressed formats given. Any of @var{format1} or @var{format2}
+may be omitted and the corresponding format will be automatically detected.
+Valid values for @var{format} are @samp{bz2}, @samp{gz}, @samp{lz},
+@samp{xz}, and @samp{zst}. If at least one format is specified with this
+option, the file is passed to the corresponding decompressor without
+verifying its format, and the exact file names of both @var{file1} and
+@var{file2} must be given. Other names won't be tried.
@item -q
@itemx -s
@@ -434,7 +443,7 @@ the corresponding uncompressed file (the name of @var{file1} with the
extension removed).
@item
If @var{file1} is uncompressed, compares it with the decompressed
-contents of @var{file1}.[lz|bz2|gz|xz] (the first one that is found).
+contents of @var{file1}.[lz|bz2|gz|zst|xz] (the first one that is found).
@end itemize
@noindent
@@ -478,13 +487,13 @@ Ignore case differences in file contents.
@item -O [@var{format1}][,@var{format2}]
@itemx --force-format=[@var{format1}][,@var{format2}]
-Force the compressed formats given. Any of @var{format1} or
-@var{format2} may be omitted and the corresponding format will be
-automatically detected. Valid values for @var{format} are @samp{bz2},
-@samp{gz}, @samp{lz}, and @samp{xz}. If at least one format is specified
-with this option, the file is passed to the corresponding decompressor
-without verifying its format, and the exact file names of both
-@var{file1} and @var{file2} must be given. Other names won't be tried.
+Force the compressed formats given. Any of @var{format1} or @var{format2}
+may be omitted and the corresponding format will be automatically detected.
+Valid values for @var{format} are @samp{bz2}, @samp{gz}, @samp{lz},
+@samp{xz}, and @samp{zst}. If at least one format is specified with this
+option, the file is passed to the corresponding decompressor without
+verifying its format, and the exact file names of both @var{file1} and
+@var{file2} must be given. Other names won't be tried.
@item -p
@itemx --show-c-function
@@ -513,6 +522,11 @@ Use the unified output format.
@itemx --unified=@var{n}
Same as -u but use @var{n} lines of context.
+@item -v
+@itemx --verbose
+When specified before @samp{--version}, print the version of the diff
+program used.
+
@item -w
@itemx --ignore-all-space
Ignore all white space.
@@ -644,10 +658,10 @@ Show only the part of matching lines that actually matches @var{pattern}.
@item -O @var{format}
@itemx --force-format=@var{format}
Force the compressed format given. Valid values for @var{format} are
-@samp{bz2}, @samp{gz}, @samp{lz}, and @samp{xz}. If this option is used,
-the files are passed to the corresponding decompressor without verifying
-their format, and the exact file name must be given. Other names won't
-be tried.
+@samp{bz2}, @samp{gz}, @samp{lz}, @samp{xz}, and @samp{zst}. If this option
+is used, the files are passed to the corresponding decompressor without
+verifying their format, and the exact file name must be given. Other names
+won't be tried.
@item -q
@itemx --quiet
@@ -674,7 +688,8 @@ Suppress error messages about nonexistent or unreadable files.
Select non-matching lines.
@item --verbose
-Verbose mode. Show error messages.
+Verbose mode. Show error messages. When specified before @samp{--version},
+print the version of the grep program used.
@item -w
@itemx --word-regexp
@@ -703,6 +718,10 @@ test when testing multiple files.
If no files are specified, recursive searches examine the current working
directory, and nonrecursive searches read standard input.
+Bzip2, gzip, and lzip are the primary formats. Xz and zstd are optional. If
+the decompressor for the xz or zstd formats is not found, the corresponding
+files are ignored.
+
Note that error detection in the xz format is broken. First, some xz
files lack integrity information. Second, not all xz decompressors can
@uref{http://www.nongnu.org/lzip/xz_inadequate.html#fragmented,,verify the integrity}
@@ -730,11 +749,11 @@ ztest supports the following options:
@item -O @var{format}
@itemx --force-format=@var{format}
Force the compressed format given. Valid values for @var{format} are
-@samp{bz2}, @samp{gz}, @samp{lz}, and @samp{xz}. If this option is used, the
-files are passed to the corresponding decompressor without verifying their
-format, and any files in a format that the decompressor can't understand
-will fail. For example, @samp{--force-format=gz} can test gzipped (.gz) and
-compress'd (.Z) files if the compressor used is GNU gzip.
+@samp{bz2}, @samp{gz}, @samp{lz}, @samp{xz}, and @samp{zst}. If this option
+is used, the files are passed to the corresponding decompressor without
+verifying their format, and any files in a format that the decompressor
+can't understand will fail. For example, @samp{--force-format=gz} can test
+gzipped (.gz) and compress'd (.Z) files if the compressor used is GNU gzip.
@item -q
@itemx --quiet
@@ -763,14 +782,14 @@ Further -v's increase the verbosity level.
@chapter Zupdate
@cindex zupdate
-zupdate recompresses files from bzip2, gzip, and xz formats to lzip format.
-Each original is compared with the new file and then deleted. Only regular
-files with standard file name extensions are recompressed, other files are
-ignored. Compressed files are decompressed and then recompressed on the fly;
-no temporary files are created. If an error happens while recompressing a
-file, zupdate exits immediately without recompressing the rest of the files.
-The lzip format is chosen as destination because it is the most appropriate
-for long-term data archiving.
+zupdate recompresses files from bzip2, gzip, xz, and zstd formats to lzip
+format. Each original is compared with the new file and then deleted. Only
+regular files with standard file name extensions are recompressed; other
+files are ignored. Compressed files are decompressed and then recompressed
+on the fly; no temporary files are created. If an error happens while
+recompressing a file, zupdate exits immediately without recompressing the
+rest of the files. The lzip format is chosen as destination because it is
+the most appropriate for long-term data archiving.
If no files are specified, recursive searches examine the current working
directory, and nonrecursive searches do nothing.
@@ -782,21 +801,29 @@ and the original file is not deleted. The operation of zupdate is meant
to be safe and not cause any data loss. Therefore, existing lzip
compressed files are never overwritten nor deleted.
+Recompressing files from a read-only file system to another place can be
+done by first creating symbolic links to the files in the destination
+directory and then recompressing the links: @w{@samp{ln -s /src/foo.gz . && zupdate foo.gz}}
+
Combining the options @samp{--force} and @samp{--keep}, as in
@w{@samp{zupdate -f -k *.gz}}, verifies that there are no differences
between each pair of files in a multiformat set of files.
The names of the original files must have one of the following extensions:@*
-@samp{.bz2}, @samp{.gz}, or @samp{.xz}, which are recompressed to
-@samp{.lz};@*
-@samp{.tbz}, @samp{.tbz2}, @samp{.tgz}, or @samp{.txz}, which are
-recompressed to @samp{.tlz}.@*
+@samp{.bz2}, @samp{.gz}, @samp{.xz}, or @samp{.zst}, which are recompressed
+to @samp{.lz};@*
+@samp{.tbz}, @samp{.tbz2}, @samp{.tgz}, @samp{.txz}, or @samp{.tzst}, which
+are recompressed to @samp{.tlz}.@*
Keeping the combined extensions (@samp{.tgz} --> @samp{.tlz}) may be useful
when recompressing Slackware packages, for example.
+Bzip2, gzip, and lzip are the primary formats. Xz and zstd are optional. If
+the decompressor for the xz or zstd formats is not found, the corresponding
+files are ignored.
+
Recompressing a file is much like copying or moving it; therefore zupdate
preserves the access and modification dates, permissions, and, when
-possible, ownership of the file just as @samp{cp -p} does. (If the user ID or
+possible, ownership of the file just as @w{@samp{cp -p}} does. (If the user ID or
the group ID can't be duplicated, the file permission bits S_ISUID and
S_ISGID are cleared).
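The extension rules added to the zupdate chapter in this diff (single extensions recompressed to `.lz`, combined tar extensions recompressed to `.tlz`) can be illustrated with a small name-mapping sketch. This is not zutils code: `zupdate_destination` is a hypothetical helper written only to show the mapping stated in the hunk above. It assumes that a name consisting of nothing but an extension is not recompressed, and it ignores everything else zupdate does (format verification, comparison, deletion, metadata preservation).

```python
from typing import Optional

# Extension lists copied from the zupdate hunk of this diff (zutils 1.11).
TO_LZ = (".bz2", ".gz", ".xz", ".zst")
TO_TLZ = (".tbz", ".tbz2", ".tgz", ".txz", ".tzst")


def zupdate_destination(name: str) -> Optional[str]:
    """Return the lzip file name for `name`, or None if zupdate would
    ignore it (no standard extension). Hypothetical helper, not part of zutils."""
    # Check the combined tar extensions first, because '.tgz' also ends
    # with '.gz', '.tbz2' with '.bz2', '.tzst' with '.zst', and so on.
    for ext in TO_TLZ:
        # Assumption: a bare extension with no base name is not recompressed.
        if name.endswith(ext) and len(name) > len(ext):
            return name[: -len(ext)] + ".tlz"
    for ext in TO_LZ:
        if name.endswith(ext) and len(name) > len(ext):
            return name[: -len(ext)] + ".lz"
    # Existing .lz/.tlz files and uncompressed names are left alone.
    return None
```

For example, `zupdate_destination("pkg.tgz")` gives `"pkg.tlz"`, keeping the combined extension as the Slackware note describes, while `zupdate_destination("notes.txt")` gives `None`.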