From 29cd838eab01ed7110f3ccb2e8c6a35c8a31dbcc Mon Sep 17 00:00:00 2001 From: Daniel Baumann Date: Thu, 11 Apr 2024 10:21:29 +0200 Subject: Adding upstream version 1:0.1.9998svn3589+dfsg. Signed-off-by: Daniel Baumann --- src/grep/doc/grep.texi | 2109 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 2109 insertions(+) create mode 100644 src/grep/doc/grep.texi (limited to 'src/grep/doc/grep.texi') diff --git a/src/grep/doc/grep.texi b/src/grep/doc/grep.texi new file mode 100644 index 0000000..01ac81e --- /dev/null +++ b/src/grep/doc/grep.texi @@ -0,0 +1,2109 @@ +\input texinfo @c -*-texinfo-*- +@c %**start of header +@setfilename grep.info +@include version.texi +@settitle GNU Grep @value{VERSION} + +@c Combine indices. +@syncodeindex ky cp +@syncodeindex pg cp +@syncodeindex tp cp +@defcodeindex op +@syncodeindex op cp +@syncodeindex vr cp +@c %**end of header + +@documentencoding UTF-8 +@c These two require Texinfo 5.0 or later, so use the older +@c equivalent @set variables supported in 4.11 and later. +@ignore +@codequotebacktick on +@codequoteundirected on +@end ignore +@set txicodequoteundirected +@set txicodequotebacktick +@iftex +@c TeX sometimes fails to hyphenate, so help it here. +@hyphenation{spec-i-fied} +@end iftex + +@copying +This manual is for @command{grep}, a pattern matching engine. + +Copyright @copyright{} 1999--2002, 2005, 2008--2021 Free Software Foundation, +Inc. + +@quotation +Permission is granted to copy, distribute and/or modify this document +under the terms of the GNU Free Documentation License, Version 1.3 or +any later version published by the Free Software Foundation; with no +Invariant Sections, with no Front-Cover Texts, and with no Back-Cover +Texts. A copy of the license is included in the section entitled +``GNU Free Documentation License''. +@end quotation +@end copying + +@dircategory Text creation and manipulation +@direntry +* grep: (grep). Print lines that match patterns. 
+@end direntry + +@titlepage +@title GNU Grep: Print lines that match patterns +@subtitle version @value{VERSION}, @value{UPDATED} +@author Alain Magloire et al. +@page +@vskip 0pt plus 1filll +@insertcopying +@end titlepage + +@contents + + +@ifnottex +@node Top +@top grep + +@command{grep} prints lines that contain a match for one or more patterns. + +This manual is for version @value{VERSION} of GNU Grep. + +@insertcopying +@end ifnottex + +@menu +* Introduction:: Introduction. +* Invoking:: Command-line options, environment, exit status. +* Regular Expressions:: Regular Expressions. +* Usage:: Examples. +* Performance:: Performance tuning. +* Reporting Bugs:: Reporting Bugs. +* Copying:: License terms for this manual. +* Index:: Combined index. +@end menu + + +@node Introduction +@chapter Introduction + +@cindex searching for patterns + +Given one or more patterns, @command{grep} searches input files +for matches to the patterns. +When it finds a match in a line, +it copies the line to standard output (by default), +or produces whatever other sort of output you have requested with options. + +Though @command{grep} expects to do the matching on text, +it has no limits on input line length other than available memory, +and it can match arbitrary characters within a line. +If the final byte of an input file is not a newline, +@command{grep} silently supplies one. +Since newline is also a separator for the list of patterns, +there is no way to match newline characters in a text. + + +@node Invoking +@chapter Invoking @command{grep} + +The general synopsis of the @command{grep} command line is + +@example +grep [@var{option}...] [@var{patterns}] [@var{file}...] +@end example + +@noindent +There can be zero or more @var{option} arguments, and zero or more +@var{file} arguments. The @var{patterns} argument contains one or +more patterns separated by newlines, and is omitted when patterns are +given via the @samp{-e@ @var{patterns}} or @samp{-f@ @var{file}} +options. 
Typically @var{patterns} should be quoted when +@command{grep} is used in a shell command. + +@menu +* Command-line Options:: Short and long names, grouped by category. +* Environment Variables:: POSIX, GNU generic, and GNU grep specific. +* Exit Status:: Exit status returned by @command{grep}. +* grep Programs:: @command{grep} programs. +@end menu + +@node Command-line Options +@section Command-line Options + +@command{grep} comes with a rich set of options: +some from POSIX and some being GNU extensions. +Long option names are always a GNU extension, +even for options that are from POSIX specifications. +Options that are specified by POSIX, +under their short names, +are explicitly marked as such +to facilitate POSIX-portable programming. +A few option names are provided +for compatibility with older or more exotic implementations. + +@menu +* Generic Program Information:: +* Matching Control:: +* General Output Control:: +* Output Line Prefix Control:: +* Context Line Control:: +* File and Directory Selection:: +* Other Options:: +@end menu + +Several additional options control +which variant of the @command{grep} matching engine is used. +@xref{grep Programs}. + +@node Generic Program Information +@subsection Generic Program Information + +@table @option + +@item --help +@opindex --help +@cindex usage summary, printing +Print a usage message briefly summarizing the command-line options +and the bug-reporting address, then exit. + +@item -V +@itemx --version +@opindex -V +@opindex --version +@cindex version, printing +Print the version number of @command{grep} to the standard output stream. +This version number should be included in all bug reports. 
+ +@end table + +@node Matching Control +@subsection Matching Control + +@table @option + +@item -e @var{patterns} +@itemx --regexp=@var{patterns} +@opindex -e +@opindex --regexp=@var{patterns} +@cindex patterns option +Use @var{patterns} as one or more patterns; newlines within +@var{patterns} separate each pattern from the next. +If this option is used multiple times or is combined with the +@option{-f} (@option{--file}) option, search for all patterns given. +Typically @var{patterns} should be quoted when @command{grep} is used +in a shell command. +(@option{-e} is specified by POSIX.) + +@item -f @var{file} +@itemx --file=@var{file} +@opindex -f +@opindex --file +@cindex patterns from file +Obtain patterns from @var{file}, one per line. +If this option is used multiple times or is combined with the +@option{-e} (@option{--regexp}) option, search for all patterns given. +The empty file contains zero patterns, and therefore matches nothing. +(@option{-f} is specified by POSIX.) + +@item -i +@itemx -y +@itemx --ignore-case +@opindex -i +@opindex -y +@opindex --ignore-case +@cindex case insensitive search +Ignore case distinctions in patterns and input data, +so that characters that differ only in case +match each other. Although this is straightforward when letters +differ in case only via lowercase-uppercase pairs, the behavior is +unspecified in other situations. For example, uppercase ``S'' has an +unusual lowercase counterpart ``ſ'' (Unicode character U+017F, LATIN +SMALL LETTER LONG S) in many locales, and it is unspecified whether +this unusual character matches ``S'' or ``s'' even though uppercasing +it yields ``S''. Another example: the lowercase German letter ``ß'' +(U+00DF, LATIN SMALL LETTER SHARP S) is normally capitalized as the +two-character string ``SS'' but it does not match ``SS'', and it might +not match the uppercase letter ``ẞ'' (U+1E9E, LATIN CAPITAL LETTER +SHARP S) even though lowercasing the latter yields the former. 
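+
+As a small illustration (the sample input is invented for this
+sketch), the following command selects both of the first two lines,
+which differ from the pattern only in case:
+
+@example
+printf 'Hello\nHELLO\nworld\n' | grep -i 'hello'
+# prints:
+#   Hello
+#   HELLO
+@end example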
+ +@option{-y} is an obsolete synonym that is provided for compatibility. +(@option{-i} is specified by POSIX.) + +@item --no-ignore-case +@opindex --no-ignore-case +Do not ignore case distinctions in patterns and input data. This is +the default. This option is useful for passing to shell scripts that +already use @option{-i}, in order to cancel its effects because the +two options override each other. + +@item -v +@itemx --invert-match +@opindex -v +@opindex --invert-match +@cindex invert matching +@cindex print non-matching lines +Invert the sense of matching, to select non-matching lines. +(@option{-v} is specified by POSIX.) + +@item -w +@itemx --word-regexp +@opindex -w +@opindex --word-regexp +@cindex matching whole words +Select only those lines containing matches that form whole words. +The test is that the matching substring must either +be at the beginning of the line, +or preceded by a non-word constituent character. +Similarly, +it must be either at the end of the line +or followed by a non-word constituent character. +Word constituent characters are letters, digits, and the underscore. +This option has no effect if @option{-x} is also specified. + +Because the @option{-w} option can match a substring that does not +begin and end with word constituents, it differs from surrounding a +regular expression with @samp{\<} and @samp{\>}. For example, although +@samp{grep -w @@} matches a line containing only @samp{@@}, @samp{grep +'\<@@\>'} cannot match any line because @samp{@@} is not a +word constituent. @xref{The Backslash Character and Special +Expressions}. + +@item -x +@itemx --line-regexp +@opindex -x +@opindex --line-regexp +@cindex match the whole line +Select only those matches that exactly match the whole line. +For regular expression patterns, this is like parenthesizing each +pattern and then surrounding it with @samp{^} and @samp{$}. +(@option{-x} is specified by POSIX.) 
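+
+As a rough sketch of how @option{-w} and @option{-x} differ (the
+three input lines are made up for illustration):
+
+@example
+printf 'foo\nfoobar\nfoo bar\n' | grep -w 'foo'
+# prints "foo" and "foo bar", but not "foobar"
+printf 'foo\nfoobar\nfoo bar\n' | grep -x 'foo'
+# prints only "foo"
+@end example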
+ +@end table + +@node General Output Control +@subsection General Output Control + +@table @option + +@item -c +@itemx --count +@opindex -c +@opindex --count +@cindex counting lines +Suppress normal output; +instead print a count of matching lines for each input file. +With the @option{-v} (@option{--invert-match}) option, +count non-matching lines. +(@option{-c} is specified by POSIX.) + +@item --color[=@var{WHEN}] +@itemx --colour[=@var{WHEN}] +@opindex --color +@opindex --colour +@cindex highlight, color, colour +Surround the matched (non-empty) strings, matching lines, context lines, +file names, line numbers, byte offsets, and separators (for fields and +groups of context lines) with escape sequences to display them in color +on the terminal. +The colors are defined by the environment variable @env{GREP_COLORS} +and default to @samp{ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36} +for bold red matched text, magenta file names, green line numbers, +green byte offsets, cyan separators, and default terminal colors otherwise. +The deprecated environment variable @env{GREP_COLOR} is still supported, +but its setting does not have priority; +it defaults to @samp{01;31} (bold red) +which only covers the color for matched text. +@var{WHEN} is @samp{never}, @samp{always}, or @samp{auto}. + +@item -L +@itemx --files-without-match +@opindex -L +@opindex --files-without-match +@cindex files which don't match +Suppress normal output; +instead print the name of each input file from which +no output would normally have been printed. + +@item -l +@itemx --files-with-matches +@opindex -l +@opindex --files-with-matches +@cindex names of matching files +Suppress normal output; +instead print the name of each input file from which +output would normally have been printed. +Scanning each input file stops upon first match. +(@option{-l} is specified by POSIX.) 
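+
+The following sketch contrasts @option{-c}, @option{-l}, and
+@option{-L}; the temporary directory and the file names are
+hypothetical and exist only for this illustration:
+
+@example
+dir=$(mktemp -d) && cd "$dir"
+printf 'alpha\nbeta\n' > a.txt
+printf 'gamma\n' > b.txt
+grep -c 'beta' a.txt b.txt    # prints "a.txt:1" and "b.txt:0"
+grep -l 'beta' a.txt b.txt    # prints "a.txt"
+grep -L 'beta' a.txt b.txt    # prints "b.txt"
+@end example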
+ +@item -m @var{num} +@itemx --max-count=@var{num} +@opindex -m +@opindex --max-count +@cindex max-count +Stop after the first @var{num} selected lines. +If the input is standard input from a regular file, +and @var{num} selected lines are output, +@command{grep} ensures that the standard input is positioned +just after the last selected line before exiting, +regardless of the presence of trailing context lines. +This enables a calling process to resume a search. +For example, the following shell script makes use of it: + +@example +while grep -m 1 'PATTERN' +do + echo xxxx +done < FILE +@end example + +But the following probably will not work because a pipe is not a regular +file: + +@example +# This probably will not work. +cat FILE | +while grep -m 1 'PATTERN' +do + echo xxxx +done +@end example + +@cindex context lines +When @command{grep} stops after @var{num} selected lines, +it outputs any trailing context lines. +When the @option{-c} or @option{--count} option is also used, +@command{grep} does not output a count greater than @var{num}. +When the @option{-v} or @option{--invert-match} option is also used, +@command{grep} stops after outputting @var{num} non-matching lines. + +@item -o +@itemx --only-matching +@opindex -o +@opindex --only-matching +@cindex only matching +Print only the matched (non-empty) parts of matching lines, +with each such part on a separate output line. +Output lines use the same delimiters as input, and delimiters are null +bytes if @option{-z} (@option{--null-data}) is also used (@pxref{Other +Options}). + +@item -q +@itemx --quiet +@itemx --silent +@opindex -q +@opindex --quiet +@opindex --silent +@cindex quiet, silent +Quiet; do not write anything to standard output. +Exit immediately with zero status if any match is found, +even if an error was detected. +Also see the @option{-s} or @option{--no-messages} option. +(@option{-q} is specified by POSIX.) 
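+
+A typical use of @option{-q} is to test for a match in a shell
+conditional.  This is only a sketch, with made-up input:
+
+@example
+if printf 'alpha\nbeta\n' | grep -q 'beta'; then
+  echo 'found'
+else
+  echo 'not found'
+fi
+# prints "found"
+@end example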
+ +@item -s +@itemx --no-messages +@opindex -s +@opindex --no-messages +@cindex suppress error messages +Suppress error messages about nonexistent or unreadable files. +Portability note: +unlike GNU @command{grep}, +7th Edition Unix @command{grep} did not conform to POSIX, +because it lacked @option{-q} +and its @option{-s} option behaved like +GNU @command{grep}'s @option{-q} option.@footnote{Of course, 7th Edition +Unix predated POSIX by several years!} +USG-style @command{grep} also lacked @option{-q} +but its @option{-s} option behaved like GNU @command{grep}'s. +Portable shell scripts should avoid both +@option{-q} and @option{-s} and should redirect +standard and error output to @file{/dev/null} instead. +(@option{-s} is specified by POSIX.) + +@end table + +@node Output Line Prefix Control +@subsection Output Line Prefix Control + +When several prefix fields are to be output, +the order is always file name, line number, and byte offset, +regardless of the order in which these options were specified. + +@table @option + +@item -b +@itemx --byte-offset +@opindex -b +@opindex --byte-offset +@cindex byte offset +Print the 0-based byte offset within the input file +before each line of output. +If @option{-o} (@option{--only-matching}) is specified, +print the offset of the matching part itself. + +@item -H +@itemx --with-filename +@opindex -H +@opindex --with-filename +@cindex with filename prefix +Print the file name for each match. +This is the default when there is more than one file to search. + +@item -h +@itemx --no-filename +@opindex -h +@opindex --no-filename +@cindex no filename prefix +Suppress the prefixing of file names on output. +This is the default when there is only one file +(or only standard input) to search. + +@item --label=@var{LABEL} +@opindex --label +@cindex changing name of standard input +Display input actually coming from standard input +as input coming from file @var{LABEL}. 
+This can be useful for commands that transform a file's contents +before searching; e.g.: + +@example +gzip -cd foo.gz | grep --label=foo -H 'some pattern' +@end example + +@item -n +@itemx --line-number +@opindex -n +@opindex --line-number +@cindex line numbering +Prefix each line of output with the 1-based line number within its input file. +(@option{-n} is specified by POSIX.) + +@item -T +@itemx --initial-tab +@opindex -T +@opindex --initial-tab +@cindex tab-aligned content lines +Make sure that the first character of actual line content lies on a tab stop, +so that the alignment of tabs looks normal. +This is useful with options that prefix their output to the actual content: +@option{-H}, @option{-n}, and @option{-b}. +This may also prepend spaces to output line numbers and byte offsets +so that lines from a single file all start at the same column. + +@item -Z +@itemx --null +@opindex -Z +@opindex --null +@cindex zero-terminated file names +Output a zero byte (the ASCII NUL character) +instead of the character that normally follows a file name. +For example, +@samp{grep -lZ} outputs a zero byte after each file name +instead of the usual newline. +This option makes the output unambiguous, +even in the presence of file names containing unusual characters like newlines. +This option can be used with commands like +@samp{find -print0}, @samp{perl -0}, @samp{sort -z}, and @samp{xargs -0} +to process arbitrary file names, +even those that contain newline characters. + +@end table + +@node Context Line Control +@subsection Context Line Control + +@cindex context lines +@dfn{Context lines} are non-matching lines that are near a matching line. +They are output only if one of the following options is used. +Regardless of how these options are set, +@command{grep} never outputs any given line more than once. +If the @option{-o} (@option{--only-matching}) option is specified, +these options have no effect and a warning is given upon their use. 
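+
+As an illustration of the options described below, and of the
+@samp{:}, @samp{-}, and @samp{--} separators, consider this sketch
+(the input is generated only for the example):
+
+@example
+printf '%s\n' a b MATCH d e f g MATCH i | grep -n -C 1 'MATCH'
+# prints:
+#   2-b
+#   3:MATCH
+#   4-d
+#   --
+#   7-g
+#   8:MATCH
+#   9-i
+@end example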
+ +@table @option + +@item -A @var{num} +@itemx --after-context=@var{num} +@opindex -A +@opindex --after-context +@cindex after context +@cindex context lines, after match +Print @var{num} lines of trailing context after matching lines. + +@item -B @var{num} +@itemx --before-context=@var{num} +@opindex -B +@opindex --before-context +@cindex before context +@cindex context lines, before match +Print @var{num} lines of leading context before matching lines. + +@item -C @var{num} +@itemx -@var{num} +@itemx --context=@var{num} +@opindex -C +@opindex --context +@opindex -@var{num} +@cindex context lines +Print @var{num} lines of leading and trailing output context. + +@item --group-separator=@var{string} +@opindex --group-separator +@cindex group separator +When @option{-A}, @option{-B} or @option{-C} is in use, +print @var{string} instead of @option{--} between groups of lines. + +@item --no-group-separator +@opindex --no-group-separator +@cindex group separator +When @option{-A}, @option{-B} or @option{-C} is in use, +do not print a separator between groups of lines. + +@end table + +Here are some points about how @command{grep} chooses +the separator to print between prefix fields and line content: + +@itemize @bullet +@item +Matching lines normally use @samp{:} as a separator +between prefix fields and actual line content. + +@item +Context (i.e., non-matching) lines use @samp{-} instead. + +@item +When context is not specified, +matching lines are simply output one right after another. + +@item +When context is specified, +lines that are adjacent in the input form a group +and are output one right after another, while +by default a separator appears between non-adjacent groups. + +@item +The default separator +is a @samp{--} line; its presence and appearance +can be changed with the options above. 
+ +@item +Each group may contain +several matching lines when they are close enough to each other +that two adjacent groups connect and can merge into a single +contiguous one. +@end itemize + +@node File and Directory Selection +@subsection File and Directory Selection + +@table @option + +@item -a +@itemx --text +@opindex -a +@opindex --text +@cindex suppress binary data +@cindex binary files +Process a binary file as if it were text; +this is equivalent to the @samp{--binary-files=text} option. + +@item --binary-files=@var{type} +@opindex --binary-files +@cindex binary files +If a file's data or metadata +indicate that the file contains binary data, +assume that the file is of type @var{type}. +Non-text bytes indicate binary data; these are either output bytes that are +improperly encoded for the current locale (@pxref{Environment +Variables}), or null input bytes when the +@option{-z} (@option{--null-data}) option is not given (@pxref{Other +Options}). + +By default, @var{type} is @samp{binary}, and @command{grep} +suppresses output after null input binary data is discovered, +and suppresses output lines that contain improperly encoded data. +When some output is suppressed, @command{grep} follows any output +with a one-line message saying that a binary file matches. + +If @var{type} is @samp{without-match}, +when @command{grep} discovers null input binary data +it assumes that the rest of the file does not match; +this is equivalent to the @option{-I} option. + +If @var{type} is @samp{text}, +@command{grep} processes binary data as if it were text; +this is equivalent to the @option{-a} option. + +When @var{type} is @samp{binary}, @command{grep} may treat non-text +bytes as line terminators even without the @option{-z} +(@option{--null-data}) option. This means choosing @samp{binary} +versus @samp{text} can affect whether a pattern matches a file. 
For +example, when @var{type} is @samp{binary} the pattern @samp{q$} might +match @samp{q} immediately followed by a null byte, even though this +is not matched when @var{type} is @samp{text}. Conversely, when +@var{type} is @samp{binary} the pattern @samp{.} (period) might not +match a null byte. + +@emph{Warning:} The @option{-a} (@option{--binary-files=text}) option +might output binary garbage, which can have nasty side effects if the +output is a terminal and if the terminal driver interprets some of it +as commands. On the other hand, when reading files whose text +encodings are unknown, it can be helpful to use @option{-a} or to set +@samp{LC_ALL='C'} in the environment, in order to find more matches +even if the matches are unsafe for direct display. + +@item -D @var{action} +@itemx --devices=@var{action} +@opindex -D +@opindex --devices +@cindex device search +If an input file is a device, FIFO, or socket, use @var{action} to process it. +If @var{action} is @samp{read}, +all devices are read just as if they were ordinary files. +If @var{action} is @samp{skip}, +devices, FIFOs, and sockets are silently skipped. +By default, devices are read if they are on the command line or if the +@option{-R} (@option{--dereference-recursive}) option is used, and are +skipped if they are encountered recursively and the @option{-r} +(@option{--recursive}) option is used. +This option has no effect on a file that is read via standard input. + +@item -d @var{action} +@itemx --directories=@var{action} +@opindex -d +@opindex --directories +@cindex directory search +@cindex symbolic links +If an input file is a directory, use @var{action} to process it. +By default, @var{action} is @samp{read}, +which means that directories are read just as if they were ordinary files +(some operating systems and file systems disallow this, +and will cause @command{grep} +to print error messages for every directory or silently skip them). 
+If @var{action} is @samp{skip}, directories are silently skipped. +If @var{action} is @samp{recurse}, +@command{grep} reads all files under each directory, recursively, +following command-line symbolic links and skipping other symlinks; +this is equivalent to the @option{-r} option. + +@item --exclude=@var{glob} +@opindex --exclude +@cindex exclude files +@cindex searching directory trees +Skip any command-line file with a name suffix that matches the pattern +@var{glob}, using wildcard matching; a name suffix is either the whole +name, or a trailing part that starts with a non-slash character +immediately after a slash (@samp{/}) in the name. +When searching recursively, skip any subfile whose base +name matches @var{glob}; the base name is the part after the last +slash. A pattern can use +@samp{*}, @samp{?}, and @samp{[}...@samp{]} as wildcards, +and @code{\} to quote a wildcard or backslash character literally. + +@item --exclude-from=@var{file} +@opindex --exclude-from +@cindex exclude files +@cindex searching directory trees +Skip files whose name matches any of the patterns +read from @var{file} (using wildcard matching as described +under @option{--exclude}). + +@item --exclude-dir=@var{glob} +@opindex --exclude-dir +@cindex exclude directories +Skip any command-line directory with a name suffix that matches the +pattern @var{glob}. When searching recursively, skip any subdirectory +whose base name matches @var{glob}. Ignore any redundant trailing +slashes in @var{glob}. + +@item -I +Process a binary file as if it did not contain matching data; +this is equivalent to the @samp{--binary-files=without-match} option. + +@item --include=@var{glob} +@opindex --include +@cindex include files +@cindex searching directory trees +Search only files whose name matches @var{glob}, +using wildcard matching as described under @option{--exclude}. +If contradictory @option{--include} and @option{--exclude} options are +given, the last matching one wins. 
If no @option{--include} or +@option{--exclude} options match, a file is included unless the first +such option is @option{--include}. + +@item -r +@itemx --recursive +@opindex -r +@opindex --recursive +@cindex recursive search +@cindex searching directory trees +@cindex symbolic links +For each directory operand, +read and process all files in that directory, recursively. +Follow symbolic links on the command line, but skip symlinks +that are encountered recursively. +Note that if no file operand is given, grep searches the working directory. +This is the same as the @samp{--directories=recurse} option. + +@item -R +@itemx --dereference-recursive +@opindex -R +@opindex --dereference-recursive +@cindex recursive search +@cindex searching directory trees +@cindex symbolic links +For each directory operand, read and process all files in that +directory, recursively, following all symbolic links. + +@end table + +@node Other Options +@subsection Other Options + +@table @option + +@item -- +@opindex -- +@cindex option delimiter +Delimit the option list. Later arguments, if any, are treated as +operands even if they begin with @samp{-}. For example, @samp{grep PAT -- +-file1 file2} searches for the pattern PAT in the files named @file{-file1} +and @file{file2}. + +@item --line-buffered +@opindex --line-buffered +@cindex line buffering +Use line buffering for standard output, regardless of output device. +By default, standard output is line buffered for interactive devices, +and is fully buffered otherwise. With full buffering, the output +buffer is flushed when full; with line buffering, the buffer is also +flushed after every output line. The buffer size is system dependent. 
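+
+As a sketch of when this matters, consider a pipeline whose producer
+emits lines slowly (simulated here with @command{sleep}; the messages
+are made up).  With @option{--line-buffered}, each matching line should
+reach @command{cat} as soon as it is read, rather than when the
+buffer fills or the input ends:
+
+@example
+(echo 'ERROR one'; sleep 1; echo 'ERROR two') |
+  grep --line-buffered 'ERROR' | cat
+@end example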
+ +@item -U +@itemx --binary +@opindex -U +@opindex --binary +@cindex MS-Windows binary I/O +@cindex binary I/O +On platforms that distinguish between text and binary I/O, +use the latter when reading and writing files other +than the user's terminal, so that all input bytes are read and written +as-is. This overrides the default behavior where @command{grep} +follows the operating system's advice whether to use text or binary +I/O@. On MS-Windows when @command{grep} uses text I/O it reads a +carriage return--newline pair as a newline and a Control-Z as +end-of-file, and it writes a newline as a carriage return--newline +pair. + +When using text I/O @option{--byte-offset} (@option{-b}) counts and +@option{--binary-files} heuristics apply to input data after text-I/O +processing. Also, the @option{--binary-files} heuristics need not agree +with the @option{--binary} option; that is, they may treat the data as +text even if @option{--binary} is given, or vice versa. +@xref{File and Directory Selection}. + +This option has no effect on GNU and other POSIX-compatible platforms, +which do not distinguish text from binary I/O. + +@item -z +@itemx --null-data +@opindex -z +@opindex --null-data +@cindex zero-terminated lines +Treat input and output data as sequences of lines, each terminated by +a zero byte (the ASCII NUL character) instead of a newline. +Like the @option{-Z} or @option{--null} option, +this option can be used with commands like +@samp{sort -z} to process arbitrary file names. + +@end table + +@node Environment Variables +@section Environment Variables + +The behavior of @command{grep} is affected +by the following environment variables. 
+ +@vindex LANGUAGE @r{environment variable} +@vindex LC_ALL @r{environment variable} +@vindex LC_MESSAGES @r{environment variable} +@vindex LANG @r{environment variable} +The locale for category @w{@code{LC_@var{foo}}} +is specified by examining the three environment variables +@env{LC_ALL}, @w{@env{LC_@var{foo}}}, and @env{LANG}, +in that order. +The first of these variables that is set specifies the locale. +For example, if @env{LC_ALL} is not set, +but @env{LC_COLLATE} is set to @samp{pt_BR}, +then the Brazilian Portuguese locale is used +for the @env{LC_COLLATE} category. +As a special case for @env{LC_MESSAGES} only, the environment variable +@env{LANGUAGE} can contain a colon-separated list of languages that +overrides the three environment variables that ordinarily specify +the @env{LC_MESSAGES} category. +The @samp{C} locale is used if none of these environment variables are set, +if the locale catalog is not installed, +or if @command{grep} was not compiled +with national language support (NLS). +The shell command @code{locale -a} lists locales that are currently available. + +Many of the environment variables in the following list let you +control highlighting using +Select Graphic Rendition (SGR) +commands interpreted by the terminal or terminal emulator. +(See the +section +in the documentation of your text terminal +for permitted values and their meanings as character attributes.) +These substring values are integers in decimal representation +and can be concatenated with semicolons. +@command{grep} takes care of assembling the result +into a complete SGR sequence (@samp{\33[}...@samp{m}). 
+Common values to concatenate include +@samp{1} for bold, +@samp{4} for underline, +@samp{5} for blink, +@samp{7} for inverse, +@samp{39} for default foreground color, +@samp{30} to @samp{37} for foreground colors, +@samp{90} to @samp{97} for 16-color mode foreground colors, +@samp{38;5;0} to @samp{38;5;255} +for 88-color and 256-color modes foreground colors, +@samp{49} for default background color, +@samp{40} to @samp{47} for background colors, +@samp{100} to @samp{107} for 16-color mode background colors, +and @samp{48;5;0} to @samp{48;5;255} +for 88-color and 256-color modes background colors. + +The two-letter names used in the @env{GREP_COLORS} environment variable +(and some of the others) refer to terminal ``capabilities,'' the ability +of a terminal to highlight text, or change its color, and so on. +These capabilities are stored in an online database and accessed by +the @code{terminfo} library. + +@cindex environment variables + +@table @env + +@item GREP_COLOR +@vindex GREP_COLOR @r{environment variable} +@cindex highlight markers +This variable specifies the color used to highlight matched (non-empty) text. +It is deprecated in favor of @env{GREP_COLORS}, but still supported. +The @samp{mt}, @samp{ms}, and @samp{mc} capabilities of @env{GREP_COLORS} +have priority over it. +It can only specify the color used to highlight +the matching non-empty text in any matching line +(a selected line when the @option{-v} command-line option is omitted, +or a context line when @option{-v} is specified). +The default is @samp{01;31}, +which means a bold red foreground text on the terminal's default background. + +@item GREP_COLORS +@vindex GREP_COLORS @r{environment variable} +@cindex highlight markers +This variable specifies the colors and other attributes +used to highlight various parts of the output. 
+Its value is a colon-separated list of @code{terminfo} capabilities +that defaults to @samp{ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36} +with the @samp{rv} and @samp{ne} boolean capabilities omitted (i.e., false). +Supported capabilities are as follows. + +@table @code +@item sl= +@vindex sl GREP_COLORS @r{capability} +SGR substring for whole selected lines +(i.e., +matching lines when the @option{-v} command-line option is omitted, +or non-matching lines when @option{-v} is specified). +If however the boolean @samp{rv} capability +and the @option{-v} command-line option are both specified, +it applies to context matching lines instead. +The default is empty (i.e., the terminal's default color pair). + +@item cx= +@vindex cx GREP_COLORS @r{capability} +SGR substring for whole context lines +(i.e., +non-matching lines when the @option{-v} command-line option is omitted, +or matching lines when @option{-v} is specified). +If however the boolean @samp{rv} capability +and the @option{-v} command-line option are both specified, +it applies to selected non-matching lines instead. +The default is empty (i.e., the terminal's default color pair). + +@item rv +@vindex rv GREP_COLORS @r{capability} +Boolean value that reverses (swaps) the meanings of +the @samp{sl=} and @samp{cx=} capabilities +when the @option{-v} command-line option is specified. +The default is false (i.e., the capability is omitted). + +@item mt=01;31 +@vindex mt GREP_COLORS @r{capability} +SGR substring for matching non-empty text in any matching line +(i.e., +a selected line when the @option{-v} command-line option is omitted, +or a context line when @option{-v} is specified). +Setting this is equivalent to setting both @samp{ms=} and @samp{mc=} +at once to the same value. +The default is a bold red text foreground over the current line background. + +@item ms=01;31 +@vindex ms GREP_COLORS @r{capability} +SGR substring for matching non-empty text in a selected line. 
+(This is used only when the @option{-v} command-line option is omitted.) +The effect of the @samp{sl=} (or @samp{cx=} if @samp{rv}) capability +remains active when this takes effect. +The default is a bold red text foreground over the current line background. + +@item mc=01;31 +@vindex mc GREP_COLORS @r{capability} +SGR substring for matching non-empty text in a context line. +(This is used only when the @option{-v} command-line option is specified.) +The effect of the @samp{cx=} (or @samp{sl=} if @samp{rv}) capability +remains active when this takes effect. +The default is a bold red text foreground over the current line background. + +@item fn=35 +@vindex fn GREP_COLORS @r{capability} +SGR substring for file names prefixing any content line. +The default is a magenta text foreground over the terminal's default background. + +@item ln=32 +@vindex ln GREP_COLORS @r{capability} +SGR substring for line numbers prefixing any content line. +The default is a green text foreground over the terminal's default background. + +@item bn=32 +@vindex bn GREP_COLORS @r{capability} +SGR substring for byte offsets prefixing any content line. +The default is a green text foreground over the terminal's default background. + +@item se=36 +@vindex se GREP_COLORS @r{capability} +SGR substring for separators that are inserted +between selected line fields (@samp{:}), +between context line fields (@samp{-}), +and between groups of adjacent lines +when nonzero context is specified (@samp{--}). +The default is a cyan text foreground over the terminal's default background. + +@item ne +@vindex ne GREP_COLORS @r{capability} +Boolean value that prevents clearing to the end of line +using Erase in Line (EL) to Right (@samp{\33[K}) +each time a colorized item ends. +This is needed on terminals on which EL is not supported.
+It is otherwise useful on terminals +for which the @code{back_color_erase} +(@code{bce}) boolean @code{terminfo} capability does not apply, +when the chosen highlight colors do not affect the background, +or when EL is too slow or causes too much flicker. +The default is false (i.e., the capability is omitted). +@end table + +Note that boolean capabilities have no @samp{=}... part. +They are omitted (i.e., false) by default and become true when specified. + + +@item LC_ALL +@itemx LC_COLLATE +@itemx LANG +@vindex LC_ALL @r{environment variable} +@vindex LC_COLLATE @r{environment variable} +@vindex LANG @r{environment variable} +@cindex character type +@cindex national language support +@cindex NLS +These variables specify the locale for the @env{LC_COLLATE} category, +which might affect how range expressions like @samp{[a-z]} are +interpreted. + +@item LC_ALL +@itemx LC_CTYPE +@itemx LANG +@vindex LC_ALL @r{environment variable} +@vindex LC_CTYPE @r{environment variable} +@vindex LANG @r{environment variable} +@cindex encoding error +@cindex null character +These variables specify the locale for the @env{LC_CTYPE} category, +which determines the type of characters, +e.g., which characters are whitespace. +This category also determines the character encoding. +@xref{Character Encoding}. + +@item LANGUAGE +@itemx LC_ALL +@itemx LC_MESSAGES +@itemx LANG +@vindex LANGUAGE @r{environment variable} +@vindex LC_ALL @r{environment variable} +@vindex LC_MESSAGES @r{environment variable} +@vindex LANG @r{environment variable} +@cindex language of messages +@cindex message language +@cindex national language support +@cindex translation of message language +These variables specify the locale for the @env{LC_MESSAGES} category, +which determines the language that @command{grep} uses for messages. +The default @samp{C} locale uses American English messages. 
+ +@item POSIXLY_CORRECT +@vindex POSIXLY_CORRECT @r{environment variable} +If set, @command{grep} behaves as POSIX requires; otherwise, +@command{grep} behaves more like other GNU programs. +POSIX +requires that options that +follow file names must be treated as file names; +by default, +such options are permuted to the front of the operand list +and are treated as options. +Also, @env{POSIXLY_CORRECT} disables special handling of an +invalid bracket expression. @xref{invalid-bracket-expr}. + +@item _@var{N}_GNU_nonoption_argv_flags_ +@vindex _@var{N}_GNU_nonoption_argv_flags_ @r{environment variable} +(Here @code{@var{N}} is @command{grep}'s numeric process ID.) +If the @var{i}th character of this environment variable's value is @samp{1}, +do not consider the @var{i}th operand of @command{grep} to be an option, +even if it appears to be one. +A shell can put this variable in the environment for each command it runs, +specifying which operands are the results of file name wildcard expansion +and therefore should not be treated as options. +This behavior is available only with the GNU C library, +and only when @env{POSIXLY_CORRECT} is not set. + +@end table + +The @env{GREP_OPTIONS} environment variable of @command{grep} 2.20 and +earlier is no longer supported, as it caused problems when writing +portable scripts. To make arbitrary changes to how @command{grep} +works, you can use an alias or script instead. For example, if +@command{grep} is in the directory @samp{/usr/bin} you can prepend +@file{$HOME/bin} to your @env{PATH} and create an executable script +@file{$HOME/bin/grep} containing the following: + +@example +#! /bin/sh +export PATH=/usr/bin +exec grep --color=auto --devices=skip "$@@" +@end example + + +@node Exit Status +@section Exit Status +@cindex exit status +@cindex return status + +Normally the exit status is 0 if a line is selected, 1 if no lines +were selected, and 2 if an error occurred. 
However, if the +@option{-q} or @option{--quiet} or @option{--silent} option is used +and a line is selected, the exit status is 0 even if an error +occurred. Other @command{grep} implementations may exit with status +greater than 2 on error. + +@node grep Programs +@section @command{grep} Programs +@cindex @command{grep} programs +@cindex variants of @command{grep} + +@command{grep} searches the named input files +for lines containing a match to the given patterns. +By default, @command{grep} prints the matching lines. +A file named @file{-} stands for standard input. +If no input is specified, @command{grep} searches the working +directory @file{.} if given a command-line option specifying +recursion; otherwise, @command{grep} searches standard input. +There are four major variants of @command{grep}, +controlled by the following options. + +@table @option + +@item -G +@itemx --basic-regexp +@opindex -G +@opindex --basic-regexp +@cindex matching basic regular expressions +Interpret patterns as basic regular expressions (BREs). +This is the default. + +@item -E +@itemx --extended-regexp +@opindex -E +@opindex --extended-regexp +@cindex matching extended regular expressions +Interpret patterns as extended regular expressions (EREs). +(@option{-E} is specified by POSIX.) + +@item -F +@itemx --fixed-strings +@opindex -F +@opindex --fixed-strings +@cindex matching fixed strings +Interpret patterns as fixed strings, not regular expressions. +(@option{-F} is specified by POSIX.) + +@item -P +@itemx --perl-regexp +@opindex -P +@opindex --perl-regexp +@cindex matching Perl-compatible regular expressions +Interpret patterns as Perl-compatible regular expressions (PCREs). +PCRE support is here to stay, but consider this option experimental when +combined with the @option{-z} (@option{--null-data}) option, and note that +@samp{grep@ -P} may warn of unimplemented features. +@xref{Other Options}. 
+ +@end table + +In addition, +two variant programs @command{egrep} and @command{fgrep} are available. +@command{egrep} is the same as @samp{grep@ -E}. +@command{fgrep} is the same as @samp{grep@ -F}. +Direct invocation as either +@command{egrep} or @command{fgrep} is deprecated, +but is provided to allow historical applications +that rely on them to run unmodified. + + +@node Regular Expressions +@chapter Regular Expressions +@cindex regular expressions + +A @dfn{regular expression} is a pattern that describes a set of strings. +Regular expressions are constructed analogously to arithmetic expressions, +by using various operators to combine smaller expressions. +@command{grep} understands +three different versions of regular expression syntax: +basic (BRE), extended (ERE), and Perl-compatible (PCRE). +In GNU @command{grep}, +there is no difference in available functionality between the basic and +extended syntaxes. +In other implementations, basic regular expressions are less powerful. +The following description applies to extended regular expressions; +differences for basic regular expressions are summarized afterwards. +Perl-compatible regular expressions give additional functionality, and +are documented in the @i{pcresyntax}(3) and @i{pcrepattern}(3) manual +pages, but work only if PCRE is available in the system. + +@menu +* Fundamental Structure:: +* Character Classes and Bracket Expressions:: +* The Backslash Character and Special Expressions:: +* Anchoring:: +* Back-references and Subexpressions:: +* Basic vs Extended:: +* Character Encoding:: +* Matching Non-ASCII:: +@end menu + +@node Fundamental Structure +@section Fundamental Structure + +@cindex ordinary characters +@cindex special characters +In regular expressions, the characters @samp{.?*+@{|()[\^$} are +@dfn{special characters} and have uses described below. All other +characters are @dfn{ordinary characters}, and each ordinary character +is a regular expression that matches itself. + +@opindex . 
+@cindex dot +@cindex period +The period @samp{.} matches any single character. +It is unspecified whether @samp{.} matches an encoding error. + +@cindex interval expressions +A regular expression may be followed by one of several +repetition operators; the operators beginning with @samp{@{} +are called @dfn{interval expressions}. + +@table @samp + +@item ? +@opindex ? +@cindex question mark +@cindex match expression at most once +The preceding item is optional and is matched at most once. + +@item * +@opindex * +@cindex asterisk +@cindex match expression zero or more times +The preceding item is matched zero or more times. + +@item + +@opindex + +@cindex plus sign +@cindex match expression one or more times +The preceding item is matched one or more times. + +@item @{@var{n}@} +@opindex @{@var{n}@} +@cindex braces, one argument +@cindex match expression @var{n} times +The preceding item is matched exactly @var{n} times. + +@item @{@var{n},@} +@opindex @{@var{n},@} +@cindex braces, second argument omitted +@cindex match expression @var{n} or more times +The preceding item is matched @var{n} or more times. + +@item @{,@var{m}@} +@opindex @{,@var{m}@} +@cindex braces, first argument omitted +@cindex match expression at most @var{m} times +The preceding item is matched at most @var{m} times. +This is a GNU extension. + +@item @{@var{n},@var{m}@} +@opindex @{@var{n},@var{m}@} +@cindex braces, two arguments +@cindex match expression from @var{n} to @var{m} times +The preceding item is matched at least @var{n} times, but not more than +@var{m} times. + +@end table + +The empty regular expression matches the empty string. +Two regular expressions may be concatenated; +the resulting regular expression +matches any string formed by concatenating two substrings +that respectively match the concatenated expressions. 
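As a rough illustration of the repetition operators listed above, the following shell sketch counts how many sample lines each operator matches. The sample strings and the helper function name @code{sample} are invented for this example. @option{-E} selects extended syntax, @option{-x} requires the pattern to match a whole line, and @option{-c} prints a count instead of the lines; all three are documented elsewhere in this manual.

```shell
# Hypothetical sample input: an 'a', then zero to three 'b's, then a 'c'.
sample() { printf '%s\n' 'ac' 'abc' 'abbc' 'abbbc'; }

optional=$(sample | grep -E -x -c 'ab?c')      # ac, abc
star=$(sample | grep -E -x -c 'ab*c')          # all four lines
plus=$(sample | grep -E -x -c 'ab+c')          # abc, abbc, abbbc
exact=$(sample | grep -E -x -c 'ab{2}c')       # abbc only
range=$(sample | grep -E -x -c 'ab{2,3}c')     # abbc, abbbc

echo "$optional $star $plus $exact $range"     # prints: 2 4 3 1 2
```

Each pattern is itself a concatenation of three smaller expressions (@samp{a}, a repeated @samp{b}, and @samp{c}), so the same sketch also shows concatenation at work.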
+ +Two regular expressions may be joined by the infix operator @samp{|}; +the resulting regular expression +matches any string matching either alternate expression. + +Repetition takes precedence over concatenation, +which in turn takes precedence over alternation. +A whole expression may be enclosed in parentheses +to override these precedence rules and form a subexpression. +An unmatched @samp{)} matches just itself. + +@node Character Classes and Bracket Expressions +@section Character Classes and Bracket Expressions + +@cindex bracket expression +@cindex character class +A @dfn{bracket expression} is a list of characters enclosed by @samp{[} and +@samp{]}. +It matches any single character in that list. +If the first character of the list is the caret @samp{^}, +then it matches any character @strong{not} in the list, +and it is unspecified whether it matches an encoding error. +For example, the regular expression +@samp{[0123456789]} matches any single digit, +whereas @samp{[^()]} matches any single character that is not +an opening or closing parenthesis, and might or might not match an +encoding error. + +@cindex range expression +Within a bracket expression, a @dfn{range expression} consists of two +characters separated by a hyphen. +It matches any single character that +sorts between the two characters, inclusive. +In the default C locale, the sorting sequence is the native character +order; for example, @samp{[a-d]} is equivalent to @samp{[abcd]}. +In other locales, the sorting sequence is not specified, and +@samp{[a-d]} might be equivalent to @samp{[abcd]} or to +@samp{[aBbCcDd]}, or it might fail to match any character, or the set of +characters that it matches might even be erratic. +To obtain the traditional interpretation +of bracket expressions, you can use the @samp{C} locale by setting the +@env{LC_ALL} environment variable to the value @samp{C}. + +Finally, certain named classes of characters are predefined within +bracket expressions, as follows. 
+Their interpretation depends on the @env{LC_CTYPE} locale; +for example, @samp{[[:alnum:]]} means the character class of numbers and letters +in the current locale. + +@cindex classes of characters +@cindex character classes +@table @samp + +@item [:alnum:] +@opindex alnum @r{character class} +@cindex alphanumeric characters +Alphanumeric characters: +@samp{[:alpha:]} and @samp{[:digit:]}; in the @samp{C} locale and ASCII +character encoding, this is the same as @samp{[0-9A-Za-z]}. + +@item [:alpha:] +@opindex alpha @r{character class} +@cindex alphabetic characters +Alphabetic characters: +@samp{[:lower:]} and @samp{[:upper:]}; in the @samp{C} locale and ASCII +character encoding, this is the same as @samp{[A-Za-z]}. + +@item [:blank:] +@opindex blank @r{character class} +@cindex blank characters +Blank characters: +space and tab. + +@item [:cntrl:] +@opindex cntrl @r{character class} +@cindex control characters +Control characters. +In ASCII, these characters have octal codes 000 +through 037, and 177 (DEL). +In other character sets, these are +the equivalent characters, if any. + +@item [:digit:] +@opindex digit @r{character class} +@cindex digit characters +@cindex numeric characters +Digits: @code{0 1 2 3 4 5 6 7 8 9}. + +@item [:graph:] +@opindex graph @r{character class} +@cindex graphic characters +Graphical characters: +@samp{[:alnum:]} and @samp{[:punct:]}. + +@item [:lower:] +@opindex lower @r{character class} +@cindex lower-case letters +Lower-case letters; in the @samp{C} locale and ASCII character +encoding, this is +@code{a b c d e f g h i j k l m n o p q r s t u v w x y z}. + +@item [:print:] +@opindex print @r{character class} +@cindex printable characters +Printable characters: +@samp{[:alnum:]}, @samp{[:punct:]}, and space. 
+ +@item [:punct:] +@opindex punct @r{character class} +@cindex punctuation characters +Punctuation characters; in the @samp{C} locale and ASCII character +encoding, this is +@code{!@: " # $ % & ' ( ) * + , - .@: / : ; < = > ?@: @@ [ \ ] ^ _ ` @{ | @} ~}. + +@item [:space:] +@opindex space @r{character class} +@cindex space characters +@cindex whitespace characters +Space characters: in the @samp{C} locale, this is +tab, newline, vertical tab, form feed, carriage return, and space. +@xref{Usage}, for more discussion of matching newlines. + +@item [:upper:] +@opindex upper @r{character class} +@cindex upper-case letters +Upper-case letters: in the @samp{C} locale and ASCII character +encoding, this is +@code{A B C D E F G H I J K L M N O P Q R S T U V W X Y Z}. + +@item [:xdigit:] +@opindex xdigit @r{character class} +@cindex xdigit class +@cindex hexadecimal digits +Hexadecimal digits: +@code{0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f}. + +@end table +Note that the brackets in these class names are +part of the symbolic names, and must be included in addition to +the brackets delimiting the bracket expression. + +@anchor{invalid-bracket-expr} +If you mistakenly omit the outer brackets, and search for say, @samp{[:upper:]}, +GNU @command{grep} prints a diagnostic and exits with status 2, on +the assumption that you did not intend to search for the nominally +equivalent regular expression: @samp{[:epru]}. +Set the @env{POSIXLY_CORRECT} environment variable to disable this feature. + +Special characters lose their special meaning inside bracket expressions. + +@table @samp +@item ] +ends the bracket expression if it's not the first list item. +So, if you want to make the @samp{]} character a list item, +you must put it first. + +@item [. +represents the open collating symbol. + +@item .] +represents the close collating symbol. + +@item [= +represents the open equivalence class. + +@item =] +represents the close equivalence class. 
+ +@item [: +represents the open character class symbol, and should be followed by a +valid character class name. + +@item :] +represents the close character class symbol. + +@item - +represents the range if it's not first or last in a list or the ending point +of a range. + +@item ^ +represents the characters not in the list. +If you want to make the @samp{^} +character a list item, place it anywhere but first. + +@end table + +@node The Backslash Character and Special Expressions +@section The Backslash Character and Special Expressions +@cindex backslash + +The @samp{\} character followed by a special character is a regular +expression that matches the special character. +The @samp{\} character, +when followed by certain ordinary characters, +takes a special meaning: + +@table @samp + +@item \b +Match the empty string at the edge of a word. + +@item \B +Match the empty string provided it's not at the edge of a word. + +@item \< +Match the empty string at the beginning of a word. + +@item \> +Match the empty string at the end of a word. + +@item \w +Match a word constituent; it is a synonym for @samp{[_[:alnum:]]}. + +@item \W +Match a non-word constituent; it is a synonym for @samp{[^_[:alnum:]]}. + +@item \s +Match whitespace; it is a synonym for @samp{[[:space:]]}. + +@item \S +Match non-whitespace; it is a synonym for @samp{[^[:space:]]}. + +@end table + +For example, @samp{\brat\b} matches the separate word @samp{rat}, +whereas @samp{\Brat\B} matches @samp{crate} but not @samp{furry rat}. + +@node Anchoring +@section Anchoring +@cindex anchoring + +The caret @samp{^} and the dollar sign @samp{$} are special characters that +respectively match the empty string at the beginning and end of a line. +They are termed @dfn{anchors}, since they force the match to be ``anchored'' +to the beginning or end of a line, respectively.
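The effect of the two anchors can be sketched with a few sample lines. The input strings and the helper function name @code{lines} are made up for illustration, and @option{-c} (print a count of matching lines) is documented elsewhere in this manual.

```shell
# Hypothetical sample input; the last argument produces an empty line.
lines() { printf '%s\n' 'rat' 'a rat' 'rats' ''; }

at_start=$(lines | grep -c '^rat')    # rat, rats
at_end=$(lines | grep -c 'rat$')      # rat, a rat
whole=$(lines | grep -c '^rat$')      # rat only
empty=$(lines | grep -c '^$')         # the empty line only

echo "$at_start $at_end $whole $empty"   # prints: 2 2 1 1
```

Using both anchors together, as in @samp{^rat$}, restricts the match to lines that consist of nothing but the pattern.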
+ +@node Back-references and Subexpressions +@section Back-references and Subexpressions +@cindex subexpression +@cindex back-reference + +The back-reference @samp{\@var{n}}, +where @var{n} is a single nonzero digit, matches +the substring previously matched by the @var{n}th parenthesized subexpression +of the regular expression. +For example, @samp{(a)\1} matches @samp{aa}. +If the parenthesized subexpression does not participate in the match, +the back-reference makes the whole match fail; +for example, @samp{(a)*\1} fails to match @samp{a}. +If the parenthesized subexpression matches more than one substring, +the back-reference refers to the last matched substring; +for example, @samp{^(ab*)*\1$} matches @samp{ababbabb} but not @samp{ababbab}. +When multiple regular expressions are given with +@option{-e} or from a file (@samp{-f @var{file}}), +back-references are local to each expression. + +@xref{Known Bugs}, for some known problems with back-references. + +@node Basic vs Extended +@section Basic vs Extended Regular Expressions +@cindex basic regular expressions + +In basic regular expressions the characters @samp{?}, @samp{+}, +@samp{@{}, @samp{|}, @samp{(}, and @samp{)} lose their special meaning; +instead use the backslashed versions @samp{\?}, @samp{\+}, @samp{\@{}, +@samp{\|}, @samp{\(}, and @samp{\)}. Also, a backslash is needed +before an interval expression's closing @samp{@}}, and an unmatched +@code{\)} is invalid. + +Portable scripts should avoid the following constructs, as +POSIX says they produce undefined results: + +@itemize @bullet +@item +Extended regular expressions that use back-references. +@item +Basic regular expressions that use @samp{\?}, @samp{\+}, or @samp{\|}. +@item +Empty parenthesized regular expressions like @samp{()}. +@item +Empty alternatives (as in, e.g., @samp{a|}). +@item +Repetition operators that immediately follow empty expressions, +unescaped @samp{$}, or other repetition operators.
+@item +A backslash escaping an ordinary character (e.g., @samp{\S}), +unless it is a back-reference. +@item +An unescaped @samp{[} that is not part of a bracket expression. +@item +In extended regular expressions, an unescaped @samp{@{} that is not +part of an interval expression. +@end itemize + +@cindex interval expressions +Traditional @command{egrep} did not support interval expressions and +some @command{egrep} implementations use @samp{\@{} and @samp{\@}} instead, so +portable scripts should avoid interval expressions in @samp{grep@ -E} patterns +and should use @samp{[@{]} to match a literal @samp{@{}. + +GNU @command{grep@ -E} attempts to support traditional usage by +assuming that @samp{@{} is not special if it would be the start of an +invalid interval expression. +For example, the command +@samp{grep@ -E@ '@{1'} searches for the two-character string @samp{@{1} +instead of reporting a syntax error in the regular expression. +POSIX allows this behavior as an extension, but portable scripts +should avoid it. + +@node Character Encoding +@section Character Encoding +@cindex character encoding + +The @env{LC_CTYPE} locale specifies the encoding of characters in +patterns and data, that is, whether text is encoded in UTF-8, ASCII, +or some other encoding. @xref{Environment Variables}. + +In the @samp{C} or @samp{POSIX} locale, every character is encoded as +a single byte and every byte is a valid character. In more-complex +encodings such as UTF-8, a sequence of multiple bytes may be needed to +represent a character, and some bytes may be encoding errors that do +not contribute to the representation of any character. POSIX does not +specify the behavior of @command{grep} when patterns or input data +contain encoding errors or null characters, so portable scripts should +avoid such usage. As an extension to POSIX, GNU @command{grep} treats +null characters like any other character. 
However, unless the +@option{-a} (@option{--binary-files=text}) option is used, the +presence of null characters in input or of encoding errors in output +causes GNU @command{grep} to treat the file as binary and suppress +details about matches. @xref{File and Directory Selection}. + +Regardless of locale, the 103 characters in the POSIX Portable +Character Set (a subset of ASCII) are always encoded as a single byte, +and the 128 ASCII characters have their usual single-byte encodings on +all but oddball platforms. + +@node Matching Non-ASCII +@section Matching Non-ASCII and Non-printable Characters +@cindex non-ASCII matching +@cindex non-printable matching + +In a regular expression, non-ASCII and non-printable characters other +than newline are not special, and represent themselves. For example, +in a locale using UTF-8 the command @samp{grep 'Λ@tie{}ω'} (where the +white space between @samp{Λ} and the @samp{ω} is a tab character) +searches for @samp{Λ} (Unicode character U+039B GREEK CAPITAL LETTER +LAMBDA), followed by a tab (U+0009 TAB), followed by @samp{ω} (U+03C9 +GREEK SMALL LETTER OMEGA). + +Suppose you want to limit your pattern to only printable characters +(or even only printable ASCII characters) to keep your script readable +or portable, but you also want to match specific non-ASCII or non-null +non-printable characters. If you are using the @option{-P} +(@option{--perl-regexp}) option, PCREs give you several ways to do +this. Otherwise, if you are using Bash, the GNU project's shell, you +can represent these characters via ANSI-C quoting. For example, the +Bash commands @samp{grep $'Λ\tω'} and @samp{grep $'\u039B\t\u03C9'} +both search for the same three-character string @samp{Λ@tie{}ω} +mentioned earlier. 
However, because Bash translates ANSI-C quoting +before @command{grep} sees the pattern, this technique should not be +used to match printable ASCII characters; for example, @samp{grep +$'\u005E'} is equivalent to @samp{grep '^'} and matches any line, not +just lines containing the character @samp{^} (U+005E CIRCUMFLEX +ACCENT). + +Since PCREs and ANSI-C quoting are GNU extensions to POSIX, portable +shell scripts written in ASCII should use other methods to match +specific non-ASCII characters. For example, in a UTF-8 locale the +command @samp{grep "$(printf '\316\233\t\317\211\n')"} is a portable +albeit hard-to-read alternative to Bash's @samp{grep $'Λ\tω'}. +However, none of these techniques will let you put a null character +directly into a command-line pattern; null characters can appear only +in a pattern specified via the @option{-f} (@option{--file}) option. + +@node Usage +@chapter Usage + +@cindex usage, examples +Here is an example command that invokes GNU @command{grep}: + +@example +grep -i 'hello.*world' menu.h main.c +@end example + +@noindent +This lists all lines in the files @file{menu.h} and @file{main.c} that +contain the string @samp{hello} followed by the string @samp{world}; +this is because @samp{.*} matches zero or more characters within a line. +@xref{Regular Expressions}. +The @option{-i} option causes @command{grep} +to ignore case, causing it to match the line @samp{Hello, world!}, which +it would not otherwise match. + +Here is a more complex example, +showing the location and contents of any line +containing @samp{f} and ending in @samp{.c}, +within all files in the current directory whose names +start with non-@samp{.}, contain @samp{g}, and end in @samp{.h}. 
+The @option{-n} option outputs line numbers, the @option{--} argument +treats any later arguments as file names not options even if +@code{*g*.h} expands to a file name that starts with @samp{-}, +and the empty file @file{/dev/null} causes file names to be output +even if only one file name happens to be of the form @samp{*g*.h}. + +@example +grep -n -- 'f.*\.c$' *g*.h /dev/null +@end example + +@noindent +Note that the regular expression syntax used in the pattern differs +from the globbing syntax that the shell uses to match file names. + +@xref{Invoking}, for more details about +how to invoke @command{grep}. + +@cindex using @command{grep}, Q&A +@cindex FAQ about @command{grep} usage +Here are some common questions and answers about @command{grep} usage. + +@enumerate + +@item +How can I list just the names of matching files? + +@example +grep -l 'main' test-*.c +@end example + +@noindent +lists names of @samp{test-*.c} files in the current directory whose contents +mention @samp{main}. + +@item +How do I search directories recursively? + +@example +grep -r 'hello' /home/gigi +@end example + +@noindent +searches for @samp{hello} in all files +under the @file{/home/gigi} directory. +For more control over which files are searched, +use @command{find} and @command{grep}. +For example, the following command searches only C files: + +@example +find /home/gigi -name '*.c' ! -type d \ + -exec grep -H 'hello' '@{@}' + +@end example + +This differs from the command: + +@example +grep -H 'hello' /home/gigi/*.c +@end example + +which merely looks for @samp{hello} in non-hidden C files in +@file{/home/gigi} whose names end in @samp{.c}. +The @command{find} command line above is more similar to the command: + +@example +grep -r --include='*.c' 'hello' /home/gigi +@end example + +@item +What if a pattern or file has a leading @samp{-}? + +@example +grep -- '--cut here--' * +@end example + +@noindent +searches for all lines matching @samp{--cut here--}. 
+Without @option{--}, +@command{grep} would attempt to parse @samp{--cut here--} as a list of +options, and there would be similar problems with any file names +beginning with @samp{-}. + +Alternatively, you can prevent misinterpretation of leading @samp{-} +by using @option{-e} for patterns and leading @samp{./} for files: + +@example +grep -e '--cut here--' ./* +@end example + +@item +Suppose I want to search for a whole word, not a part of a word? + +@example +grep -w 'hello' test*.log +@end example + +@noindent +searches only for instances of @samp{hello} that are entire words; +it does not match @samp{Othello}. +For more control, use @samp{\<} and +@samp{\>} to match the start and end of words. +For example: + +@example +grep 'hello\>' test*.log +@end example + +@noindent +searches only for words ending in @samp{hello}, so it matches the word +@samp{Othello}. + +@item +How do I output context around the matching lines? + +@example +grep -C 2 'hello' test*.log +@end example + +@noindent +prints two lines of context around each matching line. + +@item +How do I force @command{grep} to print the name of the file? + +Append @file{/dev/null}: + +@example +grep 'eli' /etc/passwd /dev/null +@end example + +gets you: + +@example +/etc/passwd:eli:x:2098:1000:Eli Smith:/home/eli:/bin/bash +@end example + +Alternatively, use @option{-H}, which is a GNU extension: + +@example +grep -H 'eli' /etc/passwd +@end example + +@item +Why do people use strange regular expressions on @command{ps} output? + +@example +ps -ef | grep '[c]ron' +@end example + +If the pattern had been written without the square brackets, it would +have matched not only the @command{ps} output line for @command{cron}, +but also the @command{ps} output line for @command{grep}. +Note that on some platforms, +@command{ps} limits the output to the width of the screen; +@command{grep} does not have any limit on the length of a line +except the available memory. 
+ +@item +Why does @command{grep} report ``Binary file matches''? + +If @command{grep} listed all matching ``lines'' from a binary file, it +would probably generate output that is not useful, and it might even +muck up your display. +So GNU @command{grep} suppresses output from +files that appear to be binary files. +To force GNU @command{grep} +to output lines even from files that appear to be binary, use the +@option{-a} or @samp{--binary-files=text} option. +To eliminate the +``Binary file matches'' messages, use the @option{-I} or +@samp{--binary-files=without-match} option, +or the @option{-s} or @option{--no-messages} option. + +@item +Why doesn't @samp{grep -lv} print non-matching file names? + +@samp{grep -lv} lists the names of all files containing one or more +lines that do not match. +To list the names of all files that contain no +matching lines, use the @option{-L} or @option{--files-without-match} +option. + +@item +I can do ``OR'' with @samp{|}, but what about ``AND''? + +@example +grep 'paul' /etc/motd | grep 'franc,ois' +@end example + +@noindent +finds all lines that contain both @samp{paul} and @samp{franc,ois}. + +@item +Why does the empty pattern match every input line? + +The @command{grep} command searches for lines that contain strings +that match a pattern. Every line contains the empty string, so an +empty pattern causes @command{grep} to find a match on each line. It +is not the only such pattern: @samp{^}, @samp{$}, and many +other patterns cause @command{grep} to match every line. + +To match empty lines, use the pattern @samp{^$}. To match blank +lines, use the pattern @samp{^[[:blank:]]*$}. To match no lines at +all, use the command @samp{grep -f /dev/null}. + +@item +How can I search in both standard input and in files? + +Use the special file name @samp{-}: + +@example +cat /etc/passwd | grep 'alain' - /etc/motd +@end example + +@item +Why is this back-reference failing? 
+ +@example +echo 'ba' | grep -E '(a)\1|b\1' +@end example + +This outputs an error message, because the second @samp{\1} +has nothing to refer back to, meaning it will never match anything. + +@item +How can I match across lines? + +Standard grep cannot do this, as it is fundamentally line-based. +Therefore, merely using the @code{[:space:]} character class does not +match newlines in the way you might expect. + +With the GNU @command{grep} option @option{-z} (@option{--null-data}), each +input and output ``line'' is null-terminated; @pxref{Other Options}. Thus, +you can match newlines in the input, but typically if there is a match +the entire input is output, so this usage is often combined with +output-suppressing options like @option{-q}, e.g.: + +@example +printf 'foo\nbar\n' | grep -z -q 'foo[[:space:]]\+bar' +@end example + +If this does not suffice, you can transform the input +before giving it to @command{grep}, or turn to @command{awk}, +@command{sed}, @command{perl}, or many other utilities that are +designed to operate across lines. + +@item +What do @command{grep}, @command{fgrep}, and @command{egrep} stand for? + +The name @command{grep} comes from the way line editing was done on Unix. +For example, +@command{ed} uses the following syntax +to print a list of matching lines on the screen: + +@example +global/regular expression/print +g/re/p +@end example + +@command{fgrep} stands for Fixed @command{grep}; +@command{egrep} stands for Extended @command{grep}. + +@end enumerate + + +@node Performance +@chapter Performance + +@cindex performance +Typically @command{grep} is an efficient way to search text. However, +it can be quite slow in some cases, and it can search large files +where even minor performance tweaking can help significantly. +Although the algorithm used by @command{grep} is an implementation +detail that can change from release to release, understanding its +basic strengths and weaknesses can help you improve its performance. 
+ +The @command{grep} command operates partly via a set of automata that +are designed for efficiency, and partly via a slower matcher that +takes over when the fast matchers run into unusual features like +back-references. When feasible, the Boyer--Moore fast string +searching algorithm is used to match a single fixed pattern, and the +Aho--Corasick algorithm is used to match multiple fixed patterns. + +@cindex locales +Generally speaking @command{grep} operates more efficiently in +single-byte locales, since it can avoid the special processing needed +for multi-byte characters. If your patterns will work just as well +that way, setting @env{LC_ALL} to a single-byte locale can help +performance considerably. Setting @samp{LC_ALL='C'} can be +particularly efficient, as @command{grep} is tuned for that locale. + +@cindex case insensitive search +Outside the @samp{C} locale, case-insensitive search, and search for +bracket expressions like @samp{[a-z]} and @samp{[[=a=]b]}, can be +surprisingly inefficient due to difficulties in fast portable access to +concepts like multi-character collating elements. + +@cindex back-references +A back-reference such as @samp{\1} can hurt performance significantly +in some cases, since back-references cannot in general be implemented +via a finite state automaton, and instead trigger a backtracking +algorithm that can be quite inefficient. For example, although the +pattern @samp{^(.*)\1@{14@}(.*)\2@{13@}$} matches only lines whose +lengths can be written as a sum @math{15x + 14y} for nonnegative +integers @math{x} and @math{y}, the pattern matcher does not perform +linear Diophantine analysis and instead backtracks through all +possible matching strings, using an algorithm that is exponential in +the worst case. 
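As a rough illustration of the locale and fixed-pattern advice above, the following sketch runs @command{grep} with @env{LC_ALL} set to @samp{C} for a single command, and uses @option{-F} with several @option{-e} patterns so that no regular-expression features such as back-references are involved. The sample text and the variable names (@code{sample}, @code{c_count}, @code{fixed_hits}) are made up for this example. Whether either change yields a measurable speedup depends on the @command{grep} version, the patterns, and the data, so treat this as a starting point for your own timing rather than a guarantee.

```shell
# Sample input; these strings are invented for illustration only.
sample=$(printf '%s\n' 'alpha beta' 'gamma delta' 'epsilon')

# Set LC_ALL=C for this one grep invocation only, so grep can skip
# multi-byte character processing. This is only appropriate when the
# patterns and data do not rely on locale-specific behavior.
c_count=$(printf '%s\n' "$sample" | LC_ALL=C grep -c 'a')

# Search for several fixed strings at once with -F, which sidesteps
# regular-expression features such as back-references entirely.
fixed_hits=$(printf '%s\n' "$sample" | LC_ALL=C grep -F -e 'beta' -e 'epsilon')

# Show the count of lines containing "a", then the fixed-string matches.
printf '%s\n' "$c_count"
printf '%s\n' "$fixed_hits"
```

Setting the variable on the command itself, rather than exporting it, keeps the rest of the shell session in its original locale. If the patterns contain non-ASCII characters or rely on case-insensitive matching of such characters, the @samp{C} locale may change which lines match, so compare the output against a run in the normal locale first.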
+ +@cindex holes in files +On some operating systems that support files with holes---large +regions of zeros that are not physically present on secondary +storage---@command{grep} can skip over the holes efficiently without +needing to read the zeros. This optimization is not available if the +@option{-a} (@option{--binary-files=text}) option is used (@pxref{File and +Directory Selection}), unless the @option{-z} (@option{--null-data}) +option is also used (@pxref{Other Options}). + +For more about the algorithms used by @command{grep} and about +related string matching algorithms, see: + +@frenchspacing on +@itemize @bullet +@item +Aho AV. Algorithms for finding patterns in strings. +In: van Leeuwen J. @emph{Handbook of Theoretical Computer Science}, vol. A. +New York: Elsevier; 1990. p. 255--300. +This surveys classic string matching algorithms, some of which are +used by @command{grep}. + +@item +Aho AV, Corasick MJ. Efficient string matching: an aid to bibliographic search. +@emph{CACM}. 1975;18(6):333--40. +@url{https://dx.doi.org/10.1145/360825.360855}. +This introduces the Aho--Corasick algorithm. + +@item +Boyer RS, Moore JS. A fast string searching algorithm. +@emph{CACM}. 1977;20(10):762--72. +@url{https://dx.doi.org/10.1145/359842.359859}. +This introduces the Boyer--Moore algorithm. + +@item +Faro S, Lecroq T. The exact online string matching problem: a review +of the most recent results. +@emph{ACM Comput Surv}. 2013;45(2):13. +@url{https://dx.doi.org/10.1145/2431211.2431212}. +This surveys string matching algorithms that might help improve the +performance of @command{grep} in the future. +@end itemize +@frenchspacing off + +@node Reporting Bugs +@chapter Reporting bugs + +@cindex bugs, reporting +Bug reports can be found at the +@url{https://debbugs.gnu.org/cgi/pkgreport.cgi?package=grep, +GNU bug report logs for @command{grep}}. +If you find a bug not listed there, please email it to +@email{bug-grep@@gnu.org} to create a new bug report. 
+ +@menu +* Known Bugs:: +@end menu + +@node Known Bugs +@section Known Bugs +@cindex Bugs, known + +Large repetition counts in the @samp{@{n,m@}} construct may cause +@command{grep} to use lots of memory. +In addition, certain other +obscure regular expressions require exponential time and +space, and may cause @command{grep} to run out of memory. + +Back-references can greatly slow down matching, as they can generate +exponentially many matching possibilities that can consume both time +and memory to explore. Also, the POSIX specification for +back-references is at times unclear. Furthermore, many regular +expression implementations have back-reference bugs that can cause +programs to return incorrect answers or even crash, and fixing these +bugs has often been low-priority: for example, as of 2021 the +@url{https://sourceware.org/bugzilla/,GNU C library bug database} +contained back-reference bugs +@url{https://sourceware.org/bugzilla/show_bug.cgi?id=52,,52}, +@url{https://sourceware.org/bugzilla/show_bug.cgi?id=10844,,10844}, +@url{https://sourceware.org/bugzilla/show_bug.cgi?id=11053,,11053}, +@url{https://sourceware.org/bugzilla/show_bug.cgi?id=24269,,24269} +and @url{https://sourceware.org/bugzilla/show_bug.cgi?id=25322,,25322}, +with little sign of forthcoming fixes. Luckily, +back-references are rarely useful and it should be little trouble to +avoid them in practical applications. + + +@node Copying +@chapter Copying +@cindex copying + +GNU @command{grep} is licensed under the GNU GPL, which makes it @dfn{free +software}. + +The ``free'' in ``free software'' refers to liberty, not price. As +some GNU project advocates like to point out, think of ``free speech'' +rather than ``free beer''. In short, you have the right (freedom) to +run and change @command{grep} and distribute it to other people, and---if you +want---charge money for doing either. 
The important restriction is +that you have to grant your recipients the same rights and impose the +same restrictions. + +This general method of licensing software is sometimes called +@dfn{open source}. The GNU project prefers the term ``free software'' +for reasons outlined at +@url{https://www.gnu.org/philosophy/open-source-misses-the-point.html}. + +This manual is free documentation in the same sense. The +documentation license is included below. The license for the program +is available with the source code, or at +@url{https://www.gnu.org/licenses/gpl.html}. + +@menu +* GNU Free Documentation License:: +@end menu + +@node GNU Free Documentation License +@section GNU Free Documentation License + +@include fdl.texi + + +@node Index +@unnumbered Index + +@printindex cp + +@bye -- cgit v1.2.3