Diffstat (limited to '')
-rw-r--r-- | src/grep/NEWS | 1290 |
1 files changed, 1290 insertions, 0 deletions
diff --git a/src/grep/NEWS b/src/grep/NEWS
new file mode 100644
index 0000000..94c8a39
--- /dev/null
+++ b/src/grep/NEWS
@@ -0,0 +1,1290 @@
+GNU grep NEWS -*- outline -*-
+
+* Noteworthy changes in release 3.7 (2021-08-14) [stable]
+
+** Changes in behavior
+
+ Use of the --unix-byte-offsets (-u) option now evokes a warning.
+ Since 3.1, this Windows-only option has had no effect.
+
+** Bug fixes
+
+ Preprocessing N patterns would take at least O(N^2) time when too many
+ patterns hashed to too few buckets. This now takes seconds, not days:
+ : | grep -Ff <(seq 6400000 | tr 0-9 A-J)
+ [Bug#44754 introduced in grep 3.5]
+
+
+* Noteworthy changes in release 3.6 (2020-11-08) [stable]
+
+** Changes in behavior
+
+ The GREP_OPTIONS environment variable no longer affects grep's behavior.
+ The variable was declared obsolescent in grep 2.21 (2014), and since
+ then any use had caused grep to issue a diagnostic.
+
+** Bug fixes
+
+ grep's DFA matcher performed an invalid regex transformation
+ that would convert an ERE like a+a+a+ to a+a+, which would make
+ grep a+a+a+ mistakenly match "aa".
+ [Bug#44351 introduced in grep 3.2]
+
+ grep -P now reports the troublesome input filename upon PCRE execution
+ failure. Before, searching many files for something rare might fail with
+ just "exceeded PCRE's backtracking limit". Now, it also reports which file
+ triggered the failure.
+
+
+* Noteworthy changes in release 3.5 (2020-09-27) [stable]
+
+** Changes in behavior
+
+ The message that a binary file matches is now sent to standard error
+ and the message has been reworded from "Binary file FOO matches" to
+ "grep: FOO: binary file matches", to avoid confusion with ordinary
+ output or when file names contain spaces and the like, and to be
+ more consistent with other diagnostics. For example, commands
+ like 'grep PATTERN FILE | wc' no longer add 1 to the count of
+ matching text lines due to the presence of the message. Like other
+ stderr messages, the message is now omitted if the --no-messages
+ (-s) option is given.
+
+ Two other stderr messages now use the typical form too. They are
+ now "grep: FOO: warning: recursive directory loop" and "grep: FOO:
+ input file is also the output".
+
+ The --files-without-match (-L) option has reverted to its behavior
+ in grep 3.1 and earlier. That is, grep -L again succeeds when a
+ line is selected, not when a file is listed. The behavior in grep
+ 3.2 through 3.4 was causing compatibility problems.
+
+** Bug fixes
+
+ grep -I no longer issues a spurious "Binary file FOO matches" line.
+ [Bug#33552 introduced in grep 2.23]
+
+ In UTF-8 locales, grep -w no longer ignores a multibyte word
+ constituent just before what would otherwise be a word match.
+ [Bug#43225 introduced in grep 2.28]
+
+ grep -i no longer mishandles ASCII characters that match multibyte
+ characters. For example, 'LC_ALL=tr_TR.utf8 grep -i i' no longer
+ dumps core merely because 'i' matches 'İ' (U+0130 LATIN CAPITAL
+ LETTER I WITH DOT ABOVE) in Turkish when ignoring case.
+ [Bug#43577 introduced partly in grep 2.28 and partly in grep 3.4]
+
+ A performance regression with -E and many patterns has been mostly fixed.
+ "Mostly" as there is a performance tradeoff between Bug#22357 and Bug#40634.
+ [Bug#40634 introduced in grep 2.28]
+
+ A performance regression with many duplicate patterns has been fixed.
+ [Bug#43040 introduced in grep 3.4]
+
+ An N^2 RSS performance regression with many patterns has been fixed
+ in common cases (no backref, and no use of -o or --color).
+ With only 80,000 lines of /usr/share/dict/linux.words, the following
+ would use 100GB of RSS and take 3 minutes. With the fix, it used less
+ than 400MB and took less than one second:
+ head -80000 /usr/share/dict/linux.words > w; grep -vf w w
+ [Bug#43527 introduced in grep 3.4]
+
+** Build-related
+
+ "make dist" builds .tar.gz files again, as they are still used in
+ some barebones builds.
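The 3.5 note about moving the binary-file notice to standard error can be pictured with a small simulation. This is only a sketch: `fake_grep` is a hypothetical stand-in written for this illustration (it is not grep and does not call it), and the captured standard output plays the role of what a downstream `wc -l` would receive.

```python
import io
import sys
from contextlib import redirect_stderr, redirect_stdout


def fake_grep(style: str) -> None:
    """Hypothetical stand-in for grep: two matching lines plus a notice."""
    print("match one")
    print("match two")
    if style == "old":
        # Wording used before 3.5, written to standard output.
        print("Binary file FOO matches")
    else:
        # Wording used since 3.5, written to standard error.
        print("grep: FOO: binary file matches", file=sys.stderr)


def piped_line_count(style: str) -> int:
    """Count the lines that would travel down a pipe (stdout only)."""
    out, err = io.StringIO(), io.StringIO()
    with redirect_stdout(out), redirect_stderr(err):
        fake_grep(style)
    return len(out.getvalue().splitlines())


old_count = piped_line_count("old")
new_count = piped_line_count("new")
print(old_count, new_count)
```

In the simulated old style the notice is counted as a third line; in the new style only the two matching lines reach the pipe.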
+
+
+* Noteworthy changes in release 3.4 (2020-01-02) [stable]
+
+** New features
+
+ The new --no-ignore-case option causes grep to observe case
+ distinctions, overriding any previous -i (--ignore-case) option.
+
+** Bug fixes
+
+ '.' no longer matches some invalid byte sequences in UTF-8 locales.
+ [bug introduced in grep 2.7]
+
+ grep -Fw can no longer false match in non-UTF-8 multibyte locales
+ For example, this command would erroneously print its input line:
+ echo ab | LC_CTYPE=ja_JP.eucjp grep -Fw b
+ [Bug#38223 introduced in grep 2.28]
+
+ The exit status of 'grep -L' is no longer incorrect when standard
+ output is /dev/null.
+ [Bug#37716 introduced in grep 3.2]
+
+ A performance bug has been fixed when grep is given many patterns,
+ each with no back-reference.
+ [Bug#33249 introduced in grep 2.5]
+
+ A performance bug has been fixed for patterns like '01.2' that
+ cause grep to reorder tokens internally.
+ [Bug#34951 introduced in grep 3.2]
+
+** Build-related
+
+ The build procedure no longer relies on any already-built src/grep
+ that might be absent or broken. Instead, it uses the system 'grep'
+ to bootstrap, and uses src/grep only to test the build. On Solaris
+ /usr/bin/grep is broken, but you can install GNU or XPG4 'grep' from
+ the standard Solaris distribution before building GNU Grep yourself.
+ [bug introduced in grep 2.8]
+
+
+* Noteworthy changes in release 3.3 (2018-12-20) [stable]
+
+** Bug fixes
+
+ Some uses of \b in the C locale and with the DFA matcher would fail, e.g.,
+ the following would print nothing (it should print the input line):
+ echo 123-x|LC_ALL=C grep '.\bx'
+ Using a multibyte locale, using certain regexp constructs (some ranges,
+ back-references), or forcing use of the PCRE matcher via --perl-regexp (-P)
+ would avoid the bug.
+ [bug introduced in grep 3.2]
+
+
+* Noteworthy changes in release 3.2 (2018-12-20) [stable]
+
+** Changes in behavior
+
+ The --files-without-match (-L) option now causes grep to succeed
+ when a file is listed, instead of when a line is selected. This
+ resembles what git-grep does.
+
+** Bug fixes
+
+ The --recursive (-r) option no longer fails on MS-Windows.
+ [bug introduced in grep 2.11]
+
+** Improvements
+
+ An over-30x performance improvement when many 'or'd expressions
+ share a common prefix, thanks to improvements in gnulib's dfa.c,
+ by Norihiro Tanaka. See gnulib commits v0.1-2110-ge648401be,
+ v0.1-2111-g4299106ce, v0.1-2117-g617a60974
+
+ An additional 3-23% speed-up when searching large files, via
+ increased initial buffer size.
+
+ grep now diagnoses stack overflow. Before grep-2.6, the included
+ regexp code would detect it. Since 2.6, grep defaulted to using
+ glibc's regexp, which lost that capability.
+
+
+* Noteworthy changes in release 3.1 (2017-07-02) [stable]
+
+** Improvements
+
+ grep '[0-9]' is now just as fast as grep '[[:digit:]]' when run
+ in a multi-byte locale. Before, it was several times slower.
+
+** Changes in behavior
+
+ Context no longer excludes selected lines omitted because of -m.
+ For example, 'grep "^" -m1 -A1' now outputs the first two input
+ lines, not just the first line. This fixes a glitch that has been
+ present since -m was added in grep 2.5.
+
+ The following changes affect only MS-Windows platforms. First, the
+ --binary (-U) option now governs whether binary I/O is used, instead
+ of a heuristic that was sometimes incorrect. Second, the
+ --unix-byte-offsets (-u) option now has no effect on MS-Windows too.
+
+
+* Noteworthy changes in release 3.0 (2017-02-09) [stable]
+
+** Bug fixes
+
+ grep without -F no longer goes awry when given two or more patterns
+ that contain no special characters other than '\' and also contain a
+ subpattern like '\.' that escapes a character to make it ordinary.
+ [bug introduced in grep 2.28]
+
+ grep no longer fails to build on PCRE versions before 8.20.
+ [bug introduced in grep 2.28]
+
+
+* Noteworthy changes in release 2.28 (2017-02-06) [stable]
+
+** Bug fixes
+
+ When grep -Fo finds matches of differing length, it could
+ mistakenly print a shorter one. Now it prints a longest one.
+ [bug introduced in grep-2.26]
+
+ When standard output is /dev/null, grep no longer fails when
+ standard input is a file in the Linux /proc file system, or when
+ standard input is a pipe and standard output is in append mode.
+ [bugs introduced in grep-2.27]
+
+ Fix performance regression with multiple patterns, e.g., for -Fi in
+ a multi-byte locale, or for -Fw in a single-byte locale.
+ [bugs introduced in grep-2.19, grep-2.22 and grep-2.26]
+
+** Improvements
+
+ Improve performance for -E or -G pattern lists that are easily
+ converted to -F format.
+
+
+* Noteworthy changes in release 2.27 (2016-12-06) [stable]
+
+** Bug fixes
+
+ grep no longer reports a false match in a multibyte, non-UTF8 locale
+ like zh_CN.gb18030, with a regular expression like ".*7" that just
+ happens to match the 4-byte representation of gb18030's \uC9, the
+ final byte of which is the digit "7".
+ [bug introduced in grep-2.19]
+
+ Unless an early-exit option like -q, -l, -L, -m, or -f /dev/null is
+ specified, grep now reads all of a non-seekable standard input,
+ even if this cannot affect grep's output or exit status. This works
+ better with nonportable scripts that run "PROGRAM | grep PATTERN
+ >/dev/null" where PROGRAM dies when writing into a broken pipe.
+ [bug introduced in grep-2.26]
+
+ grep no longer mishandles ranges in nontrivial unibyte locales.
+ [bug introduced in grep-2.26]
+
+ grep -P no longer attempts multiline matches. This works more
+ intuitively with unusual patterns, and means that grep -Pz no longer
+ rejects patterns containing ^ and $ and works when combined with -x.
+ [bugs introduced in grep-2.23] A downside is that grep -P is now
+ significantly slower, albeit typically still faster than pcregrep.
+
+ grep -m0 -L PAT FILE now outputs "FILE". [bug introduced in grep-2.5]
+
+ To output ':' and tab-align the following character C, grep -T no
+ longer outputs tab-backspace-':'-C, an approach that has problems if
+ run inside an Emacs shell window. [bug introduced in grep-2.5.2]
+
+ grep -T now uses worst-case widths of line numbers and byte offsets
+ instead of guessing widths that might not work with larger files.
+ [bug introduced in grep-2.5.2]
+
+ grep's use of getprogname no longer causes a build failure on HP-UX.
+
+** Improvements
+
+ grep no longer reads the input in a few more cases when it is easy
+ to see that matching cannot succeed, e.g., 'grep -f /dev/null'.
+
+
+* Noteworthy changes in release 2.26 (2016-10-02) [stable]
+
+** Bug fixes
+
+ Grep no longer omits output merely because it follows an output line
+ suppressed due to encoding errors. [bug introduced in grep-2.21]
+
+ In the Shift_JIS locale, grep no longer mistakenly matches in the
+ middle of a multibyte character. [bug present since "the beginning"]
+
+** Improvements
+
+ grep can be much faster now when standard output is /dev/null.
+
+ grep -F is now typically much faster when many patterns are given,
+ as it now uses the Aho-Corasick algorithm instead of the
+ Commentz-Walter algorithm in that case.
+
+ grep -iF is typically much faster in a multibyte locale, if the
+ pattern and its case counterparts contain only single byte characters.
+
+ grep with complicated expressions (e.g., back-references) and without
+ -i now uses the regex fastmap for better performance.
+
+ In multibyte locales, grep now handles leading "." in patterns more
+ efficiently.
+
+ grep now prints a "FILENAME:LINENO: " prefix when diagnosing an
+ invalid regular expression that was read from an '-f'-specified file.
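The 2.26 notes above name the Aho-Corasick algorithm for `grep -F` with many patterns. As a rough illustration of the idea, here is a textbook-style sketch in Python. It is not grep's implementation (which is written in C); the function names `build_automaton` and `find_all` are made up for this example, and it assumes non-empty patterns.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto/fail/output tables for a list of non-empty strings."""
    goto = [{}]    # goto[state][char] -> next state (a trie)
    fail = [0]     # fail[state] -> state to fall back to on a mismatch
    out = [set()]  # out[state] -> patterns that end at this state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pat)
    # Breadth-first pass: depth-1 states keep fail = 0, deeper states
    # inherit a failure link (and its outputs) from shallower states.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out


def find_all(text, patterns):
    """Return sorted (start_index, pattern) pairs for every occurrence."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return sorted(hits)


print(find_all("ushers", ["he", "she", "his", "hers"]))
# prints [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The text is scanned once, left to right, regardless of how many patterns there are; that single pass is the property that makes this family of algorithms attractive for large pattern lists.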
+
+
+* Noteworthy changes in release 2.25 (2016-04-21) [stable]
+
+** Bug fixes
+
+ In the C or POSIX locale, grep now treats all bytes as valid
+ characters even if the C runtime library says otherwise. The
+ revised behavior is more compatible with the original intent of
+ POSIX, and the next release of POSIX will likely make this official.
+ [bug introduced in grep-2.23]
+
+ grep -Pz no longer mistakenly diagnoses patterns like [^a] that use
+ negated character classes. [bug introduced in grep-2.24]
+
+ grep -oz now uses null bytes, not newlines, to terminate output lines.
+ [bug introduced in grep-2.5]
+
+** Improvements
+
+ grep now outputs details more consistently when reporting a write error.
+ E.g., "grep: write error: No space left on device" rather than just
+ "grep: write error".
+
+
+* Noteworthy changes in release 2.24 (2016-03-10) [stable]
+
+** Bug fixes
+
+ grep -z would match strings it should not. To trigger the bug, you'd
+ have to use a regular expression including an anchor (^ or $) and a
+ feature like a range or a back-reference, causing grep to forego its DFA
+ matcher and resort to using re_search. With a multibyte locale, that
+ matcher could mistakenly match a string containing a newline.
+ For example, this command:
+ printf 'a\nb\0' | LC_ALL=en_US.utf-8 grep -z '^[a-b]*b'
+ would mistakenly match and print all four input bytes. After the fix,
+ there is no match, as expected.
+ [bug introduced in grep-2.7]
+
+ grep -Pz now diagnoses attempts to use patterns containing ^ and $,
+ instead of mishandling these patterns. This problem seems to be
+ inherent to the PCRE API; removing this limitation is on PCRE's
+ maint/README wish list. Patterns can continue to match literal ^
+ and $ by escaping them with \ (now needed even inside [...]).
+ [bug introduced in grep-2.5]
+
+
+* Noteworthy changes in release 2.23 (2016-02-04) [stable]
+
+** Bug fixes
+
+ Binary files are now less likely to generate diagnostics and more
+ likely to yield text matches. grep now reports "Binary file FOO
+ matches" and suppresses further output instead of outputting a line
+ containing an encoding error; hence grep can now report matching text
+ before a later binary match. Formerly, grep reported FOO to be
+ binary when it found an encoding error in FOO before generating
+ output for FOO, which meant it never reported both matching text and
+ matching binary data; this was less useful for searching text
+ containing encoding errors in non-matching lines.
+ [bug introduced in grep-2.21]
+
+ grep -c no longer stops counting when finding binary data.
+ [bug introduced in grep-2.21]
+
+ grep no longer outputs encoding errors in unibyte locales.
+ For example, if the byte '\x81' is not a valid character in a
+ unibyte locale, grep treats the byte as binary data.
+ [bug introduced in grep-2.21]
+
+ grep -oP is no longer susceptible to an infinite loop when processing
+ invalid UTF8 just before a match.
+ [bug introduced in grep-2.22]
+
+ --exclude and related options are now matched against trailing
+ parts of command-line arguments, not against the entire arguments.
+ This partly reverts the --exclude-related change in 2.22.
+ [bug introduced in grep-2.22]
+
+ --line-buffer is no longer ineffective when combined with -l.
+ [bug introduced in grep-2.5]
+
+ -xw is now equivalent to -x more consistently, with -P and with backrefs.
+ [bug only partially fixed in grep-2.19]
+
+
+* Noteworthy changes in release 2.22 (2015-11-01) [stable]
+
+** Improvements
+
+ Performance has improved for patterns containing very long strings,
+ reducing preprocessing time for an N-byte regexp from O(N^2) to
+ only slightly superlinear for most patterns. Before, a command like
+ the following would take over a minute, but now, it takes less than
+ a second:
+ : | grep -f <(seq -s '' 99999)
+
+ When building grep, 'configure' now uses PCRE's pkg-config module for
+ configuration information, rather than attempting to guess it by hand.
+
+** Bug fixes
+
+ A DFA matcher bug made this command mistakenly print its input line:
+ echo axb | grep -E '^x|x$'
+ Likewise for this equivalent command:
+ echo axb | grep -e '^x' -e 'x$'
+ [bug introduced in grep-2.19 ]
+
+ grep no longer reads from uninitialized memory or from beyond the end
+ of the heap-allocated input buffer. This fix addressed CVE-2015-1345.
+ [bug introduced in grep-2.19 ]
+
+ With -z, '.' and '[^x]' in a pattern now consistently match newline.
+ Previously, they sometimes matched newline, and sometimes did not.
+ [bug introduced in grep-2.4]
+
+ When the JIT stack is exhausted, grep -P now grows the stack rather
+ than reporting an internal PCRE error.
+
+ 'grep -D skip PATTERN FILE' no longer hangs if FILE is a fifo.
+ [bug introduced in grep-2.12]
+
+ --exclude and related options are now matched against entire
+ command-line arguments, not against command-line components.
+ [bug introduced in grep-2.6]
+
+ Fix performance degradation of grep -Fw in unibyte locales.
+ [bug introduced in grep-2.19 ]
+
+
+* Noteworthy changes in release 2.21 (2014-11-23) [stable]
+
+** Improvements
+
+ Performance has been greatly improved for searching files containing
+ holes, on platforms where lseek's SEEK_DATA flag works efficiently.
+
+ Performance has improved for rejecting data that cannot match even
+ the first part of a nontrivial pattern.
+
+ Performance has improved for very long strings in patterns.
+
+ If a file contains data improperly encoded for the current locale,
+ and this is discovered before any of the file's contents are output,
+ grep now treats the file as binary.
+
+ grep -P no longer reports an error and exits when given invalid UTF-8 data.
+ Instead, it considers the data to be non-matching.
+
+** Bug fixes
+
+ grep no longer mishandles patterns that contain \w or \W in multibyte
+ locales.
+
+ grep would fail to count newlines internally when operating in non-UTF8
+ multibyte locales, leading it to print potentially many lines that did
+ not match. E.g., the command, "seq 10 | env LC_ALL=zh_CN src/grep -n .."
+ would print this:
+ 1:1
+ 2
+ 3
+ 4
+ 5
+ 6
+ 7
+ 8
+ 9
+ 10
+ implying that the match, "10" was on line 1.
+ [bug introduced in grep-2.19]
+
+ grep -F -x -o no longer prints an extra newline for each match.
+ [bug introduced in grep-2.19]
+
+ grep in a non-UTF8 multibyte locale could mistakenly match in the middle
+ of a multibyte character when using a '^'-anchored alternate in a pattern,
+ leading it to print non-matching lines. [bug present since "the beginning"]
+
+ grep -F Y no longer fails to match in non-UTF8 multibyte locales like
+ Shift-JIS, when the input contains a 2-byte character, XY, followed by
+ the single-byte search pattern, Y. grep would find the first, middle-
+ of-multibyte matching "Y", and then mistakenly advance an internal
+ pointer one byte too far, skipping over the target "Y" just after that.
+ [bug introduced in grep-2.19]
+
+ grep -E rejected unmatched ')', instead of treating it like '\)'.
+ [bug present since "the beginning"]
+
+ On NetBSD, grep -r no longer reports "Inappropriate file type or format"
+ when refusing to follow a symbolic link.
+ [bug introduced in grep-2.12]
+
+** Changes in behavior
+
+ The GREP_OPTIONS environment variable is now obsolescent, and grep
+ now warns if it is used. Please use an alias or script instead.
+
+ In locales with multibyte character encodings other than UTF-8,
+ grep -P now reports an error and exits instead of misbehaving.
+
+ When searching binary data, grep now may treat non-text bytes as
+ line terminators. This can boost performance significantly.
+
+ grep -z no longer automatically treats the byte '\200' as binary data.
+
+* Noteworthy changes in release 2.20 (2014-06-03) [stable]
+
+** Bug fixes
+
+ grep --max-count=N FILE would no longer stop reading after the Nth match.
+ I.e., while grep would still print the correct output, it would continue
+ reading until end of input, and hence, potentially forever.
+ [bug introduced in grep-2.19]
+
+ A command like echo aa|grep -E 'a(b$|c$)' would mistakenly
+ report the input as a matched line.
+ [bug introduced in grep-2.19]
+
+** Changes in behavior
+
+ grep --exclude-dir='FOO/' now excludes the directory FOO.
+ Previously, the trailing slash meant the option was ineffective.
+
+
+* Noteworthy changes in release 2.19 (2014-05-22) [stable]
+
+** Improvements
+
+ Performance has improved, typically by 10% and in some cases by a
+ factor of 200. However, performance of grep -P in UTF-8 locales has
+ gotten worse as part of the fix for the crashes mentioned below.
+
+** Bug fixes
+
+ grep no longer mishandles patterns like [a-[.z.]], and no longer
+ mishandles patterns like [^a] in locales that have multicharacter
+ collating sequences so that [^a] can match a string of two characters.
+
+ grep no longer mishandles an empty pattern at the end of a pattern list.
+ [bug introduced in grep-2.5]
+
+ grep -C NUM now outputs separators consistently even when NUM is zero,
+ and similarly for grep -A NUM and grep -B NUM.
+ [bug present since "the beginning"]
+
+ grep -f no longer mishandles patterns containing NUL bytes.
+ [bug introduced in grep-2.11]
+
+ Plain grep, grep -E, and grep -F now treat encoding errors in patterns
+ the same way the GNU regular expression matcher treats them, with respect
+ to whether the errors can match parts of multibyte characters in data.
+ [bug present since "the beginning"]
+
+ grep -w no longer mishandles a potential match adjacent to a letter that
+ takes up two or more bytes in a multibyte encoding.
+ Similarly, the patterns '\<', '\>', '\b', and '\B' no longer
+ mishandle word-boundary matches in multibyte locales.
+ [bug present since "the beginning"]
+
+ grep -P now reports an error and exits when given invalid UTF-8 data.
+ Previously it was unreliable, and sometimes crashed or looped.
+ [bug introduced in grep-2.16]
+
+ grep -P now works with -w and -x and back-references. Before,
+ echo aa|grep -Pw '(.)\1' would fail to match, yet
+ echo aa|grep -Pw '(.)\2' would match.
+
+ grep -Pw now works like grep -w in that the matched string has to be
+ preceded and followed by non-word components or the beginning and end
+ of the line (as opposed to word boundaries before). Before, this
+ echo a@@a| grep -Pw @@ would match, yet this
+ echo a@@a| grep -w @@ would not. Now, they both fail to match,
+ per the documentation on how grep's -w works.
+
+ grep -i no longer mishandles patterns containing titlecase characters.
+ For example, in a locale containing the titlecase character
+ 'Lj' (U+01C8 LATIN CAPITAL LETTER L WITH SMALL LETTER J),
+ 'grep -i Lj' now matches both 'LJ' (U+01C7 LATIN CAPITAL LETTER LJ)
+ and 'lj' (U+01C9 LATIN SMALL LETTER LJ).
+
+
+* Noteworthy changes in release 2.18 (2014-02-20) [stable]
+
+** Bug fixes
+
+ grep no longer mishandles patterns like [^^-~] in unibyte locales.
+ [bug introduced in grep-2.8]
+
+ grep -i in a multibyte, non-UTF8 locale could be up to 200 times slower
+ than in 2.16. [bug introduced in grep-2.17]
+
+
+* Noteworthy changes in release 2.17 (2014-02-17) [stable]
+
+** Improvements
+
+ grep -i in a multibyte locale is now typically 10 times faster
+ for patterns that do not contain \ or [.
+
+ grep (without -i) in a multibyte locale is now up to 7 times faster
+ when processing many matched lines.
+
+** Maintenance
+
+ grep's --mmap option was disabled in March of 2010, and began to
+ elicit a warning in January of 2012. Now it is completely gone.
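The 2.19 entry above contrasting word boundaries with the `-w` rule (the `a@@a` example) can be approximated with Python's `re` module. This is a sketch only: Python's engine is neither PCRE nor grep's matcher, the helper names `boundary_match` and `word_match` are invented here, and the lookaround form is a simplification of what `grep -w` does.

```python
import re


def boundary_match(pattern: str, line: str) -> bool:
    """Old grep -Pw style as described in the notes: \\b on both sides."""
    return re.search(r"\b(?:%s)\b" % pattern, line) is not None


def word_match(pattern: str, line: str) -> bool:
    """grep -w style: the match may not touch a word constituent."""
    return re.search(r"(?<!\w)(?:%s)(?!\w)" % pattern, line) is not None


for line in ("a@@a", "x @@ y"):
    print(repr(line), boundary_match("@@", line), word_match("@@", line))
```

For `a@@a` the boundary form matches, because a boundary exists between `a` and `@`, while the `-w` form does not, because the match is adjacent to word characters. For `x @@ y` the two results are reversed.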
+
+
+* Noteworthy changes in release 2.16 (2014-01-01) [stable]
+
+** Bug fixes
+
+ Fix gnulib-provided maint.mk so that the release procedure described
+ in README-release actually does what we want. Before that fix, that
+ procedure resulted in a grep-2.15 tarball that would lead to a grep
+ binary whose --version-reported version number was 2.14.51...
+
+ The fix to make \s and \S work with multi-byte white space broke
+ the use of each shortcut whenever followed by a repetition operator.
+ For example, \s*, \s+, \s? and \s{3} would all malfunction in a
+ multi-byte locale. [bug introduced in grep-2.15]
+
+ The fix to make grep -P work better with UTF-8 made it possible for
+ grep to evoke a larger set of PCRE errors, some of which could trigger
+ an abort. E.g., this would abort:
+ printf '\x82'|LC_ALL=en_US.UTF-8 grep -P y
+ Now grep handles arbitrary PCRE errors. [bug introduced in grep-2.15]
+
+ Handle very long lines (2GiB and longer) on systems with a deficient
+ read system call.
+
+* Noteworthy changes in release 2.15 (2013-10-26) [stable]
+
+** Bug fixes
+
+ grep's \s and \S failed to work with multi-byte white space characters.
+ For example, \s would fail to match a non-breaking space, and this
+ would print nothing: printf '\xc2\xa0' | LC_ALL=en_US.UTF-8 grep '\s'
+ A related bug is that \S would mistakenly match an invalid multibyte
+ character. For example, the following would match:
+ printf '\x82\n' | LC_ALL=en_US.UTF-8 grep '^\S$'
+ [bug present since grep-2.6]
+
+ grep -i would segfault on systems using UTF-16-based wchar_t (Cygwin)
+ when converting an input string containing certain 4-byte UTF-8
+ sequences to lower case. The conversions to wchar_t and back to
+ a UTF-8 multibyte string did not take surrogate pairs into account.
+ [bug present since at least grep-2.6, though the segfault is new with 2.13]
+
+ grep -E would segfault when given a regexp like '([^.]*[M]){1,2}'
+ for any multibyte character M. [bug introduced in grep-2.6, which would
+ segfault, but 2.7 and 2.8 had no problem, and 2.9 through 2.14 would
+ hit a failed assertion. ]
+
+ grep -F would get stuck in an infinite loop when given a search string
+ that is an invalid byte sequence in the current locale and that matches
+ the bytes of the input twice on a line. Now grep fails with exit status 1.
+
+ grep -P could misbehave. While multi-byte mode is only supported by PCRE
+ with UTF-8 locales, grep did not activate it. This would cause failures
+ to match multibyte characters against some regular expressions, especially
+ those including the '.' or '\p' metacharacters.
+
+** New features
+
+ grep -P can now use a just-in-time compiler to greatly speed up matches,
+ This feature is transparent to the user; no flag is required to enable
+ it. It is only available if the corresponding support in the PCRE
+ library is detected when grep is compiled.
+
+
+* Noteworthy changes in release 2.14 (2012-08-20) [stable]
+
+** Bug fixes
+
+ grep -i '^$' could exit 0 (i.e., report a match) in a multi-byte locale,
+ even though there was no match, and the command generated no output.
+ E.g., seq 2 | LC_ALL=en_US.utf8 grep -il '^$' would mistakenly print
+ "(standard input)". Related, seq 9 | LC_ALL=en_US.utf8 grep -in '^$'
+ would print "2:4:6:8:10:12:14:16" and exit 0. Now it prints nothing
+ and exits with status of 1. [bug introduced in grep-2.6]
+
+ 'grep' no longer falsely reports text files as being binary on file
+ systems that compress contents or that store tiny contents in metadata.
+
+
+* Noteworthy changes in release 2.13 (2012-07-04) [stable]
+
+** Bug fixes
+
+ grep -i, in a multi-byte locale, when matching a line containing a character
+ like the UTF-8 Turkish I-with-dot (U+0130) (whose lower-case representation
+ occupies fewer bytes), would print an incomplete output line.
+ Similarly, with a matched line containing a character (e.g., the Latin
+ capital I in a Turkish UTF-8 locale), where the lower-case representation
+ occupies more bytes, grep could print garbage.
+ [bug introduced in grep-2.6]
+
+ --include and --exclude can again be combined, and again apply to
+ the command line, e.g., "grep --include='*.[ch]' --exclude='system.h'
+ PATTERN *" again reads all *.c and *.h files except for system.h.
+ [bug introduced in grep-2.6]
+
+** New features
+
+ 'grep' without -z now treats a sparse file as binary, if it can
+ easily determine that the file is sparse.
+
+** Dropped features
+
+ Bootstrapping with Makefile.boot has been broken since grep 2.6,
+ and was removed.
+
+
+* Noteworthy changes in release 2.12 (2012-04-23) [stable]
+
+** Bug fixes
+
+ "echo P|grep --devices=skip P" once again prints P, as it did in 2.10
+ [bug introduced in grep-2.11]
+
+ grep no longer segfaults with -r --exclude-dir and no file operand.
+ I.e., ":|grep -r --exclude-dir=D PAT" would segfault.
+ [bug introduced in grep-2.11]
+
+ Recursive grep now uses fts for directory traversal, so it can
+ handle much-larger directories without reporting things like "File
+ name too long", and it can run much faster when dealing with large
+ directory hierarchies. [bug present since the beginning]
+
+ grep -E 'a{1000000000}' now reports an overflow error rather than
+ silently acting like grep -E 'a\{1000000000}'.
+
+ grep -E 'a{,10}' was not treated equivalently to grep -E 'a{0,10}'.
+
+** New features
+
+ The -R option now has a long-option alias --dereference-recursive.
+
+** Changes in behavior
+
+ The -r (--recursive) option now follows only command-line symlinks.
+ Also, by default -r now reads a device only if it is named on the command
+ line; this can be overridden with --devices. -R acts as before, so
+ use -R if you prefer the old behavior of following all symlinks and
+ defaulting to reading all devices.
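The 2.13 entry above about `grep -i` and the Turkish letters turns on the fact that a character and its lower-case counterpart can have different UTF-8 lengths. The sketch below only checks those byte lengths. The case pairs are written out by hand rather than taken from a locale, and the offset comparison at the end is an assumption about why a length change is hazardous, not a description of grep's internals.

```python
def utf8_len(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# (upper, lower) pairs as used in a Turkish locale, listed by hand.
CASE_PAIRS = [
    ("\u0130", "i"),  # I WITH DOT ABOVE -> i: the lower-case form is shorter
    ("I", "\u0131"),  # I -> DOTLESS I: the lower-case form is longer
]

for upper, lower in CASE_PAIRS:
    print(f"U+{ord(upper):04X} ({utf8_len(upper)} bytes) -> "
          f"U+{ord(lower):04X} ({utf8_len(lower)} bytes)")

# Assumed hazard: a byte offset found in a lower-cased copy of a line
# does not point at the same character in the original line.
original = "\u0130x".encode("utf-8")
lowered = "ix".encode("utf-8")
offset_in_lowered = lowered.index(b"x")
offset_in_original = original.index(b"x")
print(offset_in_lowered, offset_in_original)
```

Here the `x` sits at byte offset 1 in the lower-cased copy but at offset 2 in the original, so any code that mixes the two buffers has to translate offsets.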
+
+
+* Noteworthy changes in release 2.11 (2012-03-02) [stable]
+
+** Bug fixes
+
+ grep no longer dumps core on lines whose lengths do not fit in 'int'.
+ (e.g., lines longer than 2 GiB on a typical 64-bit host).
+ Instead, grep either works as expected, or reports an error.
+ An error can occur if not enough main memory is available, or if the
+ GNU C library's regular expression functions cannot handle such long lines.
+ [bug present since "the beginning"]
+
+ The -m, -A, -B, and -C options no longer mishandle context line
+ counts that do not fit in 'int'. Also, grep -c's counts are now
+ limited by the type 'intmax_t' (typically less than 2**63) rather
+ than 'int' (typically less than 2**31).
+
+ grep no longer silently suppresses errors when reading a directory
+ as if it were a text file. For example, "grep x ." now reports a
+ read error on most systems; formerly, it ignored the error.
+ [bug introduced in grep-2.5]
+
+ grep now exits with status 2 if a directory loop is found,
+ instead of possibly exiting with status 0 or 1.
+ [bug introduced in grep-2.3]
+
+ The -s option now suppresses certain input error diagnostics that it
+ formerly failed to suppress. These include errors when closing the
+ input, when lseeking the input, and when the input is also the output.
+ [bug introduced in grep-2.4]
+
+ On POSIX systems, commands like "grep PAT < FILE >> FILE"
+ now report an error instead of looping.
+ [bug present since "the beginning"]
+
+ The --include, --exclude, and --exclude-dir options now handle
+ command-line arguments more consistently. --include and --exclude
+ apply only to non-directories and --exclude-dir applies only to
+ directories. "-" (standard input) is never excluded, since it is
+ not a file name.
+ [bug introduced in grep-2.5]
+
+ grep no longer rejects "grep -qr . > out", i.e., when run with -q
+ and an input file is the same as the output file, since with -q
+ grep generates no output, so there is no risk of infinite loop or
+ of an output-affecting race condition. Thus, the use of the following
+ options also disables the input-equals-output failure:
+ --max-count=N (-m) (for N >= 2)
+ --files-with-matches (-l)
+ --files-without-match (-L)
+ [bug introduced in grep-2.10]
+
+ grep no longer emits an error message and quits on MS-Windows when
+ invoked with the -r option.
+
+ grep no longer misinterprets some alternations involving anchors
+ (^, $, \< \> \B, \b). For example, grep -E "(^|\B)a" no
+ longer reports a match for the string "x a".
+ [bug present since "the beginning"]
+
+** New features
+
+ If no file operand is given, and a command-line -r or equivalent
+ option is given, grep now searches the working directory. Formerly
+ grep ignored the -r and searched standard input nonrecursively.
+ An -r found in GREP_OPTIONS does not have this new effect.
+
+ grep now supports color highlighting of matches on MS-Windows.
+
+** Changes in behavior
+
+ Use of the --mmap option now elicits a warning. It has been a no-op
+ since March of 2010.
+
+ grep no longer diagnoses write errors repeatedly; it exits after
+ diagnosing the first write error. This is better behavior when
+ writing to a dangling pipe.
+
+ Syntax errors in GREP_COLORS are now ignored, instead of sometimes
+ eliciting warnings. This is more consistent with programs that
+ (e.g.) ignore errors in termcap entries.
+
+* Noteworthy changes in release 2.10 (2011-11-16) [stable]
+
+** Bug fixes
+
+ grep no longer mishandles high-bit-set pattern bytes on systems
+ where "char" is a signed type. [bug appears to affect only MS-Windows]
+
+ On POSIX systems, grep now rejects a command like "grep -r pattern . > out",
+ in which the output file is also one of the inputs,
+ because it can result in an "infinite" disk-filling loop.
+ [bug present since "the beginning"] + +** Build-related + + "make dist" no longer builds .tar.gz files. + xz is portable enough and in wide-enough use that distributing + only .tar.xz files is enough. + + +* Noteworthy changes in release 2.9 (2011-06-21) [stable] + +** Bug fixes + + grep no longer clobbers heap for an ERE like '(^| )*( |$)' + [bug introduced in grep-2.6] + + grep is faster on regular expressions that match multibyte characters + in brackets (such as '[áéíóú]'). + + echo c|grep '[c]' would fail for any c in 0x80..0xff, with a uni-byte + encoding for which the byte-to-wide-char mapping is nontrivial. For + example, the ISO-8859-1 locales are not affected, but ru_RU.KOI8-R is. + [bug introduced in grep-2.6] + + grep -P no longer aborts when PCRE's backtracking limit is exceeded. + Before, echo aaaaaaaaaaaaaab |grep -P '((a+)*)+$' would abort. Now, + it diagnoses the problem and exits with status 2. + + +* Noteworthy changes in release 2.8 (2011-05-13) [stable] + +** Bug fixes + + echo c|grep '[c]' would fail for any c in 0x80..0xff, and in many locales. + E.g., printf '\xff\n'|grep "$(printf '[\xff]')" || echo FAIL + would print FAIL rather than the required matching line. + [bug introduced in grep-2.6] + + grep's interpretation of range expressions is now more consistent with + that of other tools. [bug present since multi-byte character set + support was introduced in 2.5.2, though the steps needed to reproduce + it changed in grep-2.6] + + grep erroneously returned with exit status 1 on some memory allocation + failures. [bug present since "the beginning"] + + +* Noteworthy changes in release 2.7 (2010-09-16) [stable] + +** Bug fixes + + grep --include=FILE works once again, rather than working like --exclude=FILE + [bug introduced in grep-2.6] + + Searching with grep -Fw for an empty string would not match an + empty line. [bug present since "the beginning"] + + X{0,0} is now implemented correctly. It used to be a synonym of X{0,1}.
+ [bug present since "the beginning"] + + In multibyte locales, regular expressions including back-references + no longer exhibit quadratic complexity (i.e., they are orders + of magnitude faster). [bug present since multi-byte character set + support was introduced in 2.5.2] + + In UTF-8 locales, regular expressions including "." can be orders + of magnitude faster. For example, "grep ." is now twice as fast + as "grep -v ^$", instead of being immensely slower. It remains + slow in other multibyte locales. [bug present since multi-byte + character set support was introduced in 2.5.2] + + --mmap was meant to be ignored in 2.6.x, but it was instead + removed by mistake. [bug introduced in 2.6] + +** New features + + grep now diagnoses (and fails with exit status 2) commonly mistyped + regular expressions like [:space:], [:digit:], etc. Before, those were + silently interpreted as [ac:eps] and [dgit:] respectively. Virtually + all who make that class of mistake should have used [[:space:]] or + [[:digit:]]. This new behavior is disabled when the POSIXLY_CORRECT + environment variable is set. + + On systems using glibc, grep can support equivalence classes. However, + whether they actually work depends on glibc's locale definitions. + +* Noteworthy changes in release 2.6.3 (2010-04-02) [stable] + +** Bug fixes + + Searching with grep -F for an empty string in a multibyte locale + would hang grep. [bug introduced in 2.6.2] + + PCRE support is once again detected on systems with <pcre/pcre.h>. + [bug introduced in 2.6.2] + + +* Noteworthy changes in release 2.6.2 (2010-03-29) [stable] + +** Bug fixes + + grep -F no longer mistakenly reports a match when searching + for an incomplete prefix of a multibyte character. + [bug present since "the beginning"] + + grep -F no longer goes into an infinite loop when it finds a match for an + incomplete (non-prefix of a) multibyte character.
[bug introduced in 2.6] + + Using any of the --include or --exclude* options would cause a NULL + dereference. [bugs introduced in 2.6] + +** Build-related + + configure no longer relies on pkg-config to detect PCRE support. + + +* Noteworthy changes in release 2.6.1 (2010-03-25) [stable] + +** Bug fixes + + Character classes could cause a segmentation fault if they included a + multibyte character. [bug introduced in 2.6] + + Character ranges would not work in single-byte character sets other + than C (for example, ISO-8859-1 or KOI8-R) and some multi-byte locales. + For example, this should print "1", but would find no match: + $ echo 1 | env -i LC_COLLATE=en_US.UTF-8 grep '[0-9]' + [bug introduced in 2.6] + + The output of grep was incorrect for whole-word (-w) matches if the + patterns included a back-reference. [bug introduced in grep-2.5.2] + +** Portability + + Avoid a link failure on Solaris 8. + + +* Noteworthy changes in release 2.6 (2010-03-23) [stable] + +** Speed improvements + + grep is much faster on multibyte character sets, especially (but not + limited to) UTF-8 character sets. The speed improvement is also very + pronounced with case-insensitive matches. + +** Bug fixes + + Character classes would malfunction in multi-byte locales when using grep -i. + Examples which would print nothing for LC_ALL=en_US.UTF-8 include: + - for ranges, echo Z | grep -i '[a-z]' + - for single characters, echo Y | grep -i '[y]' + - for character types, echo Y | grep -i '[[:lower:]]' + + grep -i -o would fail to report some matches; grep -i --color, while not + missing any line containing a match, would fail to color some matches. + + grep would fail to report a match in a multibyte character set other than + UTF-8, if another match occurred earlier in the line but started in the + middle of a multibyte character. + + Various bugs in grep -P, caused by expressions such as [^b] or \S matching + newlines, were fixed. 
grep -P also supports the special sequences \Z and + \z, and can be combined with the command-line option -z to perform searches + on NUL-separated records. + + grep would mistakenly exit with status 1 upon error, rather than 2, + as it is documented to do. + + Using options like -1 -2 or -1 -v -2 results in two lines of + context (the last value that appears on the command line) instead + of twelve (the concatenation of all the values). This is consistent + with the behavior of options -A/-B/-C. + + Two new command-line options, --group-separator=ARGUMENT and + --no-group-separator, enable further customization of the output + when -A, -B or -C is being used. + +** Other changes + + egrep accepts the -E option and fgrep accepts the -F option. If egrep + and fgrep are given another of the -E/-F/-G options, they print a more + meaningful error message. + +* Noteworthy changes in release 2.5.4 (2009-02-10) [stable] + + - This is a bugfix release. No new features. + +Version 2.5.3 + - The new option --exclude-dir lets you specify a directory pattern that + will be excluded from recursive grep. + - Numerous bug fixes. + +Version 2.5.1 + - This is a bugfix release. No new features. + +Version 2.5 + - The new option --label lets you specify a different name for input + from stdin. See the man or info pages for details. + + - The internal lib/getopt* files are no longer used on systems providing + getopt functionality in their libc (e.g. glibc 2.2.x). + If you need the old getopt files, use --with-included-getopt. + + - The new option --only-matching (-o) will print only the part of matching + lines that matches the pattern. This is useful, for example, to extract + IP addresses from log files. + + - i18n bug fixed ([A-Z0-9] wouldn't match A in locales other than C on + systems using recent glibc builds). + + - GNU grep can now be built with autoconf 2.52. + + - The new option --devices controls how grep handles device files. Its usage + is analogous to --directories.
+ + - The new option --line-buffered flushes output after every line. There is a + noticeable slowdown when forcing line buffering. + + - Back-references are now local to the regex. + grep -e '\(a\)\1' -e '\(b\)\1' + The last backref \1 in the second expression refers to \(b\). + + - The new option --include=PATTERN will search only matching files + when recursing in directories. + + - The new option --exclude=PATTERN will skip matching files when + recursing in directories. + + - The new option --color will use the environment variable GREP_COLOR + (default is red) to highlight the matching string. + --color takes an optional argument specifying when to colorize a line: + --color=always, --color=tty, --color=never + + - The following changes are for POSIX conformance: + + . The -q or --quiet or --silent option now causes grep to exit + with zero status when an input line is selected, even if an error + also occurs. + + . The -s or --no-messages option no longer affects the exit status. + + . Bracket regular expressions like [a-z] are now locale-dependent. + For example, many locales sort characters in dictionary order, + and in these locales the regular expression [a-d] is not + equivalent to [abcd]; it might be equivalent to [aBbCcDd], for + example. To obtain the traditional interpretation of bracket + expressions, you can use the C locale by setting the LC_ALL + environment variable to the value "C". + + - The -C or --context option now requires an argument, partly for + consistency, and partly because POSIX recommends against + optional arguments. + + - The new -P or --perl-regexp option tells grep to interpret the pattern as + a Perl regular expression. + + - The new option --max-count=num makes grep stop reading a file after num + matching lines. + New option -m; equivalent to --max-count. + + - Translations for bg, ca, da, nb and tr have been added.
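As a rough illustration of the --only-matching (-o) and --max-count (-m) entries in the Version 2.5 list above, the following sketch pulls IPv4-looking strings out of a few made-up log lines and then stops after the first matching line. The sample log text, the variable names, and the deliberately loose address pattern are assumptions for this example; they are not part of grep or of this NEWS file.

```shell
# Hypothetical sample log; the lines are invented for illustration.
log='10.0.0.1 GET /index
no address here
192.168.1.20 POST /login from 10.0.0.1'

# -o prints each matching part on its own line, not the whole line.
# The pattern is loose: it accepts any dotted quad of digit runs.
ips=$(printf '%s\n' "$log" | grep -E -o '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+')

# -m 1 makes grep stop reading after the first matching line.
first=$(printf '%s\n' "$log" | grep -E -m 1 'GET|POST')

printf '%s\n' "$ips"
printf '%s\n' "$first"
```

A line that contains two addresses contributes two output lines under -o, which is why the third log line shows up twice in the extracted list.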
+ +Version 2.4.2 + + - Added more checks in configure to default to grep-${version}/src/regex.c + instead of the one in GNU Lib C. + +Version 2.4.1 + + - If the final byte of an input file is not a newline, grep now silently + supplies one. + + - The new option --binary-files=TYPE makes grep assume that a binary input + file is of type TYPE. + --binary-files='binary' (the default) outputs a 1-line summary of matches. + --binary-files='without-match' assumes binary files do not match. + --binary-files='text' treats binary files as text + (equivalent to the -a or --text option). + + - New option -I; equivalent to --binary-files='without-match'. + +Version 2.4: + + - egrep is now equivalent to 'grep -E' as required by POSIX, + removing a longstanding source of confusion and incompatibility. + 'grep' is now more forgiving about stray '{'s, for backward + compatibility with traditional egrep. + + - The lower bound of an interval is not optional. + You must use an explicit zero, e.g. 'x{0,10}' instead of 'x{,10}'. + (The old documentation incorrectly claimed that it was optional.) + + - The --revert-match option has been renamed to --invert-match. + + - The --fixed-regexp option has been renamed to --fixed-strings. + + - New option -H or --with-filename. + + - New option --mmap. By default, GNU grep now uses read instead of mmap. + This is faster on some hosts, and is safer on all. + + - The new option -z or --null-data causes 'grep' to treat a zero byte + (the ASCII NUL character) as a line terminator in input data, and + to treat newlines as ordinary data. + + - The new option -Z or --null causes 'grep' to output a zero byte + instead of the normal separator after a file name. + + - These two options can be used with commands like 'find -print0', + 'perl -0', 'sort -z', and 'xargs -0' to process arbitrary file names, + even those that contain newlines. + + - The environment variable GREP_OPTIONS specifies default options; + e.g.
GREP_OPTIONS='--directories=skip' reestablishes grep 2.1's + behavior of silently skipping directories. + + - You can specify a matcher multiple times without error, e.g. + 'grep -E -E' or 'fgrep -F'. It is still an error to specify + conflicting matchers. + + - -u and -U are now allowed on non-DOS hosts, and have no effect. + + - Modified the test scripts to work around the "Broken Pipe" + errors from bash. See Bash FAQ. + + - New option -r or --recursive or --directories=recurse. + (This option was also in grep 2.3, but wasn't announced here.) + + - --without-included-regex is disabled; it was causing bogus reports, i.e., + doing more harm than good. + +Version 2.3: + + - When searching a binary file FOO, grep now just reports + "Binary file FOO matches" instead of outputting binary data. + This is typically more useful than the old behavior, + and it is also more consistent with other utilities like 'diff'. + A file is considered to be binary if it contains a NUL (i.e. zero) byte. + + The new -a or --text option causes 'grep' to assume that all + input is text. (This option has the same meaning as with 'diff'.) + Use it if you want binary data in your output. + + - 'grep' now searches directories just like ordinary files; it no longer + silently skips directories. This is the traditional behavior of + Unix text utilities (in particular, of traditional 'grep'). + Hence 'grep PATTERN DIRECTORY' should report + "grep: DIRECTORY: Is a directory" on hosts where the operating system + does not permit programs to read directories directly, and + "grep: DIRECTORY: Binary file matches" (or nothing) otherwise. + + The new -d ACTION or --directories=ACTION option affects directory handling. + '-d skip' causes 'grep' to silently skip directories, as in grep 2.1; + '-d read' (the default) causes 'grep' to read directories if possible, + as in earlier versions of grep.
+ + - The MS-DOS and Microsoft Windows ports now behave identically to the + GNU and Unix ports with respect to binary files and directories. + +Version 2.2: + +Bug fix release. + + - Status error number fix. + - Skipping directories removed. + - Many typo fixes. + - -f /dev/null fix (not to be considered an empty pattern). + - Checks for wctype/wchar. + - -E was using the wrong matcher fix. + - Bug in regex char class fix. + - Fixes for DJGPP. + +Version 2.1: + +This is a bug fix release (see Changelog), i.e., no new features. + + - More compliance to GNU standard. + - Long options. + - Internationalization. + - Use automake/autoconf. + - Directory hierarchy change. + - Sigvec with -e on Linux corrected. + - Sigvec with -f on Linux corrected. + - Sigvec with the mmap() corrected. + - Bug in kwset corrected. + - -q, -L and -l stop on first match. + - New and improved regex.[ch] from Ulrich Drepper. + - New and improved dfa.[ch] from Arnold Robbins. + - Prototypes for overzealous C compilers. + - Not scanning a file, if it's a directory + (causes problems on Sun). + - Ported to MS-DOS/MS-Windows with DJGPP tools. + +See Changelog for the full story and proper credits. + +Version 2.0: + +The most important user visible change is that egrep and fgrep have +disappeared as separate programs into the single grep program mandated +by POSIX 1003.2. New options -G, -E, and -F have been added, +selecting grep, egrep, and fgrep behavior respectively. For +compatibility with historical practice, hard links named egrep and +fgrep are also provided. See the manual page for details. + +In addition, the regular expression facilities described in POSIX +draft 11.2 are now supported, except for internationalization features +related to locale-dependent collating sequence information. + +There is a new option, -L, which is like -l except it lists +files which don't contain matches. The reason this option was +added is because '-l -v' doesn't do what you expect.
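The remark that '-l -v' does not do what one might expect can be seen with two small files. This is only a sketch: the temporary directory and the file names has_match.txt and no_match.txt are made up for the example.

```shell
# Hypothetical scratch files, created in a throwaway directory.
tmp=$(mktemp -d)
printf 'apple\nbanana\n' > "$tmp/has_match.txt"
printf 'cherry\n' > "$tmp/no_match.txt"

# -L lists files in which no line matches 'apple'.
files_without_match=$(cd "$tmp" && grep -L apple has_match.txt no_match.txt)

# -l -v lists files having at least one line that does NOT match 'apple',
# so has_match.txt is listed too because of its 'banana' line.
files_with_nonmatching_line=$(cd "$tmp" && grep -l -v apple has_match.txt no_match.txt)

printf '%s\n' "$files_without_match"
printf '%s\n' "$files_with_nonmatching_line"
rm -r "$tmp"
```

Here -L names only no_match.txt, while '-l -v' names both files, since -v inverts the per-line test rather than the per-file one.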
+ +Performance has been improved; the amount of improvement is platform +dependent, but (for example) grep 2.0 typically runs at least 30% faster +than grep 1.6 on a DECstation using the MIPS compiler. Where possible, +grep now uses mmap() for file input; on a Sun 4 running SunOS 4.1 this +may cut system time by as much as half, for a total reduction in running +time by nearly 50%. On machines that don't use mmap(), the buffering +code has been rewritten to choose more favorable alignments and buffer +sizes for read(). + +Portability has been substantially cleaned up, and an automatic +configure script is now provided. + +The internals have changed in ways too numerous to mention. +People brave enough to reuse the DFA matcher in other programs +will now have their bravery amply "rewarded", for the interface +to that file has been completely changed. Some changes were +necessary to track the evolution of the regex package, and since +I was changing it anyway I decided to do a general cleanup. + +======================================================================== +Copyright (C) 1992, 1997-2002, 2004-2021 Free Software Foundation, Inc. + + Copying and distribution of this file, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. + +Permission is granted to copy, distribute and/or modify this document +under the terms of the GNU Free Documentation License, Version 1.3 or +any later version published by the Free Software Foundation; with no +Invariant Sections, with no Front-Cover Texts, and with no Back-Cover +Texts. A copy of the license is included in the "GNU Free +Documentation License" file as part of this distribution.