diff options
Diffstat (limited to 'man7/glob.7')
-rw-r--r-- | man7/glob.7 | 205 |
1 files changed, 205 insertions, 0 deletions
diff --git a/man7/glob.7 b/man7/glob.7 new file mode 100644 index 0000000..466701c --- /dev/null +++ b/man7/glob.7 @@ -0,0 +1,205 @@ +.\" Copyright (c) 1998 Andries Brouwer +.\" +.\" SPDX-License-Identifier: GPL-2.0-or-later +.\" +.\" 2003-08-24 fix for / by John Kristoff + joey +.\" +.TH glob 7 2023-03-08 "Linux man-pages 6.05.01" +.SH NAME +glob \- globbing pathnames +.SH DESCRIPTION +Long ago, in UNIX\ V6, there was a program +.I /etc/glob +that would expand wildcard patterns. +Soon afterward this became a shell built-in. +.PP +These days there is also a library routine +.BR glob (3) +that will perform this function for a user program. +.PP +The rules are as follows (POSIX.2, 3.13). +.SS Wildcard matching +A string is a wildcard pattern if it contains one of the +characters \[aq]?\[aq], \[aq]*\[aq], or \[aq][\[aq]. +Globbing is the operation +that expands a wildcard pattern into the list of pathnames +matching the pattern. +Matching is defined by: +.PP +A \[aq]?\[aq] (not between brackets) matches any single character. +.PP +A \[aq]*\[aq] (not between brackets) matches any string, +including the empty string. +.PP +.B "Character classes" +.PP +An expression "\fI[...]\fP" where the first character after the +leading \[aq][\[aq] is not an \[aq]!\[aq] matches a single character, +namely any of the characters enclosed by the brackets. +The string enclosed by the brackets cannot be empty; +therefore \[aq]]\[aq] can be allowed between the brackets, provided +that it is the first character. +(Thus, "\fI[][!]\fP" matches the +three characters \[aq][\[aq], \[aq]]\[aq], and \[aq]!\[aq].) +.PP +.B Ranges +.PP +There is one special convention: +two characters separated by \[aq]\-\[aq] denote a range. +(Thus, +"\fI[A\-Fa\-f0\-9]\fP" is equivalent to "\fI[ABCDEFabcdef0123456789]\fP".) +One may include \[aq]\-\[aq] in its literal meaning +by making it the first or last character between the brackets. +(Thus, +"\fI[]\-]\fP" matches just the two characters \[aq]]\[aq] and \[aq]\-\[aq], +and "\fI[\-\-0]\fP" matches the +three characters \[aq]\-\[aq], \[aq].\[aq], and \[aq]0\[aq], +since \[aq]/\[aq] cannot be matched.) +.PP +.B Complementation +.PP +An expression "\fI[!...]\fP" matches a single character, namely +any character that is not matched by the expression obtained +by removing the first \[aq]!\[aq] from it. +(Thus, "\fI[!]a\-]\fP" matches any +single character except \[aq]]\[aq], \[aq]a\[aq], and \[aq]\-\[aq].) +.PP +One can remove the special meaning of \[aq]?\[aq], \[aq]*\[aq], and \[aq][\[aq] +by preceding them by a backslash, +or, +in case this is part of a shell command line, +enclosing them in quotes. +Between brackets these characters stand for themselves. +Thus, "\fI[[?*\e]\fP" matches the +four characters \[aq][\[aq], \[aq]?\[aq], \[aq]*\[aq], and \[aq]\e\[aq]. +.SS Pathnames +Globbing is applied on each of the components of a pathname +separately. +A \[aq]/\[aq] in a pathname cannot be matched by a \[aq]?\[aq] or \[aq]*\[aq] +wildcard, or by a range like "\fI[.\-0]\fP". +A range containing an explicit \[aq]/\[aq] character is syntactically incorrect. +(POSIX requires that syntactically incorrect patterns are left unchanged.) +.PP +If a filename starts with a \[aq].\[aq], +this character must be matched explicitly. +(Thus, \fIrm\ *\fP will not remove .profile, and \fItar\ c\ *\fP will not +archive all your files; \fItar\ c\ .\fP is better.) +.SS Empty lists +The nice and simple rule given above: "expand a wildcard pattern +into the list of matching pathnames" was the original UNIX +definition. +It allowed one to have patterns that expand into +an empty list, as in +.PP +.nf + xv \-wait 0 *.gif *.jpg +.fi +.PP +where perhaps no *.gif files are present (and this is not +an error). +However, POSIX requires that a wildcard pattern is left +unchanged when it is syntactically incorrect, or the list of +matching pathnames is empty. +With +.I bash +one can force the classical behavior using this command: +.PP +.in +4n +.EX +shopt \-s nullglob +.EE +.in +.\" In Bash v1, by setting allow_null_glob_expansion=true +.PP +(Similar problems occur elsewhere. +For example, where old scripts have +.PP +.in +4n +.EX +rm \`find . \-name "*\[ti]"\` +.EE +.in +.PP +new scripts require +.PP +.in +4n +.EX +rm \-f nosuchfile \`find . \-name "*\[ti]"\` +.EE +.in +.PP +to avoid error messages from +.I rm +called with an empty argument list.) +.SH NOTES +.SS Regular expressions +Note that wildcard patterns are not regular expressions, +although they are a bit similar. +First of all, they match +filenames, rather than text, and secondly, the conventions +are not the same: for example, in a regular expression \[aq]*\[aq] means zero or +more copies of the preceding thing. +.PP +Now that regular expressions have bracket expressions where +the negation is indicated by a \[aq]\[ha]\[aq], POSIX has declared the +effect of a wildcard pattern "\fI[\[ha]...]\fP" to be undefined. +.SS Character classes and internationalization +Of course ranges were originally meant to be ASCII ranges, +so that "\fI[\ \-%]\fP" stands for "\fI[\ !"#$%]\fP" and "\fI[a\-z]\fP" stands +for "any lowercase letter". +Some UNIX implementations generalized this so that a range X\-Y +stands for the set of characters with code between the codes for +X and for Y. +However, this requires the user to know the +character coding in use on the local system, and moreover, is +not convenient if the collating sequence for the local alphabet +differs from the ordering of the character codes. +Therefore, POSIX extended the bracket notation greatly, +both for wildcard patterns and for regular expressions. +In the above we saw three types of items that can occur in a bracket +expression: namely (i) the negation, (ii) explicit single characters, +and (iii) ranges. +POSIX specifies ranges in an internationally +more useful way and adds three more types: +.PP +(iii) Ranges X\-Y comprise all characters that fall between X +and Y (inclusive) in the current collating sequence as defined +by the +.B LC_COLLATE +category in the current locale. +.PP +(iv) Named character classes, like +.PP +.nf +[:alnum:] [:alpha:] [:blank:] [:cntrl:] +[:digit:] [:graph:] [:lower:] [:print:] +[:punct:] [:space:] [:upper:] [:xdigit:] +.fi +.PP +so that one can say "\fI[[:lower:]]\fP" instead of "\fI[a\-z]\fP", and have +things work in Denmark, too, where there are three letters past \[aq]z\[aq] +in the alphabet. +These character classes are defined by the +.B LC_CTYPE +category +in the current locale. +.PP +(v) Collating symbols, like "\fI[.ch.]\fP" or "\fI[.a-acute.]\fP", +where the string between "\fI[.\fP" and "\fI.]\fP" is a collating +element defined for the current locale. +Note that this may +be a multicharacter element. +.PP +(vi) Equivalence class expressions, like "\fI[=a=]\fP", +where the string between "\fI[=\fP" and "\fI=]\fP" is any collating +element from its equivalence class, as defined for the +current locale. +For example, "\fI[[=a=]]\fP" might be equivalent +to "\fI[a\('a\(`a\(:a\(^a]\fP", that is, +to "\fI[a[.a-acute.][.a-grave.][.a-umlaut.][.a-circumflex.]]\fP". +.SH SEE ALSO +.BR sh (1), +.BR fnmatch (3), +.BR glob (3), +.BR locale (7), +.BR regex (7) |