Diffstat (limited to 'upstream/mageia-cauldron/man1p/awk.1p')
-rw-r--r-- | upstream/mageia-cauldron/man1p/awk.1p | 4036 |
1 files changed, 4036 insertions, 0 deletions
diff --git a/upstream/mageia-cauldron/man1p/awk.1p b/upstream/mageia-cauldron/man1p/awk.1p new file mode 100644 index 00000000..14f68be2 --- /dev/null +++ b/upstream/mageia-cauldron/man1p/awk.1p @@ -0,0 +1,4036 @@ +'\" et +.TH AWK "1P" 2017 "IEEE/The Open Group" "POSIX Programmer's Manual" +.\" +.SH PROLOG +This manual page is part of the POSIX Programmer's Manual. +The Linux implementation of this interface may differ (consult +the corresponding Linux manual page for details of Linux behavior), +or the interface may not be implemented on Linux. +.\" +.SH NAME +awk +\(em pattern scanning and processing language +.SH SYNOPSIS +.LP +.nf +awk \fB[\fR-F \fIsepstring\fB] [\fR-v \fIassignment\fB]\fR... \fIprogram\fB [\fIargument\fR...\fB]\fR +.P +awk \fB[\fR-F \fIsepstring\fB] \fR-f \fIprogfile \fB[\fR-f \fIprogfile\fB]\fR... \fB[\fR-v \fIassignment\fB]\fR... + \fB[\fIargument\fR...\fB]\fR +.fi +.SH DESCRIPTION +The +.IR awk +utility shall execute programs written in the +.IR awk +programming language, which is specialized for textual data +manipulation. An +.IR awk +program is a sequence of patterns and corresponding actions. When +input is read that matches a pattern, the action associated with that +pattern is carried out. +.P +Input shall be interpreted as a sequence of records. By default, a +record is a line, less its terminating +<newline>, +but this can be changed by using the +.BR RS +built-in variable. Each record of input shall be matched in turn +against each pattern in the program. For each pattern matched, the +associated action shall be executed. +.P +The +.IR awk +utility shall interpret each input record as a sequence of fields +where, by default, a field is a string of non-\c +<blank> +non-\c +<newline> +characters. This default +<blank> +and +<newline> +field delimiter can be changed by using the +.BR FS +built-in variable or the +.BR \-F +.IR sepstring +option. 
The +.IR awk +utility shall denote the first field in a record $1, the second $2, and +so on. The symbol $0 shall refer to the entire record; setting any +other field causes the re-evaluation of $0. Assigning to $0 shall reset +the values of all other fields and the +.BR NF +built-in variable. +.SH OPTIONS +The +.IR awk +utility shall conform to the Base Definitions volume of POSIX.1\(hy2017, +.IR "Section 12.2" ", " "Utility Syntax Guidelines". +.P +The following options shall be supported: +.IP "\fB\-F\ \fIsepstring\fR" 10 +Define the input field separator. This option shall be equivalent to: +.RS 10 +.sp +.RS 4 +.nf + +-v FS=\fIsepstring +.fi +.P +.RE +.P +except that if +.BR \-F +.IR sepstring +and +.BR \-v +.IR \fRFS=\fPsepstring\fR +are both used, it is unspecified whether the +.BR FS +assignment resulting from +.BR \-F +.IR sepstring +is processed in command line order or is processed after the last +.BR \-v +.IR \fRFS=\fPsepstring\fR . +See the description of the +.BR FS +built-in variable, and how it is used, in the EXTENDED DESCRIPTION +section. +.RE +.IP "\fB\-f\ \fIprogfile\fR" 10 +Specify the pathname of the file +.IR progfile +containing an +.IR awk +program. A pathname of +.BR '\-' +shall denote the standard input. If multiple instances of this option +are specified, the concatenation of the files specified as +.IR progfile +in the order specified shall be the +.IR awk +program. The +.IR awk +program can alternatively be specified in the command line as a single +argument. +.IP "\fB\-v\ \fIassignment\fR" 10 +.br +The application shall ensure that the +.IR assignment +argument is in the same form as an +.IR assignment +operand. The specified variable assignment shall occur prior to +executing the +.IR awk +program, including the actions associated with +.BR BEGIN +patterns (if any). Multiple occurrences of this option can be +specified. 
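The options above can be illustrated with a minimal sketch. It assumes a POSIX-conforming awk is available on PATH; the sample record and the variable names (first_field, same_field, greeting, name) are hypothetical and not part of the manual page.

```shell
# -F sets the input field separator for the records that are read.
first_field=$(printf 'root:x:0:0\n' | awk -F : '{ print $1 }')

# -v FS=... is the documented equivalent of -F when only one of them is used.
same_field=$(printf 'root:x:0:0\n' | awk -v FS=: '{ print $1 }')

# -v assignments take effect before any BEGIN action runs.
greeting=$(awk -v name=world 'BEGIN { print "hello, " name }')

printf '%s\n%s\n%s\n' "$first_field" "$same_field" "$greeting"
```

Combining -F and -v FS= in a single invocation is avoided here because the text leaves the processing order unspecified.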
+.SH OPERANDS +The following operands shall be supported: +.IP "\fIprogram\fR" 10 +If no +.BR \-f +option is specified, the first operand to +.IR awk +shall be the text of the +.IR awk +program. The application shall supply the +.IR program +operand as a single argument to +.IR awk . +If the text does not end in a +<newline>, +.IR awk +shall interpret the text as if it did. +.IP "\fIargument\fR" 10 +Either of the following two types of +.IR argument +can be intermixed: +.RS 10 +.IP "\fIfile\fR" 10 +A pathname of a file that contains the input to be read, which is +matched against the set of patterns in the program. If no +.IR file +operands are specified, or if a +.IR file +operand is +.BR '\-' , +the standard input shall be used. +.IP "\fIassignment\fR" 10 +An operand that begins with an +<underscore> +or alphabetic character from the portable character set (see the table +in the Base Definitions volume of POSIX.1\(hy2017, +.IR "Section 6.1" ", " "Portable Character Set"), +followed by a sequence of underscores, digits, and alphabetics from the +portable character set, followed by the +.BR '=' +character, shall specify a variable assignment rather than a pathname. +The characters before the +.BR '=' +represent the name of an +.IR awk +variable; if that name is an +.IR awk +reserved word (see +.IR "Grammar") +the behavior is undefined. The characters following the +<equals-sign> +shall be interpreted as if they appeared in the +.IR awk +program preceded and followed by a double-quote (\c +.BR '\&"' ) +character, as a +.BR STRING +token (see +.IR "Grammar"), +except that if the last character is an unescaped +<backslash>, +it shall be interpreted as a literal +<backslash> +rather than as the first character of the sequence +.BR \(dq\e"\(dq . +The variable shall be assigned the value of that +.BR STRING +token and, if appropriate, shall be considered a +.IR "numeric string" +(see +.IR "Expressions in awk"), +the variable shall also be assigned its numeric value. 
Each such +variable assignment shall occur just prior to the processing of the +following +.IR file , +if any. Thus, an assignment before the first +.IR file +argument shall be executed after the +.BR BEGIN +actions (if any), while an assignment after the last +.IR file +argument shall occur before the +.BR END +actions (if any). If there are no +.IR file +arguments, assignments shall be executed before processing the standard +input. +.RE +.SH STDIN +The standard input shall be used only if no +.IR file +operands are specified, or if a +.IR file +operand is +.BR '\-' , +or if a +.IR progfile +option-argument is +.BR '\-' ; +see the INPUT FILES section. If the +.IR awk +program contains no actions and no patterns, but is otherwise a valid +.IR awk +program, standard input and any +.IR file +operands shall not be read and +.IR awk +shall exit with a return status of zero. +.SH "INPUT FILES" +Input files to the +.IR awk +program from any of the following sources shall be text files: +.IP " *" 4 +Any +.IR file +operands or their equivalents, achieved by modifying the +.IR awk +variables +.BR ARGV +and +.BR ARGC +.IP " *" 4 +Standard input in the absence of any +.IR file +operands +.IP " *" 4 +Arguments to the +.BR getline +function +.P +Whether the variable +.BR RS +is set to a value other than a +<newline> +or not, for these files, implementations shall support records +terminated with the specified separator up to +{LINE_MAX} +bytes and may support longer records. +.P +If +.BR \-f +.IR progfile +is specified, the application shall ensure that the files named by each +of the +.IR progfile +option-arguments are text files and their concatenation, in the same +order as they appear in the arguments, is an +.IR awk +program. +.SH "ENVIRONMENT VARIABLES" +The following environment variables shall affect the execution of +.IR awk : +.IP "\fILANG\fP" 10 +Provide a default value for the internationalization variables that are +unset or null. 
(See the Base Definitions volume of POSIX.1\(hy2017, +.IR "Section 8.2" ", " "Internationalization Variables" +for the precedence of internationalization variables used to determine +the values of locale categories.) +.IP "\fILC_ALL\fP" 10 +If set to a non-empty string value, override the values of all the +other internationalization variables. +.IP "\fILC_COLLATE\fP" 10 +.br +Determine the locale for the behavior of ranges, equivalence classes, +and multi-character collating elements within regular expressions and +in comparisons of string values. +.IP "\fILC_CTYPE\fP" 10 +Determine the locale for the interpretation of sequences of bytes of +text data as characters (for example, single-byte as opposed to +multi-byte characters in arguments and input files), the behavior of +character classes within regular expressions, the identification of +characters as letters, and the mapping of uppercase and lowercase +characters for the +.BR toupper +and +.BR tolower +functions. +.IP "\fILC_MESSAGES\fP" 10 +.br +Determine the locale that should be used to affect the format and +contents of diagnostic messages written to standard error. +.IP "\fILC_NUMERIC\fP" 10 +.br +Determine the radix character used when interpreting numeric input, +performing conversions between numeric and string values, and +formatting numeric output. Regardless of locale, the +<period> +character (the decimal-point character of the POSIX locale) is the +decimal-point character recognized in processing +.IR awk +programs (including assignments in command line arguments). +.IP "\fINLSPATH\fP" 10 +Determine the location of message catalogs for the processing of +.IR LC_MESSAGES . +.IP "\fIPATH\fP" 10 +Determine the search path when looking for commands executed by +\fIsystem\fR(\fIexpr\fR), or input and output pipes; see the Base Definitions volume of POSIX.1\(hy2017, +.IR "Chapter 8" ", " "Environment Variables". 
+.P +In addition, all environment variables shall be visible via the +.IR awk +variable +.BR ENVIRON . +.SH "ASYNCHRONOUS EVENTS" +Default. +.SH STDOUT +The nature of the output files depends on the +.IR awk +program. +.SH STDERR +The standard error shall be used only for diagnostic messages. +.SH "OUTPUT FILES" +The nature of the output files depends on the +.IR awk +program. +.br +.SH "EXTENDED DESCRIPTION" +.SS "Overall Program Structure" +.P +An +.IR awk +program is composed of pairs of the form: +.sp +.RS 4 +.nf + +\fIpattern\fR { \fIaction\fR } +.fi +.P +.RE +.P +Either the pattern or the action (including the enclosing brace +characters) can be omitted. +.P +A missing pattern shall match any record of input, and a missing action +shall be equivalent to: +.sp +.RS 4 +.nf + +{ print } +.fi +.P +.RE +.P +Execution of the +.IR awk +program shall start by first executing the actions associated with all +.BR BEGIN +patterns in the order they occur in the program. Then each +.IR file +operand (or standard input if no files were specified) shall be +processed in turn by reading data from the file until a record +separator is seen (\c +<newline> +by default). Before the first reference to a field in the record is +evaluated, the record shall be split into fields, according to the +rules in +.IR "Regular Expressions", +using the value of +.BR FS +that was current at the time the record was read. Each pattern in the +program then shall be evaluated in the order of occurrence, and the +action associated with each pattern that matches the current record +executed. The action for a matching pattern shall be executed before +evaluating subsequent patterns. Finally, the actions associated with +all +.BR END +patterns shall be executed in the order they occur in the program. +.SS "Expressions in awk" +.P +Expressions describe computations used in +.IR patterns +and +.IR actions . 
+In the following table, valid expression operations are given in groups +from highest precedence first to lowest precedence last, with +equal-precedence operators grouped between horizontal lines. In +expression evaluation, where the grammar is formally ambiguous, higher +precedence operators shall be evaluated before lower precedence +operators. In this table +.IR expr , +.IR expr1 , +.IR expr2 , +and +.IR expr3 +represent any expression, while lvalue represents any entity that can +be assigned to (that is, on the left side of an assignment operator). +The precise syntax of expressions is given in +.IR "Grammar". +.sp +.ce 1 +\fBTable 4-1: Expressions in Decreasing Precedence in \fIawk\fP\fR +.TS +box tab(@) center; +cB | cB | cB | cB +l1f5 | l1 | l1 | l. +Syntax@Name@Type of Result@Associativity +_ +( \fIexpr\fP )@Grouping@Type of \fIexpr\fP@N/A +_ +$\fIexpr\fP@Field reference@String@N/A +_ +lvalue ++@Post-increment@Numeric@N/A +lvalue \-\|\-@Post-decrement@Numeric@N/A +_ +++ lvalue@Pre-increment@Numeric@N/A +\-\|\- lvalue@Pre-decrement@Numeric@N/A +_ +\fIexpr\fP ^ \fIexpr\fP@Exponentiation@Numeric@Right +_ +! 
\fIexpr\fP@Logical not@Numeric@N/A ++ \fIexpr\fP@Unary plus@Numeric@N/A +\- \fIexpr\fP@Unary minus@Numeric@N/A +_ +\fIexpr\fP * \fIexpr\fP@Multiplication@Numeric@Left +\fIexpr\fP / \fIexpr\fP@Division@Numeric@Left +\fIexpr\fP % \fIexpr\fP@Modulus@Numeric@Left +_ +\fIexpr\fP + \fIexpr\fP@Addition@Numeric@Left +\fIexpr\fP \- \fIexpr\fP@Subtraction@Numeric@Left +_ +\fIexpr\fP \fIexpr\fP@String concatenation@String@Left +_ +\fIexpr\fP < \fIexpr\fP@Less than@Numeric@None +\fIexpr\fP <= \fIexpr\fP@Less than or equal to@Numeric@None +\fIexpr\fP != \fIexpr\fP@Not equal to@Numeric@None +\fIexpr\fP == \fIexpr\fP@Equal to@Numeric@None +\fIexpr\fP > \fIexpr\fP@Greater than@Numeric@None +\fIexpr\fP >= \fIexpr\fP@Greater than or equal to@Numeric@None +_ +\fIexpr\fP ~ \fIexpr\fP@ERE match@Numeric@None +\fIexpr\fP !~ \fIexpr\fP@ERE non-match@Numeric@None +_ +\fIexpr\fP in array@Array membership@Numeric@Left +( \fIindex\fP ) in \fIarray\fP@Multi-dimension array@Numeric@Left +@membership +_ +\fIexpr\fP && \fIexpr\fP@Logical AND@Numeric@Left +_ +\fIexpr\fP || \fIexpr\fP@Logical OR@Numeric@Left +_ +\fIexpr1\fP ? \fIexpr2\fP : \fIexpr3\fP@Conditional expression@Type of selected@Right +@@\fIexpr2\fP or \fIexpr3\fP +_ +lvalue ^= \fIexpr\fP@Exponentiation assignment@Numeric@Right +lvalue %= \fIexpr\fP@Modulus assignment@Numeric@Right +lvalue *= \fIexpr\fP@Multiplication assignment@Numeric@Right +lvalue /= \fIexpr\fP@Division assignment@Numeric@Right +lvalue += \fIexpr\fP@Addition assignment@Numeric@Right +lvalue \-= \fIexpr\fP@Subtraction assignment@Numeric@Right +lvalue = \fIexpr\fP@Assignment@Type of \fIexpr\fP@Right +.TE +.P +Each expression shall have either a string value, a numeric value, or +both. Except as stated for specific contexts, the value of an expression +shall be implicitly converted to the type needed for the context in which +it is used. 
A string value shall be converted to a numeric value either by +the equivalent of the following calls to functions defined by the ISO\ C standard: +.sp +.RS 4 +.nf + +setlocale(LC_NUMERIC, ""); +\fInumeric_value\fR = atof(\fIstring_value\fR); +.fi +.P +.RE +.P +or by converting the initial portion of the string to type +.BR double +representation as follows: +.sp +.RS +The input string is decomposed into two parts: an initial, possibly empty, +sequence of white-space characters (as specified by +\fIisspace\fR()) +and a subject sequence interpreted as a floating-point constant. +.P +The expected form of the subject sequence is an optional +.BR '+' +or +.BR '\-' +sign, then a non-empty sequence of digits optionally containing a +<period>, +then an optional exponent part. An exponent part consists of +.BR 'e' +or +.BR 'E' , +followed by an optional sign, followed by one or more decimal digits. +.P +The sequence starting with the first digit or the +<period> +(whichever occurs first) is interpreted as a floating constant of the +C language, and if neither an exponent part nor a +<period> +appears, a +<period> +is assumed to follow the last digit in the string. If the subject +sequence begins with a +<hyphen-minus>, +the value resulting from the conversion is negated. +.RE +.P +A numeric value that is exactly equal to the value of an integer (see +.IR "Section 1.1.2" ", " "Concepts Derived from the ISO C Standard") +shall be converted to a string by the equivalent of a call to the +.BR sprintf +function (see +.IR "String Functions") +with the string +.BR \(dq%d\(dq +as the +.IR fmt +argument and the numeric value being converted as the first and only +.IR expr +argument. Any other numeric value shall be converted to a string by the +equivalent of a call to the +.BR sprintf +function with the value of the variable +.BR CONVFMT +as the +.IR fmt +argument and the numeric value being converted as the first and only +.IR expr +argument. 
The result of the conversion is unspecified if the value of +.BR CONVFMT +is not a floating-point format specification. This volume of POSIX.1\(hy2017 specifies no +explicit conversions between numbers and strings. An application can +force an expression to be treated as a number by adding zero to it, or +can force it to be treated as a string by concatenating the null string +(\c +.BR \(dq\^\(dq ) +to it. +.P +A string value shall be considered a +.IR "numeric string" +if it comes from one of the following: +.IP " 1." 4 +Field variables +.IP " 2." 4 +Input from the +\fIgetline\fR() +function +.IP " 3." 4 +.BR FILENAME +.IP " 4." 4 +.BR ARGV +array elements +.IP " 5." 4 +.BR ENVIRON +array elements +.IP " 6." 4 +Array elements created by the +\fIsplit\fR() +function +.IP " 7." 4 +A command line variable assignment +.IP " 8." 4 +Variable assignment from another numeric string variable +.P +and an implementation-dependent condition corresponding to either +case (a) or (b) below is met. +.IP " a." 4 +After the equivalent of the following calls to functions defined by +the ISO\ C standard, +.IR string_value_end +would differ from +.IR string_value , +and any characters before the terminating null character in +.IR string_value_end +would be +<blank> +characters: +.RS 4 +.sp +.RS 4 +.nf + +char *string_value_end; +setlocale(LC_NUMERIC, ""); +numeric_value = strtod (string_value, &string_value_end); +.fi +.P +.RE +.RE +.IP " b." 4 +After all the following conversions have been applied, the resulting +string would lexically be recognized as a +.BR NUMBER +token as described by the lexical conventions in +.IR "Grammar": +.RS 4 +.IP -- 4 +All leading and trailing +<blank> +characters are discarded. +.IP -- 4 +If the first non-\c +<blank> +is +.BR '\(pl' +or +.BR '\-' , +it is discarded. +.IP -- 4 +Each occurrence of the decimal point character from the current locale +is changed to a +<period>. 
+.RE +In case (a) the numeric value of the +.IR "numeric string" +shall be the value that would be returned by the +\fIstrtod\fR() +call. In case (b) if the first non-\c +<blank> +is +.BR '\-' , +the numeric value of the +.IR "numeric string" +shall be the negation of the numeric value of the recognized +.BR NUMBER +token; otherwise, the numeric value of the +.IR "numeric string" +shall be the numeric value of the recognized +.BR NUMBER +token. Whether or not a string is a +.IR "numeric string" +shall be relevant only in contexts where that term is used in this +section. +.P +When an expression is used in a Boolean context, if it has a numeric +value, a value of zero shall be treated as false and any other value +shall be treated as true. Otherwise, a string value of the null string +shall be treated as false and any other value shall be treated as true. +A Boolean context shall be one of the following: +.IP " *" 4 +The first subexpression of a conditional expression +.IP " *" 4 +An expression operated on by logical NOT, logical AND, or logical OR +.IP " *" 4 +The second expression of a +.BR for +statement +.IP " *" 4 +The expression of an +.BR if +statement +.IP " *" 4 +The expression of the +.BR while +clause in either a +.BR while +or +.BR do .\|.\|.\c +.BR while +statement +.IP " *" 4 +An expression used as a pattern (as in Overall Program Structure) +.P +All arithmetic shall follow the semantics of floating-point arithmetic as +specified by the ISO\ C standard (see +.IR "Section 1.1.2" ", " "Concepts Derived from the ISO C Standard"). 
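The Boolean-context rules above can be checked with a short sketch. It assumes a POSIX-conforming awk on PATH; the variable name truth and the T/F markers are illustrative only. It sticks to plain numeric and string constants and leaves out numeric strings, whose treatment depends on where the value came from.

```shell
# Each conditional expression uses its first operand in a Boolean context.
truth=$(awk 'BEGIN {
    printf "%s", (0   ? "T" : "F")   # numeric zero is false
    printf "%s", (1   ? "T" : "F")   # any other numeric value is true
    printf "%s", (""  ? "T" : "F")   # the null string is false
    printf "%s", ("a" ? "T" : "F")   # any other string value is true
    printf "\n"
}')

printf '%s\n' "$truth"
```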
+.P +The value of the expression: +.sp +.RS 4 +.nf + +\fIexpr1\fR \(ha \fIexpr2\fR +.fi +.P +.RE +.P +shall be equivalent to the value returned by the ISO\ C standard function call: +.sp +.RS 4 +.nf + +\fRpow(\fIexpr1\fR, \fIexpr2\fR) +.fi +.P +.RE +.P +The expression: +.sp +.RS 4 +.nf + +lvalue \(ha= \fIexpr\fR +.fi +.P +.RE +.P +shall be equivalent to the ISO\ C standard expression: +.sp +.RS 4 +.nf + +lvalue = pow(lvalue, \fIexpr\fR) +.fi +.P +.RE +.P +except that lvalue shall be evaluated only once. The value of the +expression: +.sp +.RS 4 +.nf + +\fIexpr1\fR % \fIexpr2\fR +.fi +.P +.RE +.P +shall be equivalent to the value returned by the ISO\ C standard function call: +.sp +.RS 4 +.nf + +fmod(\fIexpr1\fR, \fIexpr2\fR) +.fi +.P +.RE +.P +The expression: +.sp +.RS 4 +.nf + +lvalue %= \fIexpr\fR +.fi +.P +.RE +.P +shall be equivalent to the ISO\ C standard expression: +.sp +.RS 4 +.nf + +lvalue = fmod(lvalue, \fIexpr\fR) +.fi +.P +.RE +.P +except that lvalue shall be evaluated only once. +.P +Variables and fields shall be set by the assignment statement: +.sp +.RS 4 +.nf + +lvalue = \fIexpression\fR +.fi +.P +.RE +.P +and the type of +.IR expression +shall determine the resulting variable type. The assignment includes +the arithmetic assignments (\c +.BR \(dq+=\(dq , +.BR \(dq-=\(dq , +.BR \(dq*=\(dq , +.BR \(dq/=\(dq , +.BR \(dq%=\(dq , +.BR \(dq\(ha=\(dq , +.BR \(dq++\(dq , +.BR \(dq--\(dq ) +all of which shall produce a numeric result. The left-hand side of an +assignment and the target of increment and decrement operators can be +one of a variable, an array with index, or a field selector. +.P +The +.IR awk +language supplies arrays that are used for storing numbers or strings. +Arrays need not be declared. They shall initially be empty, and their +sizes shall change dynamically. The subscripts, or element identifiers, +are strings, providing a type of associative array capability. 
An array +name followed by a subscript within square brackets can be used as an +lvalue and thus as an expression, as described in the grammar; see +.IR "Grammar". +Unsubscripted array names can be used in only the following contexts: +.IP " *" 4 +A parameter in a function definition or function call +.IP " *" 4 +The +.BR NAME +token following any use of the keyword +.BR in +as specified in the grammar (see +.IR "Grammar"); +if the name used in this context is not an array name, the behavior is +undefined +.P +A valid array +.IR index +shall consist of one or more +<comma>-separated +expressions, similar to the way in which multi-dimensional arrays are +indexed in some programming languages. Because +.IR awk +arrays are really one-dimensional, such a +<comma>-separated +list shall be converted to a single string by concatenating the string +values of the separate expressions, each separated from the other by +the value of the +.BR SUBSEP +variable. Thus, the following two index operations shall be +equivalent: +.sp +.RS 4 +.nf + +\fIvar\fB[\fIexpr1\fR, \fIexpr2\fR, ... \fIexprn\fB] +.P +\fIvar\fB[\fIexpr1\fR SUBSEP \fIexpr2\fR SUBSEP ... \fRSUBSEP \fIexprn\fB]\fR +.fi +.P +.RE +.P +The application shall ensure that a multi-dimensioned +.IR index +used with the +.BR in +operator is parenthesized. The +.BR in +operator, which tests for the existence of a particular array element, +shall not cause that element to exist. Any other reference to a +nonexistent array element shall automatically create it. +.P +Comparisons (with the +.BR '<' , +.BR \(dq<=\(dq , +.BR \(dq!=\(dq , +.BR \(dq==\(dq , +.BR '>' , +and +.BR \(dq>=\(dq +operators) shall be made numerically if both operands are numeric, if +one is numeric and the other has a string value that is a numeric +string, or if one is numeric and the other has the uninitialized value. 
+Otherwise, operands shall be converted to strings as required and a +string comparison shall be made as follows: +.IP " *" 4 +For the +.BR \(dq!=\(dq +and +.BR \(dq==\(dq +operators, the strings should be compared to check if they are +identical but may be compared using the locale-specific collation +sequence to check if they collate equally. +.IP " *" 4 +For the other operators, the strings shall be compared using the +locale-specific collation sequence. +.P +The value of the comparison expression shall be 1 if the relation is +true, or 0 if the relation is false. +.SS "Variables and Special Variables" +.P +Variables can be used in an +.IR awk +program by referencing them. With the exception of function parameters +(see +.IR "User-Defined Functions"), +they are not explicitly declared. Function parameter names shall be +local to the function; all other variable names shall be global. The +same name shall not be used as both a function parameter name and as +the name of a function or a special +.IR awk +variable. The same name shall not be used both as a variable name with +global scope and as the name of a function. The same name shall not be +used within the same scope both as a scalar variable and as an array. +Uninitialized variables, including scalar variables, array elements, +and field variables, shall have an uninitialized value. An +uninitialized value shall have both a numeric value of zero and a +string value of the empty string. Evaluation of variables with an +uninitialized value, to either string or numeric, shall be determined +by the context in which they are used. +.P +Field variables shall be designated by a +.BR '$' +followed by a number or numerical expression. The effect of the field +number +.IR expression +evaluating to anything other than a non-negative integer is +unspecified; uninitialized variables or string values need not be +converted to numeric values in this context. New field variables can be +created by assigning a value to them. 
References to nonexistent fields +(that is, fields after $\fBNF\fP), shall evaluate to the uninitialized +value. Such references shall not create new fields. However, assigning +to a nonexistent field (for example, $(\fBNF\fP+2)=5) shall increase +the value of +.BR NF ; +create any intervening fields with the uninitialized value; and cause +the value of $0 to be recomputed, with the fields being separated by +the value of +.BR OFS . +Each field variable shall have a string value or an uninitialized value +when created. Field variables shall have the uninitialized value when +created from $0 using +.BR FS +and the variable does not contain any characters. If appropriate, the +field variable shall be considered a numeric string (see +.IR "Expressions in awk"). +.P +Implementations shall support the following other special variables +that are set by +.IR awk : +.IP "\fBARGC\fR" 10 +The number of elements in the +.BR ARGV +array. +.IP "\fBARGV\fR" 10 +An array of command line arguments, excluding options and the +.IR program +argument, numbered from zero to +.BR ARGC \-1. +.RS 10 +.P +The arguments in +.BR ARGV +can be modified or added to; +.BR ARGC +can be altered. As each input file ends, +.IR awk +shall treat the next non-null element of +.BR ARGV , +up to the current value of +.BR ARGC \-1, +inclusive, as the name of the next input file. Thus, setting an element +of +.BR ARGV +to null means that it shall not be treated as an input file. The name +.BR '\-' +indicates the standard input. If an argument matches the format of an +.IR assignment +operand, this argument shall be treated as an +.IR assignment +rather than a +.IR file +argument. +.RE +.IP "\fBCONVFMT\fR" 10 +The +.BR printf +format for converting numbers to strings (except for output statements, +where +.BR OFMT +is used); +.BR \(dq%.6g\(dq +by default. 
+.IP "\fBENVIRON\fR" 10 +An array representing the value of the environment, as described in the +.IR exec +functions defined in the System Interfaces volume of POSIX.1\(hy2017. The indices of the array shall be +strings consisting of the names of the environment variables, and the +value of each array element shall be a string consisting of the value +of that variable. If appropriate, the environment variable shall be +considered a +.IR "numeric string" +(see +.IR "Expressions in awk"); +the array element shall also have its numeric value. +.RS 10 +.P +In all cases where the behavior of +.IR awk +is affected by environment variables (including the environment of any +commands that +.IR awk +executes via the +.BR system +function or via pipeline redirections with the +.BR print +statement, the +.BR printf +statement, or the +.BR getline +function), the environment used shall be the environment at the time +.IR awk +began executing; it is implementation-defined whether any +modification of +.BR ENVIRON +affects this environment. +.RE +.IP "\fBFILENAME\fR" 10 +A pathname of the current input file. Inside a +.BR BEGIN +action the value is undefined. Inside an +.BR END +action the value shall be the name of the last input file processed. +.IP "\fBFNR\fR" 10 +The ordinal number of the current record in the current file. Inside a +.BR BEGIN +action the value shall be zero. Inside an +.BR END +action the value shall be the number of the last record processed in +the last file processed. +.IP "\fBFS\fR" 10 +Input field separator regular expression; a +<space> +by default. +.IP "\fBNF\fR" 10 +The number of fields in the current record. Inside a +.BR BEGIN +action, the use of +.BR NF +is undefined unless a +.BR getline +function without a +.IR var +argument is executed previously. 
Inside an +.BR END +action, +.BR NF +shall retain the value it had for the last record read, unless a +subsequent, redirected, +.BR getline +function without a +.IR var +argument is performed prior to entering the +.BR END +action. +.IP "\fBNR\fR" 10 +The ordinal number of the current record from the start of input. +Inside a +.BR BEGIN +action the value shall be zero. Inside an +.BR END +action the value shall be the number of the last record processed. +.IP "\fBOFMT\fR" 10 +The +.BR printf +format for converting numbers to strings in output statements (see +.IR "Output Statements"); +.BR \(dq%.6g\(dq +by default. The result of the conversion is unspecified if the value of +.BR OFMT +is not a floating-point format specification. +.IP "\fBOFS\fR" 10 +The +.BR print +statement output field separator; +<space> +by default. +.IP "\fBORS\fR" 10 +The +.BR print +statement output record separator; a +<newline> +by default. +.IP "\fBRLENGTH\fR" 10 +The length of the string matched by the +.BR match +function. +.IP "\fBRS\fR" 10 +The first character of the string value of +.BR RS +shall be the input record separator; a +<newline> +by default. If +.BR RS +contains more than one character, the results are unspecified. If +.BR RS +is null, then records are separated by sequences consisting of a +<newline> +plus one or more blank lines, leading or trailing blank lines shall not +result in empty records at the beginning or end of the input, and a +<newline> +shall always be a field separator, no matter what the value of +.BR FS +is. +.IP "\fBRSTART\fR" 10 +The starting position of the string matched by the +.BR match +function, numbering from 1. This shall always be equivalent to the +return value of the +.BR match +function. +.IP "\fBSUBSEP\fR" 10 +The subscript separator string for multi-dimensional arrays; the +default value is implementation-defined. 
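The field-variable rules in this subsection (referencing a nonexistent field versus assigning to one) can be sketched as follows. This assumes a POSIX-conforming awk on PATH; the input record 'a b' and the names nf_after_ref, nf_after_set, rebuilt, and unused are hypothetical.

```shell
# Referencing a field beyond NF yields the uninitialized value and creates nothing.
nf_after_ref=$(printf 'a b\n' | awk '{ unused = $5; print NF }')

# Assigning to $(NF+2) raises NF and creates the intervening empty field.
nf_after_set=$(printf 'a b\n' | awk '{ $(NF + 2) = 5; print NF }')

# The assignment also rebuilds $0, joining the fields with OFS.
rebuilt=$(printf 'a b\n' | awk -v OFS=- '{ $(NF + 2) = 5; print $0 }')

printf '%s %s %s\n' "$nf_after_ref" "$nf_after_set" "$rebuilt"
```

Setting OFS to a visible character makes the empty intervening field easy to spot in the rebuilt record.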
+.SS "Regular Expressions" +.P +The +.IR awk +utility shall make use of the extended regular expression notation +(see the Base Definitions volume of POSIX.1\(hy2017, +.IR "Section 9.4" ", " "Extended Regular Expressions") +except that it shall allow the use of C-language conventions +for escaping special characters within the EREs, as specified in the +table in the Base Definitions volume of POSIX.1\(hy2017, +.IR "Chapter 5" ", " "File Format Notation" +(\c +.BR '\e\e' , +.BR '\ea' , +.BR '\eb' , +.BR '\ef' , +.BR '\en' , +.BR '\er' , +.BR '\et' , +.BR '\ev' ) +and the following table; these escape sequences shall be recognized +both inside and outside bracket expressions. Note that records need not +be separated by +<newline> +characters and string constants can contain +<newline> +characters, so even the +.BR \(dq\en\(dq +sequence is valid in +.IR awk +EREs. Using a +<slash> +character within an ERE requires the escaping shown in the following +table. +.br +.sp +.ce 1 +\fBTable 4-2: Escape Sequences in \fIawk\fP\fR +.ad l +.TS +center tab(@) box; +cB | cB | cB +cB | cB | cB +lf5 | lw(34) | lw(34). +Escape +Sequence@Description@Meaning +_ +\e"@T{ +<backslash> <quotation-mark> +T}@T{ +<quotation-mark> character +T} +_ +\e/@T{ +<backslash> <slash> +T}@T{ +<slash> character +T} +_ +\eddd@T{ +A +<backslash> +character followed by the longest sequence of one, two, or +three octal-digit characters (01234567). If all of the digits are 0 +(that is, representation of the NUL character), the behavior is +undefined. +T}@T{ +The character whose encoding is represented by the one, two, or +three-digit octal integer. Multi-byte characters require +multiple, concatenated escape sequences of this type, including the +leading +<backslash> +for each byte. 
+T} +_ +\ec@T{ +A +<backslash> +character followed by any character not described in this +table or in the table in the Base Definitions volume of POSIX.1\(hy2017, +.IR "Chapter 5" ", " "File Format Notation" +(\c +.BR '\e\e' , +.BR '\ea' , +.BR '\eb' , +.BR '\ef' , +.BR '\en' , +.BR '\er' , +.BR '\et' , +.BR '\ev' ). +T}@Undefined +.TE +.ad b +.P +A regular expression can be matched against a specific field or string +by using one of the two regular expression matching operators, +.BR '\(ti' +and +.BR \(dq!\(ti\(dq . +These operators shall interpret their right-hand operand as a regular +expression and their left-hand operand as a string. If the regular +expression matches the string, the +.BR '\(ti' +expression shall evaluate to a value of 1, and the +.BR \(dq!\(ti\(dq +expression shall evaluate to a value of 0. (The regular expression +matching operation is as defined by the term matched in the Base Definitions volume of POSIX.1\(hy2017, +.IR "Section 9.1" ", " "Regular Expression Definitions", +where a match occurs on any part of the string unless the regular +expression is limited with the +<circumflex> +or +<dollar-sign> +special characters.) If the regular expression does not match the +string, the +.BR '\(ti' +expression shall evaluate to a value of 0, and the +.BR \(dq!\(ti\(dq +expression shall evaluate to a value of 1. If the right-hand operand is +any expression other than the lexical token +.BR ERE , +the string value of the expression shall be interpreted as an extended +regular expression, including the escape conventions described above. +Note that these same escape conventions shall also be applied in +determining the value of a string literal (the lexical token +.BR STRING ), +and thus shall be applied a second time when a string literal is used +in this context. 
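A short sketch of the two matching operators, and of the double application of escape conventions to string literals, might look like the following. The function name match_ops_demo and the input lines are assumptions for illustration. The ERE token needs one <backslash> to match a literal period, while the equivalent string literal needs two, because the string is processed lexically before it is interpreted as an ERE.

```shell
# Hypothetical helper; names and input are illustrative only.
match_ops_demo() {
    printf 'a.b\naxb\n' |
    awk '{
        ere = ($0 ~ /a\.b/)       # ERE token: escapes processed once
        str = ($0 ~ "a\\.b")      # string literal: escapes processed twice
        neg = ($0 !~ /^a\.b$/)    # anchored ERE with the negated operator
        print $0, ere, str, neg
    }'
}
match_ops_demo
```

Each operator yields 1 or 0, so the results can be stored in variables and printed like any other numeric value.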
+.P +When an +.BR ERE +token appears as an expression in any context other than as the +right-hand of the +.BR '\(ti' +or +.BR \(dq!\(ti\(dq +operator or as one of the built-in function arguments described below, +the value of the resulting expression shall be the equivalent of: +.sp +.RS 4 +.nf + +$0 \(ti /\fIere\fR/ +.fi +.P +.RE +.P +The +.IR ere +argument to the +.BR gsub , +.BR match , +.BR sub +functions, and the +.IR fs +argument to the +.BR split +function (see +.IR "String Functions") +shall be interpreted as extended regular expressions. These can be +either +.BR ERE +tokens or arbitrary expressions, and shall be interpreted in the same +manner as the right-hand side of the +.BR '\(ti' +or +.BR \(dq!\(ti\(dq +operator. +.P +An extended regular expression can be used to separate fields by assigning +a string containing the expression to the built-in variable +.BR FS , +either directly or as a consequence of using the +.BR \-F +.IR sepstring +option. +The default value of the +.BR FS +variable shall be a single +<space>. +The following describes +.BR FS +behavior: +.IP " 1." 4 +If +.BR FS +is a null string, the behavior is unspecified. +.IP " 2." 4 +If +.BR FS +is a single character: +.RS 4 +.IP " a." 4 +If +.BR FS +is +<space>, +skip leading and trailing +<blank> +and +<newline> +characters; fields shall be delimited by sets of one or more +<blank> +or +<newline> +characters. +.IP " b." 4 +Otherwise, if +.BR FS +is any other character +.IR c , +fields shall be delimited by each single occurrence of +.IR c . +.RE +.IP " 3." 4 +Otherwise, the string value of +.BR FS +shall be considered to be an extended regular expression. Each +occurrence of a sequence matching the extended regular expression shall +delimit fields. 
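The FS cases listed above can be sketched as follows. The function name fs_cases_demo and the sample inputs are assumptions for illustration. The first command uses the default single <space>, the second a single character other than <space>, and the third a longer string that is treated as an ERE.

```shell
# Hypothetical helper; names and input are illustrative only.
fs_cases_demo() {
    # Case 2a: FS is <space>; leading and trailing blanks are skipped
    # and runs of blanks delimit fields.
    printf '  a   b  \n' | awk '{ print NF }'
    # Case 2b: any other single character delimits at each occurrence,
    # so the empty field between the two colons is counted.
    printf 'a::b\n' | awk -F: '{ print NF }'
    # Case 3: a longer string is interpreted as an ERE.
    printf 'a, b ,c\n' | awk 'BEGIN { FS = " *, *" } { print NF ": " $2 }'
}
fs_cases_demo
```

The third command assigns FS directly in a BEGIN action rather than through -F; either route sets the same variable.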
+.P +Except for the +.BR '\(ti' +and +.BR \(dq!\(ti\(dq +operators, and in the +.BR gsub , +.BR match , +.BR split , +and +.BR sub +built-in functions, ERE matching shall be based on input records; that +is, record separator characters (the first character of the value of +the variable +.BR RS , +<newline> +by default) cannot be embedded in the expression, and no expression +shall match the record separator character. If the record separator is +not +<newline>, +<newline> +characters embedded in the expression can be matched. For the +.BR '\(ti' +and +.BR \(dq!\(ti\(dq +operators, and in those four built-in functions, ERE matching shall be +based on text strings; that is, any character (including +<newline> +and the record separator) can be embedded in the pattern, and an +appropriate pattern shall match any character. However, in all +.IR awk +ERE matching, the use of one or more NUL characters in the pattern, +input record, or text string produces undefined results. +.SS "Patterns" +.P +A +.IR pattern +is any valid +.IR expression , +a range specified by two expressions separated by a comma, or one of the +two special patterns +.BR BEGIN +or +.BR END . +.SS "Special Patterns" +.P +The +.IR awk +utility shall recognize two special patterns, +.BR BEGIN +and +.BR END . +Each +.BR BEGIN +pattern shall be matched once and its associated action executed before +the first record of input is read\(emexcept possibly by use of the +.BR getline +function (see +.IR "Input/Output and General Functions") +in a prior +.BR BEGIN +action\(emand before command line assignment is done. Each +.BR END +pattern shall be matched once and its associated action executed after +the last record of input has been read. These two patterns shall have +associated actions. +.P +.BR BEGIN +and +.BR END +shall not combine with other patterns. Multiple +.BR BEGIN +and +.BR END +patterns shall be allowed. 
The actions associated with the +.BR BEGIN +patterns shall be executed in the order specified in the program, as +are the +.BR END +actions. An +.BR END +pattern can precede a +.BR BEGIN +pattern in a program. +.P +If an +.IR awk +program consists of only actions with the pattern +.BR BEGIN , +and the +.BR BEGIN +action contains no +.BR getline +function, +.IR awk +shall exit without reading its input when the last statement in the +last +.BR BEGIN +action is executed. If an +.IR awk +program consists of only actions with the pattern +.BR END +or only actions with the patterns +.BR BEGIN +and +.BR END , +the input shall be read before the statements in the +.BR END +actions are executed. +.SS "Expression Patterns" +.P +An expression pattern shall be evaluated as if it were an expression in +a Boolean context. If the result is true, the pattern shall be +considered to match, and the associated action (if any) shall be +executed. If the result is false, the action shall not be executed. +.SS "Pattern Ranges" +.P +A pattern range consists of two expressions separated by a comma; in +this case, the action shall be performed for all records between a +match of the first expression and the following match of the second +expression, inclusive. At this point, the pattern range can be repeated +starting at input records subsequent to the end of the matched range. +.SS "Actions" +.P +An action is a sequence of statements as shown in the grammar in +.IR "Grammar". +Any single statement can be replaced by a statement list enclosed in +curly braces. The application shall ensure that statements in a +statement list are separated by +<newline> +or +<semicolon> +characters. Statements in a statement list shall be executed sequentially +in the order that they appear. 
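The special patterns and pattern ranges described above can be combined as in the following sketch. The function name range_demo and the input are assumptions for illustration. The BEGIN action runs before any record is read (NR is zero), the range selects records from a match of /start/ through the next match of /stop/ inclusive, and the END action runs after the last record.

```shell
# Hypothetical helper; names and input are illustrative only.
range_demo() {
    printf 'a\nstart\nb\nstop\nc\nstart\nd\n' |
    awk 'BEGIN { print "begin, NR=" NR }
         /start/, /stop/ { print NR ": " $0 }   # inclusive range, may repeat
         END { print "end, NR=" NR }'
}
range_demo
```

The second range opens at record 6 and never sees a closing match, so it stays open through the last record.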
+.P +The +.IR expression +acting as the conditional in an +.BR if +statement shall be evaluated and if it is non-zero or non-null, the +following statement shall be executed; otherwise, if +.BR else +is present, the statement following the +.BR else +shall be executed. +.P +The +.BR if , +.BR while , +.BR do .\|.\|.\c +.BR while , +.BR for , +.BR break , +and +.BR continue +statements are based on the ISO\ C standard (see +.IR "Section 1.1.2" ", " "Concepts Derived from the ISO C Standard"), +except that the Boolean expressions shall be treated as described in +.IR "Expressions in awk", +and except in the case of: +.sp +.RS 4 +.nf + +for (\fIvariable\fR in \fIarray\fR) +.fi +.P +.RE +.P +which shall iterate, assigning each +.IR index +of +.IR array +to +.IR variable +in an unspecified order. The results of adding new elements to +.IR array +within such a +.BR for +loop are undefined. If a +.BR break +or +.BR continue +statement occurs outside of a loop, the behavior is undefined. +.P +The +.BR delete +statement shall remove an individual array element. Thus, the following +code deletes an entire array: +.sp +.RS 4 +.nf + +for (index in array) + delete array[index] +.fi +.P +.RE +.P +The +.BR next +statement shall cause all further processing of the current input +record to be abandoned. The behavior is undefined if a +.BR next +statement appears or is invoked in a +.BR BEGIN +or +.BR END +action. +.P +The +.BR exit +statement shall invoke all +.BR END +actions in the order in which they occur in the program source and then +terminate the program without reading further input. An +.BR exit +statement inside an +.BR END +action shall terminate the program without further execution of +.BR END +actions. If an expression is specified in an +.BR exit +statement, its numeric value shall be the exit status of +.IR awk , +unless subsequent errors are encountered or a subsequent +.BR exit +statement with an expression is executed. 
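The behavior of exit, for (variable in array), and delete described above can be sketched together. The function name exit_demo and the input are assumptions for illustration. The exit statement stops input processing at the second record, the END action still runs, and the exit status is taken from the expression given to exit because no later exit overrides it.

```shell
# Hypothetical helper; names and input are illustrative only.
exit_demo() {
    printf '1\n2\n3\n' |
    awk '$1 == 2 { exit 3 }              # stop reading; END actions still run
         { seen[$1] = 1 }
         END {
             n = 0
             for (k in seen) n++         # iteration order is unspecified
             for (k in seen) delete seen[k]
             m = 0
             for (k in seen) m++         # array is empty after the deletes
             print n, m
         }'
}
exit_demo || echo "status=$?"
```

Only the first record reaches the second rule, so one element is counted before the array is emptied.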
+.SS "Output Statements" +.P +Both +.BR print +and +.BR printf +statements shall write to standard output by default. The output shall +be written to the location specified by +.IR output_redirection +if one is supplied, as follows: +.sp +.RS 4 +.nf + +> \fIexpression\fR +>> \fIexpression\fR +| \fIexpression\fR +.fi +.P +.RE +.P +In all cases, the +.IR expression +shall be evaluated to produce a string that is used as a pathname +into which to write (for +.BR '>' +or +.BR \(dq>>\(dq ) +or as a command to be executed (for +.BR '|' ). +Using the first two forms, if the file of that name is not currently +open, it shall be opened, creating it if necessary and using the first +form, truncating the file. The output then shall be appended to the +file. As long as the file remains open, subsequent calls in which +.IR expression +evaluates to the same string value shall simply append output to the +file. The file remains open until the +.BR close +function (see +.IR "Input/Output and General Functions") +is called with an expression that evaluates to the same string value. +.P +The third form shall write output onto a stream piped to the input of a +command. The stream shall be created if no stream is currently open +with the value of +.IR expression +as its command name. The stream created shall be equivalent to one +created by a call to the +\fIpopen\fR() +function defined in the System Interfaces volume of POSIX.1\(hy2017 with the value of +.IR expression +as the +.IR command +argument and a value of +.IR w +as the +.IR mode +argument. As long as the stream remains open, subsequent calls in which +.IR expression +evaluates to the same string value shall write output to the existing +stream. The stream shall remain open until the +.BR close +function (see +.IR "Input/Output and General Functions") +is called with an expression that evaluates to the same string value. 
+At that time, the stream shall be closed as if by a call to the +\fIpclose\fR() +function defined in the System Interfaces volume of POSIX.1\(hy2017. +.P +As described in detail by the grammar in +.IR "Grammar", +these output statements shall take a +<comma>-separated +list of +.IR expression s +referred to in the grammar by the non-terminal symbols +.BR expr_list , +.BR print_expr_list , +or +.BR print_expr_list_opt . +This list is referred to here as the +.IR "expression list" , +and each member is referred to as an +.IR "expression argument" . +.P +The +.BR print +statement shall write the value of each expression argument onto the +indicated output stream separated by the current output field separator +(see variable +.BR OFS +above), and terminated by the output record separator (see variable +.BR ORS +above). All expression arguments shall be taken as strings, being +converted if necessary; this conversion shall be as described in +.IR "Expressions in awk", +with the exception that the +.BR printf +format in +.BR OFMT +shall be used instead of the value in +.BR CONVFMT . +An empty expression list shall stand for the whole input record ($0). +.P +The +.BR printf +statement shall produce output based on a notation similar to the +File Format Notation used to describe file formats in this volume of POSIX.1\(hy2017 (see the Base Definitions volume of POSIX.1\(hy2017, +.IR "Chapter 5" ", " "File Format Notation"). +Output shall be produced as specified with the first +.IR expression +argument as the string +.IR format +and subsequent +.IR expression +arguments as the strings +.IR arg1 +to +.IR argn , +inclusive, with the following exceptions: +.IP " 1." 4 +The +.IR format +shall be an actual character string rather than a graphical +representation. Therefore, it cannot contain empty character +positions. 
The +<space> +in the +.IR format +string, in any context other than a +.IR flag +of a conversion specification, shall be treated as an ordinary +character that is copied to the output. +.IP " 2." 4 +If the character set contains a +.BR ' ' +character and that character appears in the +.IR format +string, it shall be treated as an ordinary character that is copied to +the output. +.IP " 3." 4 +The +.IR "escape sequences" +beginning with a +<backslash> +character shall be treated as sequences of ordinary characters that are +copied to the output. Note that these same sequences shall be interpreted +lexically by +.IR awk +when they appear in literal strings, but they shall not be treated +specially by the +.BR printf +statement. +.IP " 4." 4 +A +.IR "field width" +or +.IR precision +can be specified as the +.BR '*' +character instead of a digit string. In this case the next argument +from the expression list shall be fetched and its numeric value taken +as the field width or precision. +.IP " 5." 4 +The implementation shall not precede or follow output from the +.BR d +or +.BR u +conversion specifier characters with +<blank> +characters not specified by the +.IR format +string. +.IP " 6." 4 +The implementation shall not precede output from the +.BR o +conversion specifier character with leading zeros not specified by the +.IR format +string. +.IP " 7." 4 +For the +.BR c +conversion specifier character: if the argument has a numeric value, the +character whose encoding is that value shall be output. If the value is +zero or is not the encoding of any character in the character set, the +behavior is undefined. If the argument does not have a numeric value, +the first character of the string value shall be output; if the string +does not contain any characters, the behavior is undefined. +.IP " 8." 4 +For each conversion specification that consumes an argument, the next +expression argument shall be evaluated. 
With the exception of the +.BR c +conversion specifier character, the value shall be converted (according +to the rules specified in +.IR "Expressions in awk") +to the appropriate type for the conversion specification. +.IP " 9." 4 +If there are insufficient expression arguments to satisfy all the +conversion specifications in the +.IR format +string, the behavior is undefined. +.IP 10. 4 +If any character sequence in the +.IR format +string begins with a +.BR '%' +character, but does not form a valid conversion specification, the +behavior is unspecified. +.P +Both +.BR print +and +.BR printf +can output at least +{LINE_MAX} +bytes. +.SS "Functions" +.P +The +.IR awk +language has a variety of built-in functions: arithmetic, string, +input/output, and general. +.SS "Arithmetic Functions" +.P +The arithmetic functions, except for +.BR int , +shall be based on the ISO\ C standard (see +.IR "Section 1.1.2" ", " "Concepts Derived from the ISO C Standard"). +The behavior is undefined in cases where the ISO\ C standard specifies that an +error be returned or that the behavior is undefined. Although the +grammar (see +.IR "Grammar") +permits built-in functions to appear with no arguments or parentheses, +unless the argument or parentheses are indicated as optional in the +following list (by displaying them within the +.BR \(dq[]\(dq +brackets), such use is undefined. +.IP "\fBatan2\fR(\fIy\fR,\fIx\fR)" 10 +Return arctangent of \fIy\fP/\fIx\fR in radians in the range +[\-\(*p,\(*p]. +.IP "\fBcos\fR(\fIx\fR)" 10 +Return cosine of \fIx\fP, where \fIx\fP is in radians. +.IP "\fBsin\fR(\fIx\fR)" 10 +Return sine of \fIx\fP, where \fIx\fP is in radians. +.IP "\fBexp\fR(\fIx\fR)" 10 +Return the exponential function of \fIx\fP. +.IP "\fBlog\fR(\fIx\fR)" 10 +Return the natural logarithm of \fIx\fP. +.IP "\fBsqrt\fR(\fIx\fR)" 10 +Return the square root of \fIx\fP. +.IP "\fBint\fR(\fIx\fR)" 10 +Return the argument truncated to an integer. 
Truncation shall +be toward 0 when \fIx\fP>0. +.IP "\fBrand\fP(\|)" 10 +Return a random number \fIn\fP, such that 0\(<=\fIn\fP<1. +.IP "\fBsrand\fR(\fB[\fIexpr\fB]\fR)" 10 +Set the seed value for +.IR rand +to +.IR expr +or use the time of day if +.IR expr +is omitted. The previous seed value shall be returned. +.SS "String Functions" +.P +The string functions in the following list shall be supported. +Although the grammar (see +.IR "Grammar") +permits built-in functions to appear with no arguments or parentheses, +unless the argument or parentheses are indicated as optional in the +following list (by displaying them within the +.BR \(dq[]\(dq +brackets), such use is undefined. +.IP "\fBgsub\fR(\fIere\fR,\ \fIrepl\fB[\fR,\ \fIin\fB]\fR)" 10 +.br +Behave like +.BR sub +(see below), except that it shall replace all occurrences of the +regular expression (like the +.IR ed +utility global substitute) in $0 or in the +.IR in +argument, when specified. +.IP "\fBindex\fR(\fIs\fR,\ \fIt\fR)" 10 +Return the position, in characters, numbering from 1, in string +.IR s +where string +.IR t +first occurs, or zero if it does not occur at all. +.IP "\fBlength[\fR(\fB[\fIs\fB]\fR)\fB]\fR" 10 +Return the length, in characters, of its argument taken as a string, or +of the whole record, $0, if there is no argument. +.IP "\fBmatch\fR(\fIs\fR,\ \fIere\fR)" 10 +Return the position, in characters, numbering from 1, in string +.IR s +where the extended regular expression +.IR ere +occurs, or zero if it does not occur at all. RSTART shall be set to the +starting position (which is the same as the returned value), zero if no +match is found; RLENGTH shall be set to the length of the matched +string, \-1 if no match is found. +.IP "\fBsplit\fR(\fIs\fR,\ \fIa\fB[\fR,\ \fIfs\ \fB]\fR)" 10 +.br +Split the string +.IR s +into array elements +.IR a [1], +.IR a [2], +\&.\|.\|., +.IR a [ n ], +and return +.IR n . +All elements of the array shall be deleted before the split is +performed. 
The separation shall be done with the ERE +.IR fs +or with the field separator +.BR FS +if +.IR fs +is not given. Each array element shall have a string value when created +and, if appropriate, the array element shall be considered a numeric +string (see +.IR "Expressions in awk"). +The effect of a null string as the value of +.IR fs +is unspecified. +.IP "\fBsprintf\fR(\fIfmt\fR,\ \fIexpr\fR,\ \fIexpr\fR,\ .\|.\|.)" 10 +.br +Format the expressions according to the +.BR printf +format given by +.IR fmt +and return the resulting string. +.IP "\fBsub(\fIere\fR,\ \fIrepl\fB[\fR,\ \fIin\ \fB]\fR)" 10 +.br +Substitute the string +.IR repl +in place of the first instance of the extended regular expression +.IR ERE +in string +.IR in +and return the number of substitutions. An +<ampersand> +(\c +.BR '&' ) +appearing in the string +.IR repl +shall be replaced by the string from +.IR in +that matches the ERE. An +<ampersand> +preceded with a +<backslash> +shall be interpreted as the literal +<ampersand> +character. An occurrence of two consecutive +<backslash> +characters shall be interpreted as just a single literal +<backslash> +character. Any other occurrence of a +<backslash> +(for example, preceding any other character) shall be treated as a +literal +<backslash> +character. Note that if +.IR repl +is a string literal (the lexical token +.BR STRING ; +see +.IR "Grammar"), +the handling of the +<ampersand> +character occurs after any lexical processing, including any lexical +<backslash>-escape +sequence processing. If +.IR in +is specified and it is not an lvalue (see +.IR "Expressions in awk"), +the behavior is undefined. If +.IR in +is omitted, +.IR awk +shall use the current record ($0) in its place. +.IP "\fBsubstr\fR(\fIs\fR,\ \fIm\fB[\fR,\ \fIn\ \fB]\fR)" 10 +.br +Return the at most +.IR n -character +substring of +.IR s +that begins at position +.IR m , +numbering from 1. 
If +.IR n +is omitted, or if +.IR n +specifies more characters than are left in the string, the length of +the substring shall be limited by the length of the string +.IR s . +.IP "\fBtolower\fR(\fIs\fR)" 10 +Return a string based on the string +.IR s . +Each character in +.IR s +that is an uppercase letter specified to have a +.BR tolower +mapping by the +.IR LC_CTYPE +category of the current locale shall be replaced in the returned string +by the lowercase letter specified by the mapping. Other characters in +.IR s +shall be unchanged in the returned string. +.IP "\fBtoupper\fR(\fIs\fR)" 10 +Return a string based on the string +.IR s . +Each character in +.IR s +that is a lowercase letter specified to have a +.BR toupper +mapping by the +.IR LC_CTYPE +category of the current locale is replaced in the returned string by +the uppercase letter specified by the mapping. Other characters in +.IR s +are unchanged in the returned string. +.P +All of the preceding functions that take +.IR ERE +as a parameter expect a pattern or a string valued expression that is a +regular expression as defined in +.IR "Regular Expressions". +.SS "Input/Output and General Functions" +.P +The input/output and general functions are: +.IP "\fBclose\fR(\fIexpression\fR)" 10 +.br +Close the file or pipe opened by a +.BR print +or +.BR printf +statement or a call to +.BR getline +with the same string-valued +.IR expression . +The limit on the number of open +.IR expression +arguments is implementation-defined. If the close was successful, the +function shall return zero; otherwise, it shall return non-zero. +.IP "\fIexpression\ |\ \fBgetline\ [\fIvar\fB]\fR" 10 +.br +Read a record of input from a stream piped from the output of a +command. The stream shall be created if no stream is currently open +with the value of +.IR expression +as its command name. 
The stream created shall be equivalent to one +created by a call to the +\fIpopen\fR() +function with the value of +.IR expression +as the +.IR command +argument and a value of +.IR r +as the +.IR mode +argument. As long as the stream remains open, subsequent calls in which +.IR expression +evaluates to the same string value shall read subsequent records from +the stream. The stream shall remain open until the +.BR close +function is called with an expression that evaluates to the same string +value. At that time, the stream shall be closed as if by a call to the +\fIpclose\fR() +function. If +.IR var +is omitted, $0 and +.BR NF +shall be set; otherwise, +.IR var +shall be set and, if appropriate, it shall be considered a numeric +string (see +.IR "Expressions in awk"). +.RS 10 +.P +The +.BR getline +operator can form ambiguous constructs when there are unparenthesized +operators (including concatenate) to the left of the +.BR '|' +(to the beginning of the expression containing +.BR getline ). +In the context of the +.BR '$' +operator, +.BR '|' +shall behave as if it had a lower precedence than +.BR '$' . +The result of evaluating other operators is unspecified, and conforming +applications shall parenthesize properly all such usages. +.RE +.IP "\fBgetline\fR" 10 +Set $0 to the next input record from the current input file. This form +of +.BR getline +shall set the +.BR NF , +.BR NR , +and +.BR FNR +variables. +.IP "\fBgetline\ \fIvar\fR" 10 +Set variable +.IR var +to the next input record from the current input file and, if +appropriate, +.IR var +shall be considered a numeric string (see +.IR "Expressions in awk"). +This form of +.BR getline +shall set the +.BR FNR +and +.BR NR +variables. +.IP "\fBgetline\ \fB[\fIvar\fB]\ \fR<\ \fIexpression\fR" 10 +.br +Read the next record of input from a named file. The +.IR expression +shall be evaluated to produce a string that is used as a pathname. +If the file of that name is not currently open, it shall be opened. 
As +long as the stream remains open, subsequent calls in which +.IR expression +evaluates to the same string value shall read subsequent records from +the file. The file shall remain open until the +.BR close +function is called with an expression that evaluates to the same string +value. If +.IR var +is omitted, $0 and +.BR NF +shall be set; otherwise, +.IR var +shall be set and, if appropriate, it shall be considered a numeric +string (see +.IR "Expressions in awk"). +.RS 10 +.P +The +.BR getline +operator can form ambiguous constructs when there are unparenthesized +binary operators (including concatenate) to the right of the +.BR '<' +(up to the end of the expression containing the +.BR getline ). +The result of evaluating such a construct is unspecified, and conforming +applications shall parenthesize properly all such usages. +.RE +.IP "\fBsystem\fR(\fIexpression\fR)" 10 +.br +Execute the command given by +.IR expression +in a manner equivalent to the +\fIsystem\fR() +function defined in the System Interfaces volume of POSIX.1\(hy2017 and return the exit status of the +command. +.P +All forms of +.BR getline +shall return 1 for successful input, zero for end-of-file, and \-1 +for an error. +.P +Where strings are used as the name of a file or pipeline, the +application shall ensure that the strings are textually identical. The +terminology ``same string value'' implies that ``equivalent strings'', +even those that differ only by +<space> +characters, represent different files. +.SS "User-Defined Functions" +.P +The +.IR awk +language also provides user-defined functions. Such functions can be +defined as: +.sp +.RS 4 +.nf + +function \fIname\fR(\fB[\fIparameter\fR, ...\fB]\fR) { \fIstatements\fR } +.fi +.P +.RE +.P +A function can be referred to anywhere in an +.IR awk +program; in particular, its use can precede its definition. The scope +of a function is global. 
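As a minimal sketch of the definition syntax, the following program calls a function in a rule that appears before the function is defined. The function names fact_demo and fact, and the input, are assumptions for illustration.

```shell
# Hypothetical helper; names and input are illustrative only.
fact_demo() {
    printf '3\n5\n' |
    awk '{ print $1 "! = " fact($1) }    # the call precedes the definition
         function fact(n) { return (n <= 1) ? 1 : n * fact(n - 1) }'
}
fact_demo
```

Note that no white space separates the name fact from the opening parenthesis in either call.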
+.P +Function parameters, if present, can be either scalars or arrays; the +behavior is undefined if an array name is passed as a parameter that +the function uses as a scalar, or if a scalar expression is passed as a +parameter that the function uses as an array. Function parameters shall +be passed by value if scalar and by reference if array name. +.P +The number of parameters in the function definition need not match the +number of parameters in the function call. Excess formal parameters can +be used as local variables. If fewer arguments are supplied in a +function call than are in the function definition, the extra parameters +that are used in the function body as scalars shall evaluate to the +uninitialized value until they are otherwise initialized, and the extra +parameters that are used in the function body as arrays shall be +treated as uninitialized arrays where each element evaluates to the +uninitialized value until otherwise initialized. +.P +When invoking a function, no white space can be placed between the +function name and the opening parenthesis. Function calls can be nested +and recursive calls can be made upon functions. Upon return from any +nested or recursive function call, the values of all of the calling +function's parameters shall be unchanged, except for array parameters +passed by reference. The +.BR return +statement can be used to return a value. If a +.BR return +statement appears outside of a function definition, the behavior is +undefined. +.P +In the function definition, +<newline> +characters shall be optional before the opening brace and after the +closing brace. Function definitions can appear anywhere in the program +where a +.IR pattern-action +pair is allowed. +.SS "Grammar" +.P +The grammar in this section and the lexical conventions in the +following section shall together describe the syntax for +.IR awk +programs. 
The general conventions for this style of grammar are +described in +.IR "Section 1.3" ", " "Grammar Conventions". +A valid program can be represented as the non-terminal symbol +.IR program +in the grammar. This formal syntax shall take precedence over the +preceding text syntax description. +.sp +.RS 4 +.nf + +%token NAME NUMBER STRING ERE +%token FUNC_NAME /* Name followed by \(aq(\(aq without white space. */ +.P +/* Keywords */ +%token Begin End +/* \(aqBEGIN\(aq \(aqEND\(aq */ +.P +%token Break Continue Delete Do Else +/* \(aqbreak\(aq \(aqcontinue\(aq \(aqdelete\(aq \(aqdo\(aq \(aqelse\(aq */ +.P +%token Exit For Function If In +/* \(aqexit\(aq \(aqfor\(aq \(aqfunction\(aq \(aqif\(aq \(aqin\(aq */ +.P +%token Next Print Printf Return While +/* \(aqnext\(aq \(aqprint\(aq \(aqprintf\(aq \(aqreturn\(aq \(aqwhile\(aq */ +.P +/* Reserved function names */ +%token BUILTIN_FUNC_NAME + /* One token for the following: + * atan2 cos sin exp log sqrt int rand srand + * gsub index length match split sprintf sub + * substr tolower toupper close system + */ +%token GETLINE + /* Syntactically different from other built-ins. */ +.P +/* Two-character tokens. */ +%token ADD_ASSIGN SUB_ASSIGN MUL_ASSIGN DIV_ASSIGN MOD_ASSIGN POW_ASSIGN +/* \(aq+=\(aq \(aq-=\(aq \(aq*=\(aq \(aq/=\(aq \(aq%=\(aq \(aq\(ha=\(aq */ +.P +%token OR AND NO_MATCH EQ LE GE NE INCR DECR APPEND +/* \(aq||\(aq \(aq&&\(aq \(aq!\^\(ti\(aq \(aq==\(aq \(aq<=\(aq \(aq>=\(aq \(aq!=\(aq \(aq++\(aq \(aq--\(aq \(aq>>\(aq */ +.P +/* One-character tokens. 
*/ +%token \(aq{\(aq \(aq}\(aq \(aq(\(aq \(aq)\(aq \(aq[\(aq \(aq]\(aq \(aq,\(aq \(aq;\(aq NEWLINE +%token \(aq+\(aq \(aq-\(aq \(aq*\(aq \(aq%\(aq \(aq\(ha\(aq \(aq!\(aq \(aq>\(aq \(aq<\(aq \(aq|\(aq \(aq?\(aq \(aq:\(aq \(aq\(ti\(aq \(aq$\(aq \(aq=\(aq +.P +%start program +%% +.P +program : item_list + | item_list item + ; +.P +item_list : /* empty */ + | item_list item terminator + ; +.P +item : action + | pattern action + | normal_pattern + | Function NAME \(aq(\(aq param_list_opt \(aq)\(aq + newline_opt action + | Function FUNC_NAME \(aq(\(aq param_list_opt \(aq)\(aq + newline_opt action + ; +.P +param_list_opt : /* empty */ + | param_list + ; +.P +param_list : NAME + | param_list \(aq,\(aq NAME + ; +.P +pattern : normal_pattern + | special_pattern + ; +.P +normal_pattern : expr + | expr \(aq,\(aq newline_opt expr + ; +.P +special_pattern : Begin + | End + ; +.P +action : \(aq{\(aq newline_opt \(aq}\(aq + | \(aq{\(aq newline_opt terminated_statement_list \(aq}\(aq + | \(aq{\(aq newline_opt unterminated_statement_list \(aq}\(aq + ; +.P +terminator : terminator NEWLINE + | \(aq;\(aq + | NEWLINE + ; +.P +terminated_statement_list : terminated_statement + | terminated_statement_list terminated_statement + ; +.P +unterminated_statement_list : unterminated_statement + | terminated_statement_list unterminated_statement + ; +.P +terminated_statement : action newline_opt + | If \(aq(\(aq expr \(aq)\(aq newline_opt terminated_statement + | If \(aq(\(aq expr \(aq)\(aq newline_opt terminated_statement + Else newline_opt terminated_statement + | While \(aq(\(aq expr \(aq)\(aq newline_opt terminated_statement + | For \(aq(\(aq simple_statement_opt \(aq;\(aq + expr_opt \(aq;\(aq simple_statement_opt \(aq)\(aq newline_opt + terminated_statement + | For \(aq(\(aq NAME In NAME \(aq)\(aq newline_opt + terminated_statement + | \(aq;\(aq newline_opt + | terminatable_statement NEWLINE newline_opt + | terminatable_statement \(aq;\(aq newline_opt + ; +.P +unterminated_statement : 
terminatable_statement + | If \(aq(\(aq expr \(aq)\(aq newline_opt unterminated_statement + | If \(aq(\(aq expr \(aq)\(aq newline_opt terminated_statement + Else newline_opt unterminated_statement + | While \(aq(\(aq expr \(aq)\(aq newline_opt unterminated_statement + | For \(aq(\(aq simple_statement_opt \(aq;\(aq + expr_opt \(aq;\(aq simple_statement_opt \(aq)\(aq newline_opt + unterminated_statement + | For \(aq(\(aq NAME In NAME \(aq)\(aq newline_opt + unterminated_statement + ; +.P +terminatable_statement : simple_statement + | Break + | Continue + | Next + | Exit expr_opt + | Return expr_opt + | Do newline_opt terminated_statement While \(aq(\(aq expr \(aq)\(aq + ; +.P +simple_statement_opt : /* empty */ + | simple_statement + ; +.P +simple_statement : Delete NAME \(aq[\(aq expr_list \(aq]\(aq + | expr + | print_statement + ; +.P +print_statement : simple_print_statement + | simple_print_statement output_redirection + ; +.P +simple_print_statement : Print print_expr_list_opt + | Print \(aq(\(aq multiple_expr_list \(aq)\(aq + | Printf print_expr_list + | Printf \(aq(\(aq multiple_expr_list \(aq)\(aq + ; +.P +output_redirection : \(aq>\(aq expr + | APPEND expr + | \(aq|\(aq expr + ; +.P +expr_list_opt : /* empty */ + | expr_list + ; +.P +expr_list : expr + | multiple_expr_list + ; +.P +multiple_expr_list : expr \(aq,\(aq newline_opt expr + | multiple_expr_list \(aq,\(aq newline_opt expr + ; +.P +expr_opt : /* empty */ + | expr + ; +.P +expr : unary_expr + | non_unary_expr + ; +.P +unary_expr : \(aq+\(aq expr + | \(aq-\(aq expr + | unary_expr \(aq\(ha\(aq expr + | unary_expr \(aq*\(aq expr + | unary_expr \(aq/\(aq expr + | unary_expr \(aq%\(aq expr + | unary_expr \(aq+\(aq expr + | unary_expr \(aq-\(aq expr + | unary_expr non_unary_expr + | unary_expr \(aq<\(aq expr + | unary_expr LE expr + | unary_expr NE expr + | unary_expr EQ expr + | unary_expr \(aq>\(aq expr + | unary_expr GE expr + | unary_expr \(aq\(ti\(aq expr + | unary_expr NO_MATCH expr + | unary_expr 
In NAME + | unary_expr AND newline_opt expr + | unary_expr OR newline_opt expr + | unary_expr \(aq?\(aq expr \(aq:\(aq expr + | unary_input_function + ; +.P +non_unary_expr : \(aq(\(aq expr \(aq)\(aq + | \(aq!\(aq expr + | non_unary_expr \(aq\(ha\(aq expr + | non_unary_expr \(aq*\(aq expr + | non_unary_expr \(aq/\(aq expr + | non_unary_expr \(aq%\(aq expr + | non_unary_expr \(aq+\(aq expr + | non_unary_expr \(aq-\(aq expr + | non_unary_expr non_unary_expr + | non_unary_expr \(aq<\(aq expr + | non_unary_expr LE expr + | non_unary_expr NE expr + | non_unary_expr EQ expr + | non_unary_expr \(aq>\(aq expr + | non_unary_expr GE expr + | non_unary_expr \(aq\(ti\(aq expr + | non_unary_expr NO_MATCH expr + | non_unary_expr In NAME + | \(aq(\(aq multiple_expr_list \(aq)\(aq In NAME + | non_unary_expr AND newline_opt expr + | non_unary_expr OR newline_opt expr + | non_unary_expr \(aq?\(aq expr \(aq:\(aq expr + | NUMBER + | STRING + | lvalue + | ERE + | lvalue INCR + | lvalue DECR + | INCR lvalue + | DECR lvalue + | lvalue POW_ASSIGN expr + | lvalue MOD_ASSIGN expr + | lvalue MUL_ASSIGN expr + | lvalue DIV_ASSIGN expr + | lvalue ADD_ASSIGN expr + | lvalue SUB_ASSIGN expr + | lvalue \(aq=\(aq expr + | FUNC_NAME \(aq(\(aq expr_list_opt \(aq)\(aq + /* no white space allowed before \(aq(\(aq */ + | BUILTIN_FUNC_NAME \(aq(\(aq expr_list_opt \(aq)\(aq + | BUILTIN_FUNC_NAME + | non_unary_input_function + ; +.P +print_expr_list_opt : /* empty */ + | print_expr_list + ; +.P +print_expr_list : print_expr + | print_expr_list \(aq,\(aq newline_opt print_expr + ; +.P +print_expr : unary_print_expr + | non_unary_print_expr + ; +.P +unary_print_expr : \(aq+\(aq print_expr + | \(aq-\(aq print_expr + | unary_print_expr \(aq\(ha\(aq print_expr + | unary_print_expr \(aq*\(aq print_expr + | unary_print_expr \(aq/\(aq print_expr + | unary_print_expr \(aq%\(aq print_expr + | unary_print_expr \(aq+\(aq print_expr + | unary_print_expr \(aq-\(aq print_expr + | unary_print_expr non_unary_print_expr + 
| unary_print_expr \(aq\(ti\(aq print_expr + | unary_print_expr NO_MATCH print_expr + | unary_print_expr In NAME + | unary_print_expr AND newline_opt print_expr + | unary_print_expr OR newline_opt print_expr + | unary_print_expr \(aq?\(aq print_expr \(aq:\(aq print_expr + ; +.P +non_unary_print_expr : \(aq(\(aq expr \(aq)\(aq + | \(aq!\(aq print_expr + | non_unary_print_expr \(aq\(ha\(aq print_expr + | non_unary_print_expr \(aq*\(aq print_expr + | non_unary_print_expr \(aq/\(aq print_expr + | non_unary_print_expr \(aq%\(aq print_expr + | non_unary_print_expr \(aq+\(aq print_expr + | non_unary_print_expr \(aq-\(aq print_expr + | non_unary_print_expr non_unary_print_expr + | non_unary_print_expr \(aq\(ti\(aq print_expr + | non_unary_print_expr NO_MATCH print_expr + | non_unary_print_expr In NAME + | \(aq(\(aq multiple_expr_list \(aq)\(aq In NAME + | non_unary_print_expr AND newline_opt print_expr + | non_unary_print_expr OR newline_opt print_expr + | non_unary_print_expr \(aq?\(aq print_expr \(aq:\(aq print_expr + | NUMBER + | STRING + | lvalue + | ERE + | lvalue INCR + | lvalue DECR + | INCR lvalue + | DECR lvalue + | lvalue POW_ASSIGN print_expr + | lvalue MOD_ASSIGN print_expr + | lvalue MUL_ASSIGN print_expr + | lvalue DIV_ASSIGN print_expr + | lvalue ADD_ASSIGN print_expr + | lvalue SUB_ASSIGN print_expr + | lvalue \(aq=\(aq print_expr + | FUNC_NAME \(aq(\(aq expr_list_opt \(aq)\(aq + /* no white space allowed before \(aq(\(aq */ + | BUILTIN_FUNC_NAME \(aq(\(aq expr_list_opt \(aq)\(aq + | BUILTIN_FUNC_NAME + ; +.P +lvalue : NAME + | NAME \(aq[\(aq expr_list \(aq]\(aq + | \(aq$\(aq expr + ; +.P +non_unary_input_function : simple_get + | simple_get \(aq<\(aq expr + | non_unary_expr \(aq|\(aq simple_get + ; +.P +unary_input_function : unary_expr \(aq|\(aq simple_get + ; +.P +simple_get : GETLINE + | GETLINE lvalue + ; +.P +newline_opt : /* empty */ + | newline_opt NEWLINE + ; +.fi +.P +.RE +.P +This grammar has several ambiguities that shall be resolved as 
+follows: +.IP " *" 4 +Operator precedence and associativity shall be as described in +.IR "Table 4-1, Expressions in Decreasing Precedence in \fIawk\fP". +.IP " *" 4 +In case of ambiguity, an +.BR else +shall be associated with the most immediately preceding +.BR if +that would satisfy the grammar. +.IP " *" 4 +In some contexts, a +<slash> +(\c +.BR '/' ) +that is used to surround an ERE could also be the division operator. +This shall be resolved in such a way that wherever the division +operator could appear, a +<slash> +is assumed to be the division operator. (There is no unary division +operator.) +.P +Each expression in an +.IR awk +program shall conform to the precedence and associativity rules, even +when this is not needed to resolve an ambiguity. For example, because +.BR '$' +has higher precedence than +.BR '++' , +the string +.BR \(dq$x++--\(dq +is not a valid +.IR awk +expression, even though it is unambiguously parsed by the grammar as +.BR \(dq$(x++)--\(dq . +.P +One convention that might not be obvious from the formal grammar is +where +<newline> +characters are acceptable. There are several obvious placements such as +terminating a statement, and a +<backslash> +can be used to escape +<newline> +characters between any lexical tokens. In addition, +<newline> +characters without +<backslash> +characters can follow a comma, an open brace, logical AND operator (\c +.BR \(dq&&\(dq ), +logical OR operator (\c +.BR \(dq||\(dq ), +the +.BR do +keyword, the +.BR else +keyword, and the closing parenthesis of an +.BR if , +.BR for , +or +.BR while +statement. For example: +.sp +.RS 4 +.nf + +{ print $1, + $2 } +.fi +.P +.RE +.SS "Lexical Conventions" +.P +The lexical conventions for +.IR awk +programs, with respect to the preceding grammar, shall be as follows: +.IP " 1." 4 +Except as noted, +.IR awk +shall recognize the longest possible token or delimiter beginning at a +given point. +.IP " 2." 
4 +A comment shall consist of any characters beginning with the +<number-sign> +character and terminated by, but excluding the next occurrence of, a +<newline>. +Comments shall have no effect, except to delimit lexical tokens. +.IP " 3." 4 +The +<newline> +shall be recognized as the token +.BR NEWLINE . +.IP " 4." 4 +A +<backslash> +character immediately followed by a +<newline> +shall have no effect. +.IP " 5." 4 +The token +.BR STRING +shall represent a string constant. A string constant shall begin with +the character +.BR '\&"' . +Within a string constant, a +<backslash> +character shall be considered to begin an escape sequence as specified +in the table in the Base Definitions volume of POSIX.1\(hy2017, +.IR "Chapter 5" ", " "File Format Notation" +(\c +.BR '\e\e' , +.BR '\ea' , +.BR '\eb' , +.BR '\ef' , +.BR '\en' , +.BR '\er' , +.BR '\et' , +.BR '\ev' ). +In addition, the escape sequences in +.IR "Table 4-2, Escape Sequences in \fIawk\fP" +shall be recognized. A +<newline> +shall not occur within a string constant. A string constant shall be +terminated by the first unescaped occurrence of the character +.BR '\&"' +after the one that begins the string constant. The value of the string +shall be the sequence of all unescaped characters and values of escape +sequences between, but not including, the two delimiting +.BR '\&"' +characters. +.IP " 6." 4 +The token +.BR ERE +represents an extended regular expression constant. An ERE constant +shall begin with the +<slash> +character. Within an ERE constant, a +<backslash> +character shall be considered to begin an escape sequence as +specified in the table in the Base Definitions volume of POSIX.1\(hy2017, +.IR "Chapter 5" ", " "File Format Notation". +In addition, the escape sequences in +.IR "Table 4-2, Escape Sequences in \fIawk\fP" +shall be recognized. The application shall ensure that a +<newline> +does not occur within an ERE constant. 
An ERE constant shall be +terminated by the first unescaped occurrence of the +<slash> +character after the one that begins the ERE constant. The extended regular +expression represented by the ERE constant shall be the sequence of all +unescaped characters and values of escape sequences between, but not +including, the two delimiting +<slash> +characters. +.IP " 7." 4 +A +<blank> +shall have no effect, except to delimit lexical tokens or within +.BR STRING +or +.BR ERE +tokens. +.IP " 8." 4 +The token +.BR NUMBER +shall represent a numeric constant. Its form and numeric value shall +either be equivalent to the +.BR decimal-floating-constant +token as specified by the ISO\ C standard, or it shall be a sequence of decimal +digits and shall be evaluated as an integer constant in decimal. In +addition, implementations may accept numeric constants with the form +and numeric value equivalent to the +.BR hexadecimal-constant +and +.BR hexadecimal-floating-constant +tokens as specified by the ISO\ C standard. +.RS 4 +.P +If the value is too large or too small to be representable (see +.IR "Section 1.1.2" ", " "Concepts Derived from the ISO C Standard"), +the behavior is undefined. +.RE +.IP " 9." 4 +A sequence of underscores, digits, and alphabetics from the portable +character set (see the Base Definitions volume of POSIX.1\(hy2017, +.IR "Section 6.1" ", " "Portable Character Set"), +beginning with an +<underscore> +or alphabetic character, shall be considered a word. +.IP 10. 4 +The following words are keywords that shall be recognized as individual +tokens; the name of the token is the same as the keyword: +.TS +tab(@); +lw(0.6i)eB leB leB leB leB leB. +T{ +.nf +BEGIN +break +continue +T}@T{ +.nf +delete +do +else +T}@T{ +.nf +END +exit +for +T}@T{ +.nf +function +getline +if +T}@T{ +.nf +in +next +print +T}@T{ +.nf +printf +return +while +T} +.TE +.IP 11. 
4 +The following words are names of built-in functions and shall be +recognized as the token +.BR BUILTIN_FUNC_NAME : +.TS +tab(@); +lw(0.6i)eB leB leB leB leB leB. +T{ +.nf +atan2 +close +cos +exp +T}@T{ +.nf +gsub +index +int +length +T}@T{ +.nf +log +match +rand +sin +T}@T{ +.nf +split +sprintf +sqrt +srand +T}@T{ +.nf +sub +substr +system +tolower +T}@T{ +.nf +toupper +.fi +T} +.TE +.RS 4 +.P +The above-listed keywords and names of built-in functions are +considered reserved words. +.RE +.IP 12. 4 +The token +.BR NAME +shall consist of a word that is not a keyword or a name of a built-in +function and is not followed immediately (without any delimiters) by +the +.BR '(' +character. +.IP 13. 4 +The token +.BR FUNC_NAME +shall consist of a word that is not a keyword or a name of a built-in +function, followed immediately (without any delimiters) by the +.BR '(' +character. The +.BR '(' +character shall not be included as part of the token. +.IP 14. 4 +The following two-character sequences shall be recognized as the named +tokens: +.TS +box center tab(@); +cB | cB | cB | cB +lB | cf5 | lB | cf5. +Token Name@Sequence@Token Name@Sequence +_ +ADD_ASSIGN@+=@NO_MATCH@!~ +SUB_ASSIGN@\-=@EQ@== +MUL_ASSIGN@*=@LE@<= +DIV_ASSIGN@/=@GE@>= +MOD_ASSIGN@%=@NE@!= +POW_ASSIGN@^=@INCR@++ +OR@||@DECR@\-\|\- +AND@&&@APPEND@>> +.TE +.IP 15. 4 +The following single characters shall be recognized as tokens whose +names are the character: +.RS 4 +.sp +.RS 4 +.nf + +<newline> { } ( ) [ ] , ; + - * % \(ha ! > < | ? : \(ti $ = +.fi +.P +.RE +.RE +.P +There is a lexical ambiguity between the token +.BR ERE +and the tokens +.BR '/' +and +.BR DIV_ASSIGN . +When an input sequence begins with a +<slash> +character in any syntactic context where the token +.BR '/' +or +.BR DIV_ASSIGN +could appear as the next token in a valid program, the longer of those +two tokens that can be recognized shall be recognized. 
In any other +syntactic context where the token +.BR ERE +could appear as the next token in a valid program, the token +.BR ERE +shall be recognized. +.SH "EXIT STATUS" +The following exit values shall be returned: +.IP "\00" 6 +All input files were processed successfully. +.IP >0 6 +An error occurred. +.P +The exit status can be altered within the program by using an +.BR exit +expression. +.SH "CONSEQUENCES OF ERRORS" +If any +.IR file +operand is specified and the named file cannot be accessed, +.IR awk +shall write a diagnostic message to standard error and terminate +without any further action. +.P +If the program specified by either the +.IR program +operand or a +.IR progfile +operand is not a valid +.IR awk +program (as specified in the EXTENDED DESCRIPTION section), the +behavior is undefined. +.LP +.IR "The following sections are informative." +.SH "APPLICATION USAGE" +The +.BR index , +.BR length , +.BR match , +and +.BR substr +functions should not be confused with similar functions in the ISO\ C standard; +the +.IR awk +versions deal with characters, while the ISO\ C standard deals with bytes. +.P +Because the concatenation operation is represented by adjacent +expressions rather than an explicit operator, it is often necessary to +use parentheses to enforce the proper evaluation precedence. +.P +When using +.IR awk +to process pathnames, it is recommended that LC_ALL, or at least +LC_CTYPE and LC_COLLATE, are set to POSIX or C in the environment, +since pathnames can contain byte sequences that do not form valid +characters in some locales, in which case the utility's behavior would +be undefined. In the POSIX locale each byte is a valid single-byte +character, and therefore this problem is avoided. 
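The remark above about concatenation and parentheses can be illustrated with a short sketch. Concatenation binds less tightly than the additive operators, so a minus sign that follows a string operand is taken as binary subtraction unless the operand is parenthesized. The shell variable names loose and grouped are illustrative only and do not come from the standard.

```shell
# Without parentheses, " " -24 is parsed as the subtraction (" " - 24),
# so the space is converted to 0 and never reaches the output.
loose=$(awk 'BEGIN { print -12 " " -24 }')

# Parentheses make (-24) a separate operand of the concatenation.
grouped=$(awk 'BEGIN { print -12 " " (-24) }')

printf '%s\n%s\n' "$loose" "$grouped"
```

The first command is expected to write -12-24 and the second -12 -24, which is why parenthesizing signed or arithmetic operands inside a concatenation is usually the safer habit.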
+.P +On implementations where the +.BR \(dq==\(dq +operator checks if strings collate equally, applications needing to +check whether strings are identical can use: +.sp +.RS 4 +.nf + +length(a) == length(b) && index(a,b) == 1 +.fi +.P +.RE +.P +On implementations where the +.BR \(dq==\(dq +operator checks if strings are identical, applications needing to +check whether strings collate equally can use: +.sp +.RS 4 +.nf + +a <= b && a >= b +.fi +.P +.RE +.SH EXAMPLES +The +.IR awk +program specified in the command line is most easily specified within +single-quotes (for example, \(aq\fIprogram\fP\(aq) for applications using +.IR sh , +because +.IR awk +programs commonly contain characters that are special to the shell, +including double-quotes. In the cases where an +.IR awk +program contains single-quote characters, it is usually easiest to +specify most of the program as strings within single-quotes +concatenated by the shell with quoted single-quote characters. For +example: +.sp +.RS 4 +.nf + +awk \(aq/\(aq\e\(aq\(aq/ { print "quote:", $0 }\(aq +.fi +.P +.RE +.P +prints all lines from the standard input containing a single-quote +character, prefixed with +.IR quote :. +.P +The following are examples of simple +.IR awk +programs: +.IP " 1." 4 +Write to the standard output all input lines for which field 3 is +greater than 5: +.RS 4 +.sp +.RS 4 +.nf + +$3 > 5 +.fi +.P +.RE +.RE +.IP " 2." 4 +Write every tenth line: +.RS 4 +.sp +.RS 4 +.nf + +(NR % 10) == 0 +.fi +.P +.RE +.RE +.IP " 3." 4 +Write any line with a substring matching the regular expression: +.RS 4 +.sp +.RS 4 +.nf + +/(G|D)(2[0-9][[:alpha:]]*)/ +.fi +.P +.RE +.RE +.IP " 4." 4 +Print any line with a substring containing a +.BR 'G' +or +.BR 'D' , +followed by a sequence of digits and characters. 
This example uses +character classes +.BR digit +and +.BR alpha +to match language-independent digit and alphabetic characters +respectively: +.RS 4 +.sp +.RS 4 +.nf + +/(G|D)([[:digit:][:alpha:]]*)/ +.fi +.P +.RE +.RE +.IP " 5." 4 +Write any line in which the second field matches the regular expression +and the fourth field does not: +.RS 4 +.sp +.RS 4 +.nf + +$2 \(ti /xyz/ && $4 !\(ti /xyz/ +.fi +.P +.RE +.RE +.IP " 6." 4 +Write any line in which the second field contains a +<backslash>: +.RS 4 +.sp +.RS 4 +.nf + +$2 \(ti /\e\e/ +.fi +.P +.RE +.RE +.IP " 7." 4 +Write any line in which the second field contains a +<backslash>. +Note that +<backslash>-escapes +are interpreted twice; once in lexical processing of the string and once +in processing the regular expression: +.RS 4 +.sp +.RS 4 +.nf + +$2 \(ti "\e\e\e\e" +.fi +.P +.RE +.RE +.IP " 8." 4 +Write the second to the last and the last field in each line. Separate +the fields by a +<colon>: +.RS 4 +.sp +.RS 4 +.nf + +{OFS=":";print $(NF-1), $NF} +.fi +.P +.RE +.RE +.IP " 9." 4 +Write the line number and number of fields in each line. The three +strings representing the line number, the +<colon>, +and the number of fields are concatenated and that string is written to +standard output: +.RS 4 +.sp +.RS 4 +.nf + +{print NR ":" NF} +.fi +.P +.RE +.RE +.IP 10. 4 +Write lines longer than 72 characters: +.RS 4 +.sp +.RS 4 +.nf + +length($0) > 72 +.fi +.P +.RE +.RE +.IP 11. 4 +Write the first two fields in opposite order separated by +.BR OFS : +.RS 4 +.sp +.RS 4 +.nf + +{ print $2, $1 } +.fi +.P +.RE +.RE +.IP 12. 4 +Same, with input fields separated by a +<comma> +or +<space> +and +<tab> +characters, or both: +.RS 4 +.sp +.RS 4 +.nf + +BEGIN { FS = ",[ \et]*|[ \et]+" } + { print $2, $1 } +.fi +.P +.RE +.RE +.IP 13. 4 +Add up the first column, print sum, and average: +.RS 4 +.sp +.RS 4 +.nf + + {s += $1 } +END {print "sum is ", s, " average is", s/NR} +.fi +.P +.RE +.RE +.IP 14. 
4 +Write fields in reverse order, one per line (many lines out for each +line in): +.RS 4 +.sp +.RS 4 +.nf + +{ for (i = NF; i > 0; --i) print $i } +.fi +.P +.RE +.RE +.IP 15. 4 +Write all lines between occurrences of the strings +.BR start +and +.BR stop : +.RS 4 +.sp +.RS 4 +.nf + +/start/, /stop/ +.fi +.P +.RE +.RE +.IP 16. 4 +Write all lines whose first field is different from the previous one: +.RS 4 +.sp +.RS 4 +.nf + +$1 != prev { print; prev = $1 } +.fi +.P +.RE +.RE +.IP 17. 4 +Simulate +.IR echo : +.RS 4 +.sp +.RS 4 +.nf + +BEGIN { + for (i = 1; i < ARGC; ++i) + printf("%s%s", ARGV[i], i==ARGC-1?"\en":" ") +} +.fi +.P +.RE +.RE +.IP 18. 4 +Write the path prefixes contained in the +.IR PATH +environment variable, one per line: +.RS 4 +.sp +.RS 4 +.nf + +BEGIN { + n = split (ENVIRON["PATH"], path, ":") + for (i = 1; i <= n; ++i) + print path[i] +} +.fi +.P +.RE +.RE +.IP 19. 4 +If there is a file named +.BR input +containing page headers of the form: +Page # +.RS 4 +.P +and a file named +.BR program +that contains: +.sp +.RS 4 +.nf + +/Page/ { $2 = n++; } + { print } +.fi +.P +.RE +then the command line: +.sp +.RS 4 +.nf + +awk -f program n=5 input +.fi +.P +.RE +.P +prints the file +.BR input , +filling in page numbers starting at 5. +.RE +.SH RATIONALE +This description is based on the new +.IR awk , +``nawk'', (see the referenced \fIThe AWK Programming Language\fP), which introduced a number of new features to +the historical +.IR awk : +.IP " 1." 4 +New keywords: +.BR delete , +.BR do , +.BR function , +.BR return +.IP " 2." 4 +New built-in functions: +.BR atan2 , +.BR close , +.BR cos , +.BR gsub , +.BR match , +.BR rand , +.BR sin , +.BR srand , +.BR sub , +.BR system +.IP " 3." 4 +New predefined variables: +.BR FNR , +.BR ARGC , +.BR ARGV , +.BR RSTART , +.BR RLENGTH , +.BR SUBSEP +.IP " 4." 4 +New expression operators: +.BR ? , +.BR : , +.BR , , +.BR ^ +.IP " 5." 
4 +The +.BR FS +variable and the third argument to +.BR split , +now treated as extended regular expressions. +.IP " 6." 4 +The operator precedence, changed to more closely match the C language. +Two examples of code that operate differently are: +.RS 4 +.sp +.RS 4 +.nf + +while ( n /= 10 > 1) ... +if (!"wk" \(ti /bwk/) ... +.fi +.P +.RE +.RE +.P +Several features have been added based on newer implementations of +.IR awk : +.IP " *" 4 +Multiple instances of +.BR \-f +.IR progfile +are permitted. +.IP " *" 4 +The new option +.BR \-v +.IR assignment. +.IP " *" 4 +The new predefined variable +.BR ENVIRON . +.IP " *" 4 +New built-in functions +.BR toupper +and +.BR tolower . +.IP " *" 4 +More formatting capabilities are added to +.BR printf +to match the ISO\ C standard. +.P +Earlier versions of this standard required implementations to +support multiple adjacent +<semicolon>s, +lines with one or more +<semicolon> +before a rule (\c +.IR pattern-action +pairs), and lines with only +<semicolon>(s). +These are not required by this standard and are considered poor +programming practice, but can be accepted by an implementation of +.IR awk +as an extension. +.P +The overall +.IR awk +syntax has always been based on the C language, with a few features +from the shell command language and other sources. Because of this, it +is not completely compatible with any other language, which has caused +confusion for some users. It is not the intent of the standard +developers to address such issues. A few relatively minor changes +toward making the language more compatible with the ISO\ C standard were +made; most of these changes are based on similar changes in recent +implementations, as described above. There remain several C-language +conventions that are not in +.IR awk . +One of the notable ones is the +<comma> +operator, which is commonly used to specify multiple expressions in the +C language +.BR for +statement. 
Also, there are various places where +.IR awk +is more restrictive than the C language regarding the type of +expression that can be used in a given context. These limitations are +due to the different features that the +.IR awk +language does provide. +.P +Regular expressions in +.IR awk +have been extended somewhat from historical implementations to make +them a pure superset of extended regular expressions, as defined by +POSIX.1\(hy2008 (see the Base Definitions volume of POSIX.1\(hy2017, +.IR "Section 9.4" ", " "Extended Regular Expressions"). +The main extensions are internationalization +features and interval expressions. Historical implementations of +.IR awk +have long supported +<backslash>-escape +sequences as an extension to extended regular expressions, and +this extension has been retained despite inconsistency with other +utilities. The number of escape sequences recognized in both extended +regular expressions and strings has varied (generally increasing with +time) among implementations. The set specified by POSIX.1\(hy2008 includes most +sequences known to be supported by popular implementations and by the +ISO\ C standard. One sequence that is not supported is hexadecimal value escapes +beginning with +.BR '\ex' . +This would allow values expressed in more than 9 bits to be used within +.IR awk +as in the ISO\ C standard. However, because this syntax has a non-deterministic +length, it does not permit the subsequent character to be a hexadecimal +digit. This limitation can be dealt with in the C language by the use +of lexical string concatenation. In the +.IR awk +language, concatenation could also be a solution for strings, but not +for extended regular expressions (either lexical ERE tokens or strings +used dynamically as regular expressions). Because of this limitation, +the feature has not been added to POSIX.1\(hy2008. 
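As a rough illustration of why the bounded-length octal escape avoids the problem described for '\x', the sketch below places a digit immediately after a three-digit octal escape, both in a string constant and in an ERE constant. It assumes an ASCII-based codeset, in which octal 101 is 'A'; the shell variable names str and ere are illustrative only.

```shell
# "\1011" is the octal escape \101 ('A' in ASCII, an assumption of this
# sketch) followed by the ordinary digit '1'; at most three octal digits
# are consumed by the escape.
str=$(awk 'BEGIN { print "\1011" }')

# The same escape inside an ERE constant matches the literal text "A1".
ere=$(printf 'A1\n' | awk '/^\1011$/ { print "matched" }')

printf '%s\n%s\n' "$str" "$ere"
```

Because the octal form stops after three digits, no concatenation trick is needed to follow it with another digit, in either strings or extended regular expressions.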
+.P +When a string variable is used in a context where an extended regular +expression normally appears (where the lexical token ERE is used in the +grammar) the string does not contain the literal +<slash> +characters. +.P +Some versions of +.IR awk +allow the form: +.sp +.RS 4 +.nf + +func name(args, ... ) { statements } +.fi +.P +.RE +.P +This has been deprecated by the authors of the language, who asked that +it not be specified. +.P +Historical implementations of +.IR awk +produce an error if a +.BR next +statement is executed in a +.BR BEGIN +action, and cause +.IR awk +to terminate if a +.BR next +statement is executed in an +.BR END +action. This behavior has not been documented, and it was not believed +that it was necessary to standardize it. +.P +The specification of conversions between string and numeric values is +much more detailed than in the documentation of historical +implementations or in the referenced \fIThe AWK Programming Language\fP. Although most of the behavior is +designed to be intuitive, the details are necessary to ensure +compatible behavior from different implementations. This is especially +important in relational expressions since the types of the operands +determine whether a string or numeric comparison is performed. From the +perspective of an application developer, it is usually sufficient to +expect intuitive behavior and to force conversions (by adding zero or +concatenating a null string) when the type of an expression does not +obviously match what is needed. The intent has been to specify +historical practice in almost all cases. The one exception is that, in +historical implementations, variables and constants maintain both +string and numeric values after their original value is converted by +any use. This means that referencing a variable or constant can have +unexpected side-effects. 
For example, with historical implementations +the following program: +.sp +.RS 4 +.nf + +{ + a = "+2" + b = 2 + if (NR % 2) + c = a + b + if (a == b) + print "numeric comparison" + else + print "string comparison" +} +.fi +.P +.RE +.P +would perform a numeric comparison (and output numeric comparison) for +each odd-numbered line, but perform a string comparison (and output +string comparison) for each even-numbered line. POSIX.1\(hy2008 ensures that +comparisons will be numeric if necessary. With historical +implementations, the following program: +.sp +.RS 4 +.nf + +BEGIN { + OFMT = "%e" + print 3.14 + OFMT = "%f" + print 3.14 +} +.fi +.P +.RE +.P +would output +.BR \(dq3.140000e+00\(dq +twice, because in the second +.BR print +statement the constant +.BR \(dq3.14\(dq +would have a string value from the previous conversion. POSIX.1\(hy2008 requires +that the output of the second +.BR print +statement be +.BR \(dq3.140000\(dq . +The behavior of historical implementations was seen as too unintuitive +and unpredictable. +.P +It was pointed out that with the rules contained in early drafts, the +following script would print nothing: +.sp +.RS 4 +.nf + +BEGIN { + y[1.5] = 1 + OFMT = "%e" + print y[1.5] +} +.fi +.P +.RE +.P +Therefore, a new variable, +.BR CONVFMT , +was introduced. The +.BR OFMT +variable is now restricted to affecting output conversions of numbers +to strings and +.BR CONVFMT +is used for internal conversions, such as comparisons or array +indexing. The default value is the same as that for +.BR OFMT , +so unless a program changes +.BR CONVFMT +(which no historical program would do), it will receive the historical +behavior associated with internal string conversions. +.P +The POSIX +.IR awk +lexical and syntactic conventions are specified more formally than in +other sources. Again the intent has been to specify historical +practice. 
One convention that may not be obvious from the formal +grammar as in other verbal descriptions is where +<newline> +characters are acceptable. There are several obvious placements such as +terminating a statement, and a +<backslash> +can be used to escape +<newline> +characters between any lexical tokens. In addition, +<newline> +characters without +<backslash> +characters can follow a comma, an open brace, a logical AND operator (\c +.BR \(dq&&\(dq ), +a logical OR operator (\c +.BR \(dq||\(dq ), +the +.BR do +keyword, the +.BR else +keyword, and the closing parenthesis of an +.BR if , +.BR for , +or +.BR while +statement. For example: +.sp +.RS 4 +.nf + +{ print $1, + $2 } +.fi +.P +.RE +.P +The requirement that +.IR awk +add a trailing +<newline> +to the program argument text is to simplify the grammar, making it +match a text file in form. There is no way for an application or test +suite to determine whether a literal +<newline> +is added or whether +.IR awk +simply acts as if it did. +.P +POSIX.1\(hy2008 requires several changes from historical implementations in order +to support internationalization. Probably the most subtle of these is +the use of the decimal-point character, defined by the +.IR LC_NUMERIC +category of the locale, in representations of floating-point numbers. +This locale-specific character is used in recognizing numeric input, in +converting between strings and numeric values, and in formatting +output. However, regardless of locale, the +<period> +character (the decimal-point character of the POSIX locale) is the +decimal-point character recognized in processing +.IR awk +programs (including assignments in command line arguments). This is +essentially the same convention as the one used in the ISO\ C standard. The +difference is that the C language includes the +\fIsetlocale\fR() +function, which permits an application to modify its locale. 
Because of +this capability, a C application begins executing with its locale set +to the C locale, and only executes in the environment-specified locale +after an explicit call to +\fIsetlocale\fR(). +However, adding such an elaborate new feature to the +.IR awk +language was seen as inappropriate for POSIX.1\(hy2008. It is possible to execute +an +.IR awk +program explicitly in any desired locale by setting the environment in +the shell. +.P +The undefined behavior resulting from NULs in extended regular +expressions allows future extensions for the GNU +.IR gawk +program to process binary data. +.P +The behavior in the case of invalid +.IR awk +programs (including lexical, syntactic, and semantic errors) is +undefined because it was considered overly limiting on implementations +to specify. In most cases such errors can be expected to produce a +diagnostic and a non-zero exit status. However, some implementations +may choose to extend the language in ways that make use of certain +invalid constructs. Other invalid constructs might be deemed worthy of +a warning, but otherwise cause some reasonable behavior. Still other +constructs may be very difficult to detect in some implementations. +Also, different implementations might detect a given error during an +initial parsing of the program (before reading any input files) while +others might detect it when executing the program after reading some +input. Implementors should be aware that diagnosing errors as early as +possible and producing useful diagnostics can ease debugging of +applications, and thus make an implementation more usable. +.P +The unspecified behavior from using multi-character +.BR RS +values is to allow possible future extensions based on extended regular +expressions used for record separators. Historical implementations take +the first character of the string and ignore the others. 
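A minimal sketch of the portable case follows: RS is set to a single character, so the result does not depend on how an implementation treats multi-character values. The input text and the shell variable name records are illustrative only.

```shell
# RS holds exactly one character here; the behavior of multi-character
# RS values is unspecified and is deliberately not exercised.
records=$(printf 'a;b;c' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')

printf '%s\n' "$records"
```

Each <semicolon>-separated piece of the input is written as its own numbered record. A script that needs a longer separator would have to rely on an implementation extension.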
+.P +Unspecified behavior when +.IR split (\c +.IR string ,\c +.IR array ,\c +<null>) +is used is to allow a proposed future extension that would split up a +string into an array of individual characters. +.P +In the context of the +.BR getline +function, equally good arguments for different precedences of the +.BR | +and +.BR < +operators can be made. Historical practice has been that: +.sp +.RS 4 +.nf + +getline < "a" "b" +.fi +.P +.RE +.P +is parsed as: +.sp +.RS 4 +.nf + +( getline < "a" ) "b" +.fi +.P +.RE +.P +although many would argue that the intent was that the file +.BR ab +should be read. However: +.sp +.RS 4 +.nf + +getline < "x" + 1 +.fi +.P +.RE +.P +parses as: +.sp +.RS 4 +.nf + +getline < ( "x" + 1 ) +.fi +.P +.RE +.P +Similar problems occur with the +.BR | +version of +.BR getline , +particularly in combination with +.BR $ . +For example: +.sp +.RS 4 +.nf + +$"echo hi" | getline +.fi +.P +.RE +.P +(This situation is particularly problematic when used in a +.BR print +statement, where the +.BR |getline +part might be a redirection of the +.BR print .) +.P +Since in most cases such constructs are not (or at least should not be) +used (because they have a natural ambiguity for which there is no +conventional parsing), the meaning of these constructs has been made +explicitly unspecified. (The effect is that a conforming application that +runs into the problem must parenthesize to resolve the ambiguity.) +There appeared to be few if any actual uses of such constructs. +.P +Grammars can be written that would cause an error under these +circumstances. Where backwards-compatibility is not a large +consideration, implementors may wish to use such grammars. +.P +Some historical implementations have allowed some built-in functions to +be called without an argument list, the result being a default argument +list chosen in some ``reasonable'' way. 
Use of +.BR length +as a synonym for +.BR "length($0)" +is the only one of these forms that is thought to be widely known or +widely used; this particular form is documented in various places (for +example, most historical +.IR awk +reference pages, although not in the referenced \fIThe AWK Programming Language\fP) as legitimate practice. +With this exception, default argument lists have always been +undocumented and vaguely defined, and it is not at all clear how (or +if) they should be generalized to user-defined functions. They add no +useful functionality and preclude possible future extensions that might +need to name functions without calling them. Not standardizing them +seems the simplest course. The standard developers considered that +.BR length +merited special treatment, however, since it has been documented in the +past and sees possibly substantial use in historical programs. +Accordingly, this usage has been made legitimate, but Issue\ 5 +removed the obsolescent marking for XSI-conforming implementations +and many otherwise conforming applications depend on this feature. +.P +In +.BR sub +and +.BR gsub , +if +.IR repl +is a string literal (the lexical token +.BR STRING ), +then two consecutive +<backslash> +characters should be used in the string to ensure a single +<backslash> +will precede the +<ampersand> +when the resultant string is passed to the function. (For example, +to specify one literal +<ampersand> +in the replacement string, use +.BR gsub (\c +.BR ERE , +.BR \(dq\e\e&\(dq ).) +.P +Historically, the only special character in the +.IR repl +argument of +.BR sub +and +.BR gsub +string functions was the +<ampersand> +(\c +.BR '&' ) +character and preceding it with the +<backslash> +character was used to turn off its special meaning. 
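The historical handling of the ampersand in the repl argument can be sketched as follows. This is a minimal illustration, not text from the standard: the sample input a.b and the shell variable names matched and literal are assumptions chosen for the example, and an awk that follows the historical description is assumed.

```shell
# Hypothetical sample input "a.b"; only standard awk features are used.

# An unescaped '&' in repl stands for the text matched by the ERE.
matched=$(printf 'a.b\n' | awk '{ gsub(/\./, "[&]"); print }')

# The string literal "\\&" is reduced by lexical analysis to \& before
# gsub sees it, so the replacement is one literal '&'.
literal=$(printf 'a.b\n' | awk '{ gsub(/\./, "\\&"); print }')

printf '%s\n%s\n' "$matched" "$literal"
```

With such an awk, the first line printed is a[.]b and the second is a&b.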
+.P +The description in the ISO\ POSIX\(hy2:\|1993 standard introduced behavior such that the +<backslash> +character was another special character and it was unspecified whether +there were any other special characters. This description introduced +several portability problems, some of which are described below, and so +it has been replaced with the more historical description. Some of the +problems include: +.IP " *" 4 +Historically, to create the replacement string, a script could use +.BR gsub (\c +.BR ERE , +.BR \(dq\e\e&\(dq ), +but with the ISO\ POSIX\(hy2:\|1993 standard wording, it was necessary to use +.BR gsub (\c +.BR ERE , +.BR \(dq\e\e\e\e&\(dq ). +The +<backslash> +characters are doubled here because all string literals are subject to +lexical analysis, which would reduce each pair of +<backslash> +characters to a single +<backslash> +before being passed to +.BR gsub . +.IP " *" 4 +Since it was unspecified what the special characters were, for portable +scripts to guarantee that characters are printed literally, each +character had to be preceded with a +<backslash>. +(For example, a portable script had to use +.BR gsub (\c +.BR ERE , +.BR \(dq\e\eh\e\ei\(dq ) +to produce a replacement string of +.BR \(dqhi\(dq .) +.P +The description for comparisons in the ISO\ POSIX\(hy2:\|1993 standard did not properly describe +historical practice because of the way numeric strings are compared as +numbers. The current rules cause the following code: +.sp +.RS 4 +.nf + +if (0 == "000") + print "strange, but true" +else + print "not true" +.fi +.P +.RE +.P +to do a numeric comparison, causing the +.BR if +to succeed. It should be intuitively obvious that this is incorrect +behavior, and indeed, no historical implementation of +.IR awk +actually behaves this way. 
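The behavior that implementations actually exhibit can be sketched as follows. This is a hedged illustration assuming an awk that follows the current numeric-string definition; the shell variable names const and field and the sample input 000 are assumptions made for the example.

```shell
# A string constant is compared as a string, so "0" and "000" differ.
const=$(awk 'BEGIN { if (0 == "000") print "equal"; else print "not equal" }')

# The same characters arriving as an input field form a numeric string,
# so the comparison with 0 is done numerically.
field=$(printf '000\n' | awk '{ if ($1 == 0) print "equal"; else print "not equal" }')

printf '%s\n%s\n' "$const" "$field"
```

With such an awk, the first line printed is "not equal" and the second is "equal", which shows why the origin of a value matters and not only its characters.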
+.P +To fix this problem, the definition of +.IR "numeric string" +was enhanced to include only those values obtained from specific +circumstances (mostly external sources) where it is not possible to +determine unambiguously whether the value is intended to be a string or +a numeric. +.P +Variables that are assigned to a numeric string shall also be treated +as a numeric string. (For example, the notion of a numeric string can +be propagated across assignments.) In comparisons, all variables having +the uninitialized value are to be treated as a numeric operand +evaluating to the numeric value zero. +.P +Uninitialized variables include all types of variables including +scalars, array elements, and fields. The definition of an uninitialized +value in +.IR "Variables and Special Variables" +is necessary to describe the value placed on uninitialized variables +and on fields that are valid (for example, +.BR < +.BR $NF ) +but have no characters in them and to describe how these variables are +to be used in comparisons. A valid field, such as +.BR $1 , +that has no characters in it can be obtained from an input line of +.BR \(dq\et\et\(dq +when +.BR FS= \c +.BR '\et' . +Historically, the comparison (\c +.BR $1< 10) +was done numerically after evaluating +.BR $1 +to the value zero. +.P +The phrase ``.\|.\|. also shall have the numeric value of the numeric +string'' was removed from several sections of the ISO\ POSIX\(hy2:\|1993 standard because it +specifies an unnecessary implementation detail. It is not necessary for +POSIX.1\(hy2008 to specify that these objects be assigned two different values. +It is only necessary to specify that these objects may evaluate to two +different values depending on context. +.P +Historical implementations of +.IR awk +did not parse hexadecimal integer or floating constants like +.BR \(dq0xa\(dq +and +.BR \(dq0xap0\(dq .
+Due to an oversight, the 2001 through 2004 editions of this standard +required support for hexadecimal floating constants. This was due to +the reference to +\fIatof\fR(). +This version of the standard allows but does not require implementations +to use +\fIatof\fR() +and includes a description of how floating-point numbers are recognized +as an alternative to match historic behavior. The intent of this change +is to allow implementations to recognize floating-point constants +according to either the ISO/IEC\ 9899:\|1990 standard or ISO/IEC\ 9899:\|1999 standard, and to allow (but not require) +implementations to recognize hexadecimal integer constants. +.P +Historical implementations of +.IR awk +did not support floating-point infinities and NaNs in +.IR "numeric strings" ; +e.g., +.BR \(dq-INF\(dq +and +.BR \(dqNaN\(dq . +However, implementations that use the +\fIatof\fR() +or +\fIstrtod\fR() +functions to do the conversion picked up support for these values if they +used an ISO/IEC\ 9899:\|1999 standard version of the function instead of an ISO/IEC\ 9899:\|1990 standard version. Due to +an oversight, the 2001 through 2004 editions of this standard did not +allow support for infinities and NaNs, but in this revision support is +allowed (but not required). This is a silent change to the behavior of +.IR awk +programs; for example, in the POSIX locale the expression: +.sp +.RS 4 +.nf + +("-INF" + 0 < 0) +.fi +.P +.RE +.P +formerly had the value 0 because +.BR \(dq-INF\(dq +converted to 0, but now it may have the value 0 or 1. +.SH "FUTURE DIRECTIONS" +A future version of this standard may require the +.BR \(dq!=\(dq +and +.BR \(dq==\(dq +operators to perform string comparisons by checking if the strings are +identical (and not by checking if they collate equally).
+.SH "SEE ALSO" +.IR "Section 1.3" ", " "Grammar Conventions", +.IR "\fIgrep\fR\^", +.IR "\fIlex\fR\^", +.IR "\fIsed\fR\^" +.P +The Base Definitions volume of POSIX.1\(hy2017, +.IR "Chapter 5" ", " "File Format Notation", +.IR "Section 6.1" ", " "Portable Character Set", +.IR "Chapter 8" ", " "Environment Variables", +.IR "Chapter 9" ", " "Regular Expressions", +.IR "Section 12.2" ", " "Utility Syntax Guidelines" +.P +The System Interfaces volume of POSIX.1\(hy2017, +.IR "\fIatof\fR\^(\|)", +.IR "\fIexec\fR\^", +.IR "\fIisspace\fR\^(\|)", +.IR "\fIpopen\fR\^(\|)", +.IR "\fIsetlocale\fR\^(\|)", +.IR "\fIstrtod\fR\^(\|)" +.\" +.SH COPYRIGHT +Portions of this text are reprinted and reproduced in electronic form +from IEEE Std 1003.1-2017, Standard for Information Technology +-- Portable Operating System Interface (POSIX), The Open Group Base +Specifications Issue 7, 2018 Edition, +Copyright (C) 2018 by the Institute of +Electrical and Electronics Engineers, Inc and The Open Group. +In the event of any discrepancy between this version and the original IEEE and +The Open Group Standard, the original IEEE and The Open Group Standard +is the referee document. The original Standard can be obtained online at +http://www.opengroup.org/unix/online.html . +.PP +Any typographical or formatting errors that appear +in this page are most likely +to have been introduced during the conversion of the source files to +man page format. To report such errors, see +https://www.kernel.org/doc/man-pages/reporting_bugs.html .