diff options
Diffstat (limited to 'upstream/mageia-cauldron/man1p/sort.1p')
-rw-r--r-- | upstream/mageia-cauldron/man1p/sort.1p | 853 |
1 files changed, 853 insertions, 0 deletions
diff --git a/upstream/mageia-cauldron/man1p/sort.1p b/upstream/mageia-cauldron/man1p/sort.1p new file mode 100644 index 00000000..19b3183b --- /dev/null +++ b/upstream/mageia-cauldron/man1p/sort.1p @@ -0,0 +1,853 @@ +'\" et +.TH SORT "1P" 2017 "IEEE/The Open Group" "POSIX Programmer's Manual" +.\" +.SH PROLOG +This manual page is part of the POSIX Programmer's Manual. +The Linux implementation of this interface may differ (consult +the corresponding Linux manual page for details of Linux behavior), +or the interface may not be implemented on Linux. +.\" +.SH NAME +sort +\(em sort, merge, or sequence check text files +.SH SYNOPSIS +.LP +.nf +sort \fB[\fR-m\fB] [\fR-o \fIoutput\fB] [\fR-bdfinru\fB] [\fR-t \fIchar\fB] [\fR-k \fIkeydef\fB]\fR... \fB[\fIfile\fR...\fB]\fR +.P +sort \fB[\fR-c|-C\fB] [\fR-bdfinru\fB] [\fR-t \fIchar\fB] [\fR-k \fIkeydef\fB] [\fIfile\fB]\fR +.fi +.SH DESCRIPTION +The +.IR sort +utility shall perform one of the following functions: +.IP " 1." 4 +Sort lines of all the named files together and write the result to +the specified output. +.IP " 2." 4 +Merge lines of all the named (presorted) files together and write the +result to the specified output. +.IP " 3." 4 +Check that a single input file is correctly presorted. +.P +Comparisons shall be based on one or more sort keys extracted from each +line of input (or, if no sort keys are specified, the entire line up +to, but not including, the terminating +<newline>), +and shall be performed using the collating sequence of the current +locale. If this collating sequence does not have a total ordering of +all characters (see the Base Definitions volume of POSIX.1\(hy2017, +.IR "Section 7.3.2" ", " "LC_COLLATE"), +any lines of input that collate equally should be further compared +byte-by-byte using the collating sequence for the POSIX locale. +.SH OPTIONS +The +.IR sort +utility shall conform to the Base Definitions volume of POSIX.1\(hy2017, +.IR "Section 12.2" ", " "Utility Syntax Guidelines", +except for Guideline 9, and the +.BR \-k +.IR keydef +option should follow the +.BR \-b , +.BR \-d , +.BR \-f , +.BR \-i , +.BR \-n , +and +.BR \-r +options. In addition, +.BR '\(pl' +may be recognized as an option delimiter as well as +.BR '\-' . +.P +The following options shall be supported: +.IP "\fB\-c\fP" 10 +Check that the single input file is ordered as specified by the +arguments and the collating sequence of the current locale. Output +shall not be sent to standard output. The exit code shall indicate +whether or not disorder was detected or an error occurred. If +disorder (or, with +.BR \-u , +a duplicate key) is detected, a warning message shall be sent to +standard error indicating where the disorder or duplicate key +was found. +.IP "\fB\-C\fP" 10 +Same as +.BR \-c , +except that a warning message shall not be sent to standard error +if disorder or, with +.BR \-u , +a duplicate key is detected. +.IP "\fB\-m\fP" 10 +Merge only; the input file shall be assumed to be already sorted. +.IP "\fB\-o\ \fIoutput\fR" 10 +Specify the name of an output file to be used instead of the standard +output. This file can be the same as one of the input +.IR file s. +.IP "\fB\-u\fP" 10 +Unique: suppress all but one in each set of lines having equal keys. +If used with the +.BR \-c +option, check that there are no lines with duplicate keys, in addition +to checking that the input file is sorted. +.P +The following options shall override the default ordering rules. When +ordering options appear independent of any key field specifications, +the requested field ordering rules shall be applied globally to all +sort keys. When attached to a specific key (see +.BR \-k ), +the specified ordering options shall override all global ordering +options for that key. +.IP "\fB\-d\fP" 10 +Specify that only +<blank> +characters and alphanumeric characters, according to the current +setting of +.IR LC_CTYPE , +shall be significant in comparisons. The behavior is undefined for a +sort key to which +.BR \-i +or +.BR \-n +also applies. +.IP "\fB\-f\fP" 10 +Consider all lowercase characters that have uppercase equivalents, +according to the current setting of +.IR LC_CTYPE , +to be the uppercase equivalent for the purposes of comparison. +.IP "\fB\-i\fP" 10 +Ignore all characters that are non-printable, according to the current +setting of +.IR LC_CTYPE . +The behavior is undefined for a sort key for which +.BR \-n +also applies. +.IP "\fB\-n\fP" 10 +Restrict the sort key to an initial numeric string, consisting of +optional +<blank> +characters, optional +<hyphen-minus> +character, and zero or more digits with an +optional radix character and thousands separators (as defined in the +current locale), which shall be sorted by arithmetic value. An empty +digit string shall be treated as zero. Leading zeros and signs on zeros +shall not affect ordering. +.IP "\fB\-r\fP" 10 +Reverse the sense of comparisons. +.P +The treatment of field separators can be altered using the options: +.IP "\fB\-b\fP" 10 +Ignore leading +<blank> +characters when determining the starting and ending positions of a +restricted sort key. If the +.BR \-b +option is specified before the first +.BR \-k +option, it shall be applied to all +.BR \-k +options. Otherwise, the +.BR \-b +option can be attached independently to each +.BR \-k +.IR field_start +or +.IR field_end +option-argument (see below). +.IP "\fB\-t\ \fIchar\fR" 10 +Use +.IR char +as the field separator character; +.IR char +shall not be considered to be part of a field (although it can be +included in a sort key). Each occurrence of +.IR char +shall be significant (for example, <\fIchar\fR><\fIchar\fR> delimits an +empty field). If +.BR \-t +is not specified, +<blank> +characters shall be used as default field separators; each maximal +non-empty sequence of +<blank> +characters that follows a non-\c +<blank> +shall be a field separator. +.P +Sort keys can be specified using the options: +.IP "\fB\-k\ \fIkeydef\fR" 10 +The +.IR keydef +argument is a restricted sort key field definition. The format of this +definition is: +.RS 10 +.sp +.RS 4 +.nf + +\fIfield_start\fB[\fItype\fB][\fR,\fIfield_end\fB[\fItype\fB]]\fR +.fi +.P +.RE +.P +where +.IR field_start +and +.IR field_end +define a key field restricted to a portion of the line (see the +EXTENDED DESCRIPTION section), and +.IR type +is one or more modifiers from the list of characters +.BR 'b' , +.BR 'd' , +.BR 'f' , +.BR 'i' , +.BR 'n' , +.BR 'r' . +The +.BR 'b' +modifier shall behave like the +.BR \-b +option, but shall apply only to the +.IR field_start +or +.IR field_end +to which it is attached. The other modifiers shall behave like the +corresponding options, but shall apply only to the key field to which +they are attached; they shall have this effect if specified with +.IR field_start , +.IR field_end , +or both. If any modifier is attached to a +.IR field_start +or to a +.IR field_end , +no option shall apply to either. Implementations shall support at +least nine occurrences of the +.BR \-k +option, which shall be significant in command line order. If no +.BR \-k +option is specified, a default sort key of the entire line shall be +used. +.P +When there are multiple key fields, later keys shall be compared only +after all earlier keys compare equal. Except when the +.BR \-u +option is specified, lines that otherwise compare equal shall be +ordered as if none of the options +.BR \-d , +.BR \-f , +.BR \-i , +.BR \-n , +or +.BR \-k +were present (but with +.BR \-r +still in effect, if it was specified) and with all bytes in the lines +significant to the comparison. The order in which lines that still +compare equal are written is unspecified. +.RE +.SH OPERANDS +The following operand shall be supported: +.IP "\fIfile\fR" 10 +A pathname of a file to be sorted, merged, or checked. If no +.IR file +operands are specified, or if a +.IR file +operand is +.BR '\-' , +the standard input shall be used. If +.IR sort +encounters an error when opening or reading a +.IR file +operand, it may exit without writing any output to standard output or +processing later operands. +.SH STDIN +The standard input shall be used only if no +.IR file +operands are specified, or if a +.IR file +operand is +.BR '\-' . +See the INPUT FILES section. +.SH "INPUT FILES" +The input files shall be text files, except that the +.IR sort +utility shall add a +<newline> +to the end of a file ending with an incomplete last line. +.SH "ENVIRONMENT VARIABLES" +The following environment variables shall affect the execution of +.IR sort : +.IP "\fILANG\fP" 10 +Provide a default value for the internationalization variables that are +unset or null. (See the Base Definitions volume of POSIX.1\(hy2017, +.IR "Section 8.2" ", " "Internationalization Variables" +for the precedence of internationalization variables used to determine +the values of locale categories.) +.IP "\fILC_ALL\fP" 10 +If set to a non-empty string value, override the values of all the +other internationalization variables. +.IP "\fILC_COLLATE\fP" 10 +.br +Determine the locale for ordering rules. +.IP "\fILC_CTYPE\fP" 10 +Determine the locale for the interpretation of sequences of bytes of +text data as characters (for example, single-byte as opposed to +multi-byte characters in arguments and input files) and the behavior of +character classification for the +.BR \-b , +.BR \-d , +.BR \-f , +.BR \-i , +and +.BR \-n +options. +.IP "\fILC_MESSAGES\fP" 10 +.br +Determine the locale that should be used to affect the format and +contents of diagnostic messages written to standard error. +.IP "\fILC_NUMERIC\fP" 10 +.br +Determine the locale for the definition of the radix character and +thousands separator for the +.BR \-n +option. +.IP "\fINLSPATH\fP" 10 +Determine the location of message catalogs for the processing of +.IR LC_MESSAGES . +.SH "ASYNCHRONOUS EVENTS" +Default. +.SH STDOUT +Unless the +.BR \-o +or +.BR \-c +options are in effect, the standard output shall contain the sorted +input. +.SH STDERR +The standard error shall be used for diagnostic messages. When +.BR \-c +is specified, if disorder is detected (or if +.BR \-u +is also specified and a duplicate key is detected), a message shall +be written to the standard error which identifies the input line at +which disorder (or a duplicate key) was detected. A warning +message about correcting an incomplete last line of an input file +may be generated, but need not affect the final exit status. +.SH "OUTPUT FILES" +If the +.BR \-o +option is in effect, the sorted input shall be written to the file +.IR output . +.SH "EXTENDED DESCRIPTION" +The notation: +.sp +.RS 4 +.nf + +-k \fIfield_start\fB[\fItype\fB][\fR,\fIfield_end\fB[\fItype\fB]]\fR +.fi +.P +.RE +.P +shall define a key field that begins at +.IR field_start +and ends at +.IR field_end +inclusive, unless +.IR field_start +falls beyond the end of the line or after +.IR field_end , +in which case the key field is empty. A missing +.IR field_end +shall mean the last character of the line. +.P +A field comprises a maximal sequence of non-separating characters and, +in the absence of option +.BR \-t , +any preceding field separator. +.P +The +.IR field_start +portion of the +.IR keydef +option-argument shall have the form: +.sp +.RS 4 +.nf + +\fIfield_number\fB[\fR.\fIfirst_character\fB]\fR +.fi +.P +.RE +.P +Fields and characters within fields shall be numbered starting with 1. +The +.IR field_number +and +.IR first_character +pieces, interpreted as positive decimal integers, shall specify the +first character to be used as part of a sort key. If +.IR .first_character +is omitted, it shall refer to the first character of the field. +.P +The +.IR field_end +portion of the +.IR keydef +option-argument shall have the form: +.sp +.RS 4 +.nf + +\fIfield_number\fB[\fR.\fIlast_character\fB]\fR +.fi +.P +.RE +.P +The +.IR field_number +shall be as described above for +.IR field_start. +The +.IR last_character +piece, interpreted as a non-negative decimal integer, shall specify the +last character to be used as part of the sort key. If +.IR last_character +evaluates to zero or +.IR .last_character +is omitted, it shall refer to the last character of the field specified +by +.IR field_number . +.P +If the +.BR \-b +option or +.BR b +type modifier is in effect, characters within a field shall be counted +from the first non-\c +<blank> +in the field. (This shall apply separately to +.IR first_character +and +.IR last_character .) +.SH "EXIT STATUS" +The following exit values shall be returned: +.IP "\00" 6 +All input files were output successfully, or +.BR \-c +was specified and the input file was correctly sorted. +.IP "\01" 6 +Under the +.BR \-c +option, the file was not ordered as specified, or if the +.BR \-c +and +.BR \-u +options were both specified, two input lines were found with equal +keys. +.IP >1 6 +An error occurred. +.SH "CONSEQUENCES OF ERRORS" +The default requirements shall apply, except that if +.IR sort +encounters an error when opening or reading a +.IR file +operand, it may exit without writing any output to standard output or +processing later operands. +.LP +.IR "The following sections are informative." +.SH "APPLICATION USAGE" +The default value for +.BR \-t , +<blank>, +has different properties from, for example, +.BR \-t \c +"<space>". If a line contains: +.sp +.RS 4 +.nf + +<space><space>foo +.fi +.P +.RE +.P +the following treatment would occur with default separation as opposed +to specifically selecting a +<space>: +.TS +center box tab(@); +cB | cB | cB +n | l | l. +Field@Default@\-t "<space>" +_ +1@<space><space>foo@\fIempty\fP +2@\fIempty\fP@\fIempty\fP +3@\fIempty\fP@foo +.TE +.P +The leading field separator itself is included in a field when +.BR \-t +is not used. For example, this command returns an exit status of zero, +meaning the input was already sorted: +.sp +.RS 4 +.nf + +sort -c -k 2 <<eof +y<tab>b +x<space>a +eof +.fi +.P +.RE +.P +(assuming that a +<tab> +precedes the +<space> +in the current collating sequence). The field separator is not included +in a field when it is explicitly set via +.BR \-t . +This is historical practice and allows usage such as: +.sp +.RS 4 +.nf + +sort -t "|" -k 2n <<eof +Atlanta|425022|Georgia +Birmingham|284413|Alabama +Columbia|100385|South Carolina +eof +.fi +.P +.RE +.P +where the second field can be correctly sorted numerically without +regard to the non-numeric field separator. +.P +The wording in the OPTIONS section clarifies that the +.BR \-b , +.BR \-d , +.BR \-f , +.BR \-i , +.BR \-n , +and +.BR \-r +options have to come before the first sort key specified if they are +intended to apply to all specified keys. The way it is described in +\&this volume of POSIX.1\(hy2017 matches historical practice, not historical documentation. +The results are unspecified if these options are specified after a +.BR \-k +option. +.P +The +.BR \-f +option might not work as expected in locales where there is not a +one-to-one mapping between an uppercase and a lowercase letter. +.P +When using +.IR sort +to process pathnames, it is recommended that LC_ALL, or at least +LC_CTYPE and LC_COLLATE, are set to POSIX or C in the environment, +since pathnames can contain byte sequences that do not form valid +characters in some locales, in which case the utility's behavior would +be undefined. In the POSIX locale each byte is a valid single-byte +character, and therefore this problem is avoided. +.P +If the collating sequence of the current locale does not have a total +ordering of all characters, this can affect the behavior of +.IR sort +in the following ways: +.IP " *" 4 +As +.IR sort +.BR \-u +suppresses lines with duplicate keys, it suppresses lines that collate +equally but are not identical. +.IP " *" 4 +The output of +.IR sort +(without +.BR \-u ) +can contain identical lines that are not adjacent, if it does not +implement the recommended further byte-by-byte comparison of lines +that collate equally. This affects the use of +.IR sort +with +.IR comm +and +.IR uniq ; +see the APPLICATION USAGE for those utilities. +.SH EXAMPLES +.IP " 1." 4 +The following command sorts the contents of +.BR infile +with the second field as the sort key: +.RS 4 +.sp +.RS 4 +.nf + +sort -k 2,2 infile +.fi +.P +.RE +.RE +.IP " 2." 4 +The following command sorts, in reverse order, the contents of +.BR infile1 +and +.BR infile2 , +placing the output in +.BR outfile +and using the second character of the second field as the sort key +(assuming that the first character of the second field is the field +separator): +.RS 4 +.sp +.RS 4 +.nf + +sort -r -o outfile -k 2.2,2.2 infile1 infile2 +.fi +.P +.RE +.RE +.IP " 3." 4 +The following command sorts the contents of +.BR infile1 +and +.BR infile2 +using the second non-\c +<blank> +of the second field as the sort key: +.RS 4 +.sp +.RS 4 +.nf + +sort -k 2.2b,2.2b infile1 infile2 +.fi +.P +.RE +.RE +.IP " 4." 4 +The following command prints the System\ V password file (user +database) sorted by the numeric user ID (the third +<colon>-separated +field): +.RS 4 +.sp +.RS 4 +.nf + +sort -t : -k 3,3n /etc/passwd +.fi +.P +.RE +.RE +.IP " 5." 4 +The following command prints the lines of the already sorted file +.BR infile , +suppressing all but one occurrence of lines having the same third +field: +.RS 4 +.sp +.RS 4 +.nf + +sort -um -k 3.1,3.0 infile +.fi +.P +.RE +.RE +.SH RATIONALE +Examples in some historical documentation state that options +.BR \-um +with one input file keep the first in each set of lines with equal +keys. This behavior was deemed to be an implementation artifact and +was not standardized. +.P +The +.BR \-z +option was omitted; it is not standard practice on most systems and is +inconsistent with using +.IR sort +to sort several files individually and then merge them together. The +text concerning +.BR \-z +in historical documentation appeared to require implementations to +determine the proper buffer length during the sort phase of operation, +but not during the merge. +.P +The +.BR \-y +option was omitted because of non-portability. The +.BR \-M +option, present in System V, was omitted because of non-portability in +international usage. +.P +An undocumented +.BR \-T +option exists in some implementations. It is used to specify a +directory for intermediate files. Implementations are encouraged to +support the use of the +.IR TMPDIR +environment variable instead of adding an option to support this +functionality. +.P +The +.BR \-k +option was added to satisfy two objections. First, the zero-based +counting used by +.IR sort +is not consistent with other utility conventions. Second, it did not +meet syntax guideline requirements. +.P +Historical documentation indicates that ``setting +.BR \-n +implies +.BR \-b ''. +The description of +.BR \-n +already states that optional leading <blank>s are tolerated in doing +the comparison. If +.BR \-b +is enabled, rather than implied, by +.BR \-n , +this has unusual side-effects. When a character offset is used in a +column of numbers (for example, to sort modulo 100), that offset is +measured relative to the most significant digit, not to the column. +Based upon a recommendation from the author of the original +.IR sort +utility, the +.BR \-b +implication has been omitted from this volume of POSIX.1\(hy2017, and an application wishing to +achieve the previously mentioned side-effects has to code the +.BR \-b +flag explicitly. +.P +Earlier versions of this standard allowed the +.BR \-o +option to appear after operands. Historical practice allowed all +options to be interspersed with operands. This version of the +standard allows implementations to accept options after operands +but conforming applications should not use this form. +.P +Earlier versions of this standard also allowed the +.BR \- \c +.IR number +and +.BR \(pl \c +.IR number +options. These options are no longer specified by POSIX.1\(hy2008 but may +be present in some implementations. +.P +Historical implementations produced a message on standard error when +.BR \-c +was specified and disorder was detected, and when +.BR \-c +and +.BR \-u +were specified and a duplicate key was detected. An earlier version of +this standard contained wording that did not make it clear that this +message was allowed and some implementations removed this message to +be sure that they conformed to the standard's requirements. Confronted +with this difference in behavior, interactive users that wanted to be +sure that they got visual feedback instead of just exit code 1 could +have used a command like: +.sp +.RS 4 +.nf + +sort -c file || echo disorder +.fi +.P +.RE +.P +whether or not the +.IR sort +utility provided a message in this case. But, it was not easy for a user +to find where the disorder or duplicate key occurred on implementations +that do not produce a message, especially when some parts of the input +line were not part of the key and when one or more of the +.BR \-b , +.BR \-d , +.BR \-f , +.BR \-i , +.BR \-n , +or +.BR \- r +options or +.IR keydef +type modifiers were in use. POSIX.1\(hy2008 requires a message to be +produced in this case. POSIX.1\(hy2008 also contains the +.BR \-C +option giving users the ability to choose either behavior. +.P +When a disorder or duplicate is found when the +.BR \-c +option is specified, some implementations print a message containing +the first line that is out of order or contains a duplicate key; others +print a message specifying the line number of the offending line. This +standard allows either type of message. +.P +Implementations are encouraged to perform the recommended further +byte-by-byte comparison of lines that collate equally, even though +this may affect efficiency. The impact on efficiency can be mitigated +by only performing the additional comparison if the current locale's +collating sequence does not have a total ordering of all characters +(if the implementation provides a way to query this) or by only +performing the additional comparison if the locale name associated +with the LC_COLLATE category has an +.BR '@' +modifier in the name (since locales without an +.BR '@' +modifier should have a total ordering of all characters \(em see the Base Definitions volume of POSIX.1\(hy2017, +.IR "Section 7.3.2" ", " "LC_COLLATE"). +Note that if the implementation provides a +.IR "stable sort" +option as an extension (usually +.BR \-s ), +the additional comparison should not be performed when this option has +been specified. +.SH "FUTURE DIRECTIONS" +A future version of this standard may require that if the collating +sequence of the current locale does not have a total ordering of all +characters, any lines of input that collate equally when comparing +them as whole lines are further compared byte-by-byte using the +collating sequence for the POSIX locale. +.SH "SEE ALSO" +.IR "\fIcomm\fR\^", +.IR "\fIjoin\fR\^", +.IR "\fIuniq\fR\^" +.P +The Base Definitions volume of POSIX.1\(hy2017, +.IR "Section 7.3.2" ", " "LC_COLLATE", +.IR "Chapter 8" ", " "Environment Variables", +.IR "Section 12.2" ", " "Utility Syntax Guidelines" +.P +The System Interfaces volume of POSIX.1\(hy2017, +.IR "\fItoupper\fR\^(\|)" +.\" +.SH COPYRIGHT +Portions of this text are reprinted and reproduced in electronic form +from IEEE Std 1003.1-2017, Standard for Information Technology +-- Portable Operating System Interface (POSIX), The Open Group Base +Specifications Issue 7, 2018 Edition, +Copyright (C) 2018 by the Institute of +Electrical and Electronics Engineers, Inc and The Open Group. +In the event of any discrepancy between this version and the original IEEE and +The Open Group Standard, the original IEEE and The Open Group Standard +is the referee document. The original Standard can be obtained online at +http://www.opengroup.org/unix/online.html . +.PP +Any typographical or formatting errors that appear +in this page are most likely +to have been introduced during the conversion of the source files to +man page format. To report such errors, see +https://www.kernel.org/doc/man-pages/reporting_bugs.html . |