Diffstat (limited to '')
-rw-r--r-- | man3/sscanf.3 | 742 |
1 files changed, 742 insertions, 0 deletions
diff --git a/man3/sscanf.3 b/man3/sscanf.3 new file mode 100644 index 0000000..223f4f5 --- /dev/null +++ b/man3/sscanf.3 @@ -0,0 +1,742 @@ +'\" t +.\" Copyright (c) 1990, 1991 The Regents of the University of California. +.\" All rights reserved. +.\" +.\" This code is derived from software contributed to Berkeley by +.\" Chris Torek and the American National Standards Committee X3, +.\" on Information Processing Systems. +.\" +.\" SPDX-License-Identifier: BSD-4-Clause-UC +.\" +.\" @(#)scanf.3 6.14 (Berkeley) 1/8/93 +.\" +.\" Converted for Linux, Mon Nov 29 15:22:01 1993, faith@cs.unc.edu +.\" modified to resemble the GNU libio setup used in the Linux libc +.\" used in versions 4.x (x>4) and 5 Helmut.Geyer@iwr.uni-heidelberg.de +.\" Modified, aeb, 970121 +.\" 2005-07-14, mtk, added description of %n$ form; various text +.\" incorporated from the GNU C library documentation ((C) The +.\" Free Software Foundation); other parts substantially rewritten. +.\" +.\" 2008-06-23, mtk +.\" Add ERRORS section. +.\" Document the 'a' and 'm' modifiers for dynamic string allocation. +.\" +.TH sscanf 3 2023-07-20 "Linux man-pages 6.05.01" +.SH NAME +sscanf, vsscanf \- input string format conversion +.SH LIBRARY +Standard C library +.RI ( libc ", " \-lc ) +.SH SYNOPSIS +.nf +.B #include <stdio.h> +.PP +.BI "int sscanf(const char *restrict " str , +.BI " const char *restrict " format ", ...);" +.PP +.B #include <stdarg.h> +.PP +.BI "int vsscanf(const char *restrict " str , +.BI " const char *restrict " format ", va_list " ap ); +.fi +.PP +.RS -4 +Feature Test Macro Requirements for glibc (see +.BR feature_test_macros (7)): +.RE +.PP +.BR vsscanf (): +.nf + _ISOC99_SOURCE || _POSIX_C_SOURCE >= 200112L +.fi +.SH DESCRIPTION +The +.BR sscanf () +family of functions scans input according to +.I format +as described below. 
+This format may contain +.IR "conversion specifications" ; +the results from such conversions, if any, +are stored in the locations pointed to by the +.I pointer +arguments that follow +.IR format . +Each +.I pointer +argument must be of a type that is appropriate for the value returned +by the corresponding conversion specification. +.PP +If the number of conversion specifications in +.I format +exceeds the number of +.I pointer +arguments, the results are undefined. +If the number of +.I pointer +arguments exceeds the number of conversion specifications, then the excess +.I pointer +arguments are evaluated, but are otherwise ignored. +.PP +.BR sscanf () +and \fBvsscanf\fP() +read their input from the string pointed to by +.IR str . +.PP +The +.BR vsscanf () +function is analogous to +.BR vsprintf (3). +.PP +The +.I format +string consists of a sequence of +.I directives +which describe how to process the sequence of input characters. +If processing of a directive fails, no further input is read, and +.BR sscanf () +returns. +A "failure" can be either of the following: +.IR "input failure" , +meaning that input characters were unavailable, or +.IR "matching failure" , +meaning that the input was inappropriate (see below). +.PP +A directive is one of the following: +.TP +\[bu] +A sequence of white-space characters (space, tab, newline, etc.; see +.BR isspace (3)). +This directive matches any amount of white space, +including none, in the input. +.TP +\[bu] +An ordinary character (i.e., one other than white space or \[aq]%\[aq]). +This character must exactly match the next character of input. +.TP +\[bu] +A conversion specification, +which commences with a \[aq]%\[aq] (percent) character. +A sequence of characters from the input is converted according to +this specification, and the result is placed in the corresponding +.I pointer +argument. 
+If the next item of input does not match the conversion specification, +the conversion fails\[em]this is a +.IR "matching failure" . +.PP +Each +.I conversion specification +in +.I format +begins with either the character \[aq]%\[aq] or the character sequence +"\fB%\fP\fIn\fP\fB$\fP" +(see below for the distinction) followed by: +.TP +\[bu] +An optional \[aq]*\[aq] assignment-suppression character: +.BR sscanf () +reads input as directed by the conversion specification, +but discards the input. +No corresponding +.I pointer +argument is required, and this specification is not +included in the count of successful assignments returned by +.BR sscanf (). +.TP +\[bu] +For decimal conversions, an optional quote character (\[aq]). +This specifies that the input number may include thousands' +separators as defined by the +.B LC_NUMERIC +category of the current locale. +(See +.BR setlocale (3).) +The quote character may precede or follow the \[aq]*\[aq] +assignment-suppression character. +.TP +\[bu] +An optional \[aq]m\[aq] character. +This is used with string conversions +.RI ( %s , +.IR %c , +.IR %[ ), +and relieves the caller of the +need to allocate a corresponding buffer to hold the input: instead, +.BR sscanf () +allocates a buffer of sufficient size, +and assigns the address of this buffer to the corresponding +.I pointer +argument, which should be a pointer to a +.I "char\ *" +variable (this variable does not need to be initialized before the call). +The caller should subsequently +.BR free (3) +this buffer when it is no longer required. +.TP +\[bu] +An optional decimal integer which specifies the +.IR "maximum field width" . +Reading of characters stops either when this maximum is reached or +when a nonmatching character is found, whichever happens first. +Most conversions discard initial white space characters (the exceptions +are noted below), +and these discarded characters don't count toward the maximum field width. 
+String input conversions store a terminating null byte (\[aq]\e0\[aq]) +to mark the end of the input; +the maximum field width does not include this terminator. +.TP +\[bu] +An optional +.IR "type modifier character" . +For example, the +.B l +type modifier is used with integer conversions such as +.B %d +to specify that the corresponding +.I pointer +argument is a pointer to a +.I "long" +rather than a pointer to an +.IR int . +.TP +\[bu] +A +.I "conversion specifier" +that specifies the type of input conversion to be performed. +.PP +The conversion specifications in +.I format +are of two forms, either beginning with \[aq]%\[aq] or beginning with +"\fB%\fP\fIn\fP\fB$\fP". +The two forms should not be mixed in the same +.I format +string, except that a string containing +"\fB%\fP\fIn\fP\fB$\fP" +specifications can include +.B %% +and +.BR %* . +If +.I format +contains \[aq]%\[aq] +specifications, then these correspond in order with successive +.I pointer +arguments. +In the +"\fB%\fP\fIn\fP\fB$\fP" +form (which is specified in POSIX.1-2001, but not C99), +.I n +is a decimal integer that specifies that the converted input should +be placed in the location referred to by the +.IR n -th +.I pointer +argument following +.IR format . +.SS Conversions +The following +.I "type modifier characters" +can appear in a conversion specification: +.TP +.B h +Indicates that the conversion will be one of +\fBd\fP, \fBi\fP, \fBo\fP, \fBu\fP, \fBx\fP, \fBX\fP, or \fBn\fP +and the next pointer is a pointer to a +.I short +or +.I unsigned short +(rather than +.IR int ). +.TP +.B hh +As for +.BR h , +but the next pointer is a pointer to a +.I signed char +or +.IR "unsigned char" . +.TP +.B j +As for +.BR h , +but the next pointer is a pointer to an +.I intmax_t +or a +.IR uintmax_t . +This modifier was introduced in C99. 
+.TP +.B l +Indicates either that the conversion will be one of +\fBd\fP, \fBi\fP, \fBo\fP, \fBu\fP, \fBx\fP, \fBX\fP, or \fBn\fP +and the next pointer is a pointer to a +.I long +or +.I unsigned long +(rather than +.IR int ), +or that the conversion will be one of +\fBe\fP, \fBf\fP, or \fBg\fP +and the next pointer is a pointer to +.I double +(rather than +.IR float ). +If used with +.B %c +or +.BR %s , +the corresponding parameter is considered +as a pointer to a wide character or wide-character string respectively. +.\" This use of l was introduced in Amendment 1 to ISO C90. +.TP +.B ll +(ell-ell) +Indicates that the conversion will be one of +.BR b , +.BR d , +.BR i , +.BR o , +.BR u , +.BR x , +.BR X , +or +.B n +and the next pointer is a pointer to a +.I long long +or +.I unsigned long long +(rather than +.IR int ). +.TP +.B L +Indicates that the conversion will be either +\fBe\fP, \fBf\fP, or \fBg\fP +and the next pointer is a pointer to +.I "long double" +or +(as a GNU extension) +the conversion will be +\fBd\fP, \fBi\fP, \fBo\fP, \fBu\fP, or \fBx\fP +and the next pointer is a pointer to +.IR "long long" . +.\" MTK, Jul 05: The following is no longer true for modern +.\" ANSI C (i.e., C99): +.\" (Note that long long is not an +.\" ANSI C +.\" type. Any program using this will not be portable to all +.\" architectures). +.TP +.B q +equivalent to +.BR L . +This specifier does not exist in ANSI C. +.TP +.B t +As for +.BR h , +but the next pointer is a pointer to a +.IR ptrdiff_t . +This modifier was introduced in C99. +.TP +.B z +As for +.BR h , +but the next pointer is a pointer to a +.IR size_t . +This modifier was introduced in C99. +.PP +The following +.I "conversion specifiers" +are available: +.TP +.B % +Matches a literal \[aq]%\[aq]. +That is, +.B %\&% +in the format string matches a +single input \[aq]%\[aq] character. +No conversion is done (but initial white space characters are discarded), +and assignment does not occur. +.TP +.B d +.IR Deprecated . 
+Matches an optionally signed decimal integer; +the next pointer must be a pointer to +.IR int . +.\" .TP +.\" .B D +.\" Equivalent to +.\" .IR ld ; +.\" this exists only for backward compatibility. +.\" (Note: thus only in libc4 +.\" In libc5 and glibc the +.\" .B %D +.\" is silently ignored, causing old programs to fail mysteriously.) +.TP +.B i +.IR Deprecated . +Matches an optionally signed integer; the next pointer must be a pointer to +.IR int . +The integer is read in base 16 if it begins with +.I 0x +or +.IR 0X , +in base 8 if it begins with +.IR 0 , +and in base 10 otherwise. +Only characters that correspond to the base are used. +.TP +.B o +.IR Deprecated . +Matches an unsigned octal integer; the next pointer must be a pointer to +.IR "unsigned int" . +.TP +.B u +.IR Deprecated . +Matches an unsigned decimal integer; the next pointer must be a +pointer to +.IR "unsigned int" . +.TP +.B x +.IR Deprecated . +Matches an unsigned hexadecimal integer +(that may optionally begin with a prefix of +.I 0x +or +.IR 0X , +which is discarded); the next pointer must +be a pointer to +.IR "unsigned int" . +.TP +.B X +.IR Deprecated . +Equivalent to +.BR x . +.TP +.B f +.IR Deprecated . +Matches an optionally signed floating-point number; the next pointer must +be a pointer to +.IR float . +.TP +.B e +.IR Deprecated . +Equivalent to +.BR f . +.TP +.B g +.IR Deprecated . +Equivalent to +.BR f . +.TP +.B E +.IR Deprecated . +Equivalent to +.BR f . +.TP +.B a +.IR Deprecated . +(C99) Equivalent to +.BR f . +.TP +.B s +Matches a sequence of non-white-space characters; +the next pointer must be a pointer to the initial element of a +character array that is long enough to hold the input sequence and +the terminating null byte (\[aq]\e0\[aq]), which is added automatically. +The input string stops at white space or at the maximum field +width, whichever occurs first. 
+.TP +.B c +Matches a sequence of characters whose length is specified by the +.I maximum field width +(default 1); the next pointer must be a pointer to +.IR char , +and there must be enough room for all the characters +(no terminating null byte is added). +The usual skip of leading white space is suppressed. +To skip white space first, use an explicit space in the format. +.TP +.B \&[ +Matches a nonempty sequence of characters from the specified set of +accepted characters; the next pointer must be a pointer to +.IR char , +and there must be enough room for all the characters in the string, plus a +terminating null byte. +The usual skip of leading white space is suppressed. +The string is to be made up of characters in (or not in) a particular set; +the set is defined by the characters between the open bracket +.B [ +character and a close bracket +.B ] +character. +The set +.I excludes +those characters if the first character after the open bracket is a +circumflex +.RB ( \[ha] ). +To include a close bracket in the set, make it the first character after +the open bracket or the circumflex; any other position will end the set. +The hyphen character +.B \- +is also special; when placed between two other characters, it adds all +intervening characters to the set. +To include a hyphen, make it the last +character before the final close bracket. +For instance, +.B [\[ha]]0\-9\-] +means +the set "everything except close bracket, zero through nine, and hyphen". +The string ends with the appearance of a character not in the (or, with a +circumflex, in) set or when the field width runs out. +.TP +.B p +Matches a pointer value (as printed by +.B %p +in +.BR printf (3)); +the next pointer must be a pointer to a pointer to +.IR void . 
+.TP +.B n +Nothing is expected; instead, the number of characters consumed thus far +from the input is stored through the next pointer, which must be a pointer +to +.IR int , +or variant whose size matches the (optionally) +supplied integer length modifier. +This is +.I not +a conversion and does +.I not +increase the count returned by the function. +The assignment can be suppressed with the +.B * +assignment-suppression character, but the effect on the +return value is undefined. +Therefore +.B %*n +conversions should not be used. +.SH RETURN VALUE +On success, these functions return the number of input items +successfully matched and assigned; +this can be fewer than provided for, +or even zero, in the event of an early matching failure. +.PP +The value +.B EOF +is returned if the end of input is reached before either the first +successful conversion or a matching failure occurs. +.SH ERRORS +.TP +.B EILSEQ +Input byte sequence does not form a valid character. +.TP +.B EINVAL +Not enough arguments; or +.I format +is NULL. +.TP +.B ENOMEM +Out of memory. +.SH ATTRIBUTES +For an explanation of the terms used in this section, see +.BR attributes (7). +.TS +allbox; +lbx lb lb +l l l. +Interface Attribute Value +T{ +.na +.nh +.BR sscanf (), +.BR vsscanf () +T} Thread safety MT-Safe locale +.TE +.sp 1 +.SH STANDARDS +C11, POSIX.1-2008. +.SH HISTORY +C89, POSIX.1-2001. +.PP +The +.B q +specifier is the 4.4BSD notation for +.IR "long long" , +while +.B ll +or the usage of +.B L +in integer conversions is the GNU notation. +.PP +The Linux version of these functions is based on the +.I GNU +.I libio +library. +Take a look at the +.I info +documentation of +.I GNU +.I libc (glibc-1.08) +for a more concise description. +.SH NOTES +.SS The 'a' assignment-allocation modifier +Originally, the GNU C library supported dynamic allocation for string inputs +(as a nonstandard extension) via the +.B a +character. +(This feature is present at least as far back as glibc 2.0.) 
+Thus, one could write the following to have +.BR sscanf () +allocate a buffer for a string, +with a pointer to that buffer being returned in +.IR *buf : +.PP +.in +4n +.EX +char *buf; +sscanf(str, "%as", &buf); +.EE +.in +.PP +The use of the letter +.B a +for this purpose was problematic, since +.B a +is also specified by the ISO C standard as a synonym for +.B f +(floating-point input). +POSIX.1-2008 instead specifies the +.B m +modifier for assignment allocation (as documented in DESCRIPTION, above). +.PP +Note that the +.B a +modifier is not available if the program is compiled with +.I gcc\~\-std=c99 +or +.I gcc\~\-D_ISOC99_SOURCE +(unless +.B _GNU_SOURCE +is also specified), in which case the +.B a +is interpreted as a specifier for floating-point numbers (see above). +.PP +Support for the +.B m +modifier was added to glibc 2.7, +and new programs should use that modifier instead of +.BR a . +.PP +As well as being standardized by POSIX, the +.B m +modifier has the following further advantages over +the use of +.BR a : +.IP \[bu] 3 +It may also be applied to +.B %c +conversion specifiers (e.g., +.BR %3mc ). +.IP \[bu] +It avoids ambiguity with respect to the +.B %a +floating-point conversion specifier (and is unaffected by +.I gcc\~\-std=c99 +etc.). +.SH BUGS +.SS Numeric conversion specifiers +Use of the numeric conversion specifiers produces Undefined Behavior +for invalid input. +See +.UR https://port70.net/\:%7Ensz/\:c/\:c11/\:n1570.html\:#7.21.6.2p10 +C11 7.21.6.2/10 +.UE . +This is a bug in the ISO C standard, +and not an inherent design issue with the API. +However, +current implementations are not safe from that bug, +so it is not recommended to use them. +Instead, +programs should use functions such as +.BR strtol (3) +to parse numeric input. +This manual page deprecates use of the numeric conversion specifiers +until they are fixed by ISO C. 
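The recommendation above, to parse numeric input with strtol(3) rather than a numeric conversion specifier, can be sketched as follows. This is a minimal illustration and not part of the man page: the helper name parse_int and its return convention are assumptions chosen for the example, and it rejects trailing characters, which a caller may or may not want.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not part of any library): parse a base-10 int
   from s.  Returns 0 and stores the value in *out on success;
   returns -1 if s contains no digits, has trailing characters,
   or holds a value that does not fit in an int. */
static int
parse_int(const char *s, int *out)
{
    char *end;
    long val;

    errno = 0;
    val = strtol(s, &end, 10);

    if (end == s)
        return -1;      /* no digits were consumed */
    if (*end != '\0')
        return -1;      /* trailing non-numeric input */
    if (errno == ERANGE || val < INT_MIN || val > INT_MAX)
        return -1;      /* out of range: the case %d leaves undefined */

    *out = (int) val;
    return 0;
}
```

Unlike sscanf(str, "%d", &v), this reports out-of-range input as an ordinary error instead of invoking undefined behavior.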
+.SS Nonstandard modifiers +These functions are fully C99 conformant, but provide the +additional modifiers +.B q +and +.B a +as well as an additional behavior of the +.B L +and +.B ll +modifiers. +The latter may be considered to be a bug, as it changes the +behavior of modifiers defined in C99. +.PP +Some combinations of the type modifiers and conversion +specifiers defined by C99 do not make sense +(e.g., +.BR "%Ld" ). +While they may have a well-defined behavior on Linux, this need not +be so on other architectures. +Therefore it usually is better to use +modifiers that are not defined by C99 at all, that is, use +.B q +instead of +.B L +in combination with +\fBd\fP, \fBi\fP, \fBo\fP, \fBu\fP, \fBx\fP, and \fBX\fP +conversions or +.BR ll . +.PP +The usage of +.B q +is not the same as on 4.4BSD, +as it may be used in float conversions equivalently to +.BR L . +.SH EXAMPLES +To use the dynamic allocation conversion specifier, specify +.B m +as a length modifier (thus +.B %ms +or +\fB%m[\fP\fIrange\fP\fB]\fP). +The caller must +.BR free (3) +the returned string, as in the following example: +.PP +.in +4n +.EX +char *p; +int n; +\& +errno = 0; +n = sscanf(str, "%m[a\-z]", &p); +if (n == 1) { + printf("read: %s\en", p); + free(p); +} else if (errno != 0) { + perror("sscanf"); +} else { + fprintf(stderr, "No matching characters\en"); +} +.EE +.in +.PP +As shown in the above example, it is necessary to call +.BR free (3) +only if the +.BR sscanf () +call successfully read a string. +.SH SEE ALSO +.BR getc (3), +.BR printf (3), +.BR setlocale (3), +.BR strtod (3), +.BR strtol (3), +.BR strtoul (3)
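When the m modifier is not available, the maximum field width described in DESCRIPTION is the usual way to keep %s from overrunning a fixed buffer. The following is a small sketch of that technique. The helper name first_word and the 16-byte buffer size are assumptions made for the example; they do not come from the page.

```c
#include <stdio.h>

/* Hypothetical helper: copy the first white-space-delimited word of
   str into word, which must have room for at least 16 bytes.
   Returns 1 if a word was stored, 0 otherwise. */
static int
first_word(const char *str, char word[16])
{
    /* The field width excludes the terminating null byte, so %15s
       stores at most 15 characters plus the terminator. */
    return sscanf(str, "%15s", word) == 1;
}
```

The width must be one less than the buffer size. A word longer than the width is truncated rather than rejected, so callers that need to detect truncation have to check for it separately.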