diff options
Diffstat (limited to 'Documentation/parse-date.txt')
-rw-r--r-- | Documentation/parse-date.txt | 468 |
1 files changed, 468 insertions, 0 deletions
diff --git a/Documentation/parse-date.txt b/Documentation/parse-date.txt new file mode 100644 index 0000000..b23abe7 --- /dev/null +++ b/Documentation/parse-date.txt @@ -0,0 +1,468 @@ + +NAME + parse_date - parses a date string into a timespec struct. + +SYNOPSIS + #include "timeutils.h" + + int parse_date(struct timespec *result, char const *p, + struct timespec const *now) + + LDADD libcommon.la + +DESCRIPTION + Parse a date/time string, storing the resulting time value into *result. + The string itself is pointed to by *p. Return 1 if successful. + *p can be an incomplete or relative time specification; if so, use + *now as the basis for the returned time. + + +This function is based upon gnulib's parse-datetime.y-dd7a871. + +Below is a plain text version of the gnulib parse-datetime.texi-dd7a871 manual +describing the input strings that are recognized. + +Any future modifications to the util-linux parser that affect input strings +should be noted below. + + +1 Date input formats +******************** + +First, a quote: + + Our units of temporal measurement, from seconds on up to months, + are so complicated, asymmetrical and disjunctive so as to make + coherent mental reckoning in time all but impossible. Indeed, had + some tyrannical god contrived to enslave our minds to time, to + make it all but impossible for us to escape subjection to sodden + routines and unpleasant surprises, he could hardly have done + better than handing down our present system. It is like a set of + trapezoidal building blocks, with no vertical or horizontal + surfaces, like a language in which the simplest thought demands + ornate constructions, useless particles and lengthy + circumlocutions. Unlike the more successful patterns of language + and science, which enable us to face experience boldly or at least + level-headedly, our system of temporal calculation silently and + persistently encourages our terror of time. + + ... It is as though architects had to measure length in feet, + width in meters and height in ells; as though basic instruction + manuals demanded a knowledge of five different languages. It is + no wonder then that we often look into our own immediate past or + future, last Tuesday or a week from Sunday, with feelings of + helpless confusion. ... + + --Robert Grudin, `Time and the Art of Living'. + + This section describes the textual date representations that GNU +programs accept. These are the strings you, as a user, can supply as +arguments to the various programs. The C interface (via the +`parse_datetime' function) is not described here. + +1.1 General date syntax +======================= + +A "date" is a string, possibly empty, containing many items separated +by whitespace. The whitespace may be omitted when no ambiguity arises. +The empty string means the beginning of today (i.e., midnight). Order +of the items is immaterial. A date string may contain many flavors of +items: + + * calendar date items + + * time of day items + + * time zone items + + * combined date and time of day items + + * day of the week items + + * relative items + + * pure numbers. + +We describe each of these item types in turn, below. + + A few ordinal numbers may be written out in words in some contexts. +This is most useful for specifying day of the week items or relative +items (see below). Among the most commonly used ordinal numbers, the +word `last' stands for -1, `this' stands for 0, and `first' and `next' +both stand for 1. Because the word `second' stands for the unit of +time there is no way to write the ordinal number 2, but for convenience +`third' stands for 3, `fourth' for 4, `fifth' for 5, `sixth' for 6, +`seventh' for 7, `eighth' for 8, `ninth' for 9, `tenth' for 10, +`eleventh' for 11 and `twelfth' for 12. + + When a month is written this way, it is still considered to be +written numerically, instead of being "spelled in full"; this changes +the allowed strings. + + In the current implementation, only English is supported for words +and abbreviations like `AM', `DST', `EST', `first', `January', +`Sunday', `tomorrow', and `year'. + + The output of the `date' command is not always acceptable as a date +string, not only because of the language problem, but also because +there is no standard meaning for time zone items like `IST'. When using +`date' to generate a date string intended to be parsed later, specify a +date format that is independent of language and that does not use time +zone items other than `UTC' and `Z'. Here are some ways to do this: + + $ LC_ALL=C TZ=UTC0 date + Mon Mar 1 00:21:42 UTC 2004 + $ TZ=UTC0 date +'%Y-%m-%d %H:%M:%SZ' + 2004-03-01 00:21:42Z + $ date --rfc-3339=ns # --rfc-3339 is a GNU extension. + 2004-02-29 16:21:42.692722128-08:00 + $ date --rfc-2822 # a GNU extension + Sun, 29 Feb 2004 16:21:42 -0800 + $ date +'%Y-%m-%d %H:%M:%S %z' # %z is a GNU extension. + 2004-02-29 16:21:42 -0800 + $ date +'@%s.%N' # %s and %N are GNU extensions. + @1078100502.692722128 + + Alphabetic case is completely ignored in dates. Comments may be +introduced between round parentheses, as long as included parentheses +are properly nested. Hyphens not followed by a digit are currently +ignored. Leading zeros on numbers are ignored. + + Invalid dates like `2005-02-29' or times like `24:00' are rejected. +In the typical case of a host that does not support leap seconds, a +time like `23:59:60' is rejected even if it corresponds to a valid leap +second. + +1.2 Calendar date items +======================= + +A "calendar date item" specifies a day of the year. It is specified +differently, depending on whether the month is specified numerically or +literally. All these strings specify the same calendar date: + + 1972-09-24 # ISO 8601. + 72-9-24 # Assume 19xx for 69 through 99, + # 20xx for 00 through 68. + 72-09-24 # Leading zeros are ignored. + 9/24/72 # Common U.S. writing. + 24 September 1972 + 24 Sept 72 # September has a special abbreviation. + 24 Sep 72 # Three-letter abbreviations always allowed. + Sep 24, 1972 + 24-sep-72 + 24sep72 + + The year can also be omitted. In this case, the last specified year +is used, or the current year if none. For example: + + 9/24 + sep 24 + + Here are the rules. + + For numeric months, the ISO 8601 format `YEAR-MONTH-DAY' is allowed, +where YEAR is any positive number, MONTH is a number between 01 and 12, +and DAY is a number between 01 and 31. A leading zero must be present +if a number is less than ten. If YEAR is 68 or smaller, then 2000 is +added to it; otherwise, if YEAR is less than 100, then 1900 is added to +it. The construct `MONTH/DAY/YEAR', popular in the United States, is +accepted. Also `MONTH/DAY', omitting the year. + + Literal months may be spelled out in full: `January', `February', +`March', `April', `May', `June', `July', `August', `September', +`October', `November' or `December'. Literal months may be abbreviated +to their first three letters, possibly followed by an abbreviating dot. +It is also permitted to write `Sept' instead of `September'. + + When months are written literally, the calendar date may be given as +any of the following: + + DAY MONTH YEAR + DAY MONTH + MONTH DAY YEAR + DAY-MONTH-YEAR + + Or, omitting the year: + + MONTH DAY + +1.3 Time of day items +===================== + +A "time of day item" in date strings specifies the time on a given day. +Here are some examples, all of which represent the same time: + + 20:02:00.000000 + 20:02 + 8:02pm + 20:02-0500 # In EST (U.S. Eastern Standard Time). + + More generally, the time of day may be given as +`HOUR:MINUTE:SECOND', where HOUR is a number between 0 and 23, MINUTE +is a number between 0 and 59, and SECOND is a number between 0 and 59 +possibly followed by `.' or `,' and a fraction containing one or more +digits. Alternatively, `:SECOND' can be omitted, in which case it is +taken to be zero. On the rare hosts that support leap seconds, SECOND +may be 60. + + If the time is followed by `am' or `pm' (or `a.m.' or `p.m.'), HOUR +is restricted to run from 1 to 12, and `:MINUTE' may be omitted (taken +to be zero). `am' indicates the first half of the day, `pm' indicates +the second half of the day. In this notation, 12 is the predecessor of +1: midnight is `12am' while noon is `12pm'. (This is the zero-oriented +interpretation of `12am' and `12pm', as opposed to the old tradition +derived from Latin which uses `12m' for noon and `12pm' for midnight.) + + The time may alternatively be followed by a time zone correction, +expressed as `SHHMM', where S is `+' or `-', HH is a number of zone +hours and MM is a number of zone minutes. The zone minutes term, MM, +may be omitted, in which case the one- or two-digit correction is +interpreted as a number of hours. You can also separate HH from MM +with a colon. When a time zone correction is given this way, it forces +interpretation of the time relative to Coordinated Universal Time +(UTC), overriding any previous specification for the time zone or the +local time zone. For example, `+0530' and `+05:30' both stand for the +time zone 5.5 hours ahead of UTC (e.g., India). This is the best way to +specify a time zone correction by fractional parts of an hour. The +maximum zone correction is 24 hours. + + Either `am'/`pm' or a time zone correction may be specified, but not +both. + +1.4 Time zone items +=================== + +A "time zone item" specifies an international time zone, indicated by a +small set of letters, e.g., `UTC' or `Z' for Coordinated Universal +Time. Any included periods are ignored. By following a +non-daylight-saving time zone by the string `DST' in a separate word +(that is, separated by some white space), the corresponding daylight +saving time zone may be specified. Alternatively, a +non-daylight-saving time zone can be followed by a time zone +correction, to add the two values. This is normally done only for +`UTC'; for example, `UTC+05:30' is equivalent to `+05:30'. + + Time zone items other than `UTC' and `Z' are obsolescent and are not +recommended, because they are ambiguous; for example, `EST' has a +different meaning in Australia than in the United States. Instead, +it's better to use unambiguous numeric time zone corrections like +`-0500', as described in the previous section. + + If neither a time zone item nor a time zone correction is supplied, +timestamps are interpreted using the rules of the default time zone +(*note Specifying time zone rules::). + +1.5 Combined date and time of day items +======================================= + +The ISO 8601 date and time of day extended format consists of an ISO +8601 date, a `T' character separator, and an ISO 8601 time of day. +This format is also recognized if the `T' is replaced by a space. + + In this format, the time of day should use 24-hour notation. +Fractional seconds are allowed, with either comma or period preceding +the fraction. ISO 8601 fractional minutes and hours are not supported. +Typically, hosts support nanosecond timestamp resolution; excess +precision is silently discarded. + + Here are some examples: + + 2012-09-24T20:02:00.052-05:00 + 2012-12-31T23:59:59,999999999+11:00 + 1970-01-01 00:00Z + +1.6 Day of week items +===================== + +The explicit mention of a day of the week will forward the date (only +if necessary) to reach that day of the week in the future. + + Days of the week may be spelled out in full: `Sunday', `Monday', +`Tuesday', `Wednesday', `Thursday', `Friday' or `Saturday'. Days may +be abbreviated to their first three letters, optionally followed by a +period. The special abbreviations `Tues' for `Tuesday', `Wednes' for +`Wednesday' and `Thur' or `Thurs' for `Thursday' are also allowed. + + A number may precede a day of the week item to move forward +supplementary weeks. It is best used in expression like `third +monday'. In this context, `last DAY' or `next DAY' is also acceptable; +they move one week before or after the day that DAY by itself would +represent. + + A comma following a day of the week item is ignored. + +1.7 Relative items in date strings +================================== + +"Relative items" adjust a date (or the current date if none) forward or +backward. The effects of relative items accumulate. Here are some +examples: + + 1 year + 1 year ago + 3 years + 2 days + + The unit of time displacement may be selected by the string `year' +or `month' for moving by whole years or months. These are fuzzy units, +as years and months are not all of equal duration. More precise units +are `fortnight' which is worth 14 days, `week' worth 7 days, `day' +worth 24 hours, `hour' worth 60 minutes, `minute' or `min' worth 60 +seconds, and `second' or `sec' worth one second. An `s' suffix on +these units is accepted and ignored. + + The unit of time may be preceded by a multiplier, given as an +optionally signed number. Unsigned numbers are taken as positively +signed. No number at all implies 1 for a multiplier. Following a +relative item by the string `ago' is equivalent to preceding the unit +by a multiplier with value -1. + + The string `tomorrow' is worth one day in the future (equivalent to +`day'), the string `yesterday' is worth one day in the past (equivalent +to `day ago'). + + The strings `now' or `today' are relative items corresponding to +zero-valued time displacement, these strings come from the fact a +zero-valued time displacement represents the current time when not +otherwise changed by previous items. They may be used to stress other +items, like in `12:00 today'. The string `this' also has the meaning +of a zero-valued time displacement, but is preferred in date strings +like `this thursday'. + + When a relative item causes the resulting date to cross a boundary +where the clocks were adjusted, typically for daylight saving time, the +resulting date and time are adjusted accordingly. + + The fuzz in units can cause problems with relative items. For +example, `2003-07-31 -1 month' might evaluate to 2003-07-01, because +2003-06-31 is an invalid date. To determine the previous month more +reliably, you can ask for the month before the 15th of the current +month. For example: + + $ date -R + Thu, 31 Jul 2003 13:02:39 -0700 + $ date --date='-1 month' +'Last month was %B?' + Last month was July? + $ date --date="$(date +%Y-%m-15) -1 month" +'Last month was %B!' + Last month was June! + + Also, take care when manipulating dates around clock changes such as +daylight saving leaps. In a few cases these have added or subtracted +as much as 24 hours from the clock, so it is often wise to adopt +universal time by setting the `TZ' environment variable to `UTC0' +before embarking on calendrical calculations. + +1.8 Pure numbers in date strings +================================ + +The precise interpretation of a pure decimal number depends on the +context in the date string. + + If the decimal number is of the form YYYYMMDD and no other calendar +date item (*note Calendar date items::) appears before it in the date +string, then YYYY is read as the year, MM as the month number and DD as +the day of the month, for the specified calendar date. + + If the decimal number is of the form HHMM and no other time of day +item appears before it in the date string, then HH is read as the hour +of the day and MM as the minute of the hour, for the specified time of +day. MM can also be omitted. + + If both a calendar date and a time of day appear to the left of a +number in the date string, but no relative item, then the number +overrides the year. + +1.9 Seconds since the Epoch +=========================== + +If you precede a number with `@', it represents an internal timestamp +as a count of seconds. The number can contain an internal decimal +point (either `.' or `,'); any excess precision not supported by the +internal representation is truncated toward minus infinity. Such a +number cannot be combined with any other date item, as it specifies a +complete timestamp. + + Internally, computer times are represented as a count of seconds +since an epoch--a well-defined point of time. On GNU and POSIX +systems, the epoch is 1970-01-01 00:00:00 UTC, so `@0' represents this +time, `@1' represents 1970-01-01 00:00:01 UTC, and so forth. GNU and +most other POSIX-compliant systems support such times as an extension +to POSIX, using negative counts, so that `@-1' represents 1969-12-31 +23:59:59 UTC. + + Traditional Unix systems count seconds with 32-bit two's-complement +integers and can represent times from 1901-12-13 20:45:52 through +2038-01-19 03:14:07 UTC. More modern systems use 64-bit counts of +seconds with nanosecond subcounts, and can represent all the times in +the known lifetime of the universe to a resolution of 1 nanosecond. + + On most hosts, these counts ignore the presence of leap seconds. +For example, on most hosts `@915148799' represents 1998-12-31 23:59:59 +UTC, `@915148800' represents 1999-01-01 00:00:00 UTC, and there is no +way to represent the intervening leap second 1998-12-31 23:59:60 UTC. + +1.10 Specifying time zone rules +=============================== + +Normally, dates are interpreted using the rules of the current time +zone, which in turn are specified by the `TZ' environment variable, or +by a system default if `TZ' is not set. To specify a different set of +default time zone rules that apply just to one date, start the date +with a string of the form `TZ="RULE"'. The two quote characters (`"') +must be present in the date, and any quotes or backslashes within RULE +must be escaped by a backslash. + + For example, with the GNU `date' command you can answer the question +"What time is it in New York when a Paris clock shows 6:30am on October +31, 2004?" by using a date beginning with `TZ="Europe/Paris"' as shown +in the following shell transcript: + + $ export TZ="America/New_York" + $ date --date='TZ="Europe/Paris" 2004-10-31 06:30' + Sun Oct 31 01:30:00 EDT 2004 + + In this example, the `--date' operand begins with its own `TZ' +setting, so the rest of that operand is processed according to +`Europe/Paris' rules, treating the string `2004-10-31 06:30' as if it +were in Paris. However, since the output of the `date' command is +processed according to the overall time zone rules, it uses New York +time. (Paris was normally six hours ahead of New York in 2004, but +this example refers to a brief Halloween period when the gap was five +hours.) + + A `TZ' value is a rule that typically names a location in the `tz' +database (http://www.twinsun.com/tz/tz-link.htm). A recent catalog of +location names appears in the TWiki Date and Time Gateway +(http://twiki.org/cgi-bin/xtra/tzdate). A few non-GNU hosts require a +colon before a location name in a `TZ' setting, e.g., +`TZ=":America/New_York"'. + + The `tz' database includes a wide variety of locations ranging from +`Arctic/Longyearbyen' to `Antarctica/South_Pole', but if you are at sea +and have your own private time zone, or if you are using a non-GNU host +that does not support the `tz' database, you may need to use a POSIX +rule instead. Simple POSIX rules like `UTC0' specify a time zone +without daylight saving time; other rules can specify simple daylight +saving regimes. *Note Specifying the Time Zone with `TZ': (libc)TZ +Variable. + +1.11 Authors of `parse_datetime' +================================ + +`parse_datetime' started life as `getdate', as originally implemented +by Steven M. Bellovin (<smb@research.att.com>) while at the University +of North Carolina at Chapel Hill. The code was later tweaked by a +couple of people on Usenet, then completely overhauled by Rich $alz +(<rsalz@bbn.com>) and Jim Berets (<jberets@bbn.com>) in August, 1990. +Various revisions for the GNU system were made by David MacKenzie, Jim +Meyering, Paul Eggert and others, including renaming it to `get_date' to +avoid a conflict with the alternative Posix function `getdate', and a +later rename to `parse_datetime'. The Posix function `getdate' can +parse more locale-specific dates using `strptime', but relies on an +environment variable and external file, and lacks the thread-safety of +`parse_datetime'. + + This chapter was originally produced by François Pinard +(<pinard@iro.umontreal.ca>) from the `parse_datetime.y' source code, +and then edited by K. Berry (<kb@cs.umb.edu>). + |