You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

323 lines
12 KiB

======================
Date and time handling
======================
This document describes the Duktape ``Date`` built-in implementation.
Overview
========
Ecmascript time value
---------------------
An Ecmascript time value is essentially a UNIX/Posix (UTC) time value
measured in milliseconds without fractions:
* http://www.ecma-international.org/ecma-262/5.1/#sec-15.9.1.1
* http://en.wikipedia.org/wiki/Unix_time
A time value has a simple arithmetic relationship with UTC datetime (calendar)
values; leap years are taken into account but leap seconds are not (as a side
effect, when a leap second is inserted the Ecmascript time value conceptually
jumps backwards by one second). This simple relationship allows easy, platform
independent conversion between the two representations.
The definition for a valid Ecmascript time value is very strict, and an
implementation is required to treat anything outside that range as an invalid
time value (NaN / "Invalid Date"). The valid time range is 100 million days
backwards and forwards from Jan 1, 1970. The minimum and maximum values are::
> new Date(-100e6 * 24 * 60 * 60 * 1000).toISOString()
'-271821-04-20T00:00:00.000Z'
> new Date(+100e6 * 24 * 60 * 60 * 1000).toISOString()
'+275760-09-13T00:00:00.000Z'
No fractions may be present in the internal millisecond value. Even if an
implementation maintained sub-millisecond time values, all return values
go through the internal ``TimeClip()`` algorithm which coerces the value
with ``ToInteger()``, so there is no standard way to access sub-millisecond
values.
Broken down datetime
--------------------
A datetime broken down into year, month, day-of-month, hour, minute, second,
millisecond, and weekday. Components can be read or written with setter and
getter API calls. Broken down datetime values can be UTC time or local time,
depending on the API call.
String representation: parsing and formatting
---------------------------------------------
The specification provides an ISO 8601 subset to provide a platform neutral
format for expressing date/time values as strings, and parsing them back
from strings. Platform specific issues come into play when converting
between UTC time and local time, and when formatting or parsing date values
in additional platform specific formats.
To simplify, the parsing/formatting requirements are:
* The implementation is required to parse the Ecmascript ISO 8601 subset but
may parse any other formats as well, including a larger ISO 8601 subset.
* The implementation is allowed to serialize time values into arbitrary
strings, as long as it can parse them back into matching time values.
(This is required only if milliseconds amount is zero; a reasonable
implementation will still, of course, guarantee that other components
are parsed back correctly.)
* ``toISOString()`` is required to use the Ecmascript ISO 8601 subset exactly,
and the resulting string must parse back to the same time value (again, only
technically required if milliseconds is zero). This is the platform neutral
string format which is guaranteed to work even across implementations.
Platform dependencies
=====================
Porting requirements
--------------------
The minimum requirement for porting the Date implementation to a new
platform is:
* A function to get the current (UTC) time as an Ecmascript time value,
preferably with a millisecond precision.
- In many cases the current time can be obtained directly, as is the
case with ``gettimeofday()`` for instance.
- An implementation can also get a broken down datetime for the current
UTC instant, and then use the Ecmascript timevalue conversion functions
to convert it to an Ecmascript time value. The conversion is entirely
platform neutral, because the Ecmascript time model enforces a simple
relationship between time values and calendar dates.
Without additional porting effort, string formatting and parsing will be
somewhat limited (but compliant), and the local time will always be UTC.
The following is thus very nice:
* A function to get the time offset between local time and UTC on a certain
UTC instant. The E5.1 specification has separate concepts for the local
time zone adjustment (LocalTZA) and daylight saving time adjustment
(DaylightSavingTA(t)). The Ecmascript conversion semantics, especially
with respect to handling of daylight savings, must be followed.
Finally, these are nice-to-have:
* Functions to format and parse Date values in a platform dependent manner
(in addition to the ISO 8601 format of the specification).
Platform specific formatting and parsing
----------------------------------------
The current approach to using platform specific formatting/parsing APIs is
as follows:
* The primary requirement is to provide a portable base implementation which
is as platform neutral as possible. Timestamps can be formatted in a ISO
8601-like manner, and local time can be assumed to be UTC if no timezone
and/or DST information is available.
* Platform specific local time and locale mechanisms can be used, as long as
they don't restrict the Ecmascript time range. For instance, if the valid
platform datetime range is smaller than Ecmascript's, the implementation
must either fall back to default handling if the range is exceeded, or
extrapolate in a reasonable manner.
The Ecmascript valid datetime range is huge, and may be larger than what the
underlying platform supports. This poses challenges to detect e.g. daylight
savings time reliably. For instance, if the platform has a Y2038 limit, how
does one query for daylight savings time for the year 200000?
The E5.1 specification provides explicit guidance for this; Section 15.9.1.8:
If the host environment provides functionality for determining daylight
saving time, the implementation of ECMAScript is free to map the year in
question to an equivalent year (same leap-year-ness and same starting week
day for the year) for which the host environment provides daylight saving
time information. The only restriction is that all equivalent years should
produce the same result.
However, the equivalent year mapping approach is not necessarily preferred
in the long term see e.g. the following discussion:
* https://bugzilla.mozilla.org/show_bug.cgi?id=351066
Note that using a platform specific API to get timezone offset and DST
information makes programs behave slightly differently across platforms, even
when they are running with the same locale. There's no way around this
unless the locale information needed by Duktape is provided by a portable
or pluggable provider (e.g. user callback for tzoffset/DST information).
Linux
-----
Current implementation uses:
* ``gettimeofday()``
* ``strptime()``
* ``strftime()``
APIs available for formatting datetime values:
* ``ctime_r()``
* ``asctime_r()``
* ``strftime()``
APIs available for parsing datetime values:
* ``strptime()``: quite portable, but requires an explicit format string
* ``getdate_r()``: GNU specific, more generic, but requires ``DATEMSK`` to be set
See also:
* http://www.gnu.org/software/libc/manual/html_node/Date-and-Time.html#Date-and-Time
OSX / Darwin
------------
Current implementation uses the same functions as on Linux.
Windows
-------
Current implementation uses time functions documented in:
* http://msdn.microsoft.com/en-us/library/windows/desktop/ms725473(v=vs.85).aspx
The same implementation works for WIN32 and WIN64.
See also:
* http://www.suacommunity.com/dictionary/gettimeofday-entry.php
Parsing the E5 ISO 8601 subset
==============================
E5.1 Section 15.9.1.15 describes the subset, with the following
possible parts::
YYYY T HH:mm empty
YYYY-MM HH:mm:ss Z
YYYY-MM-DD HH:mm:ss.sss +HH:mm
+YYYYYY -HH:mm
+YYYYYY-MM
+YYYYYY-MM-DD
-YYYYYY
-YYYYYY-MM ^
-YYYYYY-MM-DD |
|
| may skip time part |
`-----------------------------------'
A valid date time string may contain only a date part or both a
date and a time part, followed by an optional timezone part. A
missing timezone is interpreted the same as a 'Z'.
An implementation is allowed to parse a wider set of strings, so
an implementation can actually be made simpler by checking the input
format less rigidly. Some reasonable relaxations:
* Allow an arbitrary number of digits for any date part, including leading
zeroes. Millisecond digits after the third one can be ignored (which is
the same as truncation towards zero).
* Allow year to be signed regardless of the number of year digits.
* Allow date/time separator to be a space in addition to 'T'.
* Allow a timezone offset to be specified without colon (e.g. ``+1234``
in addition to ``+12:34``).
* Allow unnormalized components. In fact, the specification actually
requires accepting these two as equivalent: ``1995-02-04T24:00`` and
``1995-02-05T00:00``. Other unnormalized cases could be accepted too,
like ``1995-02-123T11:2345:99``.
* Allow whitespace in additional places; in particular, before and after
the string.
V8 seems to relax the rules if the date/time separator is a space but will
be strict if the separator is 'T'::
> new Date('+0001979-0001-0000002T00003:0004:00005.006123123Z').toISOString()
RangeError: Invalid time value
> new Date('+0001979-0001-0000002 00003:0004:00005.006123123Z').toISOString()
'1979-01-02T03:04:05.006Z'
> new Date(' +0001979-0001-0000002 00003:0004:00005.006123123 +01:00 ').toISOString()
'1979-01-02T02:04:05.006Z'
Some options for implementation a compact parser:
* Use an internal regexp to match the parts, then convert them to integers
(accepting leading zeroes).
* Use a set of partial ``sscanf()`` calls.
* Use a custom char-by-char parser.
With a relaxed format a custom char-by-char parser is relatively simple and
is the current implementation approach:
1. Strip the input string (remove leading and trailing whitespace).
(Currently not done.)
2. Initialize a broken down timestamp with default values. Initialize
part_index to 0. Check first character to handle year sign.
3. Parse a decimal number of 1...n digits. When it is finished, write it
to part_index.
4. Check the next character to determine what to do next: update part_index
(either by one or skip directly to "hour" part) and parse next part,
or accept/reject. The separator for timezone offset may be '+' or '-',
which needs to be recorded.
5. If accepted, subtract timezone hours and minutes from the hours and
minutes part (to convert to UTC), and then convert the (possibly
unnormalized) components into an Ecmascript time value.
The parser will produce the following "parts":
* Year, default: 1970 (actually arbitrary, because a year is always required)
* Month, default: 1
* Day-of-month, default: 1
* Hour, default: 0
* Minute, default: 0
* Second, default: 0
* Millisecond, default: 0
* Timezone hours, default: 0
* Timezone seconds, default: 0
The current implementation is a rule-driven parser based on this basic model.
Misc notes
==========
* Almost all API calls require a Date instance as the 'this' binding
(a TypeError is thrown otherwise). Exceptions are noted in the
specification; concretely, ``toJSON()``.
* The internal time value always exists for a Date instance, and is
always a number. The number value is either NaN, or a finite number
in the valid E5 range, with no millisecond fractions. The internal
component representation uses zero-based day and month, while the
Ecmascript uses one-based day and zero-based month.
* When the internal time value is broken into components, each
component will be normalized, and will fit into a 32-bit signed
integer. When using setter calls, one or more components are replaced
with unnormalized values, which will not necessarily fit into a 32-bit
signed integer, before converting back to an internal time value. The
setter values may be huge (even out of 64-bit range) without resulting
in an invalid result date, if multiple cancelling values are given
(e.g. 1e100 seconds and -1e103 milliseconds, cancelling to zero).
* Setters and getters are optimized for size, to use a single helper with a
set of flags and arguments to keep each getter and setter itself very small.
This makes them a bit cryptic; see e.g. handling of setters with optional
parameters.