======================
Date and time handling
======================

This document describes the Duktape ``Date`` built-in implementation.

Overview
========

Ecmascript time value
---------------------

An Ecmascript time value is essentially a UNIX/Posix (UTC) time value
measured in milliseconds without fractions:

* http://www.ecma-international.org/ecma-262/5.1/#sec-15.9.1.1
* http://en.wikipedia.org/wiki/Unix_time

A time value has a simple arithmetic relationship with UTC datetime (calendar)
values; leap years are taken into account but leap seconds are not (as a side
effect, when a leap second is inserted the Ecmascript time value conceptually
jumps backwards by one second). This simple relationship allows easy, platform
independent conversion between the two representations.

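
As a rough illustration of this arithmetic relationship, the sketch below
splits a time value into a day number and a time within the day, in the
spirit of ``Day(t)`` and ``TimeWithinDay(t)`` from E5.1 Section 15.9.1.2.
The helper names are illustrative only and are not part of Duktape.

```javascript
const MS_PER_DAY = 86400000;  // 24 * 60 * 60 * 1000, no leap seconds

// Day number since 1970-01-01; Math.floor keeps negative values correct.
function dayNumber(t) {
  return Math.floor(t / MS_PER_DAY);
}

// Milliseconds elapsed within the day, always in [0, MS_PER_DAY).
function timeWithinDay(t) {
  return ((t % MS_PER_DAY) + MS_PER_DAY) % MS_PER_DAY;
}

const sampleTime = Date.UTC(2012, 0, 2, 3, 4, 5, 6);  // 2012-01-02T03:04:05.006Z
console.log(dayNumber(sampleTime));      // 15341
console.log(timeWithinDay(sampleTime));  // 11045006
```

Because every day is exactly 86400000 ms long, the split needs no calendar
tables; only the later day-number-to-calendar-date step depends on leap years.
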
The definition for a valid Ecmascript time value is very strict, and an
implementation is required to treat anything outside that range as an invalid
time value (NaN / "Invalid Date"). The valid time range is 100 million days
backwards and forwards from Jan 1, 1970. The minimum and maximum values are::

  > new Date(-100e6 * 24 * 60 * 60 * 1000).toISOString()
  '-271821-04-20T00:00:00.000Z'
  > new Date(+100e6 * 24 * 60 * 60 * 1000).toISOString()
  '+275760-09-13T00:00:00.000Z'

No fractions may be present in the internal millisecond value. Even if an
implementation maintained sub-millisecond time values, all return values
go through the internal ``TimeClip()`` algorithm which coerces the value
with ``ToInteger()``, so there is no standard way to access sub-millisecond
values.

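
The range check and integer coercion described above can be sketched in
Ecmascript itself. This is only a model of the specification's ``TimeClip()``
operation (E5.1 Section 15.9.1.14); ``timeClip`` and ``MAX_TIME_MS`` are
illustrative names, not Duktape internals.

```javascript
const MAX_TIME_MS = 8.64e15;  // 100 million days in milliseconds

function timeClip(time) {
  // Non-finite or out-of-range values become an invalid time value.
  if (!Number.isFinite(time) || Math.abs(time) > MAX_TIME_MS) {
    return NaN;
  }
  // ToInteger() truncates toward zero; adding +0 turns -0 into +0.
  return Math.trunc(time) + 0;
}

console.log(timeClip(1.9));             // 1
console.log(timeClip(-1.9));            // -1
console.log(timeClip(8.64e15));         // 8640000000000000
console.log(timeClip(8.64e15 + 1));     // NaN
console.log(new Date(1.9).getTime());   // 1, the built-in applies the same clipping
```
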
Broken down datetime
--------------------

A broken down datetime is a datetime split into year, month, day-of-month,
hour, minute, second, millisecond, and weekday components. Components can be
read or written with getter and setter API calls. Broken down datetime values
can be UTC time or local time, depending on the API call.

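
A short example of reading and writing broken down components through the
standard UTC accessors follows. The local time counterparts
(``getFullYear()``, ``getHours()`` and so on) behave the same way but their
results depend on the host timezone, so they are not shown here.

```javascript
const stamp = new Date(Date.UTC(2012, 0, 2, 3, 4, 5, 6));

// Read UTC components; month is zero-based, day-of-month is one-based,
// and weekday 0 means Sunday.
const utcParts = [
  stamp.getUTCFullYear(), stamp.getUTCMonth(), stamp.getUTCDate(),
  stamp.getUTCHours(), stamp.getUTCMinutes(), stamp.getUTCSeconds(),
  stamp.getUTCMilliseconds(), stamp.getUTCDay()
];
console.log(utcParts.join(' '));   // 2012 0 2 3 4 5 6 1

// Write a single component; the other components are kept.
stamp.setUTCDate(31);
console.log(stamp.toISOString());  // 2012-01-31T03:04:05.006Z
```
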
String representation: parsing and formatting
---------------------------------------------

The specification defines an ISO 8601 subset as a platform neutral format
for expressing date/time values as strings, and for parsing them back
from strings. Platform specific issues come into play when converting
between UTC time and local time, and when formatting or parsing date values
in additional platform specific formats.

To simplify, the parsing/formatting requirements are:

* The implementation is required to parse the Ecmascript ISO 8601 subset but
  may parse any other formats as well, including a larger ISO 8601 subset.

* The implementation is allowed to serialize time values into arbitrary
  strings, as long as it can parse them back into matching time values.
  (This is required only if the milliseconds amount is zero; a reasonable
  implementation will still, of course, guarantee that other components
  are parsed back correctly.)

* ``toISOString()`` is required to use the Ecmascript ISO 8601 subset exactly,
  and the resulting string must parse back to the same time value (again, only
  technically required if milliseconds is zero). This is the platform neutral
  string format which is guaranteed to work even across implementations.

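
The round trip requirements above can be checked with a few lines of
Ecmascript. The ``toString()`` output format is implementation defined, so
only the round trip result is compared, and that case uses a zero
millisecond part as the requirement assumes.

```javascript
// toISOString() must produce the ISO 8601 subset and parse back exactly.
const isoSource = new Date(Date.UTC(2012, 0, 2, 3, 4, 5, 6));
const isoText = isoSource.toISOString();
console.log(isoText);  // 2012-01-02T03:04:05.006Z
console.log(Date.parse(isoText) === isoSource.getTime());  // true

// toString() may use any format, but must parse back to the same time
// value when the milliseconds amount is zero.
const wholeSecond = new Date(Date.UTC(2012, 0, 2, 3, 4, 5, 0));
const roundTripped = Date.parse(wholeSecond.toString());
console.log(roundTripped === wholeSecond.getTime());  // true
```
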
Platform dependencies
=====================

Porting requirements
--------------------

The minimum requirement for porting the Date implementation to a new
platform is:

* A function to get the current (UTC) time as an Ecmascript time value,
  preferably with millisecond precision.

  - In many cases the current time can be obtained directly, as is the
    case with ``gettimeofday()`` for instance.

  - An implementation can also get a broken down datetime for the current
    UTC instant, and then use the Ecmascript time value conversion functions
    to convert it to an Ecmascript time value. The conversion is entirely
    platform neutral, because the Ecmascript time model enforces a simple
    relationship between time values and calendar dates.

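
The second option, converting a broken down UTC datetime into a time value
with plain arithmetic, might look roughly like the sketch below. It follows
the ``DayFromYear()`` formula of E5.1 Section 15.9.1.3 and assumes already
normalized components; the function names are hypothetical and this is not
the Duktape C implementation.

```javascript
// Days from 1970-01-01 to January 1 of the given year.
function dayFromYearStart(y) {
  return 365 * (y - 1970) + Math.floor((y - 1969) / 4) -
         Math.floor((y - 1901) / 100) + Math.floor((y - 1601) / 400);
}

function isLeapYear(y) {
  return (y % 4 === 0 && y % 100 !== 0) || y % 400 === 0;
}

// Days before the start of each month in a non-leap year.
const DAYS_BEFORE_MONTH = [0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334];

// month is zero-based (0 = January), day is one-based, as in the Date API.
function utcPartsToTimeValue(year, month, day, hour, minute, second, ms) {
  let days = dayFromYearStart(year) + DAYS_BEFORE_MONTH[month] + (day - 1);
  if (month >= 2 && isLeapYear(year)) {
    days += 1;  // account for February 29
  }
  return days * 86400000 + hour * 3600000 + minute * 60000 + second * 1000 + ms;
}

console.log(utcPartsToTimeValue(2012, 0, 2, 3, 4, 5, 6) ===
            Date.UTC(2012, 0, 2, 3, 4, 5, 6));  // true
```
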
Without additional porting effort, string formatting and parsing will be
somewhat limited (but compliant), and the local time will always be UTC.
The following is thus very nice to have:

* A function to get the time offset between local time and UTC on a certain
  UTC instant. The E5.1 specification has separate concepts for the local
  time zone adjustment (LocalTZA) and daylight saving time adjustment
  (DaylightSavingTA(t)). The Ecmascript conversion semantics, especially
  with respect to handling of daylight savings, must be followed.

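
Seen from the Ecmascript side, the combined offset for a given UTC instant
is observable through ``getTimezoneOffset()``. The sketch below wraps it in
a hypothetical helper, ``localOffsetMs``, to show the shape of the function
a port needs to provide; the value returned depends on the host timezone.

```javascript
// Offset (local time minus UTC) in milliseconds at the UTC instant t.
// getTimezoneOffset() returns UTC minus local time in minutes, hence
// the sign flip.
function localOffsetMs(t) {
  return -new Date(t).getTimezoneOffset() * 60000;
}

const instant = Date.UTC(2012, 6, 1, 12, 0, 0);
const shifted = new Date(instant + localOffsetMs(instant));

// Reading the shifted value with UTC getters yields local time components.
console.log(shifted.getUTCHours() === new Date(instant).getHours());
console.log(shifted.getUTCDate() === new Date(instant).getDate());
```

The offset is a function of the instant rather than a constant, because the
daylight saving adjustment may differ between two instants in the same zone.
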
Finally, these are nice-to-have:

* Functions to format and parse Date values in a platform dependent manner
  (in addition to the ISO 8601 format of the specification).

Platform specific formatting and parsing
----------------------------------------

The current approach to using platform specific formatting/parsing APIs is
as follows:

* The primary requirement is to provide a portable base implementation which
  is as platform neutral as possible. Timestamps can be formatted in an ISO
  8601-like manner, and local time can be assumed to be UTC if no timezone
  and/or DST information is available.

* Platform specific local time and locale mechanisms can be used, as long as
  they don't restrict the Ecmascript time range. For instance, if the valid
  platform datetime range is smaller than Ecmascript's, the implementation
  must either fall back to default handling if the range is exceeded, or
  extrapolate in a reasonable manner.

The Ecmascript valid datetime range is huge, and may be larger than what the
underlying platform supports. This makes it challenging to detect e.g.
daylight saving time reliably. For instance, if the platform has a Y2038
limit, how does one query for daylight saving time for the year 200000?

The E5.1 specification provides explicit guidance for this; Section 15.9.1.8:

  If the host environment provides functionality for determining daylight
  saving time, the implementation of ECMAScript is free to map the year in
  question to an equivalent year (same leap-year-ness and same starting week
  day for the year) for which the host environment provides daylight saving
  time information. The only restriction is that all equivalent years should
  produce the same result.

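
One way to picture the equivalent year mapping is the sketch below. It
assumes, purely for illustration, that the host handles the years 1971 to
1998 correctly; any 28 consecutive years that do not cross a skipped century
leap year contain all 14 combinations of leap-year-ness and starting
weekday. The function names are hypothetical and the search window is an
assumption, not Duktape's actual choice.

```javascript
function isLeapYearNumber(y) {
  return (y % 4 === 0 && y % 100 !== 0) || y % 400 === 0;
}

// Days from 1970-01-01 to January 1 of year y (E5.1 DayFromYear).
function daysBeforeYear(y) {
  return 365 * (y - 1970) + Math.floor((y - 1969) / 4) -
         Math.floor((y - 1901) / 100) + Math.floor((y - 1601) / 400);
}

// Weekday of January 1 (0 = Sunday); 1970-01-01 was a Thursday (4).
function firstWeekday(y) {
  return (((daysBeforeYear(y) + 4) % 7) + 7) % 7;
}

// Map y to a year in 1971..1998 with the same leap-year-ness and the
// same starting weekday.
function equivalentYear(y) {
  for (let candidate = 1971; candidate <= 1998; candidate++) {
    if (isLeapYearNumber(candidate) === isLeapYearNumber(y) &&
        firstWeekday(candidate) === firstWeekday(y)) {
      return candidate;
    }
  }
  return NaN;  // not reached for integer years
}

console.log(equivalentYear(200000));
```

DST information for the year 200000 would then be queried from the platform
using the returned year, with the month, day and time of day left unchanged.
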
However, the equivalent year mapping approach is not necessarily preferred
in the long term; see e.g. the following discussion:

* https://bugzilla.mozilla.org/show_bug.cgi?id=351066

Note that using a platform specific API to get timezone offset and DST
information makes programs behave slightly differently across platforms, even
when they are running with the same locale. There's no way around this
unless the locale information needed by Duktape is provided by a portable
or pluggable provider (e.g. user callback for tzoffset/DST information).

Linux
-----

Current implementation uses:

* ``gettimeofday()``
* ``strptime()``
* ``strftime()``

APIs available for formatting datetime values:

* ``ctime_r()``
* ``asctime_r()``
* ``strftime()``

APIs available for parsing datetime values:

* ``strptime()``: quite portable, but requires an explicit format string
* ``getdate_r()``: GNU specific, more generic, but requires ``DATEMSK`` to be set

See also:

* http://www.gnu.org/software/libc/manual/html_node/Date-and-Time.html#Date-and-Time

OSX / Darwin
------------

Current implementation uses the same functions as on Linux.

Windows
-------

Current implementation uses time functions documented in:

* http://msdn.microsoft.com/en-us/library/windows/desktop/ms725473(v=vs.85).aspx

The same implementation works for WIN32 and WIN64.

See also:

* http://www.suacommunity.com/dictionary/gettimeofday-entry.php

Parsing the E5 ISO 8601 subset
==============================

E5.1 Section 15.9.1.15 describes the subset, with the following
possible parts::

     YYYY              T   HH:mm          empty
     YYYY-MM               HH:mm:ss       Z
     YYYY-MM-DD            HH:mm:ss.sss   +HH:mm
    +YYYYYY                               -HH:mm
    +YYYYYY-MM
    +YYYYYY-MM-DD
    -YYYYYY
    -YYYYYY-MM                               ^
    -YYYYYY-MM-DD                            |
         |                                   |
         |        may skip time part         |
         `-----------------------------------'

A valid date time string may contain only a date part or both a
date and a time part, followed by an optional timezone part. A
missing timezone is interpreted the same as a 'Z'.

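
A few sample strings from the subset, run through the standard ``Date``
constructor, are shown below. The strings with a time part carry an explicit
timezone on purpose: E5.1 treats a missing timezone as 'Z', but later
Ecmascript editions interpret a date-time form without a timezone as local
time, so leaving it out would make the example engine dependent.

```javascript
const isoSamples = [
  '2012',
  '2012-01',
  '2012-01-02',
  '2012-01-02T03:04Z',
  '2012-01-02T03:04:05Z',
  '2012-01-02T03:04:05.006Z',
  '2012-01-02T03:04:05.006+01:00',
  '+002012-01-02T03:04:05.006Z'
];

// Print each sample next to its normalized toISOString() form.
for (const text of isoSamples) {
  console.log(text, '->', new Date(text).toISOString());
}
```
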
An implementation is allowed to parse a wider set of strings, so
an implementation can actually be made simpler by checking the input
format less rigidly. Some reasonable relaxations:

* Allow an arbitrary number of digits for any date part, including leading
  zeroes. Millisecond digits after the third one can be ignored (which is
  the same as truncation towards zero).

* Allow year to be signed regardless of the number of year digits.

* Allow date/time separator to be a space in addition to 'T'.

* Allow a timezone offset to be specified without colon (e.g. ``+1234``
  in addition to ``+12:34``).

* Allow unnormalized components. In fact, the specification actually
  requires accepting these two as equivalent: ``1995-02-04T24:00`` and
  ``1995-02-05T00:00``. Other unnormalized cases could be accepted too,
  like ``1995-02-123T11:2345:99``.

* Allow whitespace in additional places; in particular, before and after
  the string.

V8 seems to relax the rules if the date/time separator is a space but will
be strict if the separator is 'T'::

  > new Date('+0001979-0001-0000002T00003:0004:00005.006123123Z').toISOString()
  RangeError: Invalid time value

  > new Date('+0001979-0001-0000002 00003:0004:00005.006123123Z').toISOString()
  '1979-01-02T03:04:05.006Z'

  > new Date(' +0001979-0001-0000002 00003:0004:00005.006123123 +01:00 ').toISOString()
  '1979-01-02T02:04:05.006Z'

Some options for implementing a compact parser:

* Use an internal regexp to match the parts, then convert them to integers
  (accepting leading zeroes).

* Use a set of partial ``sscanf()`` calls.

* Use a custom char-by-char parser.

With a relaxed format a custom char-by-char parser is relatively simple and
is the current implementation approach:

1. Strip the input string (remove leading and trailing whitespace).
   (Currently not done.)

2. Initialize a broken down timestamp with default values. Initialize
   part_index to 0. Check first character to handle year sign.

3. Parse a decimal number of 1...n digits. When it is finished, write it
   to the part indicated by part_index.

4. Check the next character to determine what to do next: update part_index
   (either by one or skip directly to "hour" part) and parse next part,
   or accept/reject. The separator for timezone offset may be '+' or '-',
   which needs to be recorded.

5. If accepted, subtract timezone hours and minutes from the hours and
   minutes part (to convert to UTC), and then convert the (possibly
   unnormalized) components into an Ecmascript time value.

The parser will produce the following "parts":

* Year, default: 1970 (actually arbitrary, because a year is always required)
* Month, default: 1
* Day-of-month, default: 1
* Hour, default: 0
* Minute, default: 0
* Second, default: 0
* Millisecond, default: 0
* Timezone hours, default: 0
* Timezone minutes, default: 0

The current implementation is a rule-driven parser based on this basic model.

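
The basic model (steps 1 to 5 and the nine parts) can be sketched in
Ecmascript as below. This is only an illustration of the idea and not the
rule-driven Duktape parser, which is written in C. All names
(``parseRelaxedIso``, ``NEXT_PART`` and so on) are hypothetical. The sketch
does strip whitespace, does not accept a timezone offset without a colon,
and leans on the ``Date`` setters for the final component conversion.

```javascript
const END = -1;
// Part indexes: 0 year, 1 month, 2 day, 3 hour, 4 minute, 5 second,
// 6 millisecond, 7 timezone hours, 8 timezone minutes.
// NEXT_PART[partIndex][separator] gives the part to parse next.
const NEXT_PART = [
  { '-': 1, 'T': 3, ' ': 3 },
  { '-': 2, 'T': 3, ' ': 3 },
  { 'T': 3, ' ': 3 },
  { ':': 4 },
  { ':': 5, 'Z': END, '+': 7, '-': 7 },
  { '.': 6, 'Z': END, '+': 7, '-': 7 },
  { 'Z': END, '+': 7, '-': 7 },
  { ':': 8 },
  {}
];
// Whether the string may end right after the given part.
const MAY_END_AFTER = [true, true, true, false, true, true, true, false, true];

function parseRelaxedIso(input) {
  const str = String(input).trim();               // step 1
  const parts = [1970, 1, 1, 0, 0, 0, 0, 0, 0];   // step 2
  let partIndex = 0;
  let pos = 0;
  let yearSign = 1;
  let tzSign = 1;
  if (str[pos] === '+' || str[pos] === '-') {
    if (str[pos] === '-') yearSign = -1;
    pos++;
  }
  for (;;) {
    // Step 3: parse 1...n decimal digits into the current part.
    const start = pos;
    let value = 0;
    while (pos < str.length && str[pos] >= '0' && str[pos] <= '9') {
      // Millisecond digits after the third one are ignored.
      if (partIndex !== 6 || pos - start < 3) {
        value = value * 10 + (str.charCodeAt(pos) - 48);
      }
      pos++;
    }
    if (pos === start) return NaN;                // no digits: reject
    if (partIndex === 6) {
      for (let n = pos - start; n < 3; n++) value *= 10;  // ".5" is 500 ms
    }
    parts[partIndex] = value;
    // Step 4: look at the separator to accept, reject, or continue.
    if (pos === str.length) {
      if (!MAY_END_AFTER[partIndex]) return NaN;
      break;
    }
    const sep = str[pos++];
    const table = NEXT_PART[partIndex];
    if (!Object.prototype.hasOwnProperty.call(table, sep)) return NaN;
    const next = table[sep];
    if (next === END) {
      if (pos !== str.length) return NaN;         // 'Z' must be last
      break;
    }
    if (next === 7 && sep === '-') tzSign = -1;   // record timezone sign
    partIndex = next;
  }
  // Step 5: apply the timezone and convert possibly unnormalized parts.
  const date = new Date(0);
  date.setUTCFullYear(yearSign * parts[0], parts[1] - 1, parts[2]);
  date.setUTCHours(parts[3] - tzSign * parts[7], parts[4] - tzSign * parts[8],
                   parts[5], parts[6]);
  return date.getTime();
}

console.log(new Date(parseRelaxedIso('1995-02-04T24:00')).toISOString());
// 1995-02-05T00:00:00.000Z
```

Keeping the transitions in a table keeps the loop itself tiny, which is
roughly the appeal of a rule-driven design when code size matters.
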
Misc notes
==========

* Almost all API calls require a Date instance as the 'this' binding
  (a TypeError is thrown otherwise). Exceptions are noted in the
  specification; concretely, ``toJSON()``.

* The internal time value always exists for a Date instance, and is
  always a number. The number value is either NaN, or a finite number
  in the valid E5 range, with no millisecond fractions. The internal
  component representation uses zero-based day and month, while the
  Ecmascript API uses one-based day and zero-based month.

* When the internal time value is broken into components, each
  component will be normalized, and will fit into a 32-bit signed
  integer. When using setter calls, one or more components are replaced
  with unnormalized values, which will not necessarily fit into a 32-bit
  signed integer, before converting back to an internal time value. The
  setter values may be huge (even out of 64-bit range) without resulting
  in an invalid result date, if multiple cancelling values are given
  (e.g. 1e100 seconds and -1e103 milliseconds, cancelling to zero).

* Setters and getters are optimized for size, to use a single helper with a
  set of flags and arguments to keep each getter and setter itself very small.
  This makes them a bit cryptic; see e.g. handling of setters with optional
  parameters.
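
The note on cancelling setter values can be tried out with the standard
setters. The sketch uses hours and minutes instead of the seconds and
milliseconds mentioned in the note, and picks values whose millisecond
products are exactly representable as IEEE doubles, so that the cancellation
is exact. Each value alone is far outside the valid time range.

```javascript
// 1e10 hours is 3.6e16 ms and -6e11 minutes is -3.6e16 ms; the valid
// range is only +/- 8.64e15 ms, but the two components cancel out.
const cancelling = new Date(0);
cancelling.setUTCHours(1e10, -6e11);
console.log(cancelling.getTime());  // 0

// The same hour value without the cancelling minutes is out of range.
const outOfRange = new Date(0);
outOfRange.setUTCHours(1e10);
console.log(outOfRange.getTime());  // NaN
```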