You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
 
 
 
 
 
 

641 lines
20 KiB

===========
Code issues
===========
This document covers C coding issues related to Duktape implementation
such as:
* Conventions
* Portability concerns
* Specific platforms and compilers
* Size and performance optimization issues
Conventions
===========
Indentantion, naming, etc
-------------------------
Indent with tab. On continuation lines indent with tab to shared indent
depth and then indent with spaces. For example, denoting tab indent with
colon and space indent with period::
::::::::snprintf(buf,
::::::::.........sizeof(buf),
::::::::........."%d",
::::::::.........123);
Names are lowercase, underscore separated::
void duk_func(void) {
/* ... */
}
Macros are uppercase, underscore separated::
#define DUK_MACRO(x) /* ... */
Macro names must not begin with an underscore. Macros which are of local
interest only can have a local name or have a double underscore after "DUK"::
/* 'foo' alternatives, not to be used directly */
#define DUK__FOO_ALT1 /* ... */
#define DUK__FOO_ALT2 /* ... */
/* select DUK_FOO provider */
#define DUK_FOO DUK_FOO_ALT2
There is only one space after a ``#define``, ``#ifdef``, etc, but there
may be multiple spaces between the a macro name and its definition. There
is no strict rule on the alignment of a macro value; successive definitions
usually keep values in the same column.
Comments are always traditional C comments, never ``//``::
/* always used traditional C comments */
Opening brace on the same line as the start of the construct, even
for functions::
void func(int x) {
if (x) {
/* ... */
} else {
/* ... */
}
}
The case-statements of a switch are at the same level as the switch
to reduce indent. If case clauses have their own blocks, this leads
to a confusing closing brace, so a comment for that may be in order::
switch (x) {
case A: {
/* ... */
break;
}
case B: {
/* ... */
break;
}
default: {
}
} /* switch */
Space after ``if``, ``switch``, etc::
if (x) { ... } /* correct */
if(x) { ... } /* incorrect */
switch (x) { ... } /* correct */
switch(x) { ... } /* incorrect */
Use of goto for error cleanup and shared error handling is not only
allowed but encouraged.
No naked statements in e.g. ``if-then-else``, always use a block.
This is more macro compatible. Example::
if (x) {
return 1; /* correct */
}
if (x)
return 1; /* incorrect */
Multi-statement macros should use a ``do-while(0)`` construct::
#define FROBNICATE(x,y) do { \
x = x * x; \
y = y * y; \
} while (0)
Use parentheses when referring to macro arguments and the final macro
result to minimize error proneness::
#define MULTIPLY(a,b) ((a)*(b))
/* Now MULTIPLY(1+2,3) expands to ((1+2)*(3)) == 9, not
* 1+2*3 == 7. Parentheses are used around macro result for
* similar reasons.
*/
Variable declarations
---------------------
C variables should only be declared in the beginning of the block. Although
this is usually not a portability concern but some older still compilers
require it. In particular, MSVC (at least Visual Studio 2010 Express) seems
to require this.
Be careful especially of assertions, debug prints, and other macros::
int x, y;
DUK_UNREF(y);
int flags = 0; /* problem: DUK_UNREF() */
Include guards
--------------
There are several popular include guard conventions. Leading underscores
are reserved and should be avoided in user code. The current include guard
convention is::
/* duk_foo.h */
#ifndef DUK_FOO_H_INCLUDED
#define DUK_FOO_H_INCLUDED
...
#endif /* DUK_FOO_H_INCLUDED */
See:
* http://en.wikipedia.org/wiki/Include_guard
``#pragma once`` is not portable, and is not used.
FIXME, TODO, XXX, NOTE markers
------------------------------
The following markers are used inside comments:
FIXME:
Issue should be fixed before a stable release. Does not block
an intermediate release.
TODO:
Issue should be fixed but does not block a release (even a stable
one).
XXX:
Like TODO, but it may be unclear what the proper fix is.
NOTE:
Noteworthy issue important for e.g. maintenance, but no action needed.
The markers must appear verbatim and be followed by a colon without
any space in between. This is important so that the markers can be
grep'd. Example::
/* FIXME: foo should have a different type */
Unused variables
----------------
Suppressing unused variable warnings use the following macro::
DUK_UNREF(my_unused_var);
Internally, this currently uses the form::
(void) my_unused_var; /* suppress warning */
This seems to work with both GCC and Clang. The form::
my_unused_var = my_unused_var; /* suppress warning */
works with GCC but not with Clang.
Unreachable code and "noreturn" functions
-----------------------------------------
Noreturn functions must have a void return type and are declared as::
DUK_NORETURN(void myfunc(void));
The macro style is awkward but is not easy to implement in another way.
Unreachable points in code are declared as::
DUK_UNREACHABLE();
Likely/unlikely comparisons
---------------------------
Providing "branch hints" may provide benefits on some platforms but not on
others. ``DUK_LIKELY()`` and ``DUK_UNLIKELY()`` can always be used in code,
and will be defined as a no-op if using branch hints on the target platform
is not possible or useful.
``DUK_UNLIKELY()`` should be used at least for conditions which are almost
never true, like invalid API call arguments, string size overflows, etc::
if (DUK_UNLIKELY(ptr == NULL)) {
/* ... */
}
Similarly, ``DUK_LIKELY()`` should be used for conditions which are almost
always true::
if (DUK_LIKELY(ptr != NULL)) {
/* ... */
}
The argument to these macros must be an integer::
/* correct */
if (DUK_LIKELY(ptr != NULL)) {
/* ... */
}
/* incorrect */
if (DUK_LIKELY(ptr)) {
/* ... */
}
C++ compatibility
-----------------
The source code is meant to be C++ compatible so that you can both:
1. Compile Duktape with C but use it from C++.
2. Compile Duktape with C++ and use it from C++.
To achieve this:
* Avoid variable names conflicting with C++ keywords (``throw``,
``class``, ``this``, etc).
* Use explicit casts for all pointer conversions.
(Compiling Duktape as a C++ program doesn't currently work fully due to
feature detection issues.)
Portability concerns
====================
Strict aliasing rules
---------------------
Strict aliasing rules and prohibition of dereferencing type-punned pointers
are good for portability so the implementation should adhere to the common
rules, e.g. use a union to convert between types. Sometimes this is not
straightforward. For instance, the indirect realloc approach currently in
use needs a getter callback to avoid type-punning.
Current goal is to compile and work without warnings even with strict
aliasing rules enforced.
Numeric types
-------------
C data types, especially integer types, are a bit of a hassle: the best choice
of types depends on the platform and the compiler, and also the C specification
version. Types also affect e.g. printf() and scanf() format specifiers which
are, of course, potentially compiler specific. To remain portable, (almost)
all C types are wrapped behind a typedef. Because both Duktape internals and
the public ``duktape.h`` header need type wrappers, the current approach is that
``duktape.h`` performs whatever feature detection is necessary to define types
needed in the public API. ``duk_features.h`` then completes the process for the
remaining internal types.
Basic rules in implementation:
* ``duktape.h`` and ``duk_features.h`` perform all the detection needed and
provide typedefs for types used in the public API and inside Duktape.
* C99 types are **not** used directly, wrapper types are used instead. For
instance, use ``duk_uint32_t`` instead of ``uint32_t``. Wrapper types are
used because we don't want to rely on C99 types or define them if they are
missing.
* Only use ``duk_XXX_t`` typedefs for integer types unless there is a special
reason not to. For instance, if a platform API requires a specific type,
that type must of course be used (or casted to).
* Use ``duk_size_t`` for internal uses of ``size_t``. Coerce it explicitly
to ``size_t`` for library API calls.
* Use ``duk_double_t`` for IEEE double precision float. This is slight
paranoia but may be handy if e.g. built-in soft float library is introduced.
* The ``void`` type is used as is, cannot imagine a reason why it would need
to be reassigned for portability.
* Use ``duk_int_t`` as an ``int`` replacement; it behaves like an ``int`` but,
unlike ``int``, is guaranteed to be at least 32 bits wide. Similarly
``duk_uint_t`` should be used as an ``unsigned int`` replacement.
* The ``duk_small_int_t`` should be used in internal code e.g. for flags.
It is guaranteed to be 16 bits or more. Similarly ``duk_small_uint_t``.
It's also used for boolean values.
* ``duk_uint8_t`` should be used as a replacement for ``unsigned char`` and
often for ``char`` too. Since ``char`` may be signed, it is often a
problematic type when comparing ranges, indexing lookup tables, etc, so
a ``char`` or a ``signed char`` is often not the best type. Note that
proper string comparison of UTF-8 character strings, for instance, relies
on unsigned byte comparisons.
* Integer constants should generally use ``L`` or ``UL`` suffix, i.e.
makes them ``long int`` or ``unsigned long int``, and they are
guaranteed to be 32 bits or more. Without a suffix integer constants
may be only 16 bits. 64-bit constants need ``LL`` or ``ULL`` suffix.
Small constants (16 bits or less) don't need a suffix and are still
portable. This is convenient for codepoint constants and such.
* Integer constant sign should match the type the constant is related to.
For instance, ``duk_codepoint_t`` is a signed type, so a signed constant
should be used. This is more than a style issue: suppose signed codepoint
``cp`` had value ``-1``. The comparison ``(cp < 0x7fL)`` is true while
the comparison ``(cp < 0x7fUL)`` is false because of C coercion rules.
* **FIXME:** Format specifiers are under work.
* **FIXME:** normal vs. fast variables: use tight values in structs,
"fast" values as e.g. loop counters in fast paths (character / byte
iteration loops etc)
* **FIXME**: flags field type (storage vs. internal APIs)
* **FIXME**: avoid casting when unnecessary
* **FIXME**: type names, end in _t for internal types, no _t for public API
Random notes:
* The public API uses types at least for these (example type in parentheses):
- allocation size, entry count, etc (size_t)
- Unicode codepoint (int_fast32_t)
- value stack index (int_fast32_t, ssize_t)
- value stack count (uint_fast32_t, size_t)
- flag field (uint_fast32_t)
- boolean flag (int)
- Ecmascript array index (uint_fast32_t)
- Ecmascript number (double)
- Void and char pointers; easy, but sign of ``char`` varies:
explicit use of ``unsigned char`` is more portable but
more verbose
* A large amount of code needs an integer type which is fastest on the platform
but still guaranteed to be 32 bits or more. The ``int`` type is NOT a good
choice because it may be 16 bits even on platforms with a 32-bit type and
even 32-bit registers (e.g. PureC on M68K). The ``long`` type is also not a
good choice as it may be too wide (e.g. GCC on x86-64, int is 32 bits while
long is 64 bits). For this use, there are two typedefs: ``duk_int_t`` and
``duk_uint_t``. For small integers, like flags, there are typedefs
``duk_small_int_t`` and ``duk_small_uint_t`` which explicitly indicate that
a small integer (16 bits or more) suffices.
* Exact 32-bit types are needed in some cases e.g. for Ecmascript semantics.
Also, 64-bit arithmetic emulation (implemented on 32 bit types) relies on
exact unsigned overflows / underflows. The wrapped C99 types are used in
these cases.
Alignment
---------
Platforms vary in their alignment requirements:
* Some platforms cause an error ("bus error") when alignment requirements
are violated. Such platforms may have unaligned access instructions but
unaligned accesses may need to be flagged to the compiler.
* Some platforms have slower unaligned accesses but which behave externally
just like aligned accesses. "Slower" may mean that an interrupt / trap
handler is invoked, at a considerable penalty.
* Some platforms support aligned and unaligned accesses with more or less
the same performance.
Alignment level may also vary, e.g. platform may require 4-byte alignment
for both 32-bit integers and IEEE doubles, or it may require 4-byte alignment
for 32-bit integers but 8-byte alignment for doubles, etc.
**FIXME: alignment is now guaranteed to 4 bytes on platforms where unaligned
accesses are not allowed/preferable.**
64-bit arithmetic
-----------------
Some compilers on 32-bit platforms may have 64-bit arithmetic problems
(this seems to be the case with VBCC for example). There are also older
compiles with no 64-bit support at all.
Duktape must compile with only 32-bit operations if necessary, so
replacements are needed in the few places where 32 bits are not enough.
Integer overflows
-----------------
Signed integer overflows are undefined behavior:
* https://www.securecoding.cert.org/confluence/display/seccode/INT32-C.+Ensure+that+operations+on+signed+integers+do+not+result+in+overflow?showComments=false
At least unsigned overflow handling is important, as it is needed to make
"add with carry" etc convenient.
Detecting overflow in simple addition is straightforward when unsigned
integer type bit size is exact::
duk_uint32_t x, y, z;
/* ... */
z = x + y;
if (z < x) {
/* Overflow: (z < x) or equivalently (z < y) cannot be true unless
* overflow occurs. This relies on unsigned overflow behavior and
* an exact bit size for the type.
*/
}
Detecting overflow in multiplication is a bit trickier. This comes up
e.g. in array join/concat helper when it computes the combined size of
separators (separator_size times separator_count). The check is easy
if a larger type is available::
duk_uint32_t x, y, z;
duk_uint64_t t;
t = (duk_uint64_t) x * (duk_uint64_t) y;
if (t >= (duk_uint64_t) LIMIT) {
/* Overflow. */
}
z = (duk_uint32_t) t;
However, for portability a 64-bit type cannot (for instance) be assumed.
The following approach works without a larger temporary type, but is
conservative and may indicate overflow even when one wouldn't occur::
/*
* Basic idea:
*
* x * y > limit // limit is e.g. 2**32-1
* <=> x > limit / y // y != 0
* <=> y > limit / x // equivalent, x != 0
*
* When a truncating division is used on the right size, the result
* is no longer equivalent:
*
* x > floor(limit / y) <== x > limit / y // not ==>
*
* Limit must fit into the integer type.
*/
duk_uint32_t x, y, z;
if (y != 0 && x > (duk_uint32_t) 0xffffffffU / y) {
/* Probable overflow. */
}
z = x * y;
For 32-bit types the check is actually exact, see test in::
misc/c_overflow_test.py
String handling
---------------
snprintf buffer size
::::::::::::::::::::
NUL terminator behavior for snprintf() (and its friends) is inconsistent
across implementations. Some ensure a NUL terminator added when truncated
(unless of course the buffer size is zero) while others do not.
The most portable way seems to be to::
char buf[256];
snprintf(buf, sizeof(buf), "format", args);
buf[sizeof(buf) - 1] = (char) 0;
Using sizeof(buf) - 1 for size may cause a NUL terminator to appear at
the second to last character of buf in some implementations.
Examples of snprintf() calls which don't NUL terminate on truncation:
* Windows ``_snprintf()``: http://msdn.microsoft.com/en-us/library/2ts7cx93.aspx
s(n)printf %s and NULL value
::::::::::::::::::::::::::::
Giving a NULL argument to ``%s`` format string may cause a segfault in some
old compilers. Avoid NULL values for ``%s``.
Use of sprintf vs. snprintf
:::::::::::::::::::::::::::
Use snprintf instead of sprintf by default, even when legal output size is
known beforehand. There can always be bugs in the underlying standard library
implementation. Sometimes the output size is known to be limited because
input values are known to be constrained (e.g. year values are kept between
[-999999,999999]). However, if there is a bug, it's better to corrupt a
printed output value than to cause a memory error.
Other considerations
====================
Const qualifiers for tables
---------------------------
Using ``const`` for tables allows tables to compiled into the text section.
This is important on some embedded platforms where RAM is tight but there
is more space for code and fixed data.
Feature defines
===============
Almost all feature detection is concentrated into ``duk_features.h`` which
considers inputs from various sources:
* ``DUK_OPT_xxx`` defines, which allow a user to request a specific feature
or provide a specific value (such as traceback depth)
* Compiler and platform specific defines and features
As a result, ``duk_features.h`` defines ``DUK_USE_xxx`` macros which enable
and disable specific features and provide parameter values (such as traceback
depth). These are the **only** feature defines which should be used in
internal Duktape code.
The only exception so far is ``DUK_PANIC_HANDLER()`` in ``duk_error.h`` which
can be directly overridden by the user if necessary.
This basic approach is complicated a bit by the fact that ``duktape.h`` must
do some minimal platform feature detection to ensure that the public API uses
the correct types, etc. These are coordinated with ``duk_features.h``;
``duk_features.h`` either uses whatever ``duktape.h`` ended up using, or does
its own checking and ensures the two are consistent.
When adding specific hacks and workarounds which might not be of interest
to all users, add a ``DUK_OPT_xxx`` flag for them and translate it to a
``DUK_USE_xxx`` flag in ``duk_features.h``. If the ``DUK_OPT_xxx`` flag
is absent, the custom behavior MUST NOT be enabled.
Platforms and compilers
=======================
VBCC
----
Even in C99 mode VBCC 0.9b:
* Does not have ``inttypes.h``.
* Does not have ``fpclassify()`` and friends.
* Does not have ``NAN`` or ``INFINITY``.
* The expression ``0.0 / 0.0`` causes a warning and results in ``0.0``
instead of ``NaN`` as expected.
* The expression ``1.0 / 0.0`` causes a warning and results in ``0.0``
instead of infinity as expected.
The following program demonstrates the NaN issue::
#include <stdio.h>
void main(void) {
double z = 0.0;
double t;
volatile union {
double d;
unsigned char b[8];
} u;
int i;
/* this results in 0.0 */
t = 0.0 / 0.0;
printf("result: %lf\n", t);
/* this results in NaN */
t = z / z;
printf("result: %lf\n", t);
u.d = t;
for (i = 0; i < 8; i++) {
printf("%02x\n", u.b[i]);
}
}
To work with compiler optimization, the above approach needs to have the
``double`` values in ``volatile`` variables. Otherwise VBCC will end up
replacing the result with zero. So something like this is probably safest::
volatile double a = 0.0;
volatile double b = 0.0;
double t = a / b; /* -> NaN */
tcc
---
Tcc has trouble with negative zeroes. See ``misc/tcc_zerosign1.c``. For
instance:
* Assign d = 0.0
* Assign d = -d
* Now d should be a negative zero, but in tcc (with default options) it
has not changed sign: the memory dump verified this, signbit() returns
zero, etc.
This happens at least in tcc versions 0.9.25, 0.9.26.
clang
-----
Clang has some issues with union aliasing. See ``misc/clang_aliasing.c``.