Squashed commit of the following:
commit 99dc21b67a
Author: Chris Angelico <rosuav@gmail.com>
Date: Thu Jun 12 02:18:54 2014 +1000
Optimize as per TODO (thanks Damien!)
commit 5bf0153eca
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 08:42:06 2014 +1000
Test a default (= UTF-8) encode and decode
commit c962057ac3
Merge: e2c9782 195de32
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 05:23:03 2014 +1000
Merge branch 'master' into unicode, resolving conflict on py/obj.h
commit e2c9782a65
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 05:05:57 2014 +1000
More whitespace fixups
commit 086a2a0f57
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 05:04:20 2014 +1000
Properly implement string slicing
commit 0d339a143e
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 02:24:11 2014 +1000
Support slicing in str_index_to_ptr, and fix a bounds error
commit 24371c7267
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 02:10:22 2014 +1000
Break out index-to-pointer calculation into a function
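The following is a minimal sketch of an index-to-pointer helper. It assumes well-formed UTF-8 and Python-style negative indices. The name `sketch_index_to_ptr` and its signature are hypothetical; this is not the function from these commits.

For a non-negative index, it counts lead (non-continuation) bytes walking forward. For a negative index, it counts them walking backward from the end. It returns NULL when the index is out of range. Slicing would also need clamping, so that an index equal to the character length maps to the end of the buffer; this sketch does not do that.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true for UTF-8 continuation bytes (10xxxxxx). */
#define SKETCH_UTF8_IS_CONT(b) (((b) & 0xC0) == 0x80)

/* Return a pointer to the first byte of character `index` in the `blen`-byte
   UTF-8 buffer `s`, or NULL if out of range. Negative indices count from the
   end, so -1 is the last character (as in Python). */
static const uint8_t *sketch_index_to_ptr(const uint8_t *s, size_t blen, long index) {
    const uint8_t *end = s + blen;
    if (index >= 0) {
        /* Forward walk: each non-continuation byte starts a new character. */
        for (const uint8_t *p = s; p < end; ++p) {
            if (!SKETCH_UTF8_IS_CONT(*p)) {
                if (index == 0) {
                    return p;
                }
                --index;
            }
        }
        return NULL; /* index >= number of characters */
    }
    /* Backward walk from the end for negative indices. */
    for (const uint8_t *p = end; p > s;) {
        --p;
        if (!SKETCH_UTF8_IS_CONT(*p)) {
            if (++index == 0) {
                return p;
            }
        }
    }
    return NULL; /* -index > number of characters */
}
```

Because the buffer is scanned on each call, indexing is O(n) in the string length. That is the cost of not storing per-character offsets.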
commit 616c24ac01
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 02:03:11 2014 +1000
Add tests of string slicing, which currently fail
commit a24d19f676
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 01:56:53 2014 +1000
Change string indexing to not precalculate the charlen, and add test for neg indexing
commit 0bcc7ab89e
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 22:09:17 2014 +1000
Clean up constant qstr declarations now that charlen isn't needed
commit 5473e1a1db
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 07:18:42 2014 +1000
Remove the charlen field from strings, calculating it when required
commit 5c1658ec71
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 07:11:27 2014 +1000
Get rid of mp_obj_str_get_data_len() which was used in only one place
commit a019ba968b
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 06:58:26 2014 +1000
Add a unichar_charlen() function to calculate length-in-characters from length-in-bytes
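Computing length-in-characters from length-in-bytes can be sketched as follows, assuming well-formed UTF-8. Every character has exactly one lead byte, so counting the bytes that are not continuation bytes (10xxxxxx) gives the character count. The name `sketch_utf8_charlen` is hypothetical; it is not the signature of `unichar_charlen()` in the commit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true for UTF-8 continuation bytes (10xxxxxx). */
#define SKETCH_UTF8_IS_CONT(b) (((b) & 0xC0) == 0x80)

/* Number of characters in a well-formed UTF-8 buffer of `blen` bytes. */
static size_t sketch_utf8_charlen(const uint8_t *str, size_t blen) {
    size_t charlen = 0;
    for (size_t i = 0; i < blen; ++i) {
        if (!SKETCH_UTF8_IS_CONT(str[i])) {
            ++charlen; /* each lead byte begins one character */
        }
    }
    return charlen;
}
```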
commit 44b0d5cff8
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 06:32:44 2014 +1000
Use utf8_get/next_char in building up a string's repr
commit 30d1bad33f
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 06:10:45 2014 +1000
Make utf8_get_char() and utf8_next_char() actually do what their names say
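For illustration, "get the character at a pointer" and "advance to the next character" can be sketched like this, assuming well-formed UTF-8. The `sketch_`-prefixed names are hypothetical and do not reproduce MicroPython's `utf8_get_char()`/`utf8_next_char()`. The sequence length comes from the lead byte's high bits, and each continuation byte contributes 6 payload bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-classification macro: true for any non-ASCII byte. */
#define SKETCH_UTF8_IS_NONASCII(b) ((b) & 0x80)

/* Length of the UTF-8 sequence starting with `lead` (assumed to be a lead
   byte, not a continuation byte). */
static size_t sketch_utf8_seq_len(uint8_t lead) {
    if (!SKETCH_UTF8_IS_NONASCII(lead)) return 1; /* 0xxxxxxx */
    if ((lead & 0xE0) == 0xC0) return 2;          /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;          /* 1110xxxx */
    return 4;                                     /* 11110xxx */
}

/* Decode the code point whose encoding starts at `s`. */
static uint32_t sketch_utf8_get_char(const uint8_t *s) {
    static const uint8_t lead_mask[5] = {0, 0x7F, 0x1F, 0x0F, 0x07};
    size_t len = sketch_utf8_seq_len(s[0]);
    uint32_t ch = s[0] & lead_mask[len]; /* payload bits of the lead byte */
    for (size_t i = 1; i < len; ++i) {
        ch = (ch << 6) | (s[i] & 0x3F);  /* 6 payload bits per continuation byte */
    }
    return ch;
}

/* Return a pointer to the character following the one at `s`. */
static const uint8_t *sketch_utf8_next_char(const uint8_t *s) {
    return s + sketch_utf8_seq_len(s[0]);
}
```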
commit bc990dad9a
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 02:10:59 2014 +1000
Revert "Add PEP 393-flags to strings and stub usage."
This reverts commit c239f50952.
commit f9bebb28ad
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 15:41:48 2014 +1000
Whitespace fixes
commit 279de0c8eb
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 15:28:35 2014 +1000
Formatting/layout improvements - introduce macros for UTF-8 byte detection, add braces. No functional changes.
commit f1911f53d5
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 11:56:02 2014 +1000
Make chr() Unicode-aware
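A Unicode-aware chr() must turn a code point into its UTF-8 bytes. A minimal sketch of that encoding step follows. The name `sketch_utf8_encode` is hypothetical. The caller is assumed to have already rejected values above 0x10FFFF, as Python's chr() does by raising ValueError. Building the resulting string object is out of scope here.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode code point `cp` (assumed <= 0x10FFFF) into `buf` as UTF-8 and
   return the number of bytes written (1 to 4). */
static size_t sketch_utf8_encode(uint32_t cp, uint8_t buf[4]) {
    if (cp < 0x80) {               /* 1 byte: 0xxxxxxx */
        buf[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {              /* 2 bytes: 110xxxxx 10xxxxxx */
        buf[0] = (uint8_t)(0xC0 | (cp >> 6));
        buf[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {            /* 3 bytes: 1110xxxx + 2 continuations */
        buf[0] = (uint8_t)(0xE0 | (cp >> 12));
        buf[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 4 bytes: 11110xxx + 3 continuations */
    buf[0] = (uint8_t)(0xF0 | (cp >> 18));
    buf[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
    buf[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    buf[3] = (uint8_t)(0x80 | (cp & 0x3F));
    return 4;
}
```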
commit f51ad737b4
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 11:44:07 2014 +1000
Make a string's repr Unicode-aware
commit 01bd686846
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 11:33:43 2014 +1000
Expand the Unicode tests
commit 7bc91904f8
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 11:27:30 2014 +1000
Record byte lengths for byte strings
commit bb13212071
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 11:25:06 2014 +1000
Make ord() Unicode-aware
commit 03f0cbe905
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 10:24:35 2014 +1000
Retain characters as UTF-8 encoded Unicode
commit e924659b85
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 08:37:27 2014 +1000
Add support for \u and \U escapes, but not \N (with explanatory comment)
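Handling `\u` and `\U` comes down to reading exactly 4 or 8 hex digits and validating the result as a code point. The result can then be UTF-8 encoded like any other character. The following is a hypothetical sketch of the digit-parsing part only; the function name and the -1 error convention are assumptions. `\N{...}` named escapes are not covered, matching the commit.

```c
#include <ctype.h>
#include <stdint.h>

/* Parse the hex digits following \u (ndigits = 4) or \U (ndigits = 8).
   Returns the code point, or -1 if a digit is missing or invalid or the
   value exceeds 0x10FFFF. */
static int32_t sketch_parse_unicode_escape(const char *hex, int ndigits) {
    uint32_t cp = 0;
    for (int i = 0; i < ndigits; ++i) {
        unsigned char c = (unsigned char)hex[i];
        if (!isxdigit(c)) {
            return -1; /* also stops safely at the NUL terminator */
        }
        cp = cp * 16 + (uint32_t)(isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
    }
    return cp > 0x10FFFF ? -1 : (int32_t)cp;
}
```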
commit 231031ac5f
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 05:09:35 2014 +1000
Add character length to qstr
commit 6df1b946fb
Author: Chris Angelico <rosuav@gmail.com>
Date: Fri Jun 6 13:48:36 2014 +1000
Add test of UTF-8 encoded source file resulting in properly formed string
commit 16429b81a8
Author: Chris Angelico <rosuav@gmail.com>
Date: Fri Jun 6 13:44:15 2014 +1000
Make len(s) return character length (even though creation's still buggy)
commit cd2cf6663c
Author: Chris Angelico <rosuav@gmail.com>
Date: Fri Jun 6 13:15:36 2014 +1000
HACK - When indexing a qstr, count its charlen. Stupidly inefficient but POC.
All tests pass now, though string creation is still buggy.
commit 47c234584d
Author: Chris Angelico <rosuav@gmail.com>
Date: Fri Jun 6 13:15:32 2014 +1000
objstr: Record character length separately from byte length
CAUTION: Buggy, may crash stuff - qstr needs equivalent functionality too
commit b0f41c72af
Author: Chris Angelico <rosuav@gmail.com>
Date: Fri Jun 6 05:37:36 2014 +1000
Beginnings of UTF-8 support - construct strings from that many UTF-8-encoded chars, and subscript bytes the same way
commit 89452be641
Author: Chris Angelico <rosuav@gmail.com>
Date: Fri Jun 6 05:28:47 2014 +1000
Update comments - now aiming for UTF-8 rather than PEP 393 strings
commit c239f50952
Author: Chris Angelico <rosuav@gmail.com>
Date: Wed Jun 4 05:28:12 2014 +1000
Add PEP 393-flags to strings and stub usage.
The whole test suite passes, but nothing has actually been changed.
Such a mechanism is important for stable Python operation, because Python
function calls are handled on the C stack. The idea is to sprinkle
STACK_CHECK() calls in places where C recursion can occur.
TODO: Add more STACK_CHECK() calls.
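One common way to implement such a check is to record a stack address at startup and compare it with the address of a local variable at each check. The following is a minimal sketch under assumptions: the stack grows downward, a 32 KiB budget is enough, and failing functions return an error code. All names and the limit are hypothetical. The real interpreter would raise a Python exception instead of returning an error code.

```c
#include <stddef.h>
#include <stdint.h>

static uintptr_t sketch_stack_top;            /* address recorded at startup */
static size_t sketch_stack_limit = 32 * 1024; /* assumed budget in bytes */

/* Call once, near the top of main(). */
static void sketch_stack_init(void) {
    char here;
    sketch_stack_top = (uintptr_t)&here;
}

/* Approximate C stack usage, assuming the stack grows toward lower addresses. */
static size_t sketch_stack_usage(void) {
    char here;
    uintptr_t cur = (uintptr_t)&here;
    return cur < sketch_stack_top ? (size_t)(sketch_stack_top - cur) : 0;
}

static int sketch_stack_ok(void) {
    return sketch_stack_usage() < sketch_stack_limit;
}

/* Bail out of the enclosing function with `on_fail` if the budget is exceeded. */
#define SKETCH_STACK_CHECK(on_fail) \
    do { if (!sketch_stack_ok()) return (on_fail); } while (0)

/* Example of a recursive function guarded by the check: returns n, or -1 if
   the stack budget ran out before the recursion bottomed out. */
static int sketch_depth(int n) {
    SKETCH_STACK_CHECK(-1);
    if (n <= 0) {
        return 0;
    }
    int inner = sketch_depth(n - 1);
    return inner < 0 ? -1 : inner + 1;
}
```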
Expected to be set on the command line, the idea being that different
targets have different overly clever ABIs which strive to put unneeded
sections into executables, etc., so give people a flexible way to
strip them.
The option name is similar to the previously introduced CFLAGS_EXTRA and
LDFLAGS_EXTRA.
The idea is that it should be possible to pass any additional parameters for
experimentation without the need to patch sources (and without the need to
deviate from or repeat the baseline options).
Some people want to enable even more warnings; let them do so without putting
the burden on everyone. Conversely, some people think the current settings
should be relaxed. In this regard, -Werror is the most problematic: it
prevents use of the #warning directive and prevents passing configuration
settings on make command lines. Again, until it is decided how to deal with
these globally, allow working around these problems locally.
Both "bound" (length known) and "unbound" (length unknown) cases are tested.
All of list, tuple, bytes, and bytearray offer approximately the same
performance, with the "unbound" case being 30 times slower.