Squashed commit of the following:
commit 99dc21b67a
Author: Chris Angelico <rosuav@gmail.com>
Date: Thu Jun 12 02:18:54 2014 +1000
Optimize as per TODO (thanks Damien!)
commit 5bf0153eca
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 08:42:06 2014 +1000
Test a default (= UTF-8) encode and decode
commit c962057ac3
Merge: e2c9782195de32
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 05:23:03 2014 +1000
Merge branch 'master' into unicode, resolving conflict on py/obj.h
commit e2c9782a65
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 05:05:57 2014 +1000
More whitespace fixups
commit 086a2a0f57
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 05:04:20 2014 +1000
Properly implement string slicing
commit 0d339a143e
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 02:24:11 2014 +1000
Support slicing in str_index_to_ptr, and fix a bounds error
commit 24371c7267
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 02:10:22 2014 +1000
Break out index-to-pointer calculation into a function
commit 616c24ac01
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 02:03:11 2014 +1000
Add tests of string slicing, which currently fail
commit a24d19f676
Author: Chris Angelico <rosuav@gmail.com>
Date: Tue Jun 10 01:56:53 2014 +1000
Change string indexing to not precalculate the charlen, and add test for neg indexing
commit 0bcc7ab89e
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 22:09:17 2014 +1000
Clean up constant qstr declarations now that charlen isn't needed
commit 5473e1a1db
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 07:18:42 2014 +1000
Remove the charlen field from strings, calculating it when required
commit 5c1658ec71
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 07:11:27 2014 +1000
Get rid of mp_obj_str_get_data_len() which was used in only one place
commit a019ba968b
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 06:58:26 2014 +1000
Add a unichar_charlen() function to calculate length-in-characters from length-in-bytes
commit 44b0d5cff8
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 06:32:44 2014 +1000
Use utf8_get/next_char in building up a string's repr
commit 30d1bad33f
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 06:10:45 2014 +1000
Make utf8_get_char() and utf8_next_char() actually do what their names say
commit bc990dad9a
Author: Chris Angelico <rosuav@gmail.com>
Date: Sun Jun 8 02:10:59 2014 +1000
Revert "Add PEP 393-flags to strings and stub usage."
This reverts commit c239f50952.
commit f9bebb28ad
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 15:41:48 2014 +1000
Whitespace fixes
commit 279de0c8eb
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 15:28:35 2014 +1000
Formatting/layout improvements - introduce macros for UTF-8 byte detection, add braces. No functional changes.
commit f1911f53d5
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 11:56:02 2014 +1000
Make chr() Unicode-aware
commit f51ad737b4
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 11:44:07 2014 +1000
Make a string's repr Unicode-aware
commit 01bd686846
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 11:33:43 2014 +1000
Expand the Unicode tests
commit 7bc91904f8
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 11:27:30 2014 +1000
Record byte lengths for byte strings
commit bb13212071
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 11:25:06 2014 +1000
Make ord() Unicode-aware
commit 03f0cbe905
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 10:24:35 2014 +1000
Retain characters as UTF-8 encoded Unicode
commit e924659b85
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 08:37:27 2014 +1000
Add support for \u and \U escapes, but not \N (with explanatory comment)
commit 231031ac5f
Author: Chris Angelico <rosuav@gmail.com>
Date: Sat Jun 7 05:09:35 2014 +1000
Add character length to qstr
commit 6df1b946fb
Author: Chris Angelico <rosuav@gmail.com>
Date: Fri Jun 6 13:48:36 2014 +1000
Add test of UTF-8 encoded source file resulting in properly formed string
commit 16429b81a8
Author: Chris Angelico <rosuav@gmail.com>
Date: Fri Jun 6 13:44:15 2014 +1000
Make len(s) return character length (even though creation's still buggy)
commit cd2cf6663c
Author: Chris Angelico <rosuav@gmail.com>
Date: Fri Jun 6 13:15:36 2014 +1000
HACK - When indexing a qstr, count its charlen. Stupidly inefficient but POC.
All tests pass now, though string creation is still buggy.
commit 47c234584d
Author: Chris Angelico <rosuav@gmail.com>
Date: Fri Jun 6 13:15:32 2014 +1000
objstr: Record character length separately from byte length
CAUTION: Buggy, may crash stuff - qstr needs equivalent functionality too
commit b0f41c72af
Author: Chris Angelico <rosuav@gmail.com>
Date: Fri Jun 6 05:37:36 2014 +1000
Beginnings of UTF-8 support - construct strings from that many UTF-8-encoded chars, and subscript bytes the same way
commit 89452be641
Author: Chris Angelico <rosuav@gmail.com>
Date: Fri Jun 6 05:28:47 2014 +1000
Update comments - now aiming for UTF-8 rather than PEP 393 strings
commit c239f50952
Author: Chris Angelico <rosuav@gmail.com>
Date: Wed Jun 4 05:28:12 2014 +1000
Add PEP 393-flags to strings and stub usage.
The test suite all passes, but nothing has actually been changed.