|
|
|
--- !ditz.rubyforge.org,2008-03-06/issue
|
|
|
|
title: improve Unicode string lookup performance with an offset table
|
|
|
|
desc: |-
|
|
|
|
Unicode string performance could be improved by adding some speedup data
|
|
|
|
after the actual string data. This can be done when interning the string,
|
|
|
|
and string heaphdr flags can indicate which kind of speedup data is present.
|
|
|
|
|
|
|
|
For example, ony could add an index giving byte offsets for character
|
|
|
|
offsets 0, 256, 512, .... Support for using the index could be added to
|
|
|
|
duk_heap_strcache_offset_char2byte(), and it would be easy to make the
|
|
|
|
index feature a compile time option.
|
|
|
|
|
|
|
|
A smaller footprint index would list the number of bytes used to encode
|
|
|
|
each chunk (e.g. 256) of characters. This would not allow random access
|
|
|
|
but would still be faster than raw seeking; the index would also be smaller
|
|
|
|
than in a plain byte offset index.
|
|
|
|
|
|
|
|
It's not clear that any sort of a speedup index is actually needed, so it
|
|
|
|
would be nice to implement one as an option. The upside of an index, even
|
|
|
|
a sparse one, is that it would give a reasonable upper bound on the time
|
|
|
|
it takes to access random characters. Currently there is no upper bound:
|
|
|
|
if a string is 10 megabytes in size, it may require a straight ~5 megabyte
|
|
|
|
scan to find a character.
|
|
|
|
type: :task
|
|
|
|
component: duk
|
|
|
|
release:
|
|
|
|
reporter: sva <sami.vaarala@iki.fi>
|
|
|
|
status: :unstarted
|
|
|
|
disposition:
|
|
|
|
creation_time: 2013-03-03 09:51:16.369359 Z
|
|
|
|
references: []
|
|
|
|
|
|
|
|
id: 144bbcf65f2a145246adefbc8d071288d2e91248
|
|
|
|
log_events:
|
|
|
|
- - 2013-03-03 09:51:16.694362 Z
|
|
|
|
- sva <sami.vaarala@iki.fi>
|
|
|
|
- created
|
|
|
|
- ""
|