--- !ditz.rubyforge.org,2008-03-06/issue
title: improve Unicode string lookup performance with an offset table
desc: |-
  Unicode string performance could be improved by adding some speedup data
  after the actual string data. This can be done when interning the
  string, and string heaphdr flags can indicate which kind of speedup data
  is present.

  For example, one could add an index giving byte offsets for character
  offsets 0, 256, 512, ... Support for using the index could be added to
  duk_heap_strcache_offset_char2byte(), and it would be easy to make the
  index feature a compile time option.

  A smaller footprint index would list the number of bytes used to encode
  each chunk of characters (e.g. chunks of 256 characters). This would not
  allow direct random access, but whole chunks could be skipped, so seeking
  would still be faster than a raw scan; such an index would also be
  smaller than a plain byte offset index.

  It's not clear that any sort of speedup index is actually needed, so it
  would be best implemented as an option. The upside of an index, even a
  sparse one, is that it would give a reasonable upper bound on the time
  it takes to access a random character. Currently there is no useful
  upper bound: if a string is 10 megabytes in size, finding a character
  may require a straight scan of ~5 megabytes on average.
type: :task
component: duk
release:
reporter: sva
status: :unstarted
disposition:
creation_time: 2013-03-03 09:51:16.369359 Z
references: []

id: 144bbcf65f2a145246adefbc8d071288d2e91248

log_events:
- - 2013-03-03 09:51:16.694362 Z
  - sva
  - created
  - ""
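#
# A minimal sketch (not actual Duktape code) of the lookup that a sparse
# byte offset index would enable, assuming a 256-character chunk size and
# an index holding byte offsets for character offsets 0, 256, 512, ...
# The helper name char2byte_indexed and the index layout are illustrative
# assumptions; the real integration point would be
# duk_heap_strcache_offset_char2byte() as noted in the description. Plain
# UTF-8 byte structure is assumed, where continuation bytes have the form
# 10xxxxxx.
#
#   #include <stddef.h>
#   #include <stdint.h>
#
#   #define CHUNK_SIZE 256  /* characters per index entry */
#
#   /* 'data'/'blen': UTF-8 bytes of the string; 'idx': byte offset of
#    * every CHUNK_SIZE'th character, built when the string is interned.
#    * Returns the byte offset corresponding to character offset
#    * 'char_off' (assumed to be within the string).
#    */
#   static size_t char2byte_indexed(const uint8_t *data, size_t blen,
#                                   const uint32_t *idx, size_t char_off) {
#       size_t byte_off = idx[char_off / CHUNK_SIZE];  /* jump close to target */
#       size_t skip = char_off % CHUNK_SIZE;           /* <= 255 chars left to scan */
#
#       while (skip > 0 && byte_off < blen) {
#           byte_off++;
#           /* Skip continuation bytes to land on the next character start. */
#           while (byte_off < blen && (data[byte_off] & 0xc0) == 0x80) {
#               byte_off++;
#           }
#           skip--;
#       }
#       return byte_off;
#   }
#
# The scan is bounded to at most CHUNK_SIZE - 1 characters regardless of
# string size, which provides the upper bound argued for above. The
# smaller footprint variant would instead store the encoded byte length of
# each chunk; the lookup would then sum the lengths of the preceding
# chunks before doing the same bounded scan.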