micropython/py/makeqstrdata.py

"""
Process raw qstr file and output qstr data with length, hash and data bytes.

This script works with Python 2.6, 2.7, 3.3 and 3.4.
"""

from __future__ import print_function

import re
import sys

# codepoint2name lives in a different module in Python 2 than in Python 3
import platform
if platform.python_version_tuple()[0] == '2':
    from htmlentitydefs import codepoint2name
elif platform.python_version_tuple()[0] == '3':
    from html.entities import codepoint2name
codepoint2name[ord('-')] = 'hyphen'

# add some custom names to map characters that aren't in HTML
codepoint2name[ord(' ')] = 'space'
codepoint2name[ord('\'')] = 'squot'
codepoint2name[ord(',')] = 'comma'
codepoint2name[ord('.')] = 'dot'
codepoint2name[ord(':')] = 'colon'
codepoint2name[ord('/')] = 'slash'
codepoint2name[ord('%')] = 'percent'
codepoint2name[ord('#')] = 'hash'
codepoint2name[ord('(')] = 'paren_open'
codepoint2name[ord(')')] = 'paren_close'
codepoint2name[ord('[')] = 'bracket_open'
codepoint2name[ord(']')] = 'bracket_close'
codepoint2name[ord('{')] = 'brace_open'
codepoint2name[ord('}')] = 'brace_close'
codepoint2name[ord('*')] = 'star'
codepoint2name[ord('!')] = 'bang'
codepoint2name[ord('\\')] = 'backslash'

# this must match the equivalent function in qstr.c
def compute_hash(qstr):
    hash = 5381
    for char in qstr:
        hash = (hash * 33) ^ ord(char)
    # Make sure that a valid hash is never zero; zero means "hash not computed"
    return (hash & 0xffff) or 1
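
# Illustrative sketch only, not part of the original script: a standalone
# restatement of the hash above that also splits the 16-bit result into the
# two bytes (low byte first) that do_work() writes for each qstr.  The name
# example_hash_bytes is hypothetical and nothing else in this file uses it.
def example_hash_bytes(qstr):
    h = 5381
    for char in qstr:
        h = (h * 33) ^ ord(char)
    # same "never zero" rule as compute_hash
    h = (h & 0xffff) or 1
    return (h & 0xff, (h >> 8) & 0xff)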

def do_work(infiles):
    # read the qstrs in from the input files
    qcfgs = {}
    qstrs = {}
    for infile in infiles:
        with open(infile, 'rt') as f:
            for line in f:
                line = line.strip()

                # is this a config line?
                match = re.match(r'^QCFG\((.+), (.+)\)', line)
                if match:
                    value = match.group(2)
                    if value[0] == '(' and value[-1] == ')':
                        # strip parentheses from config value
                        value = value[1:-1]
                    qcfgs[match.group(1)] = value
                    continue

                # is this a QSTR line?
                match = re.match(r'^Q\((.*)\)$', line)
                if not match:
                    continue

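                # get the qstr value
                qstr = match.group(1)
                # build a valid C identifier: each non-identifier character is replaced
                # by its name wrapped in underscores, e.g. "<module>" -> "_lt_module_gt_"
                ident = re.sub(r'[^A-Za-z0-9_]', lambda s: "_" + codepoint2name[ord(s.group(0))] + "_", qstr)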

                # don't add duplicates
                if ident in qstrs:
                    continue

                # add the qstr to the list, with order number to retain original order in file
                qstrs[ident] = (len(qstrs), ident, qstr)

    # get config variables
    cfg_bytes_len = int(qcfgs['BYTES_IN_LEN'])
    cfg_max_len = 1 << (8 * cfg_bytes_len)

    # print out the start of the generated C header file
    print('// This file was automatically generated by makeqstrdata.py')
    print('')

    # add NULL qstr with no hash or data
    print('QDEF(MP_QSTR_NULL, (const byte*)"\\x00\\x00%s" "")' % ('\\x00' * cfg_bytes_len))

    # go through each qstr and print it out
    for order, ident, qstr in sorted(qstrs.values(), key=lambda x: x[0]):
        qhash = compute_hash(qstr)
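        # Calculate len of str, taking escapes into account: an escaped backslash
        # counts as one byte, and any remaining backslash is not counted (so "\n"
        # has length 1)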
        qlen = len(qstr.replace("\\\\", "-").replace("\\", ""))
        qdata = qstr.replace('"', '\\"')
        if qlen >= cfg_max_len:
            print('qstr is too long:', qstr, file=sys.stderr)
            assert False
        qlen_str = ('\\x%02x' * cfg_bytes_len) % tuple(((qlen >> (8 * i)) & 0xff) for i in range(cfg_bytes_len))
        print('QDEF(MP_QSTR_%s, (const byte*)"\\x%02x\\x%02x%s" "%s")' % (ident, qhash & 0xff, (qhash >> 8) & 0xff, qlen_str, qdata))
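
# Illustrative sketch only, not part of the original script: how a qstr length
# is rendered as C hex escapes, least significant byte first, mirroring the
# qlen_str expression in do_work().  The name example_encode_len is
# hypothetical; nbytes plays the role of the BYTES_IN_LEN config value.
def example_encode_len(qlen, nbytes):
    return ''.join('\\x%02x' % ((qlen >> (8 * i)) & 0xff) for i in range(nbytes))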

if __name__ == "__main__":
    do_work(sys.argv[1:])