docs: Add reference for Thumb2 inline assembler.

Thanks to Peter Hinch for contributing this.
10 years ago · 2110dc5a6d
14 changed files with 800 additions and 0 deletions
--- a/docs/pyboard/tutorial/assembler.rst
+++ b/docs/pyboard/tutorial/assembler.rst
@ -1,3 +1,5 @@
+.. _pyboard_tutorial_assembler:
+
 Inline assembler
 ================

--- a/docs/reference/asm_thumb2_arith.rst
+++ b/docs/reference/asm_thumb2_arith.rst
@ -0,0 +1,50 @@
+Arithmetic instructions
+=======================
+
+Document conventions
+--------------------
+
+Notation: ``Rd, Rm, Rn`` denote ARM registers R0-R7. ``immN`` denotes an immediate
+value having a width of N bits e.g. ``imm8``, ``imm3``. ``carry`` denotes
+the carry condition flag, ``not(carry)`` denotes its complement. In the case of instructions
+with more than one register argument, it is permissible for some to be identical. For example
+the following will add the contents of R0 to itself, placing the result in R0:
+
+* add(r0, r0, r0)
+
+Arithmetic instructions affect the condition flags except where stated.
+
+Addition
+--------
+
+* add(Rdn, imm8) ``Rdn = Rdn + imm8``
+* add(Rd, Rn, imm3) ``Rd = Rn + imm3``
+* add(Rd, Rn, Rm) ``Rd = Rn + Rm``
+* adc(Rd, Rn) ``Rd = Rd + Rn + carry``
+
+Subtraction
+-----------
+
+* sub(Rdn, imm8) ``Rdn = Rdn - imm8``
+* sub(Rd, Rn, imm3) ``Rd = Rn - imm3``
+* sub(Rd, Rn, Rm) ``Rd = Rn - Rm``
+* sbc(Rd, Rn) ``Rd = Rd - Rn - not(carry)``
+
+Negation
+--------
+
+* neg(Rd, Rn) ``Rd = -Rn``
+
+Multiplication and division
+---------------------------
+
+* mul(Rd, Rn) ``Rd = Rd * Rn``
+
+This produces a 32 bit result with overflow lost. The result may be treated as
+signed or unsigned according to the definition of the operands.
+
+* sdiv(Rd, Rn, Rm) ``Rd = Rn / Rm``
+* udiv(Rd, Rn, Rm) ``Rd = Rn / Rm``
+
+These functions perform signed and unsigned division respectively. Condition flags
+are not affected.
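The multiplication and division semantics above can be modelled in ordinary Python on a host machine. This is only a sketch: the helper names ``to_signed``, ``mul32``, ``sdiv32`` and ``udiv32`` are illustrative and are not part of MicroPython, and division by zero is deliberately not modelled. It assumes the usual ARM behaviour that ``sdiv`` rounds the quotient towards zero, which differs from Python's floor division.

```python
MASK = 0xFFFFFFFF

def to_signed(x):
    """Interpret the low 32 bits of x as a two's complement integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x

def mul32(rd, rn):
    """Model of mul(Rd, Rn): keep only the low 32 bits of the product."""
    return (rd * rn) & MASK

def sdiv32(rn, rm):
    """Model of sdiv(Rd, Rn, Rm) for non-zero Rm: quotient rounded towards zero."""
    a, b = to_signed(rn), to_signed(rm)
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return q & MASK

def udiv32(rn, rm):
    """Model of udiv(Rd, Rn, Rm) for non-zero Rm: operands taken as unsigned."""
    return ((rn & MASK) // (rm & MASK)) & MASK

print(hex(mul32(0x10000, 0x10000)))   # overflow is lost
print(to_signed(sdiv32(-7, 2)))       # rounds towards zero, unlike -7 // 2
print(hex(udiv32(-7, 2)))             # same bit pattern treated as unsigned
```

The same 32 bit pattern gives different quotients under ``sdiv32`` and ``udiv32``, which is why the two instructions exist separately.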
--- a/docs/reference/asm_thumb2_compare.rst
+++ b/docs/reference/asm_thumb2_compare.rst
@ -0,0 +1,90 @@
+Comparison instructions
+=======================
+
+These perform an arithmetic or logical instruction on two arguments, discarding the result
+but setting the condition flags. Typically these are used to test data values without changing
+them prior to executing a conditional branch.
+
+Document conventions
+--------------------
+
+Notation: ``Rd, Rm, Rn`` denote ARM registers R0-R7. ``imm8`` denotes an immediate
+value having a width of 8 bits.
+
+The Application Program Status Register (APSR)
+----------------------------------------------
+
+This contains four bits which are tested by the conditional branch instructions. Typically a
+conditional branch will test multiple bits, for example ``bge(LABEL)``. The meaning of
+condition codes can depend on whether the operands of an arithmetic instruction are viewed as
+signed or unsigned integers. Thus ``bhi(LABEL)`` assumes unsigned numbers were processed while
+``bgt(LABEL)`` assumes signed operands.
+
+APSR Bits
+---------
+
+* Z (zero)
+
+This is set if the result of an operation is zero or the operands of a comparison are equal.
+
+* N (negative)
+
+Set if the result is negative.
+
+* C (carry)
+
+An addition sets the carry flag when the result overflows out of the MSB, for example adding
+0x80000000 and 0x80000000. By the nature of two's complement arithmetic this behaviour is reversed
+on subtraction, with a borrow indicated by the carry bit being clear. Thus 0x10 - 0x01 is executed
+as 0x10 + 0xffffffff which will set the carry bit.
+
+* V (overflow)
+
+The overflow flag is set if the result, viewed as a two's complement number, has the "wrong" sign
+in relation to the operands. For example adding 1 to 0x7fffffff will set the overflow bit because
+the result (0x80000000), viewed as a two's complement integer, is negative. Note that in this instance
+the carry bit is not set.
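The flag rules described above can be checked on a host machine with a small Python model. This is a sketch rather than a description of the hardware: the function names ``add_flags`` and ``sub_flags`` are hypothetical, and subtraction is modelled as ``a + not(b) + 1``, which is arithmetically the same as adding the two's complement of ``b``.

```python
MASK = 0xFFFFFFFF

def add_flags(a, b):
    """Return (result, flags) for a 32 bit addition; flags is a dict of N, Z, C, V."""
    a &= MASK
    b &= MASK
    total = a + b
    result = total & MASK
    flags = {
        "N": result >> 31,
        "Z": int(result == 0),
        "C": int(total > MASK),
        # Overflow: both operands share a sign that the result does not.
        "V": ((a ^ result) & (b ^ result)) >> 31,
    }
    return result, flags

def sub_flags(a, b):
    """Model a - b as a + not(b) + 1, so C set means that no borrow occurred."""
    a &= MASK
    nb = ~b & MASK
    total = a + nb + 1
    result = total & MASK
    flags = {
        "N": result >> 31,
        "Z": int(result == 0),
        "C": int(total > MASK),
        "V": ((a ^ result) & (nb ^ result)) >> 31,
    }
    return result, flags

print(add_flags(0x80000000, 0x80000000))  # carry out of the MSB
print(add_flags(0x7FFFFFFF, 1))           # signed overflow without carry
print(sub_flags(0x10, 0x01))              # no borrow, so carry is set
```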
+
+Comparison instructions
+-----------------------
+
+These set the APSR (Application Program Status Register) N (negative), Z (zero), C (carry) and V
+(overflow) flags.
+
+* cmp(Rn, imm8) ``Rn - imm8``
+* cmp(Rn, Rm) ``Rn - Rm``
+* cmn(Rn, Rm) ``Rn + Rm``
+* tst(Rn, Rm) ``Rn & Rm``
+
+Conditional execution
+---------------------
+
+The ``it`` and ``ite`` instructions provide a means of conditionally executing from one to four subsequent
+instructions without the need for a label.
+
+* it(<condition>) If then
+
+Execute the next instruction if <condition> is true:
+
+::
+
+    cmp(r0, r1)
+    it(eq)
+    mov(r0, 100) # runs if r0 == r1
+    # execution continues here
+
+* ite(<condition>) If then else
+
+If <condition> is true, execute the next instruction, otherwise execute the
+subsequent one. Thus:
+
+::
+
+    cmp(r0, r1)
+    ite(eq)
+    mov(r0, 100) # runs if r0 == r1
+    mov(r0, 200) # runs if r0 != r1
+    # execution continues here
+
+This may be extended to control the execution of up to four subsequent instructions: it[x[y[z]]]
+where x,y,z=t/e; e.g. itt, itee, itete, ittte, itttt, iteee, etc.
--- a/docs/reference/asm_thumb2_directives.rst
+++ b/docs/reference/asm_thumb2_directives.rst
@ -0,0 +1,36 @@
+Assembler Directives
+====================
+
+Labels
+------
+
+* label(INNER1)
+
+This defines a label for use in a branch instruction. Thus elsewhere in the code a ``b(INNER1)``
+will cause execution to continue with the instruction after the label directive.
+
+Defining inline data
+--------------------
+
+The following assembler directives facilitate embedding data in an assembler code block.
+
+* data(size, d0, d1 .. dn)
+
+The data directive creates an array of data values in memory. The first argument specifies the
+size in bytes of the subsequent arguments. Hence the first statement below will cause the
+assembler to put three bytes (with values 2, 3 and 4) into consecutive memory locations
+while the second will cause it to emit two four byte words.
+
+::
+
+    data(1, 2, 3, 4)
+    data(4, 2, 100000)
+
+Data values longer than a single byte are stored in memory in little-endian format.
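The memory layout produced by the two ``data`` statements above can be previewed on a host machine with Python's ``struct`` module. This is an illustrative sketch only: it reproduces the little-endian byte order described here and does not call the assembler itself.

```python
import struct

# Bytes corresponding to data(1, 2, 3, 4): three one-byte values.
byte_values = struct.pack("<3B", 2, 3, 4)

# Bytes corresponding to data(4, 2, 100000): two four-byte little-endian words.
word_values = struct.pack("<2I", 2, 100000)

print(byte_values.hex())
print(word_values.hex())
```

The least significant byte of each word comes first, so 100000 (0x000186a0) is laid out as ``a0 86 01 00``.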
+
+* align(nBytes)
+
+Align the following instruction to an nBytes value. ARM Thumb-2 instructions must be two
+byte aligned, hence it's advisable to issue ``align(2)`` after ``data`` directives and
+prior to any subsequent code. This ensures that the code will run irrespective of the
+size of the data array.
--- a/docs/reference/asm_thumb2_float.rst
+++ b/docs/reference/asm_thumb2_float.rst
@ -0,0 +1,77 @@
+Floating Point instructions
+==============================
+
+These instructions support the use of the ARM floating point coprocessor
+(on platforms such as the Pyboard which are equipped with one). The FPU
+has 32 registers known as ``s0-s31`` each of which can hold a single
+precision float. Data can be passed between the FPU registers and the
+ARM core registers with the ``vmov`` instruction.
+
+Note that MicroPython doesn't support passing floats to
+assembler functions, nor can you put a float into ``r0`` and expect a
+reasonable result. There are two ways to overcome this. The first is to
+use arrays, and the second is to pass and/or return integers and convert
+to and from floats in code.
+
+Document conventions
+--------------------
+
+Notation: ``Sd, Sm, Sn`` denote FPU registers, ``Rd, Rm, Rn`` denote ARM core
+registers. The latter can be any ARM core register although registers
+``R13-R15`` are unlikely to be appropriate in this context.
+
+Arithmetic
+----------
+
+* vadd(Sd, Sn, Sm) ``Sd = Sn + Sm``
+* vsub(Sd, Sn, Sm) ``Sd = Sn - Sm``
+* vneg(Sd, Sm) ``Sd = -Sm``
+* vmul(Sd, Sn, Sm) ``Sd = Sn * Sm``
+* vdiv(Sd, Sn, Sm) ``Sd = Sn / Sm``
+* vsqrt(Sd, Sm) ``Sd = sqrt(Sm)``
+
+Registers may be identical: ``vmul(S0, S0, S0)`` will execute ``S0 = S0*S0``.
+ 
+Move between ARM core and FPU registers
+---------------------------------------
+
+* vmov(Sd, Rm) ``Sd = Rm``
+* vmov(Rd, Sm) ``Rd = Sm``
+
+The FPU has a register known as FPSCR, similar to the ARM core's APSR, which stores condition
+codes plus other data. The following instructions provide access to this.
+ 
+* vmrs(APSR\_nzcv, FPSCR)
+
+Move the floating-point N, Z, C, and V flags to the APSR N, Z, C, and V flags.
+
+This is done after an instruction such as an FPU
+comparison to enable the condition codes to be tested by the assembler
+code. The following is a more general form of the instruction.
+
+* vmrs(Rd, FPSCR) ``Rd = FPSCR``
+
+Move between FPU register and memory
+------------------------------------
+
+* vldr(Sd, [Rn, offset]) ``Sd = [Rn + offset]``
+* vstr(Sd, [Rn, offset]) ``[Rn + offset] = Sd``
+
+Where ``[Rn + offset]`` denotes the memory address obtained by adding Rn to the offset. This
+is specified in bytes. Since each float value occupies a 32 bit word, when accessing arrays of
+floats the offset must always be a multiple of four bytes.
+
+Data Comparison
+---------------
+
+* vcmp(Sd, Sm)
+
+Compare the values in Sd and Sm and set the FPU N, Z,
+C, and V flags. This would normally be followed by ``vmrs(APSR_nzcv, FPSCR)``
+to enable the results to be tested.
+
+Convert between integer and float
+---------------------------------
+
+* vcvt\_f32\_s32(Sd, Sm) ``Sd = float(Sm)``
+* vcvt\_s32\_f32(Sd, Sm) ``Sd = int(Sm)``
--- a/docs/reference/asm_thumb2_hints_tips.rst
+++ b/docs/reference/asm_thumb2_hints_tips.rst
@ -0,0 +1,232 @@
+Hints and tips
+==============
+
+The following are some examples of the use of the inline assembler and some
+information on how to work around its limitations. In this document the term
+"assembler function" refers to a function declared in Python with the 
+``@micropython.asm_thumb`` decorator, whereas "subroutine" refers to assembler
+code called from within an assembler function.
+
+Code branches and subroutines
+-----------------------------
+
+It is important to appreciate that labels are local to an assembler function.
+There is currently no way for a subroutine defined in one function to be called
+from another.
+
+To call a subroutine the instruction ``bl(LABEL)`` is issued. This transfers
+control to the instruction following the ``label(LABEL)`` directive and stores
+the return address in the link register (``lr`` or ``r14``). To return the
+instruction ``bx(lr)`` is issued which causes execution to continue with
+the instruction following the subroutine call. This mechanism implies that, if
+a subroutine is to call another, it must save the link register prior to
+the call and restore it before terminating.
+
+The following rather contrived example illustrates a function call. Note that
+it's necessary at the start to branch around all subroutine code: subroutines
+end execution with ``bx(lr)`` while the outer function simply "drops off the end"
+in the style of Python functions.
+
+::
+
+    @micropython.asm_thumb
+    def quad(r0):
+        b(START)
+        label(DOUBLE)
+        add(r0, r0, r0)
+        bx(lr)
+        label(START)
+        bl(DOUBLE)
+        bl(DOUBLE)
+
+    print(quad(10))
+
+The following code example demonstrates a nested (recursive) call: the classic
+Fibonacci sequence. Here, prior to a recursive call, the link register is saved
+along with other registers which the program logic requires to be preserved.
+
+::
+
+    @micropython.asm_thumb
+    def fib(r0):
+        b(START)
+        label(DOFIB)
+        push({r1, r2, lr})
+        cmp(r0, 1)
+        ble(FIBDONE)
+        sub(r0, 1)
+        mov(r2, r0) # r2 = n - 1
+        bl(DOFIB)
+        mov(r1, r0) # r1 = fib(n - 1)
+        sub(r0, r2, 1)
+        bl(DOFIB)   # r0 = fib(n - 2)
+        add(r0, r0, r1)
+        label(FIBDONE)
+        pop({r1, r2, lr})
+        bx(lr)
+        label(START)
+        bl(DOFIB)
+
+    for n in range(10):
+        print(fib(n))
+
+Argument passing and return
+---------------------------
+
+The tutorial details the fact that assembler functions can support from zero to
+three arguments, which must (if used) be named ``r0``, ``r1`` and ``r2``. When
+the code executes the registers will be initialised to those values.
+
+The data types which can be passed in this way are integers and memory
+addresses. Further, integers are restricted in that the top two bits
+must be identical, limiting the range to -2**30 to 2**30 - 1. Return
+values are similarly limited. These limitations can be overcome by means
+of the ``array`` module to allow any number of values of any type to
+be accessed.
+
+Multiple arguments
+~~~~~~~~~~~~~~~~~~
+
+If a Python array of integers is passed as an argument to an assembler
+function, the function will receive the address of a contiguous set of integers.
+Thus multiple arguments can be passed as elements of a single array. Similarly a
+function can return multiple values by assigning them to array elements.
+Assembler functions have no means of determining the length of an array:
+this will need to be passed to the function.
+
+This use of arrays can be extended to enable more than three arrays to be used. 
+This is done using indirection: the ``uctypes`` module supports ``addressof()`` 
+which will return the address of an array passed as its argument. Thus you can
+populate an integer array with the addresses of other arrays:
+
+::
+
+    import array
+    from uctypes import addressof
+    @micropython.asm_thumb
+    def getindirect(r0):
+        ldr(r0, [r0, 0]) # Address of array loaded from passed array
+        ldr(r0, [r0, 4]) # Return element 1 of indirect array (24)
+
+    def testindirect():
+        a = array.array('i',[23, 24])
+        b = array.array('i',[0,0])
+        b[0] = addressof(a)
+        print(getindirect(b))
+
+Non-integer data types
+~~~~~~~~~~~~~~~~~~~~~~
+
+These may be handled by means of arrays of the appropriate data type. For
+example, single precision floating point data may be processed as follows.
+This code example takes an array of floats and replaces its contents with
+their squares.
+
+::
+
+    from array import array
+
+    @micropython.asm_thumb
+    def square(r0, r1):
+        label(LOOP)
+        vldr(s0, [r0, 0])
+        vmul(s0, s0, s0)
+        vstr(s0, [r0, 0])
+        add(r0, 4)
+        sub(r1, 1)
+        bgt(LOOP)
+
+    a = array('f', (x for x in range(10)))
+    square(a, len(a))
+    print(a)
+
+The uctypes module supports the use of data structures beyond simple
+arrays. It enables a Python data structure to be mapped onto a bytearray
+instance which may then be passed to the assembler function.
+
+Named constants
+---------------
+
+Assembler code may be made more readable and maintainable by using named
+constants rather than littering code with numbers. This may be achieved
+thus:
+
+::
+
+    MYDATA = const(33)
+
+    @micropython.asm_thumb
+    def foo():
+        mov(r0, MYDATA)
+
+The const() construct causes MicroPython to replace the variable name
+with its value at compile time. If constants are declared in an outer
+Python scope they can be shared between multiple assembler functions and
+with Python code.
+
+Assembler code as class methods
+-------------------------------
+
+MicroPython passes the address of the object instance as the first argument
+to class methods. This is normally of little use to an assembler function.
+It can be avoided by declaring the function as a static method thus:
+
+::
+
+    class foo:
+      @staticmethod
+      @micropython.asm_thumb
+      def bar(r0):
+        add(r0, r0, r0)
+
+Use of unsupported instructions
+-------------------------------
+
+These can be coded using the data statement as shown below. While
+``push()`` and ``pop()`` are supported the example below illustrates the
+principle. The necessary machine code may be found in the ARM v7-M
+Architecture Reference Manual. Note that the first argument of data
+calls such as
+
+::
+
+    data(2, 0xe92d, 0x0f00) # push r8,r9,r10,r11
+
+indicates that each subsequent argument is a two byte quantity.
+
+Overcoming MicroPython's integer restriction
+--------------------------------------------
+
+The Pyboard chip includes a CRC generator. Its use presents a problem in
+MicroPython because the returned values cover the full gamut of 32 bit
+quantities whereas small integers in MicroPython cannot have differing values
+in bits 30 and 31. This limitation is overcome with the following code, which
+uses assembler to put the result into an array and Python code to
+coerce the result into an arbitrary precision unsigned integer.
+
+::
+
+    from array import array
+    import stm
+
+    def enable_crc():
+        stm.mem32[stm.RCC + stm.RCC_AHB1ENR] |= 0x1000
+
+    def reset_crc():
+        stm.mem32[stm.CRC+stm.CRC_CR] = 1
+
+    @micropython.asm_thumb
+    def getval(r0, r1):
+        movwt(r3, stm.CRC + stm.CRC_DR)
+        str(r1, [r3, 0])
+        ldr(r2, [r3, 0])
+        str(r2, [r0, 0])
+
+    def getcrc(value):
+        a = array('i', [0])
+        getval(a, value)
+        return a[0] & 0xffffffff # coerce to arbitrary precision
+
+    enable_crc()
+    reset_crc()
+    for x in range(20):
+        print(hex(getcrc(0)))
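The final masking step in ``getcrc`` can be tried on a host machine without the CRC hardware. The sketch below uses an arbitrary example value (``0xDEADBEEF``, not a real CRC result) and a hypothetical helper ``as_unsigned32`` to show how a 32 bit pattern with bit 31 set reads back as a negative number through a signed array element, and how masking recovers the unsigned value.

```python
import struct

def as_unsigned32(value):
    """Reinterpret a signed 32 bit integer as unsigned, as `a[0] & 0xffffffff` does."""
    return value & 0xFFFFFFFF

# Arbitrary example bit pattern with bit 31 set (not a real CRC value).
raw = 0xDEADBEEF

# Reading the same four bytes back as a signed integer gives a negative number,
# which is what an array('i') element would hold.
signed = struct.unpack("<i", struct.pack("<I", raw))[0]

print(signed)
print(hex(as_unsigned32(signed)))
```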
--- a/docs/reference/asm_thumb2_index.rst
+++ b/docs/reference/asm_thumb2_index.rst
@ -0,0 +1,73 @@
+.. _asm_thumb2_index:
+
+Inline Assembler for Thumb2 architectures
+=========================================
+
+This document assumes some familiarity with assembly language programming and should be read after studying
+the :ref:`tutorial <pyboard_tutorial_assembler>`. For a detailed description of the instruction set consult the
+Architecture Reference Manual detailed below.
+The inline assembler supports a subset of the ARM Thumb-2 instruction set described here. The syntax tries
+to be as close as possible to that defined in the above ARM manual, converted to Python function calls.
+
+Instructions operate on 32 bit signed integer data except where stated otherwise. Most supported instructions
+operate on registers ``R0-R7`` only: where ``R8-R15`` are supported this is stated. Registers ``R8-R12`` must be
+restored to their initial value before return from a function. Registers ``R13-R15`` constitute the Stack Pointer,
+Link Register and Program Counter respectively.
+
+Document conventions
+--------------------
+
+Where possible the behaviour of each instruction is described in Python, for example
+
+* add(Rd, Rn, Rm) ``Rd = Rn + Rm``
+
+This enables the effect of instructions to be demonstrated in Python. In certain cases this is impossible
+because Python doesn't support concepts such as indirection. The pseudocode employed in such cases is
+described on the relevant page.
+
+Instruction Categories
+----------------------
+
+The following sections detail the subset of the ARM Thumb-2 instruction set supported by MicroPython.
+
+.. toctree::
+   :maxdepth: 1
+   :numbered:
+
+   asm_thumb2_mov.rst
+   asm_thumb2_ldr.rst
+   asm_thumb2_str.rst
+   asm_thumb2_logical_bit.rst
+   asm_thumb2_arith.rst
+   asm_thumb2_compare.rst
+   asm_thumb2_label_branch.rst
+   asm_thumb2_stack.rst
+   asm_thumb2_misc.rst
+   asm_thumb2_float.rst
+   asm_thumb2_directives.rst
+
+Usage examples
+--------------
+
+These sections provide further code examples and hints on the use of the assembler.
+
+.. toctree::
+   :maxdepth: 1
+   :numbered:
+
+   asm_thumb2_hints_tips.rst
+
+References
+----------
+
+-  :ref:`Assembler Tutorial <pyboard_tutorial_assembler>`
+-  `Wiki hints and tips
+   <http://wiki.micropython.org/platforms/boards/pyboard/assembler>`__
+-  `uPy Inline Assembler source-code,
+   emitinlinethumb.c <https://github.com/micropython/micropython/blob/master/py/emitinlinethumb.c>`__
+-  `ARM Thumb2 Instruction Set Quick Reference
+   Card <http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001l/QRC0001_UAL.pdf>`__
+-  `RM0090 Reference
+   Manual <http://www.google.ae/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&sqi=2&ved=0CBoQFjAA&url=http%3A%2F%2Fwww.st.com%2Fst-web-ui%2Fstatic%2Factive%2Fen%2Fresource%2Ftechnical%2Fdocument%2Freference_manual%2FDM00031020.pdf&ei=G0rSU66xFeuW0QWYwoD4CQ&usg=AFQjCNFuW6TgzE4QpahO_U7g3f3wdwecAg&sig2=iET-R0y9on_Pbflzf9aYDw&bvm=bv.71778758,bs.1,d.bGQ>`__
+-  ARM v7-M Architecture Reference Manual (Available on the
+   ARM site after a simple registration procedure. Also available on academic sites but beware of out of date versions.)
--- a/docs/reference/asm_thumb2_label_branch.rst
+++ b/docs/reference/asm_thumb2_label_branch.rst
@ -0,0 +1,85 @@
+Branch instructions
+===================
+
+These cause execution to jump to a target location usually specified by a label (see the ``label``
+assembler directive). Conditional branches and the ``it`` and ``ite`` instructions test
+the Application Program Status Register (APSR) N (negative), Z (zero), C (carry) and V
+(overflow) flags to determine whether the branch should be executed.
+
+Most of the exposed assembler instructions (including move operations) set the flags but
+there are explicit comparison instructions to enable values to be tested.
+
+Further detail on the meaning of the condition flags is provided in the section
+describing comparison functions.
+
+Document conventions
+--------------------
+
+Notation: ``Rm`` denotes ARM registers R0-R15. ``LABEL`` denotes a label defined with the
+``label()`` assembler directive. ``<condition>`` indicates one of the following condition
+specifiers:
+
+* eq Equal to (result was zero)
+* ne Not equal
+* cs Carry set
+* cc Carry clear
+* mi Minus (negative)
+* pl Plus (positive)
+* vs Overflow set
+* vc Overflow clear
+* hi > (unsigned comparison)
+* ls <= (unsigned comparison)
+* ge >= (signed comparison)
+* lt < (signed comparison)
+* gt > (signed comparison)
+* le <= (signed comparison)
+
+Branch to label
+---------------
+
+* b(LABEL) Unconditional branch
+* beq(LABEL) branch if equal
+* bne(LABEL) branch if not equal
+* bge(LABEL) branch if greater than or equal
+* bgt(LABEL) branch if greater than
+* blt(LABEL) branch if less than (<) (signed)
+* ble(LABEL) branch if less than or equal to (<=) (signed)
+* bcs(LABEL) branch if carry flag is set
+* bcc(LABEL) branch if carry flag is clear
+* bmi(LABEL) branch if negative
+* bpl(LABEL) branch if positive
+* bvs(LABEL) branch if overflow flag set
+* bvc(LABEL) branch if overflow flag is clear
+* bhi(LABEL) branch if higher (unsigned)
+* bls(LABEL) branch if lower or equal (unsigned)
+
+Long branches
+-------------
+
+The code produced by the branch instructions listed above uses a fixed bit width to specify the
+branch destination, which is PC relative. Consequently in long programs where the
+branch instruction is remote from its destination the assembler will produce a "branch not in
+range" error. This can be overcome with the "wide" variants such as
+
+* beq\_w(LABEL) long branch if equal
+
+Wide branches use 4 bytes to encode the instruction (compared with 2 bytes for standard branch instructions).
+
+Subroutines (functions)
+-----------------------
+
+When entering a subroutine the processor stores the return address in register r14, also
+known as the link register (lr). Return to the instruction after the subroutine call is
+performed by updating the program counter (r15 or pc) from the link register. This
+process is handled by the following instructions.
+
+* bl(LABEL)
+
+Transfer execution to the instruction after ``LABEL`` storing the return address in
+the link register (r14).
+
+* bx(Rm) Branch to address specified by Rm.
+
+Typically ``bx(lr)`` is issued to return from a subroutine. For nested subroutines the
+link register of outer scopes must be saved (usually on the stack) before performing
+inner subroutine calls.
--- a/docs/reference/asm_thumb2_ldr.rst
+++ b/docs/reference/asm_thumb2_ldr.rst
@ -0,0 +1,23 @@
+Load register from memory
+=========================
+
+Document conventions
+--------------------
+
+Notation: ``Rt, Rn`` denote ARM registers R0-R7 except where stated. ``immN`` represents an immediate
+value having a width of N bits hence ``imm5`` is constrained to the range 0-31. ``[Rn + immN]`` is the contents
+of the memory address obtained by adding Rn and the offset ``immN``. Offsets are measured in
+bytes. These instructions do not affect the condition flags.
+
+Register Load
+-------------
+
+* ldr(Rt, [Rn, imm7]) ``Rt = [Rn + imm7]`` Load a 32 bit word
+* ldrb(Rt, [Rn, imm5]) ``Rt = [Rn + imm5]`` Load a byte
+* ldrh(Rt, [Rn, imm6]) ``Rt = [Rn + imm6]`` Load a 16 bit half word
+
+Where a byte or half word is loaded, it is zero-extended to 32 bits.
+
+The specified immediate offsets are measured in bytes. Hence in the case of ``ldr`` the 7 bit value
+enables 32 bit word aligned values to be accessed with a maximum offset of 31 words. In the case of ``ldrh`` the
+6 bit value enables 16 bit half-word aligned values to be accessed with a maximum offset of 31 half-words.
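The offset limits above follow from a 5 bit index scaled by the size of the access. The short Python sketch below lists the byte offsets this allows; the function name ``valid_offsets`` is illustrative, and the scaling rule is an assumption based on the limits quoted here (31 words for ``ldr``, 31 half-words for ``ldrh``).

```python
def valid_offsets(access_size):
    """Byte offsets encodable by a 5 bit index scaled by the access size.

    access_size is 4 for ldr, 2 for ldrh and 1 for ldrb.
    """
    return [index * access_size for index in range(32)]

print(max(valid_offsets(4)))  # largest ldr offset in bytes
print(max(valid_offsets(2)))  # largest ldrh offset in bytes
print(max(valid_offsets(1)))  # largest ldrb offset in bytes
```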
--- a/docs/reference/asm_thumb2_logical_bit.rst
+++ b/docs/reference/asm_thumb2_logical_bit.rst
@ -0,0 +1,53 @@
+Logical & Bitwise instructions
+==============================
+
+Document conventions
+--------------------
+
+Notation: ``Rd, Rn`` denote ARM registers R0-R7 except in the case of the
+special instructions where R0-R15 may be used. ``Rn<a-b>`` denotes an ARM register
+whose contents must lie in range ``a <= contents <= b``. In the case of instructions
+with two register arguments, it is permissible for them to be identical. For example
+the following will zero R0 (Python ``R0 ^= R0``) regardless of its initial contents.
+
+* eor(r0, r0)
+
+These instructions affect the condition flags except where stated.
+
+Logical instructions
+--------------------
+
+* and\_(Rd, Rn) ``Rd &= Rn``
+* orr(Rd, Rn) ``Rd |= Rn``
+* eor(Rd, Rn) ``Rd ^= Rn``
+* mvn(Rd, Rn) ``Rd = Rn ^ 0xffffffff`` i.e. Rd = 1's complement of Rn
+* bic(Rd, Rn) ``Rd &= ~Rn`` bit clear Rd using mask in Rn
+
+Note the use of "and\_" instead of "and", because "and" is a reserved keyword in Python.
+
+Shift and rotation instructions
+-------------------------------
+
+* lsl(Rd, Rn<0-31>) ``Rd <<= Rn``
+* lsr(Rd, Rn<1-32>) ``Rd = (Rd & 0xffffffff) >> Rn`` Logical shift right
+* asr(Rd, Rn<1-32>) ``Rd >>= Rn`` arithmetic shift right
+* ror(Rd, Rn<1-31>) ``Rd = rotate_right(Rd, Rn)`` Rd is rotated right Rn bits.
+
+A rotation by (for example) three bits works as follows. If Rd initially
+contains bits ``b31 b30..b0`` after rotation it will contain ``b2 b1 b0 b31 b30..b3``
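The ``rotate_right`` pseudocode used above has no direct Python operator. A minimal host-side model, written here purely for illustration, is:

```python
def rotate_right(value, count):
    """Rotate a 32 bit value right by count bits (1 <= count <= 31)."""
    value &= 0xFFFFFFFF
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF

print(hex(rotate_right(0x00000007, 3)))  # bits b2 b1 b0 move to the top
print(hex(rotate_right(0x80000001, 1)))
```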
+
+Special instructions
+--------------------
+
+Condition codes are unaffected by these instructions.
+
+* clz(Rd, Rn) ``Rd = count_leading_zeros(Rn)``
+
+count_leading_zeros(Rn) returns the number of binary zero bits before the first binary one bit in Rn.
+
+* rbit(Rd, Rn) ``Rd = bit_reverse(Rn)``
+
+bit_reverse(Rn) returns the bit-reversed contents of Rn. If Rn contains bits ``b31 b30..b0`` Rd will be set
+to ``b0 b1 b2..b31``
+
+Trailing zeros may be counted by performing a bit reverse prior to executing clz.
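The pseudocode functions ``count_leading_zeros`` and ``bit_reverse`` can be modelled in Python as follows. This is a sketch for experimenting on a host machine; ``count_trailing_zeros`` is a hypothetical helper that combines the two in the way suggested above.

```python
def count_leading_zeros(value):
    """Number of zero bits above the most significant set bit of a 32 bit value."""
    return 32 - (value & 0xFFFFFFFF).bit_length()

def bit_reverse(value):
    """Reverse the order of the 32 bits in value."""
    value &= 0xFFFFFFFF
    return int("{:032b}".format(value)[::-1], 2)

def count_trailing_zeros(value):
    """Trailing zeros via a bit reverse followed by a leading zero count."""
    return count_leading_zeros(bit_reverse(value))

print(count_leading_zeros(1))
print(hex(bit_reverse(1)))
print(count_trailing_zeros(8))
```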
--- a/docs/reference/asm_thumb2_misc.rst
+++ b/docs/reference/asm_thumb2_misc.rst
@ -0,0 +1,10 @@
+Miscellaneous instructions
+==========================
+
+* nop() ``pass`` no operation.
+* wfi() Suspend execution in a low power state until an interrupt occurs.
+* cpsid(flags) set the Priority Mask Register - disable interrupts.
+* cpsie(flags) clear the Priority Mask Register - enable interrupts.
+
+Currently the ``cpsie()`` and ``cpsid()`` functions are partially implemented.
+They require but ignore the flags argument and serve as a means of enabling and disabling interrupts.
--- a/docs/reference/asm_thumb2_mov.rst
+++ b/docs/reference/asm_thumb2_mov.rst
@ -0,0 +1,28 @@
+Register move instructions
+==========================
+
+Document conventions
+--------------------
+
+Notation: ``Rd, Rn`` denote ARM registers R0-R15. ``immN`` denotes an immediate
+value having a width of N bits. These instructions affect the condition flags.
+
+Register moves
+--------------
+
+Where immediate values are used, these are zero-extended to 32 bits. Thus
+``mov(R0, 0xff)`` will set R0 to 255.
+
+* mov(Rd, imm8) ``Rd = imm8``
+* mov(Rd, Rn) ``Rd = Rn``
+* movw(Rd, imm16) ``Rd = imm16``
+* movt(Rd, imm16) ``Rd = (Rd & 0xffff) | (imm16 << 16)``
+
+movt writes an immediate value to the top halfword of the destination register.
+It does not affect the contents of the bottom halfword.
+
+* movwt(Rd, imm30) ``Rd = imm30``
+
+movwt is a pseudo-instruction: the MicroPython assembler emits a ``movw`` and a ``movt``
+to move a zero extended 30 bit value into Rd. Where the full 32 bits are required a
+workaround is to use the movw and movt operations.
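Loading a full 32 bit value with separate ``movw`` and ``movt`` operations amounts to splitting the constant into two halfwords. The following Python model of that arithmetic is a sketch for illustration; the function names mirror the instructions but are ordinary Python, and ``0xDEADBEEF`` is an arbitrary example constant.

```python
def movw(imm16):
    """Model of movw(Rd, imm16): the value is zero-extended to 32 bits."""
    return imm16 & 0xFFFF

def movt(rd, imm16):
    """Model of movt(Rd, imm16): replace the top halfword, keep the bottom."""
    return (rd & 0xFFFF) | ((imm16 & 0xFFFF) << 16)

value = 0xDEADBEEF
rd = movw(value & 0xFFFF)   # bottom halfword first
rd = movt(rd, value >> 16)  # then the top halfword
print(hex(rd))
```

The order matters: ``movw`` clears the top halfword, so it must come before ``movt``.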
--- a/docs/reference/asm_thumb2_stack.rst
+++ b/docs/reference/asm_thumb2_stack.rst
@ -0,0 +1,20 @@
+Stack push and pop
+==================
+
+Document conventions
+--------------------
+
+The ``push()`` and ``pop()`` instructions accept as their argument a register set containing
+a subset, or possibly all, of the general-purpose registers R0-R12 and the link register (lr or R14).
+As with any Python set the order in which the registers are specified is immaterial. Thus
+in the following example the pop() instruction would restore R1, R7 and R8 to their contents prior
+to the push():
+
+* push({r1, r8, r7}) Save three registers on the stack.
+* pop({r7, r1, r8}) Restore them
+
+Stack operations
+----------------
+
+* push({regset}) Push a set of registers onto the stack
+* pop({regset}) Restore a set of registers from the stack
--- a/docs/reference/asm_thumb2_str.rst
+++ b/docs/reference/asm_thumb2_str.rst
@ -0,0 +1,21 @@
+Store register to memory
+========================
+
+Document conventions
+--------------------
+
+Notation: ``Rt, Rn`` denote ARM registers R0-R7 except where stated. ``immN`` represents an immediate
+value having a width of N bits hence ``imm5`` is constrained to the range 0-31. ``[Rn + imm5]`` is the
+contents of the memory address obtained by adding Rn and the offset ``imm5``. Offsets are measured in
+bytes. These instructions do not affect the condition flags.
+
+Register Store
+--------------
+
+* str(Rt, [Rn, imm7]) ``[Rn + imm7] = Rt`` Store a 32 bit word
+* strb(Rt, [Rn, imm5]) ``[Rn + imm5] = Rt`` Store a byte (b0-b7)
+* strh(Rt, [Rn, imm6]) ``[Rn + imm6] = Rt`` Store a 16 bit half word (b0-b15)
+
+The specified immediate offsets are measured in bytes. Hence in the case of ``str`` the 7 bit value
+enables 32 bit word aligned values to be accessed with a maximum offset of 31 words. In the case of ``strh`` the
+6 bit value enables 16 bit half-word aligned values to be accessed with a maximum offset of 31 half-words.