Maths on the simpleCPUv1d2

Figure 1 : maths / relational functions

Arguments, results and a data-stack
Integer and Fixed-point numbers
Testing
Negate
Addition
Addition fixed-point
Accumulate
Subtraction
Subtraction fixed-point
Multiplication
Multiplication fixed-point
Hardware multiplication 8bit and 16bit
Division
Division fixed-point
Relational operators
Relational operators fixed-points

To test the new simpleCPUv1d2's instruction-set I decided to write some maths routines to go with it e.g. neg, add, sub, mul and div etc. There are a number of different ways to implement these routines: different algorithms, different data types and different ways of passing parameters / results between the caller and callee code e.g. register / memory / stack. This processor is a 16bit machine so the base data type is a 16bit signed / unsigned integer. However, to give a bit of flexibility I'm also going to support 32bit signed / unsigned values. This allows bigger values to be represented, and also smaller values e.g. using a Q16.16 fixed-point representation.

Arguments, results and a data stack

For most programming cases we don't need to worry about nested subroutines or recursive algorithms. Sooo for the simple tasks I'm going to keep things simple and use 16bit operands i.e. 16bit arguments and results, that are passed via variables in memory, as shown below:

# 16bit arguments and results

W:
  .data 0      # input
X: 
  .data 0      # input
Y:
  .data 0      # output 
Z:
  .data 0      # output
CNT:
  .data 0      # bit count, working variable for different algorithms

To also support 32bit arguments we will need to double the size of these variables i.e. 32bit values will be stored across two 16bit memory locations, sooo:

# 16bit and 32bit arguments and results

W:
W_LOW:
  .data 0                        # input argument
W_HIGH:
  .data 0                        # input argument 

X:
X_LOW: 
  .data 0                        # input argument
X_HIGH:
  .data 0                        # input argument

Y:
Y_LOW:
  .data 0                        # output result
Y_HIGH:
  .data 0                        # output result

Z:
Z_LOW:
  .data 0                        # output result
Z_HIGH:
  .data 0                        # output result

CNT:
  .data 0                        # bit count, working variable for different algorithms

TMP:
TMP_0:
TMP_LOW:
  .data 0                        # temp buffer variables
TMP_1:
TMP_HIGH:
  .data 0
TMP_2:
  .data 0
TMP_3:
  .data 0
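As a quick illustration of the low / high word layout used above, here is a minimal Python sketch of splitting a 32bit value into two 16bit words and joining them back together. The function names split32 and join32 are hypothetical helpers for this example only, they are not part of the simpleCPU tool chain.

```python
MASK16 = 0xFFFF

def split32(value):
    """Return (low, high) 16bit words of a 32bit value (2's complement for negatives)."""
    value &= 0xFFFFFFFF                 # wrap into 32 bits
    return value & MASK16, (value >> 16) & MASK16

def join32(low, high):
    """Rebuild the unsigned 32bit value from its low and high 16bit words."""
    return ((high & MASK16) << 16) | (low & MASK16)

if __name__ == "__main__":
    low, high = split32(-255)
    # These are the words that would be placed in e.g. W_LOW and W_HIGH
    print(f"LOW = 0x{low:04X}  HIGH = 0x{high:04X}")
```

The low word goes in the first memory location (e.g. W_LOW) and the high word in the next (e.g. W_HIGH), matching the order used in the listings.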

Ideally these variables should be stored in the first 255 memory locations to simplify address pointer calculations e.g. the generation of the addresses of W+1, X+1, Y+1 and Z+1. Storing values in named memory locations is an OK solution as long as you don't have more than two arguments and you don't have nested subroutines i.e. subroutines that call other subroutines that also use these variables, overwriting them and corrupting the caller subroutine's state. A good example of this would be a recursive algorithm. Therefore, in these cases we need to implement a data-stack in memory, as shown in figure 2.

Figure 2 : data stack

To control where data is written to / read from this stack we will need a Stack Pointer (SP) and a Frame Pointer (FP). These could be stored in general purpose registers, but as we only have four data registers, losing 50% of our registers is a bit of a big hit, sooo these pointers will be implemented as variables in memory. To simplify coding I also define the stack depth i.e. its start and stop addresses in memory. For this example I have selected 0xEFF to 0xE00, 256 memory locations. Therefore, the stack grows downwards i.e. if SP=0xEFF and you push data onto the stack, the stack pointer is updated to SP=0xEFE. Therefore, our final list of variables will be:

# 16bit, 32bit and Stack arguments and results

W:
W_LOW:
  .data 0                        # input argument
W_HIGH:
  .data 0                        # input argument 

X:
X_LOW: 
  .data 0                        # input argument
X_HIGH:
  .data 0                        # input argument

Y:
Y_LOW:
  .data 0                        # output result
Y_HIGH:
  .data 0                        # output result

Z:
Z_LOW:
  .data 0                        # output result
Z_HIGH:
  .data 0                        # output result

CNT:
  .data 0                        # bit count, working variable for different algorithms

TMP:
TMP_0:
TMP_LOW:
  .data 0                        # temp buffer variables
TMP_1:
TMP_HIGH:
  .data 0
TMP_2:
  .data 0
TMP_3:
  .data 0

STACK_START_ADDRESS:
  .data 0xEFF                    # stack start address

STACK_STOP_ADDRESS:
  .data 0xE00                    # stack stop address

STACK_POINTER:
  .data 0xEFF                    # stack pointer

FRAME_POINTER:
  .data 0xEFF                    # frame pointer

To implement this data stack's functions we can use the subroutines below. However, do remember that the subroutine return addresses are stored in the LILO_12 component i.e. the processor's internal, hardware implemented CALL/RET stack. The stack in memory is only used to store arguments (data to be processed) and results.

#####################
# STACK SUBROUTINES #
#####################

# STACK register usage 
# RA, RB temporary working registers, will be changed e.g. address / data calculations
# RC, RD data registers, contain data to be written to stack or read from stack          

# STACK - initialise pointers

stack_init:
  load ra STACK_START_ADDRESS       # top of stack address defined as constant
  store ra STACK_POINTER            # copy to stack pointer 
  store ra FRAME_POINTER            # copy to frame pointer
  ret

# STACK - increment stack pointer with empty check

inc_stack_pointer:
  load ra STACK_POINTER             # test limit
  subm ra STACK_START_ADDRESS
  jumpz inc_stack_pointer_exit

  load ra STACK_POINTER             # inc stack pointer
  add ra 1
  store ra STACK_POINTER
inc_stack_pointer_exit:
  ret

# STACK - decrement stack pointer with full check

dec_stack_pointer:
  load ra STACK_POINTER             # test limit
  subm ra STACK_STOP_ADDRESS
  jumpz dec_stack_pointer_exit

  load ra STACK_POINTER             # dec stack pointer
  sub ra 1
  store ra STACK_POINTER
dec_stack_pointer_exit:
  ret

# STACK - get stack pointer

get_stack_pointer:
  load ra STACK_POINTER             # RB = STACK_POINTER
  move rb ra   
  ret

get_dec_stack_pointer:
  call dec_stack_pointer            # dec SP
  move rb ra                        # RB = STACK_POINTER
  ret

get_inc_stack_pointer:
  call inc_stack_pointer            # inc SP
  move rb ra                        # RB = STACK_POINTER
  ret

# PUSH - write 16bit value to stack

push16:
  call get_stack_pointer            # SP returned in RB  
  store rc (rb)                     # write data to top of stack
  call dec_stack_pointer            # decrement stack pointer
  ret

# PUSH - write 32bit value to stack

push32:
  call get_stack_pointer            # SP returned in RB 
  store rd (rb)                     # write data to top of stack (high 16bit)
  call get_dec_stack_pointer        # dec SP, SP returned in RB         
  store rc (rb)                     # write data to top of stack (low 16bit)
  call dec_stack_pointer            # dec SP 
  ret

# POP - read 16bit value from stack

pop16:
  call get_inc_stack_pointer        # inc SP, SP returned in RB     
  load rc (rb)                      # read data from top of stack
  ret

# POP - read 32bit value from stack

pop32:
  call get_inc_stack_pointer        # inc SP, SP returned in RB
  load rc (rb)                      # read data from top of stack (low 16bit)
  call get_inc_stack_pointer        # inc SP, SP returned in RB 
  load rd (rb)                      # read data from top of stack (high 16bit)
  ret

# STACK - subroutine enter function

stack_subroutine_enter:
  call get_stack_pointer            # SP returned in RB 
  load ra FRAME_POINTER             # read FP
  store ra (rb)                     # write data to top of stack
  move ra rb
  store ra FRAME_POINTER            # FP=SP
  call dec_stack_pointer            # dec SP 
  ret

# STACK - subroutine exit function

stack_subroutine_exit:
  load ra FRAME_POINTER             # set SP=FP
  store ra STACK_POINTER
  load ra (ra)                      # set FP=OLD_FP
  store ra FRAME_POINTER
  ret

# STACK - remove arguments passed on stack
# RC = number of 16bit stack entries to remove (counted down to zero)

stack_remove_arguments:
  call inc_stack_pointer
  sub rc 1
  jumpnz stack_remove_arguments
  ret
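To help follow how SP and FP move in the subroutines above, here is a rough Python model of the same data-stack. It is only a sketch: the class name DataStack, the method names and the 4096 word memory size are assumptions made for this example, and the limit checks simply leave SP unchanged, mirroring the jumpz tests in inc_stack_pointer and dec_stack_pointer.

```python
STACK_START = 0xEFF   # first location used, stack grows downwards
STACK_STOP = 0xE00    # last location that SP may reach

class DataStack:
    def __init__(self):
        self.mem = [0] * 0x1000          # assumed 4K word memory
        self.sp = STACK_START
        self.fp = STACK_START

    def _dec_sp(self):
        if self.sp != STACK_STOP:        # full check
            self.sp -= 1

    def _inc_sp(self):
        if self.sp != STACK_START:       # empty check
            self.sp += 1

    def push16(self, value):
        self.mem[self.sp] = value & 0xFFFF   # write, then decrement
        self._dec_sp()

    def pop16(self):
        self._inc_sp()                       # increment, then read
        return self.mem[self.sp]

    def push32(self, value):
        self.push16(value >> 16)             # high word first
        self.push16(value)                   # then low word

    def pop32(self):
        low = self.pop16()
        high = self.pop16()
        return (high << 16) | low

    def enter(self):
        """Model of stack_subroutine_enter: save old FP, FP=SP, dec SP."""
        self.mem[self.sp] = self.fp
        self.fp = self.sp
        self._dec_sp()

    def leave(self):
        """Model of stack_subroutine_exit: SP=FP, FP=old FP."""
        self.sp = self.fp
        self.fp = self.mem[self.fp]

if __name__ == "__main__":
    s = DataStack()
    s.push16(0x00FF)
    s.enter()
    print(f"SP=0x{s.sp:03X} FP=0x{s.fp:03X} arg at FP+1=0x{s.mem[s.fp + 1]:04X}")
    s.leave()
```

Note that after enter the first argument pushed by the caller sits at FP+1, which is the addressing used by the stack based routines later on.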

Integer and Fixed-point numbers

When you think of a computer you naturally think of a binary representation, base 2. Each binary digit is called a "bit"; the more bits you have the bigger the value you can represent. To represent smaller values we can move the binary point i.e. go to a fixed-point representation and use the negative powers of 2 e.g. the Q4.4 representation shown in figure 3, a 4bit integer term and a 4bit fractional term.

Figure 3 : Binary representation (top), fixed-point (bottom)

To represent negative numbers we can use a 2's complement representation, giving us both signed and unsigned values, soooo, that allows us to represent:

16BIT
-----
integer signed            : MIN = −32768  MAX = +32767
integer unsigned          : MIN = 0       MAX = +65535
fixed-point Q8.8 signed   : MIN = −128    MAX = +127.99609375  Resolution = 0.00390625
fixed-point Q8.8 unsigned : MIN = 0       MAX = +255.99609375  Resolution = 0.00390625

32BIT
-----
integer signed              : MIN = −2,147,483,648  MAX = +2,147,483,647
integer unsigned            : MIN = 0               MAX = +4,294,967,295
fixed-point Q16.16 signed   : MIN = −32768          MAX = +32767.9999847412  Resolution = 0.0000152587890625
fixed-point Q16.16 unsigned : MIN = 0               MAX = +65535.9999847412  Resolution = 0.0000152587890625
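The limits listed above follow directly from the number of integer and fractional bits. A small sketch of that calculation is shown below; the function name q_range is a hypothetical helper, and Fraction is used so the results are exact rather than rounded floats.

```python
from fractions import Fraction

def q_range(int_bits, frac_bits, signed):
    """Return (min, max, resolution) of a Qm.n fixed-point format as exact fractions."""
    step = Fraction(1, 2 ** frac_bits)        # value of one LSB
    total = int_bits + frac_bits
    if signed:
        lo = -(2 ** (total - 1)) * step       # most negative 2's complement code
        hi = (2 ** (total - 1) - 1) * step    # most positive 2's complement code
    else:
        lo = Fraction(0)
        hi = (2 ** total - 1) * step
    return lo, hi, step

if __name__ == "__main__":
    for name, m, n in (("Q8.8", 8, 8), ("Q16.16", 16, 16)):
        for signed in (True, False):
            lo, hi, step = q_range(m, n, signed)
            kind = "signed" if signed else "unsigned"
            print(f"{name} {kind}: MIN={float(lo)} MAX={float(hi)} RES={float(step)}")
```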

The good thing about using a fixed-point representation is that we can use the existing hardware i.e. adders, registers etc, so operations are fast. The bad thing about using a fixed-point representation is its limited range. The decimal values of each bit position of the Q16.16 number are listed in the table below:

N (bit position)	Integer (2^N)	Fractional (2^-N)
0	1	-
1	2	0.5
2	4	0.25
3	8	0.125
4	16	0.0625
5	32	0.03125
6	64	0.015625
7	128	0.0078125
8	256	0.00390625
9	512	0.001953125
10	1024	0.0009765625
11	2048	0.00048828125
12	4096	0.000244140625
13	8192	0.0001220703125
14	16384	0.00006103515625
15	32768	0.000030517578125
16	-	0.0000152587890625

These values can be used in the Python program below to help calculate our fixed-point numbers e.g. the example in figure 4 shows how the fixed-point value for 123.45 can be calculated. Well, the value 123.449, as there will be a rounding error, or rather a quantisation error, as the value is rounded to the "nearest" fixed-point step. The values calculated by this code are dumped to the terminal for cut-and-paste, and to a file: fp_value.txt, when the Save button is pressed.

import tkinter as tk

INT_BITS = [2**i for i in range(16)]       # 2^0 .. 2^15
FRAC_BITS = [2**(-i) for i in range(1,17)] # 2^-1 .. 2^-16

class FixedPointGUI:
    def __init__(self, root):
        self.root = root
        root.title("Fixed‑Point Bit Viewer")

        self.int_vars = []
        self.frac_vars = []

        tk.Label(root, text="Fixed‑Point Bit Viewer", font=("Arial", 16)).pack(pady=10)

        frame_int = tk.LabelFrame(root, text="Integer Bits (15..0)", padx=10, pady=10)
        frame_frac = tk.LabelFrame(root, text="Fractional Bits (−1..−16)", padx=10, pady=10)
        frame_int.pack(padx=10, pady=5)
        frame_frac.pack(padx=10, pady=5)

        # Integer bit checkboxes (bit 15 down to bit 0)
        for i in reversed(range(16)):
            var = tk.IntVar()
            chk = tk.Checkbutton(frame_int, text=f"{i}", variable=var,
                                 command=self.update_value)
            chk.grid(row=0, column=15 - i, padx=3)
            self.int_vars.append(var)

        # Fractional bit checkboxes (bit −1 down to −16)
        for i in range(16):
            var = tk.IntVar()
            chk = tk.Checkbutton(frame_frac, text=f"-{i+1}", variable=var,
                                 command=self.update_value)
            chk.grid(row=0, column=i, padx=3)
            self.frac_vars.append(var)

        # Output labels
        self.value_label = tk.Label(root, text="Decimal: 0.0", font=("Arial", 14))
        self.binary_label = tk.Label(root, text="Binary: 0000000000000000.0000000000000000", font=("Courier", 12))
        self.hex_label = tk.Label(root, text="Hex: 0x00000000", font=("Courier", 12))

        self.value_label.pack(pady=5)
        self.binary_label.pack(pady=5)
        self.hex_label.pack(pady=5)

        # Save button
        save_button = tk.Button(root, text="Save to File", command=self.save_to_file, font=("Arial", 12))
        save_button.pack(pady=10)

        # Internal storage for last values
        self.last_decimal = 0.0
        self.last_binary = "0" * 16 + "." + "0" * 16
        self.last_hex = "0x00000000"

    # Calc Value
    def update_value(self):
        value = 0.0

        # Integer part
        for var, weight in zip(reversed(self.int_vars), INT_BITS):
            if var.get() == 1:
                value += weight

        # Fractional part
        for var, weight in zip(self.frac_vars, FRAC_BITS):
            if var.get() == 1:
                value += weight

        # Update decimal
        self.last_decimal = value
        self.value_label.config(text=f"Decimal: {value}")

        # Build binary string
        int_bits = "".join(str(v.get()) for v in self.int_vars)
        frac_bits = "".join(str(v.get()) for v in self.frac_vars)
        binary_str = int_bits + "." + frac_bits

        self.last_binary = binary_str
        self.binary_label.config(text=f"Binary: {binary_str}")

        # Convert to 32‑bit integer (Q16.16)
        int_value = 0

        # Integer bits (bit 31..16)
        for i, var in enumerate(self.int_vars):
            if var.get() == 1:
                int_value |= (1 << (31 - i))

        # Fractional bits (bit 15..0)
        for i, var in enumerate(self.frac_vars):
            if var.get() == 1:
                int_value |= (1 << (15 - i))

        hex_str = f"0x{int_value:08X}"
        self.last_hex = hex_str
        self.hex_label.config(text=f"Hex: {hex_str}")

    # Save values to file
    def save_to_file(self):
        print(f"Decimal: {self.last_decimal}\n")
        print(f"Binary:  {self.last_binary}\n")
        print(f"Hex:     {self.last_hex}\n")
        with open("fp_value.txt", "w") as f:
            f.write(f"Decimal: {self.last_decimal}\n")
            f.write(f"Binary:  {self.last_binary}\n")
            f.write(f"Hex:     {self.last_hex}\n")

# Start GUI
root = tk.Tk()
app = FixedPointGUI(root)
root.mainloop()

Figure 4 : Fixed point number calculator (top), output file (bottom)
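If you would rather not click checkboxes, the same conversion can be sketched without a GUI: scale by 2^16 and round to the nearest step. The function names to_q16_16 and from_q16_16 are hypothetical, and Python's built-in round is used as the rounding rule, which is an assumption (it rounds exact ties to even).

```python
def to_q16_16(x):
    """Convert a real number to a signed Q16.16 value, returned as a 32bit pattern."""
    raw = round(x * 65536)                       # scale by 2^16, round to nearest step
    if not -0x80000000 <= raw <= 0x7FFFFFFF:
        raise OverflowError("value does not fit in signed Q16.16")
    return raw & 0xFFFFFFFF                      # 2's complement bit pattern

def from_q16_16(raw):
    """Convert a 32bit Q16.16 pattern back to a float."""
    raw &= 0xFFFFFFFF
    if raw & 0x80000000:                         # negative value
        raw -= 1 << 32
    return raw / 65536

if __name__ == "__main__":
    raw = to_q16_16(123.45)
    print(f"Hex:     0x{raw:08X}")               # 0x007B7333
    print(f"Decimal: {from_q16_16(raw)}")        # close to, but not exactly, 123.45
```

The difference between 123.45 and the value read back is the quantisation error, which is at most half of one step i.e. 2^-17.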

Testing

Testing software is always a joy, but it's doubly fun when it's written in assembler :). I confess, for some tasks I do just sit down and code, but even then you always hit that point where something does not work; in these cases you need to write code to test your code, rather than just making random guesses. Sooooo, for these subroutines we need a framework to pass arguments and test results. This will vary depending on the subroutine, but as an example consider the neg subroutines in the next section; these will be passed one argument and return one result sooo:

###################
# TEST CODE 16bit #
###################

start:
  jump test               # test code
fail:
  jump fail               # if simulation finished with address 1 = failed test
pass:
  jump pass               # if simulation finished with address 2 = passed test
trap:
  jump trap               # debug

data_ptr:
  .data data              # index into array
data_stop_pntr:
  .data data_end          # address of end of array

data:
  .data 0x0000            # test data 0=0
  .data 0x0000            # result
  .data 0xFFFF            # test data -1=1
  .data 0x0001            # result
  .data 0x0001            # test data 1=-1
  .data 0xFFFF            # result
  .data 0x00FF            # test data 255=-255
  .data 0xFF01            # result
data_end:
  .data 0                 # finished

test:
  load ra data_ptr        # read address of data 
  load ra (ra)            # read data
  store ra W              # store in working variable
  store ra 0xFFF          # print to screen

  call neg16              # process data

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y               # read code result
  store ra 0xFFF          # print to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # yes, inc pntr
  add ra 1
  store ra data_ptr
  subm ra data_stop_pntr  # have all tests been performed?
  jumpz pass              # yes, pass

  jump test               # no, repeat

Run this program for 100us. If all is fine the code will be trapped in an infinite loop at address 2; if a test fails it will enter an infinite loop at address 1. This is very easy to see in the simulation. However, if it does fail how can you tell which test failed? That's where the writing to address 0xFFF comes in. Within the VHDL test bench there is a monitor process (shown below) that waits for data to be written to address 0xFFF; when it is, this value is printed to the simulation terminal, as shown in figure 5.

monitor : PROCESS( ADDR, DATA_OUT, RAM_WR )
  VARIABLE L : line;
BEGIN
  if RAM_WR'event and RAM_WR='1'
  then
    if ADDR=x"FFF"
    then
      RESULT <= DATA_OUT;
      write(L, now);
      write(L, string'(" : DATA = "));
      write(L, DATA_OUT);
      write(L, string'(" = "));
      write(L, integer'image(to_integer(unsigned(DATA_OUT))));		  
      writeline(output, L);
    end if;
  end if;
END PROCESS;

Figure 5 : Simulation debug messages

Negate

To convert positive numbers into negative numbers and vice-versa we use 2's complement i.e. invert the bits and add 1. To invert each bit position we could use the XOR instruction; alternatively we can use the subtract instruction, as shown in figure 6. Subtracting the value to be converted from an all 1s value will invert each bit i.e. 1-1=0 and 1-0=1. Then add 1 to the result.

Figure 6 : 16-bit 2's complement
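The subtract-from-all-1s trick can be checked with a few lines of Python. This is just a sketch of the arithmetic, not of the processor: the function name neg16 is reused from the assembler routine for convenience, and the masking with 0xFFFF stands in for the 16bit register width.

```python
MASK16 = 0xFFFF

def neg16(w):
    """2's complement negate of a 16bit value: (0xFFFF - w) + 1, wrapped to 16 bits."""
    w &= MASK16
    inverted = MASK16 - w            # same result as w XOR 0xFFFF, no borrow can occur
    assert inverted == w ^ MASK16
    return (inverted + 1) & MASK16

if __name__ == "__main__":
    for w in (0x0000, 0xFFFF, 0x0001, 0x00FF):
        print(f"0x{w:04X} -> 0x{neg16(w):04X}")
```

One corner case worth knowing about: 0x8000 (-32768) negates to itself, as +32768 cannot be represented in a signed 16bit value.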

Test code for 16bit values stored in variables:

#################################
# TEST CODE : VARIABLES - 16bit #
#################################

start:
  jump test               # test code
fail:
  jump fail               # if simulation finished with address 1 = failed test
pass:
  jump pass               # if simulation finished with address 2 = passed test
trap:
  jump trap               # debug

data_ptr:
  .data data              # index into array
data_stop_pntr:
  .data data_end          # address of end of array

data:
  .data 0x0000            # test data 0=0
  .data 0x0000            # result
  .data 0xFFFF            # test data -1=1
  .data 0x0001            # result
  .data 0x0001            # test data 1=-1
  .data 0xFFFF            # result
  .data 0x00FF            # test data 255=-255
  .data 0xFF01            # result
data_end:
  .data 0                 # finished

test:
  load ra data_ptr        # read address of data 
  load ra (ra)            # read data
  store ra W              # store in working variable
  store ra 0xFFF          # print INPUT to screen

  call neg16              # process data

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y               # read code result
  store ra 0xFFF          # print OUTPUT to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # yes, inc pntr
  add ra 1
  store ra data_ptr
  subm ra data_stop_pntr  # have all tests been performed?
  jumpz pass              # yes, pass

  jump test               # no, repeat

###################
# NEG SUBROUTINES #
###################

# description: negate 16bit data
# result (16bit) = Y (16bit) <= -W (16bit) 
# input: W (operand)
# output: Y (result)

neg16:
  move ra 0xFF   # set RA to 0xFFFF
  subm ra W      # subtract data to invert
  add ra 1       # increment  
  store ra Y     # save result
  ret

Test code for 32bit values stored in variables:

#################################
# TEST CODE : VARIABLES - 32bit #
#################################

start:
  jump test               # test code
fail:
  jump fail               # if simulation finished with address 1 = failed test
pass:
  jump pass               # if simulation finished with address 2 = passed test
trap:
  jump trap               # debug

data_ptr:
  .data data              # index into array
data_stop_pntr:
  .data data_end          # address of end of array

data:
  .data 0x0000            # test data 0=0
  .data 0x0000
  .data 0x0000            # result
  .data 0x0000
  .data 0xFFFF            # test data -1=1
  .data 0xFFFF
  .data 0x0001            # result
  .data 0x0000
  .data 0x0001            # test data 1=-1
  .data 0x0000
  .data 0xFFFF            # result
  .data 0xFFFF
  .data 0x00FF            # test data 255=-255
  .data 0x0000
  .data 0xFF01            # result
  .data 0xFFFF
data_end:
  .data 0                 # finished

test:
  load ra data_ptr        # read address of data 
  load ra (ra)            # read data
  store ra W_LOW          # store in working variable
  store ra 0xFFF          # print INPUT LOW to screen
  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read data
  store ra W_HIGH         # store in working variable
  store ra 0xFFF          # print INPUT HIGH to screen

  call neg32              # process data

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_LOW           # read code result
  store ra 0xFFF          # print OUTPUT LOW to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_HIGH           # read code result
  store ra 0xFFF          # print OUTPUT HIGH to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # yes, inc pntr
  add ra 1
  store ra data_ptr
  subm ra data_stop_pntr  # have all tests been performed?
  jumpz pass              # yes, pass

  jump test               # no, repeat

###################
# NEG SUBROUTINES #
###################

# description: negate 32bit data
# result (32bit) = Y_HIGH (16bit) || Y_LOW (16bit) <= -( W_HIGH (16bit) || W_LOW (16bit) ) 
# input: W_HIGH (operand), W_LOW (operand)
# output: Y_HIGH (result), Y_LOW (result)

neg32:
  move ra 0xFF       # set RA to 0xFFFF
  subm ra W_HIGH     # subtract data to invert
  store ra Y_HIGH     
  move ra 0xFF       # set RA to all 1s i.e. 0xFF sign extended to 0xFFFF
  subm ra W_LOW      # subtract data to invert
  add ra 1
  store ra Y_LOW     # write low word
  move ra 0
  addmc ra Y_HIGH 
  store ra Y_HIGH    # write high word
  ret
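The 32bit version has one extra step: the +1 is applied to the low word and any carry out has to be added into the high word, which is what the addmc instruction is doing above. A minimal Python model of that word-by-word calculation is sketched below; neg32_words is a hypothetical name and the carry is modelled as bit 16 of the low word sum.

```python
MASK16 = 0xFFFF

def neg32_words(w_low, w_high):
    """Negate a 32bit value held as two 16bit words, returns (y_low, y_high)."""
    y_high = (MASK16 - (w_high & MASK16)) & MASK16   # invert high word
    total = (MASK16 - (w_low & MASK16)) + 1          # invert low word, add 1
    y_low = total & MASK16
    carry = total >> 16                              # 1 only when w_low was 0
    y_high = (y_high + carry) & MASK16               # propagate carry into high word
    return y_low, y_high

if __name__ == "__main__":
    for low, high in ((0x0000, 0x0000), (0xFFFF, 0xFFFF), (0x0001, 0x0000), (0x00FF, 0x0000)):
        y_low, y_high = neg32_words(low, high)
        print(f"0x{high:04X}{low:04X} -> 0x{y_high:04X}{y_low:04X}")
```

The four inputs are the same ones used in the test data, so the printed values can be compared against the expected results in the listing.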

Stack based solution, 16bit arguments and results are transferred between the caller and callee via the stack:

#############################
# TEST CODE : STACK - 16bit #
#############################

start:
  jump test               # test code
fail:
  jump fail               # if simulation finished with address 1 = failed test
pass:
  jump pass               # if simulation finished with address 2 = passed test
trap:
  jump trap               # debug

data_ptr:
  .data data              # index into array
data_stop_pntr:
  .data data_end          # address of end of array

data:
  .data 0x0000            # test data 0=0
  .data 0x0000            # result
  .data 0xFFFF            # test data -1=1
  .data 0x0001            # result
  .data 0x0001            # test data 1=-1
  .data 0xFFFF            # result
  .data 0x00FF            # test data 255=-255
  .data 0xFF01            # result
data_end:
  .data 0                 # finished

#  ------------  
#  |    RC    |    FP+1   INPUT 
#  ------------ 
#  |    FP    |
#  ------------ 

test:
  load ra data_ptr        # read address of data 
  load ra (ra)            # read data
  store ra 0xFFF          # print INPUT to screen
  move rc ra
  call push16

  call neg16_stack        # process data
  call pop16              # pop results off stack
 
  move ra rc
  store ra 0xFFF          # print OUTPUT to screen

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  sub rc ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # yes, inc pntr
  add ra 1
  store ra data_ptr
  subm ra data_stop_pntr  # have all tests been performed?
  jumpz pass              # yes, pass

  jump test               # no, repeat

###################
# NEG SUBROUTINES #
###################

# description: negate 16bit data
# result = FP+1 (16bit) <= -(FP+1) (16bit) 
# input: FP+1 (operand)
# output: FP+1 (result)
# caller clean-up : 0

neg16_stack:
  call stack_subroutine_enter  # push old FP to stack, update FP and SP
      
  load ra FRAME_POINTER        # read argument into RC
  add ra 1
  load rc (ra)                 # read data from stack FP+1
  move rd 0xFF
  sub rd rc                    # invert
  add rd 1                     # add 1
  store rd (ra)                # write data to stack FP+1

  call stack_subroutine_exit   # pop old FP off stack, update FP and SP  
  ret

Stack based solution, 32bit arguments and results are transferred between the caller and callee via the stack:

#############################
# TEST CODE : STACK - 32bit #
#############################

start:
  jump test               # test code
fail:
  jump fail               # if simulation finished with address 1 = failed test
pass:
  jump pass               # if simulation finished with address 2 = passed test
trap:
  jump trap               # debug

data_ptr:
  .data data              # index into array
data_stop_pntr:
  .data data_end          # address of end of array

data:
  .data 0x0000            # test data 0=0
  .data 0x0000
  .data 0x0000            # result
  .data 0x0000
  .data 0xFFFF            # test data -1=1
  .data 0xFFFF
  .data 0x0001            # result
  .data 0x0000
  .data 0x0001            # test data 1=-1
  .data 0x0000
  .data 0xFFFF            # result
  .data 0xFFFF
  .data 0x00FF            # test data 255=-255
  .data 0x0000
  .data 0xFF01            # result
  .data 0xFFFF
data_end:
  .data 0                 # finished

#  ------------  
#  |    RD    |    FP+2   INPUT HIGH
#  ------------ 
#  |    RC    |    FP+1   INPUT LOW
#  ------------ 
#  |    FP    |
#  ------------ 

test:
  load ra data_ptr        # read address of data 
  load ra (ra)            # read data
  store ra 0xFFF          # print INPUT LOW to screen
  move rc ra

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read data
  store ra 0xFFF          # print INPUT HIGH to screen
  move rd ra

  call push32             # push data onto stack
  call neg32_stack        # process data 
  call pop32              # pop results off stack

  move ra rc              # print OUTPUT LOW to screen
  store ra 0xFFF 
  move ra rd              # print OUTPUT HIGH to screen
  store ra 0xFFF 

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result low
  sub rc ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result high
  sub rd ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # yes, inc pntr
  add ra 1
  store ra data_ptr
  subm ra data_stop_pntr  # have all tests been performed?
  jumpz pass              # yes, pass

  jump test               # no, repeat

###################
# NEG SUBROUTINES #
###################

# description: negate 32bit data 
# result (32bit) = FP+2 (16bit) || FP+1 (16bit) <= -( FP+2 (16bit) || FP+1 (16bit) ) 
# input: (high operand) FP+2, FP+1 (low operand) 
# output: (high result) FP+2, FP+1 (low result)  
# caller clean-up : 0

neg32_stack:
  call stack_subroutine_enter  # push old FP to stack, update FP and SP  
      
  load ra FRAME_POINTER        # read arguments into RC and RD
  add ra 1
  load rc (ra)                 # read low data from stack FP+1
  add ra 1
  load rd (ra)                 # read high data from stack FP+2

  move ra 0xFF                 # set RA to 0xFFFF
  sub ra rc                    # subtract low data to invert
  move rc ra

  move ra 0xFF                 # set RA to 0xFFFF
  sub ra rd                    # subtract high data to invert
  move rd ra

  add rc 1                     # add 1
  addc rd 0

  load ra FRAME_POINTER        # save result to stack
  add ra 1
  store rc (ra)
  add ra 1
  store rd (ra)

  call stack_subroutine_exit   # pop old FP off stack, update FP and SP  
  ret

A key thing to note between these two implementations i.e. passing arguments / results using variables or the stack, is the significant difference in processing times, as shown in figure 7. If the processor is running at 10MHz, the 16bit variable implementation takes approx 3us, whilst the 16bit stack implementation takes approx 20us, sooo it is about seven times slower. This is the cost of supporting recursion / nested subroutines, so deciding when and where to use, or not use, the stack can increase processing performance.

Figure 7 : variable (top) and stack (bottom) processing times
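To put those times into clock cycles, the conversion is simply time multiplied by clock frequency. The short sketch below uses the approximate 3us and 20us figures quoted above; the function name cycles is hypothetical and the results are only as accurate as those rounded timings.

```python
def cycles(time_ns, clock_mhz):
    """Number of clock cycles in time_ns nanoseconds at clock_mhz MHz."""
    return time_ns * clock_mhz // 1000

if __name__ == "__main__":
    variable = cycles(3000, 10)     # approx 3us at 10MHz
    stack = cycles(20000, 10)       # approx 20us at 10MHz
    print(f"variable: {variable} cycles, stack: {stack} cycles")
    print(f"ratio: {stack / variable:.1f}")
```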

Addition

Figure 8 : 16-bit addition

Add two unsigned 16bit values to produce a 17bit result. Perhaps a little overkill to implement as a subroutine, but it felt odd to leave it out. As the result could be larger than 16bits i.e. max is 0xFFFF + 0xFFFF = 0x1FFFE, we need to use two memory locations, to store that extra 1bit. In this implementation I used the SHL instruction to move the carry flag bit into the LSB of the high result word i.e. if no carry is generated it shifts in a 0, if a carry is generated it shifts in a 1.
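The low word / carry split described here can be modelled in a couple of lines of Python. This is a sketch of the arithmetic only: the function name add16u is borrowed from the assembler routine, and the carry flag is modelled as bit 16 of the Python sum, which is then placed in the LSB of an otherwise zero high word.

```python
MASK16 = 0xFFFF

def add16u(w, x):
    """Unsigned 16bit + 16bit addition, returns (y_low, y_high) for a 17bit result."""
    total = (w & MASK16) + (x & MASK16)
    y_low = total & MASK16           # low 16 bits of the sum
    carry = total >> 16              # 0 or 1, models the CY flag
    y_high = (0 << 1) | carry        # cleared register, carry shifted into LSB
    return y_low, y_high

if __name__ == "__main__":
    for w, x in ((0x00FF, 0x00FF), (0xFFFF, 0x0001), (0xFFFF, 0xFFFF)):
        y_low, y_high = add16u(w, x)
        print(f"0x{w:04X} + 0x{x:04X} = 0x{y_high:04X} 0x{y_low:04X}")
```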

Test code for unsigned 16bit values stored in variables:

#################################
# TEST CODE : VARIABLES - 16bit #
#################################

start:
  jump test               # test code
fail:
  jump fail               # if simulation finished with address 1 = failed test
pass:
  jump pass               # if simulation finished with address 2 = passed test
trap:
  jump trap               # debug

data_ptr:
  .data data              # index into array
data_stop_pntr:
  .data data_end          # address of end of array

data:
  .data 0x0000            # test data 0+0=0
  .data 0x0000
  .data 0x0000            # result 
  .data 0x0000 
  .data 0x00FF            # test data 255+255=510
  .data 0x00FF
  .data 0x01FE            # result 
  .data 0x0000 
  .data 0x0FFF            # test data 4095+4095=8190
  .data 0x0FFF
  .data 0x1FFE            # result 
  .data 0x0000
  .data 0xFFFF            # test data 65535+1=65536 = 0x10000
  .data 0x0001
  .data 0x0000            # result
  .data 0x0001  
  .data 0xFFFF            # test data 65535+65535=131070 = 0x1FFFE
  .data 0xFFFF 
  .data 0xFFFE            # result 
  .data 0x0001       
data_end:
  .data 0  

test:
  load ra data_ptr        # read address of data 
  load ra (ra)            # read data
  store ra W              # store in working variable
  store ra 0xFFF          # print INPUT 0 to screen

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read data
  store ra X              # store in working variable
  store ra 0xFFF          # print INPUT 1 to screen

  call add16u             # process data

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_LOW           # read code result
  store ra 0xFFF          # print OUTPUT LOW to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_HIGH          # read code result
  store ra 0xFFF          # print OUTPUT HIGH to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # yes, inc pntr
  add ra 1
  store ra data_ptr
  subm ra data_stop_pntr  # have all tests been performed?
  jumpz pass              # yes, pass

  jump test               # no, repeat

###################
# ADD SUBROUTINES #
###################

# description: 16bit unsigned addition
# result (17bit) = Y_HIGH (1bit) || Y_LOW (16bit) <= W (16bit) + X (16bit)
# input: W (operand), X (operand)
# output: (high result) Y_HIGH (16bit) || Y_LOW (16bit) (low result)

add16u:
  load ra W            # read W into RA
  addm ra X            # RA = W + X
  store ra Y_LOW       # save low word result
  move ra 0            # clear RA
  shl ra               # shift left, move CY flag into LSB
  store ra Y_HIGH      # save high word result
  ret
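
To sanity check the expected values in the test data, the same calculation can be modelled on a PC. The sketch below is a minimal Python model, not simpleCPU code. The name add16u_model and the (y_high, y_low) return layout are my own assumptions, chosen to mirror the Y_HIGH and Y_LOW variables.

```python
MASK16 = 0xFFFF

def add16u_model(w, x):
    """Hypothetical model: return (y_high, y_low) for a 16bit unsigned add."""
    total = (w & MASK16) + (x & MASK16)  # at most 17 bits wide
    y_low = total & MASK16               # low 16 bits of the sum
    y_high = total >> 16                 # carry out, 0 or 1
    return y_high, y_low

# same operands as the assembly test data
for w, x in [(0x00FF, 0x00FF), (0xFFFF, 0x0001), (0xFFFF, 0xFFFF)]:
    y_high, y_low = add16u_model(w, x)
    print(f"0x{w:04X} + 0x{x:04X} -> Y_HIGH=0x{y_high:04X} Y_LOW=0x{y_low:04X}")
```

The shift by 16 plays the same role as the SHL of the carry flag in the assembly version i.e. it isolates the 17th bit.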

A complexity comes when we consider signed values e.g. 0xFFFF is -1 in a signed representation, so -1 + -1 = -2, which as a 32bit value is 0xFFFFFFFE. However, if you use the previous subroutine you will produce the value 0x0001FFFE. Therefore, we need different subroutines for signed and unsigned values i.e. carry operations are processed differently, or to put it another way, if we wish to keep the extra bit generated by the signed addition, we need to sign extend the 16bit values into 32bit values first. RULE NUMBER 1 of signed arithmetic : the final carry is ALWAYS ignored. Soooo our 16bit signed addition turns into a 32bit addition.

Test code for signed 16bit values stored in variables:

#################################
# TEST CODE : VARIABLES - 16bit #
#################################

start:
  jump test               # test code
fail:
  jump fail               # if simulation finished with address 1 = failed test
pass:
  jump pass               # if simulation finished with address 2 = passed test
trap:
  jump trap               # debug

data_ptr:
  .data data              # index into array
data_stop_pntr:
  .data data_end          # address of end of array

data:
  .data 0x0000            # test data 0+0=0
  .data 0x0000
  .data 0x0000            # result 
  .data 0x0000 
  .data 0x00FF            # test data 255+255=510
  .data 0x00FF
  .data 0x01FE            # result 
  .data 0x0000 
  .data 0x0FFF            # test data 4095+4095=8190
  .data 0x0FFF
  .data 0x1FFE            # result 
  .data 0x0000
  .data 0xFFFF            # test data -1+1=0 = 0x0000
  .data 0x0001
  .data 0x0000            # result
  .data 0x0000
  .data 0xFFFF            # test data -1+-1=-2 = 0xFFFFFFFE
  .data 0xFFFF
  .data 0xFFFE            # result
  .data 0xFFFF
  .data 0x8000            # test data −32768+−32768=−65536 = 0x8000+0x8000=0xFFFF0000
  .data 0x8000 
  .data 0x0000            # result 
  .data 0xFFFF       
data_end:
  .data 0  

test:
  load ra data_ptr        # read address of data 
  load ra (ra)            # read data
  store ra W              # store in working variable
  store ra 0xFFF          # print INPUT 0 LOW to screen

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read data
  store ra X              # store in working variable
  store ra 0xFFF          # print INPUT 1 to screen

  call add16              # process data

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_LOW           # read code result
  store ra 0xFFF          # print OUTPUT LOW to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_HIGH          # read code result
  store ra 0xFFF          # print OUTPUT HIGH to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # yes, inc pntr
  add ra 1
  store ra data_ptr
  subm ra data_stop_pntr  # have all tests been performed?
  jumpz pass              # yes, pass

  jump test               # no, repeat

###################
# ADD SUBROUTINES #
###################

# description: 16bit signed addition
# result (32bit) = Y_HIGH (16bit) || Y_LOW (16bit) <= ((W(15))16 || W (16bit)) + 
#                                                     ((X(15))16 || X (16bit))
# input: W_HIGH || W_LOW (operand), X_HIGH || X_LOW (operand)
# output: (high result) Y_HIGH (16bit) || Y_LOW (16bit) (low result)

add16:
  move ra 0            # zero high words           
  store ra W_HIGH
  store ra X_HIGH

  load ra W            # read W
  rol ra
  and ra 1             # is MSB set?
  jumpz add16_x

  move ra 0xFF         # sign extend with 1s
  store ra W_HIGH

add16_x:
  load ra X            # read X 
  rol ra
  and ra 1             # is MSB set?
  jumpz add16_calc

  move ra 0xFF         # sign extend with 1s
  store ra X_HIGH

add16_calc:
  load ra W_LOW
  addm ra X_LOW        # RA = W_LOW + X_LOW
  store ra Y_LOW       # save low word result
  load ra W_HIGH
  addmc ra X_HIGH      # RA = W_HIGH + X_HIGH + C
  store ra Y_HIGH      # save high word result
  ret
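
The sign extension step is easier to see in a high level model. The sketch below is a Python approximation of the signed add, assuming two's complement 16bit inputs. The names sign_extend16 and add16s_model are hypothetical and not part of the simpleCPU code.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def sign_extend16(v):
    """Copy bit 15 into bits 16 to 31, giving a 32bit bit pattern."""
    v &= MASK16
    return (v | 0xFFFF0000) if (v & 0x8000) else v

def add16s_model(w, x):
    """Hypothetical model: return (y_high, y_low) for a signed 16bit add."""
    total = (sign_extend16(w) + sign_extend16(x)) & MASK32  # final carry ignored
    return total >> 16, total & MASK16

# same operands as the signed assembly test data
for w, x in [(0xFFFF, 0x0001), (0xFFFF, 0xFFFF), (0x8000, 0x8000)]:
    y_high, y_low = add16s_model(w, x)
    print(f"0x{w:04X} + 0x{x:04X} -> 0x{y_high:04X}{y_low:04X}")
```

Masking the sum to 32 bits is the model's version of RULE NUMBER 1 i.e. the carry out of the top bit is simply thrown away.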

Test code for unsigned 32bit values stored in variables:

#################################
# TEST CODE : VARIABLES - 32bit #
#################################

start:
  jump test               # test code
fail:
  jump fail               # if simulation finished with address 1 = failed test
pass:
  jump pass               # if simulation finished with address 2 = passed test
trap:
  jump trap               # debug

data_ptr:
  .data data              # index into array
data_stop_pntr:
  .data data_end          # address of end of array

data:
  .data 0x0000            # test data 0+0=0
  .data 0x0000
  .data 0x0000
  .data 0x0000
  .data 0x0000            # result 
  .data 0x0000
  .data 0x0000
  .data 0xFFFF            # test data 65535+65535=131070 = 0x1FFFE
  .data 0x0000
  .data 0xFFFF 
  .data 0x0000
  .data 0xFFFE            # result
  .data 0x0001
  .data 0x0000
  .data 0xFFFF            # test data 4,294,967,295+1 = 4,294,967,296
  .data 0xFFFF
  .data 0x0001
  .data 0x0000 
  .data 0x0000            # result 
  .data 0x0000 
  .data 0x0001
data_end:
  .data 0                 # finished

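test:
  load ra data_ptr        # read address of data 
  load ra (ra)            # read data
  store ra W_LOW          # store in working variable
  store ra 0xFFF          # print INPUT 0 LOW to screen

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read data
  store ra W_HIGH         # store in working variable
  store ra 0xFFF          # print INPUT 0 HIGH to screen

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read data
  store ra X_LOW          # store in working variable
  store ra 0xFFF          # print INPUT 1 LOW to screen

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read data
  store ra X_HIGH         # store in working variable
  store ra 0xFFF          # print INPUT 1 HIGH to screen

  call add32u             # process data

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_LOW           # read code result
  store ra 0xFFF          # print OUTPUT LOW to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_HIGH          # read code result
  store ra 0xFFF          # print OUTPUT MID to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Z_LOW           # read code result
  store ra 0xFFF          # print OUTPUT HIGH to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # yes, inc pntr
  add ra 1
  store ra data_ptr
  subm ra data_stop_pntr  # have all tests been performed?
  jumpz pass              # yes, pass

  jump test               # no, repeat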

###################
# ADD SUBROUTINES #
###################

# description: 32bit addition
# result (33bit) = Z_LOW (1bit) || Y_HIGH (16bit) || Y_LOW (16bit) <= (W_HIGH (16bit) || W_LOW (16bit)) + 
#                                                                     (X_HIGH (16bit) || X_LOW (16bit))   
# input: W_HIGH || W_LOW (operand), X_HIGH || X_LOW (operand)
# output: (high result) Z_LOW (1bit) || Y_HIGH (16bit) || Y_LOW (16bit) (low result)

add32u:
  load ra W_LOW       # read W_LOW into RA
  addm ra X_LOW       # RA = W_LOW + X_LOW
  store ra Y_LOW      # save low word result
  load ra W_HIGH      # read W_HIGH into RA
  addmc ra X_HIGH     # RA = W_HIGH + X_HIGH + C
  store ra Y_HIGH     # save high word result
  move ra 0           # clear RA
  shl ra              # shift left, move CY flag into LSB
  store ra Z_LOW      # save carry bit result
  ret
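
The way the carry ripples from the low word into the high word, and then out into Z_LOW, can be sketched in Python. This is a model under my own assumptions, not the simpleCPU implementation. The name add32u_model and the argument order are hypothetical.

```python
MASK16 = 0xFFFF

def add32u_model(w_high, w_low, x_high, x_low):
    """Hypothetical model: return (z_low, y_high, y_low) for a 32bit unsigned add."""
    low = w_low + x_low
    y_low = low & MASK16
    carry = low >> 16                    # carry out of the low word
    high = w_high + x_high + carry       # the add-with-carry step
    y_high = high & MASK16
    z_low = high >> 16                   # 33rd bit of the result
    return z_low, y_high, y_low

# same operands as the assembly test data
print(add32u_model(0x0000, 0xFFFF, 0x0000, 0xFFFF))   # 65535 + 65535
print(add32u_model(0xFFFF, 0xFFFF, 0x0000, 0x0001))   # 4,294,967,295 + 1
```

Note the high word sum includes the carry before its own carry out is taken, so a carry caused by either the operands or the incoming carry ends up in Z_LOW.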

The equivalent 32bit signed addition subroutine would again need the W and X variables to be sign extended i.e. from 32bit to 48bit or 64bit values. For the intended games console application i am not sure such a subroutine is needed, sooo for the moment i am not going to implement it.

Stack based solution, 16bit unsigned arguments and results are transferred between the caller and callee via the stack:

#############################
# TEST CODE : STACK - 16bit #
#############################

start:
  jump test               # test code
fail:
  jump fail               # if simulation finished with address 1 = failed test
pass:
  jump pass               # if simulation finished with address 2 = passed test
trap:
  jump trap               # debug

data_ptr:
  .data data              # index into array
data_stop_pntr:
  .data data_end          # address of end of array

data:
  .data 0x0000            # test data 0+0=0
  .data 0x0000
  .data 0x0000            # result 
  .data 0x0000 
  .data 0x00FF            # test data 255+255=510
  .data 0x00FF
  .data 0x01FE            # result 
  .data 0x0000 
  .data 0x0FFF            # test data 4095+4095=8190
  .data 0x0FFF
  .data 0x1FFE            # result 
  .data 0x0000
  .data 0xFFFF            # test data 65535+1=65536 = 0x10000
  .data 0x0001
  .data 0x0000            # result
  .data 0x0001  
  .data 0xFFFF            # test data 65535+65535=131070 = 0x1FFFE
  .data 0xFFFF 
  .data 0xFFFE            # result 
  .data 0x0001       
data_end:
  .data 0  

#  ------------ 
#  |    RC    |    FP+2   INPUT 0
#  ------------ 
#  |    RC    |    FP+1   INPUT 1
#  ------------ 
#  |    FP    |
#  ------------ 

test:
  load ra data_ptr          # read address of data 
  load ra (ra)              # read data
  store ra 0xFFF            # print INPUT 0 to screen
  move rc ra
  call push16               # push data to stack

  load ra data_ptr          # read address of data 
  add ra 1                  # increment
  store ra data_ptr
  load ra (ra)              # read data
  store ra 0xFFF            # print INPUT 1 to screen
  move rc ra
  call push16               # push data to stack

  call add16u_stack         # process data

  call pop16                # pop result off stack
  move ra rc                # display OUTPUT LOW result in simulation
  store ra 0xFFF 

  load ra data_ptr          # read address of data 
  add ra 1                  # increment
  store ra data_ptr
  load ra (ra)              # read test result
  sub rc ra                 # equal?
  jumpnz fail               # no, stop fail

  call pop16                # pop result off stack
  move ra rc                # display OUTPUT HIGH result in simulation 
  store ra 0xFFF  

  load ra data_ptr          # read address of data 
  add ra 1                  # increment
  store ra data_ptr
  load ra (ra)              # read test result
  sub rc ra                 # equal?
  jumpnz fail               # no, stop fail

  load ra data_ptr          # yes, inc pntr
  add ra 1
  store ra data_ptr
  subm ra data_stop_pntr    # have all tests been performed?
  jumpz pass                # yes, pass

  jump test                 # no, repeat

###################
# ADD SUBROUTINES #
###################

# description: 16bit unsigned addition 
# result (17bit) = FP+2 (1bit) || FP+1 (16bit) <= FP+1 (16bit) + FP+2 (16bit)
# input: FP+1 (operand), FP+2 (operand)
# output: (high result) FP+2 || FP+1 (low result)
# caller clean-up : 0

add16u_stack:
  call stack_subroutine_enter 
      
  load ra FRAME_POINTER        # read arguments into RC and RD
  add ra 1
  load rc (ra)                 # data
  add ra 1
  load rd (ra)                 # data

  add rd rc                    # add data               
  move rc rd
  move rd 0                    # carry bit
  shl rd 

  load ra FRAME_POINTER
  add ra 2
  store rd (ra)                # save high word result
  sub ra 1
  store rc (ra)                # save low word result

  call stack_subroutine_exit
  ret
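
The calling convention used here, where the two arguments on the stack are overwritten by the two result words, can be modelled with a Python list. This is only a sketch of the data movement. It leaves out the frame pointer save / restore done by stack_subroutine_enter and stack_subroutine_exit, and the name add16u_stack_model is hypothetical.

```python
MASK16 = 0xFFFF

def add16u_stack_model(stack):
    """Hypothetical model: top two stack entries are the operands.

    stack[-1] stands in for FP+1 (INPUT 1), stack[-2] for FP+2 (INPUT 0).
    Both are overwritten with the result, as in the assembly version.
    """
    total = stack[-1] + stack[-2]
    stack[-1] = total & MASK16   # low word replaces FP+1
    stack[-2] = total >> 16      # carry bit replaces FP+2

stack = []
stack.append(0xFFFF)             # push INPUT 0
stack.append(0xFFFF)             # push INPUT 1
add16u_stack_model(stack)
low = stack.pop()                # first pop returns the low word
high = stack.pop()               # second pop returns the carry
print(hex(high), hex(low))
```

As two words go in and two words come out, the stack is balanced after the pops, which matches the "caller clean-up : 0" note in the subroutine header.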

Stack based solution, 32bit unsigned arguments and results are transferred between the caller and callee via the stack:

#############################
# TEST CODE : STACK - 32bit #
#############################

start:
  jump test                   # test code
fail: 
  jump fail                   # if simulation finished with address 1 = failed test
pass:
  jump pass                   # if simulation finished with address 2 = passed test
trap:
  jump trap                   # debug

data_ptr:
  .data data                  # index into array
data_stop_pntr:
  .data data_end              # address of end of array

data:
  .data 0x0000                # test data 0+0=0
  .data 0x0000
  .data 0x0000
  .data 0x0000
  .data 0x0000                # result 
  .data 0x0000
  .data 0x0000
  .data 0xFFFF                # test data 65535+65535=131070 = 0x1FFFE
  .data 0x0000
  .data 0xFFFF 
  .data 0x0000
  .data 0xFFFE                # result
  .data 0x0001
  .data 0x0000
  .data 0xFFFF                # test data 4,294,967,295+1 = 4,294,967,296
  .data 0xFFFF
  .data 0x0001
  .data 0x0000 
  .data 0x0000                # result 
  .data 0x0000 
  .data 0x0001
data_end:
  .data 0                     # finished

#  ------------  
#  |    RD    |    FP+4   INPUT 0 HIGH
#  ------------ 
#  |    RC    |    FP+3   INPUT 0 LOW
#  ------------ 
#  |    RD    |    FP+2   INPUT 1 HIGH
#  ------------ 
#  |    RC    |    FP+1   INPUT 1 LOW
#  ------------ 
#  |    FP    |
#  ------------ 

test:
  load ra data_ptr            # read address of data 
  load ra (ra)                # read data
  store ra 0xFFF              # print INPUT 0 LOW to screen
  move rc ra

  load ra data_ptr            # read address of data 
  add ra 1                    # increment
  store ra data_ptr
  load ra (ra)                # read data
  store ra 0xFFF              # print INPUT 0 HIGH to screen
  move rd ra

  call push32                 # push data onto stack

  load ra data_ptr            # read address of data 
  add ra 1                    # increment
  store ra data_ptr
  load ra (ra)                # read data
  store ra 0xFFF              # print INPUT 1 LOW to screen
  move rc ra

  load ra data_ptr            # read address of data 
  add ra 1                    # increment
  store ra data_ptr
  load ra (ra)                # read data
  store ra 0xFFF              # print INPUT 1 HIGH to screen
  move rd ra

  call push32                 # push data onto stack

  call add32u_stack           # process data

  call pop32                  # pop result off stack
  move ra rc
  store ra 0xFFF              # print OUTPUT LOW to screen
  move ra rd
  store ra 0xFFF              # print OUTPUT MID to screen
 
  load ra data_ptr            # read address of data 
  add ra 1                    # increment
  store ra data_ptr
  load ra (ra)                # read test result
  sub rc ra                   # equal?
  jumpnz fail                 # no, stop fail

  load ra data_ptr            # read address of data 
  add ra 1                    # increment
  store ra data_ptr
  load ra (ra)                # read test result
  sub rd ra                   # equal?
  jumpnz fail                 # no, stop fail

  call pop16                  # pop result off stack
  move ra rc                  # print OUTPUT HIGH to screen
  store ra 0xFFF 

  load ra data_ptr            # read address of data 
  add ra 1                    # increment
  store ra data_ptr
  load ra (ra)                # read test result
  sub rc ra                   # equal?
  jumpnz fail                 # no, stop fail

  load ra data_ptr            # yes, inc pntr
  add ra 1
  store ra data_ptr
  subm ra data_stop_pntr      # have all tests been performed?
  jumpz pass                  # yes, pass

  move rc 1                   # remove old argument 
  call stack_remove_arguments
  jump test                   # no, repeat

###################
# ADD SUBROUTINES #
###################

# description: 32bit unsigned addition 
# result (33bit) = FP+3 (1bit) || FP+2 (16bit) || FP+1 (16bit) <= (FP+2 (16bit) || FP+1 (16bit)) +
#                                                                 (FP+4 (16bit) || FP+3 (16bit))  
# input: FP+2 || FP+1 (operand), FP+4 || FP+3 (operand)
# output: (high result) FP+3 (1bit) || FP+2 (16bit) || FP+1 (16bit) (low result)
# caller clean-up : 1

add32u_stack:
  call stack_subroutine_enter 
   
  load ra FRAME_POINTER        # read low data
  add ra 3
  load rc (ra)             
  sub ra 2
  load rd (ra)

  add rc rd                    # add data
  store rc (ra)
  move rb 0                    # carry bit
  shl rb 

  add ra 3
  load rc (ra)  
  sub ra 2
  load rd (ra)

  add rc rd                    # add data
  add rc rb          
  store rc (ra)
  move rb 0                    # carry bit
  shl rb 
  
  add ra 1
  store rb (ra)

  call stack_subroutine_exit
  ret

I have not implemented a stack based 16bit signed addition subroutine. The thinking here was that if you do need this subroutine it can be implemented using the 32bit unsigned stack based add subroutine i.e. argument sign extension is performed by the caller code and the final carry is ignored.

Addition Fixed-point

Figure 9 : fixed point addition

The previous add16 and add32 subroutines can also be used to perform fixed-point calculations. A fixed-point number is a binary number with an imaginary decimal point i.e. a user defined decimal point, as shown in figure 9. Remember, the decimal point is not represented in hardware. In this example we have Q16.16 representations of the values 123.456 and 89.012. The fractional parts of these values cannot be written exactly as a sum of powers of 2 (1/2, 1/4, 1/8 ...), therefore, there will be a small rounding error when they are converted into a binary representation. These binary values are then just added together using the signed or unsigned add subroutines. Note, when processing integer and fixed-point values together you must make sure you align the decimal points, consider the signed Q4.4 fixed-point and integer values below:

15.9375 = 01111.1111
10.5    = 01010.1000
-10.5   = 10101.1000
-16     = 10000.0000

16      = 10000.
10      = 1010.
5       = 101.

10.5 + -10.5 = 01010.1000
               10101.1000
               ----------
               00000.0000  == 0
               ----------
              111111

10.5 + 5 =     01010.1000
               00101.0000   -- Aligned decimal points
               ----------
               01111.1000  == 15.5
               ----------

10.5 + -16 =   01010.1000
               10000.0000
               ----------
               11010.1000  == -(00101.0111 + 1) = -(00101.1000) = -5.5
               ----------

Note, as for integer representations, if you represent a signed value you lose the MSB i.e. it becomes the sign bit, therefore, the positive range is halved. To negate a signed fixed-point value, ignore the decimal point, invert and add 1 as normal.
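
To show that fixed-point addition really is just integer addition, here is a small Python sketch of the Q16.16 case. The helper names to_q16_16, from_q16_16 and add_q16_16 are my own, and rounding to the nearest value on conversion is an assumption, the figure may truncate instead.

```python
FRAC_BITS = 16
SCALE = 1 << FRAC_BITS
MASK32 = 0xFFFFFFFF

def to_q16_16(value):
    """Convert a real number to a 32bit Q16.16 bit pattern (rounded)."""
    return round(value * SCALE) & MASK32

def from_q16_16(raw):
    """Convert a 32bit Q16.16 bit pattern back to a float (signed)."""
    if raw & 0x80000000:          # sign bit set, so value is negative
        raw -= 1 << 32
    return raw / SCALE

def add_q16_16(a, b):
    """Plain 32bit integer add, final carry ignored."""
    return (a + b) & MASK32

a = to_q16_16(123.456)
b = to_q16_16(89.012)
print(hex(a), hex(b), from_q16_16(add_q16_16(a, b)))   # close to 212.468

# the integer 5 must be shifted up so its decimal point lines up
print(from_q16_16(add_q16_16(to_q16_16(10.5), 5 << FRAC_BITS)))

# signed values need no special handling in the add itself
print(from_q16_16(add_q16_16(to_q16_16(10.5), to_q16_16(-16))))
```

The first result is only close to 212.468 because of the rounding error introduced when the two inputs were converted. The 10.5 examples are exact as .5 is a power of 2.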

Accumulation

A variation on a theme here: rather than adding two 16bit numbers together we add one 16bit number to an accumulator i.e. a running total, so these subroutines are only passed one argument. I decided not to implement a stack based version of this function as it wouldn't make sense i.e. where would the accumulator be stored? A stack is a dynamic thing used to store arguments and results. The caller code could push space for this value onto the stack, but this felt a little odd, so i decided to just implement the variable based solution.

Test code for 16bit values stored in variables:

#################################
# TEST CODE : VARIABLES - 16bit #
#################################

start:
  jump test               # test code
fail:
  jump fail               # if simulation finished with address 1 = failed test
pass:
  jump pass               # if simulation finished with address 2 = passed test
trap:
  jump trap               # debug

data_ptr:
  .data data              # index into array
data_stop_pntr:
  .data data_end          # address of end of array

data:
  .data 0x000F            # test data 0+F       = F
  .data 0x00FF            #           F+FF      = 10E
  .data 0x0FFF            #           10E+FFF   = 110D 
  .data 0xFFFF            #           110D+FFFF = 1110C
  .data 0x110C            # result
  .data 0x0001       
data_end:
  .data 0                 # finished

test:
  load ra data_ptr        # read address of data 
  load ra (ra)            # read data
  store ra W              # store in working variable
  store ra 0xFFF          # print INPUT 0 to screen

  call acc16              # process data (F)

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read data
  store ra W              # store in working variable
  store ra 0xFFF          # print INPUT 1 to screen

  call acc16              # process data (FF)

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read data
  store ra W              # store in working variable
  store ra 0xFFF          # print INPUT 2 to screen

  call acc16              # process data (FFF)

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read data
  store ra W              # store in working variable
  store ra 0xFFF          # print INPUT 3 to screen

  call acc16              # process data (FFFF)

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_LOW           # read code result
  store ra 0xFFF          # print OUTPUT to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_HIGH          # read code result
  store ra 0xFFF          # print OUTPUT to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # yes, inc pntr
  add ra 1
  store ra data_ptr
  subm ra data_stop_pntr  # have all tests been performed?
  jumpz pass              # yes, pass

  jump test               # no, repeat

###################
# ACC SUBROUTINES #
###################
 
# description: accumulate 16bit
# result (32bit) =  Y_HIGH (16bit) || Y_LOW (16bit) <= (Y_HIGH (16bit) || Y_LOW (16bit)) + 
#                                                       ((0)16 (16bit)  || W (16bit)) 
# input: W (16bit) (operand)
# output: (high result) Y_HIGH (16bit) || Y_LOW (16bit) (low result)

acc16:
  load ra Y_LOW      # read Y_LOW
  addm ra W          # 
  store ra Y_LOW     # save low word result
  move ra 0          # clear RA
  addmc ra Y_HIGH    # add Y_HIGH, add in carry
  store ra Y_HIGH    # save high word result
  ret
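
The running total behaviour can be checked against the test data with a short Python model. This is a sketch only. The class name Acc16Model is hypothetical, and it zeroes the accumulator explicitly, whereas the assembly version relies on Y_LOW and Y_HIGH starting at 0.

```python
MASK16 = 0xFFFF

class Acc16Model:
    """Hypothetical model of a 32bit accumulator fed with 16bit values."""

    def __init__(self):
        self.y_low = 0
        self.y_high = 0

    def add(self, w):
        low = self.y_low + (w & MASK16)
        self.y_low = low & MASK16
        # add the carry out of the low word into the high word
        self.y_high = (self.y_high + (low >> 16)) & MASK16

acc = Acc16Model()
for w in (0x000F, 0x00FF, 0x0FFF, 0xFFFF):     # same inputs as the test data
    acc.add(w)
    print(f"+0x{w:04X} -> 0x{acc.y_high:04X}{acc.y_low:04X}")
```

Only the last addition overflows the low word, which is why the expected Y_HIGH result in the test data is 0x0001.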

Test code for 32bit values stored in variables:

#################################
# TEST CODE : VARIABLES - 32bit #
#################################

start:
  jump test               # test code
fail:
  jump fail               # if simulation finished with address 1 = failed test
pass:
  jump pass               # if simulation finished with address 2 = passed test
trap:
  jump trap               # debug

data_ptr:
  .data data              # index into array
data_stop_pntr:
  .data data_end          # address of end of array

data:
  .data 0xFFFF            # test data 0+FFFFF           = FFFFF
  .data 0x000F             
  .data 0xFFFF            #           FFFFF+FFFFFF      = 10FFFFE
  .data 0x00FF            
  .data 0xFFFF            #           10FFFFE+FFFFFFF   = 110FFFFD 
  .data 0x0FFF
  .data 0xFFFF            #           110FFFFD+FFFFFFFF = 1110FFFFC
  .data 0xFFFF 
  .data 0xFFFC            # result
  .data 0x110F
  .data 0x0001 
  .data 0x0000     
data_end:
  .data 0                 # finished

test:
  load ra data_ptr        # read address of data 
  load ra (ra)            # read data
  store ra W_LOW          # store in working variable
  store ra 0xFFF          # print INPUT 0 LOW to screen

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read data
  store ra W_HIGH         # store in working variable
  store ra 0xFFF          # print INPUT 0 HIGH to screen

  call acc32              # process data (FFFFF)

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read data
  store ra W_LOW          # store in working variable
  store ra 0xFFF          # print INPUT 1 LOW to screen

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read data
  store ra W_HIGH         # store in working variable
  store ra 0xFFF          # print INPUT 1 HIGH to screen

  call acc32              # process data (FFFFFF)

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read data
  store ra W_LOW          # store in working variable
  store ra 0xFFF          # print INPUT 2 LOW to screen

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read data
  store ra W_HIGH         # store in working variable
  store ra 0xFFF          # print INPUT 2 HIGH to screen

  call acc32              # process data (FFFFFFF)

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read data
  store ra W_LOW          # store in working variable
  store ra 0xFFF          # print INPUT 3 LOW to screen

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read data
  store ra W_HIGH         # store in working variable
  store ra 0xFFF          # print INPUT 3 HIGH to screen

  call acc32              # process data (FFFFFFFF)

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_LOW           # read code result
  store ra 0xFFF          # print OUTPUT LOW to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_HIGH          # read code result
  store ra 0xFFF          # print OUTPUT MID LOW to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Z_LOW           # read code result
  store ra 0xFFF          # print OUTPUT MID HIGH to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Z_HIGH          # read code result
  store ra 0xFFF          # print OUTPUT HIGH to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # yes, inc pntr
  add ra 1
  store ra data_ptr
  subm ra data_stop_pntr  # have all tests been performed?
  jumpz pass              # yes, pass

  jump test               # no, repeat

###################
# ACC SUBROUTINES #
###################

# description: accumulate 32bit 
# result (64bit) =  Z_HIGH (16bit) || Z_LOW (16bit) || Y_HIGH (16bit) || Y_LOW (16bit) <= (Z_HIGH (16bit) || Z_LOW (16bit) || Y_HIGH (16bit) || Y_LOW (16bit)) +
#                                                                                         ((0)16 (16bit)  || (0)16 (16bit) || W_HIGH (16bit) || W_LOW (16bit))
# input: W_HIGH, W_LOW (operand)
# output: (high result) Z_HIGH (16bit) || Z_LOW (16bit) || Y_HIGH (16bit) || Y_LOW (16bit) (low result)

acc32:
  load ra Y_LOW      # read Y_LOW
  addm ra W_LOW      # 
  store ra Y_LOW     # save result
  load ra Y_HIGH     # read Y_HIGH
  addmc ra W_HIGH    # 
  store ra Y_HIGH    # save result
  move ra 0          #
  addmc ra Z_LOW     # 
  store ra Z_LOW     # save result
  move ra 0          #
  addmc ra Z_HIGH    # 
  store ra Z_HIGH    # save result
  ret
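
The four word carry chain in acc32 is the same step repeated, so it can be modelled as a loop over words. The Python sketch below assumes the words are held least significant first. The name acc_words_model is hypothetical and the list layout is my own choice, not something the simpleCPU code uses.

```python
MASK16 = 0xFFFF

def acc_words_model(acc, operand):
    """Add operand words into acc in place, least significant word first."""
    carry = 0
    for i in range(len(acc)):
        w = operand[i] if i < len(operand) else 0   # zero extend the operand
        total = acc[i] + w + carry
        acc[i] = total & MASK16
        carry = total >> 16
    return carry   # carry out of the top word, discarded by the caller here

acc = [0, 0, 0, 0]   # Y_LOW, Y_HIGH, Z_LOW, Z_HIGH
inputs = [(0xFFFF, 0x000F), (0xFFFF, 0x00FF), (0xFFFF, 0x0FFF), (0xFFFF, 0xFFFF)]
for w_low, w_high in inputs:                  # same inputs as the test data
    acc_words_model(acc, [w_low, w_high])
print([f"0x{word:04X}" for word in acc])
```

Zero extending the operand in the loop corresponds to the "move ra 0" before each of the Z_LOW and Z_HIGH additions in the assembly version.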

Subtraction

Figure 10 : 16-bit subtraction - positive result

Figure 11 : 16-bit subtraction - negative result

Subtract two unsigned 16bit values to produce a 16bit result. Again perhaps a little overkill to implement as a subroutine, but included for completeness. The joy of subtracting unsigned values is that the magnitude of the result is never larger than the largest operand, so no carries to worry about, no complexities of capturing that extra bit. However, you can generate a negative result, as shown in figure 11, sooo you do need to consider this when allocating variables i.e. recognise that you will be working with signed 16bit values, a 15bit number range plus a sign bit. This is also an important point to remember if this result is passed to other functions e.g. multiply or divide, as these are unsigned only implementations.

Note, showing how the borrows work in figures 10 and 11 was a little tricky, particularly when looking at hexadecimal i.e. borrowing 16 rather than 2. Sooo the numbers above the hex digits are that column's value after the borrow, hopefully that makes sense :). The hex, dec and bin representations and the results of these calculations are shown below:

0x567  = 0000 0101 0110 0111 = 1383
0x1234 = 0001 0010 0011 0100 = 4660

0x1234 – 0x567 = 4660 – 1383 = 3277 = 0xCCD
0x567 – 0x1234 = 1383 – 4660 = −3277 = 0xF333

3277 = 0xCCD = 0000 1100 1100 1101
               1111 0011 0011 0010
-3277 =        1111 0011 0011 0011 = 0xF333

Hopefully the above makes sense and gives a better understanding of the results from the two examples, the calculations and the conversion of the result -3277 into a signed binary value. As always, to convert between a positive value and its negative just invert and add 1 :).
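
The two worked examples above can be checked with a few lines of Python. This is a sketch that models a 16bit register by masking with 0xFFFF; the helper names are mine and are not part of the simpleCPU code.

```python
MASK16 = 0xFFFF


def sub16(w, x):
    """16bit subtraction with wrap-around, as a 16bit register would hold it."""
    return (w - x) & MASK16


def negate16(value):
    """Two's complement negate: invert every bit, then add 1."""
    return ((value ^ MASK16) + 1) & MASK16


def as_signed16(value):
    """Interpret a 16bit pattern as a signed number."""
    return value - 0x10000 if value & 0x8000 else value


print(hex(sub16(0x1234, 0x567)))          # 0xccd
print(hex(sub16(0x567, 0x1234)))          # 0xf333
print(as_signed16(sub16(0x567, 0x1234)))  # -3277
print(hex(negate16(0xCCD)))               # 0xf333
```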

Test code for unsigned 16bit values stored in variables:

#################################
# TEST CODE : VARIABLES - 16bit #
#################################

start:
  jump test               # test code
fail:
  jump fail               # if simulation finished with address 1 = failed test
pass:
  jump pass               # if simulation finished with address 2 = passed test
trap:
  jump trap               # debug

data_ptr:
  .data data              # index into array
data_stop_pntr:
  .data data_end          # address of end of array

data:
  .data 0x0000            # test data 0-0=0
  .data 0x0000
  .data 0x0000            # result 
  .data 0x00FF            # test data 255-255=0
  .data 0x00FF
  .data 0x0000            # result 
  .data 0x00FF            # test data 255-0=255
  .data 0x0000
  .data 0x00FF            # result 
  .data 0x0000            # test data 0-255=-255  = 0x00FF = 0xFF00+1
  .data 0x00FF
  .data 0xFF01            # result
  .data 0xFFFF            # test data -1--1=0 
  .data 0xFFFF
  .data 0x0000            # result   
data_end:
  .data 0  

test:
  load ra data_ptr        # read address of data 
  load ra (ra)            # read data
  store ra W              # store in working variable
  store ra 0xFFF          # print INPUT 0 to screen

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  store ra X              # store in working variable
  store ra 0xFFF          # print INPUT 1 to screen

  call sub16u             # process data

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y               # read code result
  store ra 0xFFF          # print OUTPUT to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # yes, inc pntr
  add ra 1
  store ra data_ptr
  subm ra data_stop_pntr  # have all tests been performed?
  jumpz pass              # yes, pass

  jump test               # no, repeat

###################
# SUB SUBROUTINES #
###################

# description: 16bit unsigned subtraction
# result (16bit) = Y (16bit) <= W (16bit) - X (16bit)
# input: W (operand), X (operand)
# output: Y (result)

sub16u:
  load ra W
  subm ra X
  store ra Y
  ret

Test code for unsigned 32bit values stored in variables:

#################################
# TEST CODE : VARIABLES - 32bit #
#################################

start:
  jump test               # test code
fail:
  jump fail               # if simulation finished with address 1 = failed test
pass:
  jump pass               # if simulation finished with address 2 = passed test
trap:
  jump trap               # debug

data_ptr:
  .data data              # index into array
data_stop_pntr:
  .data data_end          # address of end of array

data:
  .data 0x0000            # test data 0-0=0
  .data 0x0000
  .data 0x0000
  .data 0x0000
  .data 0x0000            # result 
  .data 0x0000 
  .data 0xFFFF            # test data FFFFF-FFFFF=0
  .data 0x000F
  .data 0xFFFF
  .data 0x000F
  .data 0x0000            # result 
  .data 0x0000 
  .data 0xFFFF            # test data FFFFF-0=FFFFF
  .data 0x000F
  .data 0x0000
  .data 0x0000
  .data 0xFFFF            # result 
  .data 0x000F
  .data 0x0000            # test data 0-FFFFF=-FFFFF  = 0x000FFFFF = 0xFFF00000+1
  .data 0x0000
  .data 0xFFFF
  .data 0x000F
  .data 0x0001            # result
  .data 0xFFF0 
  .data 0xFFFF            # test data -1--1=0 
  .data 0xFFFF
  .data 0xFFFF
  .data 0xFFFF
  .data 0x0000            # result 
  .data 0x0000  
data_end:
  .data 0  

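test:
  load ra data_ptr        # read address of data 
  load ra (ra)            # read data
  store ra W_LOW          # store in working variable
  store ra 0xFFF          # print INPUT 0 LOW to screen

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read data
  store ra W_HIGH         # store in working variable
  store ra 0xFFF          # print INPUT 0 HIGH to screen

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read data
  store ra X_LOW          # store in working variable
  store ra 0xFFF          # print INPUT 1 LOW to screen

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read data
  store ra X_HIGH         # store in working variable
  store ra 0xFFF          # print INPUT 1 HIGH to screen

  call sub32u             # process data

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_LOW           # read code result
  store ra 0xFFF          # print OUTPUT LOW to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_HIGH          # read code result
  store ra 0xFFF          # print OUTPUT HIGH to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # yes, inc pntr
  add ra 1
  store ra data_ptr
  subm ra data_stop_pntr  # have all tests been performed?
  jumpz pass              # yes, pass

  jump test               # no, repeat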

###################
# SUB SUBROUTINES #
###################

# description: 32bit unsigned subtraction
# result (32bit) = Y_HIGH (16bit) || Y_LOW (16bit) <= (W_HIGH (16bit) || W_LOW (16bit)) - (X_HIGH (16bit) || X_LOW (16bit))
# input: W_HIGH || W_LOW (operand), X_HIGH || X_LOW (operand)
# output: (high result) Y_HIGH (16bit) || Y_LOW (16bit) (low result)

sub32u:
  load ra W_LOW
  subm ra X_LOW
  store ra Y_LOW
  load ra W_HIGH
  submc ra X_HIGH
  store ra Y_HIGH
  ret
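
A small Python model of the same two-step subtraction is sketched below, to show how the borrow from the low word feeds the high word. The function name mirrors the subroutine, but the Python signature is my own, and the borrow is modelled as an integer since i'm not assuming anything here about how the hardware flag is encoded.

```python
MASK16 = 0xFFFF


def sub32u(w_low, w_high, x_low, x_high):
    """Subtract X from W in two 16bit steps, returning (Y_LOW, Y_HIGH)."""
    low = w_low - x_low                  # low word first
    borrow = 1 if low < 0 else 0         # low word underflowed, so borrow
    high = w_high - x_high - borrow      # high word takes the borrow
    return low & MASK16, high & MASK16


# 0 - 0xFFFFF, the fourth entry in the test data above
print([hex(w) for w in sub32u(0x0000, 0x0000, 0xFFFF, 0x000F)])
# ['0x1', '0xfff0']
```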

Stack based solution, unsigned 16bit arguments and results are transferred between the caller and callee via the stack:

#############################
# TEST CODE : STACK - 16bit #
#############################

start:
  jump test                    # test code
fail:
  jump fail                    # if simulation finished with address 1 = failed test
pass:
  jump pass                    # if simulation finished with address 2 = passed test
trap:
  jump trap                    # debug

data_ptr:
  .data data                   # index into array
data_stop_pntr:
  .data data_end               # address of end of array

data:
  .data 0x0000                 # test data 0-0=0
  .data 0x0000
  .data 0x0000                 # result 
  .data 0x00FF                 # test data 255-255=0
  .data 0x00FF
  .data 0x0000                 # result 
  .data 0x00FF                 # test data 255-0=255
  .data 0x0000
  .data 0x00FF                 # result 
  .data 0x0000                 # test data 0-255=-255  = 0x00FF = 0xFF00+1
  .data 0x00FF
  .data 0xFF01                 # result
  .data 0xFFFF                 # test data -1--1=0 
  .data 0xFFFF
  .data 0x0000                 # result   
data_end:
  .data 0  

#  ------------  
#  |    RC    |    FP+2   INPUT 0 
#  ------------ 
#  |    RC    |    FP+1   INPUT 1 
#  ------------ 
#  |    FP    |
#  ------------ 

test:
  load ra data_ptr             # read address of data 
  load ra (ra)                 # read data
  store ra 0xFFF               # print INPUT 0 to screen

  move rc ra
  call push16                  # push to stack

  load ra data_ptr             # read address of data 
  add ra 1                     # increment
  store ra data_ptr
  load ra (ra)                 # read test result
  store ra 0xFFF               # print INPUT 1 to screen

  move rc ra
  call push16                  # push to stack

  call sub16u_stack            # process data

  call pop16                   # pop result off stack
  move ra rc                   # display OUTPUT to screen
  store ra 0xFFF 

  load ra data_ptr             # read address of data 
  add ra 1                     # increment
  store ra data_ptr
  load ra (ra)                 # read test result
  sub rc ra                    # equal?
  jumpnz fail                  # no, stop fail

  load ra data_ptr             # yes, inc pntr
  add ra 1
  store ra data_ptr
  subm ra data_stop_pntr       # have all tests been performed?
  jumpz pass                   # yes, pass

  move rc 1
  call stack_remove_arguments  # remove argument from stack
  jump test                    # repeat

###################
# SUB SUBROUTINES #
###################

# description: 16bit unsigned subtraction 
# result (16bit) = FP+1 (16bit) <= FP+2 (16bit) - FP+1 (16bit)
# input: FP+1 (operand), FP+2 (operand)
# output: FP+1 (result)
# caller clean-up : 1

sub16u_stack:
  call stack_subroutine_enter 
      
  load ra FRAME_POINTER        # read arguments into RC and RD
  add ra 1
  load rc (ra)                 # data
  add ra 1
  load rd (ra)                 # data

  sub rd rc                    # sub data               

  load ra FRAME_POINTER
  add ra 1
  store rd (ra)                # save word result

  call stack_subroutine_exit
  ret

Stack based solution, unsigned 32bit arguments and results are transferred between the caller and callee via the stack:

#############################
# TEST CODE : STACK - 32bit #
#############################

start:
  jump test                   # test code
fail: 
  jump fail                   # if simulation finished with address 1 = failed test
pass:
  jump pass                   # if simulation finished with address 2 = passed test
trap:
  jump trap                   # debug

data_ptr:
  .data data                  # index into array
data_stop_pntr:
  .data data_end              # address of end of array

data:
  .data 0x0000            # test data 0-0=0
  .data 0x0000
  .data 0x0000
  .data 0x0000
  .data 0x0000            # result 
  .data 0x0000 
  .data 0xFFFF            # test data FFFFF-FFFFE=1
  .data 0x000F
  .data 0xFFFE
  .data 0x000F
  .data 0x0001            # result 
  .data 0x0000 
  .data 0xFFFF            # test data FFFFF-0=FFFFF
  .data 0x000F
  .data 0x0000
  .data 0x0000
  .data 0xFFFF            # result 
  .data 0x000F
  .data 0x0000            # test data 0-FFFFF=-FFFFF  = 0x000FFFFF = 0xFFF00000+1
  .data 0x0000
  .data 0xFFFF
  .data 0x000F
  .data 0x0001            # result
  .data 0xFFF0 
  .data 0xFFFF            # test data -1--1=0 
  .data 0xFFFF
  .data 0xFFFF
  .data 0xFFFF
  .data 0x0000            # result 
  .data 0x0000  
data_end:
  .data 0  

#  ------------  
#  |    RD    |    FP+4   INPUT 0 HIGH
#  ------------ 
#  |    RC    |    FP+3   INPUT 0 LOW
#  ------------
#  |    RD    |    FP+2   INPUT 1 HIGH
#  ------------ 
#  |    RC    |    FP+1   INPUT 1 LOW
#  ------------
#  |    FP    |
#  ------------ 

test:
  load ra data_ptr            # read address of data 
  load ra (ra)                # read data
  store ra 0xFFF              # print INPUT 0 LOW to screen
  move rc ra

  load ra data_ptr            # read address of data 
  add ra 1                    # increment
  store ra data_ptr
  load ra (ra)                # read test result
  store ra 0xFFF              # print INPUT 0 HIGH to screen
  move rd ra

  call push32                 # push data onto stack

  load ra data_ptr            # read address of data 
  add ra 1                    # increment
  store ra data_ptr
  load ra (ra)                # read test result
  store ra 0xFFF              # print INPUT 1 LOW to screen
  move rc ra

  load ra data_ptr            # read address of data 
  add ra 1                    # increment
  store ra data_ptr
  load ra (ra)                # read test result
  store ra 0xFFF              # print INPUT 1 HIGH to screen
  move rd ra

  call push32                 # push data onto stack

  call sub32u_stack           # process data

  call pop32                  # pop result off stack

  move ra rc
  store ra 0xFFF              # print OUTPUT LOW to screen
  move ra rd
  store ra 0xFFF              # print OUTPUT HIGH to screen

  load ra data_ptr            # read address of data 
  add ra 1                    # increment
  store ra data_ptr
  load ra (ra)                # read test result
  sub rc ra                   # equal?
  jumpnz fail                 # no, stop fail

  load ra data_ptr            # read address of data 
  add ra 1                    # increment
  store ra data_ptr
  load ra (ra)                # read test result
  sub rd ra                   # equal?
  jumpnz fail                 # no, stop fail

  load ra data_ptr            # yes, inc pntr
  add ra 1
  store ra data_ptr
  subm ra data_stop_pntr      # have all tests been performed?
  jumpz pass                  # yes, pass

  move rc 2
  call stack_remove_arguments  # remove argument from stack
  jump test                    # no, repeat

###################
# SUB SUBROUTINES #
###################

# description: 32bit unsigned subtraction 
# result (32bit) = FP+2 (16bit) || FP+1 (16bit) <= (FP+4 (16bit) || FP+3 (16bit)) -
#                                                  (FP+2 (16bit) || FP+1 (16bit)) 
# input: FP+2 || FP+1 (operand), FP+4 || FP+3 (operand)
# output: (high result) FP+2 (16bit) || FP+1 (16bit) (low result)
# caller clean-up : 2

sub32u_stack:
  call stack_subroutine_enter 
   
  load ra FRAME_POINTER        # read low data

  add ra 3
  load rc (ra)             
  sub ra 2
  load rd (ra)

  sub rc rd                    # sub data
  store rc (ra)
  move rb 0                    # buffer carry bit
  shl rb 

  add ra 3                     # read high data
  load rc (ra)  
  sub ra 2
  load rd (ra)

  shr rb                       # restore carry bit
  subc rc rd                   # sub data                  
  store rc (ra)                     

  call stack_subroutine_exit
  ret

Again it's a little more complex when we come to signed arithmetic and the sign bit e.g. 0x8000 is the minimum 2s complement value, so 0x8000 - 0x0001 will generate a negative overflow, as we need more bits to represent this value, as shown in the conversions below.

0x8000  - 0x0001 = 0x7FFF  WRONG
0xF8000 - 0x0001 = 0xF7FFF CORRECT

07FFF   =   0000 0111 1111 1111 1111
            1111 1000 0000 0000 0000
            1111 1000 0000 0000 0001  = 0xF8001 (incorrect, neg value)   
F7FFF   =   1111 0111 1111 1111 1111
            0000 1000 0000 0000 0000
            0000 1000 0000 0000 0001  = 0x08001 (correct, pos value)

Figure 12 : 16-bit subtraction - negative overflow
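
The 0x8000 - 0x0001 case can be reproduced in Python by working out the exact answer first and then checking whether it fits into 16 bits. This is a sketch of the idea only; the helper names are mine and none of this is simpleCPU code.

```python
def as_signed16(value):
    """Interpret a 16bit pattern as a signed number."""
    return value - 0x10000 if value & 0x8000 else value


def sub16_signed(w, x):
    """Return (16bit result pattern, overflow flag) for signed W - X."""
    exact = as_signed16(w) - as_signed16(x)      # exact answer, unlimited width
    overflow = not (-32768 <= exact <= 32767)    # does it fit in 16 bits?
    return exact & 0xFFFF, overflow


print(sub16_signed(0x8000, 0x0001))   # (32767, True)
print(sub16_signed(0x0005, 0x0003))   # (2, False)
```

The first result is the 0x7FFF pattern from the WRONG line above: the bits that survive in 16 bits read as +32767, even though the exact answer is -32769.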

Sooo, when working with signed values we need to consider overflows i.e. we need to pre-sign extend our arguments to capture any carries. That's not to say that the previous sub16u and sub32u subroutines do not work for signed values, it's just that they will not handle these overflows correctly. However, for the type of code we will be writing i.e. for the video games console, we could consider these as edge cases, calculations that would never occur. The ranges for 16bit and 32bit values are:

16BIT
-----
integer signed            : MIN = −32768  MAX = +32767
integer unsigned          : MIN = 0       MAX = +65535
fixed-point Q8.8 signed   : MIN = −128    MAX = +127.99609375  Resolution = 0.00390625
fixed-point Q8.8 unsigned : MIN = 0       MAX = +255.99609375  Resolution = 0.00390625

32BIT
-----
integer signed              : MIN = −2,147,483,648  MAX = +2,147,483,647
integer unsigned            : MIN = 0               MAX = +4,294,967,295
fixed-point Q16.16 signed   : MIN = −32768          MAX = +32767.9999847412  Resolution = 0.0000152587890625
fixed-point Q16.16 unsigned : MIN = 0               MAX = +65535.9999847412  Resolution = 0.0000152587890625

For most calculations signed 16bit values i.e. +/- 32K will be fine. Additional thought may be needed when dealing with fixed-point values, but again, i think these subroutines should be fine, sooo, for the time being i'm going to stick with what i have.
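
The MIN, MAX and resolution figures in the tables above can be regenerated with a short Python helper. This is a sketch and the function name is mine; an integer is simply treated as a fixed-point format with zero fraction bits.

```python
def q_range(int_bits, frac_bits, signed):
    """Return (min, max, resolution) of a Qint.frac fixed-point format."""
    total = int_bits + frac_bits
    scale = 1 << frac_bits                 # stored integer = value * 2^frac_bits
    if signed:
        lo, hi = -(1 << (total - 1)), (1 << (total - 1)) - 1
    else:
        lo, hi = 0, (1 << total) - 1
    return lo / scale, hi / scale, 1 / scale


print(q_range(8, 8, True))      # (-128.0, 127.99609375, 0.00390625)
print(q_range(16, 0, False))    # (0.0, 65535.0, 1.0)
print(q_range(16, 16, True))
```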

Subtraction Fixed-point

Figure 13 : fixed point subtraction

Like the fixed-point add we can use the previous integer subroutines to process our fixed-point values i.e. the sub16u and sub32u subroutines. Again the key point is to remember where the decimal point is when processing integer and fixed-point values.
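
As a quick illustration, the Python sketch below encodes two Q8.8 values, subtracts them as plain 16bit integers and decodes the result. The choice of Q8.8 and the helper names are my assumptions for the example; the subtraction itself is nothing more than an integer subtract.

```python
FRAC_BITS = 8                      # Q8.8: 8 integer bits, 8 fraction bits


def to_q8_8(value):
    """Encode a real number as a 16bit Q8.8 pattern (two's complement)."""
    return int(round(value * (1 << FRAC_BITS))) & 0xFFFF


def from_q8_8(pattern):
    """Decode a 16bit Q8.8 pattern, treating it as signed."""
    if pattern & 0x8000:
        pattern -= 0x10000
    return pattern / (1 << FRAC_BITS)


a, b = to_q8_8(3.5), to_q8_8(1.25)
diff = (a - b) & 0xFFFF            # plain 16bit integer subtraction
print(hex(a), hex(b), hex(diff), from_q8_8(diff))   # 0x380 0x140 0x240 2.25
```

Both operands share the same decimal point position, so no alignment step is needed before or after the subtract.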

Multiplication

Figure 14 : multiply

The nice thing about base-2 multiplication is that when you perform long multiplication you don't have to do any "multiplication" :), i.e. when the multiplier bit is a 0 you write out 0s and when it's a 1 you write out the multiplicand. The hardware just has to calculate the values: multiplicand x 0, or multiplicand x 1. These values are then added together to produce the partial product, as shown in figure 14.

The processor does have an 8bit hardware multiplier unit, producing a 16bit result, and we will consider later how this could be used to multiply 16bit values. Initially though, i'm going to use a more general purpose multiplication algorithm (Link). There are a few different approaches to select from, but i'm going to keep it simple and go for the classic shift-and-add approach. The operation of this algorithm is described by the flowchart in figure 15.

Figure 15 : shift-and-add flowchart

This algorithm follows the basic steps of binary multiplication described in figure 14. However, rather than the partial product "growing" to the left, in the software implementation we shift the partial product to the right i.e. so that the addition step is always performed on the same bit positions. To put it another way, the LSB of each partial product is not used in future calculations, so shifting this bit to the right, out of the working register, is a useful thing to do :). Therefore, a key thing to identify here is that the multiplier variable X is overwritten by the result, as when performing this algorithm we only need to examine the multiplier's LSB.

Note, a key thing to remember is that the Y_HIGH=Y_HIGH+W step could generate a 17bit result, sooo the carry flag (C) also needs to be shifted in when the right shift is performed, as this is the 17th result bit. This is done automatically when we use the SHR instruction. We do not need to test if an overflow was generated as the carry flag will be set accordingly after the add i.e. no carry C=0, carry C=1.
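
Before looking at the assembly version, here is a minimal Python model of the shift-and-add steps described above, including the 17th bit that has to be shifted back in. This is a sketch under my own assumptions: the function mirrors the subroutine name but is not simpleCPU code, and the carry is an ordinary variable rather than a flag.

```python
MASK16 = 0xFFFF


def mul16u(w, x):
    """Unsigned 16bit x 16bit shift-and-add, returning (Y_LOW, Y_HIGH)."""
    y_high, y_low = 0, 0
    for _ in range(16):                      # one step per multiplier bit
        carry = 0
        if x & 1:                            # multiplier LSB set?
            total = y_high + w               # may be a 17bit value
            carry = total >> 16              # the 17th bit
            y_high = total & MASK16
        # shift the 33bit value carry:Y_HIGH:Y_LOW right by one place
        y_low = (y_low >> 1) | ((y_high & 1) << 15)
        y_high = (y_high >> 1) | (carry << 15)
        x >>= 1                              # move on to the next multiplier bit
    return y_low, y_high


low, high = mul16u(123, 42)
print((high << 16) | low)                    # 5166
```

Dropping the carry in the shift step would still give the right answer for small values like 123*42, but would break 0xFFFF*0xFFFF, which is why that test case is worth having.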

Test code for unsigned 16bit values stored in variables:

#################################
# TEST CODE : VARIABLES - 16bit #
#################################

start:
  jump test               # test code
fail:
  jump fail               # if simulation finished with address 1 = failed test
pass:
  jump pass               # if simulation finished with address 2 = passed test
trap:
  jump trap               # debug

data_ptr:
  .data data              # index into array
data_stop_pntr:
  .data data_end          # address of end of array

data:
  .data 0x0000            # test data 0*0=0
  .data 0x0000
  .data 0x0000            # result 
  .data 0x0000 
  .data 0x00FF            # test data 255*255=65025 == 0x00FF*0x00FF=0xFE01
  .data 0x00FF
  .data 0xFE01            # result 
  .data 0x0000 
  .data 0x0FFF            # test data 4095*4095=16769025  == 0x0FFF*0x0FFF=FFE001
  .data 0x0FFF
  .data 0xE001            # result 
  .data 0x00FF 
  .data 0xFFFF            # test data 65535*65535=4294836225  == 0xFFFF*0xFFFF=FFFE0001
  .data 0xFFFF
  .data 0x0001            # result 
  .data 0xFFFE 
data_end:
  .data 0  

test:
  load ra data_ptr        # read address of data 
  load ra (ra)            # read data
  store ra W              # store in working variable
  store ra 0xFFF          # print INPUT 0 to screen

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  store ra X              # store in working variable
  store ra 0xFFF          # print INPUT 1 to screen

  call mul16u             # process data

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_LOW           # read code result
  store ra 0xFFF          # print OUTPUT to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_HIGH          # read code result
  store ra 0xFFF          # print OUTPUT to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # yes, inc pntr
  add ra 1
  store ra data_ptr
  subm ra data_stop_pntr  # have all tests been performed?
  jumpz pass              # yes, pass

  jump test               # no, repeat

##################
# MUL SUBROUTINE #
##################

# description: unsigned multiplication 
# result (32bit) = Y_HIGH (16bit) || Y_LOW (16bit) = W (16bit) * X (16bit)
# input: W (multiplicand), X (multiplier)
# output: (high result) Y_HIGH, Y_LOW (low result)

mul16u:
  move RA 0             # zero result
  store RA Y_HIGH
  move RA 16      
  store RA CNT          # Loop counter = 16 bits

mul16u_loop:
  load RA X
  and RA 1              # Test LSB of multiplier
  jumpz mul16u_no_add    # If LSB = 0, skip add

  load RA Y_HIGH        # add multiplicand to partial product
  addm RA W
  store RA Y_HIGH

mul16u_no_add:
  load RA Y_HIGH        # shift partial product
  shr RA
  store RA Y_HIGH

  load RA Y_LOW         # shift partial product
  shr RA
  store RA Y_LOW

  load RA X
  ror RA                # rotate multiplier right by 1 (next bit)
  store RA X

  load RA CNT         
  sub RA 1              # Decrement counter
  store RA CNT
  jumpnz mul16u_loop     # Repeat if bits remain
  ret

To help illustrate how this algorithm works figure 16 shows the steps involved in performing the calculation: 123*42=5166 i.e. the steps needed to perform the multiplication shown in figure 14. To process these 16bit values we need to examine each multiplier bit, sooo, there will be 16 steps, in which data i.e. multiplicand and partial product, is added and shifted.

Figure 16 : shift-and-add steps

Note, an N-bit * M-bit calculation will produce a N+M-bit result, sooo, the previous 16bit * 16bit calculation will generate a 32bit result.
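
The N+M rule is easy to confirm in Python by multiplying the largest possible operands and counting the bits in the product. The helper name is mine, this is just a quick check.

```python
def product_bits(n_bits, m_bits):
    """Bits needed for the largest unsigned n_bits * m_bits product."""
    largest = ((1 << n_bits) - 1) * ((1 << m_bits) - 1)
    return largest.bit_length()


print(product_bits(8, 8))     # 16
print(product_bits(16, 16))   # 32
print(product_bits(32, 32))   # 64
```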

Test code for unsigned 32bit values stored in variables:

#################################
# TEST CODE : VARIABLES - 32bit #
#################################

start:
  jump test               # test code
fail:
  jump fail               # if simulation finished with address 1 = failed test
pass:
  jump pass               # if simulation finished with address 2 = passed test
trap:
  jump trap               # debug

data_ptr:
  .data data              # index into array
data_stop_pntr:
  .data data_end          # address of end of array

data:
  .data 0x0000            # test data 0*0=0
  .data 0x0000
  .data 0x0000
  .data 0x0000
  .data 0x0000            # result 
  .data 0x0000
  .data 0x0000
  .data 0x0000
  .data 0x00FF            # test data 255*255=65025 == 0x00FF*0x00FF=0xFE01
  .data 0x0000
  .data 0x00FF
  .data 0x0000
  .data 0xFE01            # result 
  .data 0x0000 
  .data 0x0000
  .data 0x0000
  .data 0x0FFF            # test data 4095*4095=16769025  == 0x0FFF*0x0FFF=FFE001
  .data 0x0000
  .data 0x0FFF
  .data 0x0000
  .data 0xE001            # result 
  .data 0x00FF
  .data 0x0000
  .data 0x0000
  .data 0xFFFF            # test data 65535*65535=4294836225  == 0xFFFF*0xFFFF=FFFE0001
  .data 0x0000
  .data 0xFFFF
  .data 0x0000
  .data 0x0001            # result 
  .data 0xFFFE 
  .data 0x0000
  .data 0x0000
data_end:
  .data 0  

test:
  load ra data_ptr        # read address of data 
  load ra (ra)            # read data
  store ra W_LOW          # store in working variable
  store ra 0xFFF          # print INPUT 0 LOW to screen

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  store ra W_HIGH         # store in working variable
  store ra 0xFFF          # print INPUT 0 HIGH to screen

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  store ra X_LOW          # store in working variable
  store ra 0xFFF          # print INPUT 1 LOW to screen

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  store ra X_HIGH         # store in working variable
  store ra 0xFFF          # print INPUT 1 HIGH to screen

  call mul32u             # process data

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_LOW           # read code result
  store ra 0xFFF          # print OUTPUT to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_HIGH          # read code result
  store ra 0xFFF          # print OUTPUT to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Z_LOW           # read code result
  store ra 0xFFF          # print OUTPUT to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Z_HIGH          # read code result
  store ra 0xFFF          # print OUTPUT to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # yes, inc pntr
  add ra 1
  store ra data_ptr
  subm ra data_stop_pntr  # have all tests been performed?
  jumpz pass              # yes, pass

  jump test               # no, repeat

##################
# MUL SUBROUTINE #
##################

# description: unsigned multiplication 
# result (64bit) = Z_HIGH (16bit) || Z_LOW (16bit) || Y_HIGH (16bit) || Y_LOW (16bit) = (W_HIGH (16bit) || W_LOW (16bit)) * 
#                                                                                       (X_HIGH (16bit) || X_LOW (16bit))
# input: W_HIGH || W_LOW (multiplicand), X_HIGH || X_LOW (multiplier)
# output: (high result) Z_HIGH (16bit) || Z_LOW (16bit) || Y_HIGH (16bit) || Y_LOW (16bit) (low result)

mul32u:
    move ra 0              # zero high 32bits of partial product
    store ra Z_LOW
    store ra Z_HIGH

    load ra X_LOW          # make a copy of X to restore at end
    store ra TMP_LOW
    load ra X_HIGH
    store ra TMP_HIGH

    move ra 32             # set loop counter to 32 bits
    store ra CNT

mul32u_loop:
    load ra X_LOW          # test multiplier LSB
    and ra 1
    jumpz mul32u_no_add

    load ra Z_LOW          # add multiplicand to partial product
    addm ra W_LOW              
    store ra Z_LOW

    load ra Z_HIGH        
    addmc ra W_HIGH
    store ra Z_HIGH

mul32u_no_add:
    load ra Z_HIGH         # shift partial product
    shr ra
    store ra Z_HIGH

    load ra Z_LOW
    shr ra
    store ra Z_LOW

    load ra Y_HIGH
    shr ra
    store ra Y_HIGH

    load ra Y_LOW
    shr ra
    store ra Y_LOW

    load ra X_HIGH         # shift multiplier (X)
    asr ra                 # to simplify restore later X is copied into TMP 
    store ra X_HIGH        # so in this version zeros are shifted into X

    load ra X_LOW
    shr ra
    store ra X_LOW

    load ra CNT           # have all 32 bits been processed
    sub ra 1
    store ra CNT
    jumpnz mul32u_loop

    load ra TMP_LOW       # restore original version of X
    store ra X_LOW
    load ra TMP_HIGH
    store ra X_HIGH
    ret

Note, an N-bit * M-bit calculation will produce a N+M-bit result, sooo, the previous 32bit * 32bit calculation will generate a 64bit result.

These multiplication subroutines only process unsigned values. To process signed values we have to follow these rules:

+ * + = +
- * + = -
+ * - = -
- * - = +

There are multiplication algorithms that can process signed values, but a simpler solution is to test the sign of the multiplier and multiplicand, convert everything to unsigned, perform the multiplication and apply these rules to the result i.e. call the neg16 subroutine if needed.
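
The sign handling described above can be sketched in Python as follows. This is a model of the approach rather than the subroutine itself: Python's own multiply stands in for the unsigned multiplication subroutine, and the function name and the negatives counter are my assumptions.

```python
def mul16_signed(w, x):
    """Signed 16bit x 16bit multiply built on an unsigned multiply.

    w and x are 16bit two's complement patterns; returns a 32bit pattern.
    """
    negatives = 0
    if w & 0x8000:                       # multiplicand negative?
        w = (-w) & 0xFFFF                # make it positive (invert and add 1)
        negatives += 1
    if x & 0x8000:                       # multiplier negative?
        x = (-x) & 0xFFFF
        negatives += 1
    product = w * x                      # stands in for the unsigned multiply
    if negatives == 1:                   # exactly one negative input
        product = (-product) & 0xFFFFFFFF
    return product


print(hex(mul16_signed(0xFFFF, 0xFFFF)))   # 0x1
print(hex(mul16_signed(0xFFFF, 0x0002)))   # 0xfffffffe
```

Note that the final negate has to cover the full 32bit product, not just the low 16 bits.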

Test code for signed 16bit values stored in variables:

#################################
# TEST CODE : VARIABLES - 16bit #
#################################

start:
  jump test               # test code
fail:
  jump fail               # if simulation finished with address 1 = failed test
pass:
  jump pass               # if simulation finished with address 2 = passed test
trap:
  jump trap               # debug

data_ptr:
  .data data              # index into array
data_stop_pntr:
  .data data_end          # address of end of array

data:
  .data 0x0000            # test data 0*0=0
  .data 0x0000
  .data 0x0000            # result 
  .data 0x0000 
  .data 0x00FF            # test data 255*255=65025 == 0x00FF*0x00FF=0xFE01
  .data 0x00FF
  .data 0xFE01            # result 
  .data 0x0000 
  .data 0x0FFF            # test data 4095*4095=16769025 == 0x0FFF*0x0FFF=FFE001
  .data 0x0FFF
  .data 0xE001            # result 
  .data 0x00FF 
  .data 0xFFFF            # test data -1*-1=1 == 0xFFFF*0xFFFF=0x00000001
  .data 0xFFFF
  .data 0x0001            # result 
  .data 0x0000 
data_end:
  .data 0  

test:
  load ra data_ptr        # read address of data 
  load ra (ra)            # read data
  store ra W              # store in working variable
  store ra 0xFFF          # print INPUT 0 to screen

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  store ra X              # store in working variable
  store ra 0xFFF          # print INPUT 1 to screen

  call mul16              # process data

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_LOW           # read code result
  store ra 0xFFF          # print OUTPUT to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_HIGH          # read code result
  store ra 0xFFF          # print OUTPUT to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # yes, inc pntr
  add ra 1
  store ra data_ptr
  subm ra data_stop_pntr  # have all tests been performed?
  jumpz pass              # yes, pass

  jump test               # no, repeat

##################
# MUL SUBROUTINE #
##################

# description: signed multiplication 
# result (32bit) = Y_HIGH (16bit) || Y_LOW (16bit) = W (16bit) * X (16bit)
# input: W (multiplicand), X (multiplier)
# output: (high result) Y_HIGH, Y_LOW (low result)

mul16:
  move ra 0           # zero neg counter
  store ra TMP

mul16_t1:
  load ra W           # read W
  store ra TMP_1      # buffer so that it can be restored later
  shl ra
  jumpnc mul16_t2     # is MSB set?  

  load ra TMP         # yes, inc neg counter
  add ra 1
  store ra TMP
  
  move ra 0xFF        # convert to positive value
  subm ra W           # subtract data to invert
  add ra 1            # increment  
  store ra W          # save result

mul16_t2:
  load ra X           # read X
  store ra TMP_2      # buffer so that it can be restored later
  shl ra
  jumpnc mul16_calc   # is MSB set?

  load ra TMP         # yes, inc neg counter
  add ra 1
  store ra TMP

  move ra 0xFF        # convert to positive value
  subm ra X           # subtract data to invert
  add ra 1            # increment  
  store ra X          # save result

mul16_calc:
  call mul16u

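  load ra TMP         # read neg counter (0, 1 or 2)
  and ra 3
  jumpz mul16_exit    # 0 : both operands positive, result already positive
  and ra 2
  jumpnz mul16_exit   # 2 : both operands negative, result already positive

  load ra Y_LOW       # 1 : one operand negative, copy result into X
  store ra X_LOW
  load ra Y_HIGH
  store ra X_HIGH

  call neg16          # negate the result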

mul16_exit:
  load ra TMP_1
  store ra W
  load ra TMP_2
  store ra X
  ret
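
To make the sign handling in mul16 easier to follow, here is a small Python model of the same idea: count the negative operands, multiply the magnitudes as unsigned values, then negate the 32bit result if exactly one operand was negative. This is only a sketch, the function names are mine and the plain Python multiply stands in for the unsigned mul16u routine.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def negate16(value):
    """Two's complement negate of a 16bit word (invert, then add 1)."""
    return (~value + 1) & MASK16

def mul16_signed_model(w, x):
    """Signed 16bit * 16bit, returns (y_high, y_low) as 16bit words."""
    negatives = 0
    if w & 0x8000:              # MSB set, operand is negative
        w = negate16(w)
        negatives += 1
    if x & 0x8000:
        x = negate16(x)
        negatives += 1
    product = w * x             # stand-in for the unsigned multiply
    if negatives == 1:          # signs differ, result must be negative
        product = (~product + 1) & MASK32
    return (product >> 16) & MASK16, product & MASK16
```

For example, -1 * 2 i.e. mul16_signed_model(0xFFFF, 0x0002), returns (0xFFFF, 0xFFFE), which is -2 as a 32bit value.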

Multiplication Fixed-point

Figure 17 : fixed point multiplication

The previous mul16 and mul32 subroutines can again be used to perform fixed-point calculations. However, the position of the decimal point will move, as shown in the Q5.2 example below:

Q5.2 = 7bit
7bit * 7bit = 14bit
XXXXX.XX * XXXXX.XX = XXXXXXXXXX.XXXX

The Q5.2 values (7bit) have an imaginary decimal point located two bits from the LSB. However, the result is a Q10.4 value (14bit), sooo its decimal point is 4 bits from the LSB. Therefore, to use this result e.g. to add it to another Q5.2 value, you would either need to shift the Q10.4 value to the right two times i.e. convert it to a Q10.2 fixed-point value, or extend the other Q5.2 value to a Q10.4 value i.e. shift it to the left two times. In both cases you need to align the decimal point.
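
The decimal point movement can be checked with a few lines of Python. This is a sketch only: to_q5_2 and mul_q5_2 are hypothetical helper names, and the values are held as plain integers scaled by 4 (two fractional bits).

```python
FRAC_BITS = 2                       # Q5.2 : two fractional bits

def to_q5_2(value):
    """Scale a real number into Q5.2 (integer = value * 4)."""
    return int(round(value * (1 << FRAC_BITS)))

def mul_q5_2(a, b):
    """Multiply two Q5.2 values.

    Returns the raw Q10.4 product and the same product shifted
    right two times, i.e. realigned to two fractional bits (Q10.2).
    """
    raw_q10_4 = a * b
    return raw_q10_4, raw_q10_4 >> FRAC_BITS

a = to_q5_2(3.25)                   # 13
b = to_q5_2(2.5)                    # 10
raw, aligned = mul_q5_2(a, b)       # raw = 130, aligned = 32
```

Here 130 / 16 = 8.125 is the exact answer in Q10.4, but after realigning, 32 / 4 = 8.0, sooo the shifted-out bits (the 0.125) are lost.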

Hardware multiplication 8bit and 16bit

The processor has an 8bit unsigned multiplier that can perform an 8bit * 8bit calculation to produce a 16bit result. This can also be used to perform 16bit * 16bit and 32bit * 32bit calculations. Like the 32bit addition and subtraction examples the trick is to break these calculations down into "chunks" i.e. HIGH 8bit and LOW 8bit chunks that can be processed by the multiplier. However, owing to the associated overheads, this does not improve processing performance. The calculation is broken down as shown below:

A16 * B16 = ((A_HIGH_8bits * 256) + A_LOW_8bits) * ((B_HIGH_8bits * 256) + B_LOW_8bits)
                      W           +      X                   Y           +       Z

A16 * B16 = (W + X) * (Y + Z)
A16 * B16 = W*Y + W*Z + X*Y + X*Z

A16 * B16 = (A_HIGH_8bits * B_HIGH_8bits * 65536) + (A_HIGH_8bits * B_LOW_8bits *256) +
            (A_LOW_8bits * B_HIGH_8bits * 256) + (A_LOW_8bits * B_LOW_8bits)


Note, *256 and *65536 can be done with ASL / MUL instructions, or simply writing the result to the correct variable.

Example 0x123 * 0x456 = 0x4EDC2

A_HIGH_8bits = 0x01 
A_LOW_8bits  = 0x23 
B_HIGH_8bits = 0x04 
B_LOW_8bits  = 0x56 

(A_HIGH_8bits * B_HIGH_8bits * 65536) + (A_HIGH_8bits * B_LOW_8bits *256) +
(A_LOW_8bits * B_HIGH_8bits * 256) + (A_LOW_8bits * B_LOW_8bits)

Total = (0x01 * 0x04 * 65536) + (0x01 * 0x56 *256) + (0x23 * 0x04 * 256) + (0x23 * 0x56)
      =      262144           +        22016       +         35840       +     3010
      = 323010 = 0x4EDC2
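
The same breakdown written as a Python sketch, to check the arithmetic. mul8 is a hypothetical stand-in for the 8bit hardware multiplier, it simply refuses operands wider than 8 bits.

```python
def mul8(a, b):
    """Stand-in for the 8bit * 8bit unsigned hardware multiplier."""
    assert 0 <= a <= 0xFF and 0 <= b <= 0xFF
    return a * b                            # fits in 16 bits

def mul16_chunks(a, b):
    """16bit * 16bit built from four 8bit * 8bit partial products."""
    a_high, a_low = (a >> 8) & 0xFF, a & 0xFF
    b_high, b_low = (b >> 8) & 0xFF, b & 0xFF
    total = mul8(a_low, b_low)              # X*Z
    total += mul8(a_low, b_high) << 8       # X*Y, * 256
    total += mul8(a_high, b_low) << 8       # W*Z, * 256
    total += mul8(a_high, b_high) << 16     # W*Y, * 65536
    return total
```

mul16_chunks(0x123, 0x456) gives 0x4EDC2, matching the worked example above.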

Test code for unsigned 16bit values stored in variables:

#################################
# TEST CODE : VARIABLES - 16bit #
#################################

start:
  jump test               # test code
fail:
  jump fail               # if simulation finished with address 1 = failed test
pass:
  jump pass               # if simulation finished with address 2 = passed test
trap:
  jump trap               # debug

data_ptr:
  .data data              # index into array
data_stop_pntr:
  .data data_end          # address of end of array

data:
  .data 0x0000            # test data 0*0=0
  .data 0x0000
  .data 0x0000            # result 
  .data 0x0000 
  .data 0x00FF            # test data 255*255=65025 == 0x00FF*0x00FF=0xFE01
  .data 0x00FF
  .data 0xFE01            # result 
  .data 0x0000 
  .data 0x0FFF            # test data 4095*4095=16769025  == 0x0FFF*0x0FFF=FFE001
  .data 0x0FFF
  .data 0xE001            # result 
  .data 0x00FF 
  .data 0xFFFF            # test data 65535*65535=4294836225  == 0xFFFF*0xFFFF=FFFE0001
  .data 0xFFFF
  .data 0x0001            # result 
  .data 0xFFFE 
data_end:
  .data 0  

test:
  load ra data_ptr        # read address of data 
  load ra (ra)            # read data
  store ra W              # store in working variable
  store ra 0xFFF          # print INPUT 0 to screen

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  store ra X              # store in working variable
  store ra 0xFFF          # print INPUT 1 to screen

  call mul16hw            # process data

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_LOW           # read code result
  store ra 0xFFF          # print OUTPUT to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_HIGH          # read code result
  store ra 0xFFF          # print OUTPUT to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # yes, inc pntr
  add ra 1
  store ra data_ptr
  subm ra data_stop_pntr  # have all tests been performed?
  jumpz pass              # yes, pass

  jump test               # no, repeat
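
For reference, the table driven test loop above behaves like the Python sketch below: each test is four words (two inputs, then the expected low and high result words) and the loop stops at the first mismatch. The names run_tests and mul16u_model are mine, the model routine is just a plain Python multiply standing in for the subroutine under test.

```python
def mul16u_model(w, x):
    """Stand-in for the routine under test, returns (low, high) words."""
    product = w * x
    return product & 0xFFFF, (product >> 16) & 0xFFFF

TABLE = [
    0x0000, 0x0000, 0x0000, 0x0000,   # 0 * 0
    0x00FF, 0x00FF, 0xFE01, 0x0000,   # 255 * 255
    0x0FFF, 0x0FFF, 0xE001, 0x00FF,   # 4095 * 4095
    0xFFFF, 0xFFFF, 0x0001, 0xFFFE,   # 65535 * 65535
]

def run_tests(table, routine):
    """Walk the table like data_ptr does, return 'pass' or 'fail'."""
    ptr = 0
    while ptr != len(table):          # data_ptr == data_end ?
        low, high = routine(table[ptr], table[ptr + 1])
        if low != table[ptr + 2]:     # compare low word
            return "fail"
        if high != table[ptr + 3]:    # compare high word
            return "fail"
        ptr += 4
    return "pass"
```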

##################
# MUL SUBROUTINE #
##################

# description: unsigned multiplication 
# result (32bit) = Y_HIGH (16bit) || Y_LOW (16bit) = W (16bit) * X (16bit)
# input: W (multiplicand), X (multiplier)
# output: (high result) Y_HIGH, Y_LOW (low result)
# temp: TMP_0=W_LOW, TMP_1=W_HIGH, TMP_2=X_LOW, TMP_3=X_HIGH 

mul16hw:
  move ra 0
  store ra Y_LOW     # zero acc variables
  store ra Y_HIGH   
  store ra Z_LOW   
  store ra Z_HIGH 
 
  moveh rb 0xFF      # extract high 8bit values
  load ra W
  and ra rb
  store ra TMP_1
  load ra X
  and ra rb
  store ra TMP_3

  moveu rb 0xFF      # extract low 8bit values
  load ra W
  and ra rb
  store ra TMP_0
  load ra X
  and ra rb
  store ra TMP_2

  # (A_LOW_8bits * B_LOW_8bits)

  load ra TMP_0
  move rb ra
  load ra TMP_2
  mulu ra rb
  store ra W_LOW
  move ra 0
  store ra W_HIGH   
  call acc32

  # (A_LOW_8bits * B_HIGH_8bits * 256)

  load ra TMP_0
  move rb ra
  load ra TMP_3
  mulu ra rb
  move rb 0
  asl ra           # x2
  shl rb
  asl ra           # x4
  shl rb
  asl ra           # x8
  shl rb
  asl ra           # x16
  shl rb
  asl ra           # x32
  shl rb
  asl ra           # x64
  shl rb
  asl ra           # x128
  shl rb
  asl ra           # x256
  shl rb
  store ra W_LOW
  move ra rb
  store ra W_HIGH
  call acc32

  # (A_HIGH_8bits * B_LOW_8bits *256)

  load ra TMP_1
  move rb ra
  load ra TMP_2
  mulu ra rb
  move rb 0
  asl ra           # x2
  shl rb
  asl ra           # x4
  shl rb
  asl ra           # x8
  shl rb
  asl ra           # x16
  shl rb
  asl ra           # x32
  shl rb
  asl ra           # x64
  shl rb
  asl ra           # x128
  shl rb
  asl ra           # x256
  shl rb
  store ra W_LOW
  move ra rb
  store ra W_HIGH
  call acc32

  # (A_HIGH_8bits * B_HIGH_8bits * 65536) 

  load ra TMP_1
  move rb ra
  load ra TMP_3
  mulu ra rb
  store ra W_HIGH
  move ra 0
  store ra W_LOW
  call acc32
  ret
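
The eight asl / shl pairs in mul16hw move a 16bit partial product up by 8 bit positions across a LOW / HIGH pair of variables. The Python sketch below models this, assuming (and it is an assumption, the flag behaviour is not described in this section) that asl shifts the MSB out into the carry flag and shl shifts that carry into the LSB of the second register. It also shows the "simply writing the result to the correct variable" alternative mentioned earlier.

```python
def shift_left_pair(low, high):
    """One asl / shl pair: 32bit shift left across two 16bit words."""
    carry = (low >> 15) & 1                 # bit shifted out of low word
    low = (low << 1) & 0xFFFF
    high = ((high << 1) | carry) & 0xFFFF   # carry enters the high word
    return low, high

def times_256_by_shifting(product):
    """Multiply a 16bit value by 256 using eight shift pairs."""
    low, high = product & 0xFFFF, 0
    for _ in range(8):
        low, high = shift_left_pair(low, high)
    return low, high

def times_256_by_placement(product):
    """Same result by placing the bytes directly, no shifting loop."""
    return (product & 0x00FF) << 8, (product >> 8) & 0x00FF
```

Both functions return the same (low, high) pair, e.g. 0xFE01 becomes (0x0100, 0x00FE).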

Division

Figure 18 : division - no remainder

Figure 19 : division - remainder

This function is not directly supported by the processor's hardware. Like multiplication there are a lot of different algorithms to choose from (Link), but i'm going to keep it simple and go again for the classic shift and subtract approach i.e. the restoring division algorithm (Link). The operation of this algorithm is described by the flowchart in figure 20.

Figure 20 : division flowchart

This algorithm repeatedly tests to see if the divisor can be subtracted from the section of the dividend being processed. If it can, that bit position in the quotient is set to 1, otherwise it's set to 0. The term restoring is used to describe what happens in the event that the divisor cannot be subtracted i.e. the divisor is bigger than the section of the dividend, sooo a negative result is produced at the end of the subtraction phase. When performing long division manually this step is just ignored, but when implemented in software we need to undo this subtraction step, sooo to restore the part of the dividend being processed we add back the divisor. To illustrate this process consider the steps shown below performing 100 divided by 5:

   100     00000000 00000000 00000000 01100100 = Dividend = YZ   
     5     00000000 00000000 00000000 00000101 = Divisor  = X    

STEP  OPERATION       Y                 Z
1     shift   00000000 00000000 00000000 11001000   
2     shift   00000000 00000000 00000001 10010000    
3     shift   00000000 00000000 00000011 00100000   
4     shift   00000000 00000000 00000110 01000000    
5     shift   00000000 00000000 00001100 10000000   
6     shift   00000000 00000000 00011001 00000000    
7     shift   00000000 00000000 00110010 00000000    
8     shift   00000000 00000000 01100100 00000000   
9     shift   00000000 00000000 11001000 00000000   
10    shift   00000000 00000001 10010000 00000000    
11    shift   00000000 00000011 00100000 00000000    
12    shift   00000000 00000110 01000000 00000000    
      sub 5   00000000 00000101 01000000 00000000    
      result  00000000 00000001 01000000 00000001
13    shift   00000000 00000010 10000000 00000010
14    shift   00000000 00000101 00000000 00000100
      sub 5   00000000 00000101 00000000 00000100    
      result  00000000 00000000 00000000 00000101
15    shift   00000000 00000000 00000000 00001010
16    shift   00000000 00000000 00000000 00010100

Note, the restore phase is not shown to save space e.g. in steps 1 to 11 the Y variable contains a value less than 5 i.e. binary 101. In these cases subtracting 5 would generate a negative result. When this is detected, 5 is added to Y to undo the previous subtraction, then the Y and Z variables are shifted to the left 1bit position, adding a new digit.
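
The steps above can be reproduced with a short Python model. It is a sketch under a couple of assumptions: the name div16u_model is mine, the divisor is assumed to be non-zero, and the subtract / test / restore sequence is collapsed into a single "is Y big enough?" comparison, which gives the same Y and Z values at the end of each step.

```python
def div16u_model(dividend, divisor):
    """Shift and subtract division, returns (quotient, remainder).

    Z starts as the dividend and ends up holding the quotient,
    Y collects the bits shifted out of Z and ends up as the remainder.
    """
    y, z = 0, dividend & 0xFFFF
    for _ in range(16):
        carry = (z >> 15) & 1       # MSB of Z moves into Y
        z = (z << 1) & 0xFFFF
        y = (y << 1) | carry
        if y >= divisor:            # subtraction would not go negative
            y -= divisor
            z |= 1                  # set this quotient bit
        # otherwise: subtract then restore leaves Y unchanged
    return z, y
```

div16u_model(100, 5) returns (20, 0), the same as the final line of the worked example where Z = 00010100.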

This division algorithm produces a quotient and a remainder i.e. integer results, it does not produce a fractional representation i.e. a fixed-point number. To represent a fractional term we would need to move to a fixed point representation as shown in figure 21. Here we have a 16bit integer part and an 8bit fractional part. Therefore, there may be resolution issues i.e. can you represent the fractional term i.e. the remainder, in the given fixed number of fractional bits :(.

Figure 21 : fixed point division
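
A quick Python sketch of the resolution issue, assuming the 16bit integer / 8bit fraction layout described above. The helper names are hypothetical, and Python's integer divide is used in place of a division subroutine: the dividend is scaled up by 256 first so that the quotient keeps 8 fractional bits.

```python
FRAC_BITS = 8                           # 8bit fractional part

def div_fixed(dividend, divisor):
    """Integer divide producing a quotient with 8 fractional bits."""
    return (dividend << FRAC_BITS) // divisor

def to_real(fixed):
    """Convert the scaled integer back to a real number for display."""
    return fixed / (1 << FRAC_BITS)

exact = div_fixed(100, 8)               # 3200, i.e. 12.5 exactly
inexact = div_fixed(1, 3)               # 85, i.e. 0.33203125
```

100 / 8 = 12.5 fits exactly, but 1 / 3 can only be approximated as 85 / 256 = 0.33203125 with 8 fractional bits.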

Test code for unsigned 16bit values stored in variables:

#################################
# TEST CODE : VARIABLES - 16bit #
#################################

start:
  jump test               # test code
fail:
  jump fail               # if simulation finished with address 1 = failed test
pass:
  jump pass               # if simulation finished with address 2 = passed test
trap:
  jump trap               # debug

data_ptr:
  .data data              # index into array
data_stop_pntr:
  .data data_end          # address of end of array

data:
  .data 0x0064            # test data 100/5=20 remainder 0
  .data 0x0005
  .data 0x0014            # result: quotient
  .data 0x0000            # result: remainder
  .data 0x0064            # test data 100/7=14 remainder 2
  .data 0x0007
  .data 0x000E            # result: quotient
  .data 0x0002            # result: remainder
  .data 0x00FF            # test data 255/255=1 remainder 0
  .data 0x00FF
  .data 0x0001            # result: quotient
  .data 0x0000            # result: remainder
  .data 0x0FFF            # test data 4095/16=255 remainder 15
  .data 0x0010
  .data 0x00FF            # result: quotient
  .data 0x000F            # result: remainder
data_end:
  .data 0  

test:
  load ra data_ptr        # read address of data 
  load ra (ra)            # read data
  store ra W              # store in working variable
  store ra 0xFFF          # print INPUT 0 to screen

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read data
  store ra X              # store in working variable
  store ra 0xFFF          # print INPUT 1 to screen

  call div16u             # process data

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Z               # read code result: quotient
  store ra 0xFFF          # print OUTPUT to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y               # read code result: remainder
  store ra 0xFFF          # print OUTPUT to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # yes, inc pntr
  add ra 1
  store ra data_ptr
  subm ra data_stop_pntr  # have all tests been performed?
  jumpz pass              # yes, pass

  jump test               # no, repeat

##################
# DIV SUBROUTINE #
##################

# description: unsigned division
# result = Z (16bit) Quotient, Y (16bit) Remainder = W (16bit) / X (16bit) 
# input: W (Dividend), X (Divisor)
# output: Z (Quotient), Y (Remainder)

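div16u:
  load RA W               # Z = dividend, quotient bits enter from the right
  store RA Z
  move RA 0               # Y = 0, working remainder
  store RA Y
  move RA 16              # 16 bits to process
  store RA CNT
  
div16u_loop:
  load RA Z               # shift Z left one bit, MSB is shifted out
  asl RA
  store RA Z

  load RA Y               # shift Y left one bit, taking the bit shifted out of Z
  shl RA
  store RA Y

  load RA Y               # trial subtraction: Y = Y - X
  subm RA X
  store RA Y
  jumpn div16u_restore    # negative, divisor does not fit

  load RA Z               # divisor fits, set quotient bit
  add RA 1
  store RA Z
  jump div16u_update

div16u_restore:
  load RA Y               # undo the subtraction: Y = Y + X
  addm RA X
  store RA Y 

div16u_update:
  load RA CNT             # dec bit count
  sub RA 1
  store RA CNT
  jumpnz div16u_loop      # repeat until all 16 bits are processed

  ret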

I confess i have not got round to implementing an unsigned 32bit division, a signed 16bit division, a signed 32bit division, or division via the multiplication of a fixed-point number i.e. multiplying by a fraction, 100 * 0.125 etc.
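
As a rough idea of what division via multiplication would look like, here is a Python sketch (not an implementation for the processor). It assumes the fraction is stored with 8 fractional bits, sooo 0.125 becomes 32, and the function names are hypothetical.

```python
FRAC_BITS = 8                               # fraction held as value * 256

def reciprocal_fixed(divisor):
    """1 / divisor as a fixed-point fraction, e.g. 1/8 = 0.125 -> 32."""
    return (1 << FRAC_BITS) // divisor

def div_by_mul(value, divisor):
    """Divide by multiplying with the fixed-point reciprocal.

    Returns (integer part, 8bit fractional part) of the product.
    """
    product = value * reciprocal_fixed(divisor)
    return product >> FRAC_BITS, product & 0xFF
```

div_by_mul(100, 8) returns (12, 128) i.e. 12.5, which is exact because 0.125 can be represented in 8 fractional bits. For a divisor like 3 the reciprocal is truncated, sooo div_by_mul(99, 3) returns an integer part of 32 rather than 33.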

Relational operators

A key requirement for any program is to test if a variable is equal to, bigger than, or smaller than another value e.g. IF-THEN-ELSE, FOR-LOOPS, WHILE-LOOPS etc.

Less than

Is W less-than X, if true store 1 in Y, if false store 0 in Y.

#################################
# TEST CODE : VARIABLES - 16bit #
#################################

start:
  jump test               # test code
fail:
  jump fail               # if simulation finished with address 1 = failed test
pass:
  jump pass               # if simulation finished with address 2 = passed test
trap:
  jump trap               # debug

data_ptr:
  .data data              # index into array
data_stop_pntr:
  .data data_end          # address of end of array

data:
  .data 0x0000            # test data 0 < 0 = false
  .data 0x0000
  .data 0x0000            # result 
  .data 0x0000            # Y_HIGH, not written by lt16u, assumed to stay 0
  .data 0x0001            # test data 1 < 2 = true
  .data 0x0002
  .data 0x0001            # result 
  .data 0x0000 
  .data 0x0FFF            # test data 4095 < 255 = false
  .data 0x00FF
  .data 0x0000            # result 
  .data 0x0000 
  .data 0x00FF            # test data 255 < 4095 = true
  .data 0x0FFF
  .data 0x0001            # result 
  .data 0x0000 
data_end:
  .data 0  

test:
  load ra data_ptr        # read address of data 
  load ra (ra)            # read data
  store ra W              # store in working variable
  store ra 0xFFF          # print INPUT 0 to screen

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  store ra X              # store in working variable
  store ra 0xFFF          # print INPUT 1 to screen

  call lt16u              # process data

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_LOW           # read code result
  store ra 0xFFF          # print OUTPUT to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_HIGH          # read code result
  store ra 0xFFF          # print OUTPUT to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # yes, inc pntr
  add ra 1
  store ra data_ptr
  subm ra data_stop_pntr  # have all tests been performed?
  jumpz pass              # yes, pass

  jump test               # no, repeat

####################
# RELATIONAL TESTS #
####################

# description: unsigned relational test, is W less-than X
# input: W (operand), X (operand)
# output: Y (result), 0=False, 1=True

lt16u:
  load RA W
  subm RA X
  jumpn ltu_true

ltu_false:
  move RA 0
  store RA Y
  ret

ltu_true:
  move RA 1
  store RA Y
  ret
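
One thing to watch with this approach: the decision is made on the sign of W - X. Assuming jumpn tests bit 15 of the 16bit result (an assumption, the flag logic is not shown here), the test agrees with a true unsigned comparison only while the two operands differ by less than 32768. The Python sketch below compares the two, the function names are mine.

```python
def lt_by_sign(w, x):
    """Model of the sign based test: is bit 15 of (W - X) set?"""
    diff = (w - x) & 0xFFFF
    return 1 if diff & 0x8000 else 0

def lt_unsigned(w, x):
    """Reference unsigned comparison."""
    return 1 if w < x else 0

def agree(w, x):
    """True when the sign based test matches the reference."""
    return lt_by_sign(w, x) == lt_unsigned(w, x)
```

For small differences e.g. 1 and 2, both give the same answer. For 0x0000 and 0xFFFF the difference wraps round to 0x0001, sooo the sign based test reports false even though 0 is less than 65535. A carry / borrow based test would avoid this, if the subtract instruction updates the carry flag.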

Greater than

Is W greater-than X, if true store 1 in Y, if false store 0 in Y.

#################################
# TEST CODE : VARIABLES - 16bit #
#################################

start:
  jump test               # test code
fail:
  jump fail               # if simulation finished with address 1 = failed test
pass:
  jump pass               # if simulation finished with address 2 = passed test
trap:
  jump trap               # debug

data_ptr:
  .data data              # index into array
data_stop_pntr:
  .data data_end          # address of end of array

data:
  .data 0x0000            # test data 0 > 0 = false
  .data 0x0000
  .data 0x0000            # result 
  .data 0x0000            # Y_HIGH, not written by gt16u, assumed to stay 0
  .data 0x0002            # test data 2 > 1 = true
  .data 0x0001
  .data 0x0001            # result 
  .data 0x0000 
  .data 0x00FF            # test data 255 > 4095 = false
  .data 0x0FFF
  .data 0x0000            # result 
  .data 0x0000 
  .data 0x0FFF            # test data 4095 > 255 = true
  .data 0x00FF
  .data 0x0001            # result 
  .data 0x0000 
data_end:
  .data 0  

test:
  load ra data_ptr        # read address of data 
  load ra (ra)            # read data
  store ra W              # store in working variable
  store ra 0xFFF          # print INPUT 0 to screen

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  store ra X              # store in working variable
  store ra 0xFFF          # print INPUT 1 to screen

  call gt16u              # process data

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_LOW           # read code result
  store ra 0xFFF          # print OUTPUT to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_HIGH          # read code result
  store ra 0xFFF          # print OUTPUT to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # yes, inc pntr
  add ra 1
  store ra data_ptr
  subm ra data_stop_pntr  # have all tests been performed?
  jumpz pass              # yes, pass

  jump test               # no, repeat

####################
# RELATIONAL TESTS #
####################

# description: unsigned relational test, is W greater-than X
# input: W (operand), X (operand)
# output: Y (result), 0=False, 1=True

gt16u:
  load RA W
  subm RA X
  jumpz gtu_false
  jumpp gtu_true

gtu_false:
  move RA 0
  store RA Y
  ret

gtu_true:
  move RA 1
  store RA Y
  ret

Equal

Is W equal-to X, if true store 1 in Y, if false store 0 in Y.

#################################
# TEST CODE : VARIABLES - 16bit #
#################################

start:
  jump test               # test code
fail:
  jump fail               # if simulation finished with address 1 = failed test
pass:
  jump pass               # if simulation finished with address 2 = passed test
trap:
  jump trap               # debug

data_ptr:
  .data data              # index into array
data_stop_pntr:
  .data data_end          # address of end of array

data:
  .data 0x0000            # test data 0 == 0 = true
  .data 0x0000
  .data 0x0001            # result 
  .data 0x0000            # Y_HIGH, not written by eq16u, assumed to stay 0
  .data 0x0001            # test data 1 == 2 = false
  .data 0x0002
  .data 0x0000            # result 
  .data 0x0000 
  .data 0x0FFF            # test data 4095 == 4095 = true
  .data 0x0FFF
  .data 0x0001            # result 
  .data 0x0000 
  .data 0x00FF            # test data 255 == 4095 = false
  .data 0x0FFF
  .data 0x0000            # result 
  .data 0x0000 
data_end:
  .data 0  

test:
  load ra data_ptr        # read address of data 
  load ra (ra)            # read data
  store ra W              # store in working variable
  store ra 0xFFF          # print INPUT 0 to screen

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  store ra X              # store in working variable
  store ra 0xFFF          # print INPUT 1 to screen

  call eq16u              # process data

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_LOW           # read code result
  store ra 0xFFF          # print OUTPUT to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_HIGH          # read code result
  store ra 0xFFF          # print OUTPUT to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # yes, inc pntr
  add ra 1
  store ra data_ptr
  subm ra data_stop_pntr  # have all tests been performed?
  jumpz pass              # yes, pass

  jump test               # no, repeat

####################
# RELATIONAL TESTS #
####################

# description: unsigned relational test, is W equal-to X
# input: W (operand), X (operand)
# output: Y (result), 0=False, 1=True

eq16u:
  load RA W
  subm RA X
  jumpz equ_true

equ_false:
  move RA 0
  store RA Y
  ret

equ_true:
  move RA 1
  store RA Y
  ret

Less than OR Equal to

Is W less-than-equal-to X, if true store 1 in Y, if false store 0 in Y.

#################################
# TEST CODE : VARIABLES - 16bit #
#################################

start:
  jump test               # test code
fail:
  jump fail               # if simulation finished with address 1 = failed test
pass:
  jump pass               # if simulation finished with address 2 = passed test
trap:
  jump trap               # debug

data_ptr:
  .data data              # index into array
data_stop_pntr:
  .data data_end          # address of end of array

data:
  .data 0x0000            # test data 0 <= 0 = true
  .data 0x0000
  .data 0x0001            # result 
  .data 0x0000            # Y_HIGH, not written by lte16, assumed to stay 0
  .data 0x0001            # test data 1 <= 2 = true
  .data 0x0002
  .data 0x0001            # result 
  .data 0x0000 
  .data 0x0FFF            # test data 4095 <= 255 = false
  .data 0x00FF
  .data 0x0000            # result 
  .data 0x0000 
  .data 0x00FF            # test data 255 <= 255 = true
  .data 0x00FF
  .data 0x0001            # result 
  .data 0x0000 
data_end:
  .data 0  

test:
  load ra data_ptr        # read address of data 
  load ra (ra)            # read data
  store ra W              # store in working variable
  store ra 0xFFF          # print INPUT 0 to screen

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  store ra X              # store in working variable
  store ra 0xFFF          # print INPUT 1 to screen

  call lte16              # process data

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_LOW           # read code result
  store ra 0xFFF          # print OUTPUT to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_HIGH          # read code result
  store ra 0xFFF          # print OUTPUT to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # yes, inc pntr
  add ra 1
  store ra data_ptr
  subm ra data_stop_pntr  # have all tests been performed?
  jumpz pass              # yes, pass

  jump test               # no, repeat

####################
# RELATIONAL TESTS #
####################

# description: unsigned relational test, is W less-than-equal-to X
# input: W (operand), X (operand)
# output: Y (result), 0=False, 1=True

lte16:
  load RA W
  subm RA X
  jumpz lte_true
  jumpn lte_true

lte_false:
  move RA 0
  store RA Y
  ret

lte_true:
  move RA 1
  store RA Y
  ret

Greater than OR Equal to

Is W greater-than-equal-to X, if true store 1 in Y, if false store 0 in Y.

#################################
# TEST CODE : VARIABLES - 16bit #
#################################

start:
  jump test               # test code
fail:
  jump fail               # if simulation finished with address 1 = failed test
pass:
  jump pass               # if simulation finished with address 2 = passed test
trap:
  jump trap               # debug

data_ptr:
  .data data              # index into array
data_stop_pntr:
  .data data_end          # address of end of array

data:
  .data 0x0000            # test data 0 >= 0 = true
  .data 0x0000
  .data 0x0001            # result 
  .data 0x0000            # Y_HIGH, not written by gte16, assumed to stay 0
  .data 0x0001            # test data 1 >= 2 = false
  .data 0x0002
  .data 0x0000            # result 
  .data 0x0000 
  .data 0x0FFF            # test data 4095 >= 255 = true
  .data 0x00FF
  .data 0x0001            # result 
  .data 0x0000 
  .data 0x00FF            # test data 255 >= 255 = true
  .data 0x00FF
  .data 0x0001            # result 
  .data 0x0000 
data_end:
  .data 0  

test:
  load ra data_ptr        # read address of data 
  load ra (ra)            # read data
  store ra W              # store in working variable
  store ra 0xFFF          # print INPUT 0 to screen

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  store ra X              # store in working variable
  store ra 0xFFF          # print INPUT 1 to screen

  call gte16              # process data

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_LOW           # read code result
  store ra 0xFFF          # print OUTPUT to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # read address of data 
  add ra 1                # increment
  store ra data_ptr
  load ra (ra)            # read test result
  move rb ra

  load ra Y_HIGH          # read code result
  store ra 0xFFF          # print OUTPUT to screen
  sub rb ra               # equal?
  jumpnz fail             # no, stop fail

  load ra data_ptr        # yes, inc pntr
  add ra 1
  store ra data_ptr
  subm ra data_stop_pntr  # have all tests been performed?
  jumpz pass              # yes, pass

  jump test               # no, repeat

####################
# RELATIONAL TESTS #
####################

# description: unsigned relational test, is W greater-than-equal-to X
# input: W (operand), X (operand)
# output: Y (result), 0=False, 1=True

gte16:
  load RA W
  subm RA X
  jumpn gte_false

gte_true:
  move RA 1
  store RA Y
  ret 

gte_false:
  move RA 0
  store RA Y
  ret
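
All five relational subroutines follow the same pattern: calculate W - X, then branch on the zero and negative flags. The Python sketch below collects the five decisions in one place. It is a model only, compare16 is a hypothetical name, and it assumes the negative flag is bit 15 of the 16bit difference, sooo it is only reliable when the operands differ by less than 32768.

```python
def compare16(w, x):
    """Derive all five relational results from the flags of W - X."""
    diff = (w - x) & 0xFFFF
    zero = diff == 0                    # jumpz condition
    negative = bool(diff & 0x8000)      # jumpn condition (assumed bit 15)
    return {
        "lt": int(negative),
        "gt": int(not zero and not negative),
        "eq": int(zero),
        "lte": int(zero or negative),
        "gte": int(not negative),
    }
```

For example compare16(1, 2) sets lt and lte to 1 and the others to 0.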

WORK IN PROGRESS

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Contact email: mike@simplecpudesign.com
