[ Inverses | Sinus | Exp-Log | Appendix A | Appendix B ]

```
Alain BROBECKER                                    Dracula / Positivity (STe)
rte de Dardagny                                    Baah / Arm's Tech (Archie)
01630 CHALLEX                                                      Baah (PC)
FRANCE
---------------------------------------------------------------- 7 october 95

- MEMORY WAR -
============

This text is dedicated to ArmOric, my fave precalc-tables dealer. ;)

FOREWORD

This text is aimed to assembly programmer, I doubt people programming
in high level languages will appreciate it much. All algorithms, ideas
and assembly code in here were made by me. (else I mention the author)
If you use it in one of your programs, please send me a free copy of it
and credit me. If you appreciated this text, I would be glad to know it.

INTRODUCTION

The most effective packer is not a program, but your own grey cells.
In a program, the size of the code (if it is in full assembly language
I mean) is ridiculous compared to the size used by datas! So, instead
of generating your precalculations table and store them on the disk,
why don' t you code a small routine which will generate the table and
save it in memory just before the main program is launched?
Maybe you will consider it a waste of time, but the owners of modests
computers (I own a 1 Mb floppy based Archimedes 3000) will thank you a
lot! And well, by programming datas generator in assembly language, you
often learn a lot!

INVERSES TABLE

Since the Archimedes processor does not have a division instruction,
we generally use a table of inverses and then replace divisions by
multiplications. Making the table in basic and including it as binary
datas is a real waste of disk space, because a routine which will create
the inverses table is really small and easy to code.
In appendix A, you will find such a routine. It is 68 bytes long,
so if you generate a 1024 inverses table, the 'packing ratio' is 1:60.
I really doubt your fave packer can achieve the same.

SINUS TABLE

Sinus are certainly one of the most widely used datas when it comes to
demo or game programming. The generation of such a table is not as easy
as for the inverses, because we will need some (simple) mathematics in
order to create it. Here are the needed formulas...
1)  cosX=sin(X+pi/2)
2)  sin(X+pi)=-sinX
3)  sin(pi-X)=sinX
4)  sin(pi/2-X)=sin(X+pi/2)         because cos(-X)=cosX
5)  cos(X+Y)=cosX*cosY-sinX*sinY
6)  sin(X+Y)=cosX*sinY+sinX*cosY

The formula 1 tells us a very important thing: sinus and cosinus are
tightly bound. From now on I consider the cosinus table is the same as
the sinus table, but with a slight shift.
With the formulas 1-4, we easily deduct that all sinus values could be
found if only we knew the sinus and cosinus values for the angles between
[0;pi/4]. So, the problem is now smaller, we only need to calculate those
values and then construct the rest of the table with formulas 1-4.

Now consider the well known formulas 5 and 6. Let us call A the minimum
angle you want in your table, (must be a divisor of pi/4) and let' s say
we have sinA and cosA already calculated. (included them as datas for
example, takes 8 bytes) By using formulas 5 and 6, we immediatly obtain
sin(2A) and cos(2A), by using them anew we obtain sin(3A) and cos(3A), and
we continue in this iterative way until we reach n*A=pi/4. Here is a small
basic proggy which is simple enough to be considered as the algorithm...

REM Note that sin((n+1)*A) and cos((n+1)*A) will be calculated,
REM but they won' t be saved.
n=64                    : REM nb of angles between [0;pi/4[.
DIM sinus(8*n-1)
sinA=SIN((PI/4)/n)      : REM Better to have them as constants.
cosA=COS((PI/4)/n)
s=0                     : REM s=SIN(0)
c=1                     : REM c=COS(0)
FOR i=0 TO n+1
sinus(i)=s            : REM Save sin(i) and cos(i) in table.
sinus(i+2*n)=c
temp=c*cosA-s*sinA
s=c*sinA+s*cosA
c=temp
NEXT i

There are two problems with this algorithm... The first one is that it
is not as easy to implement in assembly language as the inverses table
creator, because it requires high precision fixed point 'mathematics' in
order to give precise results, even for small tables.
In appendix B you will find my asm sinus creator. With the given values,
the resulting table is perfect, so you needn' t worry too much about
precision unless you need a very big table. The routine is 360 bytes long,
and creates a 1280 sinus table, so the 'packing ratio' is 1:14. It is not
as impressive as for inverses, but still very good. (it can even be
smaller, if you merge the two loops, but you must make sure the offset
for ldr/str won' t exceed 4096 bytes)

Oh, by the way, I arbitrarily took 28 bits for the fractionnary part of
my fixed point reals. You can take a bigger value, (it is recommended if
you need higher precision) my choice was directed by habit since it is the
value I have used for some of my proggys. Also, you can increase the
precision by rounding to nearest integer in the adjust64 macro. (No need
to do it in other macros, it won' t be noticeable when the result will

The second problem appears when you need a very huge and precise table.
Whatever A you take, you will have an error on sinA and cosA. (because
cos(pi/4) is an irrationnal number, and so cannot be calculated with
rationnal ones) And the iterative algorithm above is so that the error
between real sinus values and the one calculated increases at each
iteration. If you need a lot of iterations, the error may be terrific. By
chance I dicovered a trick which minimizes the number of iterations. Quite
simply, suppose you have already calculated all sinus and cosinus between
[0;k*A], then you can calculate all sinus for angles [(k+1)*A;2*k*a] with
only one more iteration.Let' s say n is a number between [1;k], then...
sin((k+n)*A)=cos(kA)*sin(nA)+sin(kA)*cos(nA)
cos((k+n)*A)=cos(kA)*cos(nA)-sin(kA)*sin(nA)
Again, I give you a basic proggy, which I consider as the algorithm for
this second, more precise, method to create your sinus table. There is a
restriction which is that the nb of angles between [0;pi/4[ must be a power
of 2. Maybe it can be avoided, but I don' t think this restriction is very
annoying, so I din' t try to avoid it.

m=12
n=2^m                   : REM nb of angles between [0;pi/4[.
DIM sinus(8*n-1)
sinus(0)=0              : REM Save the first values of sinus.
sinus(2*n)=1
sinus(1)=SIN((PI/4)/n)
sinus(2*n+1)=COS((PI/4)/n)
FOR i=1 TO m
z=2^(i-1)
sinB=sinus(z)         : REM Will be used in the second loop.
cosB=sin(z+2*n)
FOR j=1 TO o
sinus(z+j)=cosB*sinus(j)+sinB*sinus(j+2*n)
sinus(z+j+2*n)=cosB*sinus(j+2*n)-sinB*sinus(j)
NEXT j
NEXT i

It is sad, but I have not written an assembly version of this algorithm,
because up to now I have never needed higher precision sinus table. So,
for once you will have to work by your own! :) I expect the routine to
be a bit bigger in size, but since it will be devoted to very big tables
it will certainly be worth its price anyway. Another thing not to forget
is that you will use values saved in the table, so you must save them with
maximum precision.

EXP-LOG TABLE

Again, something I have never used, but ArmOric gave me the basic idea,
and maybe some people will have the use of it. (As far as I know, you can
find logarithms in some fractals, so it is not as uninteresting as it may
seem) Quite simply, consider the following formulas...
7)  exp(A+B)=expA*expB
8)  log(A*B)=logA*logB

They both look similar to the one given in sinus tables, no? (Except
that they are stand-alone formulas) I bet I don' t need to say much,
the way 7 & 8 must be handled is clear. Thoses tables will be much bigger
than sinus table, because they are not periodic, so you will be forced
to minimize the number of iterations here too. I recommend an algorithm
looking like the second one given for sinus.

- THE END -

;****************************************************************************
;*****                                                                  *****
;*****                            APPENDIX A                            *****
;*****                                                                  *****
;****************************************************************************
; This routine calculates all the division between r1/1 and r1/r2, and
; save them into [r0]. Each inverse is contained in a longword.
; Entry parameters...
;         r0 = adress where to generate the table.
;         r1 = premultiplication factor for the inverses.
;         r2 = last number to take the inverse of.
.make_inverses_table
mov       r3,#1                   ; Used by the division routine.
mov       r4,#1                   ; r4 is the current divisor.
._make_one_inverse
mov       r5,r4                   ; r5=divisor.
mov       r6,r1                   ; r6=premul factor=dividend.
mov       r7,#0                   ; r7 will contain the quotient of r1/r4.
mov       r8,#15                  ; Shift for the division.
._divide_one_step
cmp       r6,r5,lsl r8            ; dividend bigger than divisor<<r8?
subGE     r6,r6,r5,lsl r8         ; Yes, then dividend-=divisor<<r8,
subS      r8,r8,#1                ; Next shift.
bPL       _divide_one_step
cmp       r5,r6,lsl #1            ; Flags=divisor-2*rest.
addLE     r7,r7,#1                ; Round to nearest integer.
str       r7,[r0],#4              ; Save r1/r4.
add       r4,r4,#1                ; r4 is the next divisor.
cmp       r4,r2                   ; It was the last one?
bLE       _make_one_inverse

;****************************************************************************
;*****                                                                  *****
;*****                            APPENDIX B                            *****
;*****                                                                  *****
;****************************************************************************
; Things are more complicated, so I divided the problem in smaller parts by
; using macros which performs high precision calculi. (64 bits) A 64 bits
; integer is noted [a|b] where a contains the upper longword.

; This macro performs an unsigned 32*32 bits multiply.
;   [m0|m1]=m2*m3. You can choose [m2|m3]=[m0|m1]
;   It destroys m0,m1,m4,m5,m6 and the flags. (C is the carry flag)
macro umul64 m0,m1,m2,m3,m4,m5,m6
{ mov       m4,m2,lsr #16           ; m4=up(m2).
sub       m5,m2,m4,lsl #16        ; m5=down(m2).
mov       m0,m3,lsr #16           ; m0=up(m3).
sub       m1,m3,m0,lsl #16        ; m1=down(m3).
mul       m6,m5,m0                ; m6=down(m2)*up(m3).
mul       m0,m4,m0                ; m0=up(m2)*up(m3).
mlaS      m6,m1,m4,m6             ; C+m6=up(m2)*down(m3)+down(m2)*up(m3).
mul       m1,m5,m1                ; m1=down(m2)*down(m3).
}

; This macro adjusts the 64 bits result to 32 bits, according to the fixed
; point shift factor. (c0)
;   m0=[m1|m2]>>c0. You can choose m1=m0.
;   It destroys m0.
{ mov       m0,m1,lsl #32-c0
}

; This macro performs a 64 bits addition.
;   [m0|m1]=[m2|m3]+[m4|m5]. You can choose [m2|m3] or [m4|m5]=[m0|m1].
;   It destroys [m0|m1] and the flags.
}

; This macro performs a 64 bits substract.
;   [m0|m1]=[m2|m3]-[m4|m5]. You can choose [m2|m3] or [m4|m5]=[m0|m1].
;   It destroys [m0|m1] and the flags.
macro sub64 m0,m1,m2,m3,m4,m5
{ subS      m1,m3,m5
sbc       m0,m2,m4
}

;----------------------------------------------------------------------------
; This routine calculates the sinus table, for angles between 0 and 5*pi/2.
; (So the cos table is between pi/2 and 5*pi/2) I take in account the fact
; that all sinus and cosinus values between 0 and pi/4 are positive.
; When I write sinN, it is in fact sin(N*A).
; Entry parameters...
;         r0 = adress where to generate the table.

#set sin_shift=20                   ; Shift for saved values. Must be =<28.
#set sin_nb=128                     ; Nb of angles between [0;pi/4[.

.sinA       dcd 1647089             ; sin((pi/4)/sin_nb)*2^28.
.cosA       dcd 268430403           ; cos((pi/4)/sin_nb)*2^28.

.make_sinus
ldr       r1,sinA                 ; r1=sinA*2^28.
ldr       r2,cosA                 ; r2=cosA*2^28.
mov       r3,#0                   ; r3=sin0*2^28.
mov       r4,#1<<28               ; r4=cos0*2^28.
mov       r5,#sin_nb+1
.make_one_sinus
mov       r6,r4,lsr #28-sin_shift ; r6=cosN*2^shift.
str       r6,[r0,#sin_nb*2*4]     ; Save sin(N+pi/2)=cosN.
mov       r6,r3,lsr #28-sin_shift ; r6=sinN*2^shift.
str       r6,[r0],#4              ; Save sinN.
umul64 r6,r7,r1,r3,r8,r9,r10      ; [r6|r7]=sinN*sinA.
umul64 r8,r9,r2,r4,r10,r11,r12    ; [r8|r9]=cosN*cosA.
sub64 r6,r7,r8,r9,r6,r7           ; [r6|r7]=cos(N+1)=cosN*sin1-sinN*sin1.
umul64 r3,r8,r3,r2,r9,r10,r11     ; [r3|r8]=sinN*cosA.
umul64 r4,r9,r4,r1,r10,r11,r12    ; [r4|r9]=cosN*sinA.
subS      r5,r5,#1                ; One sinus processed.
bNE       make_one_sinus

sub       r0,r0,#4                ; Complete the table by stupid copy.
mov       r1,r0                   ; Point on the position which are like
mov       r3,r2
mov       r5,r4
mov       r7,r6
mov       r9,r8
mov       r10,#sin_nb+1           ; Copy sin_nb+1 values.
._make_sinus_copy
ldr       r11,[r0],#-4
str       r11,[r3],#4             ; sin(pi-X)=sinX.
str       r11,[r8],#-4            ; sin(2*pi+X)=sinX.
rsb       r11,r11,#0
str       r11,[r4],#-4            ; sin(pi+X)=-sinX.
str       r11,[r7],#4             ; sin(2*pi-X)=-sinX.
ldr       r11,[r2],#-4
str       r11,[r1],#4             ; cos(-X)=cosX.
subS      r10,r10,#1              ; One value copied.
strNE     r11,[r9],#4             ; cos(2*pi+X)=cosX. No copy if r10=0.
rsb       r11,r11,#0
str       r11,[r5],#4             ; cos(pi-X)=-cosX.
str       r11,[r6],#-4            ; cos(pi+X)=-cosX.
bNE       _make_sinus_copy
```