[ Inverses
| Sinus
| Exp-Log
| Appendix A
| Appendix B ]
Alain BROBECKER Dracula / Positivity (STe)
rte de Dardagny Baah / Arm's Tech (Archie)
01630 CHALLEX Baah (PC)
FRANCE
---------------------------------------------------------------- 7 october 95
- MEMORY WAR -
============
This text is dedicated to ArmOric, my fave precalc-tables dealer. ;)
FOREWORD
This text is aimed to assembly programmer, I doubt people programming
in high level languages will appreciate it much. All algorithms, ideas
and assembly code in here were made by me. (else I mention the author)
If you use it in one of your programs, please send me a free copy of it
and credit me. If you appreciated this text, I would be glad to know it.
INTRODUCTION
The most effective packer is not a program, but your own grey cells.
In a program, the size of the code (if it is in full assembly language
I mean) is ridiculous compared to the size used by datas! So, instead
of generating your precalculations table and store them on the disk,
why don' t you code a small routine which will generate the table and
save it in memory just before the main program is launched?
Maybe you will consider it a waste of time, but the owners of modests
computers (I own a 1 Mb floppy based Archimedes 3000) will thank you a
lot! And well, by programming datas generator in assembly language, you
often learn a lot!
INVERSES TABLE
Since the Archimedes processor does not have a division instruction,
we generally use a table of inverses and then replace divisions by
multiplications. Making the table in basic and including it as binary
datas is a real waste of disk space, because a routine which will create
the inverses table is really small and easy to code.
In appendix A, you will find such a routine. It is 68 bytes long,
so if you generate a 1024 inverses table, the 'packing ratio' is 1:60.
I really doubt your fave packer can achieve the same.
SINUS TABLE
Sinus are certainly one of the most widely used datas when it comes to
demo or game programming. The generation of such a table is not as easy
as for the inverses, because we will need some (simple) mathematics in
order to create it. Here are the needed formulas...
1) cosX=sin(X+pi/2)
2) sin(X+pi)=-sinX
3) sin(pi-X)=sinX
4) sin(pi/2-X)=sin(X+pi/2) because cos(-X)=cosX
5) cos(X+Y)=cosX*cosY-sinX*sinY
6) sin(X+Y)=cosX*sinY+sinX*cosY
The formula 1 tells us a very important thing: sinus and cosinus are
tightly bound. From now on I consider the cosinus table is the same as
the sinus table, but with a slight shift.
With the formulas 1-4, we easily deduct that all sinus values could be
found if only we knew the sinus and cosinus values for the angles between
[0;pi/4]. So, the problem is now smaller, we only need to calculate those
values and then construct the rest of the table with formulas 1-4.
Now consider the well known formulas 5 and 6. Let us call A the minimum
angle you want in your table, (must be a divisor of pi/4) and let' s say
we have sinA and cosA already calculated. (included them as datas for
example, takes 8 bytes) By using formulas 5 and 6, we immediatly obtain
sin(2A) and cos(2A), by using them anew we obtain sin(3A) and cos(3A), and
we continue in this iterative way until we reach n*A=pi/4. Here is a small
basic proggy which is simple enough to be considered as the algorithm...
REM Note that sin((n+1)*A) and cos((n+1)*A) will be calculated,
REM but they won' t be saved.
n=64 : REM nb of angles between [0;pi/4[.
DIM sinus(8*n-1)
sinA=SIN((PI/4)/n) : REM Better to have them as constants.
cosA=COS((PI/4)/n)
s=0 : REM s=SIN(0)
c=1 : REM c=COS(0)
FOR i=0 TO n+1
sinus(i)=s : REM Save sin(i) and cos(i) in table.
sinus(i+2*n)=c
temp=c*cosA-s*sinA
s=c*sinA+s*cosA
c=temp
NEXT i
There are two problems with this algorithm... The first one is that it
is not as easy to implement in assembly language as the inverses table
creator, because it requires high precision fixed point 'mathematics' in
order to give precise results, even for small tables.
In appendix B you will find my asm sinus creator. With the given values,
the resulting table is perfect, so you needn' t worry too much about
precision unless you need a very big table. The routine is 360 bytes long,
and creates a 1280 sinus table, so the 'packing ratio' is 1:14. It is not
as impressive as for inverses, but still very good. (it can even be
smaller, if you merge the two loops, but you must make sure the offset
for ldr/str won' t exceed 4096 bytes)
Oh, by the way, I arbitrarily took 28 bits for the fractionnary part of
my fixed point reals. You can take a bigger value, (it is recommended if
you need higher precision) my choice was directed by habit since it is the
value I have used for some of my proggys. Also, you can increase the
precision by rounding to nearest integer in the adjust64 macro. (No need
to do it in other macros, it won' t be noticeable when the result will
be adjusted to 32 bits)
The second problem appears when you need a very huge and precise table.
Whatever A you take, you will have an error on sinA and cosA. (because
cos(pi/4) is an irrationnal number, and so cannot be calculated with
rationnal ones) And the iterative algorithm above is so that the error
between real sinus values and the one calculated increases at each
iteration. If you need a lot of iterations, the error may be terrific. By
chance I dicovered a trick which minimizes the number of iterations. Quite
simply, suppose you have already calculated all sinus and cosinus between
[0;k*A], then you can calculate all sinus for angles [(k+1)*A;2*k*a] with
only one more iteration.Let' s say n is a number between [1;k], then...
sin((k+n)*A)=cos(kA)*sin(nA)+sin(kA)*cos(nA)
cos((k+n)*A)=cos(kA)*cos(nA)-sin(kA)*sin(nA)
Again, I give you a basic proggy, which I consider as the algorithm for
this second, more precise, method to create your sinus table. There is a
restriction which is that the nb of angles between [0;pi/4[ must be a power
of 2. Maybe it can be avoided, but I don' t think this restriction is very
annoying, so I din' t try to avoid it.
m=12
n=2^m : REM nb of angles between [0;pi/4[.
DIM sinus(8*n-1)
sinus(0)=0 : REM Save the first values of sinus.
sinus(2*n)=1
sinus(1)=SIN((PI/4)/n)
sinus(2*n+1)=COS((PI/4)/n)
FOR i=1 TO m
z=2^(i-1)
sinB=sinus(z) : REM Will be used in the second loop.
cosB=sin(z+2*n)
FOR j=1 TO o
sinus(z+j)=cosB*sinus(j)+sinB*sinus(j+2*n)
sinus(z+j+2*n)=cosB*sinus(j+2*n)-sinB*sinus(j)
NEXT j
NEXT i
It is sad, but I have not written an assembly version of this algorithm,
because up to now I have never needed higher precision sinus table. So,
for once you will have to work by your own! :) I expect the routine to
be a bit bigger in size, but since it will be devoted to very big tables
it will certainly be worth its price anyway. Another thing not to forget
is that you will use values saved in the table, so you must save them with
maximum precision.
EXP-LOG TABLE
Again, something I have never used, but ArmOric gave me the basic idea,
and maybe some people will have the use of it. (As far as I know, you can
find logarithms in some fractals, so it is not as uninteresting as it may
seem) Quite simply, consider the following formulas...
7) exp(A+B)=expA*expB
8) log(A*B)=logA*logB
They both look similar to the one given in sinus tables, no? (Except
that they are stand-alone formulas) I bet I don' t need to say much,
the way 7 & 8 must be handled is clear. Thoses tables will be much bigger
than sinus table, because they are not periodic, so you will be forced
to minimize the number of iterations here too. I recommend an algorithm
looking like the second one given for sinus.
- THE END -
;****************************************************************************
;***** *****
;***** APPENDIX A *****
;***** *****
;****************************************************************************
; This routine calculates all the division between r1/1 and r1/r2, and
; save them into [r0]. Each inverse is contained in a longword.
; Entry parameters...
; r0 = adress where to generate the table.
; r1 = premultiplication factor for the inverses.
; r2 = last number to take the inverse of.
.make_inverses_table
mov r3,#1 ; Used by the division routine.
mov r4,#1 ; r4 is the current divisor.
._make_one_inverse
mov r5,r4 ; r5=divisor.
mov r6,r1 ; r6=premul factor=dividend.
mov r7,#0 ; r7 will contain the quotient of r1/r4.
mov r8,#15 ; Shift for the division.
._divide_one_step
cmp r6,r5,lsl r8 ; dividend bigger than divisor<<r8?
subGE r6,r6,r5,lsl r8 ; Yes, then dividend-=divisor<<r8,
addGE r7,r7,r3,lsl r8 ; and add 1<<r8 to the quotient.
subS r8,r8,#1 ; Next shift.
bPL _divide_one_step
cmp r5,r6,lsl #1 ; Flags=divisor-2*rest.
addLE r7,r7,#1 ; Round to nearest integer.
str r7,[r0],#4 ; Save r1/r4.
add r4,r4,#1 ; r4 is the next divisor.
cmp r4,r2 ; It was the last one?
bLE _make_one_inverse
;****************************************************************************
;***** *****
;***** APPENDIX B *****
;***** *****
;****************************************************************************
; Things are more complicated, so I divided the problem in smaller parts by
; using macros which performs high precision calculi. (64 bits) A 64 bits
; integer is noted [a|b] where a contains the upper longword.
; This macro performs an unsigned 32*32 bits multiply.
; [m0|m1]=m2*m3. You can choose [m2|m3]=[m0|m1]
; It destroys m0,m1,m4,m5,m6 and the flags. (C is the carry flag)
macro umul64 m0,m1,m2,m3,m4,m5,m6
{ mov m4,m2,lsr #16 ; m4=up(m2).
sub m5,m2,m4,lsl #16 ; m5=down(m2).
mov m0,m3,lsr #16 ; m0=up(m3).
sub m1,m3,m0,lsl #16 ; m1=down(m3).
mul m6,m5,m0 ; m6=down(m2)*up(m3).
mul m0,m4,m0 ; m0=up(m2)*up(m3).
mlaS m6,m1,m4,m6 ; C+m6=up(m2)*down(m3)+down(m2)*up(m3).
adc m0,m0,m6,lsr #16
mul m1,m5,m1 ; m1=down(m2)*down(m3).
addS m1,m1,m6,lsl #16
adc m0,m0,#0 ; [m0|m1]=m2*m3.
}
; This macro adjusts the 64 bits result to 32 bits, according to the fixed
; point shift factor. (c0)
; m0=[m1|m2]>>c0. You can choose m1=m0.
; It destroys m0.
macro adjust64 m0,m1,m2,c0
{ mov m0,m1,lsl #32-c0
add m0,m0,m2,lsr #c0
}
; This macro performs a 64 bits addition.
; [m0|m1]=[m2|m3]+[m4|m5]. You can choose [m2|m3] or [m4|m5]=[m0|m1].
; It destroys [m0|m1] and the flags.
macro add64 m0,m1,m2,m3,m4,m5
{ addS m1,m3,m5
adc m0,m2,m4
}
; This macro performs a 64 bits substract.
; [m0|m1]=[m2|m3]-[m4|m5]. You can choose [m2|m3] or [m4|m5]=[m0|m1].
; It destroys [m0|m1] and the flags.
macro sub64 m0,m1,m2,m3,m4,m5
{ subS m1,m3,m5
sbc m0,m2,m4
}
;----------------------------------------------------------------------------
; This routine calculates the sinus table, for angles between 0 and 5*pi/2.
; (So the cos table is between pi/2 and 5*pi/2) I take in account the fact
; that all sinus and cosinus values between 0 and pi/4 are positive.
; When I write sinN, it is in fact sin(N*A).
; Entry parameters...
; r0 = adress where to generate the table.
#set sin_shift=20 ; Shift for saved values. Must be =<28.
#set sin_nb=128 ; Nb of angles between [0;pi/4[.
.sinA dcd 1647089 ; sin((pi/4)/sin_nb)*2^28.
.cosA dcd 268430403 ; cos((pi/4)/sin_nb)*2^28.
.make_sinus
ldr r1,sinA ; r1=sinA*2^28.
ldr r2,cosA ; r2=cosA*2^28.
mov r3,#0 ; r3=sin0*2^28.
mov r4,#1<<28 ; r4=cos0*2^28.
mov r5,#sin_nb+1
.make_one_sinus
mov r6,r4,lsr #28-sin_shift ; r6=cosN*2^shift.
str r6,[r0,#sin_nb*2*4] ; Save sin(N+pi/2)=cosN.
mov r6,r3,lsr #28-sin_shift ; r6=sinN*2^shift.
str r6,[r0],#4 ; Save sinN.
umul64 r6,r7,r1,r3,r8,r9,r10 ; [r6|r7]=sinN*sinA.
umul64 r8,r9,r2,r4,r10,r11,r12 ; [r8|r9]=cosN*cosA.
sub64 r6,r7,r8,r9,r6,r7 ; [r6|r7]=cos(N+1)=cosN*sin1-sinN*sin1.
umul64 r3,r8,r3,r2,r9,r10,r11 ; [r3|r8]=sinN*cosA.
umul64 r4,r9,r4,r1,r10,r11,r12 ; [r4|r9]=cosN*sinA.
add64 r3,r8,r3,r8,r4,r9 ; [r3|r8]=sin(N+1)=sinN*cos1+cosN*sin1.
adjust64 r3,r3,r8,28 ; r1=sin(N+1)=sinN*cos1+cosN*sin1.
adjust64 r4,r6,r7,28 ; r2=cos(N+1)=cosN*sin1-sinN*sin1.
subS r5,r5,#1 ; One sinus processed.
bNE make_one_sinus
sub r0,r0,#4 ; Complete the table by stupid copy.
mov r1,r0 ; Point on the position which are like
add r2,r0,#sin_nb*8 ; (pi/4+k*(pi/2)) 0<=k<=4
mov r3,r2
add r4,r2,#sin_nb*8
mov r5,r4
add r6,r4,#sin_nb*8
mov r7,r6
add r8,r6,#sin_nb*8
mov r9,r8
mov r10,#sin_nb+1 ; Copy sin_nb+1 values.
._make_sinus_copy
ldr r11,[r0],#-4
str r11,[r3],#4 ; sin(pi-X)=sinX.
str r11,[r8],#-4 ; sin(2*pi+X)=sinX.
rsb r11,r11,#0
str r11,[r4],#-4 ; sin(pi+X)=-sinX.
str r11,[r7],#4 ; sin(2*pi-X)=-sinX.
ldr r11,[r2],#-4
str r11,[r1],#4 ; cos(-X)=cosX.
subS r10,r10,#1 ; One value copied.
strNE r11,[r9],#4 ; cos(2*pi+X)=cosX. No copy if r10=0.
rsb r11,r11,#0
str r11,[r5],#4 ; cos(pi-X)=-cosX.
str r11,[r6],#-4 ; cos(pi+X)=-cosX.
bNE _make_sinus_copy