[ Inverses | Sinus | Exp-Log | Appendix A | Appendix B ]

Alain BROBECKER                                    Dracula / Positivity (STe)
rte de Dardagny                                    Baah / Arm's Tech (Archie)
 01630 CHALLEX                                                      Baah (PC)
    FRANCE
---------------------------------------------------------------- 7 october 95

                                - MEMORY WAR -
                                 ============

    This text is dedicated to ArmOric, my fave precalc-tables dealer. ;)



FOREWORD

    This text is aimed to assembly programmer, I doubt people programming
  in high level languages will appreciate it much. All algorithms, ideas
  and assembly code in here were made by me. (else I mention the author)
  If you use it in one of your programs, please send me a free copy of it
  and credit me. If you appreciated this text, I would be glad to know it.



INTRODUCTION

    The most effective packer is not a program, but your own grey cells.
  In a program, the size of the code (if it is in full assembly language
  I mean) is ridiculous compared to the size used by datas! So, instead
  of generating your precalculations table and store them on the disk,
  why don' t you code a small routine which will generate the table and
  save it in memory just before the main program is launched?
    Maybe you will consider it a waste of time, but the owners of modests
  computers (I own a 1 Mb floppy based Archimedes 3000) will thank you a
  lot! And well, by programming datas generator in assembly language, you
  often learn a lot!



INVERSES TABLE

    Since the Archimedes processor does not have a division instruction,
  we generally use a table of inverses and then replace divisions by
  multiplications. Making the table in basic and including it as binary
  datas is a real waste of disk space, because a routine which will create
  the inverses table is really small and easy to code.
    In appendix A, you will find such a routine. It is 68 bytes long,
  so if you generate a 1024 inverses table, the 'packing ratio' is 1:60.
  I really doubt your fave packer can achieve the same.



SINUS TABLE

    Sinus are certainly one of the most widely used datas when it comes to
  demo or game programming. The generation of such a table is not as easy
  as for the inverses, because we will need some (simple) mathematics in
  order to create it. Here are the needed formulas...
            1)  cosX=sin(X+pi/2)
            2)  sin(X+pi)=-sinX
            3)  sin(pi-X)=sinX
            4)  sin(pi/2-X)=sin(X+pi/2)         because cos(-X)=cosX
            5)  cos(X+Y)=cosX*cosY-sinX*sinY
            6)  sin(X+Y)=cosX*sinY+sinX*cosY

    The formula 1 tells us a very important thing: sinus and cosinus are
  tightly bound. From now on I consider the cosinus table is the same as
  the sinus table, but with a slight shift.
    With the formulas 1-4, we easily deduct that all sinus values could be
  found if only we knew the sinus and cosinus values for the angles between
  [0;pi/4]. So, the problem is now smaller, we only need to calculate those
  values and then construct the rest of the table with formulas 1-4.

    Now consider the well known formulas 5 and 6. Let us call A the minimum
  angle you want in your table, (must be a divisor of pi/4) and let' s say
  we have sinA and cosA already calculated. (included them as datas for
  example, takes 8 bytes) By using formulas 5 and 6, we immediatly obtain
  sin(2A) and cos(2A), by using them anew we obtain sin(3A) and cos(3A), and
  we continue in this iterative way until we reach n*A=pi/4. Here is a small
  basic proggy which is simple enough to be considered as the algorithm...

            REM Note that sin((n+1)*A) and cos((n+1)*A) will be calculated,
            REM but they won' t be saved.
            n=64                    : REM nb of angles between [0;pi/4[.
            DIM sinus(8*n-1)
            sinA=SIN((PI/4)/n)      : REM Better to have them as constants.
            cosA=COS((PI/4)/n)
            s=0                     : REM s=SIN(0)
            c=1                     : REM c=COS(0)
            FOR i=0 TO n+1
              sinus(i)=s            : REM Save sin(i) and cos(i) in table.
              sinus(i+2*n)=c
              temp=c*cosA-s*sinA
              s=c*sinA+s*cosA
              c=temp
            NEXT i

    There are two problems with this algorithm... The first one is that it
  is not as easy to implement in assembly language as the inverses table
  creator, because it requires high precision fixed point 'mathematics' in
  order to give precise results, even for small tables.
    In appendix B you will find my asm sinus creator. With the given values,
  the resulting table is perfect, so you needn' t worry too much about
  precision unless you need a very big table. The routine is 360 bytes long,
  and creates a 1280 sinus table, so the 'packing ratio' is 1:14. It is not
  as impressive as for inverses, but still very good. (it can even be
  smaller, if you merge the two loops, but you must make sure the offset
  for ldr/str won' t exceed 4096 bytes)

    Oh, by the way, I arbitrarily took 28 bits for the fractionnary part of
  my fixed point reals. You can take a bigger value, (it is recommended if
  you need higher precision) my choice was directed by habit since it is the
  value I have used for some of my proggys. Also, you can increase the
  precision by rounding to nearest integer in the adjust64 macro. (No need
  to do it in other macros, it won' t be noticeable when the result will
  be adjusted to 32 bits)

    The second problem appears when you need a very huge and precise table.
  Whatever A you take, you will have an error on sinA and cosA. (because
  cos(pi/4) is an irrationnal number, and so cannot be calculated with
  rationnal ones) And the iterative algorithm above is so that the error
  between real sinus values and the one calculated increases at each
  iteration. If you need a lot of iterations, the error may be terrific. By
  chance I dicovered a trick which minimizes the number of iterations. Quite
  simply, suppose you have already calculated all sinus and cosinus between
  [0;k*A], then you can calculate all sinus for angles [(k+1)*A;2*k*a] with
  only one more iteration.Let' s say n is a number between [1;k], then...
            sin((k+n)*A)=cos(kA)*sin(nA)+sin(kA)*cos(nA)
            cos((k+n)*A)=cos(kA)*cos(nA)-sin(kA)*sin(nA)
    Again, I give you a basic proggy, which I consider as the algorithm for
  this second, more precise, method to create your sinus table. There is a
  restriction which is that the nb of angles between [0;pi/4[ must be a power
  of 2. Maybe it can be avoided, but I don' t think this restriction is very
  annoying, so I din' t try to avoid it.

            m=12
            n=2^m                   : REM nb of angles between [0;pi/4[.
            DIM sinus(8*n-1)
            sinus(0)=0              : REM Save the first values of sinus.
            sinus(2*n)=1
            sinus(1)=SIN((PI/4)/n)
            sinus(2*n+1)=COS((PI/4)/n)
            FOR i=1 TO m
              z=2^(i-1)
              sinB=sinus(z)         : REM Will be used in the second loop.
              cosB=sin(z+2*n)
              FOR j=1 TO o
                sinus(z+j)=cosB*sinus(j)+sinB*sinus(j+2*n)
                sinus(z+j+2*n)=cosB*sinus(j+2*n)-sinB*sinus(j)
              NEXT j
            NEXT i

    It is sad, but I have not written an assembly version of this algorithm,
  because up to now I have never needed higher precision sinus table. So,
  for once you will have to work by your own! :) I expect the routine to
  be a bit bigger in size, but since it will be devoted to very big tables
  it will certainly be worth its price anyway. Another thing not to forget
  is that you will use values saved in the table, so you must save them with
  maximum precision.



EXP-LOG TABLE

    Again, something I have never used, but ArmOric gave me the basic idea,
  and maybe some people will have the use of it. (As far as I know, you can
  find logarithms in some fractals, so it is not as uninteresting as it may
  seem) Quite simply, consider the following formulas...
            7)  exp(A+B)=expA*expB
            8)  log(A*B)=logA*logB

    They both look similar to the one given in sinus tables, no? (Except
  that they are stand-alone formulas) I bet I don' t need to say much,
  the way 7 & 8 must be handled is clear. Thoses tables will be much bigger
  than sinus table, because they are not periodic, so you will be forced
  to minimize the number of iterations here too. I recommend an algorithm
  looking like the second one given for sinus.


                                 - THE END -



;****************************************************************************
;*****                                                                  *****
;*****                            APPENDIX A                            *****
;*****                                                                  *****
;****************************************************************************
; This routine calculates all the division between r1/1 and r1/r2, and
; save them into [r0]. Each inverse is contained in a longword.
; Entry parameters...
;         r0 = adress where to generate the table.
;         r1 = premultiplication factor for the inverses.
;         r2 = last number to take the inverse of.
.make_inverses_table
  mov       r3,#1                   ; Used by the division routine.
  mov       r4,#1                   ; r4 is the current divisor.
._make_one_inverse
  mov       r5,r4                   ; r5=divisor.
  mov       r6,r1                   ; r6=premul factor=dividend.
  mov       r7,#0                   ; r7 will contain the quotient of r1/r4.
  mov       r8,#15                  ; Shift for the division.
._divide_one_step
  cmp       r6,r5,lsl r8            ; dividend bigger than divisor<<r8?
  subGE     r6,r6,r5,lsl r8         ; Yes, then dividend-=divisor<<r8,
  addGE     r7,r7,r3,lsl r8         ;   and add 1<<r8 to the quotient.
  subS      r8,r8,#1                ; Next shift.
  bPL       _divide_one_step
  cmp       r5,r6,lsl #1            ; Flags=divisor-2*rest.
  addLE     r7,r7,#1                ; Round to nearest integer.
  str       r7,[r0],#4              ; Save r1/r4.
  add       r4,r4,#1                ; r4 is the next divisor.
  cmp       r4,r2                   ; It was the last one?
  bLE       _make_one_inverse



;****************************************************************************
;*****                                                                  *****
;*****                            APPENDIX B                            *****
;*****                                                                  *****
;****************************************************************************
; Things are more complicated, so I divided the problem in smaller parts by
; using macros which performs high precision calculi. (64 bits) A 64 bits
; integer is noted [a|b] where a contains the upper longword.

; This macro performs an unsigned 32*32 bits multiply.
;   [m0|m1]=m2*m3. You can choose [m2|m3]=[m0|m1]
;   It destroys m0,m1,m4,m5,m6 and the flags. (C is the carry flag)
macro umul64 m0,m1,m2,m3,m4,m5,m6
{ mov       m4,m2,lsr #16           ; m4=up(m2).
  sub       m5,m2,m4,lsl #16        ; m5=down(m2).
  mov       m0,m3,lsr #16           ; m0=up(m3).
  sub       m1,m3,m0,lsl #16        ; m1=down(m3).
  mul       m6,m5,m0                ; m6=down(m2)*up(m3).
  mul       m0,m4,m0                ; m0=up(m2)*up(m3).
  mlaS      m6,m1,m4,m6             ; C+m6=up(m2)*down(m3)+down(m2)*up(m3).
  adc       m0,m0,m6,lsr #16
  mul       m1,m5,m1                ; m1=down(m2)*down(m3).
  addS      m1,m1,m6,lsl #16
  adc       m0,m0,#0                ; [m0|m1]=m2*m3.
}

; This macro adjusts the 64 bits result to 32 bits, according to the fixed
; point shift factor. (c0)
;   m0=[m1|m2]>>c0. You can choose m1=m0.
;   It destroys m0.
macro adjust64 m0,m1,m2,c0
{ mov       m0,m1,lsl #32-c0
  add       m0,m0,m2,lsr #c0
}

; This macro performs a 64 bits addition.
;   [m0|m1]=[m2|m3]+[m4|m5]. You can choose [m2|m3] or [m4|m5]=[m0|m1].
;   It destroys [m0|m1] and the flags.
macro add64 m0,m1,m2,m3,m4,m5
{ addS      m1,m3,m5
  adc       m0,m2,m4
}

; This macro performs a 64 bits substract.
;   [m0|m1]=[m2|m3]-[m4|m5]. You can choose [m2|m3] or [m4|m5]=[m0|m1].
;   It destroys [m0|m1] and the flags.
macro sub64 m0,m1,m2,m3,m4,m5
{ subS      m1,m3,m5
  sbc       m0,m2,m4
}

;----------------------------------------------------------------------------
; This routine calculates the sinus table, for angles between 0 and 5*pi/2.
; (So the cos table is between pi/2 and 5*pi/2) I take in account the fact
; that all sinus and cosinus values between 0 and pi/4 are positive.
; When I write sinN, it is in fact sin(N*A).
; Entry parameters...
;         r0 = adress where to generate the table.

#set sin_shift=20                   ; Shift for saved values. Must be =<28.
#set sin_nb=128                     ; Nb of angles between [0;pi/4[.

.sinA       dcd 1647089             ; sin((pi/4)/sin_nb)*2^28.
.cosA       dcd 268430403           ; cos((pi/4)/sin_nb)*2^28.

.make_sinus
  ldr       r1,sinA                 ; r1=sinA*2^28.
  ldr       r2,cosA                 ; r2=cosA*2^28.
  mov       r3,#0                   ; r3=sin0*2^28.
  mov       r4,#1<<28               ; r4=cos0*2^28.
  mov       r5,#sin_nb+1
.make_one_sinus
  mov       r6,r4,lsr #28-sin_shift ; r6=cosN*2^shift.
  str       r6,[r0,#sin_nb*2*4]     ; Save sin(N+pi/2)=cosN.
  mov       r6,r3,lsr #28-sin_shift ; r6=sinN*2^shift.
  str       r6,[r0],#4              ; Save sinN.
  umul64 r6,r7,r1,r3,r8,r9,r10      ; [r6|r7]=sinN*sinA.
  umul64 r8,r9,r2,r4,r10,r11,r12    ; [r8|r9]=cosN*cosA.
  sub64 r6,r7,r8,r9,r6,r7           ; [r6|r7]=cos(N+1)=cosN*sin1-sinN*sin1.
  umul64 r3,r8,r3,r2,r9,r10,r11     ; [r3|r8]=sinN*cosA.
  umul64 r4,r9,r4,r1,r10,r11,r12    ; [r4|r9]=cosN*sinA.
  add64 r3,r8,r3,r8,r4,r9           ; [r3|r8]=sin(N+1)=sinN*cos1+cosN*sin1.
  adjust64 r3,r3,r8,28              ; r1=sin(N+1)=sinN*cos1+cosN*sin1.
  adjust64 r4,r6,r7,28              ; r2=cos(N+1)=cosN*sin1-sinN*sin1.
  subS      r5,r5,#1                ; One sinus processed.
  bNE       make_one_sinus

  sub       r0,r0,#4                ; Complete the table by stupid copy.
  mov       r1,r0                   ; Point on the position which are like
  add       r2,r0,#sin_nb*8         ;  (pi/4+k*(pi/2))   0<=k<=4
  mov       r3,r2
  add       r4,r2,#sin_nb*8
  mov       r5,r4
  add       r6,r4,#sin_nb*8
  mov       r7,r6
  add       r8,r6,#sin_nb*8
  mov       r9,r8
  mov       r10,#sin_nb+1           ; Copy sin_nb+1 values.
._make_sinus_copy
  ldr       r11,[r0],#-4
  str       r11,[r3],#4             ; sin(pi-X)=sinX.
  str       r11,[r8],#-4            ; sin(2*pi+X)=sinX.
  rsb       r11,r11,#0
  str       r11,[r4],#-4            ; sin(pi+X)=-sinX.
  str       r11,[r7],#4             ; sin(2*pi-X)=-sinX.
  ldr       r11,[r2],#-4
  str       r11,[r1],#4             ; cos(-X)=cosX.
  subS      r10,r10,#1              ; One value copied.
  strNE     r11,[r9],#4             ; cos(2*pi+X)=cosX. No copy if r10=0.
  rsb       r11,r11,#0
  str       r11,[r5],#4             ; cos(pi-X)=-cosX.
  str       r11,[r6],#-4            ; cos(pi+X)=-cosX.
  bNE       _make_sinus_copy