Alain BROBECKER Dracula / Positivity (STe) rte de Dardagny Baah / Arm's Tech (Archie) 01630 CHALLEX Baah (PC) FRANCE ---------------------------------------------------------------- 7 october 95 - MEMORY WAR - ============ This text is dedicated to ArmOric, my fave precalc-tables dealer. ;) FOREWORD This text is aimed to assembly programmer, I doubt people programming in high level languages will appreciate it much. All algorithms, ideas and assembly code in here were made by me. (else I mention the author) If you use it in one of your programs, please send me a free copy of it and credit me. If you appreciated this text, I would be glad to know it. INTRODUCTION The most effective packer is not a program, but your own grey cells. In a program, the size of the code (if it is in full assembly language I mean) is ridiculous compared to the size used by datas! So, instead of generating your precalculations table and store them on the disk, why don' t you code a small routine which will generate the table and save it in memory just before the main program is launched? Maybe you will consider it a waste of time, but the owners of modests computers (I own a 1 Mb floppy based Archimedes 3000) will thank you a lot! And well, by programming datas generator in assembly language, you often learn a lot! INVERSES TABLE Since the Archimedes processor does not have a division instruction, we generally use a table of inverses and then replace divisions by multiplications. Making the table in basic and including it as binary datas is a real waste of disk space, because a routine which will create the inverses table is really small and easy to code. In appendix A, you will find such a routine. It is 68 bytes long, so if you generate a 1024 inverses table, the 'packing ratio' is 1:60. I really doubt your fave packer can achieve the same. SINUS TABLE Sinus are certainly one of the most widely used datas when it comes to demo or game programming. The generation of such a table is not as easy as for the inverses, because we will need some (simple) mathematics in order to create it. Here are the needed formulas... 1) cosX=sin(X+pi/2) 2) sin(X+pi)=-sinX 3) sin(pi-X)=sinX 4) sin(pi/2-X)=sin(X+pi/2) because cos(-X)=cosX 5) cos(X+Y)=cosX*cosY-sinX*sinY 6) sin(X+Y)=cosX*sinY+sinX*cosY The formula 1 tells us a very important thing: sinus and cosinus are tightly bound. From now on I consider the cosinus table is the same as the sinus table, but with a slight shift. With the formulas 1-4, we easily deduct that all sinus values could be found if only we knew the sinus and cosinus values for the angles between [0;pi/4]. So, the problem is now smaller, we only need to calculate those values and then construct the rest of the table with formulas 1-4. Now consider the well known formulas 5 and 6. Let us call A the minimum angle you want in your table, (must be a divisor of pi/4) and let' s say we have sinA and cosA already calculated. (included them as datas for example, takes 8 bytes) By using formulas 5 and 6, we immediatly obtain sin(2A) and cos(2A), by using them anew we obtain sin(3A) and cos(3A), and we continue in this iterative way until we reach n*A=pi/4. Here is a small basic proggy which is simple enough to be considered as the algorithm... REM Note that sin((n+1)*A) and cos((n+1)*A) will be calculated, REM but they won' t be saved. n=64 : REM nb of angles between [0;pi/4[. DIM sinus(8*n-1) sinA=SIN((PI/4)/n) : REM Better to have them as constants. cosA=COS((PI/4)/n) s=0 : REM s=SIN(0) c=1 : REM c=COS(0) FOR i=0 TO n+1 sinus(i)=s : REM Save sin(i) and cos(i) in table. sinus(i+2*n)=c temp=c*cosA-s*sinA s=c*sinA+s*cosA c=temp NEXT i There are two problems with this algorithm... The first one is that it is not as easy to implement in assembly language as the inverses table creator, because it requires high precision fixed point 'mathematics' in order to give precise results, even for small tables. In appendix B you will find my asm sinus creator. With the given values, the resulting table is perfect, so you needn' t worry too much about precision unless you need a very big table. The routine is 360 bytes long, and creates a 1280 sinus table, so the 'packing ratio' is 1:14. It is not as impressive as for inverses, but still very good. (it can even be smaller, if you merge the two loops, but you must make sure the offset for ldr/str won' t exceed 4096 bytes) Oh, by the way, I arbitrarily took 28 bits for the fractionnary part of my fixed point reals. You can take a bigger value, (it is recommended if you need higher precision) my choice was directed by habit since it is the value I have used for some of my proggys. Also, you can increase the precision by rounding to nearest integer in the adjust64 macro. (No need to do it in other macros, it won' t be noticeable when the result will be adjusted to 32 bits) The second problem appears when you need a very huge and precise table. Whatever A you take, you will have an error on sinA and cosA. (because cos(pi/4) is an irrationnal number, and so cannot be calculated with rationnal ones) And the iterative algorithm above is so that the error between real sinus values and the one calculated increases at each iteration. If you need a lot of iterations, the error may be terrific. By chance I dicovered a trick which minimizes the number of iterations. Quite simply, suppose you have already calculated all sinus and cosinus between [0;k*A], then you can calculate all sinus for angles [(k+1)*A;2*k*a] with only one more iteration.Let' s say n is a number between [1;k], then... sin((k+n)*A)=cos(kA)*sin(nA)+sin(kA)*cos(nA) cos((k+n)*A)=cos(kA)*cos(nA)-sin(kA)*sin(nA) Again, I give you a basic proggy, which I consider as the algorithm for this second, more precise, method to create your sinus table. There is a restriction which is that the nb of angles between [0;pi/4[ must be a power of 2. Maybe it can be avoided, but I don' t think this restriction is very annoying, so I din' t try to avoid it. m=12 n=2^m : REM nb of angles between [0;pi/4[. DIM sinus(8*n-1) sinus(0)=0 : REM Save the first values of sinus. sinus(2*n)=1 sinus(1)=SIN((PI/4)/n) sinus(2*n+1)=COS((PI/4)/n) FOR i=1 TO m z=2^(i-1) sinB=sinus(z) : REM Will be used in the second loop. cosB=sin(z+2*n) FOR j=1 TO o sinus(z+j)=cosB*sinus(j)+sinB*sinus(j+2*n) sinus(z+j+2*n)=cosB*sinus(j+2*n)-sinB*sinus(j) NEXT j NEXT i It is sad, but I have not written an assembly version of this algorithm, because up to now I have never needed higher precision sinus table. So, for once you will have to work by your own! :) I expect the routine to be a bit bigger in size, but since it will be devoted to very big tables it will certainly be worth its price anyway. Another thing not to forget is that you will use values saved in the table, so you must save them with maximum precision. EXP-LOG TABLE Again, something I have never used, but ArmOric gave me the basic idea, and maybe some people will have the use of it. (As far as I know, you can find logarithms in some fractals, so it is not as uninteresting as it may seem) Quite simply, consider the following formulas... 7) exp(A+B)=expA*expB 8) log(A*B)=logA*logB They both look similar to the one given in sinus tables, no? (Except that they are stand-alone formulas) I bet I don' t need to say much, the way 7 & 8 must be handled is clear. Thoses tables will be much bigger than sinus table, because they are not periodic, so you will be forced to minimize the number of iterations here too. I recommend an algorithm looking like the second one given for sinus. - THE END - ;**************************************************************************** ;***** ***** ;***** APPENDIX A ***** ;***** ***** ;**************************************************************************** ; This routine calculates all the division between r1/1 and r1/r2, and ; save them into [r0]. Each inverse is contained in a longword. ; Entry parameters... ; r0 = adress where to generate the table. ; r1 = premultiplication factor for the inverses. ; r2 = last number to take the inverse of. .make_inverses_table mov r3,#1 ; Used by the division routine. mov r4,#1 ; r4 is the current divisor. ._make_one_inverse mov r5,r4 ; r5=divisor. mov r6,r1 ; r6=premul factor=dividend. mov r7,#0 ; r7 will contain the quotient of r1/r4. mov r8,#15 ; Shift for the division. ._divide_one_step cmp r6,r5,lsl r8 ; dividend bigger than divisor<<r8? subGE r6,r6,r5,lsl r8 ; Yes, then dividend-=divisor<<r8, addGE r7,r7,r3,lsl r8 ; and add 1<<r8 to the quotient. subS r8,r8,#1 ; Next shift. bPL _divide_one_step cmp r5,r6,lsl #1 ; Flags=divisor-2*rest. addLE r7,r7,#1 ; Round to nearest integer. str r7,[r0],#4 ; Save r1/r4. add r4,r4,#1 ; r4 is the next divisor. cmp r4,r2 ; It was the last one? bLE _make_one_inverse ;**************************************************************************** ;***** ***** ;***** APPENDIX B ***** ;***** ***** ;**************************************************************************** ; Things are more complicated, so I divided the problem in smaller parts by ; using macros which performs high precision calculi. (64 bits) A 64 bits ; integer is noted [a|b] where a contains the upper longword. ; This macro performs an unsigned 32*32 bits multiply. ; [m0|m1]=m2*m3. You can choose [m2|m3]=[m0|m1] ; It destroys m0,m1,m4,m5,m6 and the flags. (C is the carry flag) macro umul64 m0,m1,m2,m3,m4,m5,m6 { mov m4,m2,lsr #16 ; m4=up(m2). sub m5,m2,m4,lsl #16 ; m5=down(m2). mov m0,m3,lsr #16 ; m0=up(m3). sub m1,m3,m0,lsl #16 ; m1=down(m3). mul m6,m5,m0 ; m6=down(m2)*up(m3). mul m0,m4,m0 ; m0=up(m2)*up(m3). mlaS m6,m1,m4,m6 ; C+m6=up(m2)*down(m3)+down(m2)*up(m3). adc m0,m0,m6,lsr #16 mul m1,m5,m1 ; m1=down(m2)*down(m3). addS m1,m1,m6,lsl #16 adc m0,m0,#0 ; [m0|m1]=m2*m3. } ; This macro adjusts the 64 bits result to 32 bits, according to the fixed ; point shift factor. (c0) ; m0=[m1|m2]>>c0. You can choose m1=m0. ; It destroys m0. macro adjust64 m0,m1,m2,c0 { mov m0,m1,lsl #32-c0 add m0,m0,m2,lsr #c0 } ; This macro performs a 64 bits addition. ; [m0|m1]=[m2|m3]+[m4|m5]. You can choose [m2|m3] or [m4|m5]=[m0|m1]. ; It destroys [m0|m1] and the flags. macro add64 m0,m1,m2,m3,m4,m5 { addS m1,m3,m5 adc m0,m2,m4 } ; This macro performs a 64 bits substract. ; [m0|m1]=[m2|m3]-[m4|m5]. You can choose [m2|m3] or [m4|m5]=[m0|m1]. ; It destroys [m0|m1] and the flags. macro sub64 m0,m1,m2,m3,m4,m5 { subS m1,m3,m5 sbc m0,m2,m4 } ;---------------------------------------------------------------------------- ; This routine calculates the sinus table, for angles between 0 and 5*pi/2. ; (So the cos table is between pi/2 and 5*pi/2) I take in account the fact ; that all sinus and cosinus values between 0 and pi/4 are positive. ; When I write sinN, it is in fact sin(N*A). ; Entry parameters... ; r0 = adress where to generate the table. #set sin_shift=20 ; Shift for saved values. Must be =<28. #set sin_nb=128 ; Nb of angles between [0;pi/4[. .sinA dcd 1647089 ; sin((pi/4)/sin_nb)*2^28. .cosA dcd 268430403 ; cos((pi/4)/sin_nb)*2^28. .make_sinus ldr r1,sinA ; r1=sinA*2^28. ldr r2,cosA ; r2=cosA*2^28. mov r3,#0 ; r3=sin0*2^28. mov r4,#1<<28 ; r4=cos0*2^28. mov r5,#sin_nb+1 .make_one_sinus mov r6,r4,lsr #28-sin_shift ; r6=cosN*2^shift. str r6,[r0,#sin_nb*2*4] ; Save sin(N+pi/2)=cosN. mov r6,r3,lsr #28-sin_shift ; r6=sinN*2^shift. str r6,[r0],#4 ; Save sinN. umul64 r6,r7,r1,r3,r8,r9,r10 ; [r6|r7]=sinN*sinA. umul64 r8,r9,r2,r4,r10,r11,r12 ; [r8|r9]=cosN*cosA. sub64 r6,r7,r8,r9,r6,r7 ; [r6|r7]=cos(N+1)=cosN*sin1-sinN*sin1. umul64 r3,r8,r3,r2,r9,r10,r11 ; [r3|r8]=sinN*cosA. umul64 r4,r9,r4,r1,r10,r11,r12 ; [r4|r9]=cosN*sinA. add64 r3,r8,r3,r8,r4,r9 ; [r3|r8]=sin(N+1)=sinN*cos1+cosN*sin1. adjust64 r3,r3,r8,28 ; r1=sin(N+1)=sinN*cos1+cosN*sin1. adjust64 r4,r6,r7,28 ; r2=cos(N+1)=cosN*sin1-sinN*sin1. subS r5,r5,#1 ; One sinus processed. bNE make_one_sinus sub r0,r0,#4 ; Complete the table by stupid copy. mov r1,r0 ; Point on the position which are like add r2,r0,#sin_nb*8 ; (pi/4+k*(pi/2)) 0<=k<=4 mov r3,r2 add r4,r2,#sin_nb*8 mov r5,r4 add r6,r4,#sin_nb*8 mov r7,r6 add r8,r6,#sin_nb*8 mov r9,r8 mov r10,#sin_nb+1 ; Copy sin_nb+1 values. ._make_sinus_copy ldr r11,[r0],#-4 str r11,[r3],#4 ; sin(pi-X)=sinX. str r11,[r8],#-4 ; sin(2*pi+X)=sinX. rsb r11,r11,#0 str r11,[r4],#-4 ; sin(pi+X)=-sinX. str r11,[r7],#4 ; sin(2*pi-X)=-sinX. ldr r11,[r2],#-4 str r11,[r1],#4 ; cos(-X)=cosX. subS r10,r10,#1 ; One value copied. strNE r11,[r9],#4 ; cos(2*pi+X)=cosX. No copy if r10=0. rsb r11,r11,#0 str r11,[r5],#4 ; cos(pi-X)=-cosX. str r11,[r6],#-4 ; cos(pi+X)=-cosX. bNE _make_sinus_copy