Alain BROBECKER                         Dracula / Positivity (STe)
rte de Dardagny                         Baah / Arm's Tech (Archie)
01630 CHALLEX                           Baah (PC)
FRANCE
-----------------------------------------------------------
                                          22-24 december 95

                    - SHADING EFFECTS -
                    ===================

FOREWORD

This text is aimed at assembly programmers; I doubt people programming in
high level languages will appreciate it much. All algorithms, ideas and
assembly code in here were made by me. (Otherwise I mention the author)
If you use them in one of your programs, please send me a free copy of it
and credit me. If you appreciated this text, I would be glad to know it.

INTRODUCTION

Everybody now speaks about environment mapping, doom engines and the
like. But from time to time, some coders make demos based upon older
techniques, and when these are mixed with nice formulae, the result is
sometimes really nice and refreshing. (As an example of this, I will
mention the 'Darkroom' and 'Oberon' demos on Amiga...) So, I will now
describe a way of performing an old effect called shading.

As far as I know, this technique has its roots on the Amiga, and more
precisely in the well known paint program called "Deluxe Paint". This
program allowed the user to darken or brighten the part of the image
which was under the brush shape. Coders then saw that even without being
an artist, one could get nice patterns by simply making large movements
with the mouse while using the brighten function. Shadesprites were born!

In this article, I will only speak about shadedots, but the way bigger
brushes (sprites, boxes...) must be handled is quite similar, especially
if you want to use generated code to go at maximum speed.

THE PROBLEM IS...

I think you've already guessed that both functions (darken, brighten) are
based upon the same idea. Since our goal is to do shading as fast as
possible, we have to organise things in a friendly way even before we
start thinking about code optimisation and the like.
That's the reason why I assume that, from now on, our color table
contains one unique raster, organised in a linear way. According to this,
we now only need to increment the value in order to brighten the pixel,
and decrement the value to darken it. A problem then arises at the edges
of the palette, when the pixel value is the minimum or the maximum one.
(Generally 0 or (2^n)-1)

MODULO SHADING

The first way of solving the problem is to use a modulo. For the ones who
are not familiar with this mathematical operation, I give a small
explanation right now...

Suppose you have p,q two integers. You learned in primary school that you
can always make the Euclidean division of p by q. (q positive) This
division gives two integers: the quotient a and the remainder b, which
are such that p=(a*q)+b. If b is positive or zero, we define p mod(q)=b,
and if b is negative, we define p mod(q)=b+q. Here are some examples...

    42 mod 5 = 2    ; 42=8*5+2.
    -8 mod 3 = 1    ; -8=(-2)*3-2, and -2+3=1.

Well, does this mean that, in order to perform the modulo operation, we
need to make a Euclidean division, which is quite a complex operation?
No, because we'll make some restrictions, and then the problem will be
reduced to a very simple operation...

When we are working in base ten, a number is decomposed in units, tens,
hundreds... If we want the number mod 10, quite simply we keep only the
units digit; if we want the number mod 100, we keep the units and tens
digits. (Only in the case p is positive) So if our needs are reasonable
(modulo by a power of ten), we have reduced our problem to the clearing
of digits. But the case of p negative is still annoying.

But a computer works in base 2, so let's switch to this particular base
and see how the modulo operation behaves, with the same restriction as
above. (ie we want the modulo by a power of 2)

    %01011010 mod %1000 = %00000010    ;  90 mod 8 = 2.
    %10111011 mod %100  = %00000011    ; -69 mod 4 = 3.
    %11111111 mod %1000 = %00000111    ;  -1 mod 8 = 7.
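For readers who want to check the three base 2 examples on any machine, here is a minimal sketch in C rather than assembly. It relies on the fact that, for a modulus of 2^n, keeping only the n low bits of a two's complement number gives the modulo, negative numbers included. The helper name mod_pow2 is made up for this illustration and appears nowhere else in this text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of the article's code): p mod 2^n,
   obtained by keeping only the n low bits of p. */
static uint32_t mod_pow2(int32_t p, unsigned n)
{
    assert(n < 32);                         /* a shift by 32 is undefined in C */
    uint32_t mask = ~(0xFFFFFFFFu << n);    /* the n low bits are set          */
    return (uint32_t)p & mask;              /* clear every bit above them      */
}
```

Calling mod_pow2(90,3), mod_pow2(-69,2) and mod_pow2(-1,3) reproduces the three examples. The conversion of a negative p to uint32_t is well defined in C and yields exactly the two's complement bit pattern, which is why no special case is needed for negative numbers.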
As for base 10, we only need to clear the bits (binary digits) which are
above the exponent of the power of 2 used as the modulo number. (This
means bit n and all the bits above it for a modulo by 2^n) As you have
seen in the examples, this also works for the negative numbers. (This is
due to the representation of negative numbers in binary, which on 8 bits
is 256-abs(p)) Now, you only need to know that clearing bits can be made
very easily by using the AND operation. From now on, p mod(2^n) will be
replaced by p and not(&ffffffff lsl n).

Generally, the amount of colors at disposal is a power of 2 (16,32,256),
so our solution will work. I won't examine the case when the modulo is
not a power of two, because this (almost) never happens. (Anyway we do
our best to avoid it)

Well, I've spent lotsa time explaining what a modulo is, but it is
something you will find a lot when coding in assembly language, so I
recommend you really pay attention to it! (You will find it as soon as
you need a 'wrapping' image, screen, or the like)

Maybe giving you some code which performs the shading of a dot in 8bpp
and 4bpp mode is a good idea; both routines are presented as macros in
appendix A. As you will see, there is a big use of the modulo operation,
and it will allow you to practise if you are not familiar with this
concept.

FASTER MODULO SHADING

Humm, it's well known that we often learn things which we don't use. I've
bored you with the modulo, but there is a faster way of performing modulo
shading which is much easier to understand. Sorry... Anyway, I hope you
understood the modulo operation, because it is an important one, you'll
find it many many times. (Am I going senile?)

Suppose you only need to shade pixels in one 'direction'; then by
choosing the colors so that shading is a decrement, a faster shading can
be performed with the following code...

  and     m9,m3,m7,lsr m8         ; m9=isolated pixie.
  subS    m9,m9,#1                ; m9=new pixel value.
  orrMI   m7,m7,m3,lsl m8         ; If new_pix<0, set it to 15 in lword.
  subPL   m7,m7,m1,lsl m8         ; Else decrement it in lword.
  str     m7,[m6]                 ; Save the modified long.

This piece of code advantageously replaces part of the code you can find
in the mod_shade_4bpp macro in appendix A, just after the pixel shift has
been calculated in m8. (We gain one cycle, and it is easier to understand
when you are not familiar with modulo) Maybe you can find a similar trick
when incrementing the pixel values, but I've not investigated deeper, so
you'll have to work on your own.

BOUNDED SHADING

All this is good, but with the brighten function of "Deluxe Paint", there
is no modulo effect, so if a pixel is at maximum intensity you can't
brighten it anymore, and one must admit it's much nicer this way. We will
have this in mind, and even more...

Let's suppose we have N colors used for a logo, and all other colors
belong to the raster. What we want is to shade a pixel with the bound
effect only if the pixel is not already at the limit value and if its
color is not used in the logo. (So the logo will be left unaltered) As
above, I suppose the shading operation is in fact a decrement; then we
choose the color palette so that the logo colors are the values
0 -> (N-1), and the raster is in the rest of the palette. Here is the
main idea for performing bounded shading...

  and     m9,m3,m7,lsr m8         ; m9=isolated pixie.
  subS    m9,m9,#N                ; Pixie color lower or equal to N?
  subGT   m7,m7,m1,lsl m8         ; No, then shade pixel in the long,
  strGT   m7,[m6]                 ; and save the modified long.

For this particular case I have written a small proggy, which draws
shadedots in a random manner. You will notice that the logo is left
unaltered, though I don't redraw it at each VBL.

The bounded shading is much faster than modulo shading, and much nicer,
we're quite lucky. The only problem is for the unshade operation, since
we must then care about the two edges of the color table: don't unshade
if color<=N-1 or if color is already full dark.
(ie (2^n)-1) Again, I've not yet dwelt on the matter, so I leave it up to
you. Let's say it's the homework for next time.

Oh, by the way, here is the code for the 8bpp mode. You will notice that
it's not very mysterious... Also note it is much faster.

  cmp     m4,#N                   ; Pixie color lower or equal to N?
  subGT   m4,m4,#1                ; No, then shade pixel,
  strBGT  m4,[m0,m3]              ; and save it.

Also, in case you have not 'locked' some colors (ie N=0) you can even
optimise this by writing...

  subS    m4,m4,#1                ; Shade pixel.
  strBPL  m4,[m0,m3]              ; Save it if result is positive or zero.

GENERATED CODE

First, I must credit Frederic ELISEI (ArmOric/Arm's Tech) who is the guy
behind the idea of using generated code. (I was a bit ashamed when he
spoke about this, because I'm generally the one who puts generated code
everywhere) Finally, we both came up with exactly the same code (though
Fred wrote it much faster than I did), and I used it in the randomly
shaded splines (RSS) in the CakeHead 2 demo. (This demo will be out as
soon as I have the final version of the music by Cry/Xperience)

As we've seen, shading in 8bpp mode is much faster than in 4bpp mode,
because we don't need to use shifts in order to isolate the pixie. But in
some particular cases, you will be allowed to use tricks which will speed
things up a lot for 4bpp mode. (At least on Arm2) Let's see the case of
the RSS.

Basically, the drawing of a RSS consists in drawing a random filament of
X points at a given position. In a first version, I was performing the
random walk of the filament in realtime, and the random walk was totally
similar to the one given in the ShadeDots source. Then, Fred told me I
could improve things by making some routines which each draw a given
filament, and then randomly choosing one of those filaments. When you
generate the code, you know the position of the pixel in the byte, and so
you can use the following code...

If the x position is even:

  ldrB    r6,[r7,#offset]         ; Load byte.
  tst     r6,#&f                  ; 4 lowerbits of r6=0?
  subNE   r6,r6,#1                ; No, then shade the pixie.
  strB    r6,[r7,#offset]         ; And save modified byte.

If the x position is odd:

  ldrB    r6,[r7,#offset]         ; Load byte.
  subS    r6,r6,#1<<4             ; Shade pixel in the 4 upperbits.
  strPLB  r6,[r7,#offset]         ; If pixel color>=0, save byte.

As you see, it goes much faster, but the restrictions are also a lot more
annoying. In case you want to draw only a dot, I doubt using this method
will be worth its price since you will need a branch. For the case of the
RSS, it was ok because I was drawing 13 pixels one after another, with
all positions determined.

In case the shape is more regular than the filaments, you will certainly
have interest in making a longword access instead of a byte access, and
then using code quite similar to the one above.

If you want a regular shape and modulo shading, I think that the best way
is not to shade one pixel after another, but to shade as many as possible
during the same operation. This means you isolate all odd pixies of the
longword and put them in a long, you add &10101010 to this long, then you
perform the modulo with &f0f0f0f0, do the same for even pixels and then
merge the resulting longs. In fact, it will be a bit harder because the
increment mask won't be set for all pixels, but it won't matter since it
will be handled by the code generator once and for all.

APPLICATIONS

Many people will say: "Well, all this is nice, but shading has no
applications, except random shadedots and the like..." Of course, I'm
here to demonstrate the contrary.

First of all, you can use shadedots for all particle systems, and this
can give very nice results, if the particle system gives nice movements!
(This is an alternative to the standard bitplane effects one can find on
Amiga or Atari ST) Remember that the Darkroom demo won the second prize
at the Assembly 94 party, just because it was nice. (No 3d or the like)

Also, one can always find new ideas which are not all that bad.
To tell the truth I'm quite proud of the RSS effect in the CakeHead 2.
And the only thing you need to do in order to create an original effect
is to mix two techniques. (splines and random shadedots)

Among other things, one can use the random shadedots technique to do
logos, and I have also thought about a burning papersheet. (You only need
to make movements less random, and choose other colors) As soon as you
dwell on the subject, I'm sure you will find ideas of your own, and the
result will certainly be more original than the classical texture mapping
you were working on.

Also, I would like to point out the fact that coding a particle system is
not all that dull, and finding nice formulae is, from my point of view,
as interesting as making 'brutal' code. (I like both ways of coding, as
long as it is made in assembly, but many people deny 'unbrutal' code!)

                            - THE END -

;****************************************************************************
;*****                                                                  *****
;*****                            APPENDIX A                            *****
;*****                                                                  *****
;****************************************************************************
; Here are two shading routines, one corresponding to the mode9 of the
; Archimedes, and the second one corresponding to a 256 colors mode. (Not
; the mode13 of the Archimedes, but one with redefinable colors)
;   To perform an unshade operation, you only need to change the add #1
; into a sub #1. The mode9 routine takes a lot of registers, but it is
; normally quite fast. (Maybe things can go faster)

;----------------------------------------------------------------------------
; We calculate the address of the long containing the pixel and the shift to
; access it in the longword. Then, we keep a track of the longword but
; without the pixie, we increment the pixie in the full long, without caring
; about overflows on other pixies, then we take the result of pix+=1 mod16,
; and merge it with the long without the pixie.
;   Parameters for this macro are
;       m0=videoram address  |  m3=&f (for mod16)
;       m1=&1                |  m4=x (0-319)
;       m2=&7<<2             |  m5=y (0-255)
;       m6-m9 are temporary registers. (We can choose m6=m5 and m8=m4)

macro mod_shade_4bpp m0,m1,m2,m3,m4,m5,m6,m7,m8,m9
{ add     m6,m5,m5,lsl #2         ; Make m6 point on good line.
  add     m6,m0,m6,lsl #5         ; (m6=videoram+160*y)
  mov     m7,m4,lsr #3            ; m7=int(x/8).
  ldr     m7,[m6,m7,lsl #2]!      ; m6=longword address and load it in m7.
  and     m8,m2,m4,lsl #2         ; m8=4*(x mod8)=shift to access pixel.
  bic     m9,m7,m3,lsl m8         ; m9=longword without pixie to shade.
  add     m7,m7,m1,lsl m8         ; m7=longword with pixie incremented.
  and     m7,m7,m3,lsl m8         ; m7=pixie incremented mod16.
  add     m7,m9,m7                ; Merge long without pixie and new pixie.
  str     m7,[m6]                 ; Save the modified long.
}

;----------------------------------------------------------------------------
; We calculate the address of the byte containing the pixel, we increment
; it, and then we save it. You must notice that the mod256 operation is
; performed by saving the result as a byte.
;   Parameters for this macro are
;       m0=videoram address
;       m1=x (0-319)
;       m2=y (0-255)
;       m3-m4 are temporary registers. (m5-m9 are not used)

macro mod_shade_8bpp m0,m1,m2,m3,m4,m5,m6,m7,m8,m9
{ add     m3,m2,m2,lsl #2         ; Make m3 point on good byte.
  add     m3,m1,m3,lsl #6         ; (m3=x+320*y)
  ldrB    m4,[m0,m3]              ; Load byte containing pixel.
  add     m4,m4,#1                ; Increment pixie.
  strB    m4,[m0,m3]              ; Save modified pixie.
}
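
For those who want to check the arithmetic away from an Archimedes, here is a small model of the 4bpp operations written in C. It is only a sketch under a few assumptions, not the code used in any demo: a longword holds 8 pixels, pixel k sits in bits 4k to 4k+3 (which is what the shift 4*(x mod8) of the macro implies), and the names shade_mod16, shade_bounded and shade_mod16_all are invented for this illustration. The third function follows the 'shade all pixels of the longword at once' idea from the generated code section, with the simplifying assumption that every pixel of the long is shaded.

```c
#include <assert.h>
#include <stdint.h>

/* All names here are illustrative and do not come from the article.
   Assumed layout: 8 pixels per longword, pixel k in bits 4k..4k+3. */

/* Modulo shading of pixel k: pixel = (pixel + 1) mod 16.
   Same steps as the mode9 macro: bic, add, and, merge. */
static uint32_t shade_mod16(uint32_t lw, unsigned k)
{
    assert(k < 8);
    unsigned shift = 4u * k;                        /* 4*(x mod 8)           */
    uint32_t mask  = 0xFu << shift;                 /* &f lsl shift          */
    uint32_t rest  = lw & ~mask;                    /* long without pixie    */
    uint32_t pix   = (lw + (1u << shift)) & mask;   /* incremented, mod 16   */
    return rest + pix;                              /* merge both parts      */
}

/* Bounded shading of pixel k: decrement it only while it is above n.
   Values 0..n-1 model the locked logo colors, n is where the raster stops. */
static uint32_t shade_bounded(uint32_t lw, unsigned k, unsigned n)
{
    assert(k < 8 && n < 16);
    unsigned shift = 4u * k;
    unsigned pix   = (lw >> shift) & 0xFu;          /* isolated pixie        */
    return (pix > n) ? lw - (1u << shift) : lw;     /* shade, or leave as is */
}

/* Modulo shading of the 8 pixels in one go: odd and even pixies are
   handled in two separate longs so that carries fall into cleared bits. */
static uint32_t shade_mod16_all(uint32_t lw)
{
    uint32_t odd  = ((lw & 0xF0F0F0F0u) + 0x10101010u) & 0xF0F0F0F0u;
    uint32_t even = ((lw & 0x0F0F0F0Fu) + 0x01010101u) & 0x0F0F0F0Fu;
    return odd | even;                              /* merge the two longs   */
}
```

For instance, shade_mod16(0x0000000F,0) wraps the pixel back to 0 without touching its neighbours, while shade_bounded(0x00000022,0,2) returns the long unchanged because the pixel has already reached the bound. Splitting odd and even pixies in shade_mod16_all is what keeps a wrapping pixel from carrying into the next one: the carry lands in a nibble that the mask clears afterwards.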

Snapshots from the RSS part in the CakeHead 2 demo