
Alain BROBECKER                                    Dracula / Positivity (STe)
rte de Dardagny                                    Baah / Arm's Tech (Archie)
 01630 CHALLEX                                                      Baah (PC)
    FRANCE
----------------------------------------------------------- 22-24 december 95

                             - SHADING EFFECTS -
                              =================


[Image: snapshots from the RSS part in the CakeHead 2 demo]


FOREWORD

    This text is aimed at assembly programmers; I doubt people programming
  in high level languages will appreciate it much. All algorithms, ideas
  and assembly code in here are mine. (Otherwise I mention the author.)
  If you use them in one of your programs, please send me a free copy of it
  and credit me. If you appreciated this text, I would be glad to know it.



INTRODUCTION

    Everybody now speaks about environment mapping, doom engines and the
  like. But from time to time, some coders make demos based upon older
  techniques, and when these are mixed with nice formulae, the result is
  sometimes really nice and refreshing. (As an example of this, I will
  mention the 'Darkroom' and 'Oberon' demos on Amiga...)
    So, I will now describe a way of performing an old effect called shading.
  As far as I know, this technique has its roots on the Amiga, and more
  precisely in the well known paint program called "Deluxe Paint". This
  program allowed the user to darken or brighten the part of the image
  which was under the brush shape. Coders then saw that even without being
  an artist, one could get nice patterns by simply making large movements
  with the mouse while using the brighten function. Shadesprites were born!
    In this article, I will only speak about shadedots, but the way bigger
  brushes (sprites, boxes...) must be handled is quite similar, especially
  if you want to use generated code to go at maximum speed.
   
  
  
THE PROBLEM IS...

    I think you've already guessed that both functions (darken, brighten)
  are based upon the same idea. Since our goal is to do shading as fast as
  possible, we have to organise things in a friendly way even before we
  start thinking about code optimisation and the like. That's the reason
  why I assume that, from now on, our color table contains one unique raster,
  organised in a linear way.
    According to this, we now only need to increment the value in order to
  brighten the pixel, and decrement the value to darken it. A problem then
  arises at the edges of the palette, when the pixel value is the minimum or
  the maximum one. (Generally 0 or (2^n)-1)



MODULO SHADING

    The first way of solving the problem is to use a modulo. For those
  who are not familiar with this mathematical operation, I give a small
  explanation right now...
    Suppose you have two integers p and q. You learned in primary school
  that you can always make the Euclidean division of p by q. (q positive)
  This division gives two integers: the quotient a and the remainder b,
  which are such that p=(a*q)+b. If b is positive or zero, we define
  p mod(q)=b , and if b is negative, we define p mod(q)=b+q. Here are some
  examples...
            42 mod 5 = 2            ; 42=8*5+2.
            -8 mod 3 = 1            ; -8=-3*2-2, and -2+3=1.
    Well, does this mean that, in order to perform the modulo operation,
  we need to make a Euclidean division, which is quite a complex
  operation? No, because we'll make some restrictions, and then the problem
  will be reduced to a very simple operation...
    When we are working in base ten, a number is decomposed into units,
  tens, hundreds... If we want the number mod 10, we quite simply keep
  only the units digit; if we want the number mod 100, we keep the units
  and tens digits. (Only in the case p is positive) So if our needs are
  reasonable (modulo by a power of ten), we have reduced our problem to
  the clearing of digits. But the case of p negative is still annoying.
    But a computer works in base 2, so let's switch to this particular base
  and see how the modulo operation behaves, with the same restriction as
  above. (ie we want the modulo by a power of 2)
            %01011010 mod %1000 = %00000010     ; 90 mod 8 = 2.
            %10111011 mod  %100 = %00000011     ; -69 mod 4 = 3.
            %11111111 mod %1000 = %00000111     ; -1 mod 8 = 7.
    As for base 10, we only need to clear the bits (binary digits) at and
  above the exponent of the power of 2 used as the modulo number. (This
  means bit n and all bits above it for a modulo by 2^n, bit 0 being the
  lowest one) As you have seen in the examples, this also works for
  negative numbers. (This is due to the two's complement representation of
  negative numbers in binary, which is 256-abs(p) for a byte)
    Now, you only need to know that clearing bits can be done very easily
  by using the AND operation. From now on, p mod(2^n) will be replaced by
  p and not(&ffffffff lsl n).
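    As a small illustration (a sketch with assumed register roles, not code
  from the appendix): with p in m0, the exponent n (0-31) in m2 and m1
  free, this replacement can be written with a single BIC, since BIC is
  an AND with the inverted operand. For a fixed modulo such as 16, an
  immediate mask does the same job.
            mvn     m1,#0              ; m1=&ffffffff.
            bic     m0,m0,m1,lsl m2    ; m0=p and not(&ffffffff lsl n).
            ; Or, when the modulo is known in advance (here mod16):
            and     m0,m0,#&f          ; m0=m0 mod16.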
    Generally, the number of colors at our disposal is a power of 2
  (16,32,256), so our solution will work. I won't examine the case when
  the modulo is not a power of two, because this (almost) never happens.
  (Anyway we do our best to avoid it)
    Well, I've spent lotsa time explaining what a modulo is, but it is
  something you will find a lot when coding in assembly language, so I
  recommend you really pay attention to it! (You will find it as soon as
  you need a 'wrapping' image, screen, or the like)
    Maybe giving you some code which performs the shading of a dot in 8bpp
  and 4bpp modes is a good idea, so two routines are presented as macros
  in appendix A. As you will see, they make heavy use of the modulo
  operation, and this will allow you to practise if you are not familiar
  with this concept.



FASTER MODULO SHADING    

    Humm, it's well known that we often learn things which we don't use.
  I've bored you with the modulo, but there is a faster way of performing
  modulo shading which is much easier to understand. Sorry...
    Anyway, I hope you understood the modulo operation, because it is an
  important one, you'll find it many many times. (Am I going senile?)
    Suppose you only need to shade pixels in one 'direction', then by
  choosing the colors so that shading is a decrement, a faster shading can
  be performed with the following code...
            and     m9,m3,m7,lsr m8    ; m9=isolated pixie.
            subS    m9,m9,#1           ; m9=new pixel value.
            orrMI   m7,m7,m3,lsl m8    ; If new_pix<0, set it to 15 in lword.
            subPL   m7,m7,m1,lsl m8    ; Else decrement it in lword.
            str     m7,[m6]            ; Save the modified long.      
    This piece of code advantageously replaces part of the code you can
  find in the mod_shade_4bpp macro in appendix A, just after the pixel shift
  has been calculated in m8. (We gain one cycle, and it is easier to
  understand when you are not familiar with modulo)
    Maybe you can find a similar trick when incrementing the pixel values,
  but I've not investigated deeper, so you'll have to work on your own.
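    A possible sketch for the increment case, given as an untested
  assumption and not taken from the original sources. It uses the same
  register roles as the decrement code: m3=&f, m1=&1, m7=loaded longword,
  m8=pixel shift and m6=longword address.
            and     m9,m3,m7,lsr m8    ; m9=isolated pixie.
            cmp     m9,m3              ; Pixie already at 15?
            bicEQ   m7,m7,m3,lsl m8    ; Yes, then wrap it to 0 in lword.
            addNE   m7,m7,m1,lsl m8    ; Else increment it in lword.
            str     m7,[m6]            ; Save the modified long.
    Since the pixie is only incremented when it is below 15, nothing
  overflows into the neighbouring pixie.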
  


BOUNDED SHADING
 
    All this is good, but with the brighten function of "Deluxe Paint",
  there is no modulo effect, so if a pixel is at maximum intensity you
  can't brighten it anymore, and one must admit it is much nicer this
  way. We will keep this in mind, and go even further... Let's suppose we
  have N colors used for a logo, and all other colors make up the raster.
  What we want is to shade a pixel with the bound effect, unless the pixel
  is already at the limit value or its color is used in the logo. (So the
  logo will be left unaltered)
    As above, I suppose the shading operation is in fact a decrement, then
  we choose the color palette so that the logo colors are values 0 -> (N-1),
  and the raster is in the rest of the palette. Here is the main idea for
  the way of performing bounded shading...
            and     m9,m3,m7,lsr m8    ; m9=isolated pixie.
            subS    m9,m9,#N           ; Pixie color lower or equal to N?
            subGT   m7,m7,m1,lsl m8    ; No, then shade pixel in the long,
            strGT   m7,[m6]            ;   and save the modified long.      
    For this particular case I have written a small proggy, which draws
  shadedots in a random manner. You will notice that the logo is left
  unaltered, though I don't redraw it at each VBL.
    The bounded shading is much faster than modulo shading, and much nicer,
  we're quite lucky. The only problem is for the unshade operation, since
  we must then care about the two edges of the color table: don't unshade
  if color<=N-1 or if color is already full dark. (ie (2^n)-1)
    Again, I've not yet dwelt on the matter, so I leave it up to you. Let's
  say it's the homework for next time.
    Oh, by the way, here is the code for the 8bpp mode. You will notice that
  it's not very mysterious... Also note it is much faster.
            cmp     m4,#N              ; Pixie color lower or equal to N?
            subGT   m4,m4,#1           ; No, then shade pixel,
            strGTB  m4,[m0,m3]         ;   and save it.
    Also, in case you have not 'locked' some colors (ie N=0) you can even
  optimise this by writing...
            subS    m4,m4,#1           ; Shade pixel.
            strPLB  m4,[m0,m3]         ; Save it if result is not negative.
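    For the unshade operation in 8bpp mode, one possible sketch (assumed,
  untested, and not taken from the ShadeDots source) checks both edges of
  the color table with one unsigned comparison. It supposes m4=loaded
  pixie, m0 and m3 as in the code above, m9=temporary register, and 255 as
  the last raster color.
            sub     m9,m4,#N           ; m9=color-N. (Negative for logo colors)
            cmp     m9,#255-N          ; Unsigned test: N<=color<255?
            addCC   m4,m4,#1           ; Yes, then unshade pixel,
            strCCB  m4,[m0,m3]         ;   and save it.
    A logo color gives a negative m9, which is a huge unsigned value, so
  the test fails for it just as it does for color 255.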



GENERATED CODE

    First, I must credit Frederic ELISEI (ArmOric/Arm's Tech) who is the
  guy behind the idea of using generated code. (I was a bit ashamed when he
  spoke about this, because I'm generally the one who puts generated code
  everywhere) Finally, we both came up with exactly the same code (though
  Fred wrote it much faster than I did), and I used it in the randomly
  shaded splines (RSS) in the CakeHead 2 demo. (This demo will be out as
  soon as I have the final version of the music by Cry/Xperience)

    As we've seen, shading in 8bpp mode is much faster than in 4bpp mode,
  because we don't need to use shifts in order to isolate the pixie.
  But in some particular cases, you will be allowed to use tricks which
  will speed things up a lot for 4bpp mode. (At least on Arm2)
    Let's see the case of the RSS. Basically, the drawing of a RSS consists
  in drawing a random filament of X points at a given position. In a first
  version, I was performing the random walk of the filament in realtime, and
  the random walk was totally similar to the one given in the ShadeDots
  source. Then, Fred told me I could improve things by making some routines
  which each draw a given filament, and then randomly choosing one of those
  filaments. When you generate the code, you know the position of each
  pixel in its byte, and so you can use the following code....
    If the x position is even:
            ldrB     r6,[r7,#offset]    ; Load byte.
            tst      r6,#&f             ; 4 lowerbits of r6=0?
            subNE    r6,r6,#1           ; No, then shade the pixie.
            strB     r6,[r7,#offset]    ; And save modified byte.
    If the x position is odd:
            ldrB     r6,[r7,#offset]    ; Load byte.
            subS     r6,r6,#1<<4        ; Shade pixel in the 4 upperbits.
            strPLB   r6,[r7,#offset]    ; If pixel color>=0, save byte.
    As you see, it goes much faster, but the restrictions are also a lot
  more annoying. In case you want to draw only a dot I doubt using this
  method will be worth its price since you will need a branch. For the
  case of the RSS, it was ok because I was drawing 13 pixels one after
  another, with all positions determined.

    In case the shape is more regular than the filaments, it will certainly
  be in your interest to make a longword access instead of a byte access,
  and then use code quite similar to the one above.
    If you want a regular shape and modulo shading, I think that the best
  way is not to shade one pixel after another, but to shade as many as
  possible during the same operation. This means you isolate all odd pixies
  of the longword and put them in a long, you add &10101010 to this long,
  then you perform the modulo with &f0f0f0f0, do the same for even pixels
  and then merge the resulting longs. In fact, it will be a bit harder
  because the increment mask won't be set for all pixels, but it won't
  matter since it will be handled by the code generator once and for all.
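    A possible sketch of this idea for the simple case where all eight
  pixies of the longword are shaded (assumed register roles, not code from
  CakeHead 2): m7=longword, m1=&f0f0f0f0, m2=&10101010, m3=&01010101, and
  m8-m9 are temporary registers.
            and     m8,m7,m1           ; m8=odd pixies. (4 upperbits of bytes)
            bic     m9,m7,m1           ; m9=even pixies. (4 lowerbits of bytes)
            add     m8,m8,m2           ; Increment all odd pixies.
            add     m9,m9,m3           ; Increment all even pixies.
            and     m8,m8,m1           ; Odd pixies mod16.
            bic     m9,m9,m1           ; Even pixies mod16.
            orr     m7,m8,m9           ; Merge both longs.
    An overflowing pixie only spills into a neighbouring bit which was
  cleared by the isolation (or out of the long), and the second mask clears
  it again, so no pixie disturbs its neighbours.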



APPLICATIONS

    Many people will say: "Well, all this is nice, but shading has no
  applications, except random shadedots and the like..." Of course, I'm
  here to demonstrate the contrary.
    First of all, you can use shadedots for all particle systems, and this
  can give very nice results, if the particle system gives nice movements!
  (This is an alternative to the standard bitplane effects one can find on
  Amiga or Atari ST) Remember that the Darkroom demo won the second prize
  at the Assembly 94 party, just because it was nice. (No 3d or the like)
    Also, one can always find new ideas which are not all that bad. To tell
  the truth I'm quite proud of the RSS effect in CakeHead 2. And the
  only thing you need to do in order to create an original effect is to
  mix two techniques. (splines and random shadedots)
    Among other things, one can use the random shadedots technique to do
  logos, and I have also thought about a burning sheet of paper. (You only
  need to make the movements less random, and choose other colors)
    As soon as you dwell on the subject, I'm sure you will find ideas
  of your own, and the result will certainly be more original than the
  classical texture mapping you were working on.
    Also, I would like to point out the fact that coding a particle system
  is not all that dull, and finding nice formulae is, from my point of view,
  as interesting as making 'brutal' code. (I like both ways of coding, as
  long as it is made in assembly, but many people deny 'unbrutal' code!)


                                 - THE END -



;****************************************************************************
;*****                                                                  *****
;*****                            APPENDIX A                            *****
;*****                                                                  *****
;****************************************************************************
;   Here are two shading routines, one corresponding to the mode9 of the
; Archimedes, and the second one corresponding to a 256 colors mode. (Not
; the mode13 of the Archimedes, but one with redefinable colors)
;   To perform an unshade operation, you only need to change the add #1
; into a sub #1. The mode9 routine takes a lot of registers, but it is
; normally quite fast. (Maybe things can go faster)

;----------------------------------------------------------------------------
; We calculate the address of the long containing the pixel and the shift to
; access it in the longword. Then, we keep a copy of the longword but
; without the pixie, we increment the pixie in the full long, without caring
; about overflows on other pixies, then we take the result of pix+=1 mod16,
; and merge it with the long without the pixie.
; Parameters for this macro are
;     m0=videoram address |   m3=&f (for mod16)
;     m1=&1               |   m4=x (0-319)
;     m2=&7<<2            |   m5=y (0-255)
;     m6-m9 are temporary registers. (We can choose m6=m5 and m8=m4)
macro mod_shade_4bpp m0,m1,m2,m3,m4,m5,m6,m7,m8,m9  
{ add       m6,m5,m5,lsl #2         ; Make m6 point on good line.
  add       m6,m0,m6,lsl #5
  mov       m7,m4,lsr #3            ; m7=int(x/8).
  ldr       m7,[m6,m7,lsl #2]!      ; m6=longword address and load it in m7.
  and       m8,m2,m4,lsl #2         ; m8=4*(x mod8)=shift to access pixel.
  bic       m9,m7,m3,lsl m8         ; m9=longword without pixie to shade.
  add       m7,m7,m1,lsl m8         ; m7=longword with pixie incremented.
  and       m7,m7,m3,lsl m8         ; m7=pixie incremented mod16.
  add       m7,m9,m7                ; Merge long without pixie and new pixie.
  str       m7,[m6]                 ; Save the modified long.      
}

;----------------------------------------------------------------------------
;   We calculate the address of the byte containing the pixel, we increment
; it, and then we save it. You must notice that the mod256 operation is
; performed by saving the result as a byte.
; Parameters for this macro are
;     m0=videoram address
;     m1=x (0-319)
;     m2=y (0-255)
;     m3 and m4 are temporary registers. (m5-m9 are not used)
macro mod_shade_8bpp m0,m1,m2,m3,m4,m5,m6,m7,m8,m9  
{ add       m3,m2,m2,lsl #2         ; Make m3 point on good byte.
  add       m3,m1,m3,lsl #6         ;     (m3=x+320*y) 
  ldrB      m4,[m0,m3]              ; Load byte containing pixel.
  add       m4,m4,#1                ; Increment pixie.
  strB      m4,[m0,m3]              ; Save modified pixie.
}