Can't remove these constructions? 

Hello,
        I am currently working on a Windows screensaver that draws flaming
meteors on the screen. The C++ performance was less than ideal, so I
converted the main fire routine to asm (TASM) and called it as a function
from C++. It was a good idea and I got a major performance gain, but
there are some strange problems with it. There are two lines in the
code, an "inc edi" and an "xor ah, ah", which do nothing (they are
leftovers from previous attempts). Whenever I try to remove either of
these lines, the performance drops considerably. The two lines in
question are the ones with the stars next to them (code at the end of
post).
Performance on my PII 333 MHz running Windows 95:
        -No lines removed: 49.75fps
        -"inc edi" removed: 37.67fps
        -"xor ah, ah" removed: 31.18fps
        -both removed: 17.98fps
As you can see, these are serious hits. The xor instruction doesn't matter
and could stay without causing a problem, but the inc screws up the
logic. Any help you can give would be greatly appreciated. I apologize
if the formatting of the code is ugly; I tried my best and it looked OK
in my newsreader.

.386
.MODEL USE32 FLAT
.radix 10
PUBLIC _do_fire
.DATA
X_RES equ 640  ;Horizontal Resolution
Y_RES equ 480  ;Vertical Resolution
.CODE

;do_fire (void *dest, void *source)
;dest is the buffer to burn to
;source is the buffer to burn from
;What it does:
;       -Applies a fire effect from source to dest
;       -Copies dest to source
_do_fire proc C dest:DATAPTR, source:DATAPTR
   mov ecx, dest        ;ecx is dest
   add ecx, X_RES+1             ;ecx is dest + dest index
   mov edi, source              ;edi is source
   mov edx, edi                 ;edx is source
   add edi, X_RES+1     ;edi is source + source index
   add edx, (Y_RES-2)*X_RES+X_RES-3 ;edx is source + MAX, the
                                    ;stopping point for the loop
   xor ah, ah           ;ah=0
   TheLoop:
      xor bx, bx                                        ;bx=0
      mov al, byte ptr [edi]  ;al is source[index](the current pixel)
      add bx, ax
      dec edi                 ;edi is now index-1
      mov al, byte ptr [edi]  ;al is source[index-1]
                              ;(the pixel to the left)
      add bx, ax
      add edi, 2              ;edi is index+1
      mov al, byte ptr [edi]  ;al is source[index+1]
                              ;(the pixel to the right)
      add bx, ax
      push edi                      ;save index+1 for later
      add edi, X_RES-1              ;edi is index+X_RES
      mov al, byte ptr [edi]        ;al is source[index+X_RES]
                                    ;(the pixel below)
      pop edi                       ;get index+1 back
******xor ah, ah                    ;nothing, a problem
      add bx, ax
      shr bx, 2                     ;divide sum of pixels by 4
      mov byte ptr [ecx], bl        ;store new value in dest
******inc edi                       ;next srcindex, unnecessary, screws
                                    ;output
      inc ecx                       ;next dest index
      cmp edi, edx                  ;have we finished?
      jne TheLoop                   ;if not, repeat
   cld                              ;make sure dir flag is right
   mov esi, dest                    ;esi->dest buffer (copy from here)
   mov edi, source                  ;edi->source buffer (copy to here)
   mov ecx, X_RES*Y_RES/4           ;ecx is (#of bytes to move)/4
   rep movsd                        ;move dwords
   ret                              ;we're out!
_do_fire endp
end

--
/->John  Woehler<-\       |"Scientists belive that the universe is made
/Georgia Institute\       |of hydrogen, because they claim it's the most
/  of Technology  \       |plentiful ingredient. I claim that the most



Wed, 26 Sep 2001 03:00:00 GMT  
You might try something like this for the core of the loop...this isn't
optimized (and I haven't even tried it), but the concept is going to be better
than what you've got now...

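        xor     eax,eax             ; clear the upper bits of eax once
   TheLoop:
        xor     ebx,ebx             ; reset the sum for each pixel
        mov     al,[edi-1]          ; pixel to the left
        add     ebx,eax
        mov     al,[edi]            ; current pixel
        add     ebx,eax
        mov     al,[edi+1]          ; pixel to the right
        add     ebx,eax
        mov     al,[edi+X_RES]      ; pixel below (load it, don't add into al)
        add     eax,ebx
        shr     eax,2               ; divide the sum by 4
        mov     [ecx],al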

This stays away from 16 bit'isms (and their attendant prefix overrides that
hose up the pipe).  The loads/adds of the first 3 adjacent pixels should happen
fairly quickly in most cases as they'll usually be within the same L1 cache
line.



Wed, 26 Sep 2001 03:00:00 GMT  

Quote:
>******xor ah, ah                        ;nothing, a problem
>      add bx, ax

You ought to use 32 bit registers in 32 bit modes. It's faster. =)  As far as
the xor ah,ah goes, I don't know why that would be a problem to take out,
unless it had some alignment thingy that was wrong or if you were conditionally
using AH/AX/EAX but you don't seem to be doing that.

Quote:
>      shr bx, 2                     ;divide sum of pixels by 4
>      mov byte ptr [ecx], bl        ;store new value in dest
>******inc edi                           ;next srcindex, unnecessary, screws
>                                        ;output

Well, from what I can tell, if you inc edi, you'd have an infinite loop, but I
probably just missed something, maybe. In any case, you're conditionally
comparing EDI below so if you add to it now, you could be accelerating the
process, thus making it faster as it doesn't have to go through as many loops.

Quote:
>      inc ecx                       ;next dest index
>      cmp edi, edx                  ;have we finished?
>      jne TheLoop                   ;if not,

  - vulture a.k.a. Sean Stanek


Wed, 26 Sep 2001 03:00:00 GMT  

Quote:

> You might try something like this for the core of the loop...this isn't
> optimized (and I haven't even tried it), but the concept is going to be
> better than what you've got now...

<snip some code>

> This stays away from 16 bit'isms (and their attendant prefix overrides
> that hose up the pipe).

I tried it with the 32-bit regs, but the 16-bit ones were faster. Was
there a reason that you switched eax and ebx at the end?
        Thanks For Your Help!

--
/->John  Woehler<-\       |"Scientists belive that the universe is made
/Georgia Institute\       |of hydrogen, because they claim it's the most
/  of Technology  \       |plentiful ingredient. I claim that the most



Wed, 26 Sep 2001 03:00:00 GMT  

Quote:

> >******xor ah, ah                   ;nothing, a problem
> >      add bx, ax

> You ought to use 32 bit registers in 32 bit modes. It's faster. =)  As

I tried that, but it runs faster using the 16 bit regs.

Quote:
> >******inc edi                 ;next srcindex, unnecessary, screws
> >                              ;output

>Well, from what I can tell, if you inc edi, you'd have an infinite loop, but I

inc edi makes it skip every other pixel; luckily it still doesn't crash,
i.e. it still ends at the right place.

Quote:
>comparing EDI below so if you add to it now, you could be accelerating the
>process, thus making it faster as it doesn't have to go through as many loops.

That makes sense. I suppose the reason it goes so much faster with inc
edi is that it only does half as many loops as are necessary.
        Thanks For Your Help!
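
The halved loop count can be checked with a small C++ sketch. count_passes is a hypothetical helper, and the start and stop offsets are copied from the listing in the first post. It returns 0 when the step would jump past the stopping offset, which is the case where the cmp/jne test would never see equality.

```cpp
#include <cstddef>

constexpr std::size_t X_RES = 640;  // as in the listing
constexpr std::size_t Y_RES = 480;  // as in the listing

// Counts passes of a loop that starts at offset X_RES+1 and advances the
// source index by `step` bytes per pass (step must be nonzero). The asm loop
// stops only on exact equality with the stopping offset (cmp edi, edx / jne),
// so a step that overshoots is reported as 0 instead of looping on.
std::size_t count_passes(std::size_t step) {
    const std::size_t first = X_RES + 1;
    const std::size_t last  = (Y_RES - 2) * X_RES + X_RES - 3;
    std::size_t passes = 0;
    std::size_t i = first;
    while (i < last) {
        i += step;   // one pass of the loop body
        ++passes;
    }
    return i == last ? passes : 0;
}
```

With these offsets the distance from start to stop is 305916 bytes, so count_passes(1) gives 305916 and count_passes(2) gives 152958. The distance being even is why the loop still lands on the stopping point with the extra inc edi.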

--
/->John  Woehler<-\       |"Scientists belive that the universe is made
/Georgia Institute\       |of hydrogen, because they claim it's the most
/  of Technology  \       |plentiful ingredient. I claim that the most



Wed, 26 Sep 2001 03:00:00 GMT  
If these lines do not modify anything, why does the next line in

******xor ah, ah                    ;nothing, a problem
      add bx, ax <--

access the ax register? The second instruction is therefore dependent on
the first one.



Sat, 29 Sep 2001 03:00:00 GMT  

Quote:

> If these lines do not modify anything, why does the next line in

> ******xor ah, ah                    ;nothing, a problem
>       add bx, ax <--

> access the ax register? The second instruction is therefore dependent
> on the first one.

ah is not modified at any time in the loop. It is set to 0 before the
loop starts and only al is modified thereafter. ax is used in
the expression that follows because I cannot add bx, al.
        Thanks for your help!
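
To illustrate why the sum goes through ax rather than al, here is a small C++ sketch. The names average4 and average4_8bit are hypothetical. Four 8-bit pixels can add up to 1020, which does not fit in 8 bits, so each pixel has to be widened (the role of ah = 0) before it is added into a wider accumulator (the role of bx).

```cpp
#include <cstdint>

// Widened sum: each pixel is promoted before the addition, the C++
// counterpart of keeping ah = 0 and adding ax into bx.
std::uint8_t average4(std::uint8_t a, std::uint8_t b,
                      std::uint8_t c, std::uint8_t d) {
    std::uint16_t sum = static_cast<std::uint16_t>(a + b + c + d);  // at most 1020
    return static_cast<std::uint8_t>(sum >> 2);                     // divide by 4
}

// Same sum truncated to 8 bits before the shift, shown only to make the
// overflow visible: the bits above the low byte are lost.
std::uint8_t average4_8bit(std::uint8_t a, std::uint8_t b,
                           std::uint8_t c, std::uint8_t d) {
    std::uint8_t sum = static_cast<std::uint8_t>(a + b + c + d);    // wraps mod 256
    return static_cast<std::uint8_t>(sum >> 2);
}
```

For four pixels of 255, the widened version returns 255, while the 8-bit version wraps to 252 and returns 63.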

--
/->John  Woehler<-\       |"Scientists belive that the universe is made
/Georgia Institute\       |of hydrogen, because they claim it's the most
/  of Technology  \       |plentiful ingredient. I claim that the most



Sat, 29 Sep 2001 03:00:00 GMT  
 