Fast data fill 
Author Message
 Fast data fill

Hi all,

I have been experimenting with some code to fast fill memory with the same
value (FillChar) and I came up with
the following algorithm (the code is in Delphi's inline assembler. Great
stuff):

procedure FillChar2(var X; Count: Integer; Value: Byte);
asm
{     ->EAX     Pointer to destination  }
{       EDX     count   }
{       CL      value   }
{ Pre: EDX has to be > 0.}

        PUSH    EDI
        PUSH    EBX
        MOV     EDI,EAX { Point EDI to destination              }
        MOV     CH,CL   { Fill EAX with value repeated 4 times  }
        MOV     EAX,ECX
        SHL     EAX,16
        MOV     AX,CX
        MOV     ECX,EDX  { Count in ECX }

      AND     EBX,3





      MOV  [EDI],AL
      INC  EDI
      DEC  ECX

      MOV  [EDI],AL
      INC  EDI
      DEC  ECX

      MOV  [EDI],AL
      INC  EDI
      DEC  ECX

      MOV  EDX,ECX
      SAR  ECX,2

      REP  STOSD      { You can use the above optimazation for this one too!
(unrolling the loop) }
      MOV  ECX,EDX
      MOV  EBX,EDX
      AND  ECX,3


        POP     EBX
        POP     EDI
end;

Comments:
1. I really like Vector.Pointer lookup. Too bad it takes so much space.
2. The pre and post loop are there to make sure the destination data is 4
byte aligned.
3. Does anybody know a simpeler way to get the same speed? I want to use
this code in the very code of my
SVGA lib so I need to make sure I'm not missing something.



Fri, 29 Nov 2002 03:00:00 GMT  
 Fast data fill


Quote:
> I have been experimenting with some code to fast fill memory
> with the same value (FillChar) and I came up with the following
> algorithm (the code is in Delphi's inline assembler. Great
> stuff):

> procedure FillChar2(var X; Count: Integer; Value: Byte);
> asm
> {     ->EAX     Pointer to destination  }
> {       EDX     count   }
> {       CL      value   }
> { Pre: EDX has to be > 0.}

>         PUSH    EDI
>         PUSH    EBX
>         MOV     EDI,EAX { Point EDI to destination              }
>         MOV     CH,CL   { Fill EAX with value repeated 4 times  }
>         MOV     EAX,ECX
>         SHL     EAX,16
>         MOV     AX,CX
>         MOV     ECX,EDX  { Count in ECX }
>         MOV     EBX,EDI

>       AND     EBX,3







>       MOV  [EDI],AL
>       INC  EDI
>       DEC  ECX


>       MOV  [EDI],AL
>       INC  EDI
>       DEC  ECX


>       MOV  [EDI],AL
>       INC  EDI
>       DEC  ECX


>       MOV  EDX,ECX
>       SAR  ECX,2

>       REP  STOSD      { You can use the above optimazation for this one too!
> (unrolling the loop) }
>       MOV  ECX,EDX
>       MOV  EBX,EDX
>       AND  ECX,3



>         POP     EBX
>         POP     EDI
> end;

> Comments:
> 1. I really like Vector.Pointer lookup. Too bad it takes so much space.
> 2. The pre and post loop are there to make sure the destination data is 4
> byte aligned.
> 3. Does anybody know a simpeler way to get the same speed? I want to use
> this code in the very code of my SVGA lib so I need to make sure I'm not
> missing something.

The routine is overly complex.  No extra registers are needed.
I would call it with the address in EDI, count in ECX, and data
in AL.  I would rewrite it as follows (unverified).

procedure FillChar2(var X; Count: Integer; Value: Byte);
asm
{    ->EDI     Pointer to destination  }
{      ECX     count   }
{      AL      value   }
{ Pre: ECX has to be > 0.}

       TEST  EDI,1    ;does destination have an odd byte?

       STOSB
       DEC   ECX

       TEST  EDI,2    ;does destination have an odd word?

       STOSB
       DEC   ECX

       STOSB
       DEC   ECX

       MOV   AH,AL    ;copy AL to all bytes of EAX...
       PUSH  AX
       SHL   EAX,16
       POP   AX

       PUSH  CX
       SHR   ECX,2    ;number of full dwords
       REP   STOSD
       POP   CX
       AND   ECX,3    ;remaining odd bytes if any

Hope this helps.

--
Richard Pavlicek
Web site: http://www.gate.net/~pavlicek



Fri, 29 Nov 2002 03:00:00 GMT  
 Fast data fill


Quote:
> I have been experimenting with some code to fast fill memory
> with the same value (FillChar) and I came up with the following
> algorithm (the code is in Delphi's inline assembler. Great
> stuff):

> procedure FillChar2(var X; Count: Integer; Value: Byte);
> asm
> {     ->EAX     Pointer to destination  }
> {       EDX     count   }
> {       CL      value   }
> { Pre: EDX has to be > 0.}

>         PUSH    EDI
>         PUSH    EBX
>         MOV     EDI,EAX { Point EDI to destination              }
>         MOV     CH,CL   { Fill EAX with value repeated 4 times  }
>         MOV     EAX,ECX
>         SHL     EAX,16
>         MOV     AX,CX
>         MOV     ECX,EDX  { Count in ECX }
>         MOV     EBX,EDI

>       AND     EBX,3







>       MOV  [EDI],AL
>       INC  EDI
>       DEC  ECX


>       MOV  [EDI],AL
>       INC  EDI
>       DEC  ECX


>       MOV  [EDI],AL
>       INC  EDI
>       DEC  ECX


>       MOV  EDX,ECX
>       SAR  ECX,2

>       REP  STOSD      { You can use the above optimazation for this one too!
> (unrolling the loop) }
>       MOV  ECX,EDX
>       MOV  EBX,EDX
>       AND  ECX,3



>         POP     EBX
>         POP     EDI
> end;

> Comments:
> 1. I really like Vector.Pointer lookup. Too bad it takes so much space.
> 2. The pre and post loop are there to make sure the destination data is 4
> byte aligned.
> 3. Does anybody know a simpeler way to get the same speed? I want to use
> this code in the very code of my SVGA lib so I need to make sure I'm not
> missing something.

The routine is overly complex.  No extra registers are needed.
I would call it with the address in EDI, count in ECX, and data
in AL.  I would rewrite it as follows (unverified).

procedure FillChar2(var X; Count: Integer; Value: Byte);
asm
{    ->EDI     Pointer to destination  }
{      ECX     count   }
{      AL      value   }
{ Pre: ECX has to be > 0.}

       TEST  EDI,1    ;does destination have an odd byte?

       STOSB
       DEC   ECX

       TEST  EDI,2    ;does destination have an odd word?

       STOSB
       DEC   ECX

       STOSB
       DEC   ECX

       MOV   AH,AL    ;copy AL to all bytes of EAX...
       PUSH  AX
       SHL   EAX,16
       POP   AX

       PUSH  CX
       SHR   ECX,2    ;number of full dwords
       REP   STOSD
       POP   CX
       AND   ECX,3    ;remaining odd bytes if any

Hope this helps.

--
Richard Pavlicek
Web site: http://www.gate.net/~pavlicek



Fri, 29 Nov 2002 03:00:00 GMT  
 Fast data fill
I like the changes you have made to the beginning of the code (the TESTs).
The code looks much better.
But about the REP STOSB in the end:
Isn't the REP slow if you have to copy only 3 bytes max? I thought the
overhead
of REP was too much, and therefore made the complex byte-fill in the
beginning.
Quote:
> The routine is overly complex.  No extra registers are needed.
> I would call it with the address in EDI, count in ECX, and data
> in AL.  I would rewrite it as follows (unverified).

> procedure FillChar2(var X; Count: Integer; Value: Byte);
> asm
> {    ->EDI     Pointer to destination  }
> {      ECX     count   }
> {      AL      value   }
> { Pre: ECX has to be > 0.}

>        TEST  EDI,1    ;does destination have an odd byte?

>        STOSB
>        DEC   ECX


>        TEST  EDI,2    ;does destination have an odd word?

>        STOSB
>        DEC   ECX

>        STOSB
>        DEC   ECX


>        MOV   AH,AL    ;copy AL to all bytes of EAX...
>        PUSH  AX
>        SHL   EAX,16
>        POP   AX

>        PUSH  CX
>        SHR   ECX,2    ;number of full dwords
>        REP   STOSD
>        POP   CX
>        AND   ECX,3    ;remaining odd bytes if any
>        REP   STOSB




Sat, 30 Nov 2002 03:00:00 GMT  
 Fast data fill

asked for a small piece of information...

Quote:
> Isn't the REP slow if you have to copy only 3 bytes max? I thought the
> overhead of REP was too much, and therefore made the complex byte-fill
> in the beginning.

REP has overhead. But a jump table has more because it is poorly
predicted; I'd do a TEST/JZ/MOV:

 TEST EDI,1

 MOV [EDI],AL

 TEST EDI,2

 MOV [EDI],AX

 ...
 POP ECX
 TEST ECX,2

 MOV [EDI],AX

 TEST ECX,1


--
|_  _  _ __
|_)(_)| ) ,'
-------- '-._

--
Posted from relay2.inwind.it [212.141.53.73]
via Mailgate.ORG Server - http://www.Mailgate.ORG



Sat, 30 Nov 2002 03:00:00 GMT  
 Fast data fill

Quote:
> I like the changes you have made to the beginning of the code (the TESTs).
> The code looks much better.
> But about the REP STOSB in the end:
> Isn't the REP slow if you have to copy only 3 bytes max? I thought the
> overhead of REP was too much, and therefore made the complex byte-fill
> in the beginning.

Yes, the speed could be increased by replacing the final
REP STOSB with other code, but the difference would be
negligible considering that most memory fills are thousands
of bytes. Also note that when the odd-byte residue is zero
(the typical case) the overhead of the REP prefix is minimal.
Code SIZE is also a consideration, and it didn't seem to be
worth the expansion.

Also, I am wondering why you even bother to consider odd-byte
cases for memory fills. In all good programming, buffers are
dword aligned, and fills can be in whole dwords (even if it
means filling a few unneeded bytes).  For memory fills, I
have never needed anything but:

       MOV   EDI,offset BUFFER
       MOV   ECX,COUNT
       MOV   EAX,VALUE
       REP   STOSD

--
Richard Pavlicek
Web site: http://www.gate.net/~pavlicek



Sat, 30 Nov 2002 03:00:00 GMT  
 Fast data fill

Hi,

Quote:
>Also, I am wondering why you even bother to consider odd-byte
>cases for memory fills. In all good programming, buffers are
>dword aligned, and fills can be in whole dwords (even if it
>means filling a few unneeded bytes).

This routine is meant for my SVGA routines and when you write to
the display, it can be anywhere. So the odd-byte case will be noticed.
Further, I don't really mind the code size. Speed is much more important
for this core routine.

Greetings, Ferns / Gerrit.



Sun, 01 Dec 2002 03:00:00 GMT  
 Fast data fill
Hi,

Quote:
>Also, I am wondering why you even bother to consider odd-byte
>cases for memory fills. In all good programming, buffers are
>dword aligned, and fills can be in whole dwords (even if it
>means filling a few unneeded bytes).

This routine is meant for my SVGA routines and when you write to
the display, it can be anywhere. So the odd-byte case will be noticed.
Further, I don't really mind the code size. Speed is much more important
for this core routine.

Greetings, Ferns / Gerrit.



Sun, 01 Dec 2002 03:00:00 GMT  
 
 [ 8 post ] 

 Relevant Pages 

1. Fastest way to fill data...

2. Ms SQL fileerror 22001=String data, right truncation when I fill data into file

3. Fast filled rectangles in mode 12h...

4. Fast memory fill

5. Fast filled polygons

6. Can I fill Browse box with queue data ?

7. Imaging - scanning images and auto-filling data

8. Filling listbox with data from Text or Excel file

9. Continuously building an array filled up with data coming from a dbl line

10. Horizontal Fill Slide question (two data points)

11. Using PHP to fill data on a PDF

12. need ideas to build gui form for data filling

 

 
Powered by phpBB® Forum Software