PutPixel Optimisation

Does anyone know of any optimizations for this PutPixel routine?
It must use Pascal-style stack ordering.

I am particularly interested in the instruction timings (for Pentium and/or
AMD K6-2), and which pipeline each instruction executes in. Assume the first
instruction is in the U (X) pipe, and that no previous instruction has been
executed.

The init procedure is only used to set GS to 0A000h, so I don't have to do it
every time I enter the PutPixel procedure. I'm not interested in optimizing
that procedure, just the PutPixel.

<----------- CODE BEGINS HERE ------------>

CGR_Init PROC NEAR
MOV AX, 0A000h
MOV GS, AX
RETN
CGR_Init ENDP

CGR_PutPixel PROC NEAR
POP SI          ; SI = Return Address       : UV    1 U
POP BX          ; BX = Color                : UV    1 V
POP AX          ; AX = Y                    : UV    1 U
MOV DX, 320     ; CX = 320                  : UV    1 V
MUL DX          ; AX = AX * 320 = Y * 320   : NP   11 U
POP DI          ; DI = X                    : UV    1 U
ADD DI, AX      ; DI = DI + AX = 320Y+X     : UV    1 V
MOV GS:[DI], BL ; Draw The Pixel            : UV    1 U
CGR_PutPixel ENDP

<----------- CODE ENDS HERE ------------>

Sat, 30 Jun 2001 03:00:00 GMT  PutPixel Optimisation

Quote:
>  MOV DX, 320            ; CX = 320                              : UV    1 V
>  MUL DX                 ; AX = AX * 320 = Y * 320                               : NP

Hint: 256 + 64 = 320  (use SHL instead)

Another way to do it is to make a table of 200 word entries:

0, 320, 640, etc.

Then just look up the offset for the Y position and add the X position.
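
As a rough illustration of the table idea, here is a small Python model (not the assembly itself). The names ROW_OFFSET and put_pixel are made up for this sketch, and the bytearray only stands in for the video segment at 0A000h:

```python
# Model of the lookup-table idea for a 320x200, one-byte-per-pixel screen.
# ROW_OFFSET and put_pixel are illustrative names, not from the thread.
WIDTH, HEIGHT = 320, 200

# 200 entries: 0, 320, 640, ... Each one fits in a 16-bit word.
ROW_OFFSET = [y * WIDTH for y in range(HEIGHT)]

def put_pixel(screen, x, y, color):
    """Store one colour byte at offset ROW_OFFSET[y] + x."""
    screen[ROW_OFFSET[y] + x] = color & 0xFF

screen = bytearray(WIDTH * HEIGHT)   # stands in for segment 0A000h
put_pixel(screen, 10, 2, 15)
```

In assembly the table would cost one memory read per pixel in place of the multiply, so whether it wins depends on the CPU and on the table staying in cache.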

--

Sat, 30 Jun 2001 03:00:00 GMT  PutPixel Optimisation

Quote:

>Does anyone know of any optimizations for this PutPixel routine.
>It must use Pascal-style stack ordering.

>I am particularly interested in the instruction timings (for Pentium and/or
>AMD K6-2), and which pipelines the instruction is executing in. Assume the
>first instruction is in the U (X) pipe, and that no previous instruction has
>been executed.

>The init procedure is only used to set GS to 0A000h, so I dont have to do it
>everytime I enter the PutPixel procedure.  Im not interested in optimizing
>that procedure, just the PutPixel.

><----------- CODE BEGINS HERE ------------>

>CGR_Init PROC NEAR
> MOV AX, 0A000h
> MOV GS, AX
> RETN
>CGR_Init ENDP

>CGR_PutPixel PROC NEAR
> POP SI ; SI = Return Address : UV 1 U
> POP BX ; BX = Color : UV 1 V
> POP AX ; AX = Y                 : UV 1 U
> MOV DX, 320 ; CX = 320 : UV 1 V
> MUL DX ; AX = AX * 320 = Y * 320                 : NP       11 U
> POP DI ; DI = X                 : UV 1 U
> ADD DI, AX ; DI = DI + AX = 320Y+X : UV 1 V
> MOV GS:[DI], BL ; Draw The Pixel : UV 1 U
>CGR_PutPixel ENDP

><----------- CODE ENDS HERE ------------>

Hello...
You can do some optimisations on the above code.
First of all, do not use the MUL instruction. It is slow.
You want to multiply by 320, which is the same as multiplying by (256 + 64).
Therefore you can use the SHL instruction.
For example, assume that AX holds X and BX holds Y:
MOV CX,BX    ; copy Y to CX
SHL BX,8     ; in effect multiplies BX by 256 in a fast way
SHL CX,6     ; multiplies CX by 64 in a fast way
ADD BX,CX    ; BX now holds Y*(256+64) = Y*320, computed relatively quickly
ADD BX,AX    ; add X: BX = Y*320 + X
Now BX holds the offset at which you have to write the color.
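
To double-check the arithmetic, here is a small Python model of the shift-and-add sequence. It is only a sketch: offset_shift_add is a made-up name, and the 0xFFFF masks are there to imitate 16-bit registers.

```python
def offset_shift_add(x, y):
    """Y*320 + X using only shifts and adds, masked to 16 bits like BX."""
    bx = (y << 8) & 0xFFFF      # SHL BX,8 : Y * 256
    cx = (y << 6) & 0xFFFF      # SHL CX,6 : Y * 64
    bx = (bx + cx) & 0xFFFF     # ADD BX,CX: Y * 320
    return (bx + x) & 0xFFFF    # add X to get the pixel offset

# Exhaustive comparison against the plain multiply over the whole screen.
all_match = all(offset_shift_add(x, y) == y * 320 + x
                for y in range(200) for x in range(320))
```

The largest offset, for (319, 199), is 63999, so nothing overflows 16 bits on a 320x200 screen.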

Also, it is faster to use MOVs instead of POPs to get the variables.
I'll leave you the pleasure of doing that yourself. If you can't, let me
know and I will post the routine.

Hope I helped...

Sat, 30 Jun 2001 03:00:00 GMT  PutPixel Optimisation

Quote:

>  MOV DX, 320            ; CX = 320                              : UV    1 V
>  MUL DX                 ; AX = AX * 320 = Y * 320

By the way, x*320 can be replaced by (x shl 8) + (x shl 6), since
256 + 64 = 320.

Sat, 30 Jun 2001 03:00:00 GMT  PutPixel Optimisation

Quote:

>CGR_PutPixel PROC NEAR
> POP SI                        ; SI = Return Address   : UV    1 U
> POP BX                        ; BX = Color            : UV    1 V
> POP AX                        ; AX = Y                : UV    1 U
> MOV DX, 320                   ;?CX = 320              : UV    1 V
> MUL DX                        ; AX = AX * 320 = Y*320 : NP   11 U
> POP DI                        ; DI = X                : UV    1 U
> ADD DI, AX                    ; DI = DI + AX = 320Y+X : UV    1 V
> MOV GS:[DI], BL               ; Draw The Pixel        : UV    1 U
>CGR_PutPixel ENDP

Hi,

I'm a beginner on the Pentium, so this is only tentative, but:

(1) Isn't the floating-point multiply actually faster than the integer one?
Remember that the register length should be the norm for the segment,
i.e. 16-bit in 16-bit protected mode [or real mode] and 32-bit in
32-bit protected mode.

(2) This code can't go much faster as such, because there's
a CRITICAL PATH through it: you have to load the data before you
multiply it, multiply before you add to the product, add before
the sum can form the address for the indexed store, and form the
address before you execute the indexed store.

What you can do instead is execute other operations alongside that path.
If this is in a loop to output N lots of (X, Y, P) values, you might
do better with an assembler subroutine that carries out the whole loop.
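
The shape of such a whole-loop routine, sketched as a Python model rather than assembler: put_pixels is a hypothetical name, and the bytearray stands in for the screen. The point is that the call overhead is paid once per batch instead of once per pixel.

```python
WIDTH = 320

def put_pixels(screen, points):
    """Plot every (x, y, p) triple in one call instead of one call per pixel."""
    for x, y, p in points:
        # Y*320 + X via shifts, as discussed earlier in the thread.
        screen[(y << 8) + (y << 6) + x] = p & 0xFF

screen = bytearray(WIDTH * 200)      # stands in for segment 0A000h
put_pixels(screen, [(0, 0, 1), (319, 0, 2), (5, 199, 3)])
```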

Quote:
>Hint: 256 + 64 = 320  (use SHL instead)
>Another way to do it is to make a table of 200 word entries:
>0, 320, 640, etc. etc.
>and just look up the Y-position and add the x position

//U.                //V.
mov ax,4[sp];       pop si;    //note critical path in LH column
shl ax,2;           pop bx;    //of successive ax, then di, computations
shl ax,6;           pop di;
/*pause one step for prefix, would do anyway for address dependency*/
mov GS:[di],bl;     /***/
/***/               jmp si;   //owzat: 8 cycles!

[I think that works: pop simply doesn't produce dependencies
on SP for following instructions.]

In a 32-bit segment you might also try:

mov eax,8[esp];      pop esi; //set eax up here to stop address dependency
/***/                pop ebx; //pop pixel into EBX
lea eax,[eax*4+eax]; pop ecx; //mul eax * 5, junk the value in ECX
shl eax,6;           pop edi; //mul eax * 64, pop X value into EDI
/*pause one step for prefix, would do anyway for address dependency*/
mov GS:[edi+eax],bl; /***/
/***/                jmp esi;
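
The x5-then-x64 decomposition in the 32-bit listing can be checked with a small Python model (function names are made up for this sketch). It also shows that a shift by 2 followed by a shift by 6, with no add in between, gives Y*256 rather than Y*320, so the 16-bit listing above appears to need an add step as well.

```python
def offset_lea_shl(x, y):
    """Model of the lea/shl sequence: (Y*5) << 6, then X added by addressing."""
    eax = y * 4 + y          # lea eax,[eax*4+eax] : Y * 5
    eax <<= 6                # shl eax,6           : Y * 5 * 64 = Y * 320
    return eax + x           # [edi+eax] addressing adds X

def shl2_then_shl6(y):
    """Two shifts with no add in between: Y * 4 * 64 = Y * 256."""
    return (y << 2) << 6

lea_matches = all(offset_lea_shl(x, y) == y * 320 + x
                  for y in range(200) for x in range(320))
```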

--

Sun, 01 Jul 2001 03:00:00 GMT  PutPixel Optimisation

I am aware of the SHL 8, SHL 6 method, but on my AMD K6-2 it is slower than
using a MUL.

Sun, 01 Jul 2001 03:00:00 GMT  PutPixel Optimisation

Quote:
>Does anyone know of any optimizations for this PutPixel routine.
>It must use Pascal-style stack ordering.
>I am particularly interested in the instruction timings (for Pentium and/or
>AMD K6-2), and which pipelines the instruction is executing in. Assume the
>first
>instruction is in the U (X) pipe, and that no previous instruction has been
>executed.

The time spent in the code that does the actual putpixel is small
compared with the time needed to set up the call to your routine.
You can get much better performance using in-line code (and I'm
talking factors here, not percentages!) rather than a procedure:

mem[$A000:x + (y shl 8) + (y shl 6)] := colour;

Of course, for other primitives (lines, circles) you use dedicated
routines (i.e. NOT using a PutPixel routine) to avoid doing the
(x,y) -> address calculation for every pixel.
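
As an illustration of the dedicated-routine point, a horizontal line needs only one address calculation for the whole run. This is a Python model, not Pascal or assembler; hline is a hypothetical name and the bytearray stands in for the screen.

```python
WIDTH = 320

def hline(screen, x0, x1, y, colour):
    """Fill pixels x0..x1 on row y, computing the row offset only once."""
    start = (y << 8) + (y << 6) + x0       # one address calculation
    count = x1 - x0 + 1
    screen[start:start + count] = bytes([colour & 0xFF]) * count

screen = bytearray(WIDTH * 200)            # stands in for segment 0A000h
hline(screen, 10, 19, 3, 7)
```

In assembler the fill itself would typically be a REP STOSB or similar, so the per-pixel cost drops to little more than the store.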

Herman

Mon, 02 Jul 2001 03:00:00 GMT
