PutPixel Optimisation 
 PutPixel Optimisation

Does anyone know of any optimizations for this PutPixel routine?
It must use Pascal-style stack ordering.

I am particularly interested in the instruction timings (for Pentium and/or
AMD K6-2), and which pipeline each instruction executes in. Assume the first
instruction is in the U (X) pipe, and that no previous instruction has been
executed.

The init procedure is only used to set GS to 0A000h, so I don't have to do it
every time I enter the PutPixel procedure. I'm not interested in optimizing that
procedure, just the PutPixel.


<----------- CODE BEGINS HERE ------------>

CGR_Init PROC NEAR
 MOV AX, 0A000h
 MOV GS, AX
 RETN
CGR_Init ENDP

CGR_PutPixel PROC NEAR
 POP SI                 ; SI = Return Address          : UV    1 U
 POP BX                 ; BX = Color                   : UV    1 V
 POP AX                 ; AX = Y                       : UV    1 U
 MOV DX, 320            ; DX = 320                     : UV    1 V
 MUL DX                 ; AX = AX * 320 = Y * 320      : NP   11 U
 POP DI                 ; DI = X                       : UV    1 U
 ADD DI, AX             ; DI = DI + AX = 320Y+X        : UV    1 V
 MOV GS:[DI], BL        ; Draw The Pixel               : UV    1 U
 JMP SI                 ; Return To Program            : PV    2 V
CGR_PutPixel ENDP

<----------- CODE ENDS HERE ------------>
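
For readers who want to check the address arithmetic away from DOS, the routine above can be modelled in C. This is only a sketch under stated assumptions: a plain 64000-byte array stands in for segment 0A000h, and the names `vga` and `put_pixel_ref` are made up for illustration and do not appear in the original code.

```c
#include <stdint.h>

#define SCREEN_W 320
#define SCREEN_H 200

/* Stand-in for the mode 13h frame buffer at A000:0000 (assumption: a plain array). */
static uint8_t vga[SCREEN_W * SCREEN_H];

/* Same arithmetic as CGR_PutPixel: offset = 320 * y + x, truncated to 16 bits like DI. */
static void put_pixel_ref(uint16_t x, uint16_t y, uint8_t color)
{
    uint16_t offset = (uint16_t)(y * SCREEN_W + x);
    if (offset < sizeof vga)      /* the assembly routine has no such bounds check */
        vga[offset] = color;
}
```

The bounds check is an addition for safety in the model; the assembly version writes wherever DI points.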



Sat, 30 Jun 2001 03:00:00 GMT  
 PutPixel Optimisation


Quote:
>  MOV DX, 320            ; DX = 320                 : UV    1 V
>  MUL DX                 ; AX = AX * 320 = Y * 320  : NP

Hint: 256 + 64 = 320, so you can use SHL and ADD instead of MUL.

Another way to do it is to make a table of 200 word entries (0, 320, 640,
and so on), look up the row offset for the Y position, and add the X position.
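
The table idea can be sketched in C as follows. This is an illustration only, not the poster's code: the names `row_offset`, `init_row_table` and `pixel_offset_table` are hypothetical, and the table is filled once at start-up.

```c
#include <stdint.h>

#define SCREEN_H 200

/* row_offset[y] holds y * 320: 0, 320, 640, ... (200 word-sized entries). */
static uint16_t row_offset[SCREEN_H];

/* Fill the table once, before any pixel is drawn. */
static void init_row_table(void)
{
    for (int y = 0; y < SCREEN_H; ++y)
        row_offset[y] = (uint16_t)(y * 320);
}

/* Offset of pixel (x, y): one table lookup plus one add, no multiply. */
static uint16_t pixel_offset_table(uint16_t x, uint16_t y)
{
    return (uint16_t)(row_offset[y] + x);
}
```

The largest entry is 199 * 320 = 63680, so every row offset fits in a 16-bit word.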

Sat, 30 Jun 2001 03:00:00 GMT  
 PutPixel Optimisation

Quote:

>Does anyone know of any optimizations for this PutPixel routine?
>It must use Pascal-style stack ordering.

>I am particularly interested in the instruction timings (for Pentium and/or
>AMD K6-2), and which pipeline each instruction executes in. Assume the first
>instruction is in the U (X) pipe, and that no previous instruction has been
>executed.

>The init procedure is only used to set GS to 0A000h, so I don't have to do it
>every time I enter the PutPixel procedure. I'm not interested in optimizing that
>procedure, just the PutPixel.


><----------- CODE BEGINS HERE ------------>

>CGR_Init PROC NEAR
> MOV AX, 0A000h
> MOV GS, AX
> RETN
>CGR_Init ENDP

>CGR_PutPixel PROC NEAR
> POP SI                 ; SI = Return Address          : UV    1 U
> POP BX                 ; BX = Color                   : UV    1 V
> POP AX                 ; AX = Y                       : UV    1 U
> MOV DX, 320            ; DX = 320                     : UV    1 V
> MUL DX                 ; AX = AX * 320 = Y * 320      : NP   11 U
> POP DI                 ; DI = X                       : UV    1 U
> ADD DI, AX             ; DI = DI + AX = 320Y+X        : UV    1 V
> MOV GS:[DI], BL        ; Draw The Pixel               : UV    1 U
> JMP SI                 ; Return To Program            : PV    2 V
>CGR_PutPixel ENDP

><----------- CODE ENDS HERE ------------>

Hello...
You can make some optimisations to the above code.
First of all, do not use the MUL instruction. It is slow.
You want to multiply by 320, which is the same as multiplying by (256 + 64),
so you can use the SHL instruction instead.
For example, assume that AX holds X and BX holds Y:
MOV CX,BX      ; Copy Y to CX
SHL BX,8       ; Multiplies BX by 256 quickly
SHL CX,6       ; Multiplies CX by 64 quickly
ADD BX,CX      ; BX now holds Y*(256+64) = Y*320, computed relatively quickly
ADD BX,AX      ; Add X to BX
Now BX holds the address at which you have to write the color.
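
The same shift-and-add arithmetic can be checked in C. This is a minimal sketch rather than a translation of the assembly; the function name `pixel_offset_shift` is invented for the example, and 16-bit types are used to mirror the BX and CX registers.

```c
#include <stdint.h>

/* y * 320 computed as (y << 8) + (y << 6), then x added, as in the SHL/ADD sequence. */
static uint16_t pixel_offset_shift(uint16_t x, uint16_t y)
{
    uint16_t row = (uint16_t)((y << 8) + (y << 6));  /* 256*y + 64*y = 320*y */
    return (uint16_t)(row + x);
}
```

For on-screen coordinates (x below 320, y below 200) the result never exceeds 63999, so nothing is lost by truncating to 16 bits.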

Also, it is faster to use MOVs instead of POPs to get the parameters.
I will leave you the pleasure of doing that yourself. If you can't, let me know, and
I will post the routine.

Hope I helped...



Sat, 30 Jun 2001 03:00:00 GMT  
 PutPixel Optimisation

Quote:

>  MOV DX, 320            ; DX = 320                 : UV    1 V
>  MUL DX                 ; AX = AX * 320 = Y * 320

By the way, y * 320 could be replaced by (y shl 8) + (y shl 6),
since 256 + 64 = 320.


Sat, 30 Jun 2001 03:00:00 GMT  
 PutPixel Optimisation

Quote:

>CGR_PutPixel PROC NEAR
> POP SI                        ; SI = Return Address   : UV    1 U
> POP BX                        ; BX = Color            : UV    1 V
> POP AX                        ; AX = Y                : UV    1 U
> MOV DX, 320                   ; DX = 320              : UV    1 V
> MUL DX                        ; AX = AX * 320 = Y*320 : NP   11 U
> POP DI                        ; DI = X                : UV    1 U
> ADD DI, AX                    ; DI = DI + AX = 320Y+X : UV    1 V
> MOV GS:[DI], BL               ; Draw The Pixel        : UV    1 U
> JMP SI                        ; Return To Program     : PV    2 V
>CGR_PutPixel ENDP

 Hi,

 I'm a beginner with the Pentium, so this is only tentative, but:

 (1) Isn't the floating-point multiply actually faster than the integer one?
 Remember that the register length should be the norm for the segment,
 i.e. 16-bit in 16-bit protected mode [or real mode] and 32-bit in
 32-bit protected mode.

 (2) This code can't go much faster as such, because there's
 a CRITICAL PATH through it: you have to load the data before you
 multiply it, multiply before you add to the product, add before you
 can use the sum as the address for the indexed store, and form the
 address before you execute the indexed store.

 What you can do instead is execute other operations alongside that path.
 If this is in a loop to output N lots of (X, Y, P) values, you might
 do better to have an assembler subroutine carry out the whole loop.
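
 The idea of moving the whole loop into one routine can be sketched in C. This is an illustration under assumptions, not code from the thread: the `pixel_t` record layout and the name `plot_pixels` are hypothetical, and a plain array stands in for the frame buffer. The point is only that the call overhead is paid once per batch instead of once per pixel.

```c
#include <stddef.h>
#include <stdint.h>

#define SCREEN_W 320
#define SCREEN_H 200

/* Stand-in for the frame buffer at A000:0000 (assumption: a plain array). */
static uint8_t vga[SCREEN_W * SCREEN_H];

/* One (X, Y, P) record; the layout is made up for this example. */
typedef struct {
    uint16_t x;
    uint16_t y;
    uint8_t  color;
} pixel_t;

/* Plot n pixels in a single call; returns how many were inside the screen. */
static size_t plot_pixels(const pixel_t *p, size_t n)
{
    size_t drawn = 0;
    for (size_t i = 0; i < n; ++i) {
        if (p[i].x >= SCREEN_W || p[i].y >= SCREEN_H)
            continue;                       /* skip off-screen points */
        vga[p[i].y * SCREEN_W + p[i].x] = p[i].color;
        ++drawn;
    }
    return drawn;
}
```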



Quote:
>Hint: 256 + 64 = 320  (use SHL instead)
>Another way to do it is to make a table of 200 word entries:
>0, 320, 640, etc. etc.
>and just look up the Y-position and add the x position

 <Hits self on head!!!>

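 //U.                //V.
 mov ax,4[sp];       pop si;    //ax = Y; note critical path in LH column
 shl ax,2;           pop bx;    //ax = 4Y; successive ax, then di, computations
 add ax,0[sp];       pop cx;    //ax = 5Y (Y is now at 0[sp]); CX is junk
 shl ax,6;           pop di;    //ax = 320Y; di = X
 add di,ax;          /***/      //di = 320Y + X
 /*pause one step for prefix, would do anyway for address dependency*/
 mov GS:[di],bl;     /***/
 /***/               jmp si;   //owzat -- 8 cycles !!!!!!!!!

 [I think that works: POP simply doesn't produce dependencies
 on SP for following instructions. Note that 16-bit addressing cannot
 use SP as a base register, so 4[sp] and 0[sp] would have to go through
 BP, or through ESP with an address-size prefix.]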

 In a 32-bit segment you might also try:

 mov eax,8[esp];      pop esi; //set eax up here to stop address dependency
 /***/                pop ebx; //pop pixel into EBX
 lea eax,[eax*4+eax]; pop ecx; //mul eax * 5, junk the value in ECX
 shl eax,6;           pop edi; //mul eax * 64, pop X value into EDI
 /*pause one step for prefix,  would do anyway for address dependency*/
 mov GS:[edi+eax],bl; /***/
 /***/                jmp esi;
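
 The LEA-then-SHL arithmetic in the 32-bit variant can be checked with a small C model. This is only a sketch; the function name `pixel_offset_lea` is made up, and the two statements mirror `lea eax,[eax*4+eax]` and `shl eax,6` respectively.

```c
#include <stdint.h>

/* y * 320 computed as (y * 5) << 6, then x added. */
static uint32_t pixel_offset_lea(uint32_t x, uint32_t y)
{
    uint32_t t = y * 4 + y;   /* lea eax,[eax*4+eax]: y * 5 */
    t <<= 6;                  /* shl eax,6: y * 5 * 64 = y * 320 */
    return t + x;             /* address formed as [edi+eax] */
}
```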

Sun, 01 Jul 2001 03:00:00 GMT  
 PutPixel Optimisation
Thank you, everyone, for your replies.

I am aware of the SHL 8, SHL 6 method, but on my AMD K6-2 it is slower than
using a MUL.



Sun, 01 Jul 2001 03:00:00 GMT  
 PutPixel Optimisation

Quote:
>Does anyone know of any optimizations for this PutPixel routine?
>It must use Pascal-style stack ordering.
>I am particularly interested in the instruction timings (for Pentium and/or
>AMD K6-2), and which pipeline each instruction executes in. Assume the first
>instruction is in the U (X) pipe, and that no previous instruction has been
>executed.

The time spent in the code that does the actual putpixel is small compared
with the time needed to set up the call to your routine.
You can get much better performance using in-line code (and I'm
talking factors here, not percentages!) rather than a procedure:

  mem[$A000:x + (y shl 8) + (y shl 6)] := colour;

Of course, for other primitives (lines, circles) you use dedicated
routines (i.e. NOT built on a PutPixel routine) to avoid repeating the
(x,y) -> address calculation for every pixel.
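
As a rough illustration of such dedicated routines, here is a C sketch of horizontal and vertical lines that compute the start address once and then step through memory. The names `hline` and `vline` are hypothetical, a plain array stands in for the frame buffer, and the caller is assumed to pass on-screen coordinates with x0 <= x1 and y0 <= y1.

```c
#include <stdint.h>
#include <string.h>

#define SCREEN_W 320
#define SCREEN_H 200

/* Stand-in for the frame buffer at A000:0000 (assumption: a plain array). */
static uint8_t vga[SCREEN_W * SCREEN_H];

/* Horizontal line: one (x, y) -> address step, then fill consecutive bytes. */
static void hline(int x0, int x1, int y, uint8_t color)
{
    memset(&vga[y * SCREEN_W + x0], color, (size_t)(x1 - x0 + 1));
}

/* Vertical line: one address computation, then advance one row per pixel. */
static void vline(int x, int y0, int y1, uint8_t color)
{
    size_t off = (size_t)(y0 * SCREEN_W + x);
    for (int y = y0; y <= y1; ++y, off += SCREEN_W)
        vga[off] = color;
}
```

Neither routine multiplies inside the loop; the per-pixel work is a store and an add.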

Herman



Mon, 02 Jul 2001 03:00:00 GMT  
 