fastest putpixel

I'm looking for a faster way to do putpixel. Right now my putpixel takes
around 60-70 clock cycles, but I've heard it can go down to 40 or so.

So if you have that code, lend me a hand.

Duan Nguyen

Wed, 18 Jun 1902 08:00:00 GMT
fastest putpixel

On Tue, 16 Sep 1997 18:07:15 -0700, "Duan Nguyen aka IC"

Quote:

> well thats right I looking for a faster way to putpixel... right now my
>putpixel is around 60-70 clock ticks... but I heard it can go down to 40 or
>some.

>So if you have that code... lend me a hand.

>Duan Nguyen

I know of two procedures. I understand the first one, but I think the
second one is faster, even though I don't quite understand it. I have
no idea of the clock cycles. I'm pretty sure the second one is about as
fast as it gets (I took it from Denthor's graphics tutorials). Oh
yeah, make sure you have 286 instructions enabled for the second one
(the shifts by an immediate count need them).

(the first one)
CONST VGA = \$a000;
Procedure MEMPutpixel (X,Y : Integer; Col : Byte);
{ This puts a pixel on the screen by writing directly to memory. }
BEGIN
Mem [VGA:X+(Y*320)]:=Col;
END;

(the second one)
Procedure Putpixel (X,Y : Integer; Col : Byte; where:word); assembler;
{ This puts a pixel on the screen by writing directly to memory. }
Asm
mov     ax,[where]
mov     es,ax
mov     bx,[X]
mov     dx,[Y]
mov     di,bx
mov     bx, dx                  {; bx = dx}
shl     dx, 8
shl     bx, 6
add     dx, bx                  {; dx = dx + bx (ie y*320)}
add     di, dx                  {; finalise location}
mov     al, [Col]
stosb
End;

Wed, 18 Jun 1902 08:00:00 GMT
fastest putpixel

Quote:
>  well thats right I looking for a faster way to putpixel... right now my
> putpixel is around 60-70 clock ticks... but I heard it can go down to 40 or
> some.

> So if you have that code... lend me a hand.

This is a specific routine to put a pixel in 320X200X256:

Var
Scrn    : Array[0..199,0..319] of Byte Absolute \$a000:0000;

Procedure Put(x, y : Integer; c : Byte);
begin
Scrn[Y,X] := C;
end;

I don't know whether it's faster than yours. Try it out.

--
Brian Pedersen, System Specialist, Alta Copenhagen
http://www.alta.dk
Personal homepage:
http://home6.inet.tele.dk/brianp

Wed, 18 Jun 1902 08:00:00 GMT
fastest putpixel

=>>  well thats right I looking for a faster way to putpixel... right now my
=>> putpixel is around 60-70 clock ticks... but I heard it can go down to 40 or
=>> some.
=>>
=>> So if you have that code... lend me a hand.
=>
=>This is a specific routine to put a pixel in 320X200X256:
=>
=>Var
=>  Scrn    : Array[0..199,0..319] of Byte Absolute \$a000:0000;
=>
=>Procedure Put(x, y : Integer; c : Byte);
=>begin
=>  Scrn[Y,X] := C;
=>end;

It's faster if you don't make it a procedure at all.

Just insert the code directly into your program; the procedure call itself
takes 60-80% of the time spent in the procedure.
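
To make the suggestion concrete, here is a minimal sketch of the same store done through a procedure and written inline. It is an illustration only: the Screen array is a stand-in for video memory (an assumption, so the program runs anywhere), and the names InlineWrite and PutPixel are hypothetical. It shows the two forms, not how much time the call costs.

```pascal
program InlineWrite;
{ Sketch: the same pixel store via a procedure call and written inline.
  Screen is a plain array standing in for the 320x200 video memory. }
const
  ScreenWidth = 320;
var
  Screen: array[0..63999] of Byte;
  X: Word;
  I, Count: LongInt;

procedure PutPixel(X, Y: Word; Colour: Byte);
begin
  Screen[LongInt(Y) * ScreenWidth + X] := Colour;
end;

begin
  FillChar(Screen, SizeOf(Screen), 0);

  { Version 1: one procedure call (push, call, return) per pixel }
  for X := 0 to 319 do
    PutPixel(X, 10, 4);

  { Version 2: the same store written directly in the loop body }
  for X := 0 to 319 do
    Screen[11 * ScreenWidth + X] := 4;

  { Count the pixels set by both versions }
  Count := 0;
  for I := 0 to 63999 do
    if Screen[I] = 4 then
      Inc(Count);
  WriteLn('pixels written: ', Count);
end.
```

Both loops set one full row each; only the second avoids the per-pixel call overhead.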

Bevyn.

--
disclaimer

not only did i not post this i have never seen any of these letters before in my life.

these opinions represent no one least of all myself.

Wed, 18 Jun 1902 08:00:00 GMT
fastest putpixel

Quote:

> On Tue, 16 Sep 1997 18:07:15 -0700, "Duan Nguyen aka IC"

> > well thats right I looking for a faster way to putpixel... right now my
> >putpixel is around 60-70 clock ticks... but I heard it can go down to 40 or
> >some.

> >So if you have that code... lend me a hand.

> >Duan Nguyen
> I know of two procedures. I understand the first one but I think the
> second one is faster, even though I dont quite understand it. I have
> no idea of the clock ticks. Im pretty sure the second one is the
> fastest way possible (I took it from denthors graphics tutors). oh
> yeah make sure you have 286 instructions enabled for the second one.

> (the first one)
> CONST VGA = \$a000;
> Procedure MEMPutpixel (X,Y : Integer; Col : Byte);
>   { This puts a pixel on the screen by writing directly to memory. }
> BEGIN
>   Mem [VGA:X+(Y*320)]:=Col;
> END;

> (the second one)
> Procedure Putpixel (X,Y : Integer; Col : Byte; where:word); assembler;
>   { This puts a pixel on the screen by writing directly to memory. }
> Asm
>   mov     ax,[where]
>   mov     es,ax
>   mov     bx,[X]
>   mov     dx,[Y]
>   mov     di,bx
>   mov     bx, dx                  {; bx = dx}
>   shl     dx, 8
>   shl     bx, 6
>   add     dx, bx                  {; dx = dx + bx (ie y*320)}
>   add     di, dx                  {; finalise location}
>   mov     al, [Col]
>   stosb
> End;

The second one is close, but there are wasted instructions in there.

Asm
  mov ax,[where]           { Prepare the es register }
  mov es,ax                { with the proper memory segment }
  mov dx,[Y]               { dx = [Y] : temporary holding area }
  mov di,dx                { di = [Y] * 1 }
  shl di,2                 { di = [Y] * 4 }
  add di,dx                { di = [Y] * 5 }
  shl di,6                 { di = [Y] * 320 }
  add di,[X]               { di = [Y] * 320 + [X] }
  mov al,[Col]             { al = color of the pixel }
  mov es:[di],al           { put the pixel in the proper location }
End;

With that said... plotting pixel by pixel is not the fastest way to get
things done.

If you back up a step and look at the bigger "picture", you may be able
to compute the memory locations less often.

In line drawing you need to compute the full memory location only once.
From there you just need to add one of 8 offsets to the previous location:

Up_Left    = -321
Up         = -320
Up_Right   = -319
Left       = -1
Right      =  1
Down_Left  = 319
Down       = 320
Down_Right = 321

Using this plus other "tricks" can dramatically increase line drawing
speeds.

Similar tricks can be used for triangle filling etc.
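
As an illustration of the offset idea, here is a minimal sketch of a Bresenham-style line that computes full addresses only for the two endpoints and then steps by -1/+1 and -320/+320 (a diagonal step is the sum of the two, i.e. one of the -321/-319/319/321 entries in the table above). The Screen array stands in for video memory, the names are hypothetical, and there is no clipping, so both endpoints are assumed to be on screen.

```pascal
program IncrementalLine;
{ Sketch: line drawing that walks a linear offset instead of
  recomputing Y*320+X for every pixel. }
const
  ScreenWidth = 320;
var
  Screen: array[0..63999] of Byte;
  I, Count: LongInt;

procedure Line(X1, Y1, X2, Y2: LongInt; Colour: Byte);
var
  Dx, Dy, StepX, StepY, Err, E2, Ofs, EndOfs: LongInt;
begin
  Dx := Abs(X2 - X1);
  Dy := Abs(Y2 - Y1);
  if X1 < X2 then StepX := 1 else StepX := -1;                      { Right / Left }
  if Y1 < Y2 then StepY := ScreenWidth else StepY := -ScreenWidth;  { Down / Up }
  Ofs := Y1 * ScreenWidth + X1;       { full address of the start point }
  EndOfs := Y2 * ScreenWidth + X2;    { full address of the end point, for the loop test }
  Err := Dx - Dy;
  Screen[Ofs] := Colour;
  while Ofs <> EndOfs do
  begin
    E2 := 2 * Err;
    if E2 > -Dy then
    begin
      Err := Err - Dy;
      Ofs := Ofs + StepX;             { horizontal part of the step }
    end;
    if E2 < Dx then
    begin
      Err := Err + Dx;
      Ofs := Ofs + StepY;             { vertical part of the step }
    end;
    Screen[Ofs] := Colour;
  end;
end;

begin
  FillChar(Screen, SizeOf(Screen), 0);
  Line(0, 0, 5, 5, 15);
  Count := 0;
  for I := 0 to 63999 do
    if Screen[I] = 15 then
      Inc(Count);
  WriteLn('pixels plotted: ', Count);
end.
```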

mykey

Wed, 18 Jun 1902 08:00:00 GMT
fastest putpixel

Quote:

> On Tue, 16 Sep 1997 18:07:15 -0700, "Duan Nguyen aka IC"

> > well thats right I looking for a faster way to putpixel... right now my
> >putpixel is around 60-70 clock ticks... but I heard it can go down to 40 or
> >some.

> >So if you have that code... lend me a hand.

> >Duan Nguyen
> I know of two procedures. I understand the first one but I think the
> second one is faster, even though I dont quite understand it. I have
> no idea of the clock ticks. Im pretty sure the second one is the
> fastest way possible (I took it from denthors graphics tutors). oh
> yeah make sure you have 286 instructions enabled for the second one.

> (the first one)
> CONST VGA = \$a000;
> Procedure MEMPutpixel (X,Y : Integer; Col : Byte);
>   { This puts a pixel on the screen by writing directly to memory. }
> BEGIN
>   Mem [VGA:X+(Y*320)]:=Col;
> END;

Well, it works :) Oh, and there's no point in "Col: byte", since it gets
pushed onto the stack as a word anyway, so just make it an integer or word, whatever.


Quote:
> (the second one)
> Procedure Putpixel (X,Y : Integer; Col : Byte; where:word); assembler;
>   { This puts a pixel on the screen by writing directly to memory. }
> Asm
>   mov     ax,[where]
>   mov     es,ax
>   mov     bx,[X]
>   mov     dx,[Y]
>   mov     di,bx
>   mov     bx, dx                  {; bx = dx}
>   shl     dx, 8
>   shl     bx, 6
>   add     dx, bx                  {; dx = dx + bx (ie y*320)}
>   add     di, dx                  {; finalise location}
>   mov     al, [Col]
>   stosb
> End;

Well, if we "break down" 320, we get 256 + 64, so
Y * 320 = Y * 256 + Y * 64

Multiplying by 256 and by 64 translates directly into shifts, since 256 = 2^8 (shl 8) and 64 = 2^6 (shl 6)...
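
The identity is easy to check by brute force. The sketch below (program name hypothetical) tests the 256 + 64 split used here and the 5 * 64 split from the earlier reply against a plain multiplication for every mode 13h row.

```pascal
program ShiftCheck;
{ Checks two shift/add decompositions of Y*320 for rows 0..199. }
var
  Y, A, B: LongInt;
  Ok: Boolean;
begin
  Ok := True;
  for Y := 0 to 199 do
  begin
    A := (Y shl 8) + (Y shl 6);       { 256*Y + 64*Y }
    B := ((Y shl 2) + Y) shl 6;       { (4*Y + Y) * 64 }
    if (A <> Y * 320) or (B <> Y * 320) then
      Ok := False;
  end;
  if Ok then
    WriteLn('both decompositions match Y*320 for rows 0..199')
  else
    WriteLn('mismatch found');
end.
```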

Oh, and perhaps this variant is faster... It requires that you don't use FS anywhere else
in the program. It should work in _most_ cases, although it's kind of a cheat, so
don't count on it TOO much... (I haven't had a problem with it yet, though.)

Call this somewhere near the beginning of the program, before you put any pixels:

procedure InitPutPixel(where: word); assembler;
asm
mov ax, [where]
db  \$8e,\$e0 {mov fs, ax}
(* you can also use GS instead:
db  \$8e,\$e8 {mov gs, ax} *)
end;

Procedure Putpixel (X,Y: Integer; Col:word); assembler;
{ This puts a pixel on the screen by writing directly to memory. }
Asm
mov     dx,[Y]
mov     di,[X]
mov     bx, dx                  {; bx = dx}
shl     dx, 8
shl     bx, 6
add     di, bx                  {; di = x + y*64}
add     di, dx                  {; di = x + y*64 + y*256 = x + y*320}
mov     ax, [Col]
db \$64; mov [di], al { mov fs:[di],al  use 65 instead for GS }
End;

Oh, and this requires a 386 processor or better...
I haven't done any instruction pairing either...

Anyway, don't use putpixel to fill polygons and such...

If anyone cares, my 3D engine is 99% Turbo Pascal (the routine that
blasts the double buffer to the screen is the only asm routine) and is
still (relatively) very fast...

http://home.sol.no/~bheid/Programming%Files/

--
- Asbjørn

http://home.sol.no/~bheid/

Wed, 18 Jun 1902 08:00:00 GMT
fastest putpixel

Quote:
>Subject: fastest putpixel

>Date: Tue, 16 Sep 1997 18:07:15 -0700

> well thats right I looking for a faster way to putpixel... right now my
>putpixel is around 60-70 clock ticks... but I heard it can go down to 40 or
>some.

>So if you have that code... lend me a hand.

I reviewed the code posted so far and found a slightly faster version:
(Once again, this is *solely* for mode 13h.)

mov ax, A000h
mov es, ax
mov bh, byte ptr [Y]
mov ah, bh
shl  ax, 2    ;note: replace this by two shl ax, 1 on the 286 to save
;3 cycles
mov al, [Col]
mov es:[bx], al

To me, a saving of 20 cycles (from 60 down to 40) is hardly significant.  Even
on a 25 MHz 386 processor (slow by today's standards), this saving adds up
to one second only after 1,250,000 calls, and the
mode 13h screen has only 64,000 bytes!
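
The arithmetic behind those figures can be reproduced with a few constants taken from the post (25 MHz, 20 cycles saved, 64,000 pixels). This is just a sketch of the calculation, with a hypothetical program name; for one full screen of single-pixel calls the saving comes to about 51 ms.

```pascal
program CycleSavings;
{ Back-of-the-envelope check of the figures quoted in the post. }
const
  ClockHz = 25000000;       { 25 MHz 386 }
  CyclesSaved = 20;         { 60 cycles down to 40 }
  ScreenPixels = 64000;     { 320 x 200 }
begin
  WriteLn('calls needed to save one second: ', ClockHz div CyclesSaved);
  WriteLn('time saved per full screen (ms): ',
    (ScreenPixels * CyclesSaved * 1000.0) / ClockHz : 0 : 1);
end.
```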

If you ARE calling putpixel repeatedly, you may want to consider the following:

1. On 286+ processors, writing an aligned word to memory takes the same time as
writing a byte; on 386+ processors, writing an aligned doubleword takes the
same time as writing a word or byte.  So it is faster to write two or four
consecutive pixels at once than each one separately: in the ideal case, *twice*
or *four* times as fast.
2. Putting a row of pixels on the screen requires you to calculate
the address only once per row, because consecutive pixels can be handled
with the REP STOS/MOVS instructions.
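
Point 2 can be sketched in plain Pascal: compute the start address once, then fill the whole run in one go. In the sketch below, FillChar plays the role that REP STOSB would play in assembler; the Screen array stands in for video memory and the HLine name is hypothetical.

```pascal
program RowFill;
{ Sketch: horizontal run of pixels with a single address calculation. }
const
  ScreenWidth = 320;
var
  Screen: array[0..63999] of Byte;
  I, Count: LongInt;

procedure HLine(X1, X2, Y: Word; Colour: Byte);
var
  Start: LongInt;
begin
  if X2 < X1 then
    Exit;
  Start := LongInt(Y) * ScreenWidth + X1;          { address computed once }
  FillChar(Screen[Start], X2 - X1 + 1, Colour);    { fill the whole run }
end;

begin
  FillChar(Screen, SizeOf(Screen), 0);
  HLine(10, 29, 5, 7);
  Count := 0;
  for I := 0 to 63999 do
    if Screen[I] = 7 then
      Inc(Count);
  WriteLn('pixels filled: ', Count);
end.
```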

And finally, the 60/40-cycle figures are probably ideal cases only;
in most cases alignment and the speed of RAM/VRAM/the data bus can slow
things down further.

Wed, 18 Jun 1902 08:00:00 GMT
fastest putpixel

I wrote earlier:

Quote:
>mov ax, A000h
>mov es, ax
>mov bh, byte ptr [Y]
>mov ah, bh
>shl  ax, 2    ;note: replace this by two shl ax, 1 on the 286 to save
>                   ;3 cycles
>mov al, [Col]
>mov es:[bx], al

Once again I goofed up.  The correct code is as follows:
mov ax, 0A000h        ; hex literal needs the leading 0; this also leaves al = 0
mov es, ax
mov ah, byte ptr [Y]  ; *not* bh; ax = Y*256 (al is still 0)
mov di, ax            ; *not* ah, bh; di = Y*256
shr  di, 2            ; note *shr*, *not* shl ax, 2; di = Y*64
add di, ax            ; *not* bx, ax; di = Y*320
add di, [x]           ; *not* bx, [x]; di = Y*320 + X
mov al, [Col]
mov es:[di], al       ; *not*  es:[bx]

In case anyone was wondering, this is faster than the code other people
had posted so far only because it eliminates one shl instruction through
the use of byte-size registers (loading Y into ah gives Y*256 for
free)... not a significant timesaver at all...
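
For anyone who wants to convince themselves that the byte-register trick really yields Y*320, here is a small sketch that emulates the register steps with 16-bit variables. The variables AX and DI are ordinary Pascal words named after the registers, and the program name is hypothetical.

```pascal
program ByteRegTrick;
{ Emulates the ah/shr sequence with Word variables and compares it
  with a plain multiplication for rows 0..199. }
var
  Y: LongInt;
  AX, DI: Word;
  Ok: Boolean;
begin
  Ok := True;
  for Y := 0 to 199 do
  begin
    AX := 40960;                        { mov ax, 0A000h : low byte (al) is 0 }
    AX := (Y shl 8) or (AX and 255);    { mov ah, byte ptr [Y] : ax = Y*256 + al }
    DI := AX;                           { mov di, ax }
    DI := DI shr 2;                     { shr di, 2 : di = Y*64 }
    DI := Word(DI + AX);                { add di, ax : di = Y*320 }
    if DI <> Y * 320 then
      Ok := False;
  end;
  if Ok then
    WriteLn('ah/shr trick matches Y*320 for rows 0..199')
  else
    WriteLn('mismatch found');
end.
```

Note that the trick depends on al being zero after the segment value is loaded, which holds here because the low byte of A000h is 00.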

Wed, 18 Jun 1902 08:00:00 GMT
fastest putpixel

I haven't really been following this thread, but one thing does jump out
at me....why are you constantly reloading ES?  If this is a routine
which is called only every so often such that ES gets changed (by Turbo
Pascal), then why bother shaving a few cycles here and there?  And if
it's not, why reload every time?  What you could do, on a 386+ processor
anyway, is to use FS or GS.  Granted, you will have to manually code the
opcode for MOV FS, AX and the segment override FS: because BASM doesn't
understand them, but you *know* Turbo Pascal will *never* change these
segment registers.

--

(If you are a human, then you can figure out my real address.)

Come see me at my web site:
http://www.geocities.com/SiliconValley/Pines/9447

Wed, 18 Jun 1902 08:00:00 GMT
fastest putpixel

Quote:

>I wrote earlier:
[snip]
>Once again I goofed up.  The correct code is as follows:
>mov ax, A000h
>mov es, ax
>mov ah, byte ptr [Y]  ; *not* bh,
>mov di, ax  ;  *not* ah, bh
>shr  di, 2    ; note *shr*, *not* shl ax, 2
>add di, ax  ; *not* bx, ax
>add di, [x] ;  *not* bx, [x]
>mov al, [Col]
>mov es:[di], al  ;  *not*  es:[bx]

>In case anyone was wondering, this is faster than the code other people
>had posted so far only because it eliminated one shl instruction through
>the clever use of byte-size registers....not a significant timesaver at
>all...

While the quest for the fastest putpixel is silly, I'll add a few
entries/ideas to the pot.  First, why use shifts at all?  One could
easily use a LUT (LookUpTable).  It would only use 640 bytes of memory
and be faster than using integer math.

So:

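procedure putpixel(x,y:word;colour:byte); assembler;
asm
mov     es,sega000
mov     bx,y
shl     bx,1                    { word index into the table }
mov     di,x
add     di,word ptr RowOfs[bx]  { RowOfs[y] = y*320; assumed table name, filled once at startup }
mov     al,colour
mov     es:[di],al
end;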
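
For reference, a minimal sketch of the lookup-table idea in plain Pascal follows. The names RowOfs, InitRowTable and PutPixelLUT are hypothetical, and the Screen array stands in for video memory. With one word per row, this particular table takes 200 words, i.e. 400 bytes.

```pascal
program RowTable;
{ Sketch: row offsets are computed once and looked up per pixel. }
const
  ScreenWidth = 320;
  ScreenHeight = 200;
var
  RowOfs: array[0..ScreenHeight - 1] of Word;    { 200 words = 400 bytes }
  Screen: array[0..ScreenWidth * ScreenHeight - 1] of Byte;

procedure InitRowTable;
var
  Row: Word;
begin
  for Row := 0 to ScreenHeight - 1 do
    RowOfs[Row] := Row * ScreenWidth;
end;

procedure PutPixelLUT(X, Y: Word; Colour: Byte);
begin
  Screen[RowOfs[Y] + X] := Colour;   { table lookup replaces the multiply }
end;

begin
  FillChar(Screen, SizeOf(Screen), 0);
  InitRowTable;
  PutPixelLUT(10, 3, 15);
  WriteLn('offset ', RowOfs[3] + 10, ' holds ', Screen[RowOfs[3] + 10]);
end.
```

The lookup trades the shifts and adds for one extra memory read, so whether it wins depends on the processor and on how fast that read is.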

Secondly, if you really want to use shifts, you can do either of the
following:

; ebx=y, eax=x, cl=colour, ES=SegA000

lea     ebx,[ebx+ebx*4]
shl     ebx,6
mov     es:[ebx+eax],cl

or

; eax=x, ebx=y, cl=colour, ES=SegA000

shl     ebx,6                   ; ebx = y*64
lea     eax,[eax+4*ebx]         ; eax = x + y*256
mov     es:[eax+ebx],cl         ; x + y*256 + y*64 = x + y*320

I'll leave it as an exercise for the reader to convert them into BASM or
inline() statements. :)

--Mark Iuzzolino

http://www.monstersoft.com
The MonsterSoft Email Verification & Intercept Layer (MS_EVIL) is active
and awaiting spam.

Wed, 18 Jun 1902 08:00:00 GMT
fastest putpixel

Quote:

> I haven't really been following this thread, but one thing does jump out
> at me....why are you constantly reloading ES?  If this is a routine
> which is called only every so often such that ES gets changed (by Turbo
> Pascal), then why bother shaving a few cycles here and there?  And if
> it's not, why reload every time?  What you could do, on a 386+ processor
> anyway, is to use FS or GS.  Granted, you will have to manually code the
> opcode for MOV FS, AX and the segment override FS: because BASM doesn't
> understand them, but you *know* Turbo Pascal will *never* change these

Just because the compiler doesn't change the FS and GS registers doesn't
mean nothing else does. Other code (e.g. drivers, TSRs) may make the same
mistake you are suggesting and assume those registers are theirs to rely on.

Until you can get into the flat model, segment "slowdowns" are a fact of
life.

mykey

Wed, 18 Jun 1902 08:00:00 GMT
fastest putpixel

Quote:

> Just because the compiler doesn't change the FS, GS registers doesn't
> mean nothing else (i.e. Drivers, TSRs) makes the same mistake you are
> suggesting and assumes their values are reliable.

> Until you can get into the flat model segment "slowdowns" are a fact of
> life.

> mykey

There are two ways a TSR or driver can get control of the system: 1) via
an interrupt or 2) by being called directly.  In the first case, they
should **NEVER** change *ANY* register, as this would cause the system
to become unstable.  (How would you like to set up a string move
instruction with CX = \$0100, and suddenly have CX set to \$FFFF?)  In the
second case, this would have to be taken into consideration, but you
would at least know when such a thing was going to happen and provide
for that.  Besides, I've never seen a TSR or driver yet (not that there
ISN'T one, mind you) that used either FS or GS.  Therefore, your
statement is invalid and mine still stands.

--

(If you are a human, then you can figure out my real address.)

Come see me at my web site:
http://www.geocities.com/SiliconValley/Pines/9447

Wed, 18 Jun 1902 08:00:00 GMT
fastest putpixel

You used the S word...

Nothing SHOULD change these things, but does that mean it never happens?

You place a value in FS or GS and hope it's still there next time.
That's a hard bug to track down.

It's much like using EAX: some unknown code could be clobbering the upper
half of EAX without you knowing it.

Slowly but surely these violators will get eradicated, but they aren't
extinct yet.

You haven't seen one... therefore my statement is invalid????

My statement may in fact be invalid, but not because of your empirical
evidence.

mykey

Wed, 18 Jun 1902 08:00:00 GMT
fastest putpixel

Quote:
><snip>
>I haven't really been following this thread, but one thing does jump out
>at me....why are you constantly reloading ES?  If this is a routine
>which is called only every so often such that ES gets changed (by Turbo
>Pascal), then why bother shaving a few cycles here and there?

I made that point myself in my earlier post (not the one you've read, but the
one with the incorrect asm code).  The person who started this
thread was looking for some way to get a 60-cycle putpixel
down to 40 cycles.  Even on a 25 MHz 386 processor, the 20-cycle
saving makes a difference of one second only if putpixel is called
1,250,000 times, and the screen itself is only 64,000 bytes!  I also
mentioned that moving individual pixels is *very* inefficient.

Quote:
>And if
>it's not, why reload every time?  What you could do, on a 386+ processor
>anyway, is to use FS or GS.  Granted, you will have to manually code the
>opcode for MOV FS, AX and the segment override FS: because BASM doesn't
>understand them, but you *know* Turbo Pascal will *never* change these
>segment registers.

That's an interesting idea.

Well... here are the timings (in cycles) for MOV reg, immed and
MOV segreg, reg:

CPU    MOV reg, immed    MOV segreg, reg (real mode)    MOV segreg, reg (protected mode)
286    2                 2                              17
386    2                 2                              18
486    1                 3                              9

So unless you're in protected mode it's not that slow after all.

Wed, 18 Jun 1902 08:00:00 GMT
fastest putpixel

Quote:
>While the quest for the fastest putpixel is silly,

I agree absolutely. I posted earlier here a follow-up with the incorrect
asm code and have mentioned how little the saving is.  But then again,
earlier someone in the NG was searching for the fastest way to clear the
screen without using CRT unit...

Quote:
>entries/ideas to the pot.  First, why use shifts at all?  One could
>easily use a LUT (LookUpTable).  It would only use 640 bytes of memory
>and be faster than using integer math.

>So:

>procedure putpixel(x,y:word;colour:byte); assembler;
>asm
>    mov     es,sega000

Since when can you load a value directly into the segment registers? ;-)
Quote:
>    mov     bx,y
>    mov     di,x
>    mov     al,colour
>    mov     es:[di],al
>end;

memory, which can sometimes slow things down slightly.  (Notice how many
"slightly"s I'm using in the sentence.)

Quote:
>Secondly, if you really want to use shifts, you can do either of the
>following:

>; ebx=y, eax=x, cl=colour, ES=SegA000

>    lea     ebx,[ebx+ebx*4]
>    shl     ebx,6
>    mov     es:[ebx+eax],cl

>or

>; eax=x, ebx=y, cl=colour, ES=SegA000

>    shl     ebx,6