breaking the 64K data segment 
 breaking the 64K data segment


: I am working on some graphics routines for regular VGA.  In order to speed
: up the graphics routines, I have decided to buffer the graphics.  This
: improves scrolling routines because copying the buffer into video memory
: requires only one access into video memory.

Buffering the graphics is not really done to speed up the graphics, it is
done to reduce the amount of flicker (either by doing a block copy of the
buffer, or by flipping between multiple pages).

: The problem I have come to is that the resolution I am using, 320x200 256
: colors, requires a 64K data segment by itself.

Mode 13h is notoriously bad for scrollers, because it does not support
multiple pages and you can't have virtual screen sizes larger than the
visible screen (both really handy for scrollers).

: I am however afraid that I am losing too many clock cycles in the
: plotting routines because the data to be plotted is on one data segment
: while the buffer is in the other.
[ couple lines of deletion ]

It shouldn't matter which segment your buffer is in (rep movsw should be
the same speed regardless of where the source segment is).  Have you timed
your code to find out what is really the slow part?

: And thus the question, can I create segments larger than 64K?  If so, how?

On a normal 8086 you can't create segments larger than 64K.

Get thee X mode!  X mode allows you to access 4 times as much video memory,
meaning that in 320x200 mode, you can have 4 different video pages (no
memory is used outside the 64K video segment).  The basic principle is
that you can draw to the pages that are not on the screen and then simply
poke a couple values into VGA registers to flip the page.  Also, you can
do other neat tricks (split screen, smooth scrolling).
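To make the page arithmetic concrete, here is a small C sketch. It assumes the standard unchained 320x200 layout: pixel x sits in plane x & 3 at byte x / 4 of its row, a row is 80 bytes, and a page is 16000 bytes, so four pages fit in the 64K window at A000h. The names are hypothetical, and the code only computes addresses; it does not touch any VGA registers. A page's start offset is the value a page flip would write into the CRTC start-address registers.

```c
#include <stdint.h>

/* Where a pixel lives in unchained 320x200 Mode X (hypothetical helper).
   Each plane holds every fourth pixel of a row, so a row takes
   320 / 4 = 80 bytes and a page 80 * 200 = 16000 bytes. */
enum { MODEX_ROW_BYTES = 80, MODEX_PAGE_BYTES = 16000 };

typedef struct {
    uint8_t  map_mask;  /* Map Mask value: bit p selects plane p */
    uint16_t offset;    /* byte offset within segment A000h */
} ModeXAddr;

ModeXAddr modex_addr(int page, int x, int y)
{
    ModeXAddr a;
    a.map_mask = (uint8_t)(1u << (x & 3));
    a.offset = (uint16_t)(page * MODEX_PAGE_BYTES + y * MODEX_ROW_BYTES + (x >> 2));
    return a;
}
```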

For a good intro to X mode, check out Xlib, a free X mode graphics
library.  Xlib is probably available at x2ftp.oulu.fi.  Alternatively, you
could get a book that discusses it (Abrash's new book "Zen of Graphics
Programming" for example-- totally awesome book!).

Jason P. Hoerner

(Grey Cat/  CORE)



Tue, 12 Aug 1997 04:25:03 GMT  
 breaking the 64K data segment

Quote:


>: I am working on some graphics routines for regular VGA.  In order to speed
>: up the graphics routines, I have decided to buffer the graphics.  This
>: improves scrolling routines because copying the buffer into video memory
>: requires only one access into video memory.

>Buffering the graphics is not really done to speed up the graphics, it is
>done to reduce the amount of flicker (either by doing a block copy of the
>buffer, or by flipping between multiple pages).

Actually, it is done for both. Depending on how the scrolling is done,
it is sometimes faster to use double buffering.

Quote:

>: The problem I have come to is that the resolution I am using, 320x200 256
>: colors, requires a 64K data segment by itself.

>Mode 13h is notoriously bad for scrollers, because it does not support
>multiple pages and you can't have virtual screen sizes larger than the
>visible screen (both really handy for scrollers).

Actually, it is possible (and pretty easy) to get 2 pages out of mode 13h.
The problem with this is that the second page is in video memory, so you
are trading speed for memory. Personally I'd rather have the speed at the
expense of the memory.

Quote:

>: I am however afraid that I am losing too many clock cycles in the
>: plotting routines because the data to be plotted is on one data segment
>: while the buffer is in the other.

>Get thee X mode!  X mode allows you to access 4 times as much video memory,
>meaning that in 320x200 mode, you can have 4 different video pages (no
>memory is used outside the 64K video segment).  The basic principle is
>that you can draw to the pages that are not on the screen and then simply
>poke a couple values into VGA registers to flip the page.  Also, you can
>do other neat tricks (split screen, smooth scrolling).

On the benchmarks I have done, it is actually faster to draw a screen from
memory to mode 13h than it is to draw a screen to modex.  Modex is very
nice if the pixels you want to draw are all of the same color (i.e., a fast
fill-poly routine).

Quote:
>Alternatively, you
>could get a book that discusses it (Abrash's new book "Zen of Graphics
>Programming" for example-- totally awesome book!).

Yes, Zen of Graphics Programming is an awesome book, but it still lacks in
some areas. One of these is drawing sprites without processing the zero
(transparent) pixels at all. M. Abrash still uses masking to get sprites to
appear transparent. He also does it a byte at a time, which could be improved
a lot if he didn't use modex (by masking doublewords, words, and then bytes).
I have seen some very impressive parallax scrolling done with masking (and in
under 100 lines of assembly), but it uses twice as much memory and is slower
than the method I use (you have the image and its mask in memory, plus you
are still doing a lot of unnecessary movs).
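For reference, here is a C sketch of the image-plus-mask technique under discussion. It works on a linear byte buffer rather than Mode X planes, and build_mask and masked_blit are hypothetical names. The mask array is where the doubled memory goes. In return, drawing is pure AND/OR with no per-pixel test, done a doubleword at a time and finished with a word and a byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Split a sprite (colour 0 = transparent) into an image and an AND mask:
   mask is 0xFF where the background shows through and 0x00 where the
   sprite is opaque; image holds 0 at transparent pixels. */
void build_mask(const uint8_t *sprite, uint8_t *image, uint8_t *mask, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        image[i] = sprite[i];
        mask[i] = sprite[i] ? 0x00 : 0xFF;
    }
}

/* dst = (dst AND mask) OR image: four pixels per step, then a word, then
   a byte. Bitwise ops are per byte, so byte order does not matter. */
void masked_blit(uint8_t *dst, const uint8_t *image, const uint8_t *mask, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        uint32_t d, s, m;
        memcpy(&d, dst + i, 4);
        memcpy(&s, image + i, 4);
        memcpy(&m, mask + i, 4);
        d = (d & m) | s;
        memcpy(dst + i, &d, 4);
    }
    if (i + 2 <= n) {
        uint16_t d, s, m;
        memcpy(&d, dst + i, 2);
        memcpy(&s, image + i, 2);
        memcpy(&m, mask + i, 2);
        d = (uint16_t)((d & m) | s);
        memcpy(dst + i, &d, 2);
        i += 2;
    }
    if (i < n)
        dst[i] = (uint8_t)((dst[i] & mask[i]) | image[i]);
}
```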

Quote:

>Jason P. Hoerner

>(Grey Cat/  CORE)

TCA of NewOrder


Tue, 12 Aug 1997 18:35:17 GMT  
 breaking the 64K data segment


Quote:



>>: I am working on some graphics routines for regular VGA.  In order to speed
>>: up the graphics routines, I have decided to buffer the graphics.  This
>>: improves scrolling routines because copying the buffer into video memory
>>: requires only one access into video memory.

>>Buffering the graphics is not really done to speed up the graphics, it is
>>done to reduce the amount of flicker (either by doing a block copy of the
>>buffer, or by flipping between multiple pages).

>Actually, it is done for both. Depending on how the scrolling is done
>it is sometimes faster to use double buffering.

>>: The problem I have come to is that the resolution I am using, 320x200 256
>>: colors, requires a 64K data segment by itself.

>>Mode 13h is notoriously bad for scrollers, because it does not support
>>multiple pages and you can't have virtual screen sizes larger than the
>>visible screen (both really handy for scrollers).

>Actually, it is possible (and pretty easy) to get 2 pages out of mode 13h.
>The problem with this is that it is video memory, and so you are losing
>some speed for memory. Personally I'd rather have the speed at the expense
>for the memory.

>>: I am however afraid that I am losing too many clock cycles in the
>>: plotting routines because the data to be plotted is on one data segment
>>: while the buffer is in the other.

>>Get thee X mode!  X mode allows you to access 4 times as much video memory,
>>meaning that in 320x200 mode, you can have 4 different video pages (no
>>memory is used outside the 64K video segment).  The basic principle is
>>that you can draw to the pages that are not on the screen and then simply
>>poke a couple values into VGA registers to flip the page.  Also, you can
>>do other neat tricks (split screen, smooth scrolling).

>On the benchmarks I have done, it is actually faster to draw a screen from
>memory to mode 13h, than it is to draw a screen to modex.  Modex is very
>nice if the pixels you want to draw are all of the same color (ie a fast
>fill-poly routine).

>>Alternatively, you
>>could get a book that discusses it (Abrash's new book "Zen of Graphics
>>Programming" for example-- totally awesome book!).

>Yes, Zen of Graphics Programming is an awesome book, but still lacks in some
>areas. One of these is drawing sprites without using zeroes. M. Abrash still
>uses masking to get sprites to appear transparent. He also does it a byte
>at a time, which could be improved a lot if he didn't use modex (by masking
>doublewords, words, and then bytes). I have seen some very impressive
>parallax scrolling done with masking (and in under 100 lines of assembly),
>but it uses twice as much memory and is slower than the method I use (you
>have the image and its mask in memory, plus you are still doing a lot
>of unnecessary movs).

The approach I prefer is to "compile" objects into machine code which expects
DI to point to the start of the object on-screen, AL to contain the bitplane
mask in both nybbles [i.e. bitplane 1 would be 00100010], and DX to contain
the pre-set-up pointer to the bitplane select register.

Then the code for an object like this:

|   |   |   |
ABBBA   ABBBA
ABCCBA ABCCBA
DDDDDDDDDDDDD

[the |'s being there for visual aid only]
would look like this:

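  out dx,al                ; select plane 0 (AL = 00010001b)
  mov [word di   ],0A0Ah   ; row 0: columns 0,4 and 8,12
  mov [word di+ 2],0A0Ah
  mov [word di+80],0B0Ah   ; row 1
  mov [word di+82],0A0Bh
  add di,160               ; down two rows
  mov [word di   ],0D0Dh   ; row 2
  mov [word di+ 2],0D0Dh
  rol al,1                 ; next plane; CF=1 only on the 3->0 wrap
  adc di,65536-160         ; back up two rows (+1 byte on wrap)
  out dx,al                ; select plane 1
  mov [byte di   ],0Bh     ; row 0: columns 1 and 9 (column 5 is clear)
  mov [byte di+ 2],0Bh
  mov [word di+80],0A0Bh   ; row 1: columns 1,5
  mov [byte di+82],0Ch     ;        column 9
  add di,160
  mov [word di   ],0D0Dh   ; row 2: columns 1,5
  mov [byte di+ 2],0Dh     ;        column 9
  rol al,1
  adc di,65536-160
  out dx,al                ; select plane 2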

etc.

The objects become rather large in memory, but on a 386 without a code cache,
the above code runs efficiently, at an average rate for large shapes of well
under 1 instruction per pixel.
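The plane-stepping pair at the end of each plane's code is easy to misread, so here is a C model of just those two instructions. next_plane is a hypothetical name. The 160 assumes a three-row object like the example, where the plane's code has moved DI down two rows.

```c
#include <stdint.h>

/* Models "rol al,1" followed by "adc di,65536-160". AL holds the Map Mask
   value in both nybbles; the VGA only uses the low nybble, and the copy in
   the high nybble makes bit 7 rotate into the carry exactly when stepping
   from plane 3 (88h) back to plane 0 (11h). */
void next_plane(uint8_t *al, uint16_t *di)
{
    unsigned carry = *al >> 7;                 /* bit leaving AL */
    *al = (uint8_t)((*al << 1) | carry);       /* rol al,1 */
    /* adc di,65536-160: undo the 160 bytes (two rows) the plane's code
       advanced DI, plus one byte on the 3 -> 0 wrap, because the next
       group of four pixels starts one byte further along; wraps mod 64K */
    *di = (uint16_t)(*di + (65536u - 160u) + carry);
}
```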
--
-------------------------------------------------------------------------------

 John Payson         |   Un animal si beau qu'un chat."          |  ( o o )



Thu, 14 Aug 1997 02:12:16 GMT  
 breaking the 64K data segment

Quote:

>>Yes, Zen of Graphics Programming is an awesome book, but still lacks in some
>>areas. One of these is drawing sprites without using zeroes. M. Abrash still
>>uses masking to get sprites to appear transparent. He also does it a byte
>>at a time, which could be improved a lot if he didn't use modex (by masking
>>doublewords, words, and then bytes). I have seen some very impressive
>>parallax scrolling done with masking (and in under 100 lines of assembly),
>>but it uses twice as much memory and is slower than the method I use (you
>>have the image and its mask in memory, plus you are still doing a lot
>>of unnecessary movs).

>The approach I prefer is to "compile" objects into machine code which expects
>DI to point to the start of the object on-screen, AL to contain the bitplane
>mask in both nybbles [i.e. bitplane 1 would be 00100010], and DX to contain
>the pre-set-up pointer to the bitplane select register.

I have seen this method before. Ugh. This method is as bad as checking for
zeros. Changing the bitplanes takes up far too much time for this method to
be practical. And the space it takes up is a waste.
[code deleted]

Quote:
>The objects become rather large in memory, but on a 386 without a code cache,
>the above code runs efficiently, at an average rate for large shapes of well
>under 1 instruction per pixel.

Well, it's nice that the average rate is under 1 instruction per
pixel.  It's a shame, though, that there aren't many instructions that are
1 byte. So, how do you handle images that take up more than 64K of space?

Quote:
>-------------------------------------------------------------------------------

> John Payson         |   Un animal si beau qu'un chat."          |  ( o o )

TCA of NewOrder


Thu, 14 Aug 1997 08:00:34 GMT  
 breaking the 64K data segment


Quote:

>>The approach I prefer is to "compile" objects into machine code which expects
>>DI to point to the start of the object on-screen, AL to contain the bitplane
>>mask in both nybbles [i.e. bitplane 1 would be 00100010], and DX to contain
>>the pre-set-up pointer to the bitplane select register.

>I have seen this method before. Ugh. This method is as bad as checking for
>zeros. Changing the bitplanes takes up far too much time for this method to
>be practical. And the space it takes up is a waste.

It is only necessary to change bitplanes four times for each object plotted.
I can't imagine any way one could do better in X mode.  As for space, the
average is probably around 3 bytes/pixel.  A bit bloated, but not horribly.

Quote:
>>The objects become rather large in memory, but on a 386 without a code cache,
>>the above code runs efficiently, at an average rate for large shapes of well
>>under 1 instruction per pixel.

>Well, that's nice that it is an average rate of under 1 instruction per
>pixel.  It's a shame, though, that there aren't many instructions that are
>1 byte. So, how do you handle images that take up more than 64K of space?

First off, the average number of bytes storing the image probably works out
to 3 bytes/pixel.  Less if one optimizes the compiled code [note word values
that get used more than once, or bytes that get used more than twice, and
keep them in registers].  Again, it's a bit bloated, but only 50% worse than
using masks, and much faster than using explicit check-for-zero.  As for
really big objects (20,000 pixels is pretty big), those are generally best
handled as a collection of smaller objects.  In particular, use fast
memory-to-memory moves for the large solid sections of the objects, and "compiled objects"
to fill in the edges etc.
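A rough C sketch of such a compiler for one plane shows where the bytes/pixel figure comes from. It makes several assumptions. The input is one string per row, with a hex digit for each colour and a space for transparent. Rows are 80 bytes apart. Full displacements like [di+160] are used instead of the add di,160 trick. compile_plane and its helpers are hypothetical names. Each store's size is counted from the 16-bit encoding of mov [di+disp],imm: 2 bytes of opcode and ModR/M, 0, 1 or 2 displacement bytes, and a 1- or 2-byte immediate.

```c
#include <stdio.h>
#include <string.h>

/* Hex digit = colour 0..15; anything else (e.g. a space) = transparent. */
static int pixel_colour(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Encoded length of "mov [di+disp],imm" in 16-bit code. */
static int store_size(int disp, int is_word)
{
    int disp_bytes = disp == 0 ? 0 : (disp >= -128 && disp <= 127) ? 1 : 2;
    return 2 + disp_bytes + (is_word ? 2 : 1);
}

/* Emits stores for one plane (0..3). Horizontally adjacent opaque bytes of
   the plane are paired into one word store. Writes assembly text to out and
   returns the encoded size in bytes, or -1 if out is too small;
   *count receives the number of instructions emitted. */
int compile_plane(const char *const rows[], int nrows, int plane,
                  char *out, size_t cap, int *count)
{
    size_t used = 0;
    int size = 0;
    *count = 0;
    if (cap == 0) return -1;
    out[0] = '\0';
    for (int r = 0; r < nrows; r++) {
        int len = (int)strlen(rows[r]);
        for (int x = plane; x < len; x += 4) {
            int c = pixel_colour(rows[r][x]);
            if (c < 0) continue;                 /* transparent: no code */
            int disp = r * 80 + x / 4;
            char addr[16];
            if (disp) snprintf(addr, sizeof addr, "di+%d", disp);
            else strcpy(addr, "di");
            int next = x + 4 < len ? pixel_colour(rows[r][x + 4]) : -1;
            int n;
            if (next >= 0) {   /* colours <= 15, so the hex text starts with 0 */
                n = snprintf(out + used, cap - used, "mov [word %s],%04Xh\n",
                             addr, (unsigned)((next << 8) | c));
                size += store_size(disp, 1);
                x += 4;                          /* pixel x+4 already stored */
            } else {
                n = snprintf(out + used, cap - used, "mov [byte %s],%02Xh\n",
                             addr, (unsigned)c);
                size += store_size(disp, 0);
            }
            if (n < 0 || (size_t)n >= cap - used) return -1;
            used += (size_t)n;
            (*count)++;
        }
    }
    return size;
}
```

On the three-row example object, plane 0 comes out as six word stores totalling 31 bytes for 12 pixels (about 2.6 bytes per pixel). Plane 1, with more isolated pixels, comes to six stores totalling 27 bytes for 8 pixels.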

Also, it's a time-space tradeoff that comes out differently on different PCs.
For example, if SI points to some useful data, it's more compact to say:

movsw
inc di
inc di
movsw

[total: 8 bytes; 4 bytes code, 4 bytes of external data]

than to say:

mov [word di],1234h
mov [word di+4],5678h

[total: 9 bytes]

but the latter will be faster on, I think, anything 286 or later.
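Here is a small C model of the two sequences. It reads the skip between the movsw instructions as applying to DI, which is what makes the four bytes of source data contiguous and the byte counts add up. It only checks that both leave the same bytes in the destination; it says nothing about timing, and both function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* movsw / inc di / inc di / movsw: copies two words from DS:SI, skipping
   two destination bytes in between. Returns code bytes used (each of the
   four instructions is 1 byte); the 4 data bytes live elsewhere. */
int string_version(uint8_t *es, uint16_t di, const uint8_t *ds, uint16_t si)
{
    memcpy(es + di, ds + si, 2); si += 2; di += 2;  /* movsw */
    di += 2;                                        /* inc di ; inc di */
    memcpy(es + di, ds + si, 2);                    /* movsw */
    return 4;
}

/* mov [word di],1234h / mov [word di+4],5678h: little-endian immediate
   stores, encoded as 4 bytes (no displacement) + 5 bytes (disp8). */
int immediate_version(uint8_t *es, uint16_t di)
{
    es[di] = 0x34;     es[di + 1] = 0x12;
    es[di + 4] = 0x78; es[di + 5] = 0x56;
    return 9;
}
```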
--
-------------------------------------------------------------------------------

 John Payson         |   Un animal si beau qu'un chat."          |  ( o o )



Thu, 14 Aug 1997 10:19:20 GMT  
 breaking the 64K data segment

Quote:




>>>The approach I prefer is to "compile" objects into machine code which expects
>>>DI to point to the start of the object on-screen, AL to contain the bitplane
>>>mask in both nybbles [i.e. bitplane 1 would be 00100010], and DX to contain
>>>the pre-set-up pointer to the bitplane select register.

>>I have seen this method before. Ugh. This method is as bad as checking for
>>zeros. Changing the bitplanes takes up far too much time for this method to
>>be practical. And the space it takes up is a waste.

>It is only necessary to change bitplanes four times for each object plotted.
>I can't imagine any way one could do better in X mode.  As for space, the
>average is probably around 3 bytes/pixel.  A bit bloated, but not horribly.

OK, that's a fair bit better than I envisioned, although there are STILL
better ways of doing it, even in ModeX, at half the space. (In my original
post I said that my friend used masking, but that it was slower and took up
more memory than the method I use.)

Quote:

>First off, the average number of bytes storing the image probably works out
>to 3 bytes/pixel.  Less if one optimizes the compiled code [note word values

3 bytes/pixel? One byte per pixel should be enough! Think of all the development
time wasted on compiling these bitmaps and trying to figure out how to optimize
them.

Quote:
>that get used more than once, or bytes that get used more than twice, and
>keep them in registers].  Again, it's a bit bloated, but only 50% worse than
>using masks, and much faster than using explicit check-for-zero.  As for

Yes, I will agree that you are moving in the right direction, but there
is still (at least) one more method for doing this.

[rest deleted]

Quote:

> John Payson         |   Un animal si beau qu'un chat."          |  ( o o )

TCA of NewOrder
And thus spake the master programmer:
Though a program be but three lines long, someday it will have to be maintained.


Thu, 14 Aug 1997 20:01:17 GMT  
 breaking the 64K data segment


Quote:
>>First off, the average number of bytes storing the image probably works out
>>to 3 bytes/pixel.  Less if one optimizes the compiled code [note word values

>3 bytes/pixel? One byte per pixel should be enough! Think of all the development
>time wasted on compiling these bitmaps and trying to figure out how to optimize
>them.

The optimization may be done by the same program that compiles the bitmaps.
Given that most bitmaps used in games are run through several programs anyway
(e.g. to combine many bitmaps into one file, to remap colors, etc.) adding one
more program to the mix doesn't really cost much.

Quote:

>>that get used more than once, or bytes that get used more than twice, and
>>keep them in registers].  Again, it's a bit bloated, but only 50% worse than
>>using masks, and much faster than using explicit check-for-zero.  As for

>Yes, I will agree that you are moving in the right direction, but there
>is still (at least) one more method for doing this.

What method do you prefer, that manages an average of less than one
instruction per pixel plotted, and no instructions for pixels not plotted?
--
-------------------------------------------------------------------------------

 John Payson         |   Un animal si beau qu'un chat."          |  ( o o )


Fri, 15 Aug 1997 02:30:32 GMT  
 breaking the 64K data segment

Quote:


>>Yes, I will agree that you are moving in the right direction, but there
>>is still (at least) one more method for doing this.

>What method do you prefer, that manages an average of less than one
>instruction per pixel plotted, and no instructions for pixels not plotted?

Well, I'm very tempted to tell, but because I have never seen anyone use
this method I'd rather not (just yet).  What I will say is that I convert
my bitmaps from bitmaps to something else that knows where the zeroes are.
The size of the bitmaps usually decreases, except in the case of a checkerboard.
Then they get 5 times larger.  There is no compression involved. There is
no checking for zeroes (except beforehand).

Quote:

> John Payson         |   Un animal si beau qu'un chat."          |  ( o o )

TCA of NewOrder


Fri, 15 Aug 1997 04:18:53 GMT  
 breaking the 64K data segment


Quote:
>Well, I'm very tempted to tell, but because I have never seen anyone use
>this method I'd rather not (just yet).  What I will say is that I convert
>my bitmaps from bitmaps to something else that knows where the zeroes are.
>The size of the bitmaps usually decrease, except in the case of a checkerboard.
>Then they get 5 times larger.  There is no compression involved. There is
>no checking for zeroes (except beforehand).

Ah, so you use the good ol' trick:

draw:
        xor     cx,cx
        add     di,[si]
        mov     cl,[si]
        add     si,3
dl_start:
        rep movsb
        add     di,[si]
        add     si,3
        add     cl,[si-1]
        jnz     dl_start

[or some variation which allows for movsw]

Not a bad trick, depending upon how solid the object in question is.  I was
under the impression that a mov of an immediate to memory was as fast as a
rep mov on machines 386+ (though not on 386sx), but perhaps I'm mistaken.
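Here is the same skip/count idea as a C sketch. It assumes the record layout the assembly implies: a 16-bit skip, an 8-bit count read from the third byte, then the opaque pixels, with a zero count ending the list. encode_row and draw_runs are hypothetical names and handle a single row shorter than 64K. Transparent pixels (colour 0) are found once at conversion time and never examined while drawing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Converts one row into records: skip (16-bit little-endian), count
   (8-bit, at most 255), then count pixels; a 0/0 record terminates.
   Returns the encoded length. */
size_t encode_row(const uint8_t *row, size_t n, uint8_t *out)
{
    size_t i = 0, len = 0;
    unsigned skip = 0;
    while (i < n) {
        if (row[i] == 0) { skip++; i++; continue; }  /* transparent */
        size_t start = i;
        while (i < n && row[i] != 0 && i - start < 255) i++;
        size_t count = i - start;
        out[len++] = (uint8_t)(skip & 0xFF);
        out[len++] = (uint8_t)(skip >> 8);
        out[len++] = (uint8_t)count;
        memcpy(out + len, row + start, count);
        len += count;
        skip = 0;
    }
    out[len++] = 0; out[len++] = 0; out[len++] = 0;  /* terminator */
    return len;
}

/* Walks the list: add the skip to the destination, stop on a zero
   count, otherwise copy count bytes (the "rep movsb"). */
void draw_runs(uint8_t *dst, const uint8_t *list)
{
    size_t di = 0;
    for (;;) {
        unsigned skip = (unsigned)list[0] | ((unsigned)list[1] << 8);
        unsigned count = list[2];
        list += 3;
        di += skip;
        if (count == 0) break;
        memcpy(dst + di, list, count);
        di += count;
        list += count;
    }
}
```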
--
-------------------------------------------------------------------------------

 John Payson         |   Un animal si beau qu'un chat."          |  ( o o )



Fri, 15 Aug 1997 05:40:37 GMT  
 breaking the 64K data segment
Quote:


>Ah, so you use the good ol' trick:

>draw:
>    xor     cx,cx
>    add     di,[si]
>    mov     cl,[si]

Should this be mov cl,[si+2]?

Quote:
>    add     si,3
>dl_start:
>    rep movsb
>    add     di,[si]
>    add     si,3
>    add     cl,[si-1]
>    jnz     dl_start

You're getting closer. Since this is similar enough, I'll post mine.

        lds     si,BlitSprites[ebx*4]
        lodsw  

        mov     cx,ds:[si+2]
        add     si,4
        shr     cx,1
        pushf
        shr     cx,1
        rep     movsd





That moves doublewords (if possible) at a time, without zeroes, without
comparing, without compiled bitmaps. Lemme see, 17 instructions, 160K of
sprite data....that's about 9411 pixels per instruction. Much better than 3
bytes per pixel. I've been using this method for about two years, and have
never seen anyone else use it (although I suspect that Accolade uses it in
Starcon).  Even Tran compares for zeroes in Timeless.  I'm using this stuff
in a parallax scrolling engine that gets pretty good frame rates (past 70fps
on my 486-33LB).  This really helps drawing the foreground, since I
am only drawing what I need to, not comparing 64000 pixels for zero.  Getting
blits to clip, however, is a nightmare.
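Only the start of the routine above is shown. The following C sketch covers just the run copy that the two SHR instructions set up, assuming the missing tail stores a leftover word and then a leftover byte. copy_run is a hypothetical name, not the posted code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies one opaque run of n pixels as n/4 doublewords, then one word if
   bit 1 of n is set, then one byte if bit 0 is set. This mirrors the two
   "shr cx,1" steps: the first shifts out bit 0 (saved with pushf for a
   trailing byte move), the second shifts out bit 1 (a word move), and
   what is left in CX is the movsd count. Returns the number of moves. */
int copy_run(uint8_t *dst, const uint8_t *src, size_t n)
{
    int moves = 0;
    size_t dwords = n >> 2;
    for (size_t i = 0; i < dwords; i++, moves++) {  /* rep movsd */
        memcpy(dst, src, 4);
        dst += 4;
        src += 4;
    }
    if (n & 2) {                                     /* movsw */
        memcpy(dst, src, 2);
        dst += 2;
        src += 2;
        moves++;
    }
    if (n & 1) {                                     /* movsb */
        *dst = *src;
        moves++;
    }
    return moves;
}
```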

--TCA of NewOrder



Fri, 15 Aug 1997 06:54:58 GMT  
 breaking the 64K data segment


[code deleted]

Quote:
>That moves doublewords (if possible) at a time, without zeroes, without
>comparing, without compiled bitmaps. Lemme see, 17 instructions, 160K of
>sprite data....that's about 9411 pixels per instruction. Much better than 3
>bytes per pixel. I've been using this method for about two years, and have
>never seen anyone else use it (Although I suspect that Accolade uses it in
>Starcon).  Even Tran compares for zeroes in Timeless.  I'm using this stuff
>in a parallax scrolling engine that get pretty good (past 70fps on my
>486-33LB) frame rates.  This really helps drawing the foreground, since I
>am only drawing what I need to, not comparing 64000 pixels for zero.  Getting
>Blits to clip, however, is a nightmare.

First off, I'd suggest you not use pushf/popf, as that can cause excessive
overhead when in v86 mode (stupid design decision on Intel's part, but such
is life...)

Secondly, your code can probably be improved (at the expense of redoing the
preprocessing of your objects) as follows:

        ; Assumes: DS:SI points to object list
        ; Assumes: ES:DI points to start of destination on-screen

        ; First, process all runs which begin on even pixels and end on odd
        ; pixels
        mov  cx,[si]
        jcxz evenodd_done
evenodd_loop:
        add  di,[si+2]
        add  si,4
        rep  movsw
        add  cx,[si]
        jnz  evenodd_loop
evenodd_done:
        ; Next, process all runs beginning and ending on even pixels
        add  cx,[si+2]
        js   eveneven_done
eveneven_loop:
        add  di,[si+4]
        add  si,6
        dec  cx
        rep  movsw
        mov  al,[si]
        mov  [es:di],al
        add  cx,[si+2]
        jns  eveneven_loop
eveneven_done:
        ; Next, process all runs starting on odd, ending on even

etc.

  This routine assumes the display list is organized as four lists of items.
The first list is for objects which consist only of whole words; each object
contains the number of words of data and the amount to offset DI before
plotting these words, followed by an appropriate list of data.  The list
ends when an object's count is 0.  The next list is for objects which
end with an "extra" pixel; this list ends when the count value is 8000h.

  Note that if di is initially even and the display list "add" values
for di are always even, words will always be written to the screen at even
(aligned) addresses.  On many display adapters, this can pay off big time.  In the event that the
sprite has to be able to move horizontally, it may be worthwhile to have
two copies of the sprite, offset by a pixel [trick going back to Apple II
games].
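Here is a C sketch of the bookkeeping behind those four lists, assuming a run is given by its first pixel offset and its length. classify_run picks a run's list from the parity of its first and last pixels. plot_run_aligned shows why the split matters: after an odd leading pixel, every 16-bit store lands on an even offset. Both names are hypothetical, and "aligned" here means even offsets within the buffer passed in.

```c
#include <stdint.h>
#include <string.h>

typedef enum { EVEN_ODD, EVEN_EVEN, ODD_EVEN, ODD_ODD } RunClass;

/* List for a run of len >= 1 pixels starting at offset x, named by the
   parity of its first and last pixels. EVEN_ODD runs are whole words;
   EVEN_EVEN runs end with the "extra" pixel. */
RunClass classify_run(unsigned x, unsigned len)
{
    unsigned last = x + len - 1;
    if (x % 2 == 0)
        return last % 2 ? EVEN_ODD : EVEN_EVEN;
    return last % 2 ? ODD_ODD : ODD_EVEN;
}

/* Writes a run so every word store lands on an even offset: an odd start
   gets a leading byte, a leftover last pixel a trailing byte.
   Returns the number of word stores. */
unsigned plot_run_aligned(uint8_t *screen, unsigned x, const uint8_t *src, unsigned len)
{
    unsigned words = 0;
    if (x % 2) { screen[x++] = *src++; len--; }            /* leading pixel */
    for (; len >= 2; len -= 2, x += 2, src += 2, words++)
        memcpy(screen + x, src, 2);                        /* aligned word */
    if (len) screen[x] = *src;                             /* trailing pixel */
    return words;
}
```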

--
-------------------------------------------------------------------------------

 John Payson         |   Un animal si beau qu'un chat."          |  ( o o )



Fri, 15 Aug 1997 08:27:19 GMT  
 breaking the 64K data segment


Quote:


>         movsb  


> That moves doublewords (if possible) at a time, without zeroes, without
> comparing, without compiled bitmaps. Lemme see, 17 instructions, 160K of

             ^^^^^^^^^^^^^^^^^^^^^^^^^

Huh? What raw bitmaps have line mods in them?
And besides that, this is just a slightly modified version of the ByteRun
encoding in IFF files, which E.A. started in ye olde Amiga days.....
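For comparison, here is a C sketch of a ByteRun1 decoder, the run-length scheme used in IFF ILBM bodies. A control byte n of 0..127 means copy the next n+1 bytes literally, -1..-127 means repeat the next byte 1-n times, and -128 is a no-op. The function name is hypothetical. The main difference from the skip/copy lists above is that a ByteRun1 repeat run still writes every output byte, whereas a skip leaves the destination untouched.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decodes ByteRun1 data; stops early rather than overrun either buffer.
   Returns the number of bytes written to out. */
size_t byterun1_decode(const uint8_t *in, size_t in_len, uint8_t *out, size_t out_cap)
{
    size_t i = 0, o = 0;
    while (i < in_len) {
        int n = in[i++];
        if (n > 127) n -= 256;                   /* control byte is signed */
        if (n >= 0) {                            /* literal: n+1 bytes */
            size_t count = (size_t)n + 1;
            if (i + count > in_len || o + count > out_cap) break;
            memcpy(out + o, in + i, count);
            i += count;
            o += count;
        } else if (n != -128) {                  /* repeat next byte 1-n times */
            size_t count = (size_t)(1 - n);
            if (i >= in_len || o + count > out_cap) break;
            memset(out + o, in[i++], count);
            o += count;
        }                                        /* -128: no-op */
    }
    return o;
}
```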

--
Robert J. Hill            The opinions in this letter are not necessarily
                          the same as my employer's, and are entirely my
                          OWN. Blame Me. Everyone else does.

          '''             PGP2.6 Public Encryption Key Available By Request
         (o o)        



Sat, 16 Aug 1997 20:14:51 GMT  
 