breaking the 64K data segment
Jason Hoern #1 / 13
|
 breaking the 64K data segment
: I am working on some graphics routines for regular VGA. In order to speed
: up the graphics routines, I have decided to buffer the graphics. This
: improves scrolling routines because copying the buffer into video memory
: requires only one access into video memory.

Buffering the graphics is not really done to speed up the graphics, it is done to reduce the amount of flicker (either by doing a block copy of the buffer, or by flipping between multiple pages).

: The problem I have come to is that the resolution I am using, 320x200 256
: colors, requires a 64K data segment by itself.

Mode 13h is notoriously bad for scrollers, because it does not support multiple pages and you can't have virtual screen sizes larger than the visible screen (both really handy for scrollers).

: I am however afraid that I am losing too many clock cycles in the
: plotting routines because the data to be plotted is on one data segment
: while the buffer is in the other.

[ couple lines of deletion ]

It shouldn't matter which segment your buffer is in (rep movsw should be the same speed regardless of where the source segment is). Have you timed your code to find out what is really the slow part?

: And thus the question, can I create segments larger than 64K? If so, how?

On a normal 8086 you can't create segments larger than 64K.

Get thee X mode! X mode allows you to access 4 times as much video memory, meaning that in 320x200 mode, you can have 4 different video pages (no memory is used outside the 64K video segment). The basic principle is that you can draw to the pages that are not on the screen and then simply poke a couple values into VGA registers to flip the page. Also, you can do other neat tricks (split screen, smooth scrolling). For a good intro to X mode, check out Xlib, a free X mode graphics library deal. Xlib is probably available at x2ftp.oulu.fi. Alternatively, you could get a book that discusses it (Abrash's new book "Zen of Graphics Programming" for example-- totally awesome book!).

Jason P. Hoerner
(Grey Cat/ CORE)
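
A minimal sketch of the page-flip arithmetic described above, written in C rather than assembly so it can be checked away from a VGA machine. It assumes 320x200 X mode with the four pages packed back to back (16000 addresses each) and only computes the two words one would OUT to the CRTC index port (3D4h) for the Start Address High/Low registers (0Ch/0Dh). The names `PageFlip` and `page_flip_words` are made up for illustration; the port writes themselves and any wait for vertical retrace are left out.

```c
#include <assert.h>
#include <stdint.h>

enum {
    CRTC_INDEX_PORT = 0x3D4,  /* VGA CRT controller index port (color modes) */
    CRTC_START_HIGH = 0x0C,   /* Start Address High register index */
    CRTC_START_LOW  = 0x0D,   /* Start Address Low register index */
    BYTES_PER_ROW   = 320 / 4,             /* four pixels share one address */
    PAGE_SIZE       = BYTES_PER_ROW * 200  /* 16000 addresses per page */
};

/* Each word is laid out for a 16-bit OUT to CRTC_INDEX_PORT:
   low byte = register index, high byte = register data. */
typedef struct {
    uint16_t high_word;
    uint16_t low_word;
} PageFlip;

PageFlip page_flip_words(unsigned page)
{
    PageFlip f;
    uint16_t start;

    assert(page < 4);  /* only four 16000-address pages fit in 64K */
    start = (uint16_t)(page * PAGE_SIZE);
    f.high_word = (uint16_t)((start & 0xFF00u) | CRTC_START_HIGH);
    f.low_word  = (uint16_t)(((start & 0x00FFu) << 8) | CRTC_START_LOW);
    return f;
}
```

Drawing then goes to whichever page is not currently displayed, and the flip is just those two register writes.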
|
Tue, 12 Aug 1997 04:25:03 GMT |
|
 |
NewOrder Demo Gro #2 / 13
|
 breaking the 64K data segment
Quote:
>: I am working on some graphics routines for regular VGA. In order to speed
>: up the graphics routines, I have decided to buffer the graphics. This
>: improves scrolling routines because copying the buffer into video memory
>: requires only one access into video memory.
>Buffering the graphics is not really done to speed up the graphics, it is
>done to reduce the amount of flicker (either by doing a block copy of the
>buffer, or by flipping between multiple pages).

Actually, it is done for both. Depending on how the scrolling is done it is sometimes faster to use double buffering.

Quote:
>: The problem I have come to is that the resolution I am using, 320x200 256
>: colors, requires a 64K data segment by itself.
>Mode 13h is notoriously bad for scrollers, because it does not support
>multiple pages and you can't have virtual screen sizes larger than the
>visible screen (both really handy for scrollers).

Actually, it is possible (and pretty easy) to get 2 pages out of mode 13h. The problem with this is that it is video memory, and so you are losing some speed for memory. Personally I'd rather have the speed at the expense of the memory.

Quote:
>: I am however afraid that I am losing too many clock cycles in the
>: plotting routines because the data to be plotted is on one data segment
>: while the buffer is in the other.
>Get thee X mode! X mode allows you to access 4 times as much video memory,
>meaning that in 320x200 mode, you can have 4 different video pages (no
>memory is used outside the 64K video segment). The basic principle is
>that you can draw to the pages that are not on the screen and then simply
>poke a couple values into VGA registers to flip the page. Also, you can
>do other neat tricks (split screen, smooth scrolling).

On the benchmarks I have done, it is actually faster to draw a screen from memory to mode 13h than it is to draw a screen to modex. Modex is very nice if the pixels you want to draw are all of the same color (i.e. a fast fill-poly routine).

Quote:
>Alternatively, you
>could get a book that discusses it (Abrash's new book "Zen of Graphics
>Programming" for example-- totally awesome book!).

Yes, Zen of Graphics Programming is an awesome book, but still lacks in some areas. One of these is drawing sprites without using zeroes. M. Abrash still uses masking to get sprites to appear transparent. He also does it a byte at a time, which could be improved a lot if he didn't use modex (by masking doublewords, words, and then bytes). I have seen some very impressive parallax scrolling done with masking (and in under 100 lines of assembly), but it uses twice as much memory and is slower than the method I use (you have the image and its mask in memory, plus you are still doing a lot of unnecessary movs).

Quote:
>Jason P. Hoerner
>(Grey Cat/ CORE)
TCA of NewOrder
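
For reference, the doubleword masking mentioned above can be sketched as follows. This is a generic AND/OR masked blit in C, not Abrash's code or anyone's actual routine from this thread. It assumes a linear (mode 13h style) buffer, a mask that holds FFh for transparent pixels and 00h for opaque ones, and an image that holds 0 wherever the mask is FFh. `masked_blit32` is a made-up name.

```c
#include <stddef.h>
#include <stdint.h>

/* Combines four pixels per step: keep the background where the mask is FFh,
   then OR in the sprite, which is assumed to be 0 in those same positions. */
void masked_blit32(uint32_t *dst, const uint32_t *image,
                   const uint32_t *mask, size_t dwords)
{
    size_t i;

    for (i = 0; i < dwords; i++)
        dst[i] = (dst[i] & mask[i]) | image[i];
}
```

The memory cost is visible in the signature: one byte of mask is stored for every byte of image.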
|
Tue, 12 Aug 1997 18:35:17 GMT |
|
 |
John Pays #3 / 13
|
 breaking the 64K data segment
Quote:
>>: I am working on some graphics routines for regular VGA. In order to speed
>>: up the graphics routines, I have decided to buffer the graphics. This
>>: improves scrolling routines because copying the buffer into video memory
>>: requires only one access into video memory.
>>Buffering the graphics is not really done to speed up the graphics, it is
>>done to reduce the amount of flicker (either by doing a block copy of the
>>buffer, or by flipping between multiple pages).
>Actually, it is done for both. Depending on how the scrolling is done
>it is sometimes faster to use double buffering.
>>: The problem I have come to is that the resolution I am using, 320x200 256
>>: colors, requires a 64K data segment by itself.
>>Mode 13h is notoriously bad for scrollers, because it does not support
>>multiple pages and you can't have virtual screen sizes larger than the
>>visible screen (both really handy for scrollers).
>Actually, it is possible (and pretty easy) to get 2 pages out of mode 13h.
>The problem with this is that it is video memory, and so you are losing
>some speed for memory. Personally I'd rather have the speed at the expense
>of the memory.
>>: I am however afraid that I am losing too many clock cycles in the
>>: plotting routines because the data to be plotted is on one data segment
>>: while the buffer is in the other.
>>Get thee X mode! X mode allows you to access 4 times as much video memory,
>>meaning that in 320x200 mode, you can have 4 different video pages (no
>>memory is used outside the 64K video segment). The basic principle is
>>that you can draw to the pages that are not on the screen and then simply
>>poke a couple values into VGA registers to flip the page. Also, you can
>>do other neat tricks (split screen, smooth scrolling).
>On the benchmarks I have done, it is actually faster to draw a screen from
>memory to mode 13h than it is to draw a screen to modex. Modex is very
>nice if the pixels you want to draw are all of the same color (i.e. a fast
>fill-poly routine).
>>Alternatively, you
>>could get a book that discusses it (Abrash's new book "Zen of Graphics
>>Programming" for example-- totally awesome book!).
>Yes, Zen of Graphics Programming is an awesome book, but still lacks in some
>areas. One of these is drawing sprites without using zeroes. M. Abrash still
>uses masking to get sprites to appear transparent. He also does it a byte
>at a time, which could be improved a lot if he didn't use modex (by masking
>doublewords, words, and then bytes). I have seen some very impressive
>parallax scrolling done with masking (and in under 100 lines of assembly),
>but it uses twice as much memory and is slower than the method I use (you
>have the image and its mask in memory, plus you are still doing a lot
>of unnecessary movs).
The approach I prefer is to "compile" objects into machine code which expects DI to point to the start of the object on-screen, AL to contain the bitplane mask in both nybbles [i.e. bitplane 1 would be 00100010], and DX to contain the pre-set-up pointer to the bitplane select register. Then the code for an object like this:

    |   |   |   |
    ABBBA   ABBBA
    ABCCBA ABCCBA
    DDDDDDDDDDDDD

[the |'s being there for visual aid only] would look like this:

    out dx,al                  ; select plane 0 (pixels x = 0, 4, 8, 12)
    mov [word di   ],0A0Ah     ; row 0
    mov [word di+ 2],0A0Ah
    mov [word di+80],0B0Ah     ; row 1 (80 bytes per row)
    mov [word di+82],0A0Bh
    add di,160
    mov [word di   ],0D0Dh     ; row 2
    mov [word di+ 2],0D0Dh
    rol al,1                   ; next plane; carry set on wrap from plane 3 to 0
    adc di,65536-160           ; back to row 0, plus one byte on wrap
    out dx,al                  ; select plane 1 (pixels x = 1, 5, 9)
    mov [byte di   ],0Bh
    mov [byte di+ 2],0Bh
    mov [word di+80],0A0Bh
    mov [byte di+82],0Ch
    add di,160
    mov [word di   ],0D0Dh
    mov [byte di+ 2],0Dh
    rol al,1
    adc di,65536-160
    etc.

The objects become rather large in memory, but on a 386 without a code cache, the above code runs efficiently, at an average rate for large shapes of well under 1 instruction per pixel.

--
-------------------------------------------------------------------------------
John Payson | Un animal si beau qu'un chat." | ( o o )
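
The reason for keeping the plane bit in both nybbles may be easier to see in a small model. The C sketch below is only an emulation of the register arithmetic, under the assumption that the mask is rotated left by one and the carry is folded into DI with ADC as in the listing above. `plane_mask`, `rol8` and `next_plane` are illustrative names, not part of any library.

```c
#include <stdint.h>

/* Map Mask value with the plane bit mirrored in both nybbles,
   e.g. plane 1 -> 00100010b. Only the low nybble matters to the VGA. */
uint8_t plane_mask(unsigned plane)
{
    uint8_t bit = (uint8_t)(1u << (plane & 3u));
    return (uint8_t)(bit | (bit << 4));
}

/* Emulates an 8-bit rotate left by one; returns the carry (old bit 7). */
unsigned rol8(uint8_t *al)
{
    unsigned carry = (unsigned)(*al >> 7) & 1u;
    *al = (uint8_t)((*al << 1) | carry);
    return carry;
}

/* Emulates the rotate followed by ADC DI,65536-rewind: step to the next
   plane, move DI back up by `rewind` bytes, and advance one byte when the
   mask wraps from plane 3 back to plane 0. */
uint16_t next_plane(uint8_t *al, uint16_t di, uint16_t rewind)
{
    unsigned carry = rol8(al);
    return (uint16_t)(di - rewind + carry);
}
```

Because the high nybble mirrors the low one, the 8-bit rotate behaves like a 4-bit rotate, and the carry comes out set exactly when the next pixel column lives at the next byte address.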
|
Thu, 14 Aug 1997 02:12:16 GMT |
|
 |
NewOrder Demo Gro #4 / 13
|
 breaking the 64K data segment
Quote:
>>Yes, Zen of Graphics Programming is an awesome book, but still lacks in some
>>areas. One of these is drawing sprites without using zeroes. M. Abrash still
>>uses masking to get sprites to appear transparent. He also does it a byte
>>at a time, which could be improved a lot if he didn't use modex (by masking
>>doublewords, words, and then bytes). I have seen some very impressive
>>parallax scrolling done with masking (and in under 100 lines of assembly),
>>but it uses twice as much memory and is slower than the method I use (you
>>have the image and its mask in memory, plus you are still doing a lot
>>of unnecessary movs).
>The approach I prefer is to "compile" objects into machine code which expects
>DI to point to the start of the object on-screen, AL to contain the bitplane
>mask in both nybbles [i.e. bitplane 1 would be 00100010], and DX to contain
>the pre-set-up pointer to the bitplane select register.

I have seen this method before. Ugh. This method is as bad as checking for zeros. Changing the bitplanes takes up far too much time for this method to be practical. And the space it takes up is a waste.

[code deleted]

Quote:
>The objects become rather large in memory, but on a 386 without a code cache,
>the above code runs efficiently, at an average rate for large shapes of well
>under 1 instruction per pixel.

Well, that's nice that it is an average rate of under 1 instruction per pixel. It's a shame, though, that there aren't many instructions that are 1 byte. So, how do you handle images that take up more than 64K of space?

Quote:
>-------------------------------------------------------------------------------
> John Payson | Un animal si beau qu'un chat." | ( o o )
TCA of NewOrder
|
Thu, 14 Aug 1997 08:00:34 GMT |
|
 |
John Pays #5 / 13
|
 breaking the 64K data segment
Quote:
>>The approach I prefer is to "compile" objects into machine code which expects
>>DI to point to the start of the object on-screen, AL to contain the bitplane
>>mask in both nybbles [i.e. bitplane 1 would be 00100010], and DX to contain
>>the pre-set-up pointer to the bitplane select register.
>I have seen this method before. Ugh. This method is as bad as checking for
>zeros. Changing the bitplanes takes up far too much time for this method to
>be practical. And the space it takes up is a waste.

It is only necessary to change bitplanes four times for each object plotted. I can't imagine any way one could do better in X mode. As for space, the average is probably around 3 bytes/pixel. A bit bloated, but not horribly.

Quote:
>>The objects become rather large in memory, but on a 386 without a code cache,
>>the above code runs efficiently, at an average rate for large shapes of well
>>under 1 instruction per pixel.
>Well, that's nice that it is an average rate of under 1 instruction per
>pixel. It's a shame, though, that there aren't many instructions that are
>1 byte. So, how do you handle images that take up more than 64K of space?

First off, the average number of bytes storing the image probably works out to 3 bytes/pixel. Less if one optimizes the compiled code [note word values that get used more than once, or bytes that get used more than twice, and keep them in registers]. Again, it's a bit bloated, but only 50% worse than using masks, and much faster than using explicit check-for-zero. As for really big objects (20,000 pixels is pretty big) those are generally best handled as a collection of smaller objects. In particular, use fast mem to mem moves for the large solid sections of the objects, and "compiled objects" to fill in the edges etc.

Also, it's a time-space tradeoff that comes out differently on different PCs. For example, if SI is set up to some useful data, it's more compact to say:

    movsw
    inc di              ; skip two destination bytes
    inc di
    movsw

[total: 8 bytes; 4 bytes code, 4 bytes of external data] than to say:

    mov [di],1234h
    mov [di+4],5678h

[total: 9 bytes] but the latter will be faster on, I think, anything 286 or later.

--
-------------------------------------------------------------------------------
John Payson | Un animal si beau qu'un chat." | ( o o )
|
Thu, 14 Aug 1997 10:19:20 GMT |
|
 |
NewOrder Demo Gro #6 / 13
|
 breaking the 64K data segment
Quote:
>>>The approach I prefer is to "compile" objects into machine code which expects
>>>DI to point to the start of the object on-screen, AL to contain the bitplane
>>>mask in both nybbles [i.e. bitplane 1 would be 00100010], and DX to contain
>>>the pre-set-up pointer to the bitplane select register.
>>I have seen this method before. Ugh. This method is as bad as checking for
>>zeros. Changing the bitplanes takes up far too much time for this method to
>>be practical. And the space it takes up is a waste.
>It is only necessary to change bitplanes four times for each object plotted.
>I can't imagine any way one could do better in X mode. As for space, the
>average is probably around 3 bytes/pixel. A bit bloated, but not horribly.

Ok, a fair bit better than what I envisioned. Although there are STILL better ways of doing it, even in ModeX, at half the space. (In my original post I said that my friend used masking, but it was slower and took up more memory than the method I use).

Quote:
>First off, the average number of bytes storing the image probably works out
>to 3 bytes/pixel. Less if one optimizes the compiled code [note word values

3 bytes/pixel? One byte per pixel should be enough! Think of all the development time wasted on compiling these bitmaps and trying to figure out how to optimize them.

Quote:
>that get used more than once, or bytes that get used more than twice, and
>keep them in registers]. Again, it's a bit bloated, but only 50% worse than
>using masks, and much faster than using explicit check-for-zero. As for

Yes, I will agree that you are moving in the right direction, but there is still (at least) one more method for doing this.

[rest deleted]

Quote:
> John Payson | Un animal si beau qu'un chat." | ( o o )
TCA of NewOrder And thus spake the master programmer: Though a program be but three lines long, someday it will have to be maintained.
|
Thu, 14 Aug 1997 20:01:17 GMT |
|
 |
John Pays #7 / 13
|
 breaking the 64K data segment
Quote:
>>First off, the average number of bytes storing the image probably works out
>>to 3 bytes/pixel. Less if one optimizes the compiled code [note word values
>3 bytes/pixel? One byte per pixel should be enough! Think of all the development
>time wasted on compiling these bitmaps and trying to figure out how to optimize
>them.

The optimization may be done by the same program that compiles the bitmaps. Given that most bitmaps used in games are run through several programs anyway (e.g. to combine many bitmaps into one file, to remap colors, etc.) adding one more program to the mix doesn't really cost much.

Quote:
>>that get used more than once, or bytes that get used more than twice, and
>>keep them in registers]. Again, it's a bit bloated, but only 50% worse than
>>using masks, and much faster than using explicit check-for-zero. As for
>Yes, I will agree that you are moving in the right direction, but there
>is still (at least) one more method for doing this.

What method do you prefer, that manages an average of less than one instruction per pixel plotted, and no instructions for pixels not plotted?

--
-------------------------------------------------------------------------------
John Payson | Un animal si beau qu'un chat." | ( o o )
|
Fri, 15 Aug 1997 02:30:32 GMT |
|
 |
NewOrder Demo Gro #8 / 13
|
 breaking the 64K data segment
Quote:
>>Yes, I will agree that you are moving in the right direction, but there
>>is still (at least) one more method for doing this.
>What method do you prefer, that manages an average of less than one
>instruction per pixel plotted, and no instructions for pixels not plotted?

Well, I'm very tempted to tell, but because I have never seen anyone use this method I'd rather not (just yet). What I will say is that I convert my bitmaps from bitmaps to something else that knows where the zeroes are. The size of the bitmaps usually decreases, except in the case of a checkerboard. Then they get 5 times larger. There is no compression involved. There is no checking for zeroes (except beforehand).

Quote:
> John Payson | Un animal si beau qu'un chat." | ( o o )
TCA of NewOrder
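
The format itself is not given at this point, so the following is only a guess at what such a preprocessing step might look like, written in C. It assumes a flat pixel buffer of fewer than 65536 bytes in which 0 means transparent, and a hypothetical record layout of a 2-byte skip, a 2-byte run length, then the run's pixels, ending with an all-zero header. `encode_runs` and `put_word` are made-up names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stores a 16-bit little-endian value and returns the new write offset. */
static size_t put_word(uint8_t *out, size_t o, size_t value)
{
    out[o]     = (uint8_t)(value & 0xFF);
    out[o + 1] = (uint8_t)((value >> 8) & 0xFF);
    return o + 2;
}

/* Finds every run of nonzero pixels once, ahead of time, and writes
   (skip, count, pixels...) records; returns the encoded size in bytes.
   `out` must hold the worst case of 5 bytes per opaque pixel plus 4. */
size_t encode_runs(const uint8_t *pixels, size_t n, uint8_t *out)
{
    size_t i = 0, o = 0, cursor = 0;

    while (i < n) {
        size_t start, count;

        while (i < n && pixels[i] == 0)   /* skip transparent pixels */
            i++;
        if (i == n)
            break;
        start = i;
        while (i < n && pixels[i] != 0)   /* measure the opaque run */
            i++;
        count = i - start;
        o = put_word(out, o, start - cursor);  /* gap since previous run */
        o = put_word(out, o, count);
        memcpy(out + o, pixels + start, count);
        o += count;
        cursor = i;
    }
    o = put_word(out, o, 0);  /* terminator: zero skip ... */
    o = put_word(out, o, 0);  /* ... and zero count */
    return o;
}
```

With a 4-byte header per run, a one-pixel run costs 5 bytes, which may be where the checkerboard figure comes from, though that is speculation.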
|
Fri, 15 Aug 1997 04:18:53 GMT |
|
 |
John Pays #9 / 13
|
 breaking the 64K data segment
Quote:
>Well, I'm very tempted to tell, but because I have never seen anyone use
>this method I'd rather not (just yet). What I will say is that I convert
>my bitmaps from bitmaps to something else that knows where the zeroes are.
>The size of the bitmaps usually decrease, except in the case of a checkboard.
>Then they get 5 times larger. There is no compression involved. There is
>no checking for zeroes (except beforehand).

Ah, so you use the good ol' trick:

draw:
    xor cx,cx
    add di,[si]
    mov cl,[si]
    add si,3
dl_start:
    rep movsb
    add di,[si]
    add si,3
    add cl,[si-1]
    jnz dl_start

[or some variation which allows for movsw]

Not a bad trick, depending upon how solid the object in question is. I was under the impression that mov immed. to memory was as fast as a rep mov on machines 386+ (though not on 386sx), but perhaps I'm mistaken.

--
-------------------------------------------------------------------------------
John Payson | Un animal si beau qu'un chat." | ( o o )
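
A rough C rendering of the same loop, for readers who do not follow the assembly. It assumes each record is a 2-byte little-endian skip, a 1-byte run length taken from the third byte of the record, and then that many pixel bytes, with a zero run length ending the sprite. Unlike the listing, it checks for the terminator before the first run as well. `draw_runs` is a made-up name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Walks (skip, count, pixels...) records, copying each run to the
   destination with no per-pixel test. Returns the number of pixels drawn. */
size_t draw_runs(const uint8_t *src, uint8_t *dst)
{
    size_t plotted = 0;

    for (;;) {
        size_t skip  = (size_t)src[0] | ((size_t)src[1] << 8);
        size_t count = src[2];

        src += 3;
        if (count == 0)           /* zero-length run ends the sprite */
            break;
        dst += skip;              /* jump over transparent pixels */
        memcpy(dst, src, count);  /* stands in for rep movsb */
        src += count;
        dst += count;
        plotted += count;
    }
    return plotted;
}
```

All the work of finding transparent pixels has been done ahead of time, so the draw loop costs a few instructions per run rather than per pixel.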
|
Fri, 15 Aug 1997 05:40:37 GMT |
|
 |
NewOrder Demo Gro #10 / 13
|
 breaking the 64K data segment
Quote:
>Ah, so you use the good ol' trick:
>draw:
> xor cx,cx
> add di,[si]
> mov cl,[si]

Should this be mov cl,[si+2]?

Quote:
> add si,3
>dl_start:
> rep movsb
> add di,[si]
> add si,3
> add cl,[si-1]
> jnz dl_start

You're getting closer. Since this is similar enough, I'll post mine.

    lds si,BlitSprites[ebx*4]
    lodsw

    mov cx,ds:[si+2]
    add si,4
    shr cx,1
    pushf
    shr cx,1
    rep movsd

That moves doublewords (if possible) at a time, without zeroes, without comparing, without compiled bitmaps. Lemme see, 17 instructions, 160K of sprite data....that's about 9411 pixels per instruction. Much better than 3 bytes per pixel. I've been using this method for about two years, and have never seen anyone else use it (although I suspect that Accolade uses it in Starcon). Even Tran compares for zeroes in Timeless. I'm using this stuff in a parallax scrolling engine that gets pretty good (past 70fps on my 486-33LB) frame rates. This really helps drawing the foreground, since I am only drawing what I need to, not comparing 64000 pixels for zero. Getting Blits to clip, however, is a nightmare.

--TCA of NewOrder
|
Fri, 15 Aug 1997 06:54:58 GMT |
|
 |
John Pays #11 / 13
|
 breaking the 64K data segment
[code deleted]

Quote:
>That moves doublewords (if possible) at a time, without zeroes, without
>comparing, without compiled bitmaps. Lemme see, 17 instructions, 160K of
>sprite data....that's about 9411 pixels per instruction. Much better than 3
>bytes per pixel. I've been using this method for about two years, and have
>never seen anyone else use it (Although I suspect that Accolade uses it in
>Starcon). Even Tran compares for zeroes in Timeless. I'm using this stuff
>in a parallax scrolling engine that get pretty good (past 70fps on my
>486-33LB) frame rates. This really helps drawing the foreground, since I
>am only drawing what I need to, not comparing 64000 pixels for zero. Getting
>Blits to clip, however, is a nightmare.

First off, I'd suggest you not use pushf/popf, as that can cause excessive overhead when in v86 mode (stupid design decision on Intel's part, but such is life...)

Secondly, your code can probably be improved (at the expense of redoing the preprocessing of your objects) as follows:

; Assumes: DS:SI points to object list
; Assumes: ES:DI points to start of destination on-screen
; First, process all runs which begin on even pixels and end on odd
; pixels
    mov cx,[si]
    jcxz evenodd_done
evenodd_loop:
    add di,[si+2]
    add si,4
    rep movsw
    add cx,[si]
    jnz evenodd_loop
evenodd_done:
; Next, process all runs beginning and ending on even pixels
    add cx,[si+2]
    js eveneven_done
eveneven_loop:
    add di,[si+4]
    add si,6
    dec cx
    rep movsw
    mov al,[si]
    mov [es:di],al
    add cx,[si+2]
    jns eveneven_loop
eveneven_done:
; Next, process all runs starting on odd, ending on even
etc.

This routine assumes the display list is organized as four lists of items. The first list is for objects which consist only of whole words; each object contains the number of words of data and the amount to offset DI before plotting these words, followed by an appropriate list of data. The data ends when the count of objects == 0. The next list is for objects which end with an "extra" pixel; these objects end when the count value == 8000h.

Note that if di is initially even and the display list "add" values for di are always even, words will always be written to the screen aligned. On many display adapters, this can pay off bigtime. In the event that the sprite has to be able to move horizontally, it may be worthwhile to have two copies of the sprite, offset by a pixel [trick going back to Apple II games].

--
-------------------------------------------------------------------------------
John Payson | Un animal si beau qu'un chat." | ( o o )
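
As a hedged illustration of the preprocessing this implies, the C sketch below sorts a run into one of four lists by the parity of its first and last pixel, and counts how many aligned words it contains. Only the first three classes are named in the post; the odd-to-odd class is inferred from the "four lists" remark. `RunClass`, `classify_run` and `aligned_words` are made-up names.

```c
/* Which list a run belongs to, by parity of its first and last pixel. */
typedef enum {
    RUN_EVEN_TO_ODD,   /* whole aligned words only */
    RUN_EVEN_TO_EVEN,  /* aligned words plus one trailing byte */
    RUN_ODD_TO_EVEN,   /* leading byte, aligned words, trailing byte */
    RUN_ODD_TO_ODD     /* leading byte plus aligned words (inferred) */
} RunClass;

/* Classifies a run that starts at start_x and covers length >= 1 pixels. */
RunClass classify_run(unsigned start_x, unsigned length)
{
    unsigned end_x     = start_x + length - 1;  /* last pixel of the run */
    unsigned start_odd = start_x & 1u;
    unsigned end_odd   = end_x & 1u;

    if (!start_odd)
        return end_odd ? RUN_EVEN_TO_ODD : RUN_EVEN_TO_EVEN;
    return end_odd ? RUN_ODD_TO_ODD : RUN_ODD_TO_EVEN;
}

/* Number of whole words that can be written at even addresses. */
unsigned aligned_words(unsigned start_x, unsigned length)
{
    unsigned first_even = start_x + (start_x & 1u);
    unsigned end_excl   = start_x + length;

    return end_excl > first_even ? (end_excl - first_even) / 2u : 0u;
}
```

Grouping runs by class ahead of time is what lets each drawing loop stay free of per-run parity tests.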
|
Fri, 15 Aug 1997 08:27:19 GMT |
|
 |
Robert J. Hil #12 / 13
|
 breaking the 64K data segment
Quote:
> movsb
> That moves doublewords (if possible) at a time, without zeroes, without
> comparing, without compiled bitmaps. Lemme see, 17 instructions, 160K of
             ^^^^^^^^^^^^^^^^^^^^^^^^^

Huh? What raw bitmaps have line mods in them? And besides that, this is just a slightly modified version of the ByteRun encoding in IFFs, which E.A. started in ye olde Amiga days.....

--
Robert J. Hill
The opinions in this letter are not necessarily the same as my employer's, and are entirely my OWN. Blame Me. Everyone else does.
''' PGP2.6 Public Encryption Key Available By Request (o o)
|
Sat, 16 Aug 1997 20:14:51 GMT |
|