Programming the Diamond Stealth 64 VRAM video card (S3 Vision 964) 

     I am looking for some assistance in programming a Diamond Stealth 64
VRAM video card, which uses the S3 Vision964 graphics chip.  The routines
should first initialize the graphics coprocessor and then blast a screen of
data to the display as fast as possible using that coprocessor.  They will
be used to animate scientific data and must be callable from Fortran.

     I am using Microsoft Power Fortran version 1.0 to set the graphics mode
( VESA 106h, 1280 x 1024 x 4 bits/pixel ), and MASM 6.11d with the /coff
option, as required by Microsoft Power Fortran, for the assembly routines.

     The problem seems to be with unlocking the S3 registers.  When I enable
access to the Enhanced Command registers by setting bit 0 of CR40, either a
mode change seems to occur or something else happens that I cannot identify.
I am not sure whether other registers must be set before or after this step.
The next step, enabling the enhanced mode functions through the Advanced
Function Control register (4AE8h), freezes the machine.

     There may also be errors that I am not aware of.  With the PROC
structure I have specified, I believe MASM takes care of pushing the
registers and return pointer on the stack at the beginning of the procedure
and popping everything off at the end, but I have not verified this.
Perhaps the experts out there see other problems?

     Finally, I have a question about optimizing the loop "blast", which
writes the data to the Pixel Data Transfer Register.  The loop is as follows:

blast:  
        mov     eax, [ebx + ecx*4]      ; eax contains an array element
        out     dx, eax                 ; write to pixel data transfer reg.
        inc     ecx                     ; increase ecx by 1
        cmp     ecx, 00020000h          ; 20000h dwords = 512 KB / 4
        jnz     blast                   ; loop until ecx is 20000h

Would this loop be faster using "outs" or an unrolled form such as "mov mov
out out"?  Additionally, the Pixel Data Transfer Register can be memory
mapped.  Would writing to the mapped memory be faster?  Or is any
optimization pointless because the PCI bus is the limiting factor?
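
For reference, the "outs" form I have in mind would look roughly like the
sketch below.  It is untested and replaces only the "blast" loop.  It assumes
ebx still holds the array pointer and dx still holds 0E2E8h, and that esi may
be overwritten (it would have to be saved and restored if the caller expects
it to be preserved).  Whether it is actually faster is exactly what I am
asking.

        mov     esi, ebx                ; array starting ptr -> esi (source)
        mov     ecx, 00020000h          ; number of dwords to write
        cld                             ; clear direction flag, esi counts up
        rep     outsd                   ; write ecx dwords from [esi] to port dx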

Here are the routines and FORTRAN code:

;       s3rkey.asm - unlocks S3 registers
;
;       INTERFACE TO SUBROUTINE s3rkey
;       END
;

        .486
        .MODEL  FLAT, STDCALL

S3RKEY  PROTO   STDCALL

        .CODE

S3RKEY  PROC    STDCALL

;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;       Unlock S3 registers                 ;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
; Write code to CR38 to provide access to the S3 VGA registers (CR30-CR3F)
;
        mov     dx, 3D4h        ; copy index register address into dx    
        mov     al, 38h         ; copy index for CR38 register into al
        out     dx, al          ; write index to index register
        inc     dx              ; increment dx to 3D5h (data register address)
        mov     al, 48h         ; copy unlocking code (01xx10xxb) to al
        out     dx, al          ; write the unlocking code to the data reg.
        dec     dx              ; restore the index register address to dx
;
; Write code to CR39 to provide access to the System Control and System
; Extension registers (CR40-CRFF)
;
; dx is already loaded with 3D4h because of the previous instruction
;
        mov     al, 39h         ; copy index for CR39 register into al
        out     dx, al          ; write index to index register
        inc     dx              ; increment dx to 3D5h (data register address)
        mov     al, 0A5h        ; copy unlocking code to al (the code A5h
                                ; also unlocks access to configuration
                                ; registers CR36, CR37 and CR68)
        out     dx, al          ; write the unlocking code to the data reg.
        dec     dx              ; restore the index register address to dx
;
; Set bit 0 in CR40 to enable access to the Enhanced Command registers
;
; dx is already loaded with 3D4h because of previous instruction
;
;
; This routine seems to change the video mode as well or does something
; besides just unlocking the Enhanced Command registers.
;
;
        mov     al, 40h         ; copy index for CR40 register into al
        out     dx, al          ; write index to index register
        inc     dx              ; increment dx to 3D5h (data register address)
        in      al, dx          ; read register data for read/modify/write op
        or      al, 00000001b   ; set bit 0 to 1
        out     dx, al          ; write the modified value back to the data reg.
        dec     dx              ; restore the index register address to dx
;
; Enable Enhanced mode functions
;
; This step then freezes the system
;
;
        mov     dx, 4AE8h       ; Advanced Function Control Register
        in      al, dx          ; Read register data for read/modify/write op
        or      al, 5           ; set bit 0 to 1 and bit 2 to 1
        out     dx, al          ; write code to adv. Function control reg.

        ret

S3RKEY  ENDP

        END
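
To check whether the write to CR40 actually sticks, a small read-back routine
along the following lines might help.  This is only an untested sketch: the
name CR40CHK is hypothetical, it assumes CR38 and CR39 have already been
written as in S3RKEY, and it assumes the result can be picked up from eax.

;       cr40chk.asm - reads CR40 back and returns bit 0 in eax (sketch)

        .486
        .MODEL  FLAT, STDCALL

CR40CHK PROTO   STDCALL

        .CODE

CR40CHK PROC    STDCALL

        mov     dx, 3D4h        ; copy index register address into dx
        mov     al, 40h         ; copy index for CR40 register into al
        out     dx, al          ; write index to index register
        inc     dx              ; increment dx to 3D5h (data register address)
        in      al, dx          ; read the current CR40 contents
        movzx   eax, al         ; zero-extend the byte into eax
        and     eax, 1          ; keep only bit 0
        ret                     ; eax = 1 if bit 0 is set, 0 otherwise

CR40CHK ENDP

        END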

;       s64bitbl.asm - Blasts a 1024 x 1024 square of 4-bit per pixel
;                      data ( 512kB ), 32-bits at a time, to the Stealth 64
;                      video card using the pixel data transfer register.
;                      The program must be assembled into a COFF object
;                      file and called from Microsoft Power Fortran
;
;       INTERFACE TO SUBROUTINE bitblt ( ipxls )
;       INTEGER*4 ipxls
;       END
;

        .486
        .MODEL  FLAT, STDCALL

BITBLT  PROTO   STDCALL, ipxls:PTR SDWORD

        .CODE

BITBLT  PROC    STDCALL, ipxls:PTR SDWORD

        mov     ebx, ipxls              ; Move the array starting ptr -> ebx
        mov     dx, 9AE8h               ; Graphics processor status reg.
;
; The machine will get stuck in the following loop if the S3 registers are
; not properly unlocked.
;

fifo:  
        in      ax, dx                  ; status -> ax
        bt      ax, 2                   ; place bit 2 into carry flag (6 open)
        jb      fifo                    ; loop until 6 open slots

        mov     dx, 0BAE8h              ; foreground mix register -> dx
        mov     ax, 0047h               ; foreground mix into ax
        out     dx, ax                  ; 0047h -> foreground mix
        mov     dx, 0BEE8h              ; Pixel control register -> dx
        mov     ax, 0A000h              ;
        out     dx, ax                  ; foreground mix is source of color
        mov     dx, 86E8h               ; starting x position register
        mov     ax, 0080h
        out     dx, ax                  ; horizontal corner is 128
        mov     dx, 82E8h               ; starting y position reg.
        mov     ax, 0000h
        out     dx, ax                  ; vertical corner is 0
        mov     dx, 96E8h               ; major axis pixel count register
        mov     ax, 03FFh
        out     dx, ax                  ; (width - 1) = (1024 - 1 ) pixels
        mov     dx, 0BEE8h              ; minor axis pixel count register
        out     dx, ax                  ; (height - 1 ) = ( 1024 - 1 ) pixels

        mov     dx, 9AE8h               ; Graphics processor status register

;
; The machine will get stuck in the following loop if the S3 registers are
; not properly unlocked.
;

genb:  
        in      ax, dx                  ; status -> ax
        bt      ax, 9                   ; place bit 9 into carry flag
        jb      genb                    ; loop until graphics engine not busy

        mov     dx, 9AE8h               ; drawing command register
        mov     ax, 0101010110110001b
        out     dx, ax                  ; Command for data cpu -> grafix eng.

        mov     dx, 0E2E8h              ; load dx with pixel transfer reg.
        mov     ecx, 00000000h          ; load up the counter with 0

;
; This is the critical loop to optimize
;

blast:  
        mov     eax, [ebx + ecx*4]      ; eax contains an array element
        out     dx, eax                 ; write to pixel data transfer reg.
        inc     ecx                     ; increase ecx by 1
        cmp     ecx, 00020000h          ; 20000h dwords = 512 KB / 4
        jnz     blast                   ; loop until ecx is 20000h

        ret

BITBLT  ENDP

        END
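
Since both polling loops above spin forever when the registers are not
unlocked, a bounded version of the FIFO wait might be useful while debugging.
The following is an untested sketch: the routine name FIFOWT and the retry
limit of 100000h are arbitrary choices of mine, and it assumes the result can
be picked up from eax.

;       fifowt.asm - waits for 6 open FIFO slots, gives up after a fixed
;                    number of polls (sketch)

        .486
        .MODEL  FLAT, STDCALL

FIFOWT  PROTO   STDCALL

        .CODE

FIFOWT  PROC    STDCALL

        mov     dx, 9AE8h               ; Graphics processor status reg.
        mov     ecx, 00100000h          ; arbitrary retry limit

poll:
        in      ax, dx                  ; status -> ax
        test    ax, 0004h               ; is bit 2 clear (6 open slots)?
        jz      ready                   ; yes, stop polling
        dec     ecx                     ; one fewer retry left
        jnz     poll                    ; keep polling until the limit is hit

        mov     eax, 0                  ; timed out: return 0
        ret

ready:
        mov     eax, 1                  ; FIFO has room: return 1
        ret

FIFOWT  ENDP

        END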

And here is the calling FORTRAN code:

      INTERFACE TO SUBROUTINE bitblt(ipxls)
      integer*4 ipxls(131072)
      END

      INTERFACE TO SUBROUTINE s3rkey()
      END

      INCLUDE 'FGRAPH.FI'
      INCLUDE 'FGRAPH.FD'

      parameter(in=131072)
      parameter(inxy=1024)
      integer*2 modestatus, dummy, ix, iy
      integer*4 i,j,icount,i1,i2,i3,i4,i5,i6,i7,i8
      integer*4 ipxls(in),iay(inxy,inxy)
      real*8 r,rc,x,y,d

      rc = DFLOAT(inxy)/2.0d0 + 1.0d0

c      
c This loop generates a representative density matrix with 16 contours
c
      do 20 j=1,inxy
          do 10 i=1,inxy

              x = DFLOAT(i) - rc
              y = rc - DFLOAT(j)

              r = DSQRT(x*x + y*y)/DFLOAT(rc)

              d = (1.0d0 - 3.0d0*r**2.0d0 + 3.0d0*r**4 - r**6) * 15.0d0

              iay(i,j) = IDNINT(d)

              if (iay(i,j).gt.15) iay(i,j)=15
              if (iay(i,j).lt.0) iay(i,j)=0

10        continue
20    continue

c
c This loop packs the pixels.  There should be 8 pixels in each INTEGER*4
c element.  This loop may not be correct at this time, but it is irrelevant
c to debugging the assembly language code.
c

      icount = 0
      do 40 j = 1, inxy
          do 30 i = 1, inxy, 8
                icount = icount + 1

                i1 = iay(i,j)
                if (i1.lt.8) then
                     i1 = (i1 * 2) * 134217728
                     i2 = iay(i+1,j) * 16777216
                     i3 = iay(i+2,j) * 1048576
                     i4 = iay(i+3,j) * 65536
                     i5 = iay(i+4,j) * 4096
                     i6 = iay(i+5,j) * 256
                     i7 = iay(i+6,j) * 16
                     i8 = iay(i+7,j)
                else
                     i1 = ((iay(i,j) - 16) * 2 ) * 134217728
                     i2 = (iay(i+1,j) - 16) * 16777216
                     i3 = (iay(i+2,j) - 16) * 1048576
                     i4 = (iay(i+3,j) - 16) * 65536
                     i5 = (iay(i+4,j) - 16) * 4096
                     i6 = (iay(i+5,j) - 16) * 256
                     i7 = (iay(i+6,j) - 16) * 16
                     i8 = (iay(i+7,j) - 16)
                endif

                ipxls(icount) = i1 + i2 + i3 + i4 + i5 + i6 + i7 + i8

30        continue
40    continue

      write(*,*)'Press Return to change video mode'
      read(*,*)

c
c     The following should set the VESA video mode 106h
c     (1280 x 1024 x 4 bits/pixel)
c
      modestatus = SETVIDEOMODE( $ZRES16COLOR )

c
c     This just writes a test pixel in the middle of the screen
c
      ix = 640
      iy = 512
      dummy = SETPIXEL( ix,iy )

      write(*,*)'Press Return to unlock the registers'
      read(*,*)

      call s3rkey()

      write(*,*)'Press Return to Blast the data'
      read(*,*)

      call bitblt(ipxls)

      write(*,*)'Press Return to End'
      read(*,*)

      end

     If anyone can tell me all the mistakes I am making, I would appreciate
it.  Thank you very much for your time.  Please email me, and if you feel
others would also be interested, post a follow-up message.

David Bachman

************************************************
Damnit Jim, I'm a physicist, not a programmer.



Mon, 05 Oct 1998 03:00:00 GMT  
 