Experiments with 3D-Now! 

Experimenting with AMD's 3D-Now! floating-point and prefetch
extensions left me puzzled. 3D-Now! instructions operate on
two 32-bit FP numbers simultaneously. However, the naive
expectation that properly written code would run two to four
times faster than standard FP code proved false. My non-assembler
code improves by at most 10 to 20%, and then mostly due to the
prefetch instruction on vectors of more than 500 elements, not
to the SIMD instructions.
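
To make the prefetch effect concrete, here is a minimal C sketch of a
scalar dot product that requests data 64 bytes (16 floats) ahead of
the current position. The name sdot_prefetch and the PREFETCH macro
are made up for illustration, and __builtin_prefetch is a GCC/Clang
builtin, not the 3D-Now! PREFETCH instruction itself; on other
compilers the macro compiles to nothing.

```c
#include <stddef.h>

/* __builtin_prefetch is a GCC/Clang builtin; elsewhere this is a no-op. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH(p) __builtin_prefetch((p), 0, 3)
#else
#define PREFETCH(p) ((void)0)
#endif

/* Hypothetical helper: scalar dot product with a software prefetch
   issued once per 64-byte block, 16 floats ahead of index i. */
float sdot_prefetch(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        /* Only prefetch addresses that still lie inside the arrays. */
        if ((i & 15) == 0 && i + 16 < n) {
            PREFETCH(a + i + 16);
            PREFETCH(b + i + 16);
        }
        sum += a[i] * b[i];
    }
    return sum;
}
```

Any gain from this depends on the vectors not already being in cache,
which fits the observation that it only shows up on longer vectors.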

Up to now I've only tried the SDOT routine (dot product of two
single-precision FP vectors).

Does anybody know what can realistically be achieved with
careful use of 3D-Now! (on, say, SDOT)?

-marcel

-- Here is the main sdot. Tried on AMD Athlon 900 MHz.
-- The code seems to be almost completely insensitive to
-- scheduling. Max. achieved: 512 Mflops with smm.frt.
-- A larger unroll doesn't help (tried 8 and 16)
-- The assembler is experimental ( see prefetch, )

CODE sdot(n*4) ( 'a 'b cnt/4 -- 'a1 'b1 float32 )

        rpush,

        ecx pop, ebx pop, eax pop,
        edx push,
        mm7 -> mm7 pxor,     \ clear accumulator
        ecx -> edx mov,  
        ecx -> ecx xor,
        edx -> edx or,
0<>, IF,
        ALIGN16
        BEGIN,
          #64 [eax] [ecx*4] -> mm0 prefetch,
          #64 [ebx] [ecx*4] -> mm0 prefetch,

              [eax] [ecx*4] -> mm0 movq,
              [ebx] [ecx*4] -> mm0 pfmul,
                        mm0 -> mm7 pfadd,

            8 [eax] [ecx*4] -> mm1 movq,
            8 [ebx] [ecx*4] -> mm1 pfmul,
                        mm1 -> mm7 pfadd,

                       4 b# -> ecx add,
                               edx dec,
    0=, UNTIL,
ENDIF,
        edx pop,
        [eax] [ecx*4] -> eax lea,  eax push,
        [ebx] [ecx*4] -> ebx lea,  ebx push,

        -4 [esp] -> eax lea,
        mm7  -> mm7 pfacc,
        mm7  -> mm0 movq,
        4 d# -> esp sub,
        mm0  -> 0 [eax] movd,
        femms,

        rpop, ebx jmp,

END-CODE
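
For readers who do not follow the assembler syntax, the loop above
can be modelled in plain C roughly as follows. This is only a sketch
of what the code computes, not of how fast it runs; the pf2 type and
the pf_* and sdot_model names are invented for illustration, and
cnt4 stands for the cnt/4 loop count from the stack comment.

```c
#include <stddef.h>

/* Two packed 32-bit floats, standing in for one 64-bit MMX register. */
typedef struct { float lo, hi; } pf2;

/* movq: load two adjacent floats */
static pf2 pf_load(const float *p) { pf2 r = { p[0], p[1] }; return r; }

/* pfmul: lane-wise multiply */
static pf2 pf_mul(pf2 x, pf2 y) { pf2 r = { x.lo * y.lo, x.hi * y.hi }; return r; }

/* pfadd: lane-wise add */
static pf2 pf_add(pf2 x, pf2 y) { pf2 r = { x.lo + y.lo, x.hi + y.hi }; return r; }

/* Hypothetical C model of the loop: cnt4 iterations, 4 floats each. */
float sdot_model(const float *a, const float *b, size_t cnt4)
{
    pf2 acc = { 0.0f, 0.0f };                 /* pxor mm7, mm7 */
    for (size_t i = 0; i < 4 * cnt4; i += 4) {
        /* first pair of floats: offset 0 bytes */
        acc = pf_add(acc, pf_mul(pf_load(a + i), pf_load(b + i)));
        /* second pair of floats: offset 8 bytes */
        acc = pf_add(acc, pf_mul(pf_load(a + i + 2), pf_load(b + i + 2)));
    }
    return acc.lo + acc.hi;                   /* pfacc: sum the two lanes */
}
```

Each iteration performs 4 multiplies and 4 adds. If the 512 Mflops
figure counts both, then at 900 MHz that is about 0.57 flop per
cycle, or roughly 14 cycles per iteration. Note that in this model,
as in the assembler, both adds go into the same accumulator.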



Fri, 06 Feb 2004 14:24:42 GMT  

Marcel,

I am guessing that 3D-Now is basically what was in the IIT
803C87 chip, namely space to hold a 4x4 matrix and a 4-dim
vector. Rotation, translation and scaling can be expressed
by the matrix, and the 4-dim vector is the usual 3D position
with 1 in the 4th component. Having the matrix * vector mult
built in greatly speeds up image redrawing since one represents
the ends of line segments as vectors, etc. etc. But 10-20%
sounds right.

If this is what is happening, then I actually discussed how
this feature can be used to speed up matrix ops by some factor
that depended on the bus width, data length, and which matrix
op you were using. Place where discussed: my book "Scientific
Forth", which I believe you possess.

If my guess is wrong and that is NOT what 3D-Now does, then
I have no idea.

--
Julian V. Noble
Professor of Physics

Galileo's Commandment:

   "Science knows only one commandment: contribute to science."
   -- Bertolt Brecht, "Galileo".



Sat, 07 Feb 2004 11:15:55 GMT  

What is the possibility of an update to this book based on new processor
technology such as 3D-Now and MMX?  Just wondering.


Sat, 14 Feb 2004 22:00:05 GMT  

Quote:


> What is the possibility of an update to this book based on new processor
> technology such as 3D-Now and MMX?  Just wondering.

Someone would have to give me a sabbatical to do it. Otherwise
I do not have time to revise SciFth for the next few years.

However, if you know where I can get the info on 3D-Now and
MMX --detailed spec sheets and/or op codes is all I need, I
could probably update the Forth words I developed in fairly
short order.

--
Julian V. Noble
Professor of Physics




Sun, 22 Feb 2004 22:01:31 GMT  

Quote:

> However, if you know where I can get the info on 3D-Now and
> MMX --detailed spec sheets and/or op codes is all I need, I
> could probably update the Forth words I developed in fairly
> short order.

http://developer.intel.com/

--
Bernd



Mon, 23 Feb 2004 00:51:00 GMT  

Quote:


[..]
>>> I am guessing that 3D-Now is basically what was in the IIT
>>> 803C87 chip, namely space to hold a 4x4 matrix and a 4-dim
>>> vector. Rotation, translation and scaling can be expressed
>>> by the matrix, and the 4-dim vector is the usual 3D position
>>> with 1 in the 4th component. Having the matrix * vector mult
>>> built in greatly speeds up image redrawing since one represents
>>> the ends of line segments as vectors, etc. etc. But 10-20%
>>> sounds right.

No, 3D-Now is an almost complete, but very "basic", alternative
floating-point instruction set. It works on two 32-bit FP numbers
at the same time and doesn't use a stack but 8 conventional registers.
Most instructions (except for the really useful ones like MAC :-)
execute in a single cycle.

[.. JvN's book "Scientific Forth" ..]

Quote:
>>> If my guess is wrong and that is NOT what 3D-Now does, then
>>> I have no idea.

[..]

Quote:
>> What is the possibility of an update to this book based on new processor
>> technology such as 3D-Now and MMX?  Just wondering.
> Someone would have to give me a sabbatical to do it. Otherwise
> I do not have time to revise SciFth for the next few years.
> However, if you know where I can get the info on 3D-Now and
> MMX --detailed spec sheets and/or op codes is all I need, I
> could probably update the Forth words I developed in fairly
> short order.

Let's hope somebody will pick up the suggestion.

I'd advise waiting a little and building on SSE2 (P4 and hopefully new
AMD offerings), which supports 64-bit floats in 128-bit registers. However,
if the results are as disappointing as with 3D-Now ...
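
As a rough idea of what the SSE2 version of a dot product could look
like, here is a minimal C sketch using the standard SSE2 intrinsics
from <emmintrin.h>, two doubles per 128-bit register. The name
ddot_sse2 is made up for illustration, and nothing here has been
measured; when the compiler does not define __SSE2__ the function
falls back to a plain scalar loop.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Hypothetical helper: dot product of two double vectors. */
double ddot_sse2(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    size_t i = 0;
#if defined(__SSE2__)
    __m128d acc = _mm_setzero_pd();           /* two 64-bit lanes, both 0.0 */
    for (; i + 2 <= n; i += 2) {
        __m128d x = _mm_loadu_pd(a + i);      /* unaligned load of 2 doubles */
        __m128d y = _mm_loadu_pd(b + i);
        acc = _mm_add_pd(acc, _mm_mul_pd(x, y));
    }
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    sum = lanes[0] + lanes[1];                /* horizontal add of the lanes */
#endif
    for (; i < n; i++)                        /* leftover element(s) */
        sum += a[i] * b[i];
    return sum;
}
```

The structure is the same as the 3D-Now! loop: packed multiply,
packed add into an accumulator, and one horizontal add at the end.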

-marcel



Mon, 23 Feb 2004 01:48:59 GMT  

        [ deleted ]

Quote:
> However, if you know where I can get the info on 3D-Now and
> MMX --detailed spec sheets and/or op codes is all I need, I
> could probably update the Forth words I developed in fairly
> short order.

Thanks to Bernd Beuster for pointing me to Intel. The URL
that looks full of useful info is

http://developer.intel.com/software/products/itc/strmsimd/sseappnots.htm

--
Julian V. Noble
Professor of Physics




Mon, 23 Feb 2004 04:38:53 GMT  


Quote:

>> However, if you know where I can get the info on 3D-Now and
>> MMX --detailed spec sheets and/or op codes is all I need, I
>> could probably update the Forth words I developed in fairly
>> short order.

>http://developer.intel.com/

I guess he will have more luck at AMD (no 3DNow from Intel).

- anton
--
M. Anton Ertl  http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html



Mon, 23 Feb 2004 20:18:24 GMT  

Quote:




> >> However, if you know where I can get the info on 3D-Now and
> >> MMX --detailed spec sheets and/or op codes is all I need, I
> >> could probably update the Forth words I developed in fairly
> >> short order.

> >http://developer.intel.com/

> I guess he will have more luck at AMD (no 3DNow from Intel).

> - anton

I began to suspect that after I saw nothing on 3D-Now from my Intel
search.
--
Julian V. Noble
Professor of Physics




Tue, 24 Feb 2004 09:27:18 GMT  
 