FPU instructions etc. 
 FPU instructions etc.

Hello, (world!)
I am just about to enter the wonderful world of the math co-processor. I
was wondering if someone could point out some good resources explaining
things.
And what's this thing about 'unrolling' I've seen in some code snippets?
If you unroll this a couple of times it'll be faster? Huh?

Appreciated,
Jussi



Wed, 13 Nov 2002 03:00:00 GMT  
 FPU instructions etc.
Vulture's site:
http://www.ice-digga.com/programming/vul.html - I haven't looked at his
FPU tutorial, but hey, see how you like it :)


--

Team2k PC/Palm Pilot Programming Team:
http://ppilot.homepage.com




Wed, 13 Nov 2002 03:00:00 GMT  
 FPU instructions etc.

Quote:

> I am just about to enter the wonderful world of the math co-processor. I
> was wondering if someone could point out some good resources explaining
> things.

I haven't found a good one other than the original documentation.

Quote:
> And what's this thing about 'unrolling' I've seen in some code snippets?
> If you unroll this a couple of times it'll be faster? Huh?

Take the loop:

BASIC:
let y=0
for x=1 to 10
y=y+x
next x

C++:
int y=0;
for (int x=1; x<11; x++) y+=x;   // y ends up as 55

Now, such a loop, when executed, spends most of its time in the looping
code and very little in the addition itself.  In pseudo-machine
language, it would look like this:

        y=0;
        x=1;
loop:   is x<11?
        if no, exit loop
        add x to y      // of interest
        add 1 to x
        branch back up to loop
exit loop:
        .
        .
        .
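To make the overhead concrete, here is a rough C++ sketch of that pseudo-machine listing written with labels and goto (the name sum_with_goto is just made up for illustration). It isn't what a compiler actually emits, but every compare, branch and increment from the listing shows up as its own statement, next to the single line that does useful work.

```cpp
#include <cassert>  // for the assert-based check in the usage note

// Hypothetical sketch: the pseudo-machine loop above, one statement per step.
int sum_with_goto()
{
    int y = 0;
    int x = 1;
loop:
    if (!(x < 11)) goto exit_loop;  // is x<11? if no, exit loop
    y += x;                         // add x to y (the only computation)
    ++x;                            // add 1 to x
    goto loop;                      // branch back up to loop
exit_loop:
    return y;
}
```

`assert(sum_with_goto() == 55);` should pass, same as the for loop: four control steps per pass for one addition.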

Now, the line of interest above is the only line that actually does
something computation-wise.  The rest of it is just there to control the
program flow.  Stepping through all that code ten times wastes time, and on
modern processors a branch (especially a mispredicted one) can cost far
more than the addition.  So a good compiler would optimise it by unrolling
the loop, thus:

        y=0
        add 1 to y
        add 2 to y
        add 3 to y
        add 4 to y
        add 5 to y
        add 6 to y
        add 7 to y
        add 8 to y
        add 9 to y
        add 10 to y

This is the loop fully unrolled: all ten iterations written out.  All I've
done is take away the control code and list the 'inner loop' body as many
times as it would actually execute.
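Here's a sketch of the rolled and fully unrolled versions side by side in C++ (sum_rolled and sum_fully_unrolled are hypothetical names). Both should return 55; the second simply has no counter, compare or branch left.

```cpp
#include <cassert>  // for the assert-based checks in the usage note

// Rolled: one addition per pass, plus compare, branch and increment.
int sum_rolled()
{
    int y = 0;
    for (int x = 1; x < 11; x++)
        y += x;
    return y;
}

// Fully unrolled: the ten additions listed one after another.
int sum_fully_unrolled()
{
    int y = 0;
    y += 1; y += 2; y += 3; y += 4; y += 5;
    y += 6; y += 7; y += 8; y += 9; y += 10;
    return y;
}
```

`assert(sum_rolled() == sum_fully_unrolled());` should pass; only the amount of control code differs.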

You could also unroll it partially: have the loop body do two additions per
pass (so the loop runs five times), or five additions per pass (so it runs
twice).  You'll note my 'optimisation' above doesn't account for x, should
it be needed later on, and also any self-respecting compiler would optimise
the whole lot down to simply y=55.
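A partial unroll is a bit more general than the full one. Below is a sketch (sum_unrolled_by_2 is a made-up name, and it assumes a non-negative upper bound n) that does two additions per pass and mops up the leftover value when n is odd. That cleanup step is the part that's easy to forget.

```cpp
#include <cassert>

// Hypothetical sketch: sum 1..n with the loop body unrolled twice.
// The loop control runs about n/2 times instead of n times.
int sum_unrolled_by_2(int n)
{
    assert(n >= 0);  // sketch assumes a non-negative upper bound
    int y = 0;
    int x = 1;
    for (; x + 1 <= n; x += 2) {  // two values per pass
        y += x;
        y += x + 1;
    }
    if (x <= n)  // odd n: one value left over
        y += x;
    return y;
}
```

For n = 10 the loop body runs five times instead of ten. Whether that is actually faster than the plain loop depends on the processor and compiler, so it's worth timing both.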

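Modern processors can get some of the same benefit in hardware, to a
limited extent: branch prediction and out-of-order execution let them work
on several iterations at once before the code hits the actual execution
units.  In fact, unrolling loops can decrease execution speed in certain
cases, for example when the bigger code no longer fits in the instruction
cache.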

Unrolling loops is all about increasing speed.  The tradeoff is that the
program size generally gets bigger.

Richard Cavell



Thu, 14 Nov 2002 03:00:00 GMT  
 FPU instructions etc.
For basic info, see Chapter 14 of "The Art of Assembly Language Programming"
at http://webster.cs.ucr.edu
Randy Hyde





Sat, 16 Nov 2002 03:00:00 GMT  
 