How to optimize execution speed of code ?? 
Author Message
 How to optimize execution speed of code ??

I know loop unrolling. Any other techniques ??
(I've heard that optimization should better be done by the compiler itself,
or experts only.)


Sun, 07 Jul 2002 03:00:00 GMT  
 How to optimize execution speed of code ??

Quote:

> I know loop unrolling. Any other techniques ??
> (I've heard that optimization should better be done by the compiler itself,
> or experts only.)

indeed.  loop unrolling should never be done by a human (if for no other
reason other than he has better things to do).  clue: if there are any set
"rules" for optimising, your compiler has probably implemented them years
ago.  sadly, not everyone seems to figure this out (anyone ever heard "dood!
don't divide by 2!  just bitshift it!"?).

the only way to optimise your code is to implement better algorithms.

p.s. please excuse my gender-neutral use of 'he' :)

--
              /"\                              m i k e    b u r r e l l

               X        AGAINST HTML MAIL      http://mikpos.dyndns.org
              / \



Sun, 07 Jul 2002 03:00:00 GMT  
 How to optimize execution speed of code ??

Quote:


> > I know loop unrolling. Any other techniques ??
> > (I've heard that optimization should better be done by the compiler itself,
> > or experts only.)

> indeed.  loop unrolling should never be done by a human (if for no other
> reason other than he has better things to do).  clue: if there are any set
> "rules" for optimising, your compiler has probably implemented them years
> ago.  sadly, not everyone seems to figure this out (anyone ever heard "dood!
> don't divide by 2!  just bitshift it!"?).

> the only way to optimise your code is to implement better algorithms.

Of course one should implement the best algorithm for the problem at
hand; "best" includes ease of coding without bugs, implementation and
running time.

Last year an algorithms course at my university included a simple
implementation task. The algorithm to implement was the well-known
divide-and conquer multiplication algorithm (Karatsuba's). It turned
out that some students produced code with the correct asymptotical
running time, O(n^(log_2 3)) with n being the number of digits in the
numbers multiplied, but still running 100 times slower than a decent
implementation. Both versions were compiled with the same C compiler
and running on the same hardware! Most of the slow-down was due to
unnecessary memory allocation (lots of malloc/free calls instead of a
global memory arena) and/or unnecessarily sparse packing of data
(e.g. only using the lower-most 4 bits of a 32-bit integer).

*This* is probably the kind of stuff one should work on, rather than
replacing << 1 by * 2 through the code. Most compilers are very good
at micro optimization, such as instruction selection, strength
reduction (e.g. shift instead of mult/div) etc.

In my experience, they are much worse doing global optimization.
Inlining function comes to mind; I've seen a program run 4x slower
when compiled with gcc -O3 than when compiled with gcc -O. It turned
out that the culprit was that the bottleneck code, in which 99% of
the CPU time was spent, didn't fit in the L1 code cache when some
functions were inlined.

There are compilers which you can feed profiling logs (I've read
about this for the Sun cc compiler), and it might be a solution to
this and other problems with optimizing compilers, but I haven't
tried it myself.

/ Gunnar



Sun, 07 Jul 2002 03:00:00 GMT  
 How to optimize execution speed of code ??

: > I know loop unrolling. Any other techniques ??
: > (I've heard that optimization should better be done by the compiler
: > itself, or experts only.)

: indeed.  loop unrolling should never be done by a human (if for no other
: reason other than he has better things to do).  clue: if there are any set
: "rules" for optimising, your compiler has probably implemented them years
: ago.

We can only wish this was true.  Since it isn't, I suggest we modify
the statement to something like:

  "Before unrolling loops by hand, at least make sure that your
   compiler doesn't know how to do it."

: sadly, not everyone seems to figure this out (anyone ever heard "dood!
: don't divide by 2!  just bitshift it!"?).

: the only way to optimise your code is to implement better algorithms.

Nonsense.  Better algorithms will probably give the most significant
difference in performance, but it's far from the only way to optimize
code.  For instance, the same algorithm can be made more cache-friendly,
or rewritten in assembly language.  Enabling the compiler to inline
certain functions (which may include moving code around) is another way.



Sun, 07 Jul 2002 03:00:00 GMT  
 How to optimize execution speed of code ??


Quote:
> indeed.  loop unrolling should never be done by a human (if for no other
> reason other than he has better things to do).  clue: if there are any set
> "rules" for optimising, your compiler has probably implemented them years
> ago.  sadly, not everyone seems to figure this out (anyone ever heard "dood!
> don't divide by 2!  just bitshift it!"?).

As Szu has pointed out unfortunately this isn't true. Only five years ago one
of my C compilers implemented a char-to-float cast with twenty machine
instructions. A compiler can probably optimise a divide-by-constant two, but
loop unrolling is much more dodgy.

Quote:

> the only way to optimise your code is to implement better algorithms.

Half true. Optimisation can be divided into algorithmic optimisation
(reducing the big-O running time of the algorithm) and micro-optimisation
(reducing the number of machine instructuons to perform each operation). Only
very rarely will it be efficient to accept a worse running time for better
micro-optimisation. Implement the best algorithm first, then micro-optimise
to squeeze the last cycles.

Quote:
> p.s. please excuse my gender-neutral use of 'he' :)

> --
>               /"\                              m i k e    b u r r e l l

>                X        AGAINST HTML MAIL      http://mikpos.dyndns.org
>               / \

Sent via Deja.com http://www.deja.com/
Before you buy.


Sun, 07 Jul 2002 03:00:00 GMT  
 How to optimize execution speed of code ??

Quote:

> interestingly,  many compiler systems allow generation of a
> listing of the assembler which produces an object ... and
> hence comparison of assembler optimizeed at different
> levels ... ( programmer can compare results at assembler
> level ... )

Even on systems that don't support generation of a listing,
disassembly of the generated binary can often produce
enlightenment of a similar order.
--
"You call this a *C* question? What the hell are you smoking?" --Kaz


Sun, 07 Jul 2002 03:00:00 GMT  
 How to optimize execution speed of code ??

  when -compile- or -make- of object with compiler system
is successful ... generally,  optionally,  optimization
phase may apply ...
( usually ) operating on generated assembler ...
( check compile options ) ...
interestingly,  many compiler systems allow generation of a
listing of the assembler which produces an object ... and
hence comparison of assembler optimizeed at different
levels ... ( programmer can compare results at assembler
level ... )

 coding practices which produce valid and efficient code
are the basis ( largely ) of the art of programming ...
and, of course, too large a topic to be covered in a single
posting ...

 my internet site ...

 http://home.beseen.com/technology/ralphsilverman/

 ____Ralph Silverman

* Sent from RemarQ http://www.remarq.com The Internet's Discussion Network *
The fastest and easiest way to search and participate in Usenet - Free!



Sun, 07 Jul 2002 03:00:00 GMT  
 How to optimize execution speed of code ??
 I always heard, "don't multiply by 13! shift it twice and add it twice!
 it still takes 9 cycles less!"

  -- Ron Steedman

: ago.  sadly, not everyone seems to figure this out (anyone ever heard "dood!
: don't divide by 2!  just bitshift it!"?).



Mon, 08 Jul 2002 03:00:00 GMT  
 How to optimize execution speed of code ??


Quote:
>  I always heard, "don't multiply by 13! shift it twice and add it twice!
>  it still takes 9 cycles less!"

gcc seems to choose the fastest way to multiply by constants for a
given architecture. E.g. on a Pentium, it combines the instruction
add, left shift and leal (can e.g. be used for one-clock mult by 3, 5
and 9).

So don't bother writing this yourself; it your compiler is competent
it will do it for you (without introducing bugs in the process).

/ Gunnar



Mon, 08 Jul 2002 03:00:00 GMT  
 How to optimize execution speed of code ??


Quote:
> I always heard, "don't multiply by 13! shift it twice and add it twice!
> it still takes 9 cycles less!"

On what? For example on a PPro/II/III/Celeron a full-blown integer
multiply takes 4 clock cycles and can run in parallel with other operations
(even other multiply operations through pipelining).
Probably the fastest way to multiply by 13 on these processors would be

  tmp = value*2 + value;
  result = tmp*4 + value;

which can be implemented as 2 single clock lea instructions.
Unless you identify a real bottleneck this sort of thing really is best
left to the compiler.

--
-----------------------------------------


-----------------------------------------



Mon, 08 Jul 2002 03:00:00 GMT  
 How to optimize execution speed of code ??


:> I always heard, "don't multiply by 13! shift it twice and add it twice!
:> it still takes 9 cycles less!"

: On what? For example on a PPro/II/III/Celeron a full-blown integer
: multiply takes 4 clock cycles and can run in parallel with other operations
: (even other multiply operations through pipelining).
: Probably the fastest way to multiply by 13 on these processors would be

:   tmp = value*2 + value;
:   result = tmp*4 + value;

: which can be implemented as 2 single clock lea instructions.

Load effective address?  OK, you've got me curious...

-- Wetboy



Mon, 08 Jul 2002 03:00:00 GMT  
 How to optimize execution speed of code ??

 On the particular MIPS processor I was writing an optimizer for a MUL
 took approx 13 times longer than a SHR or ADD. Note that the code you
 have below is functionally equivalent to what I said (2 shifts, 2 adds).
 When writing an optimizer you don't have the luxury of leaving the opts
 to someone else. : I would think that Microsoft has this built in their
 compiler already - otherwise the "Is assembly faster" debate would have
 been won by hand assemblers. Besides, it's very easy to implement.

   -- Ron Steedman

: On what? For example on a PPro/II/III/Celeron a full-blown integer

: Probably the fastest way to multiply by 13 on these processors would be

:   tmp = value*2 + value;
:   result = tmp*4 + value;


:> I always heard, "don't multiply by 13! shift it twice and add it twice!
:> it still takes 9 cycles less!"



Mon, 08 Jul 2002 03:00:00 GMT  
 How to optimize execution speed of code ??

Quote:




>:> I always heard, "don't multiply by 13! shift it twice and add it twice!
>:> it still takes 9 cycles less!"

>: On what? For example on a PPro/II/III/Celeron a full-blown integer
>: multiply takes 4 clock cycles and can run in parallel with other operations
>: (even other multiply operations through pipelining).
>: Probably the fastest way to multiply by 13 on these processors would be

>:   tmp = value*2 + value;
>:   result = tmp*4 + value;

>: which can be implemented as 2 single clock lea instructions.

>Load effective address?  OK, you've got me curious...

Yes. Consider

int mul13(int value)
{
    return value*13;

Quote:
}

Now gcc -O -fomit-frame-pointer -S mul13.c gives (irrelevant parts
trimmed)

mul13:
        movl 4(%esp),%eax
        leal (%eax,%eax,2),%edx     ; %edx = %eax + %eax*2
        leal (%eax,%edx,4),%edx     ; %edx = %eax + %edx*4
        movl %edx,%eax
        ret
        .ident  "GCC: (GNU) 2.7.2.1"

It could have got rid of the 2nd movl by storing directly in %eax but that
is by the by. The processor can calculate an address formed from
a base plus a scaled index (plus a constant offset). lea instead of
accessing the memory location stores the resulting address (which is
just a value) in a register. However it works for any numbers, not just
valid addresses.

However as you can see the compiler is quite capable of making this
optimisation itself.

--
-----------------------------------------


-----------------------------------------



Mon, 08 Jul 2002 03:00:00 GMT  
 How to optimize execution speed of code ??



:>
:>:> I always heard, "don't multiply by 13! shift it twice and add it twice!
:>:> it still takes 9 cycles less!"
:>
:>: On what? For example on a PPro/II/III/Celeron a full-blown integer
:>: multiply takes 4 clock cycles and can run in parallel with other operations
:>: (even other multiply operations through pipelining).
:>: Probably the fastest way to multiply by 13 on these processors would be
:>
:>:   tmp = value*2 + value;
:>:   result = tmp*4 + value;
:>
:>: which can be implemented as 2 single clock lea instructions.
:>
:>Load effective address?  OK, you've got me curious...

: Yes. Consider

: int mul13(int value)
: {
:     return value*13;
: }

: Now gcc -O -fomit-frame-pointer -S mul13.c gives (irrelevant parts
: trimmed)

: mul13:
:       movl 4(%esp),%eax
:       leal (%eax,%eax,2),%edx     ; %edx = %eax + %eax*2
:       leal (%eax,%edx,4),%edx     ; %edx = %eax + %edx*4
:       movl %edx,%eax
:       ret
:       .ident  "GCC: (GNU) 2.7.2.1"

Cool! Thanks!  Visual C++ (6.0) compiles it that way also.

-- Wetboy



Thu, 11 Jul 2002 03:00:00 GMT  
 
 [ 15 post ] 

 Relevant Pages 

1. Bad Address from Speed Optimized Code

2. How to speed optimize c/c++ code

3. Problems with for speed optimized code

4. Problems with for speed optimized code

5. how to optimize this code for speed

6. Execution time bottleneck: How to speed up execution?

7. Optimizing random number generator (was Re: Optimizing code for tests)

8. Throwing exceptions in ATL dll with optimize for SPEED causes Access Violation

9. Speed/safety optimizing recomendation.

10. Problem with Optimize for speed in VC++ 5.0

11. Optimizing C compiler(speed)

12. Optimizing speed using inline assembly fails - Why?

 

 
Powered by phpBB® Forum Software