Faster faster 
 Faster faster

I have just become involved in a large, complex, academically written
project which makes extensive use of vector math. It currently makes
very little use of the STL or the standard C++ library.
I have been tasked with getting this thing to perform considerably better.
From my short time with the code I believe I *can* add some structure
- add simple things like classes!!!
They have implemented their own array and vector classes.

MSVC V6.0

Considering the matrix math currently takes the form of:

Matrix_Multiply( double* a , int arow , int acol , double* c , int
brow , int bcol, double* result );

I *think* I can increase performance with something like:

class Cmatrix : public std::valarray<double>
{
public:
  size_t m_row;
  size_t m_col;

};

And then implement:

STLport (www.stlport.com) and MTL
(http://www.osl.iu.edu/research/mtl/)

I am concerned that the overhead of using an encapsulated type vs a
fundamental type will negate any potential performance gain.

It would certainly be a 'nicer' way of doing it.  But nice is not
always faster.
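
To make the idea concrete, here is a rough, untested sketch of the kind of
wrapper I have in mind. The "at" and "Multiply" helpers are just placeholder
names of mine; the multiply simply forwards to the existing C-style routine
through the valarray's buffer, so the wrapper itself should add no
per-element overhead.

#include <valarray>
#include <cstddef>

// Existing C-style routine (signature as in the current code base).
void Matrix_Multiply( double* a , int arow , int acol , double* c ,
                      int brow , int bcol , double* result );

class Cmatrix : public std::valarray<double>
{
public:
  Cmatrix( std::size_t rows , std::size_t cols )
    : std::valarray<double>( rows * cols ), m_row( rows ), m_col( cols ) {}

  std::size_t m_row;
  std::size_t m_col;

  // Placeholder helper: row-major element access.
  double& at( std::size_t r , std::size_t c )
    { return (*this)[ r * m_col + c ]; }
};

// Placeholder helper: forwards to the existing routine. Taking &a[0]
// hands over the raw, contiguous valarray buffer.
void Multiply( Cmatrix& a , Cmatrix& b , Cmatrix& result )
{
  Matrix_Multiply( &a[0] , (int)a.m_row , (int)a.m_col ,
                   &b[0] , (int)b.m_row , (int)b.m_col ,
                   &result[0] );
}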

Does anyone have any insight into STLport within MSVC?
How would using MTL compare to operations on the fundamental type (double)?

--
M.



Wed, 02 Mar 2005 22:10:21 GMT  
 Faster faster


Quote:
> I have just become involved in a large, complex, academically written
> project which makes extensive use of vector math. It currently makes
> very little use of the STL or the standard C++ library.
> I have been tasked with getting this thing to perform considerably better.

I assume by "better" you mean "faster", based on the
subject of your post. If faster is not your only criterion,
then say so.

Based on that assumption, I have a question for you:
What were the results of profiling the application to
determine exactly where the delays exist?

If you did not profile the application, do so.

Quote:
> From my short time with the code I believe I *can* add some structure
> - add simple things like classes!!!

Adding classes may or may not add structure to the
code.

If the code is not well structured, feel free to restructure
it. But start from a design, then add classes; don't start
with the assumption that more classes == better structure.

Quote:
> They have implemented their own array and vector classes.

This could be good or bad. Taken as a fact in isolation,
it means very little.

Quote:
> MSVC V6.0

> Considering the matrix math currently takes the form of:

> Matrix_Multiply( double* a , int arow , int acol , double* c , int
> brow , int bcol, double* result );

> I *think* I can increase performance with something like:

> class Cmatrix : public std::valarray<double>
> {
> public:
>   size_t m_row;
>   size_t m_col;
> };

> And then implement:

> stlport (www.stlport.com) AND MTL
> (http://www.osl.iu.edu/research/mtl/)

Do _not_ assume this will increase speed (although it
may well improve structure). VC 6's template support
is less than stellar, and may well _reduce_ performance
by confusing the optimizer.

Quote:
> I am concerned that the overhead of using an encapsulated type vs a
> fundamental type will negate any potential performance gain.

You have given no reason to believe that any of your
changes will improve performance to begin with.

Quote:
> It would certainly be a 'nicer' way of doing it.  But nice is not
> always faster.

Very true.

Quote:
> Does anyone have any insight into STLport within MSVC?
> How would using MTL compare to operations on the fundamental type (double)?

I can't comment on this. Anyone?

I _can_ state that the best way to determine where
performance can be improved is _not_ to ask for
anecdotal evidence in newsgroups, but to profile the
application.

Good luck.



Thu, 03 Mar 2005 09:00:00 GMT  
 Faster faster

Quote:

> Do _not_ assume this will increase speed (although it
> may well improve structure). VC 6's template support
> is less than stellar, and may well _reduce_ performance
> by confusing the optimizer.

I have run benchmarks on VC6 that showed (at least under the conditions
of the benchmark) that running a std:: algorithm with custom functors
over a vector<> was exactly as fast as a hand-written for loop with
inline operations over an array.

That's not to say it will necessarily optimize everything perfectly, but
it does do a pretty good job of flattening the STL abstractions down.
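
For illustration, the comparison had roughly the following shape (a
reconstruction, not the exact benchmark code; Scale, scale_stl and
scale_raw are just names I am making up here):

#include <vector>
#include <algorithm>
#include <cstddef>

// Functor applied through std::transform.
struct Scale
{
  double factor;
  explicit Scale( double f ) : factor( f ) {}
  double operator()( double x ) const { return x * factor; }
};

// STL version: algorithm + functor over a vector<double>.
void scale_stl( std::vector<double>& v , double f )
{
  std::transform( v.begin() , v.end() , v.begin() , Scale( f ) );
}

// Hand-written version: plain loop over a raw array.
void scale_raw( double* p , std::size_t n , double f )
{
  for ( std::size_t i = 0 ; i < n ; ++i )
    p[i] *= f;
}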

Quote:
> I _can_ state that the best way to determine where
> performance can be improved is _not_ to ask for
> anecdotal evidence in newsgroups, but to profile the
> application.

Totally agreed.  Profile, profile, profile.

Ken



Thu, 03 Mar 2005 17:17:30 GMT  
 Faster faster


Quote:
> I have just become involved in a large, complex, academically written
> project which makes extensive use of vector math. It currently makes
> very little use of the STL or the standard C++ library.
> I have been tasked with getting this thing to perform considerably better.
> From my short time with the code I believe I *can* add some structure
> - add simple things like classes!!!
> They have implemented their own array and vector classes.

> MSVC V6.0

> Considering the matrix math currently takes the form of:

> Matrix_Multiply( double* a , int arow , int acol , double* c , int
> brow , int bcol, double* result );

> I *think* I can increase performance with something like:

> class Cmatrix : public std::valarray<double>
> {
> public:
>   size_t m_row;
>   size_t m_col;
> };

> And then implement:

> STLport (www.stlport.com) and MTL
> (http://www.osl.iu.edu/research/mtl/)

> I am concerned that the overhead of using an encapsulated type vs a
> fundamental type will negate any potential performance gain.

> It would certainly be a 'nicer' way of doing it.  But nice is not
> always faster.

> Does anyone have any insight into STLport within MSVC?
> How would using MTL compare to operations on the fundamental type (double)?

There is a single way to accurately assess performance
of a body of code:

Measure.

-Mike



Wed, 02 Mar 2005 22:51:37 GMT  
 Faster faster


Quote:
>> Does anyone have any insight into STLport within MSVC?
>> How would using MTL compare to operations on the fundamental type (double)?

>I can't comment on this. Anyone?

Libraries like MTL and Blitz++ are generally faster than hand-writing
the code yourself because they generate code that eliminates many of
the temporaries that a hand-coded version would contain for brevity.

e.g.

A = NM + B

A C API might do it as:

temp = MatrixMultiply(n, m, columns, rows);
a = MatrixAdd(temp, b, columns, rows);

whereas Blitz++ (using expression templates) would make it:

a = MatrixMultiplyAndAdd(n, m, b, ...);

It also does this for much more complex expressions.

So whether Blitz++/MTL performs better than the current code depends
on just how well optimized the current code is, but chances are they
will be faster.
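
To make the saving concrete, here is a rough illustrative sketch (not
taken from either library) of the two-pass version with a temporary
versus a single fused pass:

#include <vector>

// Two passes and a full temporary matrix, as the C API shape above implies.
void multiply_then_add( const double* n , const double* m , const double* b ,
                        double* a , int rows , int inner , int cols )
{
    std::vector<double> temp( rows * cols );
    for ( int i = 0 ; i < rows ; ++i )
        for ( int j = 0 ; j < cols ; ++j )
        {
            double sum = 0.0;
            for ( int k = 0 ; k < inner ; ++k )
                sum += n[ i * inner + k ] * m[ k * cols + j ];
            temp[ i * cols + j ] = sum;
        }
    for ( int i = 0 ; i < rows * cols ; ++i )
        a[i] = temp[i] + b[i];
}

// One fused pass: the temporary and the extra sweep over memory disappear.
void multiply_add_fused( const double* n , const double* m , const double* b ,
                         double* a , int rows , int inner , int cols )
{
    for ( int i = 0 ; i < rows ; ++i )
        for ( int j = 0 ; j < cols ; ++j )
        {
            double sum = 0.0;
            for ( int k = 0 ; k < inner ; ++k )
                sum += n[ i * inner + k ] * m[ k * cols + j ];
            a[ i * cols + j ] = sum + b[ i * cols + j ];
        }
}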

Tom



Fri, 04 Mar 2005 18:36:48 GMT  
 Faster faster


Quote:


> >> Does anyone have any insight into STLport within MSVC?
> >> How would using MTL compare to operations on the fundamental type (double)?

> >I can't comment on this. Anyone?

> Libraries like MTL and Blitz++ are generally faster than hand-writing
> the code yourself because they generate code that eliminates many of
> the temporaries that a hand-coded version would contain for brevity.

<snip>

Quote:
> So whether Blitz++/MTL performs better than the current code depends
> on just how well optimized the current code is, but chances are they
> will be faster.

Not to get too far off on a tangent, but that assumes the
most significant bottleneck is in the matrix math operations.

Without profiling the application, no one can say if that
is the case. That's the problem with optimizing via
newsgroup collaboration: you _have_ to answer every
question with the included assumption "all things being
equal", but how often are all things equal?

Just my $.02



Fri, 04 Mar 2005 18:55:43 GMT  
 Faster faster

Quote:

> I have just become involved in a large, complex, academically written
> project which makes extensive use of vector math. It currently makes
> very little use of the STL or the standard C++ library.
> I have been tasked with getting this thing to perform considerably better.
> From my short time with the code I believe I *can* add some structure
> - add simple things like classes!!!
> They have implemented their own array and vector classes.

> MSVC V6.0

> Considering the matrix math currently takes the form of:

> Matrix_Multiply( double* a , int arow , int acol , double* c , int
> brow , int bcol, double* result );

> I *think* I can increase performance with something like:

> class Cmatrix : public std::valarray<double>
> {
> public:
>   size_t m_row;
>   size_t m_col;
> };

> And then implement:

> STLport (www.stlport.com) and MTL
> (http://www.osl.iu.edu/research/mtl/)

> I am concerned that the overhead of using an encapsulated type vs a
> fundamental type will negate any potential performance gain.

> It would certainly be a 'nicer' way of doing it.  But nice is not
> always faster.

> Does anyone have any insight into STLport within MSVC?
> How would using MTL compare to operations on the fundamental type (double)?

In response to the other reply.

Identify, Quantify and Eliminate.

My Cbenchmark class has been used extensively.
It is a very high-precision timer.
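
(For reference, a timer of that kind on Win32 can be built around
QueryPerformanceCounter. The sketch below is only illustrative of the
approach; it is not the actual Cbenchmark class, and HighResTimer is a
made-up name.)

#include <windows.h>

// Illustrative high-resolution timer, not the actual Cbenchmark class.
class HighResTimer
{
public:
  HighResTimer() { QueryPerformanceFrequency( &m_freq ); start(); }

  void start() { QueryPerformanceCounter( &m_start ); }

  // Seconds elapsed since the last call to start().
  double elapsed() const
  {
    LARGE_INTEGER now;
    QueryPerformanceCounter( &now );
    return double( now.QuadPart - m_start.QuadPart ) / double( m_freq.QuadPart );
  }

private:
  LARGE_INTEGER m_freq;
  LARGE_INTEGER m_start;
};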

I have already been through the profiling steps and changed the areas
which were giving me most concern.  I am now at the point where a
sweeping change *may*, in theory, have a dramatic effect - which was the
point of the original (unclear) post.  I was wondering what experiences
people have had with matrix math using longs vs. doubles, with STLport
vs. the stock STL within MSVC, and with MTL.

Thanks

--
M.



Fri, 04 Mar 2005 21:25:15 GMT  
 Faster faster

Quote:

> I have just become involved in a large, complex, academically written
> project which makes extensive use of vector math. It currently makes
> very little use of the STL or the standard C++ library.
> I have been tasked with getting this thing to perform considerably better.
> From my short time with the code I believe I *can* add some structure
> - add simple things like classes!!!
> They have implemented their own array and vector classes.

> MSVC V6.0

> Considering the matrix math currently takes the form of:

> Matrix_Multiply( double* a , int arow , int acol , double* c , int
> brow , int bcol, double* result );

> I *think* I can increase performance with something like:

> class Cmatrix : public std::valarray<double>
> {
> public:
>   size_t m_row;
>   size_t m_col;
> };

It very likely depends on the implementation of Matrix_Multiply.  I
would not trust myself to try to beat something like LAPACK, which
has had years of optimization applied to it (the one time I
benchmarked my own code against LAPACK I was 15% slower).
So if it's performance you want, I think you should look at linking
against a good math library.

If it would improve readability and maintenance of your project, it
may also make sense to encapsulate your matrix in a class.  Aside
from the usual warnings about inheriting from standard containers,
I'm not sure that you gain anything in this case.  Probably the most
useful thing you can do is provide an interface that lets you
access the underlying data storage for use with a C-based API.
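
For example, a minimal sketch of that kind of interface (the class and
member names here are just placeholders, and row-major storage is assumed):

#include <vector>
#include <cstddef>

class Matrix
{
public:
  Matrix( std::size_t rows , std::size_t cols )
    : m_rows( rows ), m_cols( cols ), m_data( rows * cols ) {}

  std::size_t rows() const { return m_rows; }
  std::size_t cols() const { return m_cols; }

  // Row-major element access.
  double& operator()( std::size_t r , std::size_t c )
    { return m_data[ r * m_cols + c ]; }

  // Raw access to the underlying storage for C-based APIs.
  double*       data()       { return &m_data[0]; }
  const double* data() const { return &m_data[0]; }

private:
  std::size_t m_rows;
  std::size_t m_cols;
  std::vector<double> m_data;
};

A call into the existing C-style code then stays as cheap as before, e.g.
Matrix_Multiply( a.data() , (int)a.rows() , (int)a.cols() ,
                 b.data() , (int)b.rows() , (int)b.cols() , r.data() );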

Please note that I have no experience whatsoever with C/C++-based
math libraries.



Fri, 04 Mar 2005 21:46:19 GMT  
 Faster faster

Quote:

> Libraries like MTL and Blitz++
> are generally faster than hand-writing the code yourself
> because they generate code that eliminates many of the temporaries
> that a hand-coded version would contain for brevity.
> e.g.

>     A = NM + B

> A C API might do it as:

>     temp = MatrixMultiply(n, m, columns, rows);
>     a = MatrixAdd(temp, b, columns, rows);

> whereas Blitz++ (using expression templates) would make it:

>     a = MatrixMultiplyAndAdd(n, m, b, ...);

> It also does this for much more complex expressions.

This is called "loop fusion" probably because of the presumption
that the matrix-matrix multiply and the matrix-matrix add
will be performed by some kind of [nested] loop construct.

Quote:
> So whether Blitz++/MTL performs better than the current code
> depends on just how well optimized the current code is
> but chances are they will be faster.

No.
A C (or Fortran) API will usually provide additional routines
with three or more operands that fuse the loops together -
for example, daxpy and dgemm from the BLAS library.
The C++ computer programming language simply allows you
to define expression classes so that you can write

    A = matmul(N, M) + B;

and the C++ compiler automatically substitutes

    MatrixMatrixMultiplyAdd(A, N, M, B);

This doesn't require any templates but the library developer
is obliged to hand-code all of the useful fused operators.
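
A minimal sketch of such an expression class, with placeholder names
(no templates involved; a small proxy object simply defers the work to
the hand-coded fused routine, and Matrix stands for whatever matrix
class is actually in use):

struct MatMulExpr;
struct MatMulAddExpr;

class Matrix
{
public:
  // ... storage, dimensions and element access as discussed elsewhere ...
  Matrix& operator=( const MatMulAddExpr& e );   // defined below
};

// The fused operator the library developer hand-codes (declaration only).
void MatrixMatrixMultiplyAdd( Matrix& a , const Matrix& n ,
                              const Matrix& m , const Matrix& b );

// matmul(N, M) builds a proxy that only remembers its operands.
struct MatMulExpr
{
  const Matrix& n;
  const Matrix& m;
  MatMulExpr( const Matrix& n_ , const Matrix& m_ ) : n( n_ ), m( m_ ) {}
};

inline MatMulExpr matmul( const Matrix& n , const Matrix& m )
{
  return MatMulExpr( n , m );
}

// matmul(N, M) + B builds a second proxy.
struct MatMulAddExpr
{
  const Matrix& n;
  const Matrix& m;
  const Matrix& b;
  MatMulAddExpr( const MatMulExpr& e , const Matrix& b_ )
    : n( e.n ), m( e.m ), b( b_ ) {}
};

inline MatMulAddExpr operator+( const MatMulExpr& e , const Matrix& b )
{
  return MatMulAddExpr( e , b );
}

// Assigning the proxy makes the single fused call, so
//     A = matmul(N, M) + B;
// turns into MatrixMatrixMultiplyAdd(A, N, M, B).
inline Matrix& Matrix::operator=( const MatMulAddExpr& e )
{
  MatrixMatrixMultiplyAdd( *this , e.n , e.m , e.b );
  return *this;
}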

Expression class templates are simply a way to generate
the fused operators automatically at compile time.
They don't really help speed up the computation.
They are probably overkill as there is only a very short list
of truly useful fused operators and numerical libraries
have always provided them.

What this all boils down to is that expression classes and
expression class templates really just provide a kind of "syntactic
sugar".
But this syntactic sugar is important because it allows
numerical application programmers to write code
that is much easier to read, understand and maintain.



Sat, 05 Mar 2005 01:41:47 GMT  
 
 [ 9 posts ] 
