IVF 10: IO speed slower with full optimizations than with a debug build 

Thought I'd ask here first (before going to Intel) to see if anybody
has seen this odd behavior with IVF 10.0.  I create a fully optimized
version of my application with:

/QxN /O3 /Qipo /Qprec-div- /Qprec-sqrt- /fp:fast=2 /Qparallel /Qcomplex-limited-range /Qopenmp

I also create a separate debug version with:

/traceback /QxW /CU /Qtrapuv /CB /Od

As expected, most operations in the application run significantly
faster with the optimized version.  However, one operation dominated
by READing a large file is actually 2x SLOWER on the optimized
version than on the debug version.

Anybody got any ideas what could be going on here?  One thing: I've
always noticed that IVF's file IO is much slower than Lahey's LF95 (IVF
is faster in everything else).

Al Greynolds
www.ruda.com



Tue, 09 Mar 2010 01:48:20 GMT  


Quote:
> As expected most operations in the application run significantly
> faster with the optimized version.  However, one operation dominated
> by READing a large file is actually 2x SLOWER on the optimized
> version than on the debug version.

        I think you'd need to show us the code before we could begin to make
even educated guesses.  There are _many_ ways to read a file!

--
Ivan Reid, School of Engineering & Design, _____________  CMS Collaboration,

        KotPT -- "for stupidity above and beyond the call of duty".



Tue, 09 Mar 2010 04:13:14 GMT  

Quote:
> /traceback /QxW /CU /Qtrapuv /CB /Od

> As expected most operations in the application run significantly
> faster with the optimized version.  However, one operation dominated
> by READing a large file is actually 2x SLOWER on the optimized
> version than on the debug version.

Really the best thing to do is to contact support and supply an
example of your program.  As a side comment, you may as well remove
/Qtrapuv as it doesn't do anything useful.

Steve



Tue, 09 Mar 2010 04:37:57 GMT  


Here's a test program and timings:

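program iospeed
  implicit none
  character(1836) :: line   ! some files can have lines this long
  integer :: i, l, m, n     ! iostat, line length, longest line, line count
  call getarg(1,line)       ! file name from the command line (common extension)
  open(1,file=trim(line),action='read',status='old')
  m=0; n=0
  do
    ! Non-standard version (Q returns the number of characters left in the record):
    ! read(1,'(q,a)',iostat=i) l,line(:l); if (i/=0) exit
    ! Standard version:
    read(1,'(a)',iostat=i) line; if (i/=0) exit; l=len_trim(line)
    m=max(l,m); n=n+1
  enddo
  close(1)
  write(*,*) n,m
end program iospeed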

Elapsed times in seconds on a 150 MB file (3 million lines, each up to
73 characters long):

                Optimized   Debug
Non-standard       10.9       3.1
Standard           23.2      14.9
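
For anyone who wants to reproduce numbers like these, here is a minimal sketch of a self-timing variant of the standard read loop.  It is not the program that produced the table above: the name time_read, unit 10, and the use of get_command_argument and system_clock are assumptions chosen for portability, and the measured interval covers only the read loop.

```fortran
program time_read
  use iso_fortran_env, only: int64
  implicit none
  character(len=1836) :: line     ! same buffer length as the test program
  character(len=1024) :: fname    ! hypothetical limit on the file name length
  integer :: ios, n, m
  integer(int64) :: t0, t1, rate

  call get_command_argument(1, fname)
  open(10, file=trim(fname), action='read', status='old', iostat=ios)
  if (ios /= 0) stop 'cannot open input file'

  n = 0; m = 0
  call system_clock(t0, rate)     ! start tick and ticks per second
  do
    read(10, '(a)', iostat=ios) line
    if (ios /= 0) exit            ! end of file or read error
    m = max(m, len_trim(line))
    n = n + 1
  end do
  call system_clock(t1)           ! stop tick
  close(10)

  write(*,*) 'lines:', n, ' longest:', m
  write(*,'(a,f10.3)') ' read loop seconds:', real(t1 - t0) / real(rate)
end program time_read
```

Building this once with and once without a given option and comparing the printed time separates the read loop from process start-up and from the rest of the application.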



Tue, 09 Mar 2010 05:54:09 GMT  

Quote:
> Elapsed times in seconds on a 150 MB file (3 million lines, each up to
> 73 characters long):

>                 Optimized   Debug
> Non-standard       10.9       3.1
> Standard           23.2      14.9

Interesting.  You could simply use the variable line instead of
line(:l).  It won't change the meaning.  I'll play with this and see
what I can find.

Steve



Tue, 09 Mar 2010 07:15:50 GMT  

Quote:

> > Elapsed times in seconds on a 150 MB file (3 million lines, each up to
> > 73 characters long):

> >                 Optimized   Debug
> > Non-standard       10.9       3.1
> > Standard           23.2      14.9

> Interesting.  You could simply use the variable line instead of
> line(:l).  It won't change the meaning.  I'll play with this and see
> what I can find.

> Steve

Actually, replacing "line(:l)" with just "line" produces a significant
slowdown (I suspect due to having to blank-fill line(l+1:1836),
especially when l<<1836).
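
The padding itself is easy to see in a small self-contained sketch (a hypothetical pad_demo program using a scratch file, not part of the application): a 5-character record read with '(a)' into the 1836-character variable overwrites every remaining character with blanks.  It assumes the default PAD='YES' mode for formatted input.

```fortran
program pad_demo
  implicit none
  character(len=1836) :: line
  integer :: ios

  line = repeat('x', len(line))        ! fill with non-blank data first
  open(10, status='scratch', action='readwrite')
  write(10, '(a)') 'hello'             ! a 5-character record
  rewind(10)
  read(10, '(a)', iostat=ios) line     ! A editing blank-pads the rest of line
  close(10)

  write(*,*) 'iostat         =', ios
  write(*,*) 'len_trim(line) =', len_trim(line)
  write(*,*) 'chars padded   =', len(line) - len_trim(line)
  write(*,*) 'tail all blank =', line(6:) == ' '
end program pad_demo
```

The run should report a trimmed length of 5 and 1831 padded characters, which is the per-record fill work that reading into line(:l) avoids.  It says nothing about how much time that fill costs in any particular runtime.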

Al



Tue, 09 Mar 2010 08:40:24 GMT  
Quote:



>>> Elapsed times in seconds on a 150 MB file (3 million lines, each up to
>>> 73 characters long):
>>>                 Optimized   Debug
>>> Non-standard       10.9       3.1
>>> Standard           23.2      14.9
>> Interesting.  You could simply use the variable line instead of
>> line(:l).  It won't change the meaning.  I'll play with this and see
>> what I can find.

>> Steve

> Actually, replacing "line(:l)" with just "line" produces a significant
> slowdown (I suspect due to having to initialize line(l+1:1836)
> especially when l<<1836)

> Al

You forgot the smiley.....:)


Tue, 09 Mar 2010 11:32:52 GMT  

Quote:
> Actually, replacing "line(:l)" with just "line" produces a significant
> slowdown (I suspect due to having to initialize line(l+1:1836)
> especially when l<<1836)

Yes, you are right regarding my suggested change.  However, so far I
have been unable to reproduce the behavior you are describing.  Please
do file a report with Intel Premier Support and we will be glad to
look at it in more detail.

Steve



Wed, 10 Mar 2010 04:58:37 GMT  

Quote:

> > Actually, replacing "line(:l)" with just "line" produces a significant
> > slowdown (I suspect due to having to initialize line(l+1:1836)
> > especially when l<<1836)

> Yes, you are right regarding my suggested change.  However, so far I
> have been unable to reproduce the behavior you are describing.  Please
> do file a report with Intel Premier Support and we will be glad to
> look at it in more detail.

> Steve


To be sure I ran the cases on both the original Dell Pentium-4 Xeon
workstation and an Apple Core 2 Duo laptop (both running XP-Pro).  I
got the same odd behavior.


I tracked it down to the /Qparallel option on the optimized version.
If I remove it, the optimized version is now slightly faster than the
debug, as expected.  Do you link in a different set of runtimes with
the /Qparallel option?

Al



Wed, 10 Mar 2010 06:56:36 GMT  

Quote:


> > > Actually, replacing "line(:l)" with just "line" produces a significant
> > > slowdown (I suspect due to having to initialize line(l+1:1836)
> > > especially when l<<1836)

> > Yes, you are right regarding my suggested change.  However, so far I
> > have been unable to reproduce the behavior you are describing.  Please
> > do file a report with Intel Premier Support and we will be glad to
> > look at it in more detail.

> > Steve


> To be sure I ran the cases on both the original Dell Pentium-4 Xeon
> workstation and an Apple Core 2 Duo laptop (both running XP-Pro).  I
> got the same odd behavior.


> I tracked it down to the /Qparallel option on the optimized version.
> If I remove it, the optimized version is now slightly faster than the
> debug, as expected.  Do you link in a different set of runtimes with
> the /Qparallel option?

> Al

The above also applies to the /Qopenmp option.  If I remove both the
/Qparallel and /Qopenmp options, one IO-bound part of my application
runs 3 times faster, but of course another computationally intensive
part that uses OpenMP directives now runs more than 2 times slower on
my 2-processor box.

Your multi-threaded runtime must not be as fast as your single-threaded
one at this particular IO.  Is there a way to build my application so that
some parts are compiled for OpenMP while other parts aren't?

Al



Wed, 10 Mar 2010 07:18:12 GMT  


Quote:



>> > > Actually, replacing "line(:l)" with just "line" produces a
>> > > significant
>> > > slowdown (I suspect due to having to initialize line(l+1:1836)
>> > > especially when l<<1836)

>> > Yes, you are right regarding my suggested change.  However, so far I
>> > have been unable to reproduce the behavior you are describing.  Please
>> > do file a report with Intel Premier Support and we will be glad to
>> > look at it in more detail.

>> > Steve


>> To be sure I ran the cases on both the original Dell Pentium-4 Xeon
>> workstation and an Apple Core 2 Duo laptop (both running XP-Pro).  I
>> got the same odd behavior.


>> I tracked it down to the /Qparallel option on the optimized version.
>> If I remove it, the optimized version is now slightly faster than the
>> debug, as expected.  Do you link in a different set of runtimes with
>> the /Qparallel option?

>> Al

> The above also applies to the /Qopenmp option.  If I remove both the
> /Qparallel and /Qopenmp options, one IO-bound part of my application
> runs 3 times faster, but of course another computationally intensive
> part that uses OpenMP directives now runs more than 2 times slower on
> my 2-processor box.

> Your multi-threaded runtime must not be as fast as your single-threaded
> one at this particular IO.  Is there a way to build my application so that
> some parts are compiled for OpenMP while other parts aren't?

> Al

And what does the above have to do with fortran?
--
Wade Ward
"I apparently do have time to bleed.  Unusual luxury."


Wed, 10 Mar 2010 18:40:59 GMT  

Quote:
> Your multi-threaded runtime must not be as fast as your single-threaded
> at this particular IO.  Is there a way to build my application so that
> some parts are compiled for OpenMP while other parts aren't?

Ah, I should have asked you what you meant by "optimized".  The
thread-safe libraries clearly need to protect themselves against operations
in other threads.  This involves synchronization primitives that do
take extra time.  I suppose one option is to make sure that all your
I/O is done in a single thread and link against the non-thread-safe
libraries.
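
A minimal sketch of the first half of that idea: every I/O statement stays in the initial thread, and OpenMP is confined to a compute phase that touches only data already in memory.  The program name, the lens array, and the toy reduction are hypothetical stand-ins for the real application; which runtime libraries get linked is a build setting that the source code cannot control.

```fortran
program serial_io_parallel_compute
  implicit none
  character(len=1836) :: line
  character(len=1024) :: fname         ! hypothetical limit on the file name length
  integer, allocatable :: lens(:), tmp(:)
  integer :: ios, n, i
  integer(kind=selected_int_kind(15)) :: total

  call get_command_argument(1, fname)
  open(10, file=trim(fname), action='read', status='old', iostat=ios)
  if (ios /= 0) stop 'cannot open input file'

  ! Phase 1: all I/O happens here, outside any parallel region.
  allocate(lens(1024))
  n = 0
  do
    read(10, '(a)', iostat=ios) line
    if (ios /= 0) exit                 ! end of file or read error
    if (n == size(lens)) then          ! grow the buffer by doubling
      allocate(tmp(2*size(lens)))
      tmp(1:n) = lens
      call move_alloc(tmp, lens)
    end if
    n = n + 1
    lens(n) = len_trim(line)
  end do
  close(10)

  ! Phase 2: compute only; no I/O statements inside the parallel loop.
  total = 0
  !$omp parallel do reduction(+:total)
  do i = 1, n
    total = total + lens(i)
  end do
  !$omp end parallel do

  write(*,*) 'lines:', n, ' total characters:', total
end program serial_io_parallel_compute
```

Compiled without OpenMP support, the directive lines are ordinary comments and the program runs serially with the same result, which makes it easy to compare the two builds on the read phase alone.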

Please do submit a report to Intel Premier support and provide all of
the details that you have listed here.  I know there has been some
recent work done on optimizing threaded libraries which has not yet
been released.  I don't know whether your program would show any
improvement with this work, so please do submit your test case.

Out of curiosity: you said that I/O using Lahey was faster.  Was this
also with parallel?

Steve



Wed, 10 Mar 2010 20:39:55 GMT  

Quote:

> > Your multi-threaded runtime must not be as fast as your single-threaded
> > at this particular IO.  Is there a way to build my application so that
> > some parts are compiled for OpenMP while other parts aren't?

> Ah, I should have asked you what you meant by "optimized".  The thread
> safe libraries clearly need to protect themselves against operations
> in other threads.  This involves synchronization primitives that do
> take extra time.  I suppose one option is to make sure that all your
> I/O is done in a single thread and link against the non-thread safe
> libraries.

> Please do submit a report to Intel Premier support and provide all of
> the details that you have listed here.  I know there has been some
> recent work done on optimizing threaded libraries which has not yet
> been released.  I don't know whether your program would show any
> improvement with this work, so please do submit your test case.

> Out of curiosity: you said that I/O using Lahey was faster.  Was this
> also with parallel?

> Steve

Will do on the Intel Premier submission.  One last point: the exact
same application processing the exact same large file, but on a Dual
PowerMac G5 using the IBM XLF 8.1 compiler, sees virtually no speed
difference in the IO part whether compiled with or without OpenMP
(time is comparable to the non-OpenMP IVF executable on the Dell dual
Pentium-4 Xeon workstation).  So it's definitely possible to write
"fast" multi-threaded IO libraries.

Al



Wed, 10 Mar 2010 21:13:38 GMT  

Quote:
> Will do on the Intel Premier submission.  One last point: the exact
> same application processing the exact same large file, but on a Dual
> PowerMac G5 using the IBM XLF 8.1 compiler, sees virtually no speed
> difference in the IO part whether compiled with or without OpenMP
> (time is comparable to the non-OpenMP IVF executable on the Dell dual
> Pentium-4 Xeon workstation).  So it's definitely possible to write
> "fast" multi-threaded IO libraries.

But do you know if the IBM compiler actually has thread safe I/O
libraries?  Just because OpenMP is supported that doesn't mean that
their I/O library protects against I/O from multiple threads.

Steve



Wed, 10 Mar 2010 21:47:55 GMT  

Quote:

> > Will do on the Intel Premier submission.  One last point: the exact
> > same application processing the exact same large file, but on a Dual
> > PowerMac G5 using the IBM XLF 8.1 compiler, sees virtually no speed
> > difference in the IO part whether compiled with or without OpenMP
> > (time is comparable to the non-OpenMP IVF executable on the Dell dual
> > Pentium-4 Xeon workstation).  So it's definitely possible to write
> > "fast" multi-threaded IO libraries.

> But do you know if the IBM compiler actually has thread safe I/O
> libraries?  Just because OpenMP is supported that doesn't mean that
> their I/O library protects against I/O from multiple threads.

> Steve

I use xlf95_r for OpenMP compiling which according to the IBM docs
uses the "thread-safe" libraries.


Wed, 10 Mar 2010 22:09:12 GMT  
 