Why is my f90 program slower than its f77 version? 

Hello, everyone,

Weeks ago I decided to transform my f77 programs into f90 programs.
Unfortunately, I found that my f90 program runs much slower than the previous
f77 version for the same problem size. I enclose a simplified copy of
my program below. Could someone help me clarify what the reason behind
that might be? I used common blocks for data sharing before.

Thanks a lot!

Jie Xu

Encl:
---------------
module moducom
 real, dimension(:,:), allocatable:: a,b
 integer:: m,n
end module moducom

program main
 use moducom
 integer:: i,j
   m=10; n=10
   allocate (a(m,n),b(m,n))
   call update_A
   call update_B
   do i=1,m
    do j=1,n
    print *,a(i,j),b(i,j)
    end do
   end do
end program main

subroutine update_A
 use moducom
 integer:: i,j
   do i=1,m
    do j=1,n
    a(i,j)=0.5**i+0.3**j
    end do
   end do
end subroutine update_A

subroutine update_B
 use moducom
 integer:: i,j
   do i=1,m
    do j=1,n
    b(i,j)=0.5**i-0.43*j
    end do
   end do
end subroutine update_B



Fri, 22 Aug 1997 02:05:38 GMT  

Quote:

>Hello, everyone,

>Weeks ago I decided to transform my f77 programs into f90 programs.
>Unfortunately, I found that my f90 program runs much slower than the previous
>f77 version for the same problem size. I enclose a simplified copy of
>my program below. Could someone help me clarify what the reason behind
>that might be? I used common blocks for data sharing before.

>[...most of sample code snipped...]
>   do i=1,m
>    do j=1,n
>    b(i,j)=0.5**i-0.43*j
>    end do
>   end do

The only possibility I see is that all the nested loops in this code have
the last (rightmost) subscript varying most rapidly. Fortran stores arrays
in column-major order, so for best memory access the first subscript
should vary fastest, i.e. the DO over i should be the inner loop. With
the small arrays in this sample (m and n both 10) this shouldn't make much
difference, but on large arrays it can. Your f77 compiler may be
sophisticated enough to interchange the loops, whereas your f90 compiler
may not (yet) be smart enough to do this sort of optimization (or perhaps
you need to specify some optimization switch when compiling). Have you
tried comparing the speed of your f77 and f90 codes when each is compiled
by your f90 compiler?
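To make the loop-order point concrete, here is a minimal sketch (not the poster's code) that fills a large array with the update_A formula using both loop nestings and times each one with CPU_TIME. The routine names, the 2000 x 2000 size and the use of CPU_TIME are illustrative assumptions, and ERROR STOP requires a Fortran 2008 compiler such as a current gfortran. The printed timings depend on the compiler, its options and the hardware. At higher optimization levels the compiler may interchange the first loop nest itself, which is exactly the effect described above.

```fortran
! Sketch: the update_A formula with two loop nestings.
! Routine names are hypothetical; the array size is an arbitrary "large" choice.
module loop_order_demo
  implicit none
contains
  ! Original nesting: inner loop over the second subscript (strided access)
  subroutine fill_row_order(a)
    real, intent(out) :: a(:,:)
    integer :: i, j
    do i = 1, size(a, 1)
      do j = 1, size(a, 2)
        a(i,j) = 0.5**i + 0.3**j
      end do
    end do
  end subroutine fill_row_order

  ! Interchanged nesting: inner loop over the first subscript (unit stride,
  ! matching Fortran's column-major storage)
  subroutine fill_column_order(a)
    real, intent(out) :: a(:,:)
    integer :: i, j
    do j = 1, size(a, 2)
      do i = 1, size(a, 1)
        a(i,j) = 0.5**i + 0.3**j
      end do
    end do
  end subroutine fill_column_order
end module loop_order_demo

program main
  use loop_order_demo
  implicit none
  integer, parameter :: m = 2000, n = 2000
  real, allocatable :: a1(:,:), a2(:,:)
  real :: t0, t1, t2

  allocate (a1(m,n), a2(m,n))
  call cpu_time(t0)
  call fill_row_order(a1)
  call cpu_time(t1)
  call fill_column_order(a2)
  call cpu_time(t2)

  ! Every element is computed independently, so interchanging the loops
  ! changes only the memory access pattern, not the results.
  if (any(a1 /= a2)) error stop 'results differ'
  print '(a,f8.3,a)', 'inner loop over j: ', t1 - t0, ' s'
  print '(a,f8.3,a)', 'inner loop over i: ', t2 - t1, ' s'
  print '(a)', 'interchanged loops give identical results'
end program main
```

Only the last line of output is deterministic. If both timings come out about the same, a likely explanation is that the compiler has already interchanged the first loop nest.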

--

Stanford Linear Accelerator Center
MS 97; P.O. Box 4349; Stanford, CA 94309



Tue, 26 Aug 1997 03:25:30 GMT  

|>Hello, everyone,
|>
|>Weeks ago I decided to transform my f77 programs into f90 programs.
|>Unfortunately, I found that my f90 program runs much slower than the previous

1) As far as I can see, there is no reason this program should run
   any faster than the f77 version. Apart from memory allocation, you
   do not seem to be using any f90-specific features.

2) Which f90 compiler do you use? I'd say there is a fairly good chance
   you are running the NAG implementation. This is (probably) an f90-to-C
   translator, which is not expected to produce good code, at least not
   the one I have tried on my Ultrix box (I've never tried a "real" f90
   compiler).
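As an illustration of the first point only, here is a sketch of what the two update routines might look like with f90 array features (array constructors, SPREAD and whole-array arithmetic), written as a single standalone program with a fixed m = n = 10. The names p5, p3 and lin are hypothetical, 0.43*j is kept exactly as posted, and ERROR STOP assumes a Fortran 2008 compiler. Whether this is faster than the DO loops depends on the compiler: each power is evaluated only once per row or column instead of once per element, but SPREAD may create temporary arrays.

```fortran
! Sketch: update_A and update_B rewritten with Fortran 90 array features.
! Variable names are hypothetical; m = n = 10 as in the posted program.
program array_syntax_demo
  implicit none
  integer, parameter :: m = 10, n = 10
  real :: a(m,n), b(m,n)          ! array-syntax results
  real :: a_ref(m,n), b_ref(m,n)  ! DO-loop results, for comparison
  real :: p5(m), p3(n), lin(n)
  integer :: i, j

  ! Each power is evaluated once (m + n evaluations), not once per element
  p5  = 0.5 ** (/ (i, i = 1, m) /)   ! p5(i)  = 0.5**i
  p3  = 0.3 ** (/ (j, j = 1, n) /)   ! p3(j)  = 0.3**j
  lin = 0.43 * (/ (j, j = 1, n) /)   ! lin(j) = 0.43*j, as written in update_B

  ! SPREAD(v, 2, n) copies a length-m vector into each of n columns;
  ! SPREAD(v, 1, m) copies a length-n vector into each of m rows.
  a = spread(p5, 2, n) + spread(p3, 1, m)
  b = spread(p5, 2, n) - spread(lin, 1, m)

  ! Reference values from the original loop bodies
  do j = 1, n
    do i = 1, m
      a_ref(i,j) = 0.5**i + 0.3**j
      b_ref(i,j) = 0.5**i - 0.43*j
    end do
  end do

  ! Allow for last-bit differences (e.g. compile-time vs run-time powers)
  if (maxval(abs(a - a_ref)) > 1.0e-6 .or. maxval(abs(b - b_ref)) > 1.0e-6) then
    error stop 'array-syntax results differ'
  end if
  print '(a)', 'array syntax matches the DO loops'
end program array_syntax_demo
```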

 +------------ 950102 (jaf): Linuxers do it with pleasure -------------+
 | Research Engineer, SINTEF Applied Thermodynamics and Fluid Dynamics |
 | Phone : +47 73 59 68 90                     Fax  : +47 73 59 35 80  |
 | Mail  : SINTEF Varme- og strømningslære, 7034 Trondheim, Norway     |
 +---------------------------------------------------------------------+



Mon, 25 Aug 1997 20:08:42 GMT  

There are three points about this request:

1) I suspect this program spends most of its time printing. The
DO loop around the print statement should be replaced by an
implied-DO in the print statement itself.

2) All the loops are nested in the reverse of the order that guarantees
best performance: the inner loop should run over the first subscript.

3) You do not state which compilers and options are being used.

I also suspect you meant '**', not '*', for the last operator in update_B.
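A sketch of points 1 and 2 applied together, assuming m = n = 10 as in the posted program. For brevity it merges update_A and update_B into one loop nest (a change from the original structure), keeps 0.43*j as posted rather than applying the suspected '**', and uses an arbitrarily chosen es15.6 output format.

```fortran
! Sketch: points 1 and 2 applied to the posted program.
! Assumptions: m = n = 10, both updates merged into one loop nest,
! 0.43*j kept as posted, es15.6 chosen arbitrarily.
program print_demo
  implicit none
  integer, parameter :: m = 10, n = 10
  real :: a(m,n), b(m,n)
  integer :: i, j

  ! Point 2: inner loop over the first subscript (column-major order)
  do j = 1, n
    do i = 1, m
      a(i,j) = 0.5**i + 0.3**j
      b(i,j) = 0.5**i - 0.43*j
    end do
  end do

  ! Point 1: one PRINT with a nested implied-DO instead of m*n PRINTs.
  ! The format is reused for every pair (format reversion), so each
  ! a(i,j), b(i,j) pair still goes on its own line, in the same
  ! i-outer, j-inner order as the original output loop.
  print '(2es15.6)', ((a(i,j), b(i,j), j = 1, n), i = 1, m)
end program print_demo
```

The output order deliberately keeps i outermost, because here the loop order determines what is printed, not just how fast it runs.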

                             Mike Metcalf



Tue, 26 Aug 1997 16:09:57 GMT  
 