allocatrouble arrays
Author Message
allocatrouble arrays

Hello,
How to keep a dynamically sized array of 3d (2d,4d) vectors ?
Of course, the simplest way is to declare

real,dimension(:,:),allocatable:: vec(:,:)

The drawback is, that the compiler is not informed that the leading
dimension will always be 3,
and when I naively make some calculations with these vectors, like

vec(:,i) = vec(:,j) + vec(:,k)

the compiler can not directly see that this is a 3-cycle loop and it
has to generate code for a potentially long vectors !

So far, I only came up with 4 solutions:

1. Leave it as it is and hope the compiler will cope with it well.

2. Use explicit bounds in such cases
a)          vec(1:3,i) = vec(1:3,j) + vec(1:3,k)
or
b)          vec(1:3,i) = vec(:,j) + vec(:,k)

3. Use derived type "wrappers":
type vec3
real:: v(3)
end type
vec3,allocatable:: vec(:)
vec(i)%v = vec(j)%v + vec(k)%v

4. Sacrifice dynamicity (the F77 style)
real:: vec(3,maxvec)
and use 1.

Each solutions has its pluses and minuses:
1. Is convenient, but the overhead can be _serious_

2. A good compromise. There is a little typing convenience sacrificed.
The compiler still doesn't have the full information, as he doesn't
know that 1:3 is the full extent of the leading dimension. I supppose
there is some small overhead taht remains.

3. This gives the compiler all the information about the layout of the
array - blocks of 3 real numbers stored contiguously in memory.
Relatively convenient, but the _big_ drawback is
that sections like vec%v or vec(i:j)%v(1:2) are banned - it is no
longer possible to use the array as 2-dimensional matrix.

4. OK, but I want dynamic arrays

As 1. and 4. are extremal solutions, the only reasonable left are 2.
and 3. I tend to use 2. on simple cases like this one, but switch to 3.
when there are multiple leading fixed dimensions.
What do you think? Or is there another option?

If anyone is interested, I have implemented all 4 options for an
artificial problem to do some performance measures:
www.highegg.matfyz.cz/centers.tgz
On my old Celeron with Intel fortran (-O3 -xK -i-static), I got
Version 1: elapsed time:    6.242 seconds.
Version 2: elapsed time:    5.246 seconds.
Version 3: elapsed time:    5.928 seconds.
Version 4: elapsed time:    3.261 seconds.
on our school cluster, which is AMD and thus the Intel is unable to
vectorize, I get
(-O3 -i-static)
Version 1: elapsed time:    2.272 seconds.
Version 2: elapsed time:    0.808 seconds.
Version 3: elapsed time:    0.740 seconds.
Version 4: elapsed time:    0.764 seconds.

I would be grateful if someone could post in results on other machines.
Surely, It would be best if we were allowed to specify fixed dimensions
in deferred shape. I think this is still not possible in F2003. Can I
hope for F2008 or some TR?

Jaroslav Hajek
Faculty of Mathematics and Physics, Charles University, Prague
Aeronautical Research and Test Institute, Prague

Wed, 26 Nov 2008 02:09:00 GMT
allocatrouble arrays
I'm interested in the results, but getting :

Subscript 1 of NLST (value 4) is out of range (1:3)
Program terminated by fatal error
In CENTERS, line 21 of centers1.f90
Aborted

You might need to check the testcases again.

Cheers,

Joost

Wed, 26 Nov 2008 02:55:48 GMT
allocatrouble arrays

Quote:
> Subscript 1 of NLST (value 4) is out of range (1:3)
> Program terminated by fatal error
> In CENTERS, line 21 of centers1.f90
> Aborted

Fixed now, that's what happens if one uses numbers and _then_ switches
to constants :-)

It doesn't seem to have affected my timings, except case 2 on my old
machine got worse (but strange things happen here anyway :-)
Jaroslav

Wed, 26 Nov 2008 03:09:48 GMT
allocatrouble arrays
(snip)

Quote:
> The drawback is, that the compiler is not informed that the leading
> dimension will always be 3,
> and when I naively make some calculations with these vectors, like
> vec(:,i) = vec(:,j) + vec(:,k)
> the compiler can not directly see that this is a 3-cycle loop and it
> has to generate code for a potentially long vectors !

I think this was suggested earlier for F2008.

Quote:
> So far, I only came up with 4 solutions:

> 1. Leave it as it is and hope the compiler will cope with it well.

Branch prediction should predict the branch taken, so it shouldn't

Quote:
> 2. Use explicit bounds in such cases
> a)          vec(1:3,i) = vec(1:3,j) + vec(1:3,k)
> or
> b)          vec(1:3,i) = vec(:,j) + vec(:,k)
> 3. Use derived type "wrappers":
> type vec3
>   real:: v(3)
> end type
> vec3,allocatable:: vec(:)
> vec(i)%v = vec(j)%v + vec(k)%v

I suppose I like this one.  I believe Fortran doesn't (yet) allow
structure expressions, otherwise you could make the elements x, y, and
z.

(For comparison purposes only, PL/I allows structure expressions,
similar to array expressions, where operations are applied to all
elements of a structure.)

Quote:
> 4. Sacrifice dynamicity (the F77 style)
> real:: vec(3,maxvec)
> and use 1.

(snip of comparisons of the different methods)

If there were benchmarks where it mattered, compilers would probably
figure out how to do it.  For arrays not passed out to any subprograms,
the compiler could notice only constants in the ALLOCATE statements.

I would vote for adding structure expressions to Fortran, though.

-- glen

Wed, 26 Nov 2008 03:41:06 GMT
allocatrouble arrays

Quote:

> Branch prediction should predict the branch taken, so it shouldn't

Thanks for pointing me to branch prediction. I suppose that's why on
Intel the relative overhead was small compared to the AMD. Wow, I
didn't have a clue my old processor could do such things... I have to
value it more.

Quote:
> If there were benchmarks where it mattered, compilers would probably
> figure out how to do it.  For arrays not passed out to any subprograms,
> the compiler could notice only constants in the ALLOCATE statements.

I think that would be pretty tough. While I'm no expert in
optimization, I have a strong
feeling that specifying information known at compile time could only do
good.

Quote:
> I would vote for adding structure expressions to Fortran, though.

Me too. But I know I'm a bit too eager about new features like this,
presumably because I have not yet enough experience with programming to
If I would be the key designer of Fortran, it would soon end
overfeatured and overcomplex. And removing any feature from
standardized language is painful.
So I must admit that J3 is doing a good job being a bit conservative
(and the progress is still remarkable).
After all, I've heard that performance only matters in 10% of code, and
I can always
code these 10% in separate subroutines and pass the array as explicit
shape argument.

Regards,
Jaroslav Hajek

Wed, 26 Nov 2008 23:14:19 GMT
allocatrouble arrays

Quote:

>>3. Use derived type "wrappers":
>>type vec3
>>  real:: v(3)
>>end type
>>vec3,allocatable:: vec(:)
>>vec(i)%v = vec(j)%v + vec(k)%v

> I suppose I like this one.  I believe Fortran doesn't (yet) allow
> structure expressions, otherwise you could make the elements x, y, and
> z.

That's how I've done mine; the code to define an addition operator for
vector types is almost trivial:

interface operator(+)
end interface
type(vec3), intent in :: v1, v2
v3%x = v1%x + v2%x
v3%y = v1%y + v2%y
v3%z = v1%z + v2%z
end function

Admittedly, this does raise the possibility of having an extra function
dispatch involved in calculating the addition, which would make this
method slower unless the compiler is smart enough to inline the function.

- Brooks

--
The "bmoses-nospam" address is valid; no unmunging needed.

Thu, 27 Nov 2008 02:27:27 GMT

 Page 1 of 1 [ 6 post ]

Relevant Pages