Using derived types in reading binary data... 

Hey there,

This may not be a hoary chestnut of the calibre of "how do I write numbers to a character
string" but it may be close.

Let's say I have a binary file (direct or sequential, doesn't matter apart from the OPEN
statement, but I'll use direct in this case) that contains records which consist of a
variety of different data types, such as byte, single precision float, long integer,
double precision float, and short integer. (The sizes in bytes are the usual 1, 4, 4, 8,
2).

To read this data I declare a data structure like so:

  INTEGER, PARAMETER :: Byte  = SELECTED_INT_KIND(1)
  INTEGER, PARAMETER :: Short = SELECTED_INT_KIND(4)
  INTEGER, PARAMETER :: Long  = SELECTED_INT_KIND(8)
  INTEGER, PARAMETER :: Single = SELECTED_REAL_KIND(6)
  INTEGER, PARAMETER :: Double = SELECTED_REAL_KIND(15)

  INTEGER, PARAMETER :: recl_in_bytes = 19

  TYPE :: data_structure
    INTEGER( Byte )  :: a
    REAL( Single )   :: b
    INTEGER( Long )  :: c
    REAL( Double )   :: d
    INTEGER( Short ) :: e
  END TYPE data_structure

  TYPE( data_structure ) :: my_data
  INTEGER :: io_status

Now, if I read the data as follows:

  ! -- Direct access
  OPEN( 10, FILE   = 'testdir.bin', &
            STATUS = 'OLD', &
            FORM   = 'UNFORMATTED', &
            ACCESS = 'DIRECT', &
            RECL   = recl_in_bytes, &
            IOSTAT = io_status )

  READ( 10, REC    = 1, &
            IOSTAT = io_status ) my_data%a,my_data%b,my_data%c,my_data%d,my_data%e

This works fine all the time, different platforms, compilers etc.

But when I do the following:

  READ( 10, REC    = 1, &
            IOSTAT = io_status ) my_data

the data may or may not be o.k., depending on the platform and compiler (and regardless of
whether a SEQUENCE statement is used in the data structure definition). On an SGI, an IBM,
and linux with pgf90, the straight data structure read works fine. On a Sun, the read
works but the structure elements are full of garbage, which does make sense once you take
the alignment into account and shift all the bits in the correct manner. Thus, it is
obvious that the data structure is being aligned on certain byte boundaries for efficiency
or what not, but why does that mess up the latter data read? It appears the
representation of the structure in memory (aligned on certain byte boundaries) is used as
a template for reading the data even though the template may not "fit" the file data
structure (which is not aligned on any boundaries).

Assuming that this behaviour satisfies the standard (does it?) this seems like an excellent
way of rendering derived types pretty useless for simply reading complicated data records
(e.g. satellite data) in a portable manner.

The above example is simple, but I have some code that uses ripper derived types full of
floats, shorts, longs, etc that just happen to work because people who use the code are on
an SGI or an IBM. If somebody uses a Sun machine, kablooie.

How does one get around this?

cheers,

paulv

--
Paul van Delst           A little learning is a dangerous thing;

Ph: (301)763-8000 x7274  There shallow draughts intoxicate the brain,
Fax:(301)763-8545        And drinking largely sobers us again.



Sat, 23 Aug 2003 04:36:32 GMT  

Quote:
>   READ( 10, REC    = 1, &
>             IOSTAT = io_status ) my_data%a,my_data%b,my_data%c,my_data%d,my_data%e
> This works fine all the time, different platforms, compilers etc.
> But when I do the following:
>   READ( 10, REC    = 1, &
>             IOSTAT = io_status ) my_data
> the data may or may not be o.k.
...
> Assuming that this behaviour satisfies the standard (does it?)...

Yes.  What you are running into is that for unformatted I/O the standard does
not require an object of derived type to be the same thing as a list of the
components of the type.

The idea here is that unformatted I/O is supposed to be fast and just do
a simple copy of the data between memory and the file.  So if the derived
type has padding or other "funniness" in its memory layout, you'll end
up with that same funniness on the file when you write the derived type
object (and conversely, the same will be expected when you read it).
But if you do the components individually, then you won't see that
padding.  (Of course, if one of the components is itself of derived type,
then you have the same issue at that nested level).
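
One way to check whether a particular compiler pads the type is to ask it how long a record it needs for each form of the I/O list. The sketch below is only an illustration, not something from the original posts: it uses the standard INQUIRE( IOLENGTH = ... ) statement on the type from the question, and the program and variable names (check_padding, len_whole, len_parts) are made up here. The lengths come back in the same processor-dependent units as RECL, so they are not necessarily bytes, and I would expect but cannot promise that every compiler reports padding in the whole-object figure.

```fortran
PROGRAM check_padding
  IMPLICIT NONE

  INTEGER, PARAMETER :: Byte   = SELECTED_INT_KIND(1)
  INTEGER, PARAMETER :: Short  = SELECTED_INT_KIND(4)
  INTEGER, PARAMETER :: Long   = SELECTED_INT_KIND(8)
  INTEGER, PARAMETER :: Single = SELECTED_REAL_KIND(6)
  INTEGER, PARAMETER :: Double = SELECTED_REAL_KIND(15)

  TYPE :: data_structure
    INTEGER( Byte )  :: a
    REAL( Single )   :: b
    INTEGER( Long )  :: c
    REAL( Double )   :: d
    INTEGER( Short ) :: e
  END TYPE data_structure

  TYPE( data_structure ) :: my_data
  INTEGER :: len_whole, len_parts

  ! Record length needed to transfer the object as a single list item
  INQUIRE( IOLENGTH = len_whole ) my_data

  ! Record length needed to transfer the components one by one
  INQUIRE( IOLENGTH = len_parts ) my_data%a, my_data%b, my_data%c, &
                                  my_data%d, my_data%e

  PRINT *, 'whole object   :', len_whole
  PRINT *, 'component list :', len_parts

  IF ( len_whole /= len_parts ) THEN
    PRINT *, 'Lengths differ: a whole-object READ will not match packed data'
  END IF
END PROGRAM check_padding
```

If the two numbers differ, the compiler is expecting padding in the file when the object is transferred as a whole.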

So if you want to read it as individual components, you need to write it
that way.  Or if you want to read it as a single derived type object, you
better write it that way.  You can't guarantee that both ways will work.
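
If the file was written as packed components, the component-by-component READ can be hidden in one routine so the rest of the code still passes the derived type around. This is a minimal sketch under my own assumptions: the names record_io, record_length, read_record and lun are hypothetical, and the file name and unit number are simply taken from the question. The record length is obtained from INQUIRE( IOLENGTH = ... ) rather than hard-wired to 19, since RECL units are processor dependent.

```fortran
MODULE record_io
  IMPLICIT NONE
  PRIVATE
  PUBLIC :: data_structure, record_length, read_record

  INTEGER, PARAMETER :: Byte   = SELECTED_INT_KIND(1)
  INTEGER, PARAMETER :: Short  = SELECTED_INT_KIND(4)
  INTEGER, PARAMETER :: Long   = SELECTED_INT_KIND(8)
  INTEGER, PARAMETER :: Single = SELECTED_REAL_KIND(6)
  INTEGER, PARAMETER :: Double = SELECTED_REAL_KIND(15)

  TYPE :: data_structure
    INTEGER( Byte )  :: a
    REAL( Single )   :: b
    INTEGER( Long )  :: c
    REAL( Double )   :: d
    INTEGER( Short ) :: e
  END TYPE data_structure

CONTAINS

  ! Record length, in the processor's RECL units, of the component list
  FUNCTION record_length() RESULT( length )
    INTEGER :: length
    TYPE( data_structure ) :: rec
    INQUIRE( IOLENGTH = length ) rec%a, rec%b, rec%c, rec%d, rec%e
  END FUNCTION record_length

  ! Read one record from an open direct-access unit, component by component
  SUBROUTINE read_record( lun, rec_num, rec, io_status )
    INTEGER,                INTENT( IN )  :: lun, rec_num
    TYPE( data_structure ), INTENT( OUT ) :: rec
    INTEGER,                INTENT( OUT ) :: io_status

    READ( lun, REC = rec_num, IOSTAT = io_status ) &
      rec%a, rec%b, rec%c, rec%d, rec%e
  END SUBROUTINE read_record

END MODULE record_io

PROGRAM read_test
  USE record_io
  IMPLICIT NONE

  TYPE( data_structure ) :: my_data
  INTEGER :: io_status, recl_len

  ! Evaluate the length before the OPEN: a function that performs I/O
  ! should not be referenced inside another I/O statement
  recl_len = record_length()

  OPEN( 10, FILE   = 'testdir.bin', &
            STATUS = 'OLD', &
            FORM   = 'UNFORMATTED', &
            ACCESS = 'DIRECT', &
            RECL   = recl_len, &
            IOSTAT = io_status )
  IF ( io_status /= 0 ) STOP 'Error opening testdir.bin'

  CALL read_record( 10, 1, my_data, io_status )
  IF ( io_status /= 0 ) STOP 'Error reading record 1'

  PRINT *, my_data%a, my_data%b, my_data%c, my_data%d, my_data%e
  CLOSE( 10 )
END PROGRAM read_test
```

A matching write_record that lists the components in the same order would keep the two sides consistent. Note that this only deals with padding; byte order differences between platforms are a separate problem.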

--
Richard Maine                       |  Good judgement comes from experience;
email: my last name at host.domain  |  experience comes from bad judgement.
host: altair, domain: dfrc.nasa.gov |        -- Mark Twain



Sat, 23 Aug 2003 06:40:26 GMT  
I think most compilers have switches that allow for alternatives on
packing the structures that <might> help, but that still isn't a guaranteed
way to get around it...



Sat, 23 Aug 2003 06:53:01 GMT  

  TYPE :: data_structure
    INTEGER( Byte )  :: a
    REAL( Single )   :: b
    INTEGER( Long )  :: c
    REAL( Double )   :: d
    INTEGER( Short ) :: e
  END TYPE data_structure
==============================================================

In addition to the comments you've already had, a further one is
to note how this is reminiscent of similar difficulties of
alignment encountered in COMMONs. Is it possible for you to write
the data with the components in decreasing order of size? That
way the need to pad might be eliminated completely. That's no
more reliable than what you have now, just more likely to work on
more processors.
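
As a rough illustration of the reordering idea (my own sketch, with a hypothetical type name reordered_structure), the components below go 8, 4, 4, 2, 1 bytes, so each one naturally falls on a multiple of its own size and no padding should be needed between them. A compiler may still round the total size up (for example to a multiple of 8), so it is worth comparing the two IOLENGTH values before relying on a whole-object read.

```fortran
PROGRAM reordered_layout
  IMPLICIT NONE

  INTEGER, PARAMETER :: Byte   = SELECTED_INT_KIND(1)
  INTEGER, PARAMETER :: Short  = SELECTED_INT_KIND(4)
  INTEGER, PARAMETER :: Long   = SELECTED_INT_KIND(8)
  INTEGER, PARAMETER :: Single = SELECTED_REAL_KIND(6)
  INTEGER, PARAMETER :: Double = SELECTED_REAL_KIND(15)

  ! Same components as data_structure, largest first
  TYPE :: reordered_structure
    REAL( Double )   :: d
    REAL( Single )   :: b
    INTEGER( Long )  :: c
    INTEGER( Short ) :: e
    INTEGER( Byte )  :: a
  END TYPE reordered_structure

  TYPE( reordered_structure ) :: my_data
  INTEGER :: len_whole, len_parts

  ! Lengths are in the processor's RECL units, not necessarily bytes
  INQUIRE( IOLENGTH = len_whole ) my_data
  INQUIRE( IOLENGTH = len_parts ) my_data%d, my_data%b, my_data%c, &
                                  my_data%e, my_data%a

  PRINT *, 'whole object   :', len_whole
  PRINT *, 'component list :', len_parts
END PROGRAM reordered_layout
```

The file has to be written in the same component order, of course.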

Regards,

Mike Metcalf

--



Sat, 23 Aug 2003 16:43:49 GMT  

Quote:

> So if you want to read it as individual components, you need to write it
> that way.  Or if you want to read it as a single derived type object, you
> better write it that way.  You can't guarantee that both ways will work.

To summarize: be consistent, and it will work on any one platform. Don't
expect, when writing the derived data type as a whole, to be able to do a
binary FTP to another platform and be able to read it there; if you are
writing components instead, the chances that it will work are much better.

In any case, a record/structure layout that doesn't start with the largest
units and go down to the smaller ones is somewhat broken by design - "somewhat"
because reality sometimes intrudes on theory.

        Jan



Sat, 23 Aug 2003 16:55:59 GMT  
 