Need clarification on unformatted IO 

I am trying to read an unformatted sequential binary file generated
by someone else's code.

In his code, he has something like:

do j = 1, total
write(12) (buffer(i), i = 1, total2)
enddo

Here's my question:

When I try to read in the content of the file with the following code,
it crashes (segmentation fault):

do i = 1, total
read(12) differ_buffer(1:total2)
end do

However, if I use the following code, I can complete the read call:

do i = 1, total
read(12) (differ_buffer(j), j = 1, total2)
end do

From what I've read about Fortran 90, don't the following code
fragments do the same thing?

differ_buffer(A:B)

and

do i = A,B
differ_buffer(i)
enddo

and

differ_buffer(i), i = A,B



Fri, 18 Jul 2008 23:11:47 GMT  

Quote:

> I am trying to read an unformatted sequential binary file generated
> by someone else's code.
> In his code, he has something like:
> do j = 1, total
> write(12) (buffer(i), i = 1, total2)
> enddo
> Here's my question:
> When I try to read in the content of the file with the following code,
> it crashes (segmentation fault):
> do i = 1, total
> read(12) differ_buffer(1:total2)
> end do
> However, if I use the following code, I can complete the read call:
> do i = 1, total
> read(12) (differ_buffer(j), j = 1, total2)
> end do

I would have thought so, but my usual rule is that the READ
should look just like the WRITE, except for the obvious changes.

Quote:
> From what I've read about Fortran 90, don't the following code
> fragments do the same thing?
> differ_buffer(A:B)
> and
> do i = A,B
> differ_buffer(i)
> enddo

You mean:

DO I=A,B
READ(12) BUFFER(I)
ENDDO

No, that would be different.  Each READ or WRITE statement processes
one record, so this loop reads B-A+1 records rather than one.

Quote:
> and
> differ_buffer(i), i = A,B

-- glen


Sat, 19 Jul 2008 11:55:10 GMT  

Quote:

> When I try to read in the content of the file with the following code,
> it crashes (segmentation fault):

> do i = 1, total
> read(12) differ_buffer(1:total2)
> end do

> However, if I use the following code, I can complete the read call:

> do i = 1, total
> read(12) (differ_buffer(j), j = 1, total2)
> end do

Those should be equivalent in terms of the standard if you have
accurately reported the circumstances and not left out some other
factor. It is vaguely possible, if total2 is large, that one form might
use more temporary space than the other, triggering a crash if you ran
out of resources. I'm not sure how likely I'd guess that to be, but it
is at least a possibility.

However, the "fragments" you quote below just confuse the matter and
make me wonder whether the report is entirely accurate. These are not
meaningful code fragments as you have written them. An I/O list makes no
sense standing on its own; it *MUST* be part of an I/O statement. And if
I'm reading what you are trying to say, then it is wrong. No, an implied
DO is not the same as a real DO loop. The above two
code fragments do mean the same thing, but the ones in your elaboration
below, once completed, would not.
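
To make the distinction concrete, here is a minimal sketch. It is not code from the thread: the program name, the array sizes, and the use of a scratch file on unit 12 are assumptions chosen for illustration. It writes three records, reads them back with an array section and with an implied DO (both consume one record per READ), and then shows that a real DO loop around READ consumes one record per iteration.

```fortran
program record_demo
  implicit none
  integer, parameter :: total = 3, total2 = 4   ! illustrative sizes only
  real :: buffer(total2), differ_buffer(total2)
  integer :: i, j, ios, nread

  ! Write TOTAL records of TOTAL2 values each, in the style of the original code.
  open(12, status='scratch', form='unformatted', access='sequential')
  do j = 1, total
     buffer = real(j)
     write(12) (buffer(i), i = 1, total2)
  end do

  ! Form 1: array section. Each READ consumes one whole record.
  rewind(12)
  do i = 1, total
     read(12) differ_buffer(1:total2)
  end do
  print *, 'array section, last record:', differ_buffer

  ! Form 2: implied DO. Same I/O list, still one record per READ.
  rewind(12)
  do i = 1, total
     read(12) (differ_buffer(j), j = 1, total2)
  end do
  print *, 'implied DO, last record:   ', differ_buffer

  ! Form 3: a real DO loop around READ. Every READ starts a new record and
  ! takes only its first value. With only TOTAL = 3 records in the file,
  ! the fourth READ hits end of file.
  rewind(12)
  nread = 0
  do j = 1, total2
     read(12, iostat=ios) differ_buffer(j)
     if (ios /= 0) then
        print *, 'READ number', j, 'failed: no more records'
        exit
     end if
     nread = nread + 1
  end do
  print *, 'DO loop of READs got:      ', differ_buffer(1:nread)

  close(12)
end program record_demo
```

Forms 1 and 2 should print the same values. Form 3 picks up the first value of each record and then runs out of records, which is why it is not equivalent to the other two.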

Quote:
> From what I've read about Fortran 90, don't the following code
> fragments do the same thing?

> differ_buffer(A:B)

> and

> do i = A,B
> differ_buffer(i)
> enddo

> and

> differ_buffer(i), i = A,B

--
Richard Maine                    | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle           |  -- Mark Twain


Sat, 19 Jul 2008 13:51:55 GMT  
Thank you for your help,

I had confused ordinary array operations with I/O operations.

I wonder why Fortran puts record information in binary files; I am more
familiar with the C-style binary file format.



Sat, 19 Jul 2008 22:19:11 GMT  

Quote:

> I wonder why Fortran puts record information in binary files; I am more
> familiar with the C-style binary file format.

Fortran doesn't have "binary" files. In Fortran, they are called
"unformatted" files. Using the term "binary" for them is a C'ism, and a
bit of an inaccurate one, as they don't directly map to C "binary"
files, as you have noted. Some compilers have a nonstandard extension
referred to as "binary", which causes extra confusion when someone
refers to Fortran's unformatted files using the same term.

As to why records are useful - that's a long discussion. I will note
that f2003 supports both record files and C-like stream files.
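
For readers coming from C, a minimal sketch of the f2003 stream form may help. The program name and the file name stream_demo.bin are hypothetical, and the size reported by INQUIRE is in file storage units, which are bytes on most (but not necessarily all) systems.

```fortran
program stream_demo
  implicit none
  integer, parameter :: n = 4
  real :: a(n), b(n)
  integer :: nunits

  a = [1.0, 2.0, 3.0, 4.0]

  ! ACCESS='STREAM' writes the values with no record structure around them.
  open(12, file='stream_demo.bin', form='unformatted', access='stream', &
       status='replace')
  write(12) a
  close(12)

  ! SIZE= reports the file size in file storage units (usually bytes).
  inquire(file='stream_demo.bin', size=nunits)
  print *, 'file size in file storage units:', nunits

  ! There are no record boundaries, so a second READ simply continues
  ! where the first one stopped.
  open(12, file='stream_demo.bin', form='unformatted', access='stream', &
       status='old')
  read(12) b(1:2)
  read(12) b(3:4)
  close(12, status='delete')

  print *, b
end program stream_demo
```

With sequential access the two READs above would each have consumed a whole record. With stream access they split a single WRITE between them, which is the behaviour a C programmer expects from fread.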

P.S. Then there is the quibble that base 2 has nothing to do with
so-called "binary" files. Yes, they are composed of bits in all actual
implementations, but then so are text files, so "base 2" isn't really a
useful distinction.

--
Richard Maine                     | Good judgment comes from experience;
email: my first.last at org.domain| experience comes from bad judgment.
org: nasa, domain: gov            |       -- Mark Twain



Sat, 19 Jul 2008 23:59:40 GMT  

Quote:

>> I wonder why Fortran puts record information in binary files; I am more
>> familiar with the C-style binary file format.
> Fortran doesn't have "binary" files. In Fortran, they are called
> "unformatted" files. Using the term "binary" for them is a C'ism, and a
> bit of an inaccurate one, as they don't directly map to C "binary"
> files, as you have noted. Some compilers have a nonstandard extension
> referred to as "binary", which causes extra confusion when someone
> refers to Fortran's unformatted files using the same term.

(snip)

There is a long history, from even before the beginning of C,
of using "binary" for non-human readable data.

The character code used on the IBM 704 is BCDIC, and characters
are called BCD characters.  That probably makes less sense than
calling non-text data binary, but it seems to be what IBM did.

With that history, and the 704 being the first machine to
run Fortran, the term makes some sense.  As you indicate,
Fortran unformatted files are a subset of "binary" files.

IBM has a long history of record oriented file systems, so
it isn't surprising that Fortran followed that path.

Quote:
> P.S. Then there is the quibble that base 2 has nothing to do with
> so-called "binary" files. Yes, they are composed of bits in all actual
> implementations, but then so are text files, so "base 2" isn't really a
> useful distinction.

Most likely the term would still be used describing non-text data
on a machine with a decimal ALU.  As I understand it, some machines
even used some form of decimal coding for addresses.

-- glen



Sun, 20 Jul 2008 01:47:38 GMT  
An "unformatted sequential binary file" is probably not what you
think it is.  Fortran sequential I/O is record oriented.  Each write
statement generates one complete record.  This is true of both
formatted and unformatted files.  In typical implementations, each
unformatted record carries control bytes (usually its length) at the
beginning and end, while formatted records are plain text lines
delimited by newline characters.  Formatted records contain character
data; unformatted records contain "binary" data.  To get a C-style
binary stream (no record descriptor bytes of any kind), Fortran 2003
provides ACCESS='STREAM'; before that, most compilers offered some
non-standard option such as "stream" or "binary", or one could use
direct access with RECL=1.

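The record overhead can be observed directly. The following is a rough sketch under stated assumptions: the program name and the file name marker_demo.dat are hypothetical, STORAGE_SIZE requires a Fortran 2008 compiler, and the file storage unit is assumed to be one byte. The size and layout of the control bytes are implementation dependent, so the numbers printed will vary between compilers.

```fortran
program marker_demo
  implicit none
  integer, parameter :: nrec = 3, n = 4   ! illustrative sizes only
  real :: x(n)
  integer :: j, fsize, payload

  x = 0.0

  ! Write NREC sequential unformatted records of N reals each.
  open(12, file='marker_demo.dat', form='unformatted', access='sequential', &
       status='replace')
  do j = 1, nrec
     write(12) x
  end do
  close(12)

  ! Compare the file size with the size of the data alone.
  ! Assumption: the file storage unit is one byte (8 bits).
  inquire(file='marker_demo.dat', size=fsize)
  payload = nrec * n * storage_size(x) / 8

  print *, 'file size:          ', fsize
  print *, 'data payload:       ', payload
  print *, 'overhead per record:', (fsize - payload) / nrec

  ! Clean up the demonstration file.
  open(12, file='marker_demo.dat', form='unformatted', access='sequential', &
       status='old')
  close(12, status='delete')
end program marker_demo
```

A nonzero overhead per record is the space taken by the record control bytes. A C program reading such a file has to skip them, which is why the file does not look like a plain C binary dump.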

Sun, 20 Jul 2008 02:57:59 GMT  

Quote:
>The character code used on the IBM 704 is BCDIC, and characters
>are called BCD characters.

Binary Coded Decimal, a.k.a., BCD, is just that.  Decimal that is binary encoded.  At that
time there were TRUE decimal machines, as well as true binary and those that used binary coded
decimal.   These days there are no true decimal machines (to speak of) and binary coded decimal
is the closest we get to decimal.

BCDIC which stands for BCD Interchange Code (or at least it used to) was an extension to BCD
allowing other characters than just the decimal digits to be encoded in binary in a manner
that was a consistent extension of BCD.  Note the shift in thinking, from decimal numbers
to decimal digits, to characters for decimal digits, which are then generalized.  The only
relation between BCDIC and BCD is that the characters for '0' through '9' in BCDIC have
the same binary encoding as the decimal digits 0 .. 9 had in BCD.  But BCD is a numeric
representation, whereas BCDIC is a character encoding.  

As for "binary" for file structures that are not messed with by the RTL, I think the
"cooked" and "uncooked" terminology should have become more widespread.  "Stream" is
certainly better than "binary" but still not very descriptive to my mind.



Sun, 20 Jul 2008 03:47:13 GMT  

Quote:

> There is a long history, from even before the beginning of C,
> of using "binary" for non-human readable data.

> The character code used on the IBM 704 is BCDIC, and characters
> are called BCD characters.

Except that

1. BCD data is the human-readable stuff (more or less). Anyway, it is
the character data, which is about as close as you can get to human
readable until you put it on ink or a display screen - and BCD is what
would be sent to that printer or display screen. BCD data would count
more as the text files instead of the non-text ones. (Yes, you can put
text data in a non-text file, but that's a special case just like
writing Fortran character data to an unformatted file.)

2. It stands for binary-coded decimal, where the binary part does indeed
mean base 2, so this doesn't seem to me like a strange different meaning
of the word binary. Perhaps someone later came along and misused the
term, based on a misunderstanding of what this one meant, but I do not
accept this as an illustration of "binary" meaning something other than
base 2.

--
Richard Maine                     | Good judgment comes from experience;
email: my first.last at org.domain| experience comes from bad judgment.
org: nasa, domain: gov            |       -- Mark Twain



Sun, 20 Jul 2008 03:59:01 GMT  

Quote:


>>There is a long history, from even before the beginning of C,
>>of using "binary" for non-human readable data.
>>The character code used on the IBM 704 is BCDIC, and characters
>>are called BCD characters.

IBM 704 Fortran has the READ INPUT TAPE and WRITE OUTPUT TAPE
statements, similar to today's formatted I/O.

"The READ INPUT TAPE statement causes the object program to read
BCD information from tape unit i, where i=1, 2, ..., 10. Record
after record is brought in, in accordance with the FORMAT statement,
until the complete list has been placed in storage."

Note the use of BCD where TEXT might be a more appropriate description
today.

Then there are the READ TAPE and WRITE TAPE statements, similar to
today's unformatted I/O.

"The READ TAPE statement causes the object program to read binary
information from tape unit i, where i=1, 2, ..., 10. Only one record
is read, and it will be completely read only if the list contains as
many words as the record.  The tape, however, always moves all the
way to the next record."

"If the list is longer than the record, the object program will
stop with a Read-Write Check, and the program will not be able
to continue."

What is now called formatted and unformatted used to be BCD and
binary, BCD being a six bit character code, not a four bit
representation of a decimal digit.  In the section on formatted
I/O they describe: "Thus a unit record may be ... A BCD tape
record with a maximum of 120 characters."

As I understand it, the 704 could read directly from cards and
write directly to a printer, but it was common to copy cards to
tape before running a program and print from tape after.

Quote:
> Except that
> 1. BCD data is the human-readable stuff (more or less). Anyway, it is
> the character data, which is about as close as you can get to human
> readable until you put it on ink or a display screen - and BCD is what
> would be sent to that printer or display screen. BCD data would count
> more as the text files instead of the non-text ones. (Yes, you can put
> text data in a non-text file, but that's a special case just like
> writing Fortran character data to an unformatted file.)
> 2. It stands for binary-coded decimal, where the binary part does indeed
> mean base 2, so this doesn't seem to me like a strange different meaning
> of the word binary. Perhaps someone later came along and misused the
> term, based on a misunderstanding of what this one meant, but I do not
> accept this as an illustration of "binary" meaning something other than
> base 2.

Personally, I would use binary when all possible bit combinations
are allowed.  In BCD arithmetic only 0000 through 1001 are allowed.
In a unix text file, a line cannot contain X'0A' other than as
a line terminating character, so only 255 different characters
are allowed within a line.

-- glen



Sun, 20 Jul 2008 18:17:45 GMT  

Quote:
>BCD being a six bit character code, not a four bit
>representation of a decimal digit.

No, the 6 bit code was BCDIC, BCD is the four bit binary representation of a decimal digit.

You have found convincing evidence that people were sloppy about their terminology over 50 years ago,
proving that it is NOT a recent phenomenon.  But when it was extended to an 8 bit character code it
was labelled EBCDIC (Extended BCDIC) -- it wasn't called EBCD.  Just 'cause someone sloppily calls
"A" "B" doesn't make an "A" into a "B".  BCDIC and BCD are NOT the same, even though many people
likely called BCDIC characters by the label "BCD" -- it does NOT create an identity.  They are
related, but they are NOT the same.



Sun, 20 Jul 2008 22:15:49 GMT  

Quote:

>>BCD being a six bit character code, not a four bit
>>representation of a decimal digit.
> No, the 6 bit code was BCDIC, BCD is the four bit binary representation of a decimal digit.
> You have found convincing evidence that people were sloppy about their terminology over 50 years ago,
> proving that it is NOT a recent phenomenon.  But when it was extended to an 8 bit character code it
> was labelled EBCDIC (Extended BCDIC) -- it wasn't called EBCD.  Just 'cause someone sloppily calls
> "A" "B" doesn't make an "A" into a "B".  BCDIC and BCD are NOT the same, even though many people
> likely called BCDIC characters by the label "BCD" -- it does NOT create an identity.  They are
> related, but they are NOT the same.

Throughout the 704 Fortran manual they consistently use BCD when
referring to the character set, and never BCDIC.  Maybe the name was
changed later.

In the "TABLE OF FORTRAN CHARACTERS" they describe the coding for 48
characters, in terms of card punches, BCD tape, and 704 storage.
With 48 characters in six bits not all bit patterns are allowed, so
it doesn't follow my definition of binary.  There is no indication
of what might happen if any other codes were used.

It seems that there are two minus signs, only one of which is allowed
in Fortran source programs, the other is produced by object program
output and both are allowed as input data.

-- glen



Mon, 21 Jul 2008 16:40:07 GMT  

Quote:
> Throughout the 704 Fortran manual they consistently use BCD when
>  referring to the character set, and never BCDIC.  Maybe the name was
>  changed later.

It was still called BCD in the MoA for the IBM 7094 II.  All 64 bit
patterns were available, but whether a printing character showed up
for a given bit pattern depended on the printer used.  Chain printers
could take different chains that printed different characters for a
given bit pattern.

Bob Corbett



Mon, 21 Jul 2008 17:56:17 GMT  

Quote:
>Throughout the 704 Fortran manual they consistently use BCD when
>referring to the character set, and never BCDIC.  Maybe the name was
>changed later.

All right, I stand corrected.  I still say it was sloppiness.  Using the same term for
two different things is so, so, so ... NATURAL LANGUAGE.  Did they think they were
writing in English???


Mon, 21 Jul 2008 21:54:27 GMT  



Quote:
> >Throughout the 704 Fortran manual they consistently use BCD when
> >referring to the character set, and never BCDIC.  Maybe the name was
> >changed later.

> All right, I stand corrected.  I still say it was sloppiness.  Using the same term for
> two different things is so, so, so ... NATURAL LANGUAGE.  Did they think they were
> writing in English???

A recent (UK) TV programme about the OED, tracing the origin of certain
words, pointed out that the little word SET had several thousand different
meanings!
e.g. to set (as in concrete)
the lair of the badger
a tennis set
a collection of similar objects
etc

I think this is what is known as being economic with words - why have
several thousand when one will do :-)

Les



Mon, 21 Jul 2008 23:10:13 GMT  
 