When Is EOF not EOF? 
 When Is EOF not EOF?

I am using the familiar construct (lifted right out of Kernighan &
Pike):

    int c;
    FILE *fp;
    while ((c = getc(fp)) != EOF) {
        if (isascii(c) && (isprint(c) || c == '\n' || c == '\t' || c == ' '))
            putchar(c);
        else
            printf ("\\%03o", c);
    }

to make binary files readable on screen.  It works fine for many files.
However, when I try to use it with any Microsoft Word97 documents, it
craps out after returning the first 6 bytes.  I get:

    \320 \317 \021 \340 \241 \261

for any Word file that I pass through the program.  According to my hex
editor, the value of the 7th byte is "1A".  However, while running the
program with the debugger I get "-1" (EOF) as the return value for this
byte and the loop is exited.

Could someone please explain this behavior to me and possibly offer a
way to correct this?  I am using Visual C++, version 4.0 with Windows
95.  Thank you.



Mon, 21 May 2001 03:00:00 GMT  
 When Is EOF not EOF?

Quote:

>    int c;
>    FILE *fp;

You don't show how you open the file.  I suspect you did not open it
in binary mode.

Quote:
>    while ((c = getc(fp)) != EOF) {

[ snip ]

Quote:
>editor, the value of the 7th byte is "1A".  However, while running the
>program with the debugger I get "-1" (EOF) as the return value for this
>byte and the loop is exited.

Some DOS implementations of Standard C stdio routines will generate an
EOF upon encountering a ^Z when reading a text stream.
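
A small experiment along these lines, as a sketch only. The helper names write_sample and count_bytes are made up for illustration. The first writes three bytes with 0x1A in the middle; the second counts how many bytes getc() delivers for a given open mode. On systems where text and binary streams behave the same (most Unix-like systems) both modes should report 3. On a DOS or Windows library that treats ^Z as end of file in text mode, I would expect the "r" count to stop short.

```c
#include <stdio.h>

/* Write the bytes 'A', 0x1A (Ctrl-Z), 'B' to path in binary mode.
   Returns 0 on success, -1 on failure. */
int write_sample(const char *path)
{
    static const unsigned char data[] = { 'A', 0x1A, 'B' };
    FILE *fp = fopen(path, "wb");
    size_t n;

    if (fp == NULL)
        return -1;
    n = fwrite(data, 1, sizeof data, fp);
    if (fclose(fp) != 0 || n != sizeof data)
        return -1;
    return 0;
}

/* Count the bytes getc() returns before EOF when path is opened
   with the given mode ("r" or "rb").  Returns -1 if the open fails. */
long count_bytes(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    long n = 0;

    if (fp == NULL)
        return -1;
    while (getc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}
```

Call write_sample() once, then compare count_bytes(path, "r") with count_bytes(path, "rb") on the system in question.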

[ snip ]

--

http://www.*-*-*.com/ ~jxh/        Washington University in Saint Louis




Mon, 21 May 2001 03:00:00 GMT  
 When Is EOF not EOF?

Quote:

> According to my hex
> editor, the value of the 7th byte is "1A".  However, while running the
> program with the debugger I get "-1" (EOF) as the return value for this
> byte and the loop is exited.
> Could someone please explain this behavior to me and possibly offer a
> way to correct this?

0x1A, better known as Control-Z, is the end of file marker
for TEXT FILES in Windows. When you read a Control-Z, the
I/O library thinks you are at EOF.

To correct this, open the file in binary mode.
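
Roughly what the corrected program could look like, as a sketch rather than the original poster's actual code. The function names dump_stream and dump_file are invented here, and the non-standard isascii() test is replaced by a plain c < 128 comparison, which is my substitution.

```c
#include <ctype.h>
#include <stdio.h>

/* Copy `in` to `out`: printable characters, newlines and tabs go through
   unchanged; every other byte is written as a backslash followed by
   three octal digits. */
void dump_stream(FILE *in, FILE *out)
{
    int c;

    while ((c = getc(in)) != EOF) {
        if (c < 128 && (isprint(c) || c == '\n' || c == '\t'))
            putc(c, out);
        else
            fprintf(out, "\\%03o", (unsigned)c);
    }
}

/* Open `path` in binary mode and dump it to `out`.
   Returns 0 on success, -1 if the file cannot be opened or a read
   error occurs. */
int dump_file(const char *path, FILE *out)
{
    FILE *fp = fopen(path, "rb");   /* "rb", not "r" */
    int status;

    if (fp == NULL)
        return -1;
    dump_stream(fp, out);
    status = ferror(fp) ? -1 : 0;
    fclose(fp);
    return status;
}
```

Something like dump_file("report.doc", stdout) would then print the whole file, with the 0x1A byte showing up as \032 instead of ending the loop.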

Scott



Mon, 21 May 2001 03:00:00 GMT  
 When Is EOF not EOF?

[ snip ]

Something else occurred to me about the subject line of your message.
Another answer to that question is that getc() also returns EOF when an
error has occurred.  Use ferror() if you need to distinguish an EOF
caused by reaching the end of the file from one caused by an error.
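
A minimal sketch of that check. The helper name drain_stream is hypothetical; only getc() and ferror() come from the standard library.

```c
#include <stdio.h>

/* Hypothetical helper: read an open stream to the end and report why
   reading stopped.  Returns the number of bytes read; *had_error is
   set to 1 if getc() returned EOF because of a read error, or to 0
   for a genuine end of file. */
long drain_stream(FILE *fp, int *had_error)
{
    long n = 0;

    while (getc(fp) != EOF)
        n++;
    *had_error = ferror(fp) ? 1 : 0;
    return n;
}
```

After the loop, feof(fp) should be non-zero in the ordinary end-of-file case, so either function can be used to tell the two situations apart.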

--

http://www.cs.wustl.edu/~jxh/        Washington University in Saint Louis




Mon, 21 May 2001 03:00:00 GMT  
 When Is EOF not EOF?
The byte that stopped your program is hex 1A, or Ctrl+Z. This special
character is used to mark the end of a text file.

If you want to list the contents of a binary file, you must open it in
binary mode.

    FILE *fp;

    fp = fopen("filename", "rb");   /* the "b" selects binary mode */

Paul Lutus




Mon, 21 May 2001 03:00:00 GMT  
 When Is EOF not EOF?


Quote:
>I am using the familiar construct (lifted right out of Kernighan &
>Pike):

>    int c;
>    FILE *fp;
>    while ((c = getc(fp)) != EOF) {
>        if (isascii(c) && (isprint(c) || c == '\n' || c == '\t' || c == ' '))

The C language doesn't define an isascii() function (and I suspect you
can just leave it out in this case). Are you saying that Kernighan & Pike
really suggest using this? Slapped wrists if they do.

Quote:
>            putchar(c);
>        else
>            printf ("\\%03o", c);
>    }

>to make binary files readable on screen.  It works fine for many files.
>However, when I try to use it with any Microsoft Word97 documents, it
>craps out after returning the first 6 bytes.  I get:

>    \320 \317 \021 \340 \241 \261

>for any Word file that I pass through the program.  According to my hex
>editor, the value of the 7th byte is "1A".  However, while running the
>program with the debugger I get "-1" (EOF) as the return value for this
>byte and the loop is exited.

As others have indicated you need to open the file in binary mode (e.g.
use file mode "rb" instead of just "r").




Tue, 22 May 2001 03:00:00 GMT  
 When Is EOF not EOF?


: >I am using the familiar construct (lifted right out of Kernighan &
: >Pike):
: >
: >    int c;
: >    FILE *fp;
: >    while ((c = getc(fp)) != EOF) {
: >        if (isascii(c) && (isprint(c) || c == '\n' || c == '\t' || c == ' '))

: The C language doesn't define an isascii() function (and I suspect you
: can just leave it out in this case). Are you saying that Kernighan & Pike
: really suggest using this? Slapped wrists if they do.

isascii() is now part of X/Open; K&P certainly list it as part of ctype.h,
but my copy of the book is dated 1984.  It could well have been part of
K&R C at that point, tho' it's not in the index of the first edition.

Will



Tue, 22 May 2001 03:00:00 GMT  
 When Is EOF not EOF?

Quote:



> : >I am using the familiar construct (lifted right out of Kernighan &
> : >Pike):
> : >
> : >    int c;
> : >    FILE *fp;
> : >    while ((c = getc(fp)) != EOF) {
> : >        if (isascii(c) && (isprint(c) || c == '\n' || c == '\t' || c == ' '))

> : The C language doesn't define an isascii() function (and I suspect you
> : can just leave it out in this case). Are you saying that Kernighan & Pike
> : really suggest using this? Slapped wrists if they do.

> isascii() is now part of X/Open; K&P certainly list it as part of ctype.h,
> but my copy of the book is dated 1984.  It could well have been part of
> K&R C at that point, tho' it's not in the index of the first edition.

> Will


<Jack>

In the good (or bad) old pre-standard days, the ctype.h functions were quite a
bit different.  In general, they only covered the range of ASCII characters (0
through 127).  If you used an is... macro on a value greater than 127, it
accessed a value past the end of the flag array, so the result could be rather
random.

The rest of the is... functions were only guaranteed to be accurate for values
fitting in a char for which isascii() was non-zero.  I remember a few unsigned
implementations where it was implemented as !((ch)&0x80).  I don't remember
which macro any particular signed implementation used; it could have been the
same or ((ch)>=0).  These were for 8-bit chars, of course; other architectures
would have had somewhat different implementations.

Also in those days the toupper() and tolower() macros were unconditional.
Typically (ASCII character set), they just used bitwise operations to set or
clear the case bit (0x20) of the value without regard to whether the value was
a letter of the opposite case.

Now the standard requires all of the is... functions and/or macros to generate
correct results for every integer value between 0 and UCHAR_MAX, and for EOF
as well.  This is particularly necessary because, even in an 8-bit character
set, many languages use values greater than 127 for characters that are not
part of ASCII but can still be alphabetic, punctuation, etc.

And toupper() and tolower() are required not to modify values which aren't
alpha and of the respective opposite case.
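
To make the contrast concrete, here is a rough sketch. OLD_TOUPPER is my own reconstruction of the kind of unconditional macro described above, not code taken from any particular library, and it assumes ASCII. The helper case_macros_agree is likewise invented for the comparison.

```c
#include <ctype.h>

/* Hypothetical pre-standard style macro: clears the 0x20 case bit
   unconditionally, whether or not the argument is a lower-case letter.
   Assumes ASCII. */
#define OLD_TOUPPER(c) ((c) & ~0x20)

/* Returns 1 if the old-style macro and the standard toupper() give
   the same result for c, 0 otherwise. */
int case_macros_agree(int c)
{
    return OLD_TOUPPER(c) == toupper(c);
}
```

For letters the two agree, but for a digit such as '1' (0x31) the macro yields 0x11, a control character, while the standard toupper() returns '1' unchanged.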

Of course it's trivial to implement one's own isascii() function (or macro).
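
One possible version, as a sketch. The name my_isascii is made up so that it cannot clash with a library that already supplies isascii().

```c
/* Hypothetical replacement for the non-standard isascii():
   non-zero if c lies in the 7-bit ASCII range 0..127, zero otherwise
   (including for negative values such as EOF). */
int my_isascii(int c)
{
    return c >= 0 && c <= 127;
}
```

With the value returned by getc(), which is either EOF or in the range 0 to UCHAR_MAX, this gives the same answer as the !((ch)&0x80) form for every byte value.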

</Jack>



Tue, 22 May 2001 03:00:00 GMT  
 When Is EOF not EOF?

Quote:

> The byte that stopped your program is hex 1A, or Ctrl+Z. This special
> character is used to mark the end of a text file.

Please do mention that this behaviour is specific to the MS-DOS
operating system (and maybe some others, but most certainly not all).

Quote:
> If you want to list the contents of a binary file, you must open it in
> binary mode.

> FILE* fp;

> fp = fopen("filename","rb");

... because the absence of the little "b" makes "fopen()" default to
text mode.

Stephan
(initiator of the campaign against grumpiness in c.l.c)



Tue, 22 May 2001 03:00:00 GMT  
 When Is EOF not EOF?

Quote:


> > According to my hex
> > editor, the value of the 7th byte is "1A".  However, while running the
> > program with the debugger I get "-1" (EOF) as the return value for this
> > byte and the loop is exited.

> > Could someone please explain this behavior to me and possibly offer a
> > way to correct this?

> 0x1A, better known as Control-Z, is the end of file marker
> for TEXT FILES in Windows. When you read a Control-Z, the
> I/O library thinks you are at EOF.

> To correct this, open the file in binary mode.

No, open it in not-Control-Z-is-end-of-file mode. <g> Any sensible
runtime library makes this behavior optional, and usually it's off by
default.

--
Pete Becker
Dinkumware, Ltd.
http://www.*-*-*.com/



Tue, 22 May 2001 03:00:00 GMT  
 When Is EOF not EOF?

Quote:



>> > According to my hex
>> > editor, the value of the 7th byte is "1A".  However, while running the
>> > program with the debugger I get "-1" (EOF) as the return value for this
>> > byte and the loop is exited.

>> > Could someone please explain this behavior to me and possibly offer a
>> > way to correct this?

>> 0x1A, better known as Control-Z, is the end of file marker
>> for TEXT FILES in Windows. When you read a Control-Z, the
>> I/O library thinks you are at EOF.

>> To correct this, open the file in binary mode.

>No, open it in not-Control-Z-is-end-of-file mode. <g> Any sensible
>runtime library makes this behavior optional, and usually it's off by
>default.

Opening in binary mode is the standard, portable solution to the problem.
The original post said that binary data was being read so it will be needed
anyway.




Wed, 23 May 2001 03:00:00 GMT  
 When Is EOF not EOF?

Quote:


> >No, open it in not-Control-Z-is-end-of-file mode. <g> Any sensible
> >runtime library makes this behavior optional, and usually it's off by
> >default.

> Opening in binary mode is the standard, portable solution to the problem.
> The original post said that binary data was being read so it will be needed
> anyway.

No, it's not portable. Since the original file was not written out under
the same implementation, there are no guarantees of what data you will
get when you try to read the file.

--
Pete Becker
Dinkumware, Ltd.
http://www.dinkumware.com



Wed, 23 May 2001 03:00:00 GMT  
 When Is EOF not EOF?

Quote:




>> >No, open it in not-Control-Z-is-end-of-file mode. <g> Any sensible
>> >runtime library makes this behavior optional, and usually it's off by
>> >default.

>> Opening in binary mode is the standard, portable solution to the problem.
>> The original post said that binary data was being read so it will be needed
>> anyway.

>No, it's not portable. Since the original file was not written out under
>the same implementation, there are no guarantees of what data you will
>get when you try to read the file.

It is a portable way of eliminating any in-band data dependencies for the
stream.
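
A quick way to see this for a given implementation, sketched under my own assumptions: the function name binary_round_trip and the scratch file are invented for the example. It writes every byte value through a "wb" stream and reads the file back through an "rb" stream. Note that it only exercises a file written and read by the same implementation, which is the case the standard speaks to.

```c
#include <stdio.h>

/* Write every byte value 0..255 to path in binary mode, read the file
   back in binary mode, and return how many leading bytes came back
   unchanged: 256 if the round trip was exact, -1 if a file operation
   failed. */
int binary_round_trip(const char *path)
{
    FILE *fp;
    int i, c;

    fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    for (i = 0; i < 256; i++)
        putc(i, fp);
    if (fclose(fp) != 0)
        return -1;

    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    for (i = 0; i < 256; i++) {
        c = getc(fp);
        if (c != i)          /* EOF or a translated byte ends the run */
            break;
    }
    fclose(fp);
    return i;
}
```

A result of 256 means no byte value, 0x1A included, was treated specially on the way in or out.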



