AJ Rizz #1 / 13
 When Is EOF not EOF?
I am using the familiar construct (lifted right out of Kernighan & Pike):

    int c;
    FILE *fp;

    while ((c = getc(fp)) != EOF) {
        if (isascii(c) && (isprint(c) || c == '\n' || c == '\t' || c == ' '))
            putchar(c);
        else
            printf("\\%03o", c);
    }

to make binary files readable on screen. It works fine for many files. However, when I try to use it with any Microsoft Word97 document, it craps out after returning the first 6 bytes. For every Word file that I pass through the program, I get:

    \320 \317 \021 \340 \241 \261

According to my hex editor, the value of the 7th byte is "1A". However, while running the program under the debugger, I get "-1" (EOF) as the return value for this byte, and the loop is exited.

Could someone please explain this behavior to me and possibly offer a way to correct this? I am using Visual C++, version 4.0 with Windows 95. Thank you.
Mon, 21 May 2001 03:00:00 GMT
 |
James #2 / 13
> int c;
> FILE *fp;

You don't show how you open the file. I suspect you did not open it in binary mode.

> while ((c = getc(fp)) != EOF) {

[ snip ]

>editor, the value of the 7th byte is "1A". However, while running the
>program with the debugger I get "-1" (EOF) as the return value for this
>byte and the loop is exited.

Some DOS implementations of the Standard C stdio routines will generate an EOF upon encountering a ^Z when reading a text stream.

[ snip ]

--
http://www.*-*-*.com/ ~jxh/  Washington University in Saint Louis
>>>>>>>>>>>>> I use *SpamBeGone* <URL: http://www.*-*-*.com/ ;
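The effect James describes can be reproduced with a small test. This is a sketch under stated assumptions, not code from the thread: the helper names `write_sample()` and `count_chars()` are made up for illustration. It writes the six bytes from the original post, then the 0x1A byte, then one more byte, all in binary mode, and counts how many characters `getc()` delivers when the file is reopened in text mode (`"r"`) and in binary mode (`"rb"`).

```c
#include <stdio.h>

/* Write the six bytes reported in the original post, then 0x1A (Ctrl+Z),
   then one more byte. "wb" ensures nothing is translated on output.
   Returns 0 on success, -1 on failure. Hypothetical helper. */
int write_sample(const char *path)
{
    static const unsigned char bytes[] = {
        0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, /* \320 \317 \021 \340 \241 \261 */
        0x1A,                               /* Ctrl+Z, the 7th byte */
        0xE1                                /* one byte past it */
    };
    FILE *fp = fopen(path, "wb");
    size_t written;

    if (fp == NULL)
        return -1;
    written = fwrite(bytes, 1, sizeof bytes, fp);
    if (fclose(fp) != 0 || written != sizeof bytes)
        return -1;
    return 0;
}

/* Count the characters getc() returns before EOF when `path` is opened
   with `mode` ("r" = text, "rb" = binary). Returns -1 if fopen() fails.
   Hypothetical helper. */
long count_chars(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    long n = 0;

    if (fp == NULL)
        return -1;
    while (getc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}
```

On a POSIX system both counts should be 8, since text and binary streams behave the same there. With a runtime that treats Ctrl+Z as end of file in text streams, as the Microsoft C library in the original post apparently does, the `"r"` count should stop at 6 while the `"rb"` count stays at 8.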
Mon, 21 May 2001 03:00:00 GMT
 |
sc.. #3 / 13
> According to my hex
> editor, the value of the 7th byte is "1A". However, while running the
> program with the debugger I get "-1" (EOF) as the return value for this
> byte and the loop is exited.
> Could someone please explain this behavior to me and possibly offer a
> way to correct this?

0x1A, better known as Control-Z, is the end-of-file marker for TEXT FILES in Windows. When you read a Control-Z, the I/O library thinks you are at EOF. To correct this, open the file in binary mode.

Scott
Mon, 21 May 2001 03:00:00 GMT
 |
James #4 / 13
[ snip ]

Something else occurred to me about the subject of your original message, "When Is EOF not EOF?". Another answer to that question is that getc() also returns EOF if a read error has occurred. Use ferror() (or feof()) if you need to distinguish an EOF caused by reaching the end of the file from one caused by an error.
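A minimal sketch of James's suggestion, assuming you simply want to know why a `getc()` loop stopped. The enumeration `read_result` and the function `drain()` are hypothetical names; `feof()` and `ferror()` are the standard stdio calls he mentions.

```c
#include <stdio.h>

/* Why a getc() loop stopped. Hypothetical type. */
enum read_result { READ_EOF, READ_ERROR };

/* Read and discard every character from fp, storing the count in *count,
   then report whether getc() stopped at end of file or because of an
   error. Hypothetical function. */
enum read_result drain(FILE *fp, long *count)
{
    long n = 0;

    /* getc() returns EOF in both cases, so the loop alone can't tell. */
    while (getc(fp) != EOF)
        n++;
    *count = n;

    /* ferror() checks the stream's error indicator; if it is clear,
       the EOF came from the end-of-file condition (feof() is true). */
    if (ferror(fp))
        return READ_ERROR;
    return READ_EOF;
}
```

Presumably this would not flag the Ctrl+Z case from the original post: a runtime that treats 0x1A as end of file in a text stream reports an ordinary end-of-file condition, not an error.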
Mon, 21 May 2001 03:00:00 GMT
 |
Paul Lutu #5 / 13
The byte that stopped your program is hex 1A, or Ctrl+Z. This special character is used to mark the end of a text file. If you want to list the contents of a binary file, you must open it in binary mode:

    FILE* fp;
    fp = fopen("filename","rb");

Paul Lutus
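A fuller version of Paul's fix, offered as a sketch rather than a definitive implementation; the function names `dump_stream()` and `dump_file()` are hypothetical. The essential change from the original loop is the `"rb"` mode passed to `fopen()`. Two smaller changes are assumptions of this sketch: the non-standard `isascii()` test is dropped (for any value `getc()` returns other than EOF, `isprint()` is already well defined), and the separate `c == ' '` test is dropped because a space counts as a printing character.

```c
#include <ctype.h>
#include <stdio.h>

/* Copy `in` to `out`, passing printable characters, newline and tab
   through unchanged and writing every other byte as a backslash plus
   three octal digits. Returns 0 at end of file, -1 on a read error. */
int dump_stream(FILE *in, FILE *out)
{
    int c;

    while ((c = getc(in)) != EOF) {
        /* c is an unsigned char value here, so isprint(c) is defined. */
        if (isprint(c) || c == '\n' || c == '\t')
            putc(c, out);
        else
            fprintf(out, "\\%03o", c);
    }
    return ferror(in) ? -1 : 0;
}

/* Open `path` in binary mode so that a 0x1A byte is just data, then
   dump it to `out`. Returns -1 if the file can't be opened or read. */
int dump_file(const char *path, FILE *out)
{
    FILE *fp = fopen(path, "rb");   /* the "b" is the fix */
    int rc;

    if (fp == NULL)
        return -1;
    rc = dump_stream(fp, out);
    fclose(fp);
    return rc;
}
```

Called as `dump_file("document.doc", stdout)` (file name hypothetical), a Word file should now be listed past its seventh byte, with the 0x1A byte shown as `\032`.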
Mon, 21 May 2001 03:00:00 GMT
 |
Lawrence Kir #6 / 13
>I am using the familiar construct (lifted right out of Kernighan &
>Pike):
> int c;
> FILE *fp;
> while ((c = getc(fp)) != EOF) {
> if (isascii(c) && (isprint(c) || c == '\n' || c == '\t' || c ==
>' '))

The C language doesn't define an isascii() function (and I suspect you can just leave it out in this case). Are you saying that Kernighan & Pike really suggest using this? Slapped wrists if they do.

> putchar(c);
> else
> printf ("\\%03o", c);
> }
>to make binary files readable on screen. It works fine for many files.
>However, when I try to use it with any Microsoft Word97 documents, it
>craps out after returning the first 6 bytes. I get:
> \320 \317 \021 \340 \241 \261
>for any Word file that I pass through the program. According to my hex
>editor, the value of the 7th byte is "1A". However, while running the
>program with the debugger I get "-1" (EOF) as the return value for this
>byte and the loop is exited.

As others have indicated, you need to open the file in binary mode (e.g. use file mode "rb" instead of just "r").
Tue, 22 May 2001 03:00:00 GMT
 |
Will Ro #7 / 13
: >I am using the familiar construct (lifted right out of Kernighan &
: >Pike):
: >
: > int c;
: > FILE *fp;
: > while ((c = getc(fp)) != EOF) {
: > if (isascii(c) && (isprint(c) || c == '\n' || c == '\t' || c ==
: >' '))

: The C language doesn't define an isascii() function (and I suspect you
: can just leave it out in this case). Are you saying that Kernighan & Pike
: really suggest using this? Slapped wrists if they do.

isascii() is now part of X/Open; K&P certainly list it as part of ctype.h, but my copy of the book is dated 1984. It could well have been part of K&R C at that point, tho' it's not in the index of the first edition.

Will
Tue, 22 May 2001 03:00:00 GMT
 |
Jack Kle #8 / 13
> : >I am using the familiar construct (lifted right out of Kernighan &
> : >Pike):
> : >
> : > int c;
> : > FILE *fp;
> : > while ((c = getc(fp)) != EOF) {
> : > if (isascii(c) && (isprint(c) || c == '\n' || c == '\t' || c ==
> : >' '))
> : The C language doesn't define an isascii() function (and I suspect you
> : can just leave it out in this case). Are you saying that Kernighan & Pike
> : really suggest using this? Slapped wrists if they do.
> isascii() is now part of Xopen; K&P certainly list it as part of ctype.h,
> but my copy of the book is dated 1984. It could well have been part of
> K&R C at that point, tho' it's not in the index of the first edition.
> Will
<Jack>
In the good (or bad) old pre-standard days, the ctype.h functions were quite a bit different. In general, they only covered the range of ASCII characters (0 through 127). If you used an is... macro on a value greater than 127, it accessed a value past the end of the flag array, so the result could be rather random.

The rest of the is... functions were only guaranteed to be accurate for values fitting in a char for which isascii() was non-zero. I remember a few implementations with unsigned plain char where isascii() was implemented as !((ch)&0x80). I don't remember which macro any particular signed-char implementation used; it could have been the same, or ((ch)>=0). These were for 8-bit chars, of course; other architectures would have had somewhat different implementations.

Also, in those days the toupper() and tolower() macros were unconditional. Typically (ASCII character set), they just used bitwise operations to set or clear bit 6 of the value without regard to whether the value was alphabetic and of the opposite case.

Now the standard requires all of the is... functions and/or macros to generate correct results for every integer value between 0 and UCHAR_MAX, and for EOF as well. This is particularly necessary because, even in an 8-bit character set, other languages use values greater than 127 to represent characters which are not part of the ASCII set but can be alphabetic, punctuation, etc., in those languages. And toupper() and tolower() are required not to modify values which aren't alphabetic and of the respective opposite case.

Of course it's trivial to implement one's own isascii() function (or macro).
</Jack>
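Jack's last two points are easy to check. The sketch below uses made-up names: `my_isascii()` is one possible stand-in for the non-standard `isascii()`, and `old_style_toupper()` imitates the unconditional toupper() he describes by clearing the bit that separates lower from upper case in ASCII (value 0x20). Neither comes from any real library.

```c
#include <ctype.h>
#include <stdio.h>   /* EOF */

/* Portable stand-in for isascii(): true for 0..127, false for anything
   else, including EOF (which is negative). Hypothetical name. */
int my_isascii(int c)
{
    return c >= 0 && c <= 0x7F;
}

/* Imitation of a pre-standard ASCII toupper(): clear the 0x20 bit
   whether or not c is a lower-case letter. Hypothetical name; shown
   only to contrast with the standard toupper(). */
int old_style_toupper(int c)
{
    return c & ~0x20;
}
```

For example, `old_style_toupper('1')` yields 0x11, a control character, whereas the standard `toupper('1')` must return `'1'` unchanged; `my_isascii(EOF)` is 0.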
Tue, 22 May 2001 03:00:00 GMT
 |
Stephan Wilm #9 / 13
> The byte that stopped your program is hex 1A, or Ctrl+Z. This special
> character is used to mark the end of a text file.

Please do mention that this behaviour is specific to the MS-DOS operating system (and maybe some others, but most certainly not all).

> If you want to list the contents of a binary file, you must open it in
> binary mode.
> FILE* fp;
> fp = fopen("filename","rb");

... because the absence of the little "b" makes "fopen()" default to text mode.

Stephan
(initiator of the campaign against grumpiness in c.l.c)
Tue, 22 May 2001 03:00:00 GMT
 |
Pete Becke #10 / 13
> > According to my hex
> > editor, the value of the 7th byte is "1A". However, while running the
> > program with the debugger I get "-1" (EOF) as the return value for this
> > byte and the loop is exited.
> > Could someone please explain this behavior to me and possibly offer a
> > way to correct this?
> 0x1A, better known as Control-Z, is the end of file marker
> for TEXT FILES in Windows. You reads Control-Z, the
> I/O library thinks you are at EOF.
> To correct this, open the file in binary mode.

No, open it in not-Control-Z-is-end-of-file mode. <g> Any sensible runtime library makes this behavior optional, and usually it's off by default.

--
Pete Becker
Dinkumware, Ltd.
http://www.*-*-*.com/
Tue, 22 May 2001 03:00:00 GMT
 |
Lawrence Kir #11 / 13
>> > According to my hex
>> > editor, the value of the 7th byte is "1A". However, while running the
>> > program with the debugger I get "-1" (EOF) as the return value for this
>> > byte and the loop is exited.
>> > Could someone please explain this behavior to me and possibly offer a
>> > way to correct this?
>> 0x1A, better known as Control-Z, is the end of file marker
>> for TEXT FILES in Windows. You reads Control-Z, the
>> I/O library thinks you are at EOF.
>> To correct this, open the file in binary mode.
>No, open it in not-Control-Z-is-end-of-file mode. <g> Any sensible
>runtime library makes this behavior optional, and usually it's off by
>default.

Opening in binary mode is the standard, portable solution to the problem. The original post said that binary data was being read, so binary mode will be needed anyway.
Wed, 23 May 2001 03:00:00 GMT
 |
Pete Becke #12 / 13
> >No, open it in not-Control-Z-is-end-of-file mode. <g> Any sensible
> >runtime library makes this behavior optional, and usually it's off by
> >default.
> Opening in binary mode is the standard, portable solution to the problem.
> The original post said that binary data was being read so it will be needed
> anyway.

No, it's not portable. Since the original file was not written out under the same implementation, there are no guarantees of what data you will get when you try to read the file.
Wed, 23 May 2001 03:00:00 GMT
 |
Lawrence Kir #13 / 13
> writes:
>> >No, open it in not-Control-Z-is-end-of-file mode. <g> Any sensible
>> >runtime library makes this behavior optional, and usually it's off by
>> >default.
>> Opening in binary mode is the standard, portable solution to the problem.
>> The original post said that binary data was being read so it will be needed
>> anyway.
>No, it's not portable. Since the original file was not written out under
>the same implementation, there are no guarantees of what data you will
>get when you try to read the file.

It is a portable way of eliminating any in-band data dependencies for the stream (such as a 0x1A byte being taken as end of file).
Fri, 25 May 2001 03:00:00 GMT