state of input stream after bad scanf ?5
Author |
Message |
Spider plant breeding progra #1 / 25
|
 state of input stream after bad scanf ?5
What state is scanf supposed to leave the input stream in after it encounters bad input? For example, scanf ("%lg", &d); meeting "numbers" such as
+.qa
1e-e
I wrote a test program to check the above two cases and got the following results on different platforms with buffered and unbuffered FILEs.
For SunOS:
Buffered:
Failed to ungetc ('!', file)
With '+.qa' got 0 3 '+.qa'
With '1e-e' got 1 1 '!e-e'

Unbuffered:
Failed to ungetc ('!', file)
With '+.qa' got -1 1 'qa'
Failed to ungetc ('!', file)
With '1e-e' got 1 1 'e'

SunOS basil 4.1.3 3 sun4m

For Solaris 2.5.1:
Buffered:
With '+.qa' got 0 3 '!+.qa'
With '1e-e' got 1 1 '!e-e'

Unbuffered:
With '+.qa' got 0 1 '!+.qa'
With '1e-e' got 1 1 '!e-e'

SunOS uxe 5.5.1 Generic_103640-18 sun4u sparc SUNW,Ultra-2

For Irix:
Buffered:
With '+.qa' got 0 3 '!qa'
With '1e-e' got 1 1 '!e-e'

Unbuffered:
With '+.qa' got 0 1 '!qa'
With '1e-e' got 1 1 '!'
IRIX64 cia 6.2 03131016 IP25

For an Acorn:
Buffered:
With '+.qa' got 0 3 '!qa'
With '1e-e' got 0 3 '!e'

Unbuffered:
With '+.qa' got 0 3 '!qa'
With '1e-e' got 0 3 '!e'

So Irix and SunOS give inconsistent results when buffered and unbuffered files are used, and no two platforms give the same answers. What does ANSI mandate should happen?
Thanks in advance for any help you can offer
Nicholas Clark
--------------------------------test program--------------------------------
#include <stdio.h>
int main(int argc, char **argv)
{
    const char *tests[] = {"+.qa", "1e-e", 0};
    const char **test = tests;
    char buf[16];
    double d = 3;
    int result;
    FILE *file = tmpfile();
    puts ("Buffered:");
    while (*test)
    {
        rewind (file);
        buf[0] = '\0';
        fputs (*test, file);
        rewind (file);
        result = fscanf (file, "%lg", &d);
        if (ungetc ('!', file) == EOF)
        {
            printf ("Failed to ungetc ('!', file)\n");
        }
        fgets (buf, sizeof(buf), file);
        printf ("With '%s' got %d %lg '%s'\n", *test, result, d, buf);
        test++;
    }
    test = tests;
    setbuf (file, NULL); /* And now unbuffered. */
    puts ("\nUnbuffered:");
    while (*test)
    {
        rewind (file);
        buf[0] = '\0';
        fputs (*test, file);
        rewind (file);
        result = fscanf (file, "%lg", &d);
        if (ungetc ('!', file) == EOF)
        {
            printf ("Failed to ungetc ('!', file)\n");
        }
        fgets (buf, sizeof(buf), file);
        printf ("With '%s' got %d %lg '%s'\n", *test, result, d, buf);
        test++;
    }
    return 0;
}
-- #!perl -wlpi[finger.liv.ac.uk] # Black olives http://www.*-*-*.com/ BEGIN{$_="use SocketYIN;sockeXPZSOCK_STREAM,~proto'tcp'and\$|=connecXpack'S na4x8',AZ79,\$;=~host qq$^Ior die\$!;print'/w nickc\r'YOUT;print\$^I";s;X;t STDIN,;g;s\Y\;select STD\g;s$Z$F_INET,$g;s{~(\w*)}}get$1byname}g;eval||die}
|
Wed, 18 Oct 2000 03:00:00 GMT |
|
 |
fungu #2 / 25
|
 state of input stream after bad scanf ?5
Quote: > What state is scanf supposed to leave the input stream in after it > encounters bad input? For example, scanf ("%lg", &d); meeting "numbers" such > as > +.qa > 1e-e
You've just found out why scanf() is regarded as an evil function and to be avoided - if it fails then the input data is lost. You will be much better off reading the input into a buffer with fgets() then using sscanf() to interpret it. The effect is the same but you can have as many retries as you need. -- <\___/> / O O \ My web site: http://artlum.com/
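A minimal sketch of the fgets()-then-sscanf() approach suggested above. The helper names parse_double_line and read_double are hypothetical, chosen only for illustration, and the 128-byte line buffer is an arbitrary assumption.

```c
#include <stdio.h>

/* Hypothetical helper: parse one double from a line of text.
 * Returns 1 on success, 0 if the line does not start with a number.
 * *out is only written on success. */
int parse_double_line(const char *line, double *out)
{
    double value;

    if (sscanf(line, "%lg", &value) != 1)
        return 0;
    *out = value;
    return 1;
}

/* Hypothetical helper: keep reading lines from `in` until one parses.
 * Returns 1 on success, 0 on end-of-file or read error. */
int read_double(FILE *in, double *out)
{
    char line[128];   /* assumed maximum line length */

    while (fgets(line, sizeof line, in) != NULL) {
        if (parse_double_line(line, out))
            return 1;
        /* The rejected text is still in `line`, so the caller's
         * stream is in a known state: positioned after this line. */
    }
    return 0;
}
```

Because the whole line has already been consumed by fgets(), a failed conversion leaves no question about what remains on the stream; a call such as read_double(stdin, &d) simply moves on to the next line.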
|
Wed, 18 Oct 2000 03:00:00 GMT |
|
 |
Lawrence Kir #3 / 25
|
 state of input stream after bad scanf ?5
I've cross-posted this to comp.std.c because I'm not entirely clear about whether scanf and fscanf can implicitly call ungetc(). I can't see any explicit prohibition on this. Quoting from the C9X description of fscanf:

[#16] If conversion terminates on a conflicting input character, the offending input character is left unread in the input stream. (footnote 218)

"left unread" doesn't suggest "pushed back with ungetc" to me although the footnote uses the term "pushed back" so it is rather vague. If other functions aren't allowed to use ungetc() implicitly a statement to that effect should be included as part of the ungetc() description, as it is with functions like getenv(). If other standard library functions are allowed to call ungetc() implicitly the standard ought to state which and in what circumstances.
Quote:
>What state is scanf supposed to leave the input stream in after it
>encounters bad input? For example, scanf ("%lg", &d); meeting "numbers" such
>as
The function should read characters until it encounters a character that can't be matched in terms of the current conversion specification (or until end-of-file or an error is encountered). The single character that can't be matched is left on the input stream.
Quote:
> +.qa
For %lg, +. is a valid prefix for a floating point number but +.q is not. Therefore +. should be removed from the input stream, leaving qa. +. isn't a valid form, nor does it contain a prefix that is a valid form of floating point number. Therefore the %lg conversion as a whole will fail to match and no argument will be assigned.
Quote:
> 1e-e
1e- is a valid prefix but 1e-e is not. Therefore 1e- is removed from the input stream and the e is left on the stream. Within 1e-, 1 is the maximal prefix that is a validly formed floating point number (as accepted by strtod), therefore this will be matched and should cause 1.0 to be assigned through the argument pointer.
Quote:
>I wrote a test program to check the above two cases and get the following
>results on different platforms with buffered and unbuffered FILEs.
>For SunOS:
>Buffered: >Failed to ungetc ('!', file)
Still in question. Quote: >With '+.qa' got 0 3 '+.qa' >With '1e-e' got 1 1 '!e-e'
Incorrect, the +. and e- should have been removed from the stream. Quote: >Unbuffered: >Failed to ungetc ('!', file) >With '+.qa' got -1 1 'qa' >Failed to ungetc ('!', file) >With '1e-e' got 1 1 'e'
>SunOS basil 4.1.3 3 sun4m
Unfortunately all of your unbuffered tests are invalid (see later). Quote: >For Solaris 2.5.1:
>Buffered: >With '+.qa' got 0 3 '!+.qa' >With '1e-e' got 1 1 '!e-e'
Again, +. and e- should have been removed. ... Quote: >For Irix: >Buffered: >With '+.qa' got 0 3 '!qa'
Correct. Quote: >With '1e-e' got 1 1 '!e-e'
The e- should have been removed. ... Quote: >For an Acorn: >Buffered: >With '+.qa' got 0 3 '!qa'
Correct. Quote: >With '1e-e' got 0 3 '!e'
1.0 should have been assigned.
...
Quote:
>--------------------------------test program--------------------------------
>#include <stdio.h>
>int main(int argc, char **argv)
>{
> const char *tests[] = {"+.qa", "1e-e", 0};
> const char **test = tests;
> char buf[16];
> double d = 3;
I'd suggest a more obvious ``error'' number like -1. Also the number should be assigned before each call to fscanf, not just once.
Quote:
> int result;
> FILE *file = tmpfile();
> puts ("Buffered:");
> while (*test)
> {
> rewind (file);
> buf[0] = '\0';
> fputs (*test, file);
> rewind (file);
> result = fscanf (file, "%lg", &d);
> if (ungetc ('!', file) == EOF)
> {
> printf ("Failed to ungetc ('!', file)\n");
> }
> fgets (buf, sizeof(buf), file);
> printf ("With '%s' got %d %lg '%s'\n", *test, result, d, buf);
%lg is an invalid printf conversion specification (although it is correct for scanf). The printf specification for double is %g (%e, %f etc.), which also happens to work for float types since they are automatically promoted to double in the caller before being passed.
Quote:
> test++;
> }
> test = tests;
> setbuf (file, NULL); /* And now unbuffered. */
This is an error. setbuf() and setvbuf() can only be called directly after the stream has been opened i.e. before any other operation has been performed on it. You can't change a stream's buffering on the fly. You'll need to close the stream and open a new one here. -- -----------------------------------------
-----------------------------------------
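A minimal sketch of the ordering described above: request the buffering mode immediately after opening the stream, before any read, write or seek. The helper name open_unbuffered_tmp is hypothetical; setvbuf() with _IONBF is the standard call used here in place of setbuf(file, NULL).

```c
#include <stdio.h>

/* Hypothetical helper: open a temporary file and make it unbuffered
 * before any other operation is performed on the stream.
 * Returns NULL if the file cannot be created or the request fails. */
FILE *open_unbuffered_tmp(void)
{
    FILE *fp = tmpfile();

    if (fp == NULL)
        return NULL;
    /* setvbuf() must be the first operation on the new stream;
     * _IONBF asks for no buffering at all. */
    if (setvbuf(fp, NULL, _IONBF, 0) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

For the test program in this thread, that would mean running the buffered loop on one tmpfile() stream, closing it, and running the unbuffered loop on a second stream obtained this way.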
|
Wed, 18 Oct 2000 03:00:00 GMT |
|
 |
Douglas A. Gwy #4 / 25
|
 state of input stream after bad scanf ?5
Quote:
> "left unread" doesn't suggest "pushed back with ungetc" to me although the > footnote uses the term "pushed back" so it is rather vague.
I think there was a DR on this; the committee as I recall has always intended fscanf() pushback to be implemented *in addition to* the support for ungetc(). I think four or five extra buffer characters are necessary.
|
Thu, 19 Oct 2000 03:00:00 GMT |
|
 |
Lawrence Kir #5 / 25
|
 state of input stream after bad scanf ?5
Quote:
>> "left unread" doesn't suggest "pushed back with ungetc" to me although the >> footnote uses the term "pushed back" so it is rather vague. >I think there was a DR on this; >the committee as I recall has always intended fscanf() pushback >to be implemented *in addition to* the support for ungetc().
If there was a DR it doesn't seem to have made it into the C9X wording. Quote: >I think four or five extra buffer characters are necessary.
How so? AFAIK library functions will only ``push back'' one character, and then only after successfully reading a character, so the effects can't be cumulative. The only problem case I can think of is:
 ungetc('x', fp);
 fscanf(fp, "y");
 ungetc('z', fp);
i.e. where the character ``pushed back'' by fscanf() was originally ungetc()'d. I guess that the second ungetc() should be allowed to fail, whereas in:
 ungetc('x', fp);
 fseek(fp, 0, SEEK_SET);
 fscanf(fp, "y"); /* say the first character encountered is 'a' */
 ungetc('z', fp);
it can't fail due to the push-back limit being exceeded.
-- -----------------------------------------
-----------------------------------------
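A small sketch of the part of this that is not in dispute: a successful file positioning call discards any pushed-back character, so the one guaranteed ungetc() slot is free again afterwards. The function name pushback_after_seek is hypothetical, and the fscanf() step from the post is left out because its effect on the slot is exactly what the thread is arguing about.

```c
#include <stdio.h>

/* Push back 'x', seek to the start (which discards the 'x'),
 * push back 'z', and return the next character read.
 * Returns EOF if any step fails. */
int pushback_after_seek(FILE *fp)
{
    if (ungetc('x', fp) == EOF)
        return EOF;
    if (fseek(fp, 0L, SEEK_SET) != 0)   /* pushed-back 'x' is dropped */
        return EOF;
    if (ungetc('z', fp) == EOF)         /* the single slot is free again */
        return EOF;
    return getc(fp);                    /* reads the pushed-back 'z' */
}
```

After the function returns, the next getc() on the stream yields the file's real first character, since reading a pushed-back character does not consume anything from the file itself.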
|
Thu, 19 Oct 2000 03:00:00 GMT |
|
 |
Clive D.W. Feathe #6 / 25
|
 state of input stream after bad scanf ?5
Quote: >I've cross-posted this to comp.std.c because I'm not entirely clear >about whether scanf and fscanf can implicitly call ungetc().
When we worked this through for TC1, we decided not.
Quote:
>>What state is scanf supposed to leave the input stream in after it
>>encounters bad input? For example, scanf ("%lg", &d); meeting "numbers" such
>>as
>The function should read characters until it encounters a character that
>can't be matched in terms of the current conversion specification (or
>end-of-file or an error is encountered). The single character that can't be
>matched is left on the input stream.
Right. Quote: >> 1e-e >1e- is a valid prefix but 1e-e is not. Therefore 1e- is removed from the >input stream and the e is left on the stream.
Right. Quote: >Within 1e- 1 is the maximal >prefix that is a validly formed floating point number (as accepted by >strtod) therefore this will be matched and should cause 1.0 to be assigned >through the argument pointer.
No. 1e- is not a valid floating point number, and so a conversion error happens. Quote: >> printf ("With '%s' got %d %lg '%s'\n", *test, result, d, buf); >%lg is an invalid printf conversion specification (although it is correct >for scanf).
%lg is wrong for double, but it is a valid specification (it's used for long double). -- Clive D.W. Feather | Director of Software Development | Home email:
Written on my laptop; please observe the Reply-To address |
|
Thu, 19 Oct 2000 03:00:00 GMT |
|
 |
Clive D.W. Feathe #7 / 25
|
 state of input stream after bad scanf ?5
Quote: >I think there was a DR on this; >the committee as I recall has always intended fscanf() pushback >to be implemented *in addition to* the support for ungetc(). >I think four or five extra buffer characters are necessary.
No, you need one slot for ungetc() and you need the ability to hold the lookahead character for fscanf() (the latter can be in the normal stdio buffer). That's all. -- Clive D.W. Feather | Director of Software Development | Home email:
Written on my laptop; please observe the Reply-To address |
|
Thu, 19 Oct 2000 03:00:00 GMT |
|
 |
Clive D.W. Feathe #8 / 25
|
 state of input stream after bad scanf ?5
Quote: >>I think there was a DR on this; >If there was a DR it doesn't seem to have made it into the C9X wording.
It did - it was in TC1. Quote: >How so? AFAIK library functions will only ``push back'' one character and >then only after successfully reading a character so the effects can't be >cumulative. The only problem case I can think of is: > ungetc('x', fp); > fscanf(fp, "y"); > ungetc('z', fp);
I don't think we looked at that case. I would personally allow the second ungetc to fail. -- Clive D.W. Feather | Director of Software Development | Home email:
Written on my laptop; please observe the Reply-To address |
|
Thu, 19 Oct 2000 03:00:00 GMT |
|
 |
Lawrence Kir #9 / 25
|
 state of input stream after bad scanf ?5
... Quote: >>Within 1e- 1 is the maximal >>prefix that is a validly formed floating point number (as accepted by >>strtod) therefore this will be matched and should cause 1.0 to be assigned >>through the argument pointer. >No. 1e- is not a valid floating point number, and so a conversion error >happens.
1e- isn't, but 1 is. strtod() must accept this so, as far as I can see from the standard, fscanf must treat it as a conversion match. The only difference is that fscanf() can only ``push back'' a single character so therefore e- can't be restored to the input stream so they get dumped. I can't see anything that allows that to affect whether a successful match occurred or not. Incidentally the original article which gave evidence that 4 (maybe 3) different ANSI compilers couldn't agree on the interpretation of this is good evidence that the wording on this point needs to be improved, even if we end up agreeing on an interpretation. Quote: >>> printf ("With '%s' got %d %lg '%s'\n", *test, result, d, buf); >>%lg is an invalid printf conversion specification (although it is correct >>for scanf). >%lg is wrong for double, but it is a valid specification (it's used for >long double).
No, you're thinking of %Lg. %lg is undefined for fprintf() and specifies a double conversion for fscanf(). -- -----------------------------------------
-----------------------------------------
|
Thu, 19 Oct 2000 03:00:00 GMT |
|
 |
Chris Tor #10 / 25
|
 state of input stream after bad scanf ?5
Quote:
>1e- isn't, but 1 is. strtod() must accept this so, as far as I can see >from the standard, fscanf must treat it as a conversion match. The only >difference is that fscanf() can only ``push back'' a single character so >therefore e- can't be restored to the input stream so they get dumped. I >can't see anything that allows that to affect whether a successful match >occurred or not.
I have never found a copy of TC1 or TC2 myself; all I have is the original standard, which to me said that the "e-" must be restored to the input stream. So, my stdio *can* restore them. This turns out not to be all that difficult. My stdio guarantees that any open stream can have at least three arbitrary characters "pushed back", by having three bytes in the "FILE" structure itself in which they can reside, and a state in which getc() reads from this three-byte buffer. (This three bytes covers "e-" and one "user" ungetc(), and since the scanf innards get to peek ahead, you effectively get a fourth byte as well.) Quote: >Incidentally the original article which gave evidence that 4 (maybe 3) >different ANSI compilers couldn't agree on the interpretation of this is >good evidence that the wording on this point needs to be improved, even if >we end up agreeing on an interpretation.
Personally, I am inclined to vote for my interpretation. If I can do it, everyone else should be able to -- heck, my code is freely available; all they have to do is copy it! :-) -- In-Real-Life: Chris Torek, Berkeley Software Design Inc
Antispam notice: unsolicited commercial email will be handled at my consulting rate; pyramid-scheme mail will be forwarded to the FTC.
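To make the idea of a small pushback area inside the stream structure concrete, here is a toy sketch over an in-memory string. Everything in it (struct toy_stream, toy_getc, toy_ungetc, the PUSHBACK_MAX name) is hypothetical and is not the layout or code of any real stdio; only the figure of three bytes comes from the post above.

```c
#include <stddef.h>

#define PUSHBACK_MAX 3   /* the post's figure: "e-" plus one user ungetc() */

/* Hypothetical toy stream reading from a NUL-terminated string. */
struct toy_stream {
    const char *data;                     /* underlying input */
    size_t pos;                           /* next unread index in data */
    unsigned char pushed[PUSHBACK_MAX];   /* pushback area in the struct */
    int npushed;                          /* characters currently held */
};

void toy_open(struct toy_stream *s, const char *data)
{
    s->data = data;
    s->pos = 0;
    s->npushed = 0;
}

/* Read from the pushback area first (last pushed, first read),
 * then from the underlying data. Returns -1 at end of input. */
int toy_getc(struct toy_stream *s)
{
    if (s->npushed > 0)
        return s->pushed[--s->npushed];
    if (s->data[s->pos] == '\0')
        return -1;
    return (unsigned char)s->data[s->pos++];
}

/* Push a character back; fails with -1 once the area is full. */
int toy_ungetc(int c, struct toy_stream *s)
{
    if (c < 0 || s->npushed == PUSHBACK_MAX)
        return -1;
    s->pushed[s->npushed++] = (unsigned char)c;
    return c;
}
```

With the input "1e-e", a scanner built on this could read 1, e and -, see the conflicting second e as lookahead, push - and e back, and still leave one slot for a caller's own toy_ungetc().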
|
Thu, 19 Oct 2000 03:00:00 GMT |
|
 |
Lawrence Kir #11 / 25
|
 state of input stream after bad scanf ?5
Quote:
>>1e- isn't, but 1 is. strtod() must accept this so, as far as I can see >>from the standard, fscanf must treat it as a conversion match. The only >>difference is that fscanf() can only ``push back'' a single character so >>therefore e- can't be restored to the input stream so they get dumped. I >>can't see anything that allows that to affect whether a successful match >>occurred or not. >I have never found a copy of TC1 or TC2 myself; all I have is the >original standard, which to me said that the "e-" must be restored >to the input stream.
C90 says "If conversion terminates on a conflicting input character, the offending input character is left unread in the input stream." That specifies a single character as being ``left unread''. There's nothing that specifies that previously read characters are restored to the input stream. NA1 adds as a footnote: "fscanf pushes back at most one input character onto the input stream. Therefore some sequences that are acceptable to strtod, strtol, or strtoul are unacceptable to fscanf" The second sentence supports Clive's interpretation but IMHO it conflicts with the normative text so must be disregarded. Quote: >So, my stdio *can* restore them. This turns out not to be all that >difficult. My stdio guarantees that any open stream can have at >least three arbitrary characters "pushed back", by having three >bytes in the "FILE" structure itself in which they can reside, and >a state in which getc() reads from this three-byte buffer. (This >three bytes covers "e-" and one "user" ungetc(), and since the >scanf innards get to peek ahead, you effectively get a fourth byte >as well.) >>Incidentally the original article which gave evidence that 4 (maybe 3) >>different ANSI compilers couldn't agree on the interpretation of this is >>good evidence that the wording on this point needs to be improved, even if >>we end up agreeing on an interpretation. >Personally, I am inclined to vote for my interpretation. If I can >do it, everyone else should be able to -- heck, my code is freely >available; all they have to do is copy it! :-)
Are there any efficiency issues that result from this? That and code size would be the only objections I can think of that could be levelled.
There's another issue in this, perhaps even more fundamental. Consider the case where fp has been opened to a seekable binary file whose first character is 'a'. Assuming no I/O errors:
 rewind(fp);
 fscanf(fp, "x");
 fseek(fp, 0, SEEK_CUR);
 filepos = ftell(fp);
What should the value of filepos now be? If fscanf() uses ungetc() to replace the 'a' then it should be 1. If not then it should be 0. I'm inclined to believe that 0 is the correct result as per the current documents, but that may not be easy to implement, and may lead to undesirable consequences.
-- -----------------------------------------
-----------------------------------------
|
Fri, 20 Oct 2000 03:00:00 GMT |
|
 |
Clive D.W. Feathe #12 / 25
|
 state of input stream after bad scanf ?5
Quote: >>>Within 1e- 1 is the maximal >>>prefix that is a validly formed floating point number >>No. 1e- is not a valid floating point number, and so a conversion error >>happens. >1e- isn't, but 1 is. strtod() must accept this so, as far as I can see >from the standard, fscanf must treat it as a conversion match.
You see wrong. See 7.13.6.2 in C9X CD1; the text is unchanged since TC1 altered it. The process has three steps (%n is different and I ignore it here):
(1) Except for %[ and %c, all characters for which isspace() is true are skipped.
(2) An "input item" is read. This is the longest sequence of characters that could be a valid match or a prefix of a valid match *and* does not exceed the field width if given. So in the case of "1e-e" the input item is "1e-". The second e "remains unread", to quote the Standard.
(3) The input item is converted to a value. "If the input item is not a matching sequence, the execution of the directive fails: this condition is a matching failure". "1e-" is not a matching sequence and so there is an error.
The matching sequence for %g is the "subject string" of strtod(). This *excludes* any trailing junk. There is a similar example in the Standard where "100e" is the input item for %g, and it is made explicit that no store takes place. See also footnote 218.
Quote:
>Incidentally the original article which gave evidence that 4 (maybe 3)
>different ANSI compilers couldn't agree on the interpretation of this is
>good evidence that the wording on this point needs to be improved, even if
>we end up agreeing on an interpretation.
I wonder how many actually implement the current Standard (with TC1 applied); we made these changes because the old text *was* ambiguous. Quote: >>%lg is wrong for double, but it is a valid specification (it's used for >>long double). >No, you're thinking of %Lg.
Oops, so I am. -- Clive D.W. Feather | Director of Software Development | Home email:
Written on my laptop; please observe the Reply-To address |
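Steps (2) and (3) as described in this post can be sketched as a small scanner over a string. This is an illustration only, under stated assumptions: the function name float_input_item is hypothetical, it handles plain decimal syntax (no hexadecimal, infinity or NaN forms), it ignores field widths, and it leaves out the whitespace skipping of step (1).

```c
#include <ctype.h>
#include <stddef.h>

/* Returns the length of the "input item" at the start of s: the longest
 * run of characters that is a decimal floating number or a prefix of one.
 * *matches is set to 1 if that item is itself a complete number
 * (a matching sequence), or 0 if it is only a prefix (matching failure). */
size_t float_input_item(const char *s, int *matches)
{
    size_t i = 0;
    int digits = 0;

    *matches = 0;
    if (s[i] == '+' || s[i] == '-')
        i++;
    while (isdigit((unsigned char)s[i])) { i++; digits++; }
    if (s[i] == '.') {
        i++;
        while (isdigit((unsigned char)s[i])) { i++; digits++; }
    }
    if (digits == 0)
        return i;            /* e.g. "+." is a prefix, never a number */

    *matches = 1;
    if (s[i] == 'e' || s[i] == 'E') {
        int exp_digits = 0;

        i++;
        if (s[i] == '+' || s[i] == '-')
            i++;
        while (isdigit((unsigned char)s[i])) { i++; exp_digits++; }
        if (exp_digits == 0)
            *matches = 0;    /* e.g. "1e-" was consumed but is incomplete */
    }
    return i;
}
```

Under this reading, "1e-e" gives an item of length 3 ("1e-") that does not match, "+.qa" gives an item of length 2 that does not match, and "100ergs" gives the item "100e" from the Standard's example, again with no match and therefore no store.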
|
Fri, 20 Oct 2000 03:00:00 GMT |
|
 |
Clive D.W. Feathe #13 / 25
|
 state of input stream after bad scanf ?5
writes Quote: >I have never found a copy of TC1 or TC2 myself; all I have is the >original standard, which to me said that the "e-" must be restored >to the input stream.
When we wrote TC1 we felt that the original Standard was ambiguous, and that it was not the intention of the original wording to require more than one character of pushback. The normative text of TC1 is in an article I wrote that can be found at <http://www.lysator.liu.se/c>. The changes made for each DR can be found in the full Records of Response on the dkuug.dk web site. Quote: >Personally, I am inclined to vote for my interpretation. If I can >do it, everyone else should be able to
Perhaps, but I prefer the current interpretation (one lookahead, no pushback). -- Clive D.W. Feather | Director of Software Development | Home email:
Written on my laptop; please observe the Reply-To address |
|
Fri, 20 Oct 2000 03:00:00 GMT |
|
 |
Larry Jone #14 / 25
|
 state of input stream after bad scanf ?5
Quote:
> rewind(fp); > fscanf(fp, "x"); > fseek(fp, 0, SEEK_CUR); > filepos = ftell(fp); > What should the value of filepos now be? If fscanf() uses ungetc() to replace > the 'a' then it should be 1. If not then it should be 0.
No, ungetc() on a binary stream decrements the file position indicator and fseek() determines the new position before undoing the effects of ungetc(), so the resulting file position is 0 in either case. -Larry Jones In my opinion, we don't devote nearly enough scientific research to finding a cure for jerks. -- Calvin
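The claim can be checked with a short sketch along the lines of the quoted fragment. The function names position_after_mismatch and position_after_ungetc are hypothetical; both assume fp is a seekable binary stream (tmpfile() gives one) whose first byte is not 'x'.

```c
#include <stdio.h>

/* The thread's scenario: a literal directive that fails to match the
 * first byte, followed by a seek that moves nowhere.
 * Returns the resulting file position, or -1L on failure. */
long position_after_mismatch(FILE *fp)
{
    rewind(fp);
    if (fscanf(fp, "x") == EOF)          /* EOF only if nothing could be read */
        return -1L;
    if (fseek(fp, 0L, SEEK_CUR) != 0)    /* discards any pushed-back byte */
        return -1L;
    return ftell(fp);
}

/* The same question with an explicit ungetc() of the byte just read. */
long position_after_ungetc(FILE *fp)
{
    int c;

    rewind(fp);
    c = getc(fp);
    if (c == EOF || ungetc(c, fp) == EOF)
        return -1L;
    if (fseek(fp, 0L, SEEK_CUR) != 0)
        return -1L;
    return ftell(fp);
}
```

If the explanation above holds for a given library, both functions report the same position for a file beginning with 'a', whichever way fscanf() leaves the character unread internally.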
|
Fri, 20 Oct 2000 03:00:00 GMT |
|
 |
Larry Jone #15 / 25
|
 state of input stream after bad scanf ?5
Quote:
> No, you're thinking of %Lg. %lg is undefined for fprintf() and specifies > a double conversion for fscanf().
In C9X, %lg and friends *are* defined for fprintf() and friends -- the l modifier is ignored. This makes a whole bunch of previously incorrect programs correct, breaks no known implementations, and makes printf and scanf formats more nearly parallel. -Larry Jones There's a connection here, I just know it. -- Calvin
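For code that has to be correct under both the older and the newer rules, the portable pairing is %lg when scanning into a double and %g when printing one. A minimal sketch follows; the helper name roundtrip_double is hypothetical, and snprintf() is itself a C99 addition, used here only to keep the example testable.

```c
#include <stdio.h>

/* Parse a double with %lg (the scanf family needs the l for double *),
 * then format it with %g (the printf family takes double directly).
 * Returns the number of characters snprintf() produced, or -1 if the
 * text does not start with a number. */
int roundtrip_double(const char *text, char *out, size_t outsize)
{
    double d;

    if (sscanf(text, "%lg", &d) != 1)
        return -1;
    return snprintf(out, outsize, "%g", d);
}
```

Using %g on the printf side sidesteps the question entirely: it is defined for double in every revision of the standard discussed in this thread.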
|
Fri, 20 Oct 2000 03:00:00 GMT |
|
|