state of input stream after bad scanf ?

What state is scanf supposed to leave the input stream in after it
encounters bad input? For example, scanf ("%lg", &d); meeting "numbers" such
as

        +.qa
        1e-e

I wrote a test program to check the above two cases and got the following
results on different platforms with buffered and unbuffered FILEs.

For SunOS:

Buffered:
Failed to ungetc ('!', file)
With '+.qa' got 0 3 '+.qa'
With '1e-e' got 1 1 '!e-e'

Unbuffered:
Failed to ungetc ('!', file)
With '+.qa' got -1 1 'qa'
Failed to ungetc ('!', file)
With '1e-e' got 1 1 'e'

SunOS basil 4.1.3 3 sun4m

For Solaris 2.5.1:

Buffered:
With '+.qa' got 0 3 '!+.qa'
With '1e-e' got 1 1 '!e-e'

Unbuffered:
With '+.qa' got 0 1 '!+.qa'
With '1e-e' got 1 1 '!e-e'

SunOS uxe 5.5.1 Generic_103640-18 sun4u sparc SUNW,Ultra-2

For Irix:
Buffered:
With '+.qa' got 0 3 '!qa'
With '1e-e' got 1 1 '!e-e'

Unbuffered:
With '+.qa' got 0 1 '!qa'
With '1e-e' got 1 1 '!'

IRIX64 cia 6.2 03131016 IP25

For an Acorn:
Buffered:
With '+.qa' got 0 3 '!qa'
With '1e-e' got 0 3 '!e'

Unbuffered:
With '+.qa' got 0 3 '!qa'
With '1e-e' got 0 3 '!e'

So Irix and SunOS give inconsistent results depending on whether buffered or
unbuffered files are used, and no two platforms give the same answers.
What does ANSI mandate should happen?

Thanks in advance for any help you can offer

Nicholas Clark

--------------------------------test program--------------------------------
#include <stdio.h>

int main(int argc, char **argv)
{
  const char *tests[] = {"+.qa", "1e-e", 0};
  const char **test = tests;
  char buf[16];
  double d = 3;
  int result;
  FILE *file = tmpfile();

  puts ("Buffered:");
  while (*test)
    {
      rewind (file);
      buf[0] = '\0';
      fputs (*test, file);
      rewind (file);
      result = fscanf (file, "%lg", &d);
      if (ungetc ('!', file) == EOF)
        {
          printf ("Failed to ungetc ('!', file)\n");
        }
      fgets (buf, sizeof(buf), file);
      printf ("With '%s' got %d %lg '%s'\n", *test, result, d, buf);
      test++;
    }

  test = tests;
  setbuf (file, NULL);  /* And now unbuffered.  */

  puts ("\nUnbuffered:");
  while (*test)
    {
      rewind (file);
      buf[0] = '\0';
      fputs (*test, file);
      rewind (file);
      result = fscanf (file, "%lg", &d);
      if (ungetc ('!', file) == EOF)
        {
          printf ("Failed to ungetc ('!', file)\n");
        }
      fgets (buf, sizeof(buf), file);
      printf ("With '%s' got %d %lg '%s'\n", *test, result, d, buf);
      test++;
    }

  return 0;
}



Wed, 18 Oct 2000 03:00:00 GMT  


Quote:

> What state is scanf supposed to leave the input stream in after it
> encounters bad input? For example, scanf ("%lg", &d); meeting "numbers" such
> as

>         +.qa
>         1e-e

You've just found out why scanf() is regarded as an evil function
and to be avoided - if it fails then the input data is lost.

You will be much better off reading the input into a buffer
with fgets() then using sscanf() to interpret it. The effect
is the same but you can have as many retries as you need.
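A minimal sketch of the fgets()-then-sscanf() approach follows. `read_double_line` is a hypothetical helper name, and the 256-byte line limit is an assumption (a longer line would be split across calls). The `%n` conversion records how far sscanf() got, so trailing junk such as "2.0 junk" is rejected as well:

```c
#include <stdio.h>

/* Reads one line from in and parses it as a single double.
   Returns 1 on success, 0 if the line is not a lone number,
   and EOF at end of input or on a read error.  The whole line is
   consumed either way, so a bad line never stays in the stream. */
int read_double_line(FILE *in, double *out)
{
    char line[256];     /* assumption: input lines are shorter than this */
    double d;
    int n = 0;

    if (fgets(line, sizeof line, in) == NULL)
        return EOF;
    /* "%n" stores the count of characters consumed so far; the space
       before it skips trailing white space, including the newline. */
    if (sscanf(line, "%lf %n", &d, &n) != 1 || line[n] != '\0')
        return 0;
    *out = d;
    return 1;
}
```

Because a bad line has already been consumed by fgets(), a caller can report the error and call `read_double_line` again until it returns 1 or EOF.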

--
<\___/>
/ O O \                     My web site:  http://artlum.com/



Wed, 18 Oct 2000 03:00:00 GMT  



I've cross-posted this to comp.std.c because I'm not entirely clear
about whether scanf and fscanf can implicitly call ungetc(). I can't see
any explicit prohibition on this. Quoting from the C9X description of
fscanf:

       [#16]  If  conversion  terminates  on  a  conflicting  input
       character,  the  offending input character is left unread in
       the input stream.218

"left unread" doesn't suggest "pushed back with ungetc" to me although the
footnote uses the term "pushed back" so it is rather vague. If other
functions aren't allowed to use ungetc() implicitly a statement to that
effect should be included as part of the ungetc() description, as it is
with functions like getenv().  If other standard library functions are
allowed to call ungetc() implicitly the standard ought to state which and
in what circumstances.

Quote:
>What state is scanf supposed to leave the input stream in after it
>encounters bad input? For example, scanf ("%lg", &d); meeting "numbers" such
>as

The function should read characters until it encounters a character that
can't be matched in terms of the current conversion specification (or
end-of-file or an error is encountered). The single character that can't be
matched is left on the input stream.
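The rule can be illustrated with a deliberately simplified conversion that accepts only decimal digits. `read_digits` is a hypothetical helper, not how any real library implements %d: it reads with getc() and, on the first character that cannot match, pushes that one character back with ungetc(), so it stays unread for the next input operation:

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Copies leading decimal digits from fp into buf (at most n-1 of them,
   NUL-terminated) and returns how many were stored.  The first
   non-digit is pushed back with ungetc(), so it remains "unread". */
size_t read_digits(FILE *fp, char *buf, size_t n)
{
    size_t len = 0;
    int c;

    if (n == 0)
        return 0;
    while (len + 1 < n && (c = getc(fp)) != EOF) {
        if (!isdigit(c)) {
            ungetc(c, fp);      /* exactly one character of lookahead */
            break;
        }
        buf[len++] = (char)c;
    }
    buf[len] = '\0';
    return len;
}
```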

Quote:
>        +.qa

For %lg, +. is a valid prefix for a floating point number but +.q is not.
Therefore +. should be removed from the input stream, leaving qa. +. isn't
a valid form, nor does it contain a prefix that is a valid form of
floating point number. Therefore the %lg conversion as a whole will fail to
match and no argument will be assigned.

Quote:
>        1e-e

1e- is a valid prefix but 1e-e is not. Therefore 1e- is removed from the
input stream and the e is left on the stream. Within 1e-, 1 is the maximal
prefix that is a validly formed floating point number (as accepted by
strtod); therefore this will be matched and should cause 1.0 to be assigned
through the argument pointer.
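The strtod() behaviour this argument relies on can be checked in isolation with a small wrapper (`strtod_prefix` is a hypothetical name). Given "1e-e", strtod() converts "1" and leaves its end pointer on the first 'e'; given "+.qa" it converts nothing. Whether fscanf() is required to follow strtod() in the first case is exactly what is disputed later in the thread:

```c
#include <stdlib.h>

/* Returns how many leading characters of s strtod() converts and stores
   the value in *value (strtod() yields 0.0 when nothing is converted). */
size_t strtod_prefix(const char *s, double *value)
{
    char *end;

    *value = strtod(s, &end);
    return (size_t)(end - s);
}
```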

Quote:
>I wrote a test program to check the above two cases and get the following
>results on different platforms with buffered and unbuffered FILEs.

>For SunOS:

>Buffered:
>Failed to ungetc ('!', file)

Still in question.

Quote:
>With '+.qa' got 0 3 '+.qa'
>With '1e-e' got 1 1 '!e-e'

Incorrect: the +. and e- should have been removed from the stream.

Quote:
>Unbuffered:
>Failed to ungetc ('!', file)
>With '+.qa' got -1 1 'qa'
>Failed to ungetc ('!', file)
>With '1e-e' got 1 1 'e'

>SunOS basil 4.1.3 3 sun4m

Unfortunately all of your unbuffered tests are invalid (see later).

Quote:
>For Solaris 2.5.1:

>Buffered:
>With '+.qa' got 0 3 '!+.qa'
>With '1e-e' got 1 1 '!e-e'

Again, +. and e- should have been removed.

...

Quote:
>For Irix:
>Buffered:
>With '+.qa' got 0 3 '!qa'

Correct.

Quote:
>With '1e-e' got 1 1 '!e-e'

The e- should have been removed.

...

Quote:
>For an Acorn:
>Buffered:
>With '+.qa' got 0 3 '!qa'

Correct.

Quote:
>With '1e-e' got 0 3 '!e'

1.0 should have been assigned.

...

Quote:
>--------------------------------test program--------------------------------
>#include <stdio.h>

>int main(int argc, char **argv)
>{
>  const char *tests[] = {"+.qa", "1e-e", 0};
>  const char **test = tests;
>  char buf[16];
>  double d = 3;

I'd suggest a more obvious ``error'' value like -1. Also, d should be
reset to that value before each call to fscanf, not just once.

Quote:
>  int result;
>  FILE *file = tmpfile();

>  puts ("Buffered:");
>  while (*test)
>    {
>      rewind (file);
>      buf[0] = '\0';
>      fputs (*test, file);
>      rewind (file);
>      result = fscanf (file, "%lg", &d);
>      if (ungetc ('!', file) == EOF)
>        {
>          printf ("Failed to ungetc ('!', file)\n");
>        }
>      fgets (buf, sizeof(buf), file);
>      printf ("With '%s' got %d %lg '%s'\n", *test, result, d, buf);

%lg is an invalid printf conversion specification (although it is correct
for scanf). The printf specification for double is %g (or %e, %f, etc.),
which also works for float values, since they are automatically promoted
to double in the caller before being passed.
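A small illustration of that promotion, assuming a C99 snprintf() (`format_pair` is a hypothetical helper): the same %g conversion formats both arguments, because the float has already been converted to double when it is passed through the variable argument list.

```c
#include <stdio.h>

/* Formats a float and a double with identical %g conversions.  The float
   argument f is promoted to double by the default argument promotions. */
int format_pair(char *buf, size_t size, float f, double d)
{
    return snprintf(buf, size, "%g %g", f, d);
}
```

For example, `format_pair(buf, sizeof buf, 0.5f, 2.25)` stores "0.5 2.25" in `buf`.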

Quote:
>      test++;
>    }

>  test = tests;
>  setbuf (file, NULL);  /* And now unbuffered.  */

This is an error. setbuf() and setvbuf() can only be called directly
after the stream has been opened i.e. before any other operation has been
performed on it. You can't change a stream's buffering on the fly. You'll
need to close the stream and open a new one here.
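Putting those corrections together, one test case could be restructured roughly as below. This is a sketch, not the original test program: `scan_once` is a hypothetical name, each call opens its own temporary file, setvbuf() selects unbuffered mode immediately after tmpfile() and before any other operation, and *d is set to the sentinel -1.0 before every fscanf() call:

```c
#include <stdio.h>

/* Runs one test case on a fresh temporary file.  Returns fscanf()'s
   result, or -2 if the stream could not be set up.  *d keeps the
   sentinel -1.0 unless fscanf() assigned it; rest receives whatever
   fgets() reads after the ungetc('!') probe. */
int scan_once(const char *input, int unbuffered, double *d,
              char *rest, size_t restsize)
{
    FILE *file = tmpfile();
    int result;

    if (file == NULL)
        return -2;
    /* Buffering must be chosen before any other operation on the stream. */
    if (unbuffered && setvbuf(file, NULL, _IONBF, 0) != 0) {
        fclose(file);
        return -2;
    }
    fputs(input, file);
    rewind(file);

    *d = -1.0;                          /* sentinel, reset for every case */
    result = fscanf(file, "%lg", d);    /* %lg is correct for scanf */
    if (ungetc('!', file) == EOF)
        puts("Failed to ungetc ('!', file)");
    if (fgets(rest, (int)restsize, file) == NULL)
        rest[0] = '\0';
    fclose(file);
    return result;
}
```

The caller can then report with `printf("With '%s' got %d %g '%s'\n", input, result, d, rest);`, using %g rather than %lg for the double.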




Wed, 18 Oct 2000 03:00:00 GMT  

Quote:

> "left unread" doesn't suggest "pushed back with ungetc" to me although the
> footnote uses the term "pushed back" so it is rather vague.

I think there was a DR on this;
the committee as I recall has always intended fscanf() pushback
to be implemented *in addition to* the support for ungetc().
I think four or five extra buffer characters are necessary.


Thu, 19 Oct 2000 03:00:00 GMT  


Quote:

>> "left unread" doesn't suggest "pushed back with ungetc" to me although the
>> footnote uses the term "pushed back" so it is rather vague.

>I think there was a DR on this;
>the committee as I recall has always intended fscanf() pushback
>to be implemented *in addition to* the support for ungetc().

If there was a DR it doesn't seem to have made it into the C9X wording.

Quote:
>I think four or five extra buffer characters are necessary.

How so? AFAIK library functions will only ``push back'' one character and
then only after successfully reading a character so the effects can't be
cumulative. The only problem case I can think of is:

     ungetc('x', fp);
     fscanf(fp, "y");
     ungetc('z', fp);

i.e. where the character ``pushed back'' by fscanf() was originally
ungetc()'d. I guess that the second ungetc() should be allowed to fail,
whereas in:

     ungetc('x', fp);
     fseek(fp, 0, SEEK_SET);
     fscanf(fp, "y");        /* say the first character encountered is 'a' */
     ungetc('z', fp);

it can't fail due to the push-back limit being exceeded.




Thu, 19 Oct 2000 03:00:00 GMT  



Quote:
>I've cross-posted this to comp.std.c because I'm not entirely clear
>about whether scanf and fscanf can implicitly call ungetc().

When we worked this through for TC1, we decided not.

Quote:
>>What state is scanf supposed to leave the input stream in after it
>>encounters bad input? For example, scanf ("%lg", &d); meeting "numbers" such
>>as
>The function should read characters until it encounters a character that
>can't be matched in terms of the current conversion specification (or
>end-of-file or an error is encountered). The single character that can't be
>matched is left on the input stream.

Right.

Quote:
>>        1e-e
>1e- is a valid prefix but 1e-e is not. Therefore 1e- is removed from the
>input stream and the e is left on the stream.

Right.

Quote:
>Within 1e- 1 is the maximal
>prefix that is a validly formed floating point number (as accepted by
>strtod) therefore this will be matched and should cause 1.0 to be assigned
>through the argument pointer.

No. 1e- is not a valid floating point number, and so a conversion error
happens.

Quote:
>>      printf ("With '%s' got %d %lg '%s'\n", *test, result, d, buf);
>%lg is an invalid printf conversion specification (although it is correct
>for scanf).

%lg is wrong for double, but it is a valid specification (it's used for
long double).

--
Clive D.W. Feather    | Director of Software Development  | Home email:


Written on my laptop; please observe the Reply-To address |



Thu, 19 Oct 2000 03:00:00 GMT  



Quote:
>I think there was a DR on this;
>the committee as I recall has always intended fscanf() pushback
>to be implemented *in addition to* the support for ungetc().
>I think four or five extra buffer characters are necessary.

No, you need one slot for ungetc() and you need the ability to hold the
lookahead character for fscanf() (the latter can be in the normal stdio
buffer). That's all.

--
Clive D.W. Feather    | Director of Software Development  | Home email:



Thu, 19 Oct 2000 03:00:00 GMT  



Quote:
>>I think there was a DR on this;
>If there was a DR it doesn't seem to have made it into the C9X wording.

It did - it was in TC1.

Quote:
>How so? AFAIK library functions will only ``push back'' one character and
>then only after successfully reading a character so the effects can't be
>cumulative. The only problem case I can think of is:

>     ungetc('x', fp);
>     fscanf(fp, "y");
>     ungetc('z', fp);

I don't think we looked at that case. I would personally allow the
second ungetc to fail.

--
Clive D.W. Feather    | Director of Software Development  | Home email:



Thu, 19 Oct 2000 03:00:00 GMT  



...

Quote:
>>Within 1e- 1 is the maximal
>>prefix that is a validly formed floating point number (as accepted by
>>strtod) therefore this will be matched and should cause 1.0 to be assigned
>>through the argument pointer.

>No. 1e- is not a valid floating point number, and so a conversion error
>happens.

1e- isn't, but 1 is. strtod() must accept this, so as far as I can see
from the standard, fscanf must treat it as a conversion match. The only
difference is that fscanf() can only ``push back'' a single character, so
e- can't be restored to the input stream and is discarded. I can't see
anything that allows that to affect whether a successful match occurred
or not.

Incidentally the original article which gave evidence that 4 (maybe 3)
different ANSI compilers couldn't agree on the interpretation of this is
good evidence that the wording on this point needs to be improved, even if
we end up agreeing on an interpretation.

Quote:
>>>      printf ("With '%s' got %d %lg '%s'\n", *test, result, d, buf);
>>%lg is an invalid printf conversion specification (although it is correct
>>for scanf).

>%lg is wrong for double, but it is a valid specification (it's used for
>long double).

No, you're thinking of %Lg. %lg is undefined for fprintf() and specifies
a double conversion for fscanf().




Thu, 19 Oct 2000 03:00:00 GMT  


Quote:

>1e- isn't, but 1 is. strtod() must accept this so, as far as I can see
>from the standard, fscanf must treat it as a conversion match. The only
>difference is that fscanf() can only ``push back'' a single character so
>therefore e- can't be restored to the input stream so they get dumped. I
>can't see anything that allows that to affect whether a successful match
>occurred or not.

I have never found a copy of TC1 or TC2 myself; all I have is the
original standard, which to me said that the "e-" must be restored
to the input stream.

So, my stdio *can* restore them.  This turns out not to be all that
difficult.  My stdio guarantees that any open stream can have at
least three arbitrary characters "pushed back", by having three
bytes in the "FILE" structure itself in which they can reside, and
a state in which getc() reads from this three-byte buffer.  (This
three bytes covers "e-" and one "user" ungetc(), and since the
scanf innards get to peek ahead, you effectively get a fourth byte
as well.)
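A toy model of that design, assuming a NUL-terminated string in place of a real FILE buffer. `struct pbstream`, `pb_getc` and `pb_ungetc` are hypothetical names; this is not Torek's actual code. Three pushback bytes let a scanf-like routine restore "e-" and still leave room for one user ungetc(), while the fourth character (the second 'e' of "1e-e") is assumed to have been examined by peeking at the underlying buffer, so it never needs a pushback slot:

```c
#include <stddef.h>
#include <stdio.h>   /* for EOF */

#define PB_MAX 3     /* three pushback bytes, as in the design described above */

/* Toy stream: a NUL-terminated string stands in for the FILE buffer. */
struct pbstream {
    const char *src;          /* underlying data */
    size_t pos;               /* next unread byte of src */
    unsigned char pb[PB_MAX]; /* pushed-back bytes, used as a stack */
    int npb;                  /* how many entries of pb[] are in use */
};

void pb_init(struct pbstream *s, const char *src)
{
    s->src = src;
    s->pos = 0;
    s->npb = 0;
}

/* Pushed-back bytes are returned first, most recent first (like ungetc). */
int pb_getc(struct pbstream *s)
{
    if (s->npb > 0)
        return s->pb[--s->npb];
    if (s->src[s->pos] == '\0')
        return EOF;
    return (unsigned char)s->src[s->pos++];
}

/* Fails (returns EOF) when c is EOF or all PB_MAX slots are full. */
int pb_ungetc(int c, struct pbstream *s)
{
    if (c == EOF || s->npb == PB_MAX)
        return EOF;
    s->pb[s->npb++] = (unsigned char)c;
    return (unsigned char)c;
}
```

After reading "1e-", pushing back '-' and then 'e' restores "e-" in order; a user ungetc('!') then fills the third slot, and any further pushback fails.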

Quote:
>Incidentally the original article which gave evidence that 4 (maybe 3)
>different ANSI compilers couldn't agree on the interpretation of this is
>good evidence that the wording on this point needs to be improved, even if
>we end up agreeing on an interpretation.

Personally, I am inclined to vote for my interpretation.  If I can
do it, everyone else should be able to -- heck, my code is freely
available; all they have to do is copy it! :-)
--
In-Real-Life: Chris Torek, Berkeley Software Design Inc

Antispam notice: unsolicited commercial email will be handled at my
consulting rate; pyramid-scheme mail will be forwarded to the FTC.


Thu, 19 Oct 2000 03:00:00 GMT  

Quote:



>>1e- isn't, but 1 is. strtod() must accept this so, as far as I can see
>>from the standard, fscanf must treat it as a conversion match. The only
>>difference is that fscanf() can only ``push back'' a single character so
>>therefore e- can't be restored to the input stream so they get dumped. I
>>can't see anything that allows that to affect whether a successful match
>>occurred or not.

>I have never found a copy of TC1 or TC2 myself; all I have is the
>original standard, which to me said that the "e-" must be restored
>to the input stream.

C90 says "If conversion terminates on a conflicting input character, the
offending input character is left unread in the input stream."

That specifies a single character as being ``left unread''. There's nothing
that specifies that previously read characters are restored to the input
stream.

NA1 adds as a footnote:

"fscanf pushes back at most one input character onto the input stream.
 Therefore some sequences that are acceptable to strtod, strtol, or strtoul
 are unacceptable to fscanf"

The second sentence supports Clive's interpretation but IMHO it conflicts
with the normative text so must be disregarded.

Quote:
>So, my stdio *can* restore them.  This turns out not to be all that
>difficult.  My stdio guarantees that any open stream can have at
>least three arbitrary characters "pushed back", by having three
>bytes in the "FILE" structure itself in which they can reside, and
>a state in which getc() reads from this three-byte buffer.  (This
>three bytes covers "e-" and one "user" ungetc(), and since the
>scanf innards get to peek ahead, you effectively get a fourth byte
>as well.)

>>Incidentally the original article which gave evidence that 4 (maybe 3)
>>different ANSI compilers couldn't agree on the interpretation of this is
>>good evidence that the wording on this point needs to be improved, even if
>>we end up agreeing on an interpretation.

>Personally, I am inclined to vote for my interpretation.  If I can
>do it, everyone else should be able to -- heck, my code is freely
>available; all they have to do is copy it! :-)

Are there any efficiency issues that result from this? That and code
size would be the only objections I can think of that could be levelled.

There's another issue in this, perhaps even more fundamental. Consider
the case where fp has been opened to a seekable binary file whose
first character is 'a'. Assuming no I/O errors:

    rewind(fp);
    fscanf(fp, "x");
    fseek(fp, 0, SEEK_CUR);
    filepos = ftell(fp);

What should the value of filepos now be? If fscanf() uses ungetc() to replace
the 'a' then it should be 1. If not then it should be 0. I'm inclined to
believe that 0 is the correct result as per the current documents, but that
may not be easy to implement, and may lead to undesirable consequences.




Fri, 20 Oct 2000 03:00:00 GMT  



Quote:
>>>Within 1e- 1 is the maximal
>>>prefix that is a validly formed floating point number
>>No. 1e- is not a valid floating point number, and so a conversion error
>>happens.
>1e- isn't, but 1 is. strtod() must accept this so, as far as I can see
>from the standard, fscanf must treat it as a conversion match.

You see wrong.

7.13.6.2 in C9X CD1, but the text is unchanged since TC1 altered it. The
process has three steps (%n is different and I ignore it here):
(1) Except for %[ and %c, all characters for which isspace() is true are
    skipped.
(2) An "input item" is read. This is the longest sequence of characters
    that could be a valid match or a prefix of a valid match *and* does
    not exceed the field width if given. So in the case of "1e-e" the
    input item is "1e-". The second e "remains unread", to quote the
    Standard.
(3) The input item is converted to a value. "If the input item is not a
    matching sequence, the execution of the directive fails: this
    condition is a matching failure". "1e-" is not a matching sequence
    and so there is an error.

The matching sequence for %g is the "subject string" of strtod(). This
*excludes* any trailing junk. There is a similar example in the Standard
where "100e" is the input item for %g, and it is made explicit that no
store takes place. See also footnote 218.
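The three steps can be sketched for a simplified %g that accepts only decimal forms (no hexadecimal, INF or NAN, which C9X also permits), assuming the "C" locale. `float_input_item` and `convert_item` are hypothetical helpers: the first finds the input item of step (2) with a small state machine, the second applies step (3) by requiring strtod() to accept the entire item:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Simplified: decimal forms only, e.g. "-1.5e+3". */
enum fstate { ST_START, ST_SIGN, ST_INT, ST_DOT, ST_FRAC,
              ST_EXP, ST_EXPSIGN, ST_EXPDIG };

/* Step (2): length of the longest prefix of s that is a valid floating
   constant or could still become one by reading more characters. */
size_t float_input_item(const char *s)
{
    enum fstate st = ST_START;
    size_t i;

    for (i = 0; s[i] != '\0'; i++) {
        int c = (unsigned char)s[i];
        int digit = isdigit(c) != 0;
        int next = -1;                  /* -1: c cannot extend the item */

        switch (st) {
        case ST_START:
        case ST_SIGN:
            if (st == ST_START && (c == '+' || c == '-')) next = ST_SIGN;
            else if (digit) next = ST_INT;
            else if (c == '.') next = ST_DOT;
            break;
        case ST_INT:
            if (digit) next = ST_INT;
            else if (c == '.') next = ST_FRAC;
            else if (c == 'e' || c == 'E') next = ST_EXP;
            break;
        case ST_DOT:                    /* "." or "+." still needs a digit */
            if (digit) next = ST_FRAC;
            break;
        case ST_FRAC:
            if (digit) next = ST_FRAC;
            else if (c == 'e' || c == 'E') next = ST_EXP;
            break;
        case ST_EXP:
            if (c == '+' || c == '-') next = ST_EXPSIGN;
            else if (digit) next = ST_EXPDIG;
            break;
        case ST_EXPSIGN:
        case ST_EXPDIG:
            if (digit) next = ST_EXPDIG;
            break;
        }
        if (next < 0)
            break;                      /* s[i] is the unread lookahead */
        st = (enum fstate)next;
    }
    return i;
}

/* Step (3): the item is a matching sequence only if strtod() accepts all
   of it.  Returns 1 and stores the value, or 0 for a matching failure. */
int convert_item(const char *s, size_t len, double *out)
{
    char buf[64];                       /* assumption: items are short */
    char *end;
    double v;

    if (len == 0 || len >= sizeof buf)
        return 0;
    memcpy(buf, s, len);
    buf[len] = '\0';
    v = strtod(buf, &end);
    if (end != buf + len)
        return 0;                       /* e.g. "1e-": strtod stops after "1" */
    *out = v;
    return 1;
}
```

With these, "1e-e" yields the input item "1e-", which strtod() accepts only in part, so it is a matching failure; "100ergs" gives "100e" and fails in the same way, while "1.5x" gives "1.5" and converts to 1.5.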

Quote:
>Incidentally the original article which gave evidence that 4 (maybe 3)
>different ANSI compilers couldn't agree on the interpretation of this is
>good evidence that the wording on this point needs to be improved, even if
>we end up agreeing on an interpretation.

I wonder how many actually implement the current Standard (with TC1
applied); we made these changes because the old text *was* ambiguous.

Quote:
>>%lg is wrong for double, but it is a valid specification (it's used for
>>long double).
>No, you're thinking of %Lg.

Oops, so I am.

--
Clive D.W. Feather    | Director of Software Development  | Home email:



Fri, 20 Oct 2000 03:00:00 GMT  



Quote:
>I have never found a copy of TC1 or TC2 myself; all I have is the
>original standard, which to me said that the "e-" must be restored
>to the input stream.

When we wrote TC1 we felt that the original Standard was ambiguous, and
that it was not the intention of the original wording to require more
than one character of pushback.

The normative text of TC1 is in an article I wrote that can be found at
<http://www.lysator.liu.se/c>. The changes made for each DR can be found
in the full Records of Response on the dkuug.dk web site.

Quote:
>Personally, I am inclined to vote for my interpretation.  If I can
>do it, everyone else should be able to

Perhaps, but I prefer the current interpretation (one lookahead, no
pushback).

--
Clive D.W. Feather    | Director of Software Development  | Home email:



Fri, 20 Oct 2000 03:00:00 GMT  

Quote:

>     rewind(fp);
>     fscanf(fp, "x");
>     fseek(fp, 0, SEEK_CUR);
>     filepos = ftell(fp);

> What should the value of filepos now be? If fscanf() uses ungetc() to replace
> the 'a' then it should be 1. If not then it should be 0.

No, ungetc() on a binary stream decrements the file position indicator
and fseek() determines the new position before undoing the effects of
ungetc(), so the resulting file position is 0 in either case.
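This can be observed directly. The sketch assumes a binary stream from tmpfile() (which opens in "wb+" mode); `unget_positions` is a hypothetical helper that records ftell() after a getc(), after ungetc() of the same character, and after fseek(fp, 0, SEEK_CUR):

```c
#include <stdio.h>

/* Fills pos[0..2] with ftell() values; returns 0 on success, -1 on failure. */
int unget_positions(long pos[3])
{
    FILE *fp = tmpfile();           /* binary stream, mode "wb+" */
    int c;

    if (fp == NULL)
        return -1;
    if (fputs("abc", fp) == EOF) {
        fclose(fp);
        return -1;
    }
    rewind(fp);

    c = getc(fp);                   /* reads 'a': position becomes 1 */
    pos[0] = ftell(fp);
    ungetc(c, fp);                  /* binary stream: position decremented */
    pos[1] = ftell(fp);
    fseek(fp, 0L, SEEK_CUR);        /* discards the pushback, keeps position */
    pos[2] = ftell(fp);

    fclose(fp);
    return 0;
}
```

The three recorded positions are 1, 0 and 0.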

-Larry Jones

In my opinion, we don't devote nearly enough scientific research
to finding a cure for jerks. -- Calvin



Fri, 20 Oct 2000 03:00:00 GMT  

Quote:

> No, you're thinking of %Lg. %lg is undefined for fprintf() and specifies
> a double conversion for fscanf().

In C9X, %lg and friends *are* defined for fprintf() and friends -- the
l modifier is ignored.  This makes a whole bunch of previously incorrect
programs correct, breaks no known implementations, and makes printf and
scanf formats more nearly parallel.
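Under that C9X rule, the same spelling can be used in both directions, as in this sketch (`roundtrip` is a hypothetical helper; with a C90 library the output format would need to be "%f" rather than "%lf"):

```c
#include <stdio.h>

/* Reads a double from in with "%lf" and writes it to out with "%lf".
   Returns snprintf()'s result, or -1 if in does not start with a number. */
int roundtrip(const char *in, char *out, size_t size)
{
    double d;

    if (sscanf(in, "%lf", &d) != 1)
        return -1;
    return snprintf(out, size, "%lf", d);   /* C99: l has no effect here */
}
```

For instance, `roundtrip("1.25", out, sizeof out)` returns 8 and leaves "1.250000" in `out`.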

-Larry Jones

There's a connection here, I just know it. -- Calvin


