ANS READ-LINE question 
 ANS READ-LINE question

In 11.6.1.2090 READ-LINE ( c-addr u1 fileid -- u2 flag ior ) should the line
terminator(s) be read when the length of the line (excluding terminator(s))
is exactly u1 characters?

From the sentence "The line buffer provided by c-addr should be at least u1
+ 2 characters long" I initially inferred that they should. However in this
case u2 will be equal to u1 and further down we have "When u1 = u2, the line
terminator has yet to be reached" which implies that they should not. Which
interpretation is correct?

Philip.



Fri, 23 Nov 2001 03:00:00 GMT  
 ANS READ-LINE question

Quote:

>In 11.6.1.2090 READ-LINE ( c-addr u1 fileid -- u2 flag ior ) should the line
>terminator(s) be read when the length of the line (excluding terminator(s))
>is exactly u1 characters?
>From the sentence "The line buffer provided by c-addr should be at least u1
>+ 2 characters long" I initially inferred that they should. However in this
>case u2 will be equal to u1 and further down we have "When u1 = u2, the line
>terminator has yet to be reached" which implies that they should not. Which
>interpretation is correct?

It looks like they should not.  If you want to make sure the terminators are
included and you expect never to have a line longer than 84 characters, you
should make your buffer hold 88 characters and make u1 = 86.  Then when
u2=84 you know you got an 84-character line.  When u2=86 you know you got
a line longer than 84 characters and you haven't reached the terminators.  
When u2=85 then you got a line that was 85 characters plus terminators and
there might be two terminators in positions 85 and 86.

I think the u1+2 rule is to handle unusual conditions on some systems, and
not to hold the terminators generally.  READ-LINE was crafted to be
reasonably easy to implement and to work correctly on a variety of operating
systems, and so you can expect peculiarities that might not show up on any
one of them.

I could be wrong about it, but this is how I read it.



Fri, 23 Nov 2001 03:00:00 GMT  
 ANS READ-LINE question


Quote:
> In 11.6.1.2090 READ-LINE ( c-addr u1 fileid -- u2 flag ior ) should
> the line terminator(s) be read when the length of the line (excluding
> terminator(s)) is exactly u1 characters?

Should not.

This came up in the thread "Find the Bug" in the fall of 1998,
but the discussion did not seem to make much of an impression.

A couple of quotes from back then:

Quote:
> To me, the ANS document is wrong.  It says the line
> buffer should be at least u1+2 characters long.  OK.
> Then READ-LINE reads at most u1 characters and returns
> u2 characters actually read, not including the line
> terminator (0 <= u2 <= u1).  OK.  When u1 = u2, the line
> terminator has not been reached.  Not OK.  Because if
> u1 = u2 , the next READ-LINE will just read the line
> terminator and give u2 = 0 .

> So I make both the line buffer and u1 = linelength+2.
> Then READ-LINE should always read the line terminator.

...

Quote:
> Encouraged by Elko Tchernev, I got my code to work
> with Win32For by following the ANS document: ALLOTing
> u1 + 2 characters and asking for u1 characters from
> READ-LINE.  The resulting file was what I wanted.

> But I'm still not satisfied.  Unless I ALLOT line-length
> + 4 characters and read line-length + 2 characters, I get
> several 0-length reads, which seems to me ugly (and a
> possible source of bugs?).

> So Win32For follows ANS, and I wish ANS were different.

I was speaking as an applications programmer, not as an
implementor.

If you make your buffer 4 (5?) characters longer than the
longest line and read at least 2 (3?) characters more than
the longest line, you should be ok, provided your system
implements READ-LINE correctly, or at least ANSily.
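
A rough sketch of the sizing advice above (not code from the thread). It
assumes a system that follows the ANS wording, and it assumes that asking
for just one character more than the longest expected line is enough: every
line that fits then returns u2 < u1, so its terminator is consumed by the
same call and no zero-length follow-up read appears. The larger margins
suggested above are the more cautious choice. MAXLEN, REQ, LBUF and
GET-LINE are illustrative names.

```forth
84 CONSTANT MAXLEN                       \ longest line expected (assumed)
MAXLEN 1+ CONSTANT REQ                   \ u1: one more than MAXLEN
CREATE LBUF  REQ 2 + CHARS ALLOT         \ u1 + 2, as 11.6.1.2090 asks

\ Read one whole line; the string returned excludes the terminator.
: GET-LINE  ( fileid -- c-addr u true | false )
   LBUF REQ ROT READ-LINE THROW          \ -- u2 flag
   0= IF  DROP FALSE EXIT  THEN          \ end of file
   DUP REQ = ABORT" line longer than MAXLEN"
   LBUF SWAP TRUE ;
```

With this arrangement u2 = u1 can only mean that the line was longer than
expected, so it is treated as an error rather than as a partial read.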

--
Leo Wong

http://www.albany.net/~hello/
The Forth Ring: http://zForth.com/




Fri, 23 Nov 2001 03:00:00 GMT  
 ANS READ-LINE question

I thought that one of the nice things about  READ-LINE  is that
it can handle long lines using a short buffer.

    80 CONSTANT MAXLINE

    MAXLINE 2 + BUFFER: INBUFFER

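    : READ  ( fileid -- false | line length true )
        >R
            INBUFFER DUP MAXLINE R@ READ-LINE  \ line missing in the archive; presumed
                ABORT" Can't READ-LINE "
            ( line length more) DUP 0= IF
                NIP NIP                        \ also presumed: leave only false
            THEN
        R> DROP ;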

    : .LINE  ( str len -- )
        DUP >R  ?TYPE  R> MAXLINE <> ?? CR ;

    ( Common Usage Dependency:
    (   1 CHARS is an address unit; 2's Complement Arithmetic. )

Example.

    MACRO ?REPEAT  " CONTINUE? UNTIL THEN "

    27 CONSTANT ESC-CHAR

    : CONTINUE?  ( -- flag )
        KEY? IF
            KEY ESC-CHAR = ?DUP ?? EXIT
            KEY ESC-CHAR = ?DUP ?? EXIT
        THEN FALSE ;

    0 VALUE FID  \  For fileid.

    : LISTING  CR  BEGIN  FID READ WHILE  .LINE  ?REPEAT  CR ;

(I hope the undefined words are intuitive.)




Fri, 23 Nov 2001 03:00:00 GMT  
 ANS READ-LINE question

P> In 11.6.1.2090 READ-LINE ( c-addr u1 fileid -- u2 flag ior ) should
P> the line terminator(s) be read when the length of the line
P> (excluding terminator(s)) is exactly u1 characters?

P> From the sentence "The line buffer provided by c-addr should be at
P> least u1 + 2 characters long" I initially inferred that they should.
P> However in this case u2 will be equal to u1 and further down we have
P> "When u1 = u2, the line terminator has yet to be reached" which
P> implies that they should not. Which interpretation is correct?

The new-line chars are read twice, the first time for u1=u2, the second
time for u2=0. The first time they go into the additional two bytes at
the end of the provided buffer.



Fri, 23 Nov 2001 03:00:00 GMT  
 ANS READ-LINE question


READ-LINE ( c-addr u1 fileid -- u2 flag ior ):

JT> If you want to make sure the terminators are included and you
JT> expect never to have a line longer than 84 characters, you should
JT> make your buffer hold 88 characters and make u1 = 86.  Then when
JT> u2=84 you know you got an 84-character line.  When u2=86 you know
JT> you got a line longer than 84 characters and you haven't reached
JT> the terminators.  When u2=85 then you got a line that was 85
JT> characters plus terminators and there might be two terminators in
JT> positions 85 and 86.

Please, no, why those complications? If you expect a maximum of 84 chars,
you need a buffer, as specified, of 84+2=86 chars. This is what the
definition of READ-LINE specifies explicitly. You then need to be aware
of zero-length lines, which need to be discarded. Better to discard empty
lines than to fiddle with odd lengths.

JT> I think the u1+2 rule is to handle unusual conditions on some
JT> systems, and not to hold the terminators generally.  READ-LINE was
JT> crafted to be reasonably easy to implement and to work correctly on
JT> a variety of operating systems, and so you can expect peculiarities
JT> that might not show up on any one of them.

The u1+2 is needed when the line is exactly u1 chars long: the line
terminator then sits in those two extra bytes and is ignored, since a
re-read takes place which inputs an empty line. (After the re-read the
same line terminator sits at c-addr.)

A proper implementation is not quite trivial if it is to deal correctly
with all combinations which are in normal use on desktop machines: CR,
LF, CR-LF. (Turned out for me to be the most tricky part to resolve
correctly all the combinations near the end of a file.)
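
For implementers, a minimal sketch of the terminator detection described
above. It is an assumption about one possible approach, not how any
particular system does it. It works on bytes already in memory and accepts
CR, LF, or the two together in either order; treating LF-CR as one
terminator is a choice made here for illustration. LF-CH, CR-CH, EOL?,
LINE-LEN and TERM-LEN are hypothetical names.

```forth
10 CONSTANT LF-CH   13 CONSTANT CR-CH

: EOL?  ( char -- flag )  DUP LF-CH =  SWAP CR-CH =  OR ;

\ Number of chars before the first CR or LF (u itself if there is none).
: LINE-LEN  ( c-addr u -- u' )
   DUP 0 ?DO
      OVER I CHARS + C@ EOL?
      IF  DROP I UNLOOP NIP EXIT  THEN
   LOOP NIP ;

\ Number of terminator chars (0, 1 or 2) at the start of the string.
\ CR LF and LF CR count as one terminator; CR CR and LF LF as two lines.
: TERM-LEN  ( c-addr u -- n )
   DUP 0= IF  2DROP 0 EXIT  THEN           \ empty string
   OVER C@ EOL? 0= IF  2DROP 0 EXIT  THEN  \ not at a terminator
   1 = IF  DROP 1 EXIT  THEN               \ only one char available
   DUP C@  SWAP CHAR+ C@                   \ -- first second
   DUP EOL? 0= IF  2DROP 1 EXIT  THEN      \ lone CR or LF
   <> IF 2 ELSE 1 THEN ;
```

The awkward case near the end of a file is the one where only the first
char of a possible pair is available: TERM-LEN answers 1 there, and a real
implementation would have to look at the next block before deciding.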

I can assure you that a proper implementation is worth the effort, since
it will easily save you the time it took to implement. I just found a
minor bug of the same kind, where Linux gets this wrong when mount reads
/etc/fstab residing in a umsdos partition (a Linux filesystem mirrored
onto a DOS FAT drive): it took me several hours to determine that the
fstab lines were expected to end in LF even though the file resides on a
FAT drive. I also gratefully realize that the old emacs clone I have long
used silently presents me with what is to be presented, because it has a
proper implementation of the READ-LINE stuff. I'd hardly regard Windows
Notepad as state of the art in this regard.

Better not to ask for a trivial implementation. It costs too much time.
The additional two bytes are solely there to avoid copying the strings
around as I understand it, so in favour of efficient implementations.
For sure it is not an invitation for quick and dirty hacks.



Sat, 24 Nov 2001 03:00:00 GMT  
 ANS READ-LINE question

Quote:

>Please, no, why those complications? If you expect a maximum of 84 chars
>so you need a buffer as specified of 84+2=86 chars. This is what the
>definition of READ-LINE specifies explicitly. You need to be aware then
>of zero-length lines which need to be discarded. Better to discard empty
>lines than to fiddle with odd lengths.

How does one distinguish a blank line that needs to be kept from a
zero-length line that needs to be discarded?

Leo Wong

http://www.albany.net/~hello/
The Forth Ring: http://zForth.com



Sat, 24 Nov 2001 03:00:00 GMT  
 ANS READ-LINE question

Quote:

>Please, no, why those complications? If you expect a maximum of 84 chars
>so you need a buffer as specified of 84+2=86 chars. This is what the
>definition of READ-LINE specifies explicitly. You need to be aware then
>of zero-length lines which need to be discarded. Better to discard empty
>lines than to fiddle with odd lengths.

That makes sense.  In that case, if you actually get a line that's 85
characters will it give you 84 the first time and 1 the next time?

And if it's 84 characters (plus terminators) then you'll get 84 characters
and you'll know to do one extra READ-LINE to flush the terminators?

That sounds sensible.  

So when you get 84 characters followed by 0 characters next time, you know
you had a single line that was 84 characters, while if you get 84 characters
followed by +n characters next time, you know you had a single line that
wouldn't fit into your buffer.

And in general you shouldn't try to do anything with the line terminators
since whatever you do with them won't be portable.  Having the space
available for them is only a convenience to implementors who can make OS
calls and in most cases use whatever the OS gives them without needing to
massage it; there might be no terminator presented, or one or two, and
whatever the OS does the Forth system can handle with minimal fuss.

Quote:
>A proper implementation is not quite trivial if it is to deal correctly
>with all combinations which are in normal use on desktop machines: CR,
>LF, CR-LF. (Turned out for me to be the most tricky part to resolve
>correctly all the combinations near the end of a file.)



Sat, 24 Nov 2001 03:00:00 GMT  
 ANS READ-LINE question

Quote:

>>Please, no, why those complications? If you expect a maximum of 84 chars
>>so you need a buffer as specified of 84+2=86 chars. This is what the
>>definition of READ-LINE specifies explicitly. You need to be aware then
>>of zero-length lines which need to be discarded. Better to discard empty
>>lines than to fiddle with odd lengths.
>How does one distinguish a blank line that needs to be kept from a
>zero-length line that needs to be discarded?

If the previous line was exactly 84 chars then this line is a zero-length
one that needs to be discarded.

If there's supposed to be a blank line after the 84 char line then it will
be the next one after the one that gets discarded.
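
The rule above can be sketched as follows, assuming a system that leaves
the terminator unread when u2 = u1. The sketch counts genuinely blank lines
and ignores the zero-length read that merely finishes a line of exactly u1
characters. CHUNK, LINEBUF, PARTIAL and COUNT-BLANKS are illustrative
names, and CHUNK is kept small only to make the case easy to provoke.

```forth
16 CONSTANT CHUNK                          \ u1 passed to READ-LINE
CREATE LINEBUF  CHUNK 2 + CHARS ALLOT      \ u1 + 2 characters
VARIABLE PARTIAL      \ true while the previous read filled the buffer

: COUNT-BLANKS  ( fileid -- n )
   >R  0  FALSE PARTIAL !
   BEGIN
      LINEBUF CHUNK R@ READ-LINE THROW     \ -- n u2 flag
   WHILE                                   \ -- n u2
      DUP 0=  PARTIAL @ 0=  AND            \ empty and not a leftover?
      IF  SWAP 1+ SWAP  THEN               \ a genuine blank line
      CHUNK = PARTIAL !                    \ remember whether buffer filled
   REPEAT
   DROP  R> DROP ;
```

A zero-length read counts as a blank line only when the read before it did
not fill the buffer, which is the test Leo objects to having to make.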



Sat, 24 Nov 2001 03:00:00 GMT  
 ANS READ-LINE question
It occurs to me that my original question would have been better posed in
terms of where FILE-POSITION is left pointing for the next read, since this
is what really matters to the application programmer. The consensus of
opinion appears to be that after a line containing exactly u1 characters,
FILE-POSITION should point to the line terminator(s) rather than the
character following the line terminator(s).

Leo doesn't like this, but one justification I can see for having it work
this way is that it makes it much easier for an application program to
maintain a line count.
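
A minimal sketch of such a line count, assuming the behaviour agreed on
above: a read with u2 = u1 has not yet reached the terminator, so only
reads with u2 < u1 end a line. CHUNK, LINEBUF and COUNT-LINES are
illustrative names.

```forth
16 CONSTANT CHUNK                          \ u1 passed to READ-LINE
CREATE LINEBUF  CHUNK 2 + CHARS ALLOT      \ u1 + 2 characters

: COUNT-LINES  ( fileid -- n )
   >R  0                                   \ running count
   BEGIN
      LINEBUF CHUNK R@ READ-LINE THROW     \ -- n u2 flag
   WHILE                                   \ flag false: end of file
      CHUNK < IF 1+ THEN                   \ u2 < u1: a line has ended
   REPEAT
   DROP  R> DROP ;
```

One caveat: a final line that has no terminator and whose length is an
exact multiple of CHUNK would go uncounted by this sketch.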

Thank you Jonah, Leo and Ewald for your help.

Philip.



Sat, 24 Nov 2001 03:00:00 GMT  
 ANS READ-LINE question

Quote:


>> >How does one distinguish a blank line that needs to be kept from a
>> >zero-length line that needs to be discarded?
>> If the previous line was exactly 84 chars then this line is a zero-length
>> one that needs to be discarded.
>> If there's supposed to be a blank line after the 84 char line then it will
>> be the next one after the one that gets discarded.
>So you need to test - a complication.

How could you avoid testing?

When the buffer is full, how do you know whether it reached the terminators
or not?

You could have READ-LINE return another flag to say that, and then you'd need
to do something with the flag....



Sat, 24 Nov 2001 03:00:00 GMT  
 ANS READ-LINE question

Quote:



> >>Please, no, why those complications? If you expect a maximum of 84 chars
> >>so you need a buffer as specified of 84+2=86 chars. This is what the
> >>definition of READ-LINE specifies explicitly. You need to be aware then
> >>of zero-length lines which need to be discarded. Better to discard empty
> >>lines than to fiddle with odd lengths.

> >How does one distinguish a blank line that needs to be kept from a
> >zero-length line that needs to be discarded?

> If the previous line was exactly 84 chars then this line is a zero-length
> one that needs to be discarded.

> If there's supposed to be a blank line after the 84 char line then it will
> be the next one after the one that gets discarded.

So you need to test - a complication.

Leo
--

http://www.albany.net/~hello/
The Forth Ring:  http://zForth.com



Sat, 24 Nov 2001 03:00:00 GMT  
 ANS READ-LINE question
[snip]

Quote:
>Leo doesn't like this, but one justification I can see for having it work
>this way is that it makes it much easier for an application program to
>maintain a line count.

On further consideration I think if it worked the other way it might not
even be possible for an application program to count lines, since ANS does
not appear to *require* any line terminating characters to be read into
memory.

I am in the position of implementing this on a Forth running on top of Java.
Where most implementations would call an OS to get the data I am calling a
Java method over which I have full control. My Java method checks for CR or
LF or both together in either order. At present it passes on to the Forth
system whatever combination of line terminating characters it finds in the
file being read, but there appear to be two other options:

1. Always pass on a single consistent line terminator, say LF, so from the
Forth programmer's point of view the line terminating character really would
be  implementation-defined rather than determined by the file.

2. Don't pass on any line terminating characters at all.

Any opinions as to which of these 3 options would be preferable for the
Forth programmer?

Philip.



Sat, 24 Nov 2001 03:00:00 GMT  
 ANS READ-LINE question
On Tue, 08 Jun 1999 07:01:45 -0400, Mary Murphy and Leo Wong wrote:

Another annoying thing about READ-LINE is that it leaves the
line terminators in the buffer.  At work (oh,ho, maybe it's
an annoying thing about work), I sometimes have to combine
records from two or more files.  The lines in a file may
have different line lengths.  The combined record is of
fixed length.

I'd like to be able just to do something like:

outpad outlength BLANK
outpad 1length 1fid READ-LINE ...
outpad 1length CHARS + 2length 2fid READ-LINE ...
outpad outlength outfid WRITE-LINE ...

but in addition I have to clear out the line terminators.

Leo Wong

http://www.albany.net/~hello/
The Forth Ring: http://zForth.com



Sat, 24 Nov 2001 03:00:00 GMT  
 ANS READ-LINE question

Quote:
>I'd like to be able just to do something like:
>outpad outlength BLANK
>outpad 1length 1fid READ-LINE ...
>outpad 1length CHARS + 2length 2fid READ-LINE ...
>outpad outlength outfid WRITE-LINE ...
>but in addition I have to clear out the line terminators.

You know the total length will be no more than outlength.
You know the length of both strings to start with.

outpad outlength BLANK
outpad DUP 1length 1fid READ-LINE \ outpad u2 flag ior
.... \ outpad u2
CHARS + DUP 2length 2fid READ-LINE \ outpad+u2 u3 flag ior
.... \ outpad+u2 u3
CHARS + 2 BLANK
outpad outlength outfid WRITE-LINE ...

I see the problem.  You have to keep track of the end of the
combined line so you can blank out the last terminators.  

It might be simpler to think about if you put the lines into
some other buffer and then just move them into the output
buffer.  Then you can ignore the line terminators but you
still have to remember where the first line ends long enough
to store the second line in the right place.
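
The separate-buffer idea might look something like this. It is only a
sketch under assumed names and sizes (LEN1, LEN2, OUTPAD, SCRATCH,
READ-FIELD and COMBINE are all hypothetical). Each line is read into
SCRATCH, which has the u1 + 2 characters READ-LINE asks for, and only the
u2 text characters are moved into the output record, so no terminator can
reach it.

```forth
20 CONSTANT LEN1   30 CONSTANT LEN2        \ field widths (assumed)
LEN1 LEN2 + CONSTANT OUTLEN
CREATE OUTPAD   OUTLEN CHARS ALLOT
CREATE SCRATCH  LEN1 LEN2 MAX 2 + CHARS ALLOT

\ Read one line of at most ulen chars; copy its text, not its terminator.
: READ-FIELD  ( dest ulen fileid -- flag )
   SCRATCH ROT ROT READ-LINE THROW         \ -- dest u2 flag
   >R  SCRATCH ROT ROT  CHARS MOVE  R> ;   \ SCRATCH -> dest, u2 chars

\ Build one fixed-length record in OUTPAD; true if both files gave a line.
: COMBINE  ( fid1 fid2 -- flag )
   OUTPAD OUTLEN BLANK
   SWAP  OUTPAD LEN1 ROT READ-FIELD        \ -- fid2 flag1
   SWAP  OUTPAD LEN1 CHARS + LEN2 ROT READ-FIELD
   AND ;
```

After COMBINE the record could be written with
OUTPAD OUTLEN outfid WRITE-LINE as in the original outline. A source line
longer than its field is silently split here, its remainder being left for
the next read; a real program would have to decide how to treat that.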



Sat, 24 Nov 2001 03:00:00 GMT  
 