A good idiom for EOF when READing files? 

Hi.

I am trying to learn CL, and was looking at a problem in Paul Graham's
_ANSI Common Lisp_ (Ex. 7.2), where the idea is to define a function
which takes a filename and reads all the s-expressions in the file
into a list which is then returned.  My first attempt looked like
this:

(defun list-expressions (file)
  (with-open-file (instream file :direction :input)
    (do ((sexp (read instream nil 'eof) (read instream nil 'eof))
         (output nil (push sexp output)))
        ((eql sexp 'eof) (nreverse output)))))

but then I noticed that if the file had the symbol EOF in it, then the
READing would be short-circuited.  In particular, if a file "foo" has
the contents:

----------------------------------------
(We should see this)

eof

(This should never be seen)
----------------------------------------

then I see that

* (list-expressions "foo")
((WE SHOULD SEE THIS))

After a bit of thought, it occurred to me that I could use a gensym as
the EOF indicator, as:

(defun list-expressions (file)
  (with-open-file (instream file :direction :input)
    (let ((eof (gensym)))               ;generate a safe EOF marker
      (do ((sexp (read instream nil eof) (read instream nil eof))
           (output nil (push sexp output)))
          ((eql sexp eof) (nreverse output))))))

which led to

* (list-expressions "foo")
((WE SHOULD SEE THIS) EOF (THIS SHOULD NEVER BE SEEN))

This approach with the gensym seems to work, but it occurred to me
that this must be a well-known issue with READ, and that there is
probably a standard CL idiom to handle this.  I took a look at CLTL2
(in addition to Graham's book), but I didn't see anything obvious.  Is
there a standard idiom for this that I should know about?  (Perhaps
more importantly, is there a good unified source for language style
issues which the newbie should be aware of?)

Thanx.

Dan

P.S.  It just occurred to me that I probably could have solved this
problem with a combination of READ-LINE and READ-FROM-STRING, which
would have avoided the EOF issue entirely, but that seems rather
inelegant.



Fri, 02 Jul 2004 12:11:28 GMT  
 A good idiom for EOF when READing files?

Quote:
> but then I noticed that if the file had the symbol EOF in it, then the
> READing would be short-circuited.  In particular, if a file "foo" has
> the contents:
 [...]
> After a bit of thought, it occurred to me that I could use a gensym as
> the EOF indicator, as:

Yep, that's one common way of dealing with it.  Personally, I use:

  (defvar *eof* (gensym))

So at least this way, I only have one gensym per image used on eof
values.  Plus I find it a tiny bit clearer.

Quote:
> (defun list-expressions (file)
>   (with-open-file (instream file :direction :input)
>     (let ((eof (gensym)))          ;generate a safe EOF marker
>       (do ((sexp (read instream nil eof) (read instream nil eof))
>            (output nil (push sexp output)))
>           ((eql sexp eof) (nreverse output))))))

  (defun list-expressions (file)
    (with-open-file (instream file :direction :input)
      (loop for sexp = (read instream nil *eof*)
            until (eql sexp *eof*)
            collect sexp)))

Oh yeah, unlike Graham, I like LOOP.  Go ahead and use DO until you
feel comfortable with it, and with Lisp in general, then give LOOP a
spin.

Quote:
> This approach with the gensym seems to work, but it occurred to me
> that this must be a well-known issue with READ, and that there is
> probably a standard CL idiom to handle this.  I took a look at CLTL2
> (in addition to Graham's book), but I didn't see anything obvious.  Is
> there a standard idiom for this that I should know about?  (Perhaps
> more importantly, is there a good unified source for language style
> issues which the newbie should be aware of?)

Another approach people take is to use the stream being read from as
the eof value.  Clever because it obviously can't be read in from the
stream, but I like my system-wide *eof* variable.

--
           /|_     .-----------------------.                        
         ,'  .\  / | No to Imperialist war |                        
     ,--'    _,'   | Wage class war!       |                        
    /       /      `-----------------------'                        
   (   -.  |                              
   |     ) |                              
  (`-.  '--.)                              
   `. )----'                              



Fri, 02 Jul 2004 12:15:17 GMT  
 A good idiom for EOF when READing files?



Quote:

> This approach with the gensym seems to work, but it occurred to me
> that this must be a well-known issue with READ, and that there is
> probably a standard CL idiom to handle this.  I took a look at CLTL2
> (in addition to Graham's book), but I didn't see anything obvious.  Is
> there a standard idiom for this that I should know about?  (Perhaps
> more importantly, is there a good unified source for language style
> issues which the newbie should be aware of?)

A common idiom is (read stream nil stream).  I usually use :eof (unless the
data could conceivably contain that) just for a bit more readability....
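
A minimal sketch of the (read stream nil stream) idiom written with LOOP. The helper name READ-ALL-FORMS is made up for this example, and it takes an already open stream so it can be tried on a string instead of a file:

```lisp
;; READ-ALL-FORMS is a hypothetical helper name, not from the thread.
(defun read-all-forms (stream)
  "Return a list of every form READ can get from STREAM."
  ;; The stream object itself is passed as the EOF value, so getting it
  ;; back from READ means the input is exhausted.
  (loop for form = (read stream nil stream)
        until (eq form stream)
        collect form))

;; Usage, reading from a string rather than a file:
(with-input-from-string (in "(we should see this) eof (and this too)")
  (read-all-forms in))
;; => ((WE SHOULD SEE THIS) EOF (AND THIS TOO))
```

The symbol EOF in the data is collected like any other form, because the end test compares against the stream and not against a symbol.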

Quote:

> P.S.  It just occurred to me that I probably could have solved this
> problem with a combination of READ-LINE and READ-FROM-STRING, which
> would have avoided the EOF issue entirely, but that seems rather
> inelegant.

Why would that avoid the eof problem?  Same issue as read!

--
Coby



Fri, 02 Jul 2004 15:14:06 GMT  
 A good idiom for EOF when READing files?

Quote:

> > P.S.  It just occurred to me that I probably could have solved this
> > problem with a combination of READ-LINE and READ-FROM-STRING, which
> > would have avoided the EOF issue entirely, but that seems rather
> > inelegant.

> Why would that avoid the eof problem?  Same issue as read!

Because READ-LINE can only return strings, so it's fair to use :eof as
an end-of-file return value?
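
To illustrate the point, here is a small sketch (READ-ALL-LINES is an invented name) that collects lines using :eof as the end marker. Since READ-LINE hands back either a string or the EOF value, a line whose text is "eof" or ":eof" cannot be mistaken for the marker:

```lisp
;; READ-ALL-LINES is a hypothetical helper name used for illustration.
(defun read-all-lines (stream)
  "Return a list of the lines in STREAM, as strings."
  ;; READ-LINE returns a string for every real line, so the keyword
  ;; :EOF can only come back when the input is exhausted.
  (loop for line = (read-line stream nil :eof)
        until (eq line :eof)
        collect line))

(with-input-from-string (in (format nil "first~%eof~%:eof~%"))
  (read-all-lines in))
;; => ("first" "eof" ":eof")
```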

Christophe
--
Jesus College, Cambridge, CB5 8BL                           +44 1223 510 299
http://www-jcsu.jesus.cam.ac.uk/~csr21/                  (defun pling-dollar
(str schar arg) (first (last +))) (make-dispatch-macro-character #\! t)
(set-dispatch-macro-character #\! #\$ #'pling-dollar)



Fri, 02 Jul 2004 17:52:17 GMT  
 A good idiom for EOF when READing files?

Quote:


>> P.S.  It just occurred to me that I probably could have solved this
>> problem with a combination of READ-LINE and READ-FROM-STRING, which
>> would have avoided the EOF issue entirely, but that seems rather
>> inelegant.

>Why would that avoid the eof problem?  Same issue as read!

Christophe> Because READ-LINE can only return strings, so it's fair to
Christophe> use :eof as an end-of-file return value?

This is certainly what I was thinking.  Are there other issues
involved?

Dan



Fri, 02 Jul 2004 22:55:58 GMT  
 A good idiom for EOF when READing files?


Quote:

> > > P.S.  It just occurred to me that I probably could have solved this
> > > problem with a combination of READ-LINE and READ-FROM-STRING, which
> > > would have avoided the EOF issue entirely, but that seems rather
> > > inelegant.

> > Why would that avoid the eof problem?  Same issue as read!

> Because READ-LINE can only return strings, so it's fair to use :eof as
> an end-of-file return value?

True, true...I was too hasty with that one...

--
Coby



Sat, 03 Jul 2004 00:01:06 GMT  
 A good idiom for EOF when READing files?

Quote:
> I am trying to learn CL, and was looking at a problem in Paul Graham's
> _ANSI Common Lisp_ (Ex. 7.2), where the idea is to define a function
> which takes a filename and reads all the s-expressions in the file
> into a list which is then returned.  My first attempt looked like
> this:

> (defun list-expressions (file)
>   (with-open-file (instream file :direction :input)
>     (do ((sexp (read instream nil 'eof) (read instream nil 'eof))
>     (output nil (push sexp output)))

You don't need a push here, just a cons (see below).

Quote:
>    ((eql sexp 'eof) (nreverse output)))))

> but then I noticed that if the file had the symbol EOF in it, then the
> READing would be short-circuited.  In particular, if a file "foo" has

This is of course the case.  In order to avoid that problem, you
really want to provide an eof-value, that cannot possibly be read from
the given stream.  Since read can (especially when *read-eval* is
true) more or less return arbitrary data, there are not many possible
solutions.  One of them is creating a fresh uninterned symbol, via
gensym, as you suggested, e.g.:

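(defun list-expressions (file)
  (with-open-file (instream file :direction :input)
    ;; DO* steps its variables sequentially, so OUTPUT has to be stepped
    ;; before SEXP.  In the other order it would cons the newly read
    ;; value, dropping the first form and collecting the EOF marker.
    (do* ((eof-value (gensym))
          (output nil (cons sexp output))
          (sexp (read instream nil eof-value) (read instream nil eof-value)))
         ((eql sexp eof-value) (nreverse output)))))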

Since the symbol isn't interned in any package, the reader cannot
possibly find it, and since it cannot get at it in any other way
(e.g. via a special-binding and #.), there is no way that read could
return it at any time other than EOF.  You can also use other
"non-interned", freshly consed structures, like e.g. a fresh cons
cell.

A cleverer (and cheaper) approach -- thanks to KMP for pointing that
one out -- is to use the stream object itself as the EOF value, which,
if it isn't bound to a special variable somewhere -- something we can
rule out if we created the stream object ourselves -- is again not a
possible value of read:

(defun list-expressions (file)
  (with-open-file (instream file :direction :input)
    (do ((sexp (read instream nil instream) (read instream nil instream))
         (output nil (cons sexp output)))
        ((eql sexp instream) (nreverse output)))))

Quote:
> This approach with the gensym seems to work, but it occurred to me
> that this must be a well-known issue with READ, and that there is
> probably a standard CL idiom to handle this.  I took a look at CLTL2
> (in addition to Graham's book), but I didn't see anything obvious.  Is
> there a standard idiom for this that I should know about?  (Perhaps
> more importantly, is there a good unified source for language style
> issues which the newbie should be aware of?)

There is a collection of style hints, presented by Peter Norvig and
Kent Pitman as a slide show at LUV'93, available on Norvig's website
at http://www.norvig.com/luv-slides.ps

There are also lots of other style guidelines, many of which you can
find with a search on www.google.com with the keywords "Lisp style".

Regs, Pierre.

--

 The most likely way for the world to be destroyed, most experts agree,
 is by accident. That's where we come in; we're computer professionals.
 We cause accidents.                           -- Nathaniel Borenstein



Fri, 02 Jul 2004 23:30:25 GMT  
 A good idiom for EOF when READing files?


Quote:
>Yep, that's one common way of dealing with it.  Personally, I use:

>  (defvar *eof* (gensym))

>So at least this way, I only have one gensym per image used on eof
>values.  Plus I find it a tiny bit clearer.

Not iron-clad though. The file could contain #.*eof* and then
read would return your EOF object.

  (let ((eof (list nil)))
    ...

is cheaper than gensym, and safe, as is using the stream
variable itself. I find the former more readable, the latter
cleverer. The latter would be fine if it was standard
coding practice.
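
A rough sketch of the fresh-cons variant, for illustration only. The function name is invented here, and the input comes from a string rather than a file:

```lisp
;; READ-ALL-FORMS-CONS is a hypothetical name for this illustration.
(defun read-all-forms-cons (stream)
  "Collect every form in STREAM, using a fresh cons as the EOF marker."
  (let ((eof (list nil)))        ; a new cons cell on every call
    (loop for form = (read stream nil eof)
          ;; EQ tests identity: a (NIL) read from the data is a
          ;; different cons from the marker, even though it looks alike.
          until (eq form eof)
          collect form)))

(with-input-from-string (in "(nil) eof (nil)")
  (read-all-forms-cons in))
;; => ((NIL) EOF (NIL))
```

Because the marker is local to the call, nothing in the file can name it, which is what separates it from a global *eof* variable.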



Sat, 03 Jul 2004 04:24:00 GMT  
 A good idiom for EOF when READing files?

Quote:



> >Yep, that's one common way of dealing with it.  Personally, I use:

> >  (defvar *eof* (gensym))

> >So at least this way, I only have one gensym per image used on eof
> >values.  Plus I find it a tiny bit clearer.

> Not iron-clad though. The file could contain #.*eof* and then
> read would return your EOF object.

Oops, you're right.  I must confess to using read fairly rarely.
Mostly when I'm reading in something code-like, in which case I impose
the restriction that it's not pathological (or I'll bind *read-eval*
to nil) -- and in my mental taxonomy, #.*eof* is right up there with
#.(ext:quit).  But, yeah, my bad.
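
For what it's worth, a sketch of the "bind *read-eval* to nil" option. SAFE-READ-ALL is a made-up name; the only standard behavior relied on is that #. signals a READER-ERROR when *READ-EVAL* is NIL:

```lisp
;; SAFE-READ-ALL is a hypothetical helper name.
(defun safe-read-all (stream)
  "Read all forms from STREAM with the #. reader macro disabled."
  (let ((*read-eval* nil)        ; #. now signals READER-ERROR
        (eof (list nil)))        ; fresh, local EOF marker
    (loop for form = (read stream nil eof)
          until (eq form eof)
          collect form)))

;; Ordinary data reads as before:
(with-input-from-string (in "(a b) c")
  (safe-read-all in))
;; => ((A B) C)

;; Input containing #. is rejected instead of evaluated:
(handler-case
    (with-input-from-string (in "#.(list 1 2)")
      (safe-read-all in))
  (reader-error () :rejected))
;; => :REJECTED
```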




Sat, 03 Jul 2004 14:36:59 GMT  
 A good idiom for EOF when READing files?

Quote:

>(defun list-expressions (file)
>  (with-open-file (instream file :direction :input)
>    (do ((sexp (read instream nil 'eof) (read instream nil 'eof))
>         (output nil (push sexp output)))
>        ((eql sexp 'eof) (nreverse output)))))

[ snip ]
>but then I noticed that if the file had the symbol EOF in it, then the
>READing would be short-circuited.  In particular, if a file "foo" has
>the contents:

One way is to catch the end of input as a condition. There
is a ``Conditions for Dummies'' form called handler-case that
you can use:

(with-open-file (instream file :direction :input)
  (handler-case
    form-that-reads-from-file
    (end-of-file (condition) forms-that-deal-with-condition)))

The end-of-file object passed to the various input functions is just
a way to avoid the condition. But there can be other conditions besides
end-of-file, which you may have to worry about, especially if your
program did not write that file.

The value returned by the forms in the handler-case will be the overall
result of the handler-case if the condition is signaled and handled,
otherwise the value returned by form-that-reads-from-file will be
the result.

This means you can still have an ambiguity at this level; the result of
the overall with-open-file form might be mistaken to be the object read
from the file. But here you have the option of being rescued by multiple
values, rather than playing tricks to create a unique object that cannot
possibly be read from the file:

(with-open-file (instream file :direction :input)
  (handler-case
    (... (values whatever T))
    (end-of-file (condition) ... (values nil nil))))

It may seem a little circuitous for simple reads. But if you are already
handling some other conditions, then adding a case for end-of-file
is easy.
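
Filling in the template above, one possible sketch looks like this. Both function names are invented for the example, and it works on an open stream so it can be tried without a file:

```lisp
;; READ-FORM-OR-EOF and READ-ALL-BY-CONDITION are hypothetical names.
(defun read-form-or-eof (stream)
  "Return two values: the form read and T, or NIL and NIL at end of file."
  ;; READ is called without an EOF value, so hitting the end of the
  ;; input signals END-OF-FILE, which HANDLER-CASE catches.
  (handler-case (values (read stream) t)
    (end-of-file () (values nil nil))))

(defun read-all-by-condition (stream)
  "Collect every form in STREAM, stopping on the END-OF-FILE condition."
  (let ((forms '()))
    (loop
      (multiple-value-bind (form ok) (read-form-or-eof stream)
        ;; The second value, not the form itself, says whether we hit EOF.
        (unless ok (return (nreverse forms)))
        (push form forms)))))

(with-input-from-string (in "(a) nil eof")
  (read-all-by-condition in))
;; => ((A) NIL EOF)
```

The NIL in the data is collected normally, since the second return value carries the end-of-file information.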



Sat, 03 Jul 2004 15:23:34 GMT  
 A good idiom for EOF when READing files?

Quote:



>>Yep, that's one common way of dealing with it.  Personally, I use:

>>  (defvar *eof* (gensym))

>>So at least this way, I only have one gensym per image used on eof
>>values.  Plus I find it a tiny bit clearer.

>Not iron-clad though. The file could contain #.*eof* and then
>read would return your EOF object.

>  (let ((eof (list nil)))
>    ...

>is cheaper than gensym, and safe, as is using the stream
>variable itself. I find the former more readable, the latter
>cleverer. The latter would be fine if it was standard
>coding practice.

How about this, wouldn't this be even cheaper?

   (let ((eof '#:EOF)) ...)

The uninterned symbol object is created when the let form is read
and incorporated into it, correct?  So when you evaluate the form,
it can just ``pull out'' the object and not have to cons anything.

Or is there some non-obvious (to me) problem?



Sat, 03 Jul 2004 15:26:12 GMT  
 A good idiom for EOF when READing files?

Quote:

>How about this, wouldn't this be even cheaper?

>   (let ((eof '#:EOF)) ...)

>The uninterned symbol object is created when the let form is read
>and incorporated into it, correct?  

The standard says "Every time this syntax is encountered, a
distinct uninterned symbol is created." So it's not the
same symbol your variable EOF has. When the LET is read,
a new symbol is generated. Symbols have stuff attached
to them, which is why they're not as cheap as (list nil)
which is 1 cons cell.

It *is* cheaper than (gensym) because the symbol is
generated only once, at read time. But then you could go
even cheaper with

  (let ((eof #.(list nil))) ...)

Wiser heads than mine have to say how safe these read-time
non-global values are.
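
A quick way to see the "distinct uninterned symbol" behavior at the REPL, using READ-FROM-STRING so that each #:eof is read separately. This is only a demonstration of the reader, not a recommendation for any particular marker:

```lisp
;; Each time the reader sees #:EOF it makes a new uninterned symbol,
;; so two separate reads give symbols that are not EQ:
(eq (read-from-string "#:eof") (read-from-string "#:eof"))
;; => NIL

;; They do share a name, which is why they print alike:
(string= (symbol-name (read-from-string "#:eof"))
         (symbol-name (read-from-string "#:eof")))
;; => T

;; A marker captured once is EQ only to itself:
(let ((eof (read-from-string "#:eof")))
  (list (eq eof eof)
        (eq eof (read-from-string "#:eof"))))
;; => (T NIL)
```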



Sun, 04 Jul 2004 02:32:59 GMT  
 A good idiom for EOF when READing files?

Quote:



>>How about this, wouldn't this be even cheaper?

>>   (let ((eof '#:EOF)) ...)

>... But then you could go
>even cheaper with

>  (let ((eof #.(list nil))) ...)

>Wiser heads than mine have to say how safe these read-time
>non-global values are

A wiser head via email pointed out that I should've written
'#.(list nil), since (nil) is not an evaluatable form, but
'#.(list nil) would be open to coalescing by the fasloader
to another already existing (nil) object. That means it
could become the result of a read.

The same wiser head noted that '#:eof wouldn't be
coalesced, but you could still be in other trouble if
this code is inside a recursive reader.

I would think that aspect argues against *any*
one-time constant, and for using the stream, (list nil),
(gensym), or some other cheap dynamically generated
object.



Sun, 04 Jul 2004 23:09:13 GMT  
 