Newbie Help Please: Reading into a list from a file 

Hi there,

...

Quote:
> Solution depends a lot on how the words are separated.

Speaking of file formats, tab-delimited text is a very common one (where
spaces are part of fields), and there are a bunch of others.  What is
the common practice here?  It's very easy to quickly put together some
code; I am just wondering if people prefer to do this, or use some public
interface libraries that maybe cover multiple formats such as .csv, .dbf
or .wk1 in the spirit of reuse.

Regards
Robert



Sun, 14 Oct 2001 03:00:00 GMT  

Quote:

> Hi all, I'm having some trouble reading from a file.
> The text file is in the form:

> Cat.
> Bird.
> Dog.

> I need to do the equivalent of (setq animals '(Cat. Bird. Dog.)) but I
> need to read the elements of the list (however many there are in the
> file) from the text file.

I'm assuming this is not a homework problem.  (No one I know ever assigns
anything useful like file I/O for homework. Sigh.)

Solution depends a lot on how the words are separated.
READ reads lisp expressions, and "Cat.", etc. are technically
lisp expressions. To retain case but still use READ,
you have to use an appropriate readtable.
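A minimal sketch of the readtable approach, assuming the file holds one symbol-like token per line as in the example. READ-PRESERVING-CASE is a hypothetical helper name, and "animals.txt" is a made-up filename. The helper binds *READTABLE* to a fresh copy of the standard readtable with READTABLE-CASE set to :PRESERVE, so "Cat." comes back as the symbol |Cat.| rather than CAT.

```lisp
;;; Sketch: read every form from STREAM without upcasing symbol names.
;;; READ-PRESERVING-CASE is a hypothetical helper, not a standard function.
(defun read-preserving-case (stream)
  ;; Bind a private copy of the standard readtable so the global
  ;; *READTABLE* is left untouched.
  (let ((*readtable* (copy-readtable nil)))
    (setf (readtable-case *readtable*) :preserve)
    ;; Use the stream object itself as a unique end-of-file marker.
    (loop for form = (read stream nil stream)
          until (eq form stream)
          collect form)))

;; Usage (hypothetical file name):
;; (with-open-file (in "animals.txt") (read-preserving-case in))
```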

READ-LINE will read lines of text.  The result is a string, which I
would think would be better than a symbol.  I can't really seriously
believe you want symbols with dots in their names, but it is a possible
thing.  See the function INTERN if you want to convert a string to
a symbol.

 (defun read-the-file (filename)
   (with-open-file (stream filename)
     (loop for line = (read-line stream nil nil)
           while line
           collect line)))

will return ("Cat." "Bird." "Dog.")
If instead you use (read stream nil nil), you'll get
 (CAT. BIRD. DOG.)
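If you keep the READ-LINE loop but collect (intern line), you'll get
 (|Cat.| |Bird.| |Dog.|)
(Don't wrap INTERN around the READ-LINE call itself: at end of file
READ-LINE returns NIL, and INTERN signals an error when given NIL.)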
You could also write your own reader to deal with custom separator chars and
return value type.  For example:

 (defun whitespace? (ch)
   (or (eql ch #\Space)
       (eql ch #\Tab)
       (eql ch #\Newline)))

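 (defun peek-char-after-whitespace (stream)
   ;; Despite the name, this consumes and returns the first
   ;; non-whitespace character (READ-WORD relies on that); NIL at EOF.
   (loop for ch = (read-char stream nil nil)
         while ch
         when (not (whitespace? ch))
           do (return ch)))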

 (defun read-word (stream)
   (let ((ch (peek-char-after-whitespace stream)))
     (when ch
       (intern (with-output-to-string (str)
                 (write-char ch str)
                 (loop for ch = (read-char stream nil nil)
                       while (and ch (not (whitespace? ch)))
                          do (write-char ch str)))))))

Then if you use (read-word stream) instead of the (read-line stream nil nil)
you will be able to have Cat. and Bird. and Dog. all on one line with
only whitespace between.  You also have better control over what happens
if you do "Cat, Dog, etc." since "," is a character that Lisp doesn't want
to see in places that English likes it to be.
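To make the "better control" point concrete, here is a sketch that treats commas as separators along with whitespace, and returns strings rather than interned symbols. WORD-DELIMITER-P and READ-WORDS are hypothetical names, and the delimiter set is an assumption you would adjust for your own data.

```lisp
;;; Sketch: collect words from STREAM, splitting on whitespace and commas.
(defun word-delimiter-p (ch)
  ;; Assumed delimiter set: space, tab, newline and comma.
  (member ch '(#\Space #\Tab #\Newline #\,)))

(defun read-words (stream)
  "Return the delimiter-separated words in STREAM as a list of strings."
  (let ((words '())
        (current (make-string-output-stream)))
    (flet ((finish-word ()
             ;; GET-OUTPUT-STREAM-STRING returns the text so far and
             ;; resets CURRENT, so one string stream serves every word.
             (let ((word (get-output-stream-string current)))
               (when (plusp (length word))
                 (push word words)))))
      (loop for ch = (read-char stream nil nil)
            while ch
            do (if (word-delimiter-p ch)
                   (finish-word)
                   (write-char ch current)))
      ;; A last word with no trailing delimiter still needs flushing.
      (finish-word))
    (nreverse words)))
```

With this, "Cat, Dog, etc." yields ("Cat" "Dog" "etc."): runs of delimiters never produce empty words, and the comma never reaches the Lisp reader.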

I only did very cursory testing on the above, so it's possible I goofed
somewhere, but it should be close.  For doc on how the various
operators involved work, see the Common Lisp HyperSpec at
http://www.harlequin.com/education/books/HyperSpec/FrontMatter/index....



Mon, 15 Oct 2001 03:00:00 GMT  

Quote:

> Hi there,


> ...
> > Solution depends a lot on how the words are separated.

> Speaking of file formats, tab-delimited text is a very common one (where
> spaces are part of fields), and there are a bunch of others.  What is
> the common practice here?  It's very easy to quickly put together some
> code; I am just wondering if people prefer to do this, or use some public
> interface libraries that maybe cover multiple formats such as .csv, .dbf
> or .wk1 in the spirit of reuse.

I don't personally know of a library that does this, but there may
be one.  You could poke around at the ALU's interim web site.
 http://www.elwoodcorp.com/alu/

The thing is, though, it's so completely trivial to write that many
people probably don't include a library just because finding the name
of the library to use could take about as long as writing the 10 lines
of code.  I don't mean to attach a value judgment to that; I'm all for
having shared libraries.  But as a practical matter, people do resist
writing them when the amount of work they save is relatively small.
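For a sense of scale, a tab-delimited reader of roughly that size might look like the sketch below. SPLIT-TABS and READ-TAB-DELIMITED are hypothetical names. The sketch assumes one record per line and no quoting or escaping inside fields, and it deliberately keeps empty fields, since in tab-delimited data two adjacent tabs usually mean a blank column.

```lisp
(defun split-tabs (line)
  "Split LINE at every tab, keeping empty fields."
  ;; START steps past the previous tab before TAB is recomputed.
  (loop for start = 0 then (1+ tab)
        for tab = (position #\Tab line :start start)
        collect (subseq line start tab)   ; TAB = NIL means "to the end"
        while tab))

(defun read-tab-delimited (filename)
  "Return a list of records, each a list of field strings."
  (with-open-file (in filename)
    (loop for line = (read-line in nil nil)
          while line
          collect (split-tabs line))))
```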



Mon, 15 Oct 2001 03:00:00 GMT  

Quote:

> Speaking of file formats, tab-delimited text is a very common one (where
> spaces are part of fields), and there are a bunch of others.  What is
> the common practice here?  It's very easy to quickly put together some
> code; I am just wondering if people prefer to do this, or use some public
> interface libraries that maybe cover multiple formats such as .csv, .dbf
> or .wk1 in the spirit of reuse.

This is heresy of the worst kind, but when I have to do this I use
the normal string-bashing tools -- some combination of awk, sed, perl
and other normal Unix stuff -- to read the format and spit out
something Lisp can read easily.  That lets me do the interesting bit
in Lisp and the boring bit in tools better suited to boring problems.

I'm reassured by the fact that people I know who do really serious
data-mashing stuff in C *also* use this technique (perl for input
processing basically).

--tim



Mon, 15 Oct 2001 03:00:00 GMT  



Quote:
>For doc on how the various operators involved work, see the Common
>Lisp HyperSpec at
>www.harlequin.com/education/books/HyperSpec/FrontMatter/index.html

Is someone still maintaining it, correcting typos, etc.?

                                                       /L/e/k/t/u




Mon, 15 Oct 2001 03:00:00 GMT  


Quote:

>> Hi there,


>> ...
>> > Solution depends a lot on how the words are separated.

>> Speaking of file formats, tab-delimited text is a very common one (where
>> spaces are part of fields), and there are a bunch of others.  What is
>> the common practice here?  It's very easy to quickly put together some
>> code; I am just wondering if people prefer to do this, or use some public
>> interface libraries that maybe cover multiple formats such as .csv, .dbf
>> or .wk1 in the spirit of reuse.

>I don't personally know of a library that does this, but there may
>be one.  You could poke around at the ALU's interim web site.
> http://www.elwoodcorp.com/alu/

>The thing is, though, it's so completely trivial to write that many
>people probably don't include a library just because finding the name
>of the library to use could take about as long as writing the 10 lines
>of code.  I don't mean to attach a value judgment to that; I'm all for
>having shared libraries.  But as a practical matter, people do resist
>writing them when the amount of work they save is relatively small.

You might find 'split-sequence' useful. The implementation given below was
co-evolved in this newsgroup half a year ago:

;;; full-fledged version, à la POSITION
(defun split-sequence (delimiter seq
                       &key
                       (empty-marker nil keep-empty-subseqs)
                       (from-end nil)
                       (start 0)
                       (end nil)
                       (test nil test-supplied)
                       (test-not nil test-not-supplied)
                       (key nil key-supplied)
                       &aux
                       (len (length seq)))

  "Return list of subsequences in SEQ delimited by DELIMITER.
   If an EMPTY-MARKER is supplied, empty subsequences will be
   represented by EMPTY-MARKER, otherwise they will be discarded.
   All other keywords work analogously to POSITION."

  (unless end (setq end len))

  ;; FROM-END works on a reversed copy, with START/END mirrored.
  (when from-end
    (setf seq (reverse seq))
    (psetf start (- len end)
           end (- len start)))

  (loop with other-keys = (nconc (when test-supplied (list :test test))
                                 (when test-not-supplied (list :test-not test-not))
                                 (when key-supplied (list :key key)))
        for left = start then (+ right 1)
        for right = (min (or (apply #'position delimiter seq :start left other-keys)
                             len)
                         end)
        if (< left right)
          ;; Undo the reversal inside each piece; with FROM-END the
          ;; pieces themselves are still listed right to left.
          collect (if from-end
                      (nreverse (subseq seq left right))
                      (subseq seq left right))
        else when keep-empty-subseqs collect empty-marker
        ;; Compare indices numerically; EQ on integers is not portable.
        until (= right end)))

Splitting tab-delimited strings then is just:

USER(13): (split-sequence #\tab (coerce '(#\a #\tab #\space #\b) 'string))
("a" " b")

You can even abuse the :test keyword to deal with the original example:

USER(14): (split-sequence '(#\space #\tab #\newline) "Cat.
Bird.
Dog.
"
:test #'(lambda (l x)(member x l)))
("Cat." "Bird." "Dog.")

That way one could read in the complete file at once into a string (using READ-SEQUENCE)
and do all the parsing in Lisp.

cheers, Bernhard
--
--------------------------------------------------------------------------
Bernhard Pfahringer
Austrian Research Institute for  http://www.ai.univie.ac.at/~bernhard/
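A sketch of that whole-file approach, assuming the file fits comfortably in memory. READ-FILE-INTO-STRING is a hypothetical helper and "animals.txt" a made-up filename. FILE-LENGTH reports the size in the stream's element units, which with a multi-byte external format can overstate the number of characters, so the buffer is trimmed to the count READ-SEQUENCE actually filled.

```lisp
(defun read-file-into-string (filename)
  "Return the entire contents of FILENAME as a string."
  (with-open-file (in filename)
    (let* ((buffer (make-string (file-length in)))
           ;; READ-SEQUENCE returns the index just past the last
           ;; character it stored.
           (end (read-sequence buffer in)))
      (subseq buffer 0 end))))

;; Combined with the SPLIT-SEQUENCE definition above (assumed loaded):
;; (split-sequence '(#\Space #\Tab #\Newline)
;;                 (read-file-into-string "animals.txt")
;;                 :test #'(lambda (l x) (member x l)))
```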



Mon, 15 Oct 2001 03:00:00 GMT  

Quote:


> > Speaking of file formats, tab-delimited text is a very common one (where
> > spaces are part of fields), and there are a bunch of others.  What is
> > the common practice here?  It's very easy to quickly put together some
> > code; I am just wondering if people prefer to do this, or use some public
> > interface libraries that maybe cover multiple formats such as .csv, .dbf
> > or .wk1 in the spirit of reuse.

> This is heresy of the worst kind, but when I have to do this I use
> the normal string-bashing tools -- some combination of awk, sed, perl
> and other normal Unix stuff -- to read the format and spit out
> something Lisp can read easily.  That lets me do the interesting bit
> in Lisp and the boring bit in tools better suited to boring problems.

For a while I was exchanging numerical data files a lot between Clasp
(a Lisp stat package) and other applications, and I settled on the
fairly useful hack of putting a list of numbers on every line, with
tabs between all of the numbers *and* between the open paren and the
first number and the last number and the close paren:

(       1       2       3       )

This let my Lisp program read things in normally, and just created a
couple of garbage columns in other stat packages I was using.

--
Rob St. Amant
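A sketch of the tab-padded layout described above. WRITE-TABBED-ROW and READ-TABBED-ROWS are hypothetical names. Because tab is whitespace to the Lisp reader, plain READ gets each row back as a list, while a tool that splits on tabs sees "(" and ")" as two extra columns.

```lisp
(defun write-tabbed-row (numbers stream)
  "Write NUMBERS as ( TAB n1 TAB n2 ... TAB ) followed by a newline."
  (write-char #\( stream)
  (dolist (n numbers)
    (write-char #\Tab stream)
    (prin1 n stream))
  ;; Tab before the close paren, so ")" lands in its own column.
  (write-char #\Tab stream)
  (write-char #\) stream)
  (terpri stream))

(defun read-tabbed-rows (stream)
  "Read back every row written by WRITE-TABBED-ROW."
  (loop for row = (read stream nil :eof)
        until (eq row :eof)
        collect row))
```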



Mon, 15 Oct 2001 03:00:00 GMT  

Quote:



> >For doc on how the various operators involved work, see the Common
> >Lisp HyperSpec at
> >www.harlequin.com/education/books/HyperSpec/FrontMatter/index.html

> Is someone still maintaining it, correcting typos, etc.?

This is a popular question.  The answer is a good deal more complicated
than you probably expected.  Here goes...

I am going to answer for what I know, but you should keep in mind that
I don't speak for Harlequin, who claim the name Common Lisp HyperSpec
as a trademark, and who own copyright in the hypertext markup.  (The
underlying text copyright ownership is an issue I'll speak to people
about privately if they approach me about it, but I try not to comment
about in public.)  If your question is one of corporate policy of the
document owner, you must ask Harlequin.  Information by me below should
be regarded as purely anecdotal, historical, trivia, and the like:

Even when I was at Harlequin, no one was "maintaining it and
correcting typos" in the sense that you probably mean.  That is, the
typos are largely in the underlying ANSI CL spec, not in the hypertext
layer of the document.  (I was and am redirecting typos reported about
CLHS as implicit requests for J13 to do something, but that's a
separate matter.)  It was/is important to the integrity of the
document that the hypertext be precisely what is in the ANSI CL
hardcopy.  Once you fix typos, a divergence arises, and some such
divergences could create material disputes over meaning.  I and others
wanted to avoid that where possible.  True typos are things you can read
past; if they are "typos that matter" one must be very wary of fixing
them quietly.  And historical documents are historical documents; one
doesn't update spellings in the Declaration of Independence (or whatever
your country's equivalent of that might be :-).

ANSI CL is still maintained through the ANSI process (NCITS committee
J13, formerly known as X3J13).  I and others will continue to be doing
that, but that's a long-arc timeline between updates.  A J13 meeting
is coming up, though.

Back to CLHS, as I said, its status is something you could approach
Harlequin to ask about, since that particular hypertextification item
is copyrighted by them.  There was some talk of having me continue to
maintain it, but it was left in limbo for various reasons I'm going to
try not to go into here.  [Bottom line: if they want me to do it, they
need to contact me and talk to me about the terms under which that
might be done.  They should not think they are waiting for me to
contact them.  If I were to decide to do something new, it would
probably be to start over from the public TeX sources and write
all-new code to do the conversion so that the result was mine to
control and I didn't have to risk later having to again ask someone
else's permission for the right to update something that came from the
sweat of my own brow, as it were.  I'm not necessarily likely to mount
such an effort, especially absent funding to do so, but that would be
what I would be inclined to do if I did get the urge, I guess is what
I'm saying.]

At any rate, the virtue of CL qua language is its stability, so the
fact that documents about it don't change regularly is not an
automatic thing to panic about.

Little known CLHS versioning trivia:

 Last I checked, the main version of CLHS that Harlequin distributes
 is version 4.  Versions 1 and 2 were internal only; you never saw
 them unless you worked at Harlequin.  Version 3 was the initial
 rollout; most people probably have that.  You can find the version
 identifier in the HTML source code of every page.  I recommend that
 you do NOT race to replace v3 with v4; the *only* change is a
 one-word legally required change in a trademark claim to claim
 "Liquid Common Lisp" instead of "Lucid Common Lisp".  It's not worth
 downloading a whole new copy for that.

 There is a version 5 in existence, though.  It is different in
 substance in several ways: it uses 8.3 DOS-style filenames, so it
 probably works better on the Mac (Version 3 has two 32-character
 filenames, which exceed the Mac's 31-character limit).
 Version 5 differs also in that it has some minor corrections to the
 HTML markup, and majorly better indexing of the format ops and
 sharpsign read macros.  (The CLHS index is not part of the underlying
 X3J13 document, so is something I could update without deviating
 from the ANSI CL spec.) Version 5 also does not have the dorky little Java
 widget on the Symbol Index page that never worked right for me back
 when version 3 first issued (earlier versions of Netscape, and all that)
 and that finally got me fed up with Java enough to remove it
 in version 5.  ("Write once, debug everywhere."  I got tired of doing
 so.) In house, some fans of that widget complained, but their complaints
 fell on my deaf ears. Java might be stable enough to have put it back,
 but I never got around to doing that before I, uh, "left" Harlequin.
 Anyway, if you liked that Java widget as your customary interface,
 version 5 might seem like a bit of a downgrade.  I'd always meant to
 make a v6 to fix that... Oh well.

 [Free advice to Harlequin for what it's worth:
 Because so many people have by now probably bookmarked individual
 pages within CLHS (against my examples, btw; I have always stubbornly
 resisted posting individual pointers to pages, preferring instead
 to cite the main page and give English navigation instructions to
 the detail page in order to preserve the possibility of changing
 the internal URLs without invalidating a zillion DejaNews items),
 it would not be a good plan, in my personal opinion, for Harlequin
 to wholesale replace v3 with v5 on their web site without ALSO
 either (1) making a shadow directory containing HTML stubs for
 each of the old pages, redirecting people to each of the corresponding
 new pages, or (2) perhaps easier to do: telling the Harlequin web
 site server to specially redirect all references to books/Hyperspec to
 books/CLHS/Front/index.htm, which is the name of the cover page in the
 DOS/8.3 filenaming scheme that v5 uses.  Absent such a compatibility
 plan, I'd recommend staying with v4 on the web site, but maybe that's
 just me.]

 Incidentally, don't panic that v5 DOS/8.3 names are shorter--I
 went to enormous trouble to make them also be "predictable" in case
 there are people out there who like to think they know the algorithm
 for page naming and type it in raw; the 8.3 filenames are also fairly
 "predictable", after a fashion.  That is, the algorithm, though
 different, is intended to be learnable.  Coming up with an invertible,
 human-readable naming algorithm that keeps 21.1 from being confused
 with 2.1.1 and still fits in 8 characters was fun.
 A sample is CLHS/Body/21_aaaa.htm, which is section 21.1.1.1.1.
 The use of letters accommodates some section numbers that roll
 above 9 but fortunately don't get above 26.
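The one sample is enough to sketch how such an invertible scheme can work. This is inferred from that single example and is not Harlequin's actual code: zero-padding the chapter to two digits is an assumption, SECTION-TO-FILENAME and FILENAME-TO-SECTION are hypothetical names, and each subsection number from 1 to 26 becomes one letter from a to z. Under those assumptions 21.1 becomes 21_a while 2.1.1 becomes 02_aa.

```lisp
(defparameter *section-letters* "abcdefghijklmnopqrstuvwxyz"
  "Letter at index I stands for section number I+1.")

(defun section-to-filename (numbers)
  "Map a section number list such as (21 1 1 1 1) to a name like \"21_aaaa\"."
  (destructuring-bind (chapter &rest subsections) numbers
    (format nil "~2,'0D_~{~C~}"          ; assumed two-digit chapter
            chapter
            (mapcar (lambda (n)
                      (check-type n (integer 1 26))
                      (char *section-letters* (1- n)))
                    subsections))))

(defun filename-to-section (name)
  "Invert SECTION-TO-FILENAME: \"21_aaaa\" -> (21 1 1 1 1)."
  (let ((underscore (position #\_ name)))
    (cons (parse-integer name :end underscore)
          (map 'list (lambda (ch) (1+ (position ch *section-letters*)))
               (subseq name (1+ underscore))))))
```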

Oh, and in answer to the big question some of you were probably
wondering if I'd get to: to my knowledge, the only way you can get
version 5 is to get LispWorks, though the free Personal Edition has
it, so it's not like you have to pay dollars.  It is not, to my
knowledge, available as a separate item at their web site--but then,
I haven't looked recently.

And, on balance, the pressure for CLHS to be THE source of hypertext
lisp doc is less these days because Franz has an approximate
equivalent of the hyperspec that it associates with its product as
well.  (I think one reason you don't hear as much about it is that
they didn't give it a jazzy name--or a name at all that I can
discern.)  But it seems to have essentially the same underlying
reference text.  My impression is that it might have been produced
from the last "draft" of the CL specification instead of the final
version, but if so that's only a legal matter (which I'm going to try
not to go into here because it's a rat's nest), not a technical one,
since the technicalese in the last draft and the final version was
identical.

One thing all this version stuff should tell you is that there's a
tension in the world between "the need to fix typos" and "the need to
upgrade".  If typos were being fixed all the time, people would want
to download copies all the time.  And that would mean there would be a
zillion subtly different versions all over the place.  While at
Harlequin, when I had a say in such things, I generally resisted
making much noise about different versions because it seemed like a
lot of effort for people to download a new version for remarkably
little benefit.  At some point, a new version will be needed, but I
think for now the main issue is the care and feeding of the standard,
not the care and feeding of its webification.  And that's in the hands
of a committee, not some single individual.  But "web versioning"
is still very much a great "unsolved problem".  Coordinating updates
to something depended on world-wide is tricky; ANSI has long
made a whole business out of it.



Mon, 15 Oct 2001 03:00:00 GMT  

Quote:

> You might find 'split-sequence' useful.

Certainly a useful function to have.

Quote:
> That way one could read in the complete file at once into a string
> (using READ-SEQUENCE) and do all the parsing in Lisp.

For bounded-size files.  A serious virtue of the other approach is
that it doesn't require you to redundantly buffer the whole file's contents
in memory.  This exercise in parsing clearly requires a minimum of
state on an ongoing basis, and while the solution you propose has
that kind of APL feel of piping two powerful operators together to
get a nice result, it's not the best way to teach a newbie how to
make good engineering choices in a lot of practical settings.
Even if the file size starts small, it might grow, and then people
start to wonder what's taking up so much space.  If the wrong person
looks in to fixing it, not knowing there are alternatives, it can earn
Lisp a bad name for appearing "not to have the good way to do things",
and what was a hack for pleasant convenience can turn into a reason
that someone at a certain shop thinks Lisp is never appropriate
for serious use.

Things like split-sequence should be used where there is strong
confidence that the dataset size is bounded.  The mere mention of "file"
makes me nervous in that regard.  Most text editors make it painful
enough to parse individual long lines that I'm pretty comfortable about
split-sequence being used to split a "line" or a "token", but not a "file".
Even though at an abstract level there is an unbroken continuum between
tokens, lines, and files, and you can think of files as "mere tokens"
conceptually, the practical fact is that there are subtle psychological
shifts we make as we move from one datastructure to another, and I think
when most people say "file", they mean "might have arbitrary length"
and when most people say "line" they mean "probably has bounded length,
usually less than 256."  I feel pretty comfortable allocating
 (make-array 256 :element-type 'character :adjustable t :fill-pointer 0)
for line buffers, for example, without worrying these will grow under
normal use, and without worrying I have to re-adjust them back down in
size periodically if they do grow.  I feel a lot less sure of file buffers.
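A sketch of that bounded-buffer style, for contrast with reading the whole file: one adjustable line buffer with a fill pointer, reused for every line, so memory use tracks the longest line rather than the size of the file. MAP-LINES is a hypothetical name. Because the buffer is reused, FN must copy it (for example with COPY-SEQ) if it wants to keep the text.

```lisp
(defun map-lines (fn stream)
  "Call FN on each line of STREAM.  The string passed to FN is a reused
buffer, valid only until FN returns."
  (let ((buffer (make-array 256 :element-type 'character
                                :adjustable t :fill-pointer 0)))
    (loop
      (setf (fill-pointer buffer) 0)
      (let ((ch (read-char stream nil nil)))
        ;; End of file with nothing pending: done.
        (when (null ch) (return))
        ;; Accumulate up to the newline or end of file; the buffer
        ;; grows only if a line exceeds its current size.
        (loop until (or (null ch) (char= ch #\Newline))
              do (vector-push-extend ch buffer)
                 (setf ch (read-char stream nil nil)))
        (funcall fn buffer)))))

;; Usage: keep copies of the lines (hypothetical file name).
;; (with-open-file (in "animals.txt")
;;   (let ((lines '()))
;;     (map-lines (lambda (line) (push (copy-seq line) lines)) in)
;;     (nreverse lines)))
```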

None of this really contradicts anything you said.  I just worry for
newbies (since that was what the subject line said was involved) who
might be looking on and thinking this was the green light to not learn
about conventional I/O tools, which are there and should be used
sometimes.

And all just my personal opinion, of course.  Other perspectives welcome.



Mon, 15 Oct 2001 03:00:00 GMT  

Quote:


>> Hi there,


>> ...
>> > Solution depends a lot on how the words are separated.

>> Speaking of file formats, tab-delimited text is a very common one (where
>> spaces are part of fields), and there are a bunch of others.  What is
>> the common practice here?  It's very easy to quickly put together some
>> code; I am just wondering if people prefer to do this, or use some public
>> interface libraries that maybe cover multiple formats such as .csv, .dbf
>> or .wk1 in the spirit of reuse.

> I don't personally know of a library that does this, but there may
> be one.  You could poke around at the ALU's interim web site.
>  http://www.elwoodcorp.com/alu/

For future reference (the original poster didn't ask about .csv formats), I
wrote a .csv reader/writer some years ago.  It's on my web site at
<http://www.teleport.com/~dlamkins/ftp-catalog.html#csv-streams>.

--
David B. Lamkins <http://www.teleport.com/~dlamkins/>

There are many ways to abbreviate something, but only one way not to.



Mon, 15 Oct 2001 03:00:00 GMT  

|  (defun peek-char-after-whitespace (stream)
|    (loop for ch = (read-char stream nil nil)
|          while ch
|          when (not (whitespace? ch))
|          do (return ch)))

  I'd've used (peek-char t stream nil nil) for this.  Have I read the
  specification too well, again?  :)

#:Erik
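The PEEK-CHAR suggestion, sketched: with a peek type of T, PEEK-CHAR skips whitespace (as the current readtable defines it, a slightly broader set than WHITESPACE?) and returns the next character without consuming it, or NIL at end of file with these arguments. Because that character stays in the stream, this version of READ-WORD no longer writes it out separately; the collecting loop picks it up. WHITESPACE? is repeated from the earlier post so the block stands alone.

```lisp
(defun whitespace? (ch)
  (or (eql ch #\Space)
      (eql ch #\Tab)
      (eql ch #\Newline)))

(defun read-word (stream)
  "Return the next whitespace-delimited word in STREAM as a symbol, or NIL at EOF."
  ;; Skip leading whitespace; NIL means only whitespace was left.
  (when (peek-char t stream nil nil)
    (intern (with-output-to-string (str)
              ;; The peeked character is still unread, so this loop sees it first.
              (loop for ch = (read-char stream nil nil)
                    while (and ch (not (whitespace? ch)))
                    do (write-char ch str))))))
```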



Tue, 16 Oct 2001 03:00:00 GMT  
 