list formats 

I think the way I posed the question in my earlier postings (under the
name "loop problem") was too complicated, so I'll try to simplify it:

we have this list:

(("car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji
ska) (noun ako hui)) blue ((adj hji hui))" "strong ((adj hji hui) (adv
ly hui)) is ((verb hji ska) (verb kos hji)) man ((noun ako hui) (prop
hui))"))

and want to convert it to this format:

(("car" ((noun automoblie sks hui)(noun vehicle sks hui))
  "is" ((verb hji ska) (noun ako hui))
  "blue" ((adj hji hui)))
 ("strong" ((adj hji hui) (adv ly hui))
  "is" ((verb hji ska) (verb kos hji))
  "man" ((noun ako hui) (prop hui))))

I'd appreciate any guidance.
(For more details, if needed, please look at the postings under "loop
problem".)

tnx

ab talebi



Tue, 13 Jul 2004 18:25:19 GMT  
 list formats

Quote:

> I think the way I posed the question was too complicated in latter
> postings (under the name loop problem) so I try to samplify it:

I have been following your postings for a while.  I believe a little
Socratic method would help here.

Quote:

> we have this list:

> (("car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji
> ska) (noun ako hui)) blue ((adj hji hui))" "strong ((adj hji hui) (adv
> ly hui)) is ((verb hji ska) (verb kos hji)) man ((noun ako hui) (prop
> hui))"))

Before we go on, please answer the following question.

What are the elements of this list?

Ciao

--
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.



Wed, 14 Jul 2004 00:27:52 GMT  
 list formats

Quote:

>I think the way I posed the question was too complicated in latter
>postings (under the name loop problem) so I try to samplify it:

>we have this list:

>(("car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji
>ska) (noun ako hui)) blue ((adj hji hui))" "strong ((adj hji hui) (adv
>ly hui)) is ((verb hji ska) (verb kos hji)) man ((noun ako hui) (prop
>hui))"))

>and want to convert it to this format:

>(("car" ((noun automoblie sks hui)(noun vehicle sks hui))
>  "is" ((verb hji ska) (noun ako hui))
>  "blue" ((adj hji hui)))
> ("strong" ((adj hji hui) (adv ly hui))
>  "is" ((verb hji ska) (verb kos hji))
>  "man" ((noun ako hui) (prop hui))))

Here's my solution.  It's untested and doesn't do any error checking.

(defun parse-my-list (list)
  (loop for string in (car list)
        collect (parse-my-string string)))

(defun parse-my-string (string)
  "Parse a string of repeating "<name> <list-of-attributes> ..."
  (loop with end = (length string)
        with last-end
        for start = 0 then last-end
        while (start < end)
        collect (let* ((space-pos (position #\space string :start start))
                       (word (subseq string start space-pos)))
                  (setq start (1+ space-pos))
                  word)
        collect (multiple-value-bind (list end-pos)
                      (read-from-string string t nil :start start)
                  ;; skip over whitespace after the list
                  (setq last-end
                        (position #\space string :start end-pos :test #'char/=))
                  list)))

--

Genuity, Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.



Wed, 14 Jul 2004 00:41:32 GMT  
 list formats


Quote:


> > I think the way I posed the question was too complicated in latter
> > postings (under the name loop problem) so I try to samplify it:

> I have been following your postings for a while.  I believe a little
> Socratic method would help here.

> > we have this list:

> > (("car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji
> > ska) (noun ako hui)) blue ((adj hji hui))" "strong ((adj hji hui) (adv
> > ly hui)) is ((verb hji ska) (verb kos hji)) man ((noun ako hui) (prop
> > hui))"))

> Before we go on, please answer the following question.

> What are the elements of this list?

> Ciao

Thank you for your answer.

To answer your question first:

This list has two strings. The elements of the first string are: car, is and
blue

The elements of the second string are: strong, is and man

----------------------------

Please take a minute to read this while I explain the problem's origin.

This is an extract of my original corpus file (c:\corpus-all.txt):

LEXEME         vehicle

CLASSIFICATION    M1x

ARTNR           27405

DEF     car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji
ska) (noun ako hui)) blue ((adj hji hui))

=

LEXEME         people

ARTNR           27406

DEF     strong ((adj hji hui) (adv ly hui)) is ((verb hji ska) (verb kos
hji)) man ((noun ako hui) (prop hui))

DEF     beautiful ((adj hji hui) (adv ly hui)) women ((verb hji ska) (verb
kos hji)) are ((noun ako hui) (prop hui)) successfull ((adj sko hui) (adv ly
hui))

=

Each entry is followed by an = sign, and I put the file name in a variable:

(setf corpus-all "c:\\corpus-all.txt")   ; a backslash must be doubled inside a Lisp string

The very first thing I have to do is to make this file more lisp-friendly.
This is a task for "read-database". Let me first give you the code:

(defun read-entry (stream)
  (loop
    for line = (read-line stream nil nil)
    when (null line)
      return stream
    until (string-equal line "=")
    collect
      (multiple-value-bind (key position)
          (let ((*read-eval* nil))
            (read-from-string line))
        (loop
          while (eq #\tab (char line position))
          do (incf position))   ; skip multiple tabs
        (list key
              (subseq line position)))))

(defun read-database (pathname)
  (with-open-file (stream pathname :direction :input)
    (loop
      for address = (read-entry stream)
      until (eq address stream)
      collect address)))

The problem I cannot solve is that this function returns the values as
strings, so we end up with:

(LEXEME "vehicle")

(CLASSIFICATION "M1x")

(ARTNR "27405")

(DEF "car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji
ska) (noun ako hui)) blue ((adj hji hui))")

Whereas I would like to have:

((LEXEME (vehicle))

(CLASSIFICATION (M1x))

(ARTNR (27405))

(DEF (car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji
ska) (noun ako hui)) blue ((adj hji hui)))))

I think this is the problem that I carry throughout the whole program.

Do you know what I can do to resolve this problem?
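
One way to get Lisp objects instead of strings is to keep calling READ on the line until it is exhausted. The sketch below is only an illustration: the name `parse-field-line` is made up here, and it assumes every value on the line is readable Lisp syntax. Note that the reader upcases symbols by default, so `vehicle` comes back as `VEHICLE`.

```lisp
(defun parse-field-line (line)
  "Return (KEY (VALUE ...)), reading every item on LINE as a Lisp object.
PARSE-FIELD-LINE is a hypothetical helper, not part of the original program."
  (let ((*read-eval* nil))              ; never evaluate #. forms from the file
    (with-input-from-string (s line)
      (let ((key (read s nil nil)))
        (list key
              ;; The stream itself serves as a unique end-of-input marker.
              (loop for item = (read s nil s)
                    until (eq item s)
                    collect item))))))
```

For example, `(parse-field-line "ARTNR           27405")` gives a list whose second element is a list holding the integer 27405 rather than the string "27405".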

Then we go on .. The most important part of the corpus is the DEF-part so I
write a function that extracts the DEF part:

(defun find-category-values (category record)
  (loop for (current-category value) in record
        when (eq current-category category) collect value))

(defun extract (category value list)
  (if (null list)
      '()
      (let ((record (first list)))
        (if (equal (cadr (assoc 'lexeme record)) value)
            (cons (find-category-values category record)
                  (extract category value (rest list)))
            (extract category value (rest list))))))

(defun get-def-all (list)
  (cond ((null list) nil)
        (t (cons (find-category-values 'def (car list))
                 (get-def-all (rest list))))))

(defun get-def (x)
  (extract 'def x corpus-all))

(get-def "vehicle")

(("car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji ska)
(noun ako hui)) blue ((adj hji hui))"))

Note that if we fix "read-database" and "read-entry" we should be able
to call the get-def function like this:

(get-def 'vehicle)

If we can make it that far, then I think a lot of the problems are already solved.
Once again, thanks for helping.

tnx

ab talebi



Thu, 15 Jul 2004 18:30:46 GMT  
 list formats


Quote:


> >I think the way I posed the question was too complicated in latter
> >postings (under the name loop problem) so I try to samplify it:

> >we have this list:

> >(("car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji
> >ska) (noun ako hui)) blue ((adj hji hui))" "strong ((adj hji hui) (adv
> >ly hui)) is ((verb hji ska) (verb kos hji)) man ((noun ako hui) (prop
> >hui))"))

> >and want to convert it to this format:

> >(("car" ((noun automoblie sks hui)(noun vehicle sks hui))
> >  "is" ((verb hji ska) (noun ako hui))
> >  "blue" ((adj hji hui)))
> > ("strong" ((adj hji hui) (adv ly hui))
> >  "is" ((verb hji ska) (verb kos hji))
> >  "man" ((noun ako hui) (prop hui))))

> Here's my solution.  It's untested and doesn't do any error checking.

> (defun parse-my-list (list)
>   (loop for string in (car list)
>         collect (parse-my-string string)))

> (defun parse-my-string (string)
>   "Parse a string of repeating "<name> <list-of-attributes> ..."
>   (loop with end = (length string)
>         with last-end
>         for start = 0 then last-end
>         while (start < end)
>         collect (let* ((space-pos (position #\space string :start start))
>                        (word (subseq string start space-pos)))
>                   (setq start (1+ space-pos))
>                   word)
>         collect (multiple-value-bind (list end-pos)
>                       (read-from-string string t nil :start start)
>                   ;; skip over whitespace after the list
>                   (setq last-end
>                         (position #\space string :start end-pos :test #'char/=))
>                   list)))

Unfortunately it doesn't work, among other things because
while (start < end)
doesn't make sense. I tried changing it to while (< start end), but then
(setq start (1+ space-pos)) expects a number, which we don't have, so I
don't know....

tnx

ab talebi



Thu, 15 Jul 2004 18:36:48 GMT  
 list formats

Quote:

>unfortenatley it doesn't work, amoung other things because
>while (start < end)
>doesn't make sence. I tried changing it to while (< start end) but then
>(setq start (1+ space-pos)) expectes a number which we don't have, so I
>don't know....

If it can't find a space, POSITION will return NIL, so you'll have to handle
this case.

Consider my function the starting point, and debug it further until you get
something that works, just as you would have to do if you had written it
yourself.
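
For readers following along, here is one way that debugging might come out. It is a sketch, not the poster's final code: it swaps the `start`/`last-end` pair for a single `start` variable, uses prefix `(< start end)`, escapes the quotes in the docstring, and treats a NIL from POSITION as "end of string". A word with no attribute list after it still signals an end-of-file error, and symbols come back upcased by the reader.

```lisp
(defun parse-my-string (string)
  "Parse a string of repeating \"<name> <list-of-attributes>\" pairs."
  (loop with end = (length string)
        with start = 0
        ;; START becomes NIL once nothing but spaces is left.
        while (and start (< start end))
        collect (let* ((space-pos (or (position #\Space string :start start)
                                      end))   ; no space found: word runs to the end
                       (word (subseq string start space-pos)))
                  (setq start (min end (1+ space-pos)))
                  word)
        collect (multiple-value-bind (list end-pos)
                    (read-from-string string t nil :start start)
                  ;; Skip over spaces after the list; POSITION returns NIL
                  ;; when no non-space character remains.
                  (setq start
                        (position #\Space string :start end-pos :test #'char/=))
                  list)))

(defun parse-my-list (list)
  (loop for string in (car list)
        collect (parse-my-string string)))
```

Keeping the words as strings while reading the attribute lists with the Lisp reader matches the target format from the first post.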

--

Genuity, Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.



Sat, 17 Jul 2004 04:28:26 GMT  
 list formats

Hi,

I am back ready to answer.

Quote:




> > > I think the way I posed the question was too complicated in latter
> > > postings (under the name loop problem) so I try to samplify it:

> > I have been following your postings for a while.  I believe a little
> > Socratic method would help here.

> > > we have this list:

> > > (("car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji
> > > ska) (noun ako hui)) blue ((adj hji hui))" "strong ((adj hji hui) (adv
> > > ly hui)) is ((verb hji ska) (verb kos hji)) man ((noun ako hui) (prop
> > > hui))"))

> > Before we go on, please answer the following question.

> > What are the elements of this list?

> > Ciao

> Thank you for your answer.

> To answer your question first:

> This list has to strings. The elements of the first string are: car, is and
> blue

> The elements of the second string are: strong, is and man

Wrong.  If you do not get this right, you are missing the whole point.

The list has one sublist (let's call it L1) that is

("car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji ska) (noun ako hui)) blue ((adj hji hui))"
"strong ((adj hji hui) (adv ly hui)) is ((verb hji ska) (verb kos hji)) man ((noun ako hui) (prop hui))")

L1 contains two strings
S1:
"car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji ska) (noun ako hui)) blue ((adj hji hui))"

and
S2:
"strong ((adj hji hui) (adv ly hui)) is ((verb hji ska) (verb kos hji)) man ((noun ako hui) (prop hui))"

Now S1 and S2 are just strings.  You need to parse them. Both S1 and
S2 have the same structure: a `token' followed by a list (of
attributes).

Let's work bottom up.  Let's take one such string and `parse' it.  Now
let's decide what are the basic data structures we want to
use.

Suppose you want to group the `token' and the attribute list. You may
have something like

(defstruct token
   name
   attributes)

Now let's parse the definition strings.  We will return a list of
`tokens'. (Assumption: no token will appear in the string as `nil' or
`()').

(defun parse-def-string (the-definition)
  (declare (type string the-definition))
  (with-input-from-string (s the-definition)
    (loop for name = (read s nil nil)
          for attrs = (read s nil nil)
          while name
          collect (make-token :name name :attributes attrs))))

As an example on S1

cl-prompt> (parse-def-string "car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji ska) (noun ako hui)) blue ((adj hji hui))")
(#S(TOKEN
      :NAME CAR
      :ATTRIBUTES ((NOUN AUTOMOBLIE SKS HUI) (NOUN VEHICLE SKS HUI)))
 #S(TOKEN :NAME IS :ATTRIBUTES ((VERB HJI SKA) (NOUN AKO HUI)))
 #S(TOKEN :NAME BLUE :ATTRIBUTES ((ADJ HJI HUI))))

which is what you want.
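
If the flat format from the first post (word as a string, followed by its attribute list) is still needed, the token list can be converted back. The sketch below repeats the two definitions above so that it is self-contained; `tokens->flat-list` is a hypothetical name, and downcasing the symbol name is an assumption about how the words should look.

```lisp
(defstruct token
  name
  attributes)

(defun parse-def-string (the-definition)
  (declare (type string the-definition))
  (with-input-from-string (s the-definition)
    (loop for name = (read s nil nil)
          for attrs = (read s nil nil)
          while name
          collect (make-token :name name :attributes attrs))))

(defun tokens->flat-list (tokens)
  "Turn a list of TOKEN structures into (\"word\" attributes \"word\" attributes ...)."
  (loop for tok in tokens
        ;; The reader upcased the word, so downcase it again for display.
        collect (string-downcase (symbol-name (token-name tok)))
        collect (token-attributes tok)))
```

Working with the structures directly is usually more convenient, since `token-name` and `token-attributes` give named access instead of positional list surgery.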

Are you with me so far?

Ciao

--
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.



Sat, 24 Jul 2004 05:15:44 GMT  
 list formats


<snip snip>

Quote:

>(defun parse-def-string (the-definition)
>  (declare (type string the-definition))
>  (with-input-from-string (s the-definition)
>    (loop for name = (read s nil nil)
>      for attrs = (read s nil nil)
>      while name
>      collect (make-token :name name :attributes attrs))))

>As an example on S1

>cl-prompt> (parse-def-string "car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji ska) (noun ako hui)) blue ((adj hji hui))")
>(#S(TOKEN
>      :NAME CAR
>      :ATTRIBUTES ((NOUN AUTOMOBLIE SKS HUI) (NOUN VEHICLE SKS HUI)))
> #S(TOKEN :NAME IS :ATTRIBUTES ((VERB HJI SKA) (NOUN AKO HUI)))
> #S(TOKEN :NAME BLUE :ATTRIBUTES ((ADJ HJI HUI))))

>which is what you want.

>Are you with me so far?

>Ciao

I am following the course like a hungry dog following a bone!! please
go on ...

tnx

ab talebi



Sat, 24 Jul 2004 17:10:00 GMT  
 list formats

Quote:



> <snip snip>

> >(defun parse-def-string (the-definition)
> >  (declare (type string the-definition))
> >  (with-input-from-string (s the-definition)
> >    (loop for name = (read s nil nil)
> >         for attrs = (read s nil nil)
> >         while name
> >         collect (make-token :name name :attributes attrs))))

> >As an example on S1

> >cl-prompt> (parse-def-string "car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji ska) (noun ako hui)) blue ((adj hji hui))")
> >(#S(TOKEN
> >      :NAME CAR
> >      :ATTRIBUTES ((NOUN AUTOMOBLIE SKS HUI) (NOUN VEHICLE SKS HUI)))
> > #S(TOKEN :NAME IS :ATTRIBUTES ((VERB HJI SKA) (NOUN AKO HUI)))
> > #S(TOKEN :NAME BLUE :ATTRIBUTES ((ADJ HJI HUI))))

> >which is what you want.

> >Are you with me so far?

> >Ciao

> I am following the course like a hungry dog following a bone!! please
> go on ...

Ok.

Now where were we?  We know how to parse the strings which have the
format you describe.  The problem is that these strings are the result
of reading a file whose format does not seem to be all that precise.
I will extrapolate from your example.

You have a number of `definitions' in your file which seem to start
with the word `LEXEME' and end with a `=' (assuming a line oriented
format as you showed).  You have a number of options here.

1 - read the lines and then parse them,
2 - read and parse the lines as you go.

Let's choose 1 for clarity purposes.

Assume we have an open file (i.e. a `stream') where you have a bunch
of `LEXEME' definitions.  Let's write a function which will return a
list of strings (note: a LIST of STRINGS) comprising one such `LEXEME':

(defun read-lexeme-lines (instream)
  (declare (type stream instream))
  (loop for line = (read-line instream nil)
        while (and line
                   (string/= line "="
                             :start1 0 :end1 (min 1 (length line))
                             :start2 0 :end2 1))
          when (string/= "" line)
            collect line into lexeme-lines
        finally (return lexeme-lines)))

The `(min 1 (length line))' is a trick to cope with empty lines,
even if they are discarded afterward.  The :start2 and :end2
parameters are not really needed, but that's ok.
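
The prefix test can be tried in isolation. In this sketch the helper name `starts-with-equals-p` is made up for illustration; it relies on STRING/= returning NIL when the compared portions are equal and a mismatch index otherwise.

```lisp
(defun starts-with-equals-p (line)
  "True when LINE begins with the `=' separator character."
  ;; :END1 is clamped with MIN so that an empty LINE is compared as \"\"
  ;; instead of asking for a character that is not there.
  (not (string/= line "=" :end1 (min 1 (length line)))))
```

Passing a fixed `:end1 1` would be out of range for an empty string, which is why the clamp matters even though empty lines are thrown away later.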

Now, the above function reads *one* lexeme from an input stream. An
example of how this works is the following:

==============================================================================
cl-prompt> (with-input-from-string (s "
LEXEME         vehicle

CLASSIFICATION    M1x

ARTNR           27405

DEF     car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji
ska) (noun ako hui)) blue ((adj hji hui))

=

LEXEME         people

ARTNR           27406

DEF     strong ((adj hji hui) (adv ly hui)) is ((verb hji ska) (verb kos
hji)) man ((noun ako hui) (prop hui))

DEF     beautiful ((adj hji hui) (adv ly hui)) women ((verb hji ska) (verb
kos hji)) are ((noun ako hui) (prop hui)) successfull ((adj sko hui) (adv ly
hui))

=")
  (read-lexeme-lines s))
==============================================================================

Which yields a LIST of STRINGS.

("LEXEME         vehicle" "CLASSIFICATION    M1x" "ARTNR           27405"
 "DEF     car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji"
 "ska) (noun ako hui)) blue ((adj hji hui))")

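This one list in particular contains 5 strings, because the DEF entry
is wrapped onto two lines in the input above; with DEF on a single
line there would be 4.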

Shall I go on? :)

Cheers

--
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.



Sat, 24 Jul 2004 22:57:09 GMT  
 list formats


<snip snip>

Quote:
>(defun read-lexeme-lines (instream)
>  (declare (type stream instream))
>  (loop for line = (read-line instream nil)
>    while (and line
>               (string/= line "="
>                         :start1 0 :end1 (min 1 (length line))
>                         :start2 0 :end2 1))
>      when (string/= "" line)
>        collect line into lexeme-lines
>    finally (return lexeme-lines)))

<snip>

Quote:
>Which yields a LIST of STRINGS.

>("LEXEME         vehicle" "CLASSIFICATION    M1x" "ARTNR           27405"
> "DEF     car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji"
> "ska) (noun ako hui)) blue ((adj hji hui))")

>This one list in particular contains 4 substrings.

>Shall I go on? :)

yes, please do, but what I don't understand is why we have to have
strings in the first place ? I mean, given a corpus with this format:

LEXEME         vehicle
CLASSIFICATION    M1x
ARTNR           27405
DEF     car ((noun automoblie sks hui)(noun vehicle sks hui)) is
((verb hji ska) (noun ako hui)) blue ((adj hji hui))
=

(Your assumption is correct: we have a number of `definitions' in the
corpus that start with the word `LEXEME' and end with a `=', and we do
have a line-oriented format. The attributes LEXEME, CLASSIFICATION,
ARTNR and DEF are separated from their values by a tab.)

I'm not sure which format is the best one to parse the corpus
into. What we finally need is a bunch of functions that allow us to
extract these values:

vehicle, M1x, 27405, car (and the content of each parenthesis).
For 'is', for example, we need to know that it is both 'verb' and 'noun'.

Do you know what I mean?

tnx

ab talebi



Sun, 25 Jul 2004 17:06:46 GMT  
 list formats

Quote:



> <snip snip>

> >(defun read-lexeme-lines (instream)
> >  (declare (type stream instream))
> >  (loop for line = (read-line instream nil)
> >       while (and line
> >                  (string/= line "="
> >                            :start1 0 :end1 (min 1 (length line))
> >                            :start2 0 :end2 1))
> >         when (string/= "" line)
> >           collect line into lexeme-lines
> >       finally (return lexeme-lines)))

> <snip>

> >Which yields a LIST of STRINGS.

> >("LEXEME         vehicle" "CLASSIFICATION    M1x" "ARTNR           27405"
> > "DEF     car ((noun automoblie sks hui)(noun vehicle sks hui)) is ((verb hji"
> > "ska) (noun ako hui)) blue ((adj hji hui))")

> >This one list in particular contains 4 substrings.

> >Shall I go on? :)

> yes, please do, but what I don't understand is why we have to have
> strings in the first place ? I mean, given a corpus with this format:

> LEXEME         vehicle
> CLASSIFICATION    M1x
> ARTNR           27405
> DEF     car ((noun automoblie sks hui)(noun vehicle sks hui)) is
> ((verb hji ska) (noun ako hui)) blue ((adj hji hui))
> =

> (your assumption is correct we have a number of `definitions' in the
> corpus that start with the word `LEXEME' and end with a `='  we do
> have a line oriented format. The attributes, LEXEME CLASSIFICATION
> ARTNR and DEF are seperated from their values by tab)

I am going one step at a time.  The function I wrote nicely collects
the strings that comprise one definition.  Now, and only now, we
proceed to actually "parse" them.

Quote:
> I'm not sure about what format i the best one to parse the corpus
> into, what we finaly need is a bunch of functions that allow us to
> extract these values:

> vehicle, M1x 27405 car (and the containt of each parantes,)
> we need to know for 'is' f.ex. that it is both 'verb' and 'noun'

> do you know what I mean?

Yes.  But remember my first post.  I am trying to use a Socratic
method to make you understand how to carry on this task.
Incidentally, the corpus file format sucks, but that is another
problem.

Let's proceed to "parse" the list of strings into a more useful
object.  Remember: we have a LIST of STRINGs which comprise ONE
definition.

First of all, let's define what a `corpus-definition' is

(defstruct corpus-definition
  lexeme
  classification
  artnr
  defs)   ; a list: one lexeme may have several DEF lines

Now we can write a function which will take the LIST of STRINGS which
we know is a definition from the file and which returns one such
`corpus-definition'.

(defun build-corpus-definition (definition-string-list)
  (let ((c-def (make-corpus-definition))) ; An empty definition.
    (dolist (ds definition-string-list c-def)
      (multiple-value-bind (op pos)
          (read-from-string ds nil)       ; The semantics of this call
                                          ; is important.
        (cond ((string-equal (symbol-name op) "LEXEME")
               (setf (corpus-definition-lexeme c-def)
                     (read-from-string ds nil nil :start pos)))
              ((string-equal (symbol-name op) "CLASSIFICATION")
               (setf (corpus-definition-classification c-def)
                     (read-from-string ds nil nil :start pos)))
              ((string-equal (symbol-name op) "ARTNR")
               (setf (corpus-definition-artnr c-def)
                     (read-from-string ds nil nil :start pos)))
              ((string-equal (symbol-name op) "DEF")
               ;; Note the next one!!!
               (push
                (parse-def-string (subseq ds pos))
                (corpus-definition-defs c-def))))
        ))))

Try this and let me know what you get.
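
For experimenting at the REPL, here is a self-contained sketch of this step. It is not the posted code verbatim. It assumes the slot that collects DEF entries is named `defs` (the name the accessor `corpus-definition-defs` implies), binds `*read-eval*` to NIL, skips lines that do not begin with a symbol (for instance a wrapped continuation line), and reverses the pushed DEF entries so they come out in file order. All four points are my assumptions.

```lisp
(defstruct token name attributes)

(defstruct corpus-definition
  lexeme
  classification
  artnr
  (defs '()))                           ; assumed slot name, one entry per DEF line

(defun parse-def-string (the-definition)
  (with-input-from-string (s the-definition)
    (loop for name = (read s nil nil)
          for attrs = (read s nil nil)
          while name
          collect (make-token :name name :attributes attrs))))

(defun build-corpus-definition (definition-string-list)
  (let ((c-def (make-corpus-definition))
        (*read-eval* nil))
    (dolist (ds definition-string-list)
      ;; OP is the first object on the line, POS the index just past it.
      (multiple-value-bind (op pos) (read-from-string ds nil)
        (when (symbolp op)              ; ignore lines not starting with a field name
          (let ((field (symbol-name op)))
            (cond ((string-equal field "LEXEME")
                   (setf (corpus-definition-lexeme c-def)
                         (read-from-string ds nil nil :start pos)))
                  ((string-equal field "CLASSIFICATION")
                   (setf (corpus-definition-classification c-def)
                         (read-from-string ds nil nil :start pos)))
                  ((string-equal field "ARTNR")
                   (setf (corpus-definition-artnr c-def)
                         (read-from-string ds nil nil :start pos)))
                  ((string-equal field "DEF")
                   (push (parse-def-string (subseq ds pos))
                         (corpus-definition-defs c-def))))))))
    ;; PUSH stores the newest DEF first; restore file order.
    (setf (corpus-definition-defs c-def)
          (nreverse (corpus-definition-defs c-def)))
    c-def))
```

The result is one structure per lexeme, so a question such as "which categories does `is' have?" becomes a walk over `(corpus-definition-defs c-def)` and the `token-attributes` of each token.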

Let's summarize.

Now you know:

1 - how to read one definition from the file in a list of strings.
2 - how to parse the DEF field into a list of `token's
3 - how to build a `corpus-definition' from a list of strings
    comprising a definition in the file.

What do you need to do next? :)

Cheers

--
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.



Mon, 26 Jul 2004 03:51:16 GMT  
 list formats


<snip>

Quote:
>(defstruct corpus-definition
>  lexeme
>  classification
>  artnr
>  defs)

>Now we can write a function which will take the LIST of STRINGS which
>we know is a definition from the file and which returns one such
>`corpus-definition'.

>(defun build-corpus-definition (definition-string-list)
>  (let ((c-def (make-corpus-definition))) ; An empty definition.
>    (dolist (ds definition-string-list c-def)
>      (multiple-value-bind (op pos)
>      (read-from-string ds nil)       ; The semantics of this call
>                                          ; is important.
>    (cond ((string-equal (symbol-name op) "LEXEME")
>           (setf (corpus-definition-lexeme c-def)
>                 (read-from-string ds nil nil :start pos)))
>          ((string-equal (symbol-name op) "CLASSIFICATION")
>           (setf (corpus-definition-classification c-def)
>                 (read-from-string ds nil nil :start pos)))
>          ((string-equal (symbol-name op) "ARTNR")
>           (setf (corpus-definition-artnr c-def)
>                 (read-from-string ds nil nil :start pos)))
>          ((string-equal (symbol-name op) "DEF")
>           ;; Note the next one!!!
>           (push
>            (parse-def-string (subseq ds pos))
>            (corpus-definition-defs c-def))))
>    ))))

>Try this and let me know what you get.

I think the lesson is going very well, and I actually understand
something :) but to ask this question first: by "read-lexeme-lines" you
actually mean "read-corpus", right?

here is my summary of what I understand so far:
#|
we are working bottom up.
we group the `token' and the attribute list.
|#

(defstruct token
   name
   attributes)

#|
we take one string from the DEF's attrib and parse it.
We will return a list of `tokens'. (Assumption: no token will appear
in the string as `nil').
|#

(defun parse-def-string (the-definition)
  (declare (type string the-definition))
  (with-input-from-string (s the-definition)
    (loop for name = (read s nil nil)
          for attrs = (read s nil nil)
          while name
          collect (make-token :name name :attributes attrs))))

#|
input:
(parse-def-string "car ((noun automoblie sks hui)(noun vehicle sks
hui)) is ((verb hji ska) (noun ako hui)) blue ((adj hji hui))")

output:
(#S(TOKEN
      :NAME CAR
      :ATTRIBUTES ((NOUN AUTOMOBLIE SKS HUI) (NOUN VEHICLE SKS HUI)))
 #S(TOKEN :NAME IS :ATTRIBUTES ((VERB HJI SKA) (NOUN AKO HUI)))
 #S(TOKEN :NAME BLUE :ATTRIBUTES ((ADJ HJI HUI))))
|#

;; ==============================

#|
We write a function which will return *a list of strings* comprising
one `LEXEME' (one entry).
The function gets the whole corpus as its argument but returns only *one*
lexeme from the corpus.
|#

(defun read-lexeme-lines (instream)
  (declare (type stream instream))
  (loop for line = (read-line instream nil)
        while (and line
                   (string/= line "="
                             :start1 0 :end1 (min 1 (length line))
                             :start2 0 :end2 1))
          when (string/= "" line)
            collect line into lexeme-lines
        finally (return lexeme-lines)))

#|
input:
(with-input-from-string (s "
LEXEME         vehicle
CLASSIFICATION    M1x
ARTNR           27405
DEF     car ((noun automoblie sks hui)(noun vehicle sks hui)) is
((verb hji ska) (noun ako hui)) blue ((adj hji hui))
=
LEXEME         people
ARTNR           27406
DEF     strong ((adj hji hui) (adv ly hui)) is ((verb hji ska) (verb
kos hji)) man ((noun ako hui) (prop hui))
DEF     beautiful ((adj hji hui) (adv ly hui)) women ((verb hji ska)
(verb kos hji)) are ((noun ako hui) (prop hui)) successfull ((adj sko
hui) (adv ly hui))
=")
  (read-lexeme-lines s))

output:

("LEXEME         vehicle" "CLASSIFICATION    M1x" "ARTNR
27405" "DEF     car ((noun automoblie sks hui)(noun vehicle sks hui))
is ((verb hji ska) (noun ako hui)) blue ((adj hji hui))")
|#

;; ===============================

#|
We have a LIST of STRINGs which comprise ONE definition.
We will now "parse" this into a more useful object.
We define what a `corpus-definition' is:
|#

(defstruct corpus-definition
  lexeme
  classification
  artnr
  defs)

#|
Now we can write a function which will take the LIST of STRINGS which
we know is a definition from the file and which returns one such
`corpus-definition'.
|#

(defun build-corpus-definition (definition-string-list)
  (let ((c-def (make-corpus-definition))) ; An empty definition.
    (dolist (ds definition-string-list c-def)
      (multiple-value-bind (op pos)
          (read-from-string ds nil)       ; The semantics of this call
                                          ; is important.
        (cond ((string-equal (symbol-name op) "LEXEME")
               (setf (corpus-definition-lexeme c-def)
                     (read-from-string ds nil nil :start pos)))
              ((string-equal (symbol-name op) "CLASSIFICATION")
               (setf (corpus-definition-classification c-def)
                     (read-from-string ds nil nil :start pos)))
              ((string-equal (symbol-name op) "ARTNR")
               (setf (corpus-definition-artnr c-def)
                     (read-from-string ds nil nil :start pos)))
              ((string-equal (symbol-name op) "DEF")
               ;; Note the next one!!!
               (push
                (parse-def-string (subseq ds pos))
                (corpus-definition-defs c-def))))
        ))))

but I get an error message:
Error: Undefined function BUILD-CORPUS-DEFINITION called with
arguments (("LEXEME         vehicle" "CLASSIFICATION    M1x" "ARTNR
27405" "DEF     car ((noun automoblie sks hui)(noun vehicle sks hui))
is ((verb hji ska) (noun ako hui)) blue ((adj hji hui))")).



Mon, 26 Jul 2004 23:00:23 GMT  
 list formats

Quote:



> <snip>

> >(defstruct corpus-definition
> >  lexeme
> >  classification
> >  artnr
> >  defs)

> >Now we can write a function which will take the LIST of STRINGS which
> >we know is a definition from the file and which returns one such
> >`corpus-definition'.

> >(defun build-corpus-definition (definition-string-list)
> >  (let ((c-def (make-corpus-definition))) ; An empty definition.
> >    (dolist (ds definition-string-list c-def)
> >      (multiple-value-bind (op pos)
> >         (read-from-string ds nil)       ; The semantics of this call
> >                                          ; is important.
> >       (cond ((string-equal (symbol-name op) "LEXEME")
> >              (setf (corpus-definition-lexeme c-def)
> >                    (read-from-string ds nil nil :start pos)))
> >             ((string-equal (symbol-name op) "CLASSIFICATION")
> >              (setf (corpus-definition-classification c-def)
> >                    (read-from-string ds nil nil :start pos)))
> >             ((string-equal (symbol-name op) "ARTNR")
> >              (setf (corpus-definition-artnr c-def)
> >                    (read-from-string ds nil nil :start pos)))
> >             ((string-equal (symbol-name op) "DEF")
> >              ;; Note the next one!!!
> >              (push
> >               (parse-def-string (subseq ds pos))
> >               (corpus-definition-defs c-def))))
> >       ))))

> >Try this and let me know what you get.

> I think the lesson is going very well, and I actually understand
> something :) but let me ask this question first: by "read-lexeme-lines"
> you actually mean "read-corpus", right?

No.  I mean what I write.  I am not reading the corpus.  I am reading
the lines comprising a single definition.

The question is now:  how do you read an entire corpus (i.e. a file)?
That is: write a function that does that.

        (defun read-corpus (corpus-file-name) ...)

The function will return a list of `corpus-definition's (which are
defined below).

Quote:
> (defun build-corpus-definition (definition-string-list)
        ...
>    ))))

> but I get an error message:
> Error: Undefined function BUILD-CORPUS-DEFINITION called with
> arguments (("LEXEME         vehicle" "CLASSIFICATION    M1x" "ARTNR
> 27405" "DEF     car ((noun automoblie sks hui)(noun vehicle sks hui))
> is ((verb hji ska) (noun ako hui)) blue ((adj hji hui))")).

... and the error reads as ... ?  :)  What does the error message mean?

--
Marco Socrate Antoniotti=====================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.



Mon, 26 Jul 2004 23:28:58 GMT  
 list formats


Quote:

>> >Try this and let me know what you get.

>> I think the lesson is going very well, and I actually understand
>> something :) but let me ask this question first: by "read-lexeme-lines"
>> you actually mean "read-corpus", right?

>No.  I mean what I write.  I am not reading the corpus.  I am reading
>the lines comprising a single definition.

>The question is now:  how do you read an entire corpus (i.e. a file)?
>That is: write a function that does that.

>    (defun read-corpus (corpus-file-name) ...)

something like:

(setf corpus (read-corpus "c:\\corpus.txt"))

(defun read-corpus (pathname)
  (with-open-file (stream pathname :direction :input)
    (loop
        for corpus-definiton = (build-corpus-definition stream)
        until (eq corpus-definiton stream)
        collect corpus-definiton)))

Quote:

>The function will return a list of `corpus-definition's (which are
>defined below).

>> (defun build-corpus-definition (definition-string-list)
>    ...
>>        ))))

>> but I get an error message:
>> Error: Undefined function BUILD-CORPUS-DEFINITION called with
>> arguments (("LEXEME         vehicle" "CLASSIFICATION    M1x" "ARTNR
>> 27405" "DEF     car ((noun automoblie sks hui)(noun vehicle sks hui))
>> is ((verb hji ska) (noun ako hui)) blue ((adj hji hui))")).

>... and the error reads as ... ?  :)  What does the error message mean?

>--

oh, it was just a missing parenthesis, which I fixed. now when I
compile the file it says:
The following functions are undefined:
CORPUS-DEFINITION-DEFS which is referenced by BUILD-CORPUS-DEFINITION
(SETF CORPUS-DEFINITION-DEFS) which is referenced by
BUILD-CORPUS-DEFINITION

and this is the output I get:

CL-USER 1 > (build-corpus-definition '("LEXEME         vehicle"
"CLASSIFICATION    M1x" "ARTNR           27405" "DEF     car ((noun
automoblie sks hui)(noun vehicle sks hui)) is ((verb hji ska) (noun
ako hui)) blue ((adj hji hui))"))

Error: Undefined function CORPUS-DEFINITION-DEFS called with arguments
(#S(CORPUS-DEFINITION LEXEME VEHICLE CLASSIFICATION M1X ARTNR 27405
DEF NIL)).
  1 (continue) Try invoking CORPUS-DEFINITION-DEFS again.
  2 Return some values from the call to CORPUS-DEFINITION-DEFS.
  3 Try invoking something other than CORPUS-DEFINITION-DEFS with the
same arguments.
  4 Set the symbol-function of CORPUS-DEFINITION-DEFS to another
function.
  5 (abort) Return to level 0.
  6 Return to top loop level 0.

Type :b for backtrace, :c <option number> to proceed,  or :? for other
options

CL-USER 2 : 1 >

but if I go without the DEF part it works just fine except I get NIL
as DEF of course

CL-USER 2 : 1 > (build-corpus-definition '("LEXEME         vehicle"
"CLASSIFICATION    M1x" "ARTNR           27405"))

#S(CORPUS-DEFINITION LEXEME VEHICLE CLASSIFICATION M1X ARTNR 27405 DEF
NIL)



Tue, 27 Jul 2004 16:44:24 GMT  
 list formats

Quote:



> >> >Try this and let me know what you get.

> >> I think the lesson is going very well, and I actually understand
> >> something :) but let me ask this question first: by "read-lexeme-lines"
> >> you actually mean "read-corpus", right?

> >No.  I mean what I write.  I am not reading the corpus.  I am reading
> >the lines comprising a single definition.

> >The question is now:  how do you read an entire corpus (i.e. a file)?
> >That is: write a function that does that.

> >       (defun read-corpus (corpus-file-name) ...)

> something like:

> (setf corpus (read-corpus "c:\\corpus.txt"))

> (defun read-corpus (pathname)
>   (with-open-file (stream pathname :direction :input)
>     (loop
>    for corpus-definiton = (build-corpus-definition stream)
>    until (eq corpus-definiton stream)
>    collect corpus-definiton)))

Almost.

First of all, it is better to use one of the "definition forms" of CL.

        (defvar *the-corpus*)  ; Note the `*' convention.

        (setf *the-corpus* (read-corpus "/where/the/corpus/is/corpus.txt"))
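
A small illustration of why a definition form is preferable to a bare
top-level `setf': `defvar' declares the variable special and uses its
initial value only if the variable has no value yet, so reloading the
file does not clobber a corpus that has already been read.  The name
`*demo-corpus*' is made up for this example.

```lisp
;; DEFVAR declares a special (dynamic) variable.  The initial value is
;; used only when the variable has no value yet.
(defvar *demo-corpus* '(first-load))

;; Evaluating a second DEFVAR (e.g. when the file is reloaded) leaves
;; the existing value alone.
(defvar *demo-corpus* '(second-load))

;; SETF is still the way to assign a new value explicitly.
(setf *demo-corpus* (append *demo-corpus* '(added-later)))
```

If the value should be reset on every reload, `defparameter' is the
form to reach for instead.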

Now let's see why your definition is wrong (meanwhile, we'll fix
problems in my code :) ).

The first question is: what does `build-corpus-definition' return?
The second question is: what is its input?

You are calling `build-corpus-definition' with a `stream' (a variable
named `stream' whose type is `stream').
Moreover, given your test in the 'UNTIL' clause, you seem to assume
that `build-corpus-definition' returns a stream.  Is this correct?
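
One possible way to reconcile the two questions above, as a rough
sketch only: `build-corpus-definition' wants a LIST of STRINGS, so
something has to turn the stream into such lists first.  The helper
`read-definition-lines' is hypothetical, and the idea that definitions
in the file are separated by blank lines is an assumption; the actual
corpus layout is not shown in this thread.

```lisp
(defun read-definition-lines (stream)
  "Return the next group of non-blank lines from STREAM as a list of
strings, or NIL at end of file.  Hypothetical helper; assumes that
definitions are separated by blank lines."
  (loop with lines = '()
        for line = (read-line stream nil nil)
        while line
        do (let ((blank-p (zerop (length (string-trim
                                          '(#\Space #\Tab #\Return)
                                          line)))))
             (cond ((not blank-p) (push line lines))
                   ;; A blank line after some content ends the group.
                   (lines (loop-finish))))
        finally (return (nreverse lines))))

(defun read-corpus (corpus-file-name)
  "Return a list of corpus definitions read from CORPUS-FILE-NAME.
Relies on BUILD-CORPUS-DEFINITION as given earlier in the thread."
  (with-open-file (stream corpus-file-name :direction :input)
    (loop for lines = (read-definition-lines stream)
          while lines
          collect (build-corpus-definition lines))))
```

The point of the split is that the end-of-file test now lives where the
stream is actually read, and `build-corpus-definition' only ever sees
the list of strings it was written for.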


Quote:
> >The function will return a list of `corpus-definition's (which are
> >defined below).

> >> (defun build-corpus-definition (definition-string-list)
> >       ...
> >>   ))))

> >> but I get an error message:
> >> Error: Undefined function BUILD-CORPUS-DEFINITION called with
> >> arguments (("LEXEME         vehicle" "CLASSIFICATION    M1x" "ARTNR
> >> 27405" "DEF     car ((noun automoblie sks hui)(noun vehicle sks hui))
> >> is ((verb hji ska) (noun ako hui)) blue ((adj hji hui))")).

> >... and the error reads as ... ?  :)  What does the error message mean?

> >--
> oh, it was just a missing parenthesis, which I fixed. now when I
> compile the file it says:
> The following functions are undefined:
> CORPUS-DEFINITION-DEFS which is referenced by BUILD-CORPUS-DEFINITION
> (SETF CORPUS-DEFINITION-DEFS) which is referenced by
> BUILD-CORPUS-DEFINITION

> and this is the output I get:

> CL-USER 1 > (build-corpus-definition '("LEXEME         vehicle"
> "CLASSIFICATION    M1x" "ARTNR           27405" "DEF     car ((noun
> automoblie sks hui)(noun vehicle sks hui)) is ((verb hji ska) (noun
> ako hui)) blue ((adj hji hui))"))

> Error: Undefined function CORPUS-DEFINITION-DEFS called with arguments
> (#S(CORPUS-DEFINITION LEXEME VEHICLE CLASSIFICATION M1X ARTNR 27405
> DEF NIL)).
>   1 (continue) Try invoking CORPUS-DEFINITION-DEFS again.
>   2 Return some values from the call to CORPUS-DEFINITION-DEFS.
>   3 Try invoking something other than CORPUS-DEFINITION-DEFS with the
> same arguments.
>   4 Set the symbol-function of CORPUS-DEFINITION-DEFS to another
> function.
>   5 (abort) Return to level 0.
>   6 Return to top loop level 0.

> Type :b for backtrace, :c <option number> to proceed,  or :? for other
> options

> CL-USER 2 : 1 >

> but if I go without the DEF part it works just fine except I get NIL
> as DEF of course

> CL-USER 2 : 1 > (build-corpus-definition '("LEXEME         vehicle"
> "CLASSIFICATION    M1x" "ARTNR           27405"))

> #S(CORPUS-DEFINITION LEXEME VEHICLE CLASSIFICATION M1X ARTNR 27405 DEF
> NIL)

Thanks.  Very nice error output.  What this tells me is that there
is a mishap in the `corpus-definition' definition: the slot was named
`def', but the code uses the accessor `corpus-definition-defs'.  There
was a missing `s'.

Here is the correct definition:

(defstruct corpus-definition
  lexeme
  classification
  artnr
  defs)
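
To see why the slot name matters: `defstruct' derives each accessor
name from the structure name plus the slot name, so a slot called
`defs' yields an accessor ending in `-defs' and nothing else.  The
structure `demo-entry' below is a made-up stand-in, used so that the
real `corpus-definition' is not redefined.

```lisp
(defstruct demo-entry
  lexeme
  defs)

(let ((e (make-demo-entry :lexeme 'vehicle)))
  ;; DEMO-ENTRY-DEFS is SETF-able, so PUSH can be used on it directly.
  (push '("car" ((noun vehicle))) (demo-entry-defs e))
  (demo-entry-defs e))

;; The accessor exists only under the exact slot name.
(fboundp 'demo-entry-defs)   ; true
(fboundp 'demo-entry-def)    ; false: there is no slot named DEF
```

With the slot named `def', only `corpus-definition-def' exists, and the
`push' onto `corpus-definition-defs' has nothing to call.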

Now. Back to basics.  How do you write `read-corpus'? :)

Cheers

--
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.



Tue, 27 Jul 2004 22:52:04 GMT  
 