Recognizer design choices

Hi all,

I'm in the process of building a simple recognizer for a closed,
context-free grammar, but I'm stuck as to which way to encode the various
parts of speech. I've read several sources, each advising a different
approach.

On the one hand, there's the approach which suggests that each noun, for
instance, be recognized as part of a list in which the noun is the head
and the rest of the sentence is the tail. But on the other hand, I've seen
the approach where the concept of "nouniness" is created as a predicate,
i.e.

noun(boy).
noun(dog).

etc.
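
To make the first approach concrete, as I understand it each rule would
thread the sentence through as a difference list, i.e. something like:

noun([boy|Rest], Rest).
noun([dog|Rest], Rest).

np(S0, S) :- det(S0, S1), noun(S1, S).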

Is there any definitive guide as to which approach to take? Any and all
help vastly appreciated.

Thanks,
Clare Homan

-----
Clare Homan, master's candidate
Linguistics, UC Davis
Davis CA, USA



Mon, 31 May 2004 06:31:56 GMT  
 Recognizer design choices

Quote:

> Is there any definitive guide as to which approach to take? Any and all
> help vastly appreciated.

One possibility is to have just `noun' in the grammar, i.e.:

| noun --> [boy].
| noun --> [dog].

and

| np --> det, noun.

Doesn't get much simpler than this.
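
For instance, if you add a det rule (mine, just for the example):

| det --> [the].

then the recognizer runs via phrase/2:

| ?- phrase(np, [the, boy]).
| true.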

/Tomas



Mon, 31 May 2004 17:45:14 GMT  
 Recognizer design choices

Quote:

> I'm in the process of building a simple recognizer for a closed,
> context-free grammar, but I'm stuck as to which way to encode the various
> parts of speech.

> <snipped>

> Is there any definitive guide as to which approach to take?

* Normally, one would assume that the lexicon is big. So, indexing is
important. If we have a rule like

np --> det, noun.

noun --> [boy].

with very many other such rules for noun, this is going to be very
inefficient: each such rule compiles to a clause whose first argument is
a list, so ordinary first-argument indexing cannot select the right
clause, and a lookup may try them all.

* Normally, one would assume that the lexicon is somehow separate, perhaps
a different module. So it would make sense to have a single predicate for
accessing that module.

This leads to the following setup:

the lexicon is a big set of facts of the form

lexicon(Word,Cat). % perhaps you want to add further arguments for more info

e.g.:

lexicon(boy,noun).
lexicon(girl,noun).
lexicon(the,det).
lexicon(kisses,verb).

In the grammar, you'd have rules such as:

det --> [Word], { lexicon(Word,det) }.
noun --> [Word], { lexicon(Word,noun) }.

This might look a little redundant, but it pays off to worry about things
like indexing and modularity.
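
Putting the pieces together, a complete toy recognizer along these lines
(the verb/s rules and the query are mine, just to show the shape):

lexicon(the,det).
lexicon(boy,noun).
lexicon(girl,noun).
lexicon(kisses,verb).

det  --> [Word], { lexicon(Word,det) }.
noun --> [Word], { lexicon(Word,noun) }.
verb --> [Word], { lexicon(Word,verb) }.

np --> det, noun.
s  --> np, verb, np.

?- phrase(s, [the,boy,kisses,the,girl]).
true.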

Gj

--
Gertjan van Noord Alfa-informatica, RUG,  Postbus 716, 9700 AS Groningen
vannoord at let dot rug dot nl            http://www.let.rug.nl/~vannoord



Mon, 31 May 2004 23:22:59 GMT  
 Recognizer design choices

Quote:

> <snipped>

> the lexicon is a big set of facts of the form

> lexicon(Word,Cat). % perhaps you want to add further arguments for more info

> <snipped>

> this might look a little redundant, but it pays off to worry about things
> like indexing and modularity.

It may well be that this is implementation dependent, but I was under
the impression that indexing on the first argument was more efficient,
so you would prefer

lexicon(noun,boy).

--
Andrew Eremin
IC Parc, William Penney Lab.,        Tel: +44 (0)20 7594 8299
Imperial College                     Fax: +44 (0)20 7594 8432



Tue, 01 Jun 2004 00:17:24 GMT  
 Recognizer design choices

Quote:

> <snipped>

> It may well be that this is implementation dependent, but I was under
> the impression that indexing on the first argument was more efficient,
> so you would prefer

> lexicon(noun,boy).

lexicon(Word,Cat) is better because this is a "recognizer". Both arguments
will be bound when lexicon/2 is called, and Word will match fewer clauses
than Cat, making better use of the indexing.

lexicon(Cat,Word) would be better for generating random sentences.
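
To make the call patterns concrete (using the lexicon facts from
Gertjan's post):

% recognizing: the word comes off the input list already bound,
% so first-argument indexing jumps straight to its clause
?- lexicon(kisses, Cat).
Cat = verb.

% with lexicon(Cat,Word), the index key would instead be the
% category, which is what you want when enumerating all words
% of a given category, e.g. lexicon(noun, Word).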

--
Regards

John Fletcher



Tue, 01 Jun 2004 02:11:32 GMT  
 Recognizer design choices

Quote:

> It may well be that this is implementation dependent, but I was under
> the impression that indexing on the first argument was more efficient,
> so you would prefer
> lexicon(noun,boy).

I was assuming first argument indexing.

In the applications I care about, there are typically *many* more
distinct words than part-of-speech labels.

Gertjan

--
Gertjan van Noord Alfa-informatica, RUG,  Postbus 716, 9700 AS Groningen
vannoord at let dot rug dot nl            http://www.let.rug.nl/~vannoord



Tue, 01 Jun 2004 04:51:47 GMT  
 