MT Developers Needed for Brainchild1 
 MT Developers Needed for Brainchild1

Quote:

> Email bounced.

That is probably because my return address is not valid. If you would
take a look at it, you would see something like
"delete-until-dot.q-go.com".

Quote:
> I hesitate to make a public spectacle of us both, but here goes:

Well, you just did.

Quote:
> > Your solution to not being able to fly is to learn birds to walk???

> Translation problem corrected:
> Your solution to not being able to fly is to teach birds to walk?

> "You teach me, I learn from you."

I know.

Quote:
> This is a well known mistake, and was almost standard in some parts of
> Brooklyn when I was young.

Whatever my age, this is still a common mistake for Dutch people, as
the most frequent Dutch verb for expressing "to teach" also means "to
learn": leren.

Anyway, it does not really add anything to my argument...

   Theo



Sun, 03 Mar 2002 03:00:00 GMT  
 MT Developers Needed for Brainchild1

Quote:
>>         As a completely different example, but one that this reminds
>> me of... many languages use both a verb like "to be" and a verb like
>> "to have" as the auxiliary for forming the perfect, depending,
>> generally, on whether the verb is stative or transitive.

> Well, this is not so tough, it just requires a lot of work to encode
> the preferences of each verb. I don't know Italian (but I do know
> Spanish), and I do know Dutch and German, and I don't think this is
> one of the worst problems. The problem is "just" one of different
> meanings (and how to distinguish them!).

        Spanish, of course, does not have this. English used to, but
doesn't anymore.

        If you propose to have a separate field for each verb, that is
a solution, but inelegant in the way that it misses the generalization
that *almost* works. It also means that acquisition is more difficult.

        What I think is REALLY interesting about the example is that
if you only knew German OR Italian, you might think it was really
easy: transitives use one auxiliary, and statives/intransitives use
the other. So (the English-and-German speaker ignorant of Italian
would think) when you see an English (say) verb, you could mark it
right away for which auxiliary it would need in German. And then when
you got a passing glance at Italian, you could think, "Oh -- same as
German; no big deal." You might suspect that out of 5000 languages,
maybe 1000 will do this, but that they'll all do it the same way. As
it turns out, Italian is SLIGHTLY different from German. It's a sure
bet that other languages do it differently still. So then what? You encode
every ENGLISH verb with the marker for German auxiliary AND Italian
auxiliary? I hope it's obvious the can of worms this opens. If
Chaumont's panel of experts finds 500 languages that do this, and
there are 100 ways of doing it differently, then you end up with 100
markers on every verb you translate into Interlinguish -- just for
this one property! Imagine how case and gender will blow up on you as
well! And worse, noun classifiers.
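
To make the blow-up concrete, here is a minimal sketch of the per-verb marking described above. The dictionary layout, the three toy verbs, and the figure of 1000 verbs are assumptions for illustration only; the 100 schemes figure is the one from the argument.

```python
# Toy lexicon: English verb -> perfect auxiliary in German ("de") and Italian ("it").
# Three illustrative entries only; the dictionary layout is an assumption.
AUXILIARY = {
    "eat":    {"de": "haben", "it": "avere"},
    "arrive": {"de": "sein",  "it": "essere"},
    "please": {"de": "haben", "it": "essere"},  # gefallen vs. piacere
}

# Which German auxiliary corresponds to which Italian one ("have" / "be").
DE_TO_IT = {"haben": "avere", "sein": "essere"}


def disagreements(lexicon):
    """Return the verbs whose German marker does not predict the Italian one."""
    return [verb for verb, aux in lexicon.items()
            if DE_TO_IT[aux["de"]] != aux["it"]]


def total_markers(n_verbs, n_schemes):
    """One marker per verb per distinct scheme."""
    return n_verbs * n_schemes


print(disagreements(AUXILIARY))   # ['please']
# 1000 verbs is a hypothetical lexicon size; 100 schemes is the figure argued above.
print(total_markers(1000, 100))   # 100000
```

Even in this tiny lexicon one verb breaks the "same as German" assumption, so the German marker cannot simply be reused for Italian.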

        If I understand correctly, Chaumont's notion of translating a
sentence of English to Interlinguish is that every noun (to pick an
example) would be tagged with the information needed to apply a noun
classifier, later, in Chinese. What I am saying is that this is
essentially impossible, if you want to make it work also for every
language with classifiers. An English sentence would take over 100K in
Interlinguish. If Interlinguish does NOT try to provide semantic
information of this sort, and post-processing handles things like
this, then there's little accomplishment (from an MT perspective) in
taking the sentence out of English ASCII in the first place.

        I suspect that what he'll find is that he may get some nice
monolingual systems, or systems that handle a handful of NLs to some
extent, but an Interlinguish that captures the meaning of a sentence
as any language in the world might want to represent it is chimerical,
because each sentence would turn into a ridiculously large output.

Quote:
>>         You're right that there is a particular "joke" mode of
>> language, just as in Mandarin, there are four-word proverbs that have a
>> unique status in that language.

> Really?

        Yes. Maybe I should have said "expression" rather than proverb,
but some of them are proverbs, too.

        "five lake four sea" -- everywhere in China
        "east south west north" -- in every direction
        "three heart two meaning" -- [subject is] indecisive

        -JAR
--
If God doesn't destroy Hollywood Boulevard, he owes Sodom and Gomorrah
an apology.
        -Jay Leno



Sun, 03 Mar 2002 03:00:00 GMT  
 MT Developers Needed for Brainchild1

Quote:

>I suspect that what he'll find is that he may get some nice
>monolingual systems, or systems that handle a handful of NLs to some
>extent,

Even that would be worth 10 million Euros to the European Commission!
--
Regards, John Woodgate, OOO - Own Opinions Only.
Phone +44 (0)1268 747839 Fax +44 (0)1268 777124.
http://www/jmwa.demon.co.uk Did you hear about
the hungry genetic engineer who made a pig of himself?
PLEASE DO ****NOT**** MAIL COPIES OF NEWSGROUP POSTS TO ME!!!!


Sun, 03 Mar 2002 03:00:00 GMT  
 MT Developers Needed for Brainchild1


Quote:
> If you propose to have a separate field for each verb, that is
> a solution, but inelegant in the way that it misses the generalization
> that *almost* works. It also means that acquisition is more difficult.

<snip>

For some reason you are assuming that every instance has to be separately
listed. You simply have to list the rule and then the exceptions to the
rule. If there are a lot of exceptions to the rule, then you make a rule that
covers most of the exceptions and then list the exceptions to that rule etc.

We have a similar rule in English spelling: I before E except after C.
You have a rule and a rule for exceptions and then exceptions to the rule
such as weird and either.
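
A rough sketch of that rule-then-exceptions cascade, using the spelling example. The function name and the choice to store exceptions in a dictionary are assumptions, and the rule itself is only the schoolbook version, so it will mispredict words that are not listed.

```python
import re

# Individually listed exceptions, checked before any rule (the two from the post).
EXCEPTIONS = {"weird": "ei", "either": "ei"}


def predict_pair(word):
    """Predict whether a word is spelled with 'ie' or 'ei'.

    Order of checks: listed exceptions, then the 'after C' rule,
    then the default 'I before E' rule.
    """
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    match = re.search(r"ie|ei", word)
    if match is None:
        return None  # word contains neither pair
    after_c = match.start() > 0 and word[match.start() - 1] == "c"
    return "ei" if after_c else "ie"


print(predict_pair("believe"))  # ie
print(predict_pair("receive"))  # ei
print(predict_pair("weird"))    # ei
```

Only the exceptions need one entry each; everything else is covered by two short rules.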

I believe that is the way we learn most things. We generate rules and their
exceptions, otherwise we would have an awful lot to remember.

The same solution works for most proverbs and figures of speech. Usually the
proverb can be translated into a thought after some thinking, but once
translated, we just remember what the proverb means without translating it
again.

The computer doesn't have to do the original translation. A human can do it
once, and then the computer has a database of all the difficult to translate
expressions and translates those expressions as one phrase.
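
One way such a database could be consulted is a longest-match lookup that runs before word-by-word translation. This is only a sketch: the table reuses the literal glosses of the Mandarin expressions quoted earlier in the thread, and `translate` and its `word_fn` fallback are hypothetical names.

```python
# Human-prepared table: literal word sequence -> meaning of the whole phrase.
PHRASES = {
    ("five", "lake", "four", "sea"): "everywhere in China",
    ("east", "south", "west", "north"): "in every direction",
    ("three", "heart", "two", "meaning"): "indecisive",
}


def translate(tokens, phrases, word_fn):
    """Translate tokens, preferring the longest stored phrase at each position.

    word_fn is the fallback used for any single word not covered by a phrase.
    """
    max_len = max((len(key) for key in phrases), default=0)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in phrases:
                out.append(phrases[key])
                i += n
                break
        else:
            out.append(word_fn(tokens[i]))
            i += 1
    return out


# str.upper stands in for a real word-level translator.
print(translate("he search five lake four sea".split(), PHRASES, str.upper))
# ['HE', 'SEARCH', 'everywhere in China']
```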

The problem of translating words like saw which can mean different things
can be addressed differently.

I've found when translating that a single word has multiple meanings and it
is unclear which meaning is the intended one. I then try to find a word in
the language I'm translating to that has the same properties.

You will find that words that have more than one meaning in one language
usually have a word in another language that also has a similar group of
meanings. The other word might produce slightly awkward sentences, but it
will give the reader the chance to make the choice of what the meaning is.

It is not the job of the translator to understand everything being
translated so long as all the possible meanings are conveyed in the
translated text.

I think it is relatively simple to list all the possible meanings of a word
and try to find a corresponding word that has most of those meanings.

For example: "straw" means the lower part of a stalk of wheat and also a thin
tube used for drinking. In modern Hebrew, "kash" also has both of those
meanings.
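
The matching step could be sketched as a sense-overlap comparison. The sense labels and the second candidate word are made up for illustration; only the straw/kash pair comes from the example above, and Jaccard overlap is just one plausible scoring choice.

```python
# Senses of the source word, as plain labels (labels are assumptions).
SOURCE_SENSES = {"straw": {"wheat stalk", "drinking tube"}}

# Candidate target words with their senses.
TARGET_SENSES = {
    "kash": {"wheat stalk", "drinking tube"},   # from the example above
    "hypothetical_word": {"wheat stalk"},       # invented, covers one sense only
}


def overlap(a, b):
    """Jaccard overlap: shared senses divided by all senses of either word."""
    return len(a & b) / len(a | b)


def best_match(word):
    """Pick the target word whose set of senses best matches the source word."""
    senses = SOURCE_SENSES[word]
    return max(TARGET_SENSES, key=lambda t: overlap(senses, TARGET_SENSES[t]))


print(best_match("straw"))  # kash
```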



Mon, 04 Mar 2002 03:00:00 GMT  
 MT Developers Needed for Brainchild1
From: "Shmuel Weidberg"

Quote:
> For some reason you are assuming that every instance has to be
> separately listed.

        No. I put "if" before that clause. I was answering in reply to
a specific proposal.

        With two languages (just German and English, say), it is
already not *that* easy. Is "to reside in" transitive or stative? I'm
not sure if it is an exception in German, or in Italian... but
that's not the hard problem.
        The hard problem is for a system that seeks to encode the
semantics of a sentence so that it's ready for transfer to *any*
language. Really what you need is what you proposed, for every
language, with the inter-language level not helping a whole lot. All
the work happens in the second phase. So a claim that the first phase
has been (nearly) solved doesn't amount to much.
        -JAR
--
I do everything for a reason. Most of the time, the reason is money.
        -Suzy Parker



Mon, 04 Mar 2002 03:00:00 GMT  
 The most promising work in NLP ?
Hi,

I'm working on a book about Microsoft Research, which will be published by
Broadway Books (--an imprint of Random House).  One of the key areas we are
investigating is natural language processing and I would love to hear your
suggestions about who is doing some of the most interesting work in NLP.

thanks - don



Tue, 12 Mar 2002 03:00:00 GMT  
 The most promising work in NLP ?

Quote:
> Hi,

> I'm working on a book about Microsoft Research, which will
> be published by Broadway Books (--an imprint of Random
> House).  One of the key areas we are investigating is natural
> language processing and I would love to hear your suggestions
> about who is doing some of the most interesting work in NLP.

OK, everybody, describe your NLP/MT/AI project here.  Chaumont?
Noam? Roger? Terry? Doug?

http://www.scn.org/~mentifex/aisource.html is the PD AI NLP project
I worked on for a few hours more today until fatigue set in and I
switched to Usenet relaxation.  Although this work is being done
here in Seattle across Lake Washington from Microsoft, it is in
no way connected with Microsoft Research (but free for them to use).

Quote:
> thanks - don


good luck - arthur



Tue, 12 Mar 2002 03:00:00 GMT  
 The most promising work in NLP ?

Quote:

> Hi,

> I'm working on a book about Microsoft Research, which will be published by
> Broadway Books (--an imprint of Random House).  One of the key areas we are
> investigating is natural language processing and I would love to hear your
> suggestions about who is doing some of the most interesting work in NLP.

> thanks - don


Well, it wouldn't obviously be Microsoft Research.

-- Gary Merrill



Tue, 12 Mar 2002 03:00:00 GMT  
 