
MT Developers Needed for Brainchild1
Quote:
>> As a completely different example, but one that this reminds
>> me of... many languages use both a verb like "to be" and a verb like
>> "to have" as the auxiliary for forming the perfect, depending,
>> generally, on whether the verb is stative or transitive.
> Well, this is not so tough; it just requires a lot of work to encode
> the preferences of each verb. I don't know Italian (but I do know
> Spanish), and I do know Dutch and German, and I don't think this is
> one of the worst problems. The problem is "just" one of different
> meanings (and how to distinguish them!).
Spanish, of course, does not have this. English used to, but
doesn't anymore.
If you propose to have a separate field for each verb, that is
a solution, but an inelegant one: it misses the generalization
that *almost* works. It also makes acquisition more difficult.
What I think is REALLY interesting about the example is that
if you only knew German OR Italian, you might think it was really
easy: transitives use one auxiliary, and statives/intransitives use
the other. So (the English-and-German speaker ignorant of Italian
would think) when you see an English (say) verb, you could mark it
right away for which auxiliary it would need in German. And then when
you got a passing glance at Italian, you could think, "Oh -- same as
German; no big deal." You might suspect that out of 5000 languages,
maybe 1000 will do this, but that they'll all do it the same way. As
it turns out, Italian is SLIGHTLY different from German. It's a sure
bet that other languages do it differently still. So then what? Do you encode
every ENGLISH verb with the marker for the German auxiliary AND the Italian
auxiliary? I hope the can of worms this opens is obvious. If
Chaumont's panel of experts finds 500 languages that do this, and
there are 100 different ways of doing it, then you end up with 100
markers on every verb you translate into Interlinguish -- just for
this one property! Imagine how case and gender will blow up on you as
well! And, worse still, noun classifiers.
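To put rough numbers on the worry, here is a small Python sketch. It is hypothetical throughout: the feature names (`transitive`, `reflexive`), the two rules `scheme_a` and `scheme_b`, the helper names, and the figure of 5000 verbs are made up for illustration. The rules are not meant as accurate grammars of German, Italian, or any other language. What the sketch shows is that once two schemes diverge, a single transitivity feature no longer predicts the auxiliary. Each entry then ends up carrying one resolved marker per scheme, and the total grows as verbs times schemes.

```python
from typing import Callable, Dict

# Hypothetical sketch only: scheme_a and scheme_b are made-up stand-ins for
# two languages' auxiliary-selection rules, not grammars of any real language.
Features = Dict[str, bool]
Scheme = Callable[[Features], str]


def scheme_a(f: Features) -> str:
    # The generalization that almost works: transitives take HAVE, the rest take BE.
    return "HAVE" if f["transitive"] else "BE"


def scheme_b(f: Features) -> str:
    # A "slightly different" variant: reflexives also take BE.
    return "BE" if f["reflexive"] or not f["transitive"] else "HAVE"


def tag_verb(lemma: str, f: Features, schemes: Dict[str, Scheme]) -> dict:
    """Build an interlingua-style entry carrying one auxiliary marker per scheme."""
    return {"lemma": lemma, "aux": {name: rule(f) for name, rule in schemes.items()}}


def total_markers(num_verbs: int, num_schemes: int) -> int:
    # Every verb carries every scheme's marker, so the lexicon grows multiplicatively.
    return num_verbs * num_schemes


entry = tag_verb("wash", {"transitive": True, "reflexive": True},
                 {"A": scheme_a, "B": scheme_b})
print(entry)                     # {'lemma': 'wash', 'aux': {'A': 'HAVE', 'B': 'BE'}}
print(total_markers(5000, 100))  # 500000
```

With only two schemes, the same verb already needs two different markers. Adding the 100 hypothetical schemes from the paragraph above to a 5000-verb lexicon gives half a million markers for this one property alone.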
If I understand correctly, Chaumont's notion of translating a
sentence of English to Interlinguish is that every noun (to pick an
example) would be tagged with the information needed to apply a noun
classifier, later, in Chinese. What I am saying is that this is
essentially impossible, if you want to make it work also for every
language with classifiers. An English sentence would take over 100K in
Interlinguish. If Interlinguish does NOT try to provide semantic
information of this sort, and post-processing handles things like
this, then there's little accomplishment (from an MT perspective) in
taking the sentence out of English ASCII in the first place.
I suspect he'll find that he can get some nice
monolingual systems, or systems that handle a handful of NLs to some
extent, but an Interlinguish that captures the meaning of a sentence
as any language in the world might want to represent it is chimerical,
because each sentence would turn into a ridiculously large output.
Quote:
>> You're right that there is a particular "joke" mode of
>> language, just as in Mandarin, there are four-word proverbs that have a
>> unique status in that language.
> Really?
Yes. Maybe I should have said "expression" rather than "proverb",
but some of them are proverbs, too.
"five lake four sea" -- everywhere in China
"east south west north" -- in every direction
"three heart two meaning" -- [subject is] indecisive
-JAR
--
If God doesn't destroy Hollywood Boulevard, he owes Sodom and Gomorrah
an apology.
-Jay Leno