MT Developers Needed for Brainchild1 

Quote:

>>         1) In general, I think there are some extraordinarily subtle
>> points in semantics; most of my examples key on verb inflections.
>In my opinion, there are no subtle points in semantics other than
>pragmatics, i.e. the enormous amount of knowledge language users
>acquire over the years of the way expressions are used in non-literal
>ways. This is nearly impossible to mimic. For this, a model of language
>acquisition is needed and years of training.

This is the wrong problem to solve.

Quote:
>> I don't think ANYONE has a good theory of all of these yet.
>I don't even think anyone has a good theory of syntax yet. Let alone
>the rest.

Likewise, another wrong problem.

Quote:
>>         2) Some utterances are inherently ambiguous when taken alone,
>> but context can disambiguate which of multiple meanings was
>> intended.
>Much too hard.

Yes.

Quote:
>> This requires full-blown reasoning, which I'm quite sure is
>> not a solved problem.
>Not only that, it requires "reasoning" about illogical things (like
>pragmatics and common world knowledge) and huge amounts of them too.
>This is not the sort of reasoning computers are particularly good at.
>No, MT still has a long way to go...

These are the arguments of people who don't want a solution.  "Human
machines can never fly because we can't grow metal feathers and also
it would be far too hard to get machines to flap their wings precisely
the way birds do.  Besides, to really build a mechanical bird you have
to first solve the problem of machine self-reproduction, you need a
machine that can lay eggs and hatch them into new flying machines.  And
you can't solve that without a general solution to the chicken-and-egg
problem, you'd have to figure out which comes first."

To get computers to transcribe speech -- not translation, just listen
to words and write them down -- we need at a minimum to revise our
alphabet.  There is no justification for English spelling and it's
ridiculous to get machines to try to cater to it.  Spell-checkers are
an abomination, they're utterly regressive.

Similarly for pronunciation.  To get a computer to read a text we
should revise both the alphabet and also the language, we should write
texts that computers can pronounce adequately.  You could argue that
computer speech will fail until computer programs can read text and
understand the proper emotions to go with it and adjust their tone
appropriately.  But that isn't the point.

If it's that way for transcription and speech, even more so for
translation.  For each language we need a simplified grammar that
people will still understand, and when you create a text intended
for translation you will use your human skills to develop text
that's easy to translate.  You'll avoid weird idioms and concepts
that are hard to handle.  And part of how you'll get those skills
is from feedback from the people who read the translations.

We can get usable MT without needing the subtleties of a human
who's highly literate in two languages.  You needn't expect MT to
do a great job translating Poe or Pushkin.  But MT can be usable
without that, it's plausible that it can allow people to make
useful statements that can be effectively translated into a hundred
languages with very little trouble.  Statements for example like
"This is an emergency.  Clear the streets.  Curfew is from dusk to
dawn until further notice.  Essential services will be restored as
soon as possible.  Provisional self-government will be established
after the cessation of hostilities."  Certainly some simplified
version of this should be easy.  Something like "Beware!  Go hide
now.  Hide every night.  If we find you outside at night we will
kill you.  We will get your electricity and water and sewers
working again when we can.  After all of you stop fighting us we
will go away.  We will not come back unless you do something we
don't like."

There could be amusing flaws in the translation but it ought to
be quite possible to write so that the gist of the meaning comes
through.



Sat, 02 Mar 2002 03:00:00 GMT  

Quote:

> >This is nearly impossible to mimic. For this, a model of language
> >acquisition is needed and years of training.
> This is the wrong problem to solve.

Excuse me, but *what* exactly is the wrong problem to solve? Modelling
language acquisition? I cannot see any other solution to full MT than
having a learning model.

Quote:
> >I don't even think anyone has a good theory of syntax yet. Let alone
> >the rest.
> Likewise, another wrong problem.

And what's the wrong problem here? Syntax???

Quote:
> These are the arguments of people who don't want a solution.

It is time to say: this argument s*cks. As a matter of fact, I have
worked on an MT system and since then have worked on grammar-based spell
checking (harder than you might think) and a model of human language
competence based on the known psycho-linguistic data. I would very much
like to see a solution.

Quote:
> To get computers to transcribe speech -- not translation, just listen
> to words and write them down -- we need at a minimum to revise our
> alphabet.  There is no justification for English spelling and it's
> ridiculous to get machines to try to cater to it.  Spell-checkers are
> an abomination, they're utterly regressive.

Your solution to not being able to fly is to learn birds to walk???

Quote:
> Similarly for pronunciation.  To get a computer to read a text we
> should revise both the alphabet and also the language, we should write
> texts that computers can pronounce adequately.

I am sorry. I missed your point. You are simply being ironic.

Haha.

Quote:
> We can get usable MT without needing the subtleties of a human
> who's highly literate in two languages.

Of course, pattern recognition and statistics will always do the trick.
But they're pretty boring. Imagine a statistical approach to physics. We
would have the certain knowledge that 99.97% of all objects will fall
whenever you release them without anybody knowing anything about
gravity.

Quote:
> "This is an emergency.  Clear the streets.  Curfew is from dusk to
> dawn until further notice.  Essential services will be restored as
> soon as possible.  Provisional self-government will be established
> after the cessation of hostilities."

Yeah right. I cannot imagine that whatever language they speak in East
Timor includes a correct translation of "curfew".

Quote:
> There could be amusing flaws in the translation but it ought to
> be quite possible to write so that the gist of the meaning comes
> through.

Amusing? They'd be killing in your example.

   Theo



Sat, 02 Mar 2002 03:00:00 GMT  
From: Theo Vosse

Quote:
>> Utterance 1: "I was walking to school, when I saw a black cat."
>> Utterance 2: "I was walking to school, when I was a boy."

> Sorry, but the solution to this is well-known.

        Really? For MT, or just for having someone eyeball it and say
how the two are different? I'd appreciate a pointer. I think a general
approach that handles English->Russian and English->Navajo would be
hard.

        Let me add:

Utterance 3: "I was walking to school, when I saw Nadine."

        Here, "saw" could mean "caught sight of" (perfect) or "used to
date" (imperfect). I can't believe there's a good solution to this,
because the aspect of the second verb isn't always easy to come by.

        As a completely different example, but one that this reminds
me of... many languages use both a verb like "to be" and a verb like
"to have" as the auxiliary for forming the perfect, depending,
generally, on whether the verb is stative or transitive. Italian and
German are two examples. BUT, Italian and German, while handling it
the same way 99% of the time, do it differently for a few verbs. (I
believe "abitare" and "wohnen" ... "to reside" are an example.)
        You have a situation where there is ALMOST one rule for choice
of auxiliary, but that's obviously not quite enough. As with many
things in language, a rule won't handle it, and a table of which
auxiliary each verb takes misses the generalization that any person
would latch onto (for 99% of cases).
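The "almost one rule, plus a few exceptions" shape can be sketched in code. This is only an illustration under loud assumptions: the `motion_or_change` flag is a crude stand-in for the stative/transitive distinction above, the names `DEFAULTS`, `EXCEPTIONS` and `perfect_auxiliary` are made up, and the exception shown is "piacere" (essere) against German "gefallen" (haben) rather than the abitare/wohnen pair, which I am less sure of.

```python
# General rule per language: (auxiliary for motion/change-of-state verbs,
# auxiliary for everything else). A simplification, not a full grammar.
DEFAULTS = {
    "it": ("essere", "avere"),
    "de": ("sein", "haben"),
}

# Small per-verb table that overrides the general rule.
# Illustrative entry only: Italian "piacere" takes "essere" although it is
# not a motion verb, while German "gefallen" follows the default ("haben").
EXCEPTIONS = {
    ("it", "piacere"): "essere",
}


def perfect_auxiliary(lang, verb, motion_or_change):
    """Pick the perfect auxiliary: exception table first, then the rule."""
    if (lang, verb) in EXCEPTIONS:
        return EXCEPTIONS[(lang, verb)]
    be_aux, have_aux = DEFAULTS[lang]
    return be_aux if motion_or_change else have_aux


print(perfect_auxiliary("it", "andare", True))
print(perfect_auxiliary("it", "piacere", False))
print(perfect_auxiliary("de", "gefallen", False))
```

The rule carries the generalization and the table stays small, which is roughly what a person seems to do; the hard part, deciding the flag for each verb, is left out here.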

Quote:
>> Utterance 1: "A guy walks into a bar."
>> Utterance 2: "A guy walked into a bar."

> Translation of these things is very complex too. You need specific
> knowledge about the construction of jokes (just any plain narrative
> will not suffice). So, first you will need to be able to identify
> this as a joke and then to have rules for translating it
> properly. This is simply too much work for the moment.

        You're right that there is a particular "joke" mode of
language, just as in Mandarin, there are four-word proverbs that have a
unique status in that language. But even as straight narrative, the
tense problem exists, and the example of the future that someone else
threw out is another. Tense in the second clause of sentences with
intensionality is just as tricky.

        Day 1 - "I will show up." (future)
        Day 2 - "I told him that I would show up." (conditional)
        Day 1 - "Arriverò." (future)
        Day 2 - "L'ho detto che arriverò." (future)

        In Italian, the tense of the second verb is the same as the
tense one would use in the original statement. In English, it is not.

[onomatopoeia]

Quote:
> Hey, don't make it too hard for the poor guy. Well, once the problem
> of general translation is solved, this will follow automatically
> (this is not a joke; I am completely serious; I think general
> translation requires implementation of such a high level of
> intelligence that this will be implemented as a consequence).

        Not quite, because I think someone deaf could (not to say this
is true of everyone deaf) be quite adroit with the language and
utterly fail to comprehend onomatopoeia. It's sort of a separable
module, in some way, and I think there are others.

        -JAR
--
Man is quite ready to die for an idea, provided that idea is not quite
clear to him.
        -Paul Eldridge



Sat, 02 Mar 2002 03:00:00 GMT  

Quote:


[snip]

> Anyway, those interested in Machine Translation, and some of the
> amusing errors that commercial systems make, may like to check out
> my web page http://ciips.ee.uwa.edu.au/~hutch/hal/Wacky.phtml.
> It interfaces to AltaVista's Babelfish (and, within the next hour or
> so, a second online translation engine), and automatically translates
> from English, to another language, and back again, which allows
> monolingual people like myself to evaluate performance (although the
> English to English translations will be twice as bad, one would think).
> The web page also has a feature whereby you can add amusing translation
> to an ever-increasing archive.

> I may add support for languages other than English if there is enough
> demand.  And I would appreciate any URL's of other online translation
> engines, so I can add them to the page.

> +=-- -- =--  =-== === --=  ---- === =- - --- = -=-- =-==  = ---- -- =- =-= +
> |  Mr Jason Lloyd Hutchens, PhD Student and Procrastinator Extraordinaire  |
> | TMBG/IF/MAME/MB/BEOS/PSX/AMIGA/MIDI  Me/Research/Spy/MegaHAL/Humour/More |

> |  Unsolicited email advertising is treated with the contempt it deserves  |
> +=-== === --= -====- =--  =--- -  - =- -=-- -- ==- ---- = - =- - =-- --==--+

It is said that an early attempt at machine translation to and from
Russian converted "Out of sight, out of mind." to "Blind and crazy."
Here are the results from the URL above.

Italian: From sight, the mind.
French: Out of the sight, of the spirit.
German: From sight from understanding out.
Portuguese: It are of the sight, it are of the mind.
Spanish: Outside Vista, the mind.

That is, of course, an especially difficult locution. A sentence more
appropriate to the subject is "Don't count your chickens before they
hatch."

Italian: Conteggio di Don' t yours polli before that they brood.
French: Count of Don' T your chickens before they chop.
German: Counting pulse Don't your chickens, before they out-breed.
Portuguese: Tally of Don't its hens before to shock.
Spanish: Account of Don't its chickens before they plot.

The contraction seems troublesome. Let's try again: "Do not count your
chickens before they hatch."

Italian: Not to count yours before polli that they brood.
French: Do not count your chickens before they chop.
German: Do not count your chickens, before they out-breed.
Portuguese: It does not count its hens before they shock.
Spanish: It does not count its chickens before they plot.

We have a long way to go.
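
Since the contraction alone wrecked the first attempt, one cheap workaround is to expand contractions before the text ever reaches the translation engine. A minimal sketch, assuming a small hand-written table; the table contents and the name `expand_contractions` are illustrative and not part of Babelfish or any other engine:

```python
import re

# Hypothetical, deliberately tiny table; a real one would be much longer
# and would have to deal with ambiguous forms such as "it's".
CONTRACTIONS = {
    "don't": "do not",
    "can't": "cannot",
    "won't": "will not",
    "isn't": "is not",
}

_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)


def expand_contractions(text):
    """Replace each known contraction with its full form."""
    def replace(match):
        found = match.group(0)
        expanded = CONTRACTIONS[found.lower()]
        # Keep a leading capital, e.g. at the start of a sentence.
        if found[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded

    return _PATTERN.sub(replace, text)


print(expand_contractions("Don't count your chickens before they hatch."))
```

This only removes one source of error. As the second set of results shows, "hatch" still comes back as "chop", "shock" or "plot".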

Jerry
--
Engineering is the art of making what you want from things you can get.
-----------------------------------------------------------------------



Sat, 02 Mar 2002 03:00:00 GMT  

Quote:


> > We also argue a lot about POSTPONE and such.

> Can you prepone something?
> --
> Regards, John Woodgate, OOO - Own Opinions Only.
> Phone +44 (0)1268 747839 Fax +44 (0)1268 777124.
> http://www/jmwa.demon.co.uk Did you hear about
> the hungry genetic engineer who made a pig of himself?
> PLEASE DO ****NOT**** MAIL COPIES OF NEWSGROUP POSTS TO ME!!!!

****** You ask for no mail, so I post this instead. ******

I like that question. It's something else to argue about! As for an
answer, I suppose that one can. The commonly used word is "anticipate".

Jerry
--
Engineering is the art of making what you want from things you can get.
-----------------------------------------------------------------------



Sat, 02 Mar 2002 03:00:00 GMT  

Quote:



> >In narrative, the choice of present or past can be largely arbitrary,
> >with no clear difference in denotation. In other contexts, past and
> >present are obviously not interchangeable!

> 'I'm sure everyone here will understand that.'

> No, is that really a valid future tense in English? I don't think so. It
> is a stylistic embellishment of a 'present tense' idea: 'I'm sure
> everyone here understands that.'. Yet I see it translated by real
> translators into (what I assume is) a real future tense in French and
> German. How could a poor, benighted machine cope?
> --
> Regards, John Woodgate, OOO - Own Opinions Only.
> Phone +44 (0)1268 747839 Fax +44 (0)1268 777124.
> http://www/jmwa.demon.co.uk Did you hear about
> the hungry genetic engineer who made a pig of himself?
> PLEASE DO ****NOT**** MAIL COPIES OF NEWSGROUP POSTS TO ME!!!!

Well, John, I post again instead of mailing. Your analysis is sound but
not exhaustive. It could be an elliptical form of 'I'm sure everyone here
will understand that when they read it.' My hunch favors ellipsis. The
translation difficulties you mention are not eased thereby.

Jerry

P.S. The future tense in German is achieved with modal auxiliaries,
just as it is in English. There is another difficulty, though. Most
languages lack the distinction between "I talk too much." and "I am
talking too much." as a tense. They rely instead on context or extra
words.
Jerry
--
Engineering is the art of making what you want from things you can get.
-----------------------------------------------------------------------



Sat, 02 Mar 2002 03:00:00 GMT  
Preponement: you have discovered the dark secret at Mind.Forth's mind core.
Of course, the clue was there all along: 'machine translation is germinally
if not seminally...'. I wondered how germination could precede semination: I
would have thought that seed had to exist before it could sprout. But now
all is revealed. Thomistic philosophy accepts that if the cause is out of
time, then within time the effect may precede its cause. In this case,
Mind.Forth can develop through various releases (Mind.Fifth, Mind.Sixth,
Mind.N) until it is perfected, or until the end of time, whichever is the
sooner. Its introduction can then be preponed to the beginning of time, so
it will turn out that we've always had it. The principles of preponement are
clearly laid out in my seminal work 'Quantum Metaphysics' which I shall
write posthumously. Publication will then be preponed to shortly before my
demise so that I can answer my critics. N.B. please post abuse via the NG.
Abuse via e-mail tends to worry the staff.

James Lee

Quote:





>>>> We also argue a lot about POSTPONE and such.

>>>Can you prepone something?

>>Constantly -- that's one of the bedrocks of Forth development.  There's no
>>standard word PREPONE, though; we tend to write application-specific
>>versions.

>Ah, now I see the context. I'm not sure that this is on-topic at all in
>sci.lang.translation. 'Preponing' was quite common in the days of slow
>8-bit computers like the BBC Micro, especially the generation and
>storage of look-up tables at the beginning of a program run, to avoid
>having to do complex calculations several times during the run.
>--
>Regards, John Woodgate, OOO - Own Opinions Only.
>Phone +44 (0)1268 747839 Fax +44 (0)1268 777124.
>http://www/jmwa.demon.co.uk Did you hear about
>the hungry genetic engineer who made a pig of himself?
>PLEASE DO ****NOT**** MAIL COPIES OF NEWSGROUP POSTS TO ME!!!!
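
The look-up-table kind of "preponing" described in the quoted text can be sketched as follows. It is shown in Python rather than Forth or BBC BASIC purely for illustration; the table size, the whole-degree resolution and the name `fast_sin` are assumptions, not anything taken from the posts.

```python
import math

# Built once, at the start of the run: the sine of every whole degree.
SINE_TABLE = [math.sin(math.radians(degree)) for degree in range(360)]


def fast_sin(degrees):
    """Look the value up instead of recomputing it on every call."""
    return SINE_TABLE[degrees % 360]


print(fast_sin(30))
```

The cost is paid up front and in memory, which was the trade worth making on a slow 8-bit machine.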



Sat, 02 Mar 2002 03:00:00 GMT  
I should like to put on record that I am not now, nor ever have been, an
American sophomore. The word always makes me think of some structure
involved in the sex life of marine molluscs (U.S.: mollusks). However, I
understand that it is, in fact, the larval form of a sophist.

James Lee



Sat, 02 Mar 2002 03:00:00 GMT  
As an experiment, I casually dropped the terms 'Weichwaren' and
'Geisteskerngehaeuse' in a discussion with some German engineers this
afternoon. Incidentally, I do happen to speak German. The anticipated result
was that (1) the terms would be queried, (2) I would be informed politely
(or otherwise - they are good friends) that such terms do not exist in
German or (3) - but of course less likely - people might laugh. The actual
result was extraordinary. The remarks vanished without trace. What I mean
is, that if one chanced to break wind, people might ignore it politely, but
one would know that they were ignoring it. In this case, the reaction was
zero. It confirms my suspicion that one only perceives what one expects to
perceive. The rest is edited out as noise.

James Lee



Sat, 02 Mar 2002 03:00:00 GMT  
But seriously now. The parody I posted of the Mind.Forth ad. text was
intended to demonstrate the impracticality of word-by-word translation. The
words really were the first words that came up. The only edit I made was to
substitute 'Ohr' for 'Aehre' as the translation of 'ear', on the grounds
that 'Aehre' was not only out of context, but also rather corny. Artificial
insemination really was the first choice for AI and, of course, with
'seminally' in the same sentence, it appeared to be in context. However, I
note that Acronym.com has 590 listings for these initials.
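
The failure mode is easy to reproduce. A minimal sketch of "take the first word that comes up", assuming a toy lexicon whose entries and ordering are invented for illustration (the German is spelled in plain ASCII, as in the post):

```python
# Hypothetical mini-lexicon: each English word maps to its listed senses,
# in dictionary order. The first entry is not necessarily the right one.
LEXICON = {
    "ear": ["Aehre", "Ohr"],           # ear of grain, then the organ
    "core": ["Kerngehaeuse", "Kern"],  # core of a fruit, then the general word
    "mind": ["Geist", "Verstand"],
}


def word_by_word(text, lexicon=LEXICON):
    """Translate each word in isolation by taking its first listed sense."""
    translated = []
    for word in text.lower().split():
        senses = lexicon.get(word)
        # Unknown words are passed through unchanged.
        translated.append(senses[0] if senses else word)
    return " ".join(translated)


print(word_by_word("mind core"))
print(word_by_word("ear"))
```

Nothing here looks at the neighbouring words, so context can never choose 'Ohr' over 'Aehre'; that is the whole problem.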

I have, in fact, been involved in preparing texts for machine translation.
My guidelines then, as now, were: (1) use simple words, each with one
clearly defined meaning; (2) use simple grammar, with clearly defined rules;
(3) as far as possible, translate complete sentences rather than individual
words. Context, as always, is everything. In the meantime, these have been
amplified and, to some extent, superseded, by 'restricted languages'. I also
have several rules for texts for publication. One of these is: with the
exception of initials or abbreviations that are extremely well-known to the
expected readership, always write out the terms represented by initials or
acronyms in full the first time they occur [e.g. Endoscopic Retrograde
Pancreatography (ERCP)]. The poor translator is not necessarily as familiar
with the terms as the writer, and can be stymied, if not flummoxed, by
incomprehensible initials.
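
The acronym rule lends itself to a mechanical check before a text goes out. A rough sketch, assuming that "written out in full" means the first occurrence appears in parentheses straight after its expansion; the name `undefined_acronyms` and the all-capitals heuristic are my own assumptions:

```python
import re

# Heuristic: an acronym is a run of two or more capital letters.
ACRONYM = re.compile(r"\b[A-Z]{2,}\b")


def undefined_acronyms(text, well_known=frozenset()):
    """List acronyms whose first occurrence is not of the form '(XYZ)'."""
    seen = set()
    flagged = []
    for match in ACRONYM.finditer(text):
        acronym = match.group(0)
        if acronym in seen or acronym in well_known:
            continue
        seen.add(acronym)
        start, end = match.span()
        in_parens = text[start - 1:start] == "(" and text[end:end + 1] == ")"
        if not in_parens:
            flagged.append(acronym)
    return flagged


sample = "Endoscopic Retrograde Pancreatography (ERCP) was done. ERCP is common."
print(undefined_acronyms(sample))
print(undefined_acronyms("The MT system uses AI.", well_known={"MT"}))
```

It will also flag ordinary words typed in capitals, so the output is a list for a human to review, not a verdict.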

And, in the final analysis, the text should make sense.

The Mind.Forth ad. text fails to meet these criteria.

James Lee



Sat, 02 Mar 2002 03:00:00 GMT  
Email bounced. I hesitate to make a public spectacle of us both, but
here goes:

Quote:

  ...

> Your solution to not being able to fly is to learn birds to walk???

Translation problem corrected:
Your solution to not being able to fly is to teach birds to walk?

"You teach me, I learn from you."

This is a well known mistake, and was almost standard in some parts of
Brooklyn when I was young. Many of those speakers pronounced "ask"
"aks", and said "scrool" (euphemism?) to describe a wood screw (or
sometimes, a nail).

Quote:

  ...

>    Theo

Jerry
--
Ax me no questions; I'll tell you no lies.
------------------------------------------


Sat, 02 Mar 2002 03:00:00 GMT  
But, of course, I am not opposed to machine translation as such. Only a
foolish craftsman would scorn the use of machine tools for the routine work.
In fact, some of the results achieved are very impressive. However, I think
that the real breakthroughs can only be achieved by people who understand
the translation process. This is rather more than simply compiling
glossaries.

James Lee




Sat, 02 Mar 2002 03:00:00 GMT  

Quote:



>[snip]

>> Anyway, those interested in Machine Translation

>...a sentence more
>appropriate to the subject is "Don't count your chickens before they
>hatch."
>Jerry

Re: chickens. I suppose in Dutch one could say 'don't tell your chickens
anything in front of the trapdoor' but the real translation is 'don't sell
the hide before you've caught the bear'. I'd like to see a machine handle
that.

James Lee



Sat, 02 Mar 2002 03:00:00 GMT  

Quote:

> You're right that there is a particular "joke" mode of
> language, just as in Mandarin, there are four-word proverbs that have a
> unique status in that language.

and Theo Vosse replied:

Quote:
> Really?

Yep.  You'll find some more information not about "four word proverbs" but
about "four character proverbs" (Whorf: "there is no word for 'word' in
Chinese") in the interview between Sandra Celt and myself, included as "The
Challenge of Translating Chinese Medicine," accessible from the secondary
menu of my program "Truth About Translation," instantaneously downloadable
from:

            ftp://oak.oakland.edu/pub/ad-edu/trutra.zip

and thanks to you all, John, Theo, & Bryan, for your insightful
contributions to this somewhat regrettable debate.

                                              Gruss von Gross!
                                                         alex



Sat, 02 Mar 2002 03:00:00 GMT  

Quote:

>         Really? For MT, or just for having someone eyeball it and say
> how the two are different? I'd appreciate a pointer. I think a general
> approach that handles English->Russian and English->Navajo would be
> hard.

I misread your example. No, they're not necessarily easy. I just saw
two completely distinct syntactic structures and thought: what's the
problem. But now I see in some languages you might need different
translations. No, the problem is just as hard as any of the tense
translation problems.

Quote:
> Utterance 3: "I was walking to school, when I saw Nadine."

>         Here, "saw" could mean "caught sight of" (perfect) or "used to
> date" (imperfect). I can't believe there's a good solution to this,
> because the aspect of the second verb isn't always easy to come by.

I see this as pointing out a different problem, which was already
mentioned.

Quote:
>         As a completely different example, but one that this reminds
> me of... many languages use both a verb like "to be" and a verb like
> "to have" as the auxiliary for forming the perfect, depending,
> generally, on whether the verb is stative or transitive.

Well, this is not so tough, it just requires a lot of work to encode the
preferences of each verb. I don't know Italian (but I do know Spanish),
and I do know Dutch and German, and I don't think this is one of the
worst problems. The problem is "just" one of different meanings (and
how to distinguish them!).

Quote:
>         You're right that there is a particular "joke" mode of
> language, just as in Mandarin, there are four-word proverbs that have a
> unique status in that language.

Really?

Quote:
>         Not quite, because I think someone deaf could (not to say this
> is true of everyone deaf) be quite adroit with the language and
> utterly fail to comprehend onomatopoeia. It's sort of a separable
> module, in some way, and I think there are others.

Yeah, but a translation system cannot be deaf in this way: it needs to
have some knowledge of pronunciation, and translating pronunciation is
not a problem. The phonetics of most languages are sufficiently
documented.

   Theo



Sun, 03 Mar 2002 03:00:00 GMT  
 