Reviews for lisp implementations 


| It's good that it has been solved (well, I shouldn't say that when I
| don't know how).  I was never able to understand what made them use M-DEL
| for a printable character in the first place.

  ISO character sets come in 94-character and 96-character flavors, apart
  from ISO 10646.  the ISO 8859 family uses the ISO 4873 8-bit template,
  with a 94-character set in the left half and a 96-character set in the
  right half.

  in the 94-character set, 2/0 is SPACE and 7/15 is DELETE, both of which
  sort of dual as control and data characters.  in the 96-character set,
  2/0 and 7/15 are data characters.

  if you have a 94-character set and only 7 bits worth of data, the last
  bit is free to be used for other purposes, such as constant zero, parity,
  an application flag, or constant one.  most modern uses are constant zero
  and an application flag.  however, if you use an 8-bit character set, the
  only chance you have at using an application flag is with 10/0 and 15/15,
  in which case you'd probably want a non-breaking space and what IBM calls
  EO (eight ones), used as an "end of whatever" signal.  referring to 15/15
  as "M-DEL" regardless of whether it is a character or EO betrays a
  serious conceptual confusion about the usage of the code space.
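For readers unfamiliar with the column/row notation used above, here is a minimal sketch of how positions such as 2/0 or 15/15 map to byte values. The helper name `code_position` is made up for illustration, and decoding with Python's `latin-1` codec stands in for ISO 8859-1.

```python
def code_position(column: int, row: int) -> int:
    """Convert ISO column/row notation (e.g. 7/15) to a byte value."""
    if not (0 <= column <= 15 and 0 <= row <= 15):
        raise ValueError("column and row must each be in 0..15")
    return column * 16 + row


# The four positions discussed in the post.
positions = {"2/0": (2, 0), "7/15": (7, 15), "10/0": (10, 0), "15/15": (15, 15)}

for label, (column, row) in positions.items():
    value = code_position(column, row)
    # Show the byte value and the character ISO 8859-1 assigns to it.
    print(label, hex(value), repr(bytes([value]).decode("latin-1")))
```

In ISO 8859-1, 10/0 comes out as the no-break space and 15/15 as the small y with diaeresis, which is the character the thread goes on to argue about.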

  incidentally, there _is_ no upper-case version of ÿ, just as there is no
  upper-case version of ß.  pining for LATIN CAPITAL LETTER Y WITH DIAERESIS
  is like pining for LATIN CAPITAL LETTER SHARP S -- a symptom of a strong
  inability to deal with practical matters and to understand the sometimes
  _very_ erratic history of writing systems.

  not that Vassil or anyone here is particularly to blame for this, but the
  history of the æ, oe (not in 8859-1 because some French moron told ECMA
  it wasn't needed and shouldn't be there, and then we got × and ÷ stuck in
  the middle of the O's, only to have the smart French guy who designed
  this stuff return fully recuperated after some serious accident or other,
  only after the voting had completed, to demand an 8859 member with OE and
  oe -- which they got from ISO after a few years, but which nobody uses, not
  even the French[1]), and ÿ is one of diphthongs that merged over the course
  of centuries and then assumed phonemes of their own.  ae -> æ in Denmark
  and Norway are almost the same as ä in Sweden, but different from ä in
  Germany (and the decoration used to be different, too, until ECMA had
  enough of it).  the French oe has a long and arduous story I don't know
  in detail, but it's not unlike ö in Germany.

  now, ÿ is not a y with diaeresis at all.  it has more in common with et

  the Netherlands, it is pronounced like the English long I.  of course, as
  time goes by, various stupid people will do all kinds of stupid things,
  and in this case, we have the _reverse_ of what happened in France when
  some genius[2] decided that capital letters should not have accents because
  that was too hard to do with early typewriters and printers -- this has
  since been reversed when computers learned how to handle French.  so now
  that we have these nifty computerized thingamajigs, let's just forget
  that neither I nor J have dots on them, even though i and j do (despite
  the linguist[3] who decided that Turkish i and j should upcase to I and J
  with dots, but I and J should downcase to i and j without dots, which I
  think is at least part of the reason awful movies get Turkey awards), so
  the nifty computers should produce a _really_ historically moronic letter
  that nobody in their right mind would ever want to use.

  so, the single cluon in danger of being annihilated by swarms of morons
  upon contact is that just as ß is upcased to SS, ÿ is upcased to IJ.

[ this article was best viewed with an ISO 8859-1 capable font. ]

#:Erik
-------
[1] the moral of this story is either to keep the morons away from standards
    bodies or not to have serious accidents if you're the only smart guy in
    France.
[2] read: moron -- it wasn't the only smart guy in France alluded to above.
[3] another moron; wouldn't surprise me if he was French.
--
environmentalists are much too concerned with planet earth.  their geocentric
attitude prevents them from seeing the greater picture -- lots of planets are
much worse off than earth is.



Wed, 03 Oct 2001 03:00:00 GMT  
[ interesting thread, this ]

On 17 Apr 1999 17:23:24 +0000,

Erik> now, ÿ is not a y with diaeresis at all.  it has more in common with et

Being Dutch, I probably should have known or figured this out, but I didn't;
I always thought it was a Turkish letter.  I don't know who invented the
graphical form of this letter (ÿ), but it probably wasn't a Dutchman. In
actual practice, "ij", although one letter (actually, diphthong), is *always*
typed and typeset as an i followed by a j. As far as I'm concerned, I'd be
happy to cede this ascii value to more important purposes (capital sharp s?).
When upcased, both i and j have to be upcased (which is rare, but a good
example is 'IJsselmeer', the big watery hole in the middle of
Holland^H^H^H^H^H^H^H^HThe Netherlands). However, most dictionaries sort the
'ij' as two separate letters. Confusing, sort of.
                                                                      Philip
--
To accurately forge this signature, use a lucidatypewriter-medium-12 font
-----------------------------------------------------------------------------

+44 (0)1223 49 4639                 | Wellcome Trust Genome Campus, Hinxton
+44 (0)1223 49 4468 (fax)           | Cambridgeshire CB10 1SD,  GREAT BRITAIN
PGP fingerprint: E1 03 BF 80 94 61 B6 FC  50 3D 1F 64 40 75 FB 53



Wed, 03 Oct 2001 03:00:00 GMT  

Quote:

>   now, ÿ is not a y with diaeresis at all.  it has more in common with et

>   the Netherlands, it is pronounced like the English long I.  of course, as
>   time goes by, various stupid people will do all kinds of stupid things,

Except that in the Dutch speaking parts of Belgium and the
Netherlands, everybody writes it as ij. The confusion could have been
started because some morons (this time not even French) collated the
ij combination with the y, although modern dictionaries have stopped
this a long time ago. There is also some difference of opinion on how to
write an uppercase version of this. Some people use Ij but most,
especially in handwriting, will use a variant of uppercase Y with
diaeresis.

BTW: if Gordon's Introduction to Old Norse is accurate and can be
extrapolated to the modern variant, it's rather pronounced as the ei
diphthong in 'bein'.

--

If there are aliens, they play Go. -- Lasker



Wed, 03 Oct 2001 03:00:00 GMT  

| Correct me if I am wrong, but the above (quoted) paragraph does not
| contradict a statement that using 15/15 for a printable character is
| inappropriate.  Or did I miss anything?

  yes.  10/0 and 15/15 are characters when the right-hand side of an 8-bit
  character set (GR) is filled with a 96-character set.  (the other 32 are
  control characters (C1).)  if you had filled it with a 94-character set,
  it would have been inappropriate to use 15/15 at all.

  the reason for this is that 10/0 and 15/15 are characters in their own
  right and must be coded with 8 bits, but if you use a shifting coding
  with only 7 bits and codes to swap between G0 and G1 (both now in GL)
  with the codes SO and SI, then it's important that 2/0 and 7/15 remain
  their usual semi-control characters even when G1 is invoked.
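The point about 7-bit shifting can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `seven_to_eight` is invented here, and it assumes that a 94-character G1 set invoked with SO corresponds to the right half of the 8-bit table simply by setting the high bit. It does not try to cover ISO 2022 designation sequences.

```python
SO = 0x0E  # SHIFT OUT: invoke G1 into GL
SI = 0x0F  # SHIFT IN: invoke G0 into GL again


def seven_to_eight(data: bytes) -> bytes:
    """Rewrite a 7-bit SO/SI-shifted stream as its 8-bit equivalent."""
    out = bytearray()
    g1_invoked = False
    for byte in data:
        if byte > 0x7F:
            raise ValueError("input must be 7-bit data")
        if byte == SO:
            g1_invoked = True
        elif byte == SI:
            g1_invoked = False
        elif g1_invoked and 0x21 <= byte <= 0x7E:
            # 2/1..7/14 in G1 correspond to 10/1..15/14 in the right half.
            out.append(byte | 0x80)
        else:
            # Control characters, 2/0 (SPACE) and 7/15 (DELETE) keep their
            # usual meaning even while G1 is invoked.
            out.append(byte)
    return bytes(out)


shifted = bytes([0x61, SO, 0x69, 0x20, 0x7F, SI, 0x62])
print(seven_to_eight(shifted).hex(" "))
```

Note that no 7-bit input can produce 10/0 or 15/15 in this scheme, which matches the claim that those two positions only exist as characters when the data is coded with 8 bits.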

| I don't understand your point here.

  seems I was mistaken about the up/downcasing of I with/without dots.
  (shoot, gotta check and go back and fix those files for Emacs.)

| I wondered (as an academic exercise) what should CHAR-UPCASE and
| NSTRING-UPCASE do about LATIN SMALL LETTER Y WITH DIAERESIS (assuming
| STRING-UPCASE is allowed to return a longer string which isn't especially
| nice either).   Signal an error?  Or the implementation would state that
| the character sets it uses do not include this letter?  (Making
| CHAR-UPCASE return two values, like #\I and #\J in this case, appears
| more than perverse, though who knows.)
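As a point of comparison only (this is Python, not Common Lisp, and says nothing about what CHAR-UPCASE should do), the sketch below shows how one widely used string-upcasing routine behaves on the two characters in question. Python's `str.upper` follows the Unicode casing tables, so it can return a longer string. The helper `fits_latin1` is a made-up name for this example.

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character of text exists in ISO 8859-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


sharp_s = "\u00df"       # LATIN SMALL LETTER SHARP S
y_diaeresis = "\u00ff"   # LATIN SMALL LETTER Y WITH DIAERESIS

for ch in (sharp_s, y_diaeresis):
    upper = ch.upper()
    codes = " ".join(f"U+{ord(c):04X}" for c in upper)
    print(f"U+{ord(ch):04X} upcases to {codes}; "
          f"result fits ISO 8859-1: {fits_latin1(upper)}")
```

The sharp s becomes the two-character string "SS", while the y with diaeresis maps to a capital that lies outside ISO 8859-1, so an implementation limited to that character set cannot return it.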

  I have come to think that people who use sick writing systems should pay
  for their own mistakes so they will have reason to fix them.  forcing
  everybody else to pay for them only causes software not to be available.
  e.g., the Spanish purportedly undid the silly sorting requirements of ll
  (treated as a separate "letter" between k and l, I think it was) due to
  the force of simplicity and logic of computers (or was it marketing :).
  a German spelling reform (which people seem to hate rather strongly) does
  away with the sharp s and spells it "ss" in lowercase, too.  the Norwegian
  and Danish sillitude of sorting "aa" as equivalent to "å" (a ring), and
  the hysterical requirement that German spelled out with "ue" instead of
  "ü" should be sorted as if it wasn't spelled out are examples of morons
  who got into standards bodies.  (now, the right way to do this is to
  store a sort key and a print string, but since people don't use tools
  easily extendible that way, forcing stupid people to do this causes a lot
  of grief and problems when they try to print the sort key or vice versa.)
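The "store a sort key and a print string" idea might look like the sketch below. The `Entry` type and the sample names are hypothetical; the sort keys are assumed to have been chosen by whoever entered the data, and the code never derives one field from the other.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    sort_key: str      # what the sorting routine compares
    print_string: str  # what the reader sees


entries = [
    Entry("mueller", "M\u00fcller"),   # printed with u-umlaut
    Entry("mueller", "Mueller"),       # printed spelled out
    Entry("mudra", "Mudra"),
    Entry("muff", "Muff"),
]

# Sort on the key only; Python's sort is stable, so entries with equal
# keys keep their input order.
for entry in sorted(entries, key=lambda e: e.sort_key):
    print(entry.print_string)
```

The grief mentioned above shows up as soon as someone prints `sort_key` instead of `print_string`, or sorts on the print string, which is why the two fields are kept explicitly separate here.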

  anyway, let's just ignore the issue and ask them to spell it out as ij,
  like the Dutch correctly do.  (the ÿ is Belgian, _from_ Dutch ij.)  (I'm
  not sure upcasing "ij" to "IJ" is all that great an idea, although it is
  obvious if you look at fonts designed in or for The Netherlands: they
  sport "ij" and "IJ" ligatures, just as fonts designed for Norway have a
  ligature for "fj" just like "fi", because of "fjord" and "fjell".)

  anyway.  8 bits would have been enough if we had been using floating
  diacritics and upcasing and downcasing would have needed to worry about
  A-Z, only.  ISO tried that, too, (ISO 6937) but computer people were not
  able to appreciate it, because they were thinking fonts, not character
  sets.  sigh.
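A rough modern analogue of the floating-diacritics idea can be tried with Unicode combining marks. This is not ISO 6937 (whose encoding differs in detail); it is a sketch that decomposes text so that case conversion only has to touch a-z. The function name `upcase_base_letters` is invented for the example.

```python
import unicodedata


def upcase_base_letters(text: str) -> str:
    """Upcase text by converting only the unaccented letters a-z."""
    # Split accented letters into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFD", text)
    converted = []
    for ch in decomposed:
        if "a" <= ch <= "z":
            converted.append(chr(ord(ch) - 32))  # plain a-z to A-Z
        else:
            converted.append(ch)                 # marks and others untouched
    # Recompose so the result uses precomposed characters where they exist.
    return unicodedata.normalize("NFC", "".join(converted))


print(upcase_base_letters("\u00e9lan"))   # e with acute accent
print(upcase_base_letters("p\u00e5"))     # a with ring above
print(upcase_base_letters("\u00f8l"))     # o with stroke has no decomposition
```

Letters that Unicode does not decompose, such as the o with stroke in the last call, are left as they are, so the approach only covers characters that really are a base letter plus a diacritic.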

  if there's reincarnation, I hope I won't remember any of this the next
  time around.

#:Erik
--
environmentalists are much too concerned with planet earth.  their geocentric
attitude prevents them from seeing the greater picture -- lots of planets are
much worse off than earth is.



Thu, 04 Oct 2001 03:00:00 GMT  

* Erik Naggum
|
| now, ÿ is not a y with diaeresis at all.  it has more in common with et

* Philip Lijnzaad
|
| [...] In actual practice, "ij", although one letter (actually,
| diphthong), is *always* typed and typeset as an i followed by a j. As
| far as I'm concerned, I'd be happy to cede this ascii value to more
| important purposes (capital sharp s?)  When upcased, both i and j
| have to be upcased [...]. However, most dictionaries sort the 'ij'
| as two separate letters. Confusing, sort of.

Most? From what I've heard (from Dutch sources, BTW) IJ is sorted as a
separate letter after Z.  Can you elaborate on whether both happens or
whether I've been misinformed?

And if it's really sorted separately then I think it makes sense to
consider it a separate character, as Unicode more or less does
(although it calls it a ligature): U+0132 and U+0133.
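The code points mentioned here can be inspected with Python's standard `unicodedata` module. This is just a lookup sketch; the names printed are whatever the Unicode database bundled with the Python in use reports.

```python
import unicodedata

# U+00FF is the Latin-1 character under discussion; U+0132 and U+0133
# are the code points Unicode provides for Dutch IJ and ij.
for code_point in (0x00FF, 0x0132, 0x0133):
    ch = chr(code_point)
    print(f"U+{code_point:04X}", unicodedata.name(ch))

small_ij = "\u0133"
# Case conversion stays within the pair U+0133 / U+0132.
print(f"U+{ord(small_ij.upper()):04X}")
# Compatibility decomposition yields the plain two-letter sequence.
print(unicodedata.normalize("NFKD", small_ij))
```

The names still say "ligature" for U+0132 and U+0133, and the compatibility decomposition to the two separate letters i and j reflects the practice of simply typing them as two characters.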

--Lars M.



Thu, 04 Oct 2001 03:00:00 GMT  

| And if it's really sorted separately then I think it makes sense to
| consider it a separate character, as Unicode more or less does
| (although it calls it a ligature): U+0132 and U+0133.

  this is getting a bit far afield, but collation order, characterness, and
  glyphness are distinct properties of a writing system element.  for one
  thing, there is no _single_ correct collation order.  character sets do
  _not_ imply collation order.  characterness of a writing system element
  is a fairly fundamental concept and is strongly associated with meaning.
  glyphness of a writing system element is strongly associated with looks.
  finally, fonts are made up instantiations of glyphs.  e.g., a writing
  system element may exhibit so different meanings that they deserve to be
  separate characters, although this is very rare.  in general, there is
  also one glyph per character, although some have more (the German short
  and long s, the open and baggy a, the open and broken vertical line), but
  more frequent is a glyph for a sequence of characters (ligatures in Latin
  scripts, but includes vowels in Indic scripts and Hebrew) or a character
  in context (the connectives (single, initial, medial, final) in Arabic
  scripts), etc.  collation order is tightly coupled with character, but
  for hysterical raisins many languages collate sequences of characters as
  a single unit.  to represent all of this correctly, you need a whole
  bunch of tables.  there are therefore glyph set standards that are very
  separate from character set standards, and their mapping is non-trivial.
  there are huge tables of correct collation orders for different scripts
  and languages (French requires a five-level deep collation system in full
  name and dictionary sorting), and conflation of representation makes up
  most of it (e.g., no significance is attached to the ring in "Ångström"
  in an English dictionary, where it is sorted with Angst, but you'll find
  it at the end of a Norwegian one because Å is a separate character).
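To make the "no single correct collation order" point concrete, here is a deliberately simplified sketch with two sort keys for the same words. Both key functions are assumptions made for illustration: the English one just ignores diacritics, and the Norwegian one uses a 29-letter alphabet with the three extra letters after z. Real collation needs multi-level tables and is far more involved.

```python
import unicodedata

# a-z followed by ae-ligature, o-with-stroke and a-with-ring.
NORWEGIAN_ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00e6\u00f8\u00e5"


def strip_marks(text: str) -> str:
    """Remove combining marks, leaving the base letters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def english_key(word: str) -> str:
    # The ring and the diaeresis carry no significance for sorting.
    return strip_marks(word).casefold()


def norwegian_key(word: str) -> list:
    key = []
    for ch in unicodedata.normalize("NFC", word).lower():
        position = NORWEGIAN_ALPHABET.find(ch)
        if position < 0:
            # Not a Norwegian letter: fall back to its undecorated form.
            position = NORWEGIAN_ALPHABET.find(strip_marks(ch))
        key.append(position)
    return key


words = ["Zebra", "\u00c5ngstr\u00f6m", "Anker", "Angst"]
print(sorted(words, key=english_key))
print(sorted(words, key=norwegian_key))
```

With the English key the word with the ring lands right after "Angst"; with the Norwegian key it moves to the very end, after "Zebra", even though the character data is identical in both cases.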

  Unicode is a hybrid of a character and a glyph set.  the reason for this
  is fairly obvious when you consider its major proponents: Xerox and
  Microsoft.  Xerox makes printers and wanted a simple standard for which
  they could make huge fonts.  Microsoft are just too damn stupid to get it
  right or to respect any traditions.  (Xerox didn't want it to replace the
  first ISO 10646 draft, however, so they may be excused.)  in typical "is
  this a font or what?"-misunderstanding, æ was a ligature in Unicode, but
  I complained about it, so ISO 10646-1 has amended it to be a letter, and
  "ij" is a character, not a presentation form, which it should have been.

#:Erik
--
environmentalists are much too concerned with planet earth.  their geocentric
attitude prevents them from seeing the greater picture -- lots of planets are
much worse off than earth is.



Thu, 04 Oct 2001 03:00:00 GMT  
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Quote:
>  e.g., the Spanish purportedly undid the silly sorting requirements
>  of ll (treated as a separate "letter" between k and l, I think it
>  was) due to the force of simplicity and logic of computers (or was
>  it marketing :).

Between "l" and "m".

What is stupid, IMHO, is not the fact of having "ll" as a single
letter, but having it so, and the same with "ch" (between "c" and "d")
and then having "rr" as r+r and "qu" as q+u. The sound of most of
those characters is not related to their spelling ("ll" is not an l+l,
etc., and "q" is *never* used in isolation in Spanish, it is *always*
q+u, the only case in Spanish where "u" is mute) so in a coherent
world either "ch", "ll", "rr" and "qu" should each be treated as a
single entity, or none of them at all (perhaps the best solution).

Regarding the reform of the sorting requirement, the Spanish RAE
("Real Academia Española de la Lengua") did it, but I think some
Latin American academies objected and the issue was dropped. Not sure,
though.

                                                       /L/e/k/t/u

-----BEGIN PGP SIGNATURE-----
Version: PGPfreeware 6.0.2i

iQA/AwUBNxr0ev4C0a0jUw5YEQJRdQCfWI/MKMWEMIMt4a28s8WlrhWBlZwAn0Fp
+tn5lYZRhWnsoNfQMxuJ7fML
=n4bS
-----END PGP SIGNATURE-----



Fri, 05 Oct 2001 03:00:00 GMT  

* Erik Naggum
|
| now, ÿ is not a y with diaeresis at all.  it has more in common with et

* Philip Lijnzaad
|
| [...] In actual practice, "ij", although one letter (actually,
| diphthong), is *always* typed and typeset as an i followed by a j. As
| far as I'm concerned, I'd be happy to cede this ascii value to more
| important purposes (capital sharp s?)  When upcased, both i and j
| have to be upcased [...]. However, most dictionaries sort the 'ij'
| as two separate letters. Confusing, sort of.

* Lars Marius Garshol
|
| Most? From what I've heard (from Dutch sources, BTW) IJ is sorted as a
| separate letter after Z.  

Not that any of this has much to do with Lisp, but:

- U+00FF (LATIN SMALL LETTER Y DIAERESIS) is described in the Unicode
  standard as being French, not Dutch. This probably explains why
  Philip didn't recognize it as a Dutch letter. It also casts some
  doubt on Erik's explanation that it's "ij" written together.
  I suppose we have to wait for the French to tell us more about this
  (I read some French from time to time, but I don't recall ever
   having seen a ÿ.)

- The Unicode version of Dutch 'ij', which _is_ "ij" written together
  and is probably what Erik had in mind, is U+0133.  Its upper case
  equivalent is U+0132.  

- IJ is _never_ sorted as a separate letter after Z. Maybe, sometimes,
  it has been sorted as Y (between X and Z). Modern dictionaries sort
  it as I followed by J. So you have '("iets" "ijdel" "ijsje" "ik").

- When a Dutchman doesn't have a U+0133 handy (which is very likely),
  he just uses #\i followed by #\j. As in "ijsje". If this needs
  capitalizing, he'll use #\I followed by #\J. Capitalizing the
  above list would result in '("Iets" "IJdel" "IJsje" "Ik").
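The sorting and capitalizing rules in the list above can be sketched in a few lines. The function name `dutch_capitalize` is made up for this example, and it assumes that any word starting with the two letters i and j contains the Dutch digraph, which will not hold for every word or name.

```python
def dutch_capitalize(word: str) -> str:
    """Capitalize a word, upcasing a leading "ij" as a unit."""
    if word[:2].lower() == "ij":
        return "IJ" + word[2:]
    return word[:1].upper() + word[1:]


words = ["ik", "ijsje", "iets", "ijdel"]

# Plain character-by-character sorting already treats "ij" as i followed by j.
ordered = sorted(words)
print(ordered)
print([dutch_capitalize(word) for word in ordered])
```

Because the words are stored as ordinary i and j, the default sort gives the dictionary order from the post without any special handling; only capitalization needs to know about the digraph.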

* Lars Marius Garshol
|
| And if it's really sorted separately then I think it makes sense to
| consider it a separate character, as Unicode more or less does
| (although it calls it a ligature): U+0132 and U+0133.

For _capitalization_ it makes some sense to consider it a separate
character. But _sorting_ will be much more likely to go wrong
when you use a separate character.

Arthur Lemmens



Sat, 06 Oct 2001 03:00:00 GMT  

| Not that any of this has much to do with Lisp, but:
|
| - U+00FF (LATIN SMALL LETTER Y DIAERESIS) is described in the Unicode
|   standard as being French, not Dutch.

  I said _from_ Dutch "ij".  it's an _imported_ character.  it is used in a
  bunch of names in Belgium that historically had "ij" in their name.

|   It also casts some doubt on Erik's explanation that it's "ij" written
|   together.

  it does?  so the fact that æ is a Danish and Norwegian letter casts doubt
  on its history of being imported from Latin as its a+e ligature, too?
  appreciate that the history of writing systems is not a couple years old.

| - The Unicode version of Dutch 'ij', which _is_ "ij" written together
|   and is probably what Erik had in mind, is U+0133.

  I probably had in mind what I wrote.  so do other people.  please assume
  this next time you feel an overpowering urge to tell people what they
  think.

#:Erik



Sat, 06 Oct 2001 03:00:00 GMT  
 