RAD50 (was: Q: Why not (2^n)-bit?) 


>>>This is a large step backward.

>>>Are the XML tags English words? (I don't know much of anything about it.) If
>>>so, shouldn't they follow the English rules for capitalization?

>>Not necessarily. You can, if you are so inclined, go and write the tags
>>in Japanese using Kanji. Now make _that_ one case-insensitive. There are
>>enough languages with special monocase characters, i.e. characters with no
>>separate lowercase _and_ uppercase version (like the German sharp s).
>>You would end up needing several hundred KB of character case translation
>>tables for every XML application if you want to handle all the special cases
>>correctly. Very ugly. And don't forget Chinese ...

>It's not that bad at all. Take a look in

> http://www.*-*-*.com/

>The file UnicodeData.txt contains all Unicode characters (except
>Chinese and other ideographs, which are in Unihan.txt). It contains
>info on one-to-one case mappings. The file SpecialCasing.txt contains
>exceptions.
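As a rough illustration of how the one-to-one case mappings in UnicodeData.txt can be read, here is a minimal C sketch. It assumes the standard semicolon-separated layout of that file, in which field 0 is the code point and field 13 is the simple lowercase mapping; the function name parse_unicodedata_line() is hypothetical, and the multi-character exceptions from SpecialCasing.txt are not handled.

```c
#include <stdlib.h>
#include <string.h>

/* Parse one line of UnicodeData.txt (hypothetical helper).
   Fields are separated by ';' and many are empty, so we walk with
   strchr() instead of strtok(), which would skip empty fields.
   Field 0 is the code point, field 13 the simple lowercase mapping.
   On success returns 1 and fills *cp and *lower; *lower == *cp when
   the line has no lowercase mapping. Returns 0 on a malformed line. */
int parse_unicodedata_line(const char *line, unsigned long *cp,
                           unsigned long *lower)
{
    const char *field = line;
    char *end;

    *cp = strtoul(line, &end, 16);
    if (end == line || *end != ';')
        return 0;

    /* Advance to the first character of field 13. */
    for (int i = 0; i < 13; i++) {
        field = strchr(field, ';');
        if (field == NULL)
            return 0;
        field++;
    }

    if (*field == ';' || *field == '\0' || *field == '\n') {
        *lower = *cp;             /* empty field: maps to itself */
        return 1;
    }
    *lower = strtoul(field, &end, 16);
    return end != field;
}
```

For the line describing U+00C4 (A with diaeresis), this yields the lowercase mapping U+00E4.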

>But the real goodie is CaseFolding.txt. It defines, for every
>character that needs it, a mapping to one or more characters so that
>you can compare case-insensitively.

>Example: German es-zet "ß" maps to "ss", Swedish "Ö" (O with two
>dots) maps to "ö" (small o with two dots). The ligature "ﬄ" maps to
>"ffl" (three characters).
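To make the CaseFolding.txt idea concrete, here is a minimal C sketch of full case folding applied to two strings of code points. The three-entry fold_table covers only the examples in the paragraph above plus ASCII A-Z; it is a hand-written stand-in for the real file, and the names fold_cp() and cp_caseless_equal() are hypothetical. Each side is folded lazily into a small buffer, so a one-to-many mapping such as ß to "ss" lines up with the other string.

```c
#include <stddef.h>

/* Illustrative entries in the style of CaseFolding.txt (status C/F).
   A real program would generate the full table from the file.
   Unused slots in 'to' are 0. */
struct fold { unsigned long from; unsigned long to[3]; };

static const struct fold fold_table[] = {
    { 0x00D6, { 0x00F6 } },                 /* O with diaeresis -> o with diaeresis */
    { 0x00DF, { 0x0073, 0x0073 } },         /* sharp s -> "ss" */
    { 0xFB04, { 0x0066, 0x0066, 0x006C } }, /* ffl ligature -> "ffl" */
};

/* Fold one code point into out[] (room for 3); returns the count. */
static size_t fold_cp(unsigned long c, unsigned long out[3])
{
    if (c >= 'A' && c <= 'Z') { out[0] = c + 32; return 1; }
    for (size_t i = 0; i < sizeof fold_table / sizeof fold_table[0]; i++) {
        if (fold_table[i].from == c) {
            size_t n = 0;
            while (n < 3 && fold_table[i].to[n] != 0) {
                out[n] = fold_table[i].to[n];
                n++;
            }
            return n;
        }
    }
    out[0] = c;                   /* no mapping: folds to itself */
    return 1;
}

/* Returns 1 if code-point strings a (length na) and b (length nb)
   are equal after full case folding, else 0. */
int cp_caseless_equal(const unsigned long *a, size_t na,
                      const unsigned long *b, size_t nb)
{
    unsigned long fa[3], fb[3];
    size_t la = 0, lb = 0, pa = 0, pb = 0; /* buffer length / position */
    size_t ia = 0, ib = 0;                 /* input positions */

    for (;;) {
        /* Refill a side's buffer once it has been fully consumed. */
        if (pa == la && ia < na) { la = fold_cp(a[ia++], fa); pa = 0; }
        if (pb == lb && ib < nb) { lb = fold_cp(b[ib++], fb); pb = 0; }
        int done_a = (pa == la), done_b = (pb == lb);
        if (done_a || done_b)
            return done_a && done_b;       /* equal only if both ended */
        if (fa[pa++] != fb[pb++])
            return 0;
    }
}
```

Under this sketch, the code-point sequences for "Straße" and "STRASSE" compare equal.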

>The file is plain text: one character per line, plus some comments.
>It is 821 lines long. Not bad.
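Each data line in CaseFolding.txt has the form `<code>; <status>; <mapping>; # <name>`, where the status is C (common), F (full), S (simple) or T (Turkic). A minimal C sketch of a parser for one such line follows; struct fold_entry and parse_casefolding_line() are hypothetical names, and the sketch assumes a mapping is at most three code points long, rejecting anything longer.

```c
#include <stdlib.h>

/* One parsed data line of CaseFolding.txt (hypothetical type). */
struct fold_entry {
    unsigned long code;        /* code point being folded */
    char status;               /* 'C', 'F', 'S' or 'T' */
    unsigned long mapping[3];  /* folded code points */
    int n;                     /* number of entries in mapping */
};

/* Returns 1 for a well-formed data line, 0 for a comment, blank or
   malformed line. */
int parse_casefolding_line(const char *s, struct fold_entry *e)
{
    char *end;

    while (*s == ' ' || *s == '\t')
        s++;
    if (*s == '#' || *s == '\0' || *s == '\n')
        return 0;                          /* comment or blank */

    e->code = strtoul(s, &end, 16);
    if (end == s || *end != ';')
        return 0;
    s = end + 1;
    while (*s == ' ')
        s++;

    e->status = *s;
    if (e->status != 'C' && e->status != 'F' &&
        e->status != 'S' && e->status != 'T')
        return 0;
    if (*++s != ';')
        return 0;
    s++;

    /* strtoul() skips leading blanks, so this reads "0073 0073". */
    e->n = 0;
    for (;;) {
        unsigned long v = strtoul(s, &end, 16);
        if (end == s)
            break;                         /* reached the ';' */
        if (e->n == 3)
            return 0;                      /* longer than assumed */
        e->mapping[e->n++] = v;
        s = end;
    }
    return e->n > 0 && *s == ';';
}
```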

>BTW, I entirely agree with Jay Maynard. There are limits to how much
>we should adjust to machines. Consider a Unix system. I don't have a
>problem with having commands and options case-sensitive but I really
>would like filenames case-insensitive. But anyone who has used a Unix
>command line for a while knows how people name their files: relatively
>short, with a few dots in them, no funny characters such as spaces, and
>all-lowercase. Surely I should be able to have a file named Bengt
>without it being a hassle?

>I was very pleasantly surprised by the above Unicode files. It was a
>lot easier than I thought.

>The right way to do it is to write a C program that translates the file
>into a C program for a case-insensitive compare. Define a function, say,
>"utf8_strcasecmp()".
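The generator approach might look roughly like the following minimal C sketch: it reads CaseFolding.txt and writes a C initializer on which a utf8_strcasecmp() could be built. The names emit_fold_table() and fold_table, and the emitted layout `{ unsigned long from; unsigned long to[3]; }`, are assumptions; sscanf() is used for brevity rather than robustness, and only C (common) and F (full) lines are kept, since S and T lines are the alternatives for simple and Turkic folding.

```c
#include <stdio.h>

/* Hypothetical generator: read CaseFolding.txt from 'in' and write a
   C table initializer to 'out'. Returns the number of entries written. */
int emit_fold_table(FILE *in, FILE *out)
{
    char line[512];
    int count = 0;

    fprintf(out, "static const struct fold fold_table[] = {\n");
    while (fgets(line, sizeof line, in) != NULL) {
        unsigned long code, m[3];
        char status;
        /* e.g. "00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S" */
        int got = sscanf(line, "%lx; %c; %lx %lx %lx",
                         &code, &status, &m[0], &m[1], &m[2]);
        if (got < 3 || (status != 'C' && status != 'F'))
            continue;             /* comment, blank, S or T line */
        fprintf(out, "    { 0x%04lX, {", code);
        for (int i = 0; i < got - 2; i++)
            fprintf(out, "%s0x%04lX", i ? ", " : " ", m[i]);
        fprintf(out, " } },\n");
        count++;
    }
    fprintf(out, "};\n");
    return count;
}
```

A driver would be a main() that calls emit_fold_table(stdin, stdout), with CaseFolding.txt redirected to its standard input and the output saved as a header.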

The right way to do it is to write a locale definition for the
utf8 character set, category LC_COLLATE; then you can use the
standard functions setlocale() and strcoll()/strxfrm() or
wcscoll()/wcsxfrm().
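A minimal sketch of the standard-library side of this suggestion follows. It shows only the calls named above; the LC_COLLATE locale definition itself is not shown, and whether "Bengt" and "bengt" compare as equal depends entirely on that definition (in the "C" locale, strcoll() behaves like strcmp()). The helper names use_environment_collation() and collate_key() are hypothetical. The wide-character functions wcscoll() and wcsxfrm() from <wchar.h> follow the same pattern.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Select the collation rules named by the environment (LC_ALL,
   LC_COLLATE, LANG). Returns 0 if that locale is not available. */
int use_environment_collation(void)
{
    return setlocale(LC_COLLATE, "") != NULL;
}

/* Build a sort key with strxfrm() under the current LC_COLLATE locale.
   Comparing two keys with strcmp() orders them the same way strcoll()
   orders the original strings, which pays off when one string is
   compared many times (e.g. while sorting). Returns a malloc'd key,
   or NULL if allocation fails. */
char *collate_key(const char *s)
{
    size_t n = strxfrm(NULL, s, 0);   /* key length, excluding the NUL */
    char *key = malloc(n + 1);
    if (key != NULL)
        strxfrm(key, s, n + 1);
    return key;
}
```

For one-off comparisons, calling strcoll(a, b) directly is simpler; building keys is worthwhile only when the same strings are compared repeatedly.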

>Ideographic alphabets don't have case. The only ones that have it are
>Latin, Cyrillic, Greek, and Armenian. I thought Hebrew had it, but
>apparently not.

[pdp11 groups trimmed, c.l.c.m added]
Thanks. Take care, Brian Inglis         Calgary, Alberta, Canada



Fri, 25 Apr 2003 03:00:00 GMT  
 