draft ANSI standard: one change that would *really* help Europe 

[This is posted to comp.lang.c because mod.std.c seems to be dead.  Love
those mod groups!]

While considering my point of view on trigraphs, Laura Creighton pointed
out that the problem is that Europeans really need more than a 7-bit
character set.

In that vein, one possible change to the ANSI standard would require
"char" to be unsigned.  This would double the number of characters
that a strictly conforming program could easily handle, and European
Unix systems could use an 8-bit character set in which the first 128
characters were USASCII.  I believe that the various Unix
internationalization efforts are already working in this direction.

No strictly conforming programs would be broken by this change,
since a strictly conforming program cannot assume whether char is
signed or unsigned; in fact, it will make MORE programs strictly
conform, since programs that assume char is unsigned will now conform.

In an 8-bit character set, all the ANSI punctuation as well as all
the national characters could be supported without kludges.
--

Call +1 800 854 7179 or +1 714 540 9870 and order X3.159-198x (ANSI C) for $65.
Then spend two weeks reading it and weeping.  THEN send in formal comments!



Wed, 19 May 1993 01:49:00 GMT  
 draft ANSI standard: one change that would *really* help Europe

ISO Latin 1 is an 8 bit character set that is a superset of ASCII.
Portability, then, is a matter of having standard transliteration rules, e.g.
c-cedilla --> c    , a-ring --> aa
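
A transliteration rule of this kind could be sketched in C roughly as follows. This is only an illustration: the function name latin1_to_ascii is made up here, the table holds just the two rules mentioned above, and it assumes the input byte is encoded in ISO Latin 1 (where lowercase c-cedilla is 0xE7 and lowercase a-ring is 0xE5).

```c
#include <stddef.h>   /* NULL */

/* Hypothetical helper: return the ASCII replacement for one ISO Latin 1
   byte, or NULL when this tiny table has no rule for it. */
const char *latin1_to_ascii(unsigned char byte)
{
    switch (byte) {
    case 0xE7: return "c";    /* c-cedilla --> c  */
    case 0xE5: return "aa";   /* a-ring    --> aa */
    default:   return NULL;   /* no rule in this sketch */
    }
}
```

The parameter is an unsigned char so that the codes above 127 compare correctly whether or not plain char is signed.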

But I sincerely doubt that code with native identifiers would ever make it to
public distribution. Such code is usually commented in English (or an
approximation thereof), with English identifiers (i, j, k, x, y, z, p, c, s
:-).

Jean-Francois Lamy
one day, I may have all the characters I need to type my name :-(






Wed, 19 May 1993 11:24:00 GMT  
 draft ANSI standard: one change that would *really* help Europe

Quote:

> While considering my point of view on trigraphs, Laura Creighton pointed
> out that the problem is that Europeans really need more than a 7-bit
> character set.

> In that vein, one possible change to the ANSI standard would require
> "char" to be unsigned.  This would double the number of characters
> that a strictly conforming program could easily handle, and European
> Unix systems could use an 8-bit character set in which the first 128
> characters were USASCII.  I believe that the various Unix
> internationalization efforts are already working in this direction.

  Actually, as mentioned in Byte magazine about 9-10 months ago, ANSI is
  in the process of soliciting comments regarding its proposed 8-bit ASCII
  standard, which does contain 7-bit ASCII as its first 128 characters, and
  includes all the European characters in the upper 128...  check the Letters
  section of Byte, around February 1986 or so for the exact positions of the
  various characters in the proposed standard...

                                      Bill Wolfe (ahe!k.cc.purdue.edu...)

                                      Purdue University Computing Center



Wed, 19 May 1993 02:50:00 GMT  
 draft ANSI standard: one change that would *really* help Europe

Quote:

>[This is posted to comp.lang.c because mod.std.c seems to be dead.  Love
>those mod groups!]
>[We really need an 8-bit character set and C needs to acknowledge this]
>[John thinks that perhaps "char" should be unsigned - most programs
>  would be more correct since they assUme that chars are unsigned]

Well, I heartily agree, but I think that there must be some programs
out there that assume that chars are useful as small signed numbers,
which I would also prefer not to break.  

Also, I think that having chars have different semantics (assumed unsigned
rather than signed like int) would be a bad thing in general.  Perhaps
what is needed is a "tiny" type (ala long and short) that would be signed
and (for now) essentially a signed char.

Of course, this brings in yet another type (oh no!) and yet another
reserved word, but it would make programs nicer.
        andy
--
Andrew Scott Beals      (member of HASA - A and S divisions)

LLNL, P.O. Box 808, Mailstop L-419, Livermore CA 94550 (415) 423-1948
Primates who don't have tails should keep cats who don't have tails.



Wed, 19 May 1993 13:58:00 GMT  
 draft ANSI standard: one change that would *really* help Europe

Quote:

>>[This is posted to comp.lang.c because mod.std.c seems to be dead.  Love
>>those mod groups!]

>>[We really need an 8-bit character set and C needs to acknowledge this]

>>[John thinks that perhaps "char" should be unsigned - most programs
>  would be more correct since they assUme that chars are unsigned]

>Well, I heartily agree, but I think that there must be some programs
>out there that assume that chars are useful as small signed numbers,
>which I would also prefer not to break.  

Actually this breaks *LOTS* of programs. We have a compiler with unsigned
chars on some of our machines. This causes endless problems with what is
'affectionately' known as the "EOF bug".

The implementors of that compiler said it was the first thing they would
alter if they reimplemented it because of the number of problems it caused.
If you want to see for yourself have a look through your sources and find
every occurrence of a comparison between EOF or -1, and a char. Typically,
where cp is a character pointer:-

                if (*cp == EOF)
        or
                while (*cp != EOF)

Older code is littered with these constructs.
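
To make the "EOF bug" concrete, here is a minimal sketch of what the comparison evaluates to under each choice of signedness. The two function names are invented for illustration; the only assumptions are that EOF is a negative int (as stdio defines it) and that unsigned char promotes to int, which holds on ordinary machines.

```c
#include <stdio.h>   /* EOF */

/* What "*cp == EOF" computes when plain char is unsigned: the byte is
   promoted to a non-negative int, so it can never equal a negative EOF. */
int matches_eof_if_unsigned(unsigned char byte)
{
    return byte == EOF;
}

/* What the same test computes when plain char is signed: a byte holding
   -1 compares equal to EOF on systems where EOF is -1. */
int matches_eof_if_signed(signed char byte)
{
    return byte == EOF;
}
```

With unsigned chars the first test is false for every possible byte, so loops of the form shown above never see their terminator.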

        sean



Wed, 19 May 1993 07:04:00 GMT  
 draft ANSI standard: one change that would *really* help Europe

Quote:
> >[We really need an 8-bit character set and C needs to acknowledge this]

> >[John thinks that perhaps "char" should be unsigned - most programs
>   would be more correct since they assUme that chars are unsigned]

> Well, I heartily agree, but I think that there must be some programs
> out there that assume that chars are useful as small signed numbers,
> which I would also prefer not to break.  

> Also, I think that having chars have different semantics (assumed unsigned
> rather than signed like int) would be a bad thing in general.  Perhaps
> what is needed is a "tiny" type (ala long and short) that would be signed
> and (for now) essentially a signed char.

> Of course, this brings in yet another type (oh no!) and yet another
> reserved word, but it would make programs nicer.
>    andy

Andy,

The standard does allow for a small signed char type called (would you believe)
"signed char". From section 3.1.2.5 of the draft dated Oct. 1, 1986.

        A signed char occupies the same amount of storage as a "plain" char.
        A "plain" int has the natural size suggested by the architecture of the
        execution environment ...

The committee wanted to "fix" the question of signedness of a char but couldn't
arrive at an acceptable compromise. We thought about having chars be signed
and unsigned chars unsigned but we were afraid it would break too much code that
depended on chars being unsigned. We ended up adopting the compromise of:
        char    - signed or unsigned, implementation defined
        unsigned char
        signed char
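
As a rough illustration of how a program might live with this compromise, the sketch below asks the implementation which way plain char went and uses "signed char" where a small signed number is really wanted. It assumes a compiler that provides <limits.h> with CHAR_MIN; the function names are made up for this example.

```c
#include <limits.h>   /* CHAR_MIN */

/* Nonzero if this implementation's plain char is signed. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* A small signed quantity that does not depend on that choice.
   Both operands are promoted to int before the addition. */
int tiny_sum(signed char a, signed char b)
{
    return a + b;
}
```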

By the way, the draft is now released for formal public review, so if you
have any other technical comment, fire away now or it will be too late!

                                        a humble member of X3J11,
                                        Joe Mueller
                                        ...!nsc!nscpdc!joemu



Wed, 19 May 1993 12:47:00 GMT  
 draft ANSI standard: one change that would *really* help Europe

One way to use the European chars, while leaving 'char' signed, is
to explicitly declare char variables 'unsigned' in European programs.
Since there is no 8-bit standard just yet, would this requirement
break many existing programs?

As for the use of chars for very short signed integers: seems to me
like a bad practice.  That's what 'short' is for.  Perhaps another,
smaller type ('tiny') would be useful, as has been suggested.

I object, though, to the complaints you hear from some people when a
given compiler breaks their code by using an unexpected size for, say,
'int's.  K&R say explicitly that no sizes are guaranteed, and that their
intention was that 'short' would be shorter than 'long', and 'int'
would be the "most natural word size for the machine".  On the 68000, say,
that pretty much translates to 8, 16 and 32 bits for short, int and long.

I suggest that the standard require:

        'short' to be at least 8 bits,

        'int' to be at least 16 bits (but 32 bits NOT promised!),

        'long' to be at least 32 bits and long enough to hold a pointer.

These requirements, being MINIMUM sizes, would only guarantee that a
program which assumes them would work everywhere.  But note: the
assumption that (unsigned shorts) 0200 + 0200 == 0 is NOT legal,
since it assumes the size is NO MORE than 8 bits!
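
A sketch of what relying only on such minimums might look like, assuming a compiler that provides <limits.h>. The function names are invented for illustration, and the "long enough to hold a pointer" part of the suggestion is deliberately left out because it cannot be checked from <limits.h> alone. The second function shows one way to get 8-bit wraparound without assuming that short is no wider than 8 bits.

```c
#include <limits.h>   /* USHRT_MAX, UINT_MAX, ULONG_MAX */

/* Nonzero if this implementation meets the suggested minimum widths. */
int meets_suggested_minimums(void)
{
    return USHRT_MAX >= 0xFFu          /* short: at least 8 bits  */
        && UINT_MAX  >= 0xFFFFu        /* int:   at least 16 bits */
        && ULONG_MAX >= 0xFFFFFFFFul;  /* long:  at least 32 bits */
}

/* Add modulo 256 by masking explicitly, instead of hoping that an
   unsigned short overflows at 8 bits. */
unsigned short add_mod_256(unsigned short a, unsigned short b)
{
    return (unsigned short)((a + b) & 0xFFu);
}
```

With the mask in place, 0200 + 0200 comes out as 0 however wide short really is.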

- Moshe Braner



Wed, 19 May 1993 01:12:00 GMT  
 draft ANSI standard: one change that would *really* help Europe

Quote:

>   Actually, as mentioned in Byte magazine about 9-10 months ago, ANSI is
>   in the process of soliciting comments regarding its proposed 8-bit ASCII
>   standard, which does contain 7-bit ASCII as its first 128 characters, and
>   includes all the European characters in the upper 128... check the Letters
>   section of Byte, around February 1986 or so for the exact positions of the
>   various characters in the proposed standard...

>                                  Bill Wolfe (ahe!k.cc.purdue.edu...)

>                                  Purdue University Computing Center

    Make that the August or September 1985 issues...


Wed, 19 May 1993 12:21:00 GMT  
 draft ANSI standard: one change that would *really* help Europe

Quote:

> >>[John thinks that perhaps "char" should be unsigned - most programs
> >  would be more correct since they assUme that chars are unsigned]
> Actually this breaks *LOTS* of programs. We have a compiler with unsigned
> chars on some of our machines. This causes endless problems with what is
> 'affectionately' known as the "EOF bug".
> If you want to see for yourself have a look through your sources and find
> every occurrence of a comparison between EOF or -1, and a char. Typically,
> where cp is a character pointer:-
>            if (*cp == EOF)
>            while (*cp != EOF)

Not in our code!  This type of code is not likely to work, even under K & R.
ANSI is only trying not to break *legal* programs.  The above essentially
is trying to use 255 (or whatever) instead of 0 as a string terminator.
Even if there was a legitimate reason for this, EOF is the wrong name
to use since it is _already defined as a return value of stdio functions_.

This code was broken already.

It's too bad that type checking doesn't catch this sort of thing.  I wish
there was a way to define an enum type that is either a char or EOF, and
declare stdio functions to return that type.  Then if only enum weren't so
loose about converting to other types without a cast.  Sigh.

P.S.  Are you trying to tell me that official unix utilities are written like
that?
--
Stuart D. Gathman       <..!seismo!dgis!bms-at!stuart>



Wed, 19 May 1993 21:05:00 GMT  
 draft ANSI standard: one change that would *really* help Europe

Quote:

>Actually this breaks *LOTS* of programs. We have a compiler with unsigned
>chars on some of our machines. This causes endless problems with what is
>'affectionately' known as the "EOF bug".
>The implementors of that compiler said it was the first thing they would
>alter if they reimplemented it because of the number of problems it caused.
>If you want to see for yourself have a look through your sources and find
>every occurrence of a comparison between EOF or -1, and a char. Typically,
>where cp is a character pointer:-
>            if (*cp == EOF)
>    or
>            while (*cp != EOF)
>Older code is littered with these constructs.
>    sean

Ugh.  "Littered" is the right term.  Not only is this nonportable, it will
be FOOLED on systems where it normally works (signed char, 2's complement
representation) given the character '\377'.  So if you getchar(), say, upon
an arbitrary binary file and you are looking for EOF you are likely scr*wed
with this kind of code.

EOF is meant to be an out-of-band value for things like getchar() etc.
That's why they return int, and not char.
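
For contrast, a minimal sketch of the usual way to keep EOF out of band: hold the result of getc() in an int and only then compare. The function name count_bytes is made up for this example.

```c
#include <stdio.h>   /* FILE, getc, EOF */

/* Count the bytes remaining in a stream. */
long count_bytes(FILE *in)
{
    int c;        /* int, not char: must hold every byte value plus EOF */
    long n = 0;

    while ((c = getc(in)) != EOF)
        n++;
    return n;
}
```

Because c is an int, a '\377' byte in the input arrives as 255 and is not mistaken for end of file.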

(Lint should warn about this kind of comparison.  I have learned the slow,
hard way that when I get C code from elsewhere, yes even the mighty BTL, I
lint it first and fix the warnings before compiling on a system other than
from whence it came!)

Dan



Wed, 19 May 1993 00:44:00 GMT  
 draft ANSI standard: one change that would *really* help Europe

Quote:

....
>> If you want to see for yourself have a look through your sources and find
>> every occurrence of a comparison between EOF or -1, and a char. Typically,
>> where cp is a character pointer:-

>>                if (*cp == EOF)

>>                while (*cp != EOF)

>Not in our code!  This type of code is not likely to work, even under K & R.

It will work on any machine that allows signed chars (despite being ideologically
unsound!)

Quote:
>ANSI is only trying not to break *legal* programs.  The above essentially
>is trying to use 255 (or whatever) instead of 0 as a string terminator.
>Even if there was a legitimate reason for this, EOF is the wrong name
>to use since it is _already defined as a return value of stdio functions_.

The case I was thinking of here is reading on a pipe. This seems to be
popular. The use of EOF is valid in this context.

Another favorite is assigning the result of getchar to a char and then
testing to see if the char is -1.

There are others ....

Quote:

>This code was broken already.

>It's too bad that type checking doesn't catch this sort of thing.  I wish
>there was a way to define an enum type that is either a char or EOF, and
>declare stdio functions to return that type.  Then if only enum weren't so
>loose about converting to other types without a cast.  Sigh.

>P.S.  Are you trying to tell me that official unix utilities are written like
>that?

Yes, worse luck :-(


Wed, 19 May 1993 11:10:00 GMT  
 draft ANSI standard: one change that would *really* help Europe

Quote:

> The committee wanted to "fix" the question of signedness of a char but
> couldn't arrive at an acceptable compromise. We thought about having
> chars be signed and unsigned chars unsigned but we were afraid it would
> break too much code that depended on chars being unsigned. We ended up
> adopting the compromise of:
>    char    - signed or unsigned, implementation defined
>    unsigned char
>    signed char

Of course, this compromise breaks all the code that depends on chars
being EITHER signed OR unsigned!  To be portable and "strictly
conforming", you can't depend on =chars having signs= or =chars having no
signs=, you just can't depend.

I would rather they had broken half the code that makes assumptions,
rather than all of it.



Wed, 19 May 1993 02:53:00 GMT  
 draft ANSI standard: one change that would *really* help Europe
Organization : California Institute of Technology
Keywords:

Path: oddhack!jon

Quote:


>> The committee wanted to "fix" the question of signedness of a char but
>> couldn't arrive at an acceptable compromise. We thought about having
>> chars be signed and unsigned chars unsigned but we were afraid it would
>> break too much code that depended on chars being unsigned. We ended up
>> adopting the compromise of:
>>        char    - signed or unsigned, implementation defined
>>        unsigned char
>>        signed char

>Of course, this compromise breaks all the code that depends on chars
>being EITHER signed OR unsigned!  To be portable and "strictly
>conforming", you can't depend on =chars having signs= or =chars having no
>signs=, you just can't depend.

>I would rather they had broken half the code that makes assumptions,
>rather than all of it.

        I fail to see how this choice 'breaks' ANY code. It is
not possible to write portable code with either of the above assumptions
now. It will not be possible under ANSI - but it will then be possible
to explicitly choose signed chars if you want. What broke? If you
are saying ANSI should have chosen chars to always be signed or unsigned
just so currently broken code will become non-broken, I don't agree
with the complaint. Who knows how much inefficiency may result?


    Caltech Computer Science Graphics Group



Wed, 19 May 1993 04:56:00 GMT  
 draft ANSI standard: one change that would *really* help Europe

Quote:
> >       char    - signed or unsigned, implementation defined
> >       unsigned char
> >       signed char

> Of course, this compromise breaks all the code that depends on chars
> being EITHER signed OR unsigned!  To be portable and "strictly
> conforming", you can't depend on =chars having signs= or =chars having no
> signs=, you just can't depend.

> I would rather they had broken half the code that makes assumptions,
> rather than all of it.

It seems to me that what ANSI has done is maintain the status quo. Currently
whether or not characters are signed is implementation dependent. To write
portable code, you must make no assumptions about the signedness of characters.
The same situation will exist with the ANSI standard. Code which currently
works on a particular implementation should continue to work (unless the
implementation default is changed which seems unlikely). New code can be
written portably using signed or unsigned characters.
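
As one possible illustration of code that makes no assumption about plain char, the sketch below converts each byte to unsigned char before testing it. The function name is hypothetical, and the example assumes 8-bit bytes with 7-bit ASCII in the lower half of the character set.

```c
/* Count the bytes in a string that fall outside 7-bit ASCII. */
int count_non_ascii(const char *s)
{
    int n = 0;

    for (; *s != '\0'; s++) {
        /* Same value whether plain char is signed or unsigned. */
        unsigned char byte = (unsigned char)*s;
        if (byte >= 0x80)
            n++;
    }
    return n;
}
```

Testing "*s >= 0x80" directly would never be true on an implementation where plain char is signed and 8 bits wide, which is exactly the kind of assumption the thread is arguing about.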

The solution chosen by ANSI seems to me to have broken no code, so why change
to a solution which would break half? You can't expect ANSI to take
non-portable code and magically make it portable.

--
Ken Thompson  Phone : (404) 894-7089
Georgia Tech Research Institute
Georgia Institute of Technology, Atlanta Georgia, 30332
...!{akgua,allegra,amd,hplabs,ihnp4,seismo,ut-ngp}!gatech!gitpyr!thomps



Wed, 19 May 1993 10:05:00 GMT  