C/C++ Future - A standard proposal for numbers 

Friday, May 14, 1999

Abstract:

This is a call for discussion and feedback on how programmers,
software engineers, and computer scientists can make our lives easier by
constructively addressing the shortcomings of specifying integers of a
known bit length in a portable fashion in C/C++.

This document is only meant as a starting point.  If solutions already
exist, please feel free to email links, documents, etc. to the
above-mentioned email address.

Discussion:

We have different implementations of C/C++ on machines with varying:
 - MAUs (Minimum Addressable Units) ranging from 4 bits to 10 bits
   (chars range from 8 bits up to 10 bits; a nibble is the MAU on the
    HP48 calculator series, with 5 nibbles making up a 20-bit address space),
 - address spaces ranging from 0 bits up past 128 bits, and
 - ints natively supported by hardware from 4 bits up past 64 bits
   (along with programmer-implemented custom long ints of n bits).

The question of how many values a specified bit length can represent
will also have to be addressed, but is skipped in this proposal.

Using hacks such as "long long" does not help readability or portability.
Do we allow 'long long long' to represent a 128-bit integer?
What about 256-bit integers?

We must simply plan for larger integer sizes before such "de facto"
standards are entrenched.

Some of the criteria we must consider are:
 - readability (from a human view point)
 - portability (from a compiler perspective)
 - extendibility (from an engineering paradigm)

Bobby Schmidt has proposed one elegant scheme using templates.
(Refer to: C/C++ Users Journal, January 1988, "All This and C++ Too!")
i.e.
   int<8>       i;
   unsigned<64> u;

Unfortunately C does not support templates, so we are forced to address
this issue in another way.  This is a real shame, since the
bschmidt-integer-template is very readable, portable, and extendible.
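For what it's worth, in C++ something close to the bschmidt scheme can be
approximated today with a traits template specialized per bit-width.  The
name int_exact and the mappings below (8 -> signed char, 16 -> short,
32 -> long) are my own assumptions, which hold on many current platforms
but are not guaranteed by the standard:

```cpp
#include <climits>

// Hypothetical approximation of int<N>: a traits template with one
// specialization per supported width.  Asking for an unsupported width
// fails to compile, which is the behaviour we want.
template <int Bits> struct int_exact;  // intentionally left undefined
template <> struct int_exact<8>  { typedef signed char type; };
template <> struct int_exact<16> { typedef short       type; };
template <> struct int_exact<32> { typedef long        type; };

typedef int_exact<8>::type  i8;   // plays the role of int<8>
typedef int_exact<32>::type i32;  // plays the role of int<32>
```

The ugliness is that the typedef table has to be re-tuned per platform,
which is exactly the maintenance burden the template syntax would remove.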

Personally, for the past few years, when I need a portable "guaranteed
known bit-length" type, I have been using a scheme where the type name
starts with 'u' or 's', followed by the number of bits the integer needs
for the implementation, followed by 'bit'
i.e.
        //   char    8 bits
        //   short  16 bits
        //   long   32 bits
        //
        typedef unsigned char   u8bit;
        typedef unsigned short u16bit;
        typedef unsigned long  u32bit;

        typedef signed char   s8bit;
        typedef signed short s16bit;
        typedef signed long  s32bit;

i.e.
   u8bit  max_8_bit  = 255;
   u16bit max_16_bit = 65535;
   u32bit max_32_bit = 4294967295;

The size of constant literals should not have to be specified, as the
compiler should be able to "figure" it out by looking at the type of the
variable/constant.  Although if we must include size markers, one could
use an intuitive scheme like this:
   s8bit  test_8_bit  = -1<8bit>;
   s16bit test_16_bit = -1<16bit>;
   s32bit test_32_bit = -1<32bit>;
   s64bit test_64_bit = -1<64bit>;

I think it's a lot cleaner than:
   unsigned long long test_64_bit = -1ULL;

I have seen another scheme proposed (the numbers explicitly mean bits)
  int8
  int16
  int32
  int64
One way to specify signed or unsigned integers is:
  // signed
  sint8
  sint16
  sint32
  sint64
  // unsigned
  uint8
  uint16
  uint32
  uint64

Turning our attention for a minute to naming and specifying the
precision of floating-point values, we find the various sizes have also
been problematic.

   sizeof( float )         // how many bits? what is the specified range?
   sizeof( double )        // how many bits? what is the specified range?
   sizeof( double double ) // non-portable!

Why not overhaul the floating point naming system as well:
(the numbers explicitly mean bytes)

   float4;  // same as float
   float8;  // same as double
   float16; // same as double-double
   float32; // using 32 bytes to represent a floating-point number.

Or if we wanted to be pedantic, we could be consistent and use bits:
   float32;
   float64;
   float128;
   float256;

The problem of representing big numbers (i.e. using a million bits to
represent a number for certain math problems) still remains, but a
programmer is free to implement his own version taking into
consideration the constraints and precision specific to his problem.

On an 8-bit architecture, accessing a 32-bit value imposes considerable
overhead.  (C/C++ runs on 'small' embedded CPUs all the way up to
supercomputers and must remain designed as the 'portable assembler.')
There should be a way to query the compiler if a built-in type is native
or not.
i.e. // numbers represent number of bits
   if (sizeof(int<8>) == MAX_INT_SIZE)
      cout << "Largest native int is 8 bits";
   else
   if (sizeof(int<16>) == MAX_INT_SIZE)
      cout << "Largest native int is 16 bits";
   else
   if (sizeof(int<32>) == MAX_INT_SIZE)
      cout << "Largest native int is 32 bits";
   else
   if (sizeof(int<64>) == MAX_INT_SIZE)
      cout << "Largest native int is 64 bits";
   else
      cout << "Largest native int is " << MAX_INT_SIZE;

A remaining issue that I have not seen addressed is binary constant
literals.  C/C++ has decimal, octal, and hexadecimal constant literals.
i.e.
 int d = 10  ; // specified in base 10
 int o = 010 ; // 8 in base 10
 int h = 0x10; // 16 in base 10

Personally, I have never seen the need for octal constants, but for
completeness, I propose a simple scheme for binary constant literals:

   <binary-constant-literal> ::= 0z { binary-digit }*

i.e.
 int b = 0z1101; // specified in base 2

Currently, the best work-around I have come up with is:
        const int Flag_one     = (1 << 0); // 00001
        const int Flag_similar = (1 << 1); // 00010
        const int Flag_yet     = (1 << 2); // 00100
        const int Flag_another = (1 << 3); // 01000
        const int Flag_example = (1 << 4); // 10000

But it would be much nicer to be able to do:
        const int Flag_one     = 0z00001;
        const int Flag_similar = 0z00010;
        const int Flag_yet     = 0z00100;
        const int Flag_another = 0z01000;
        const int Flag_example = 0z10000;
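Pending such a literal, a common stand-in (not part of this proposal) is
hexadecimal, since each hex digit encodes exactly four bits and the bit
positions stay visible:

```cpp
// Each hex digit maps to four binary digits, so single-bit flags
// read almost as clearly as the proposed 0z notation would.
const int Flag_one     = 0x01; // binary 00001
const int Flag_similar = 0x02; // binary 00010
const int Flag_yet     = 0x04; // binary 00100
const int Flag_another = 0x08; // binary 01000
const int Flag_example = 0x10; // binary 10000
```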

I would appreciate feedback as to whether the above-mentioned
considerations are indeed problematic, or just wishful thinking on my
part.

Cheers



Tue, 30 Oct 2001 03:00:00 GMT  
 C/C++ Future - A standard proposal for numbers

Quote:

>[the full proposal is quoted here in the original; trimmed as a duplicate]

>Personally, for the past few years when I need a portable "guaranteed
>known bit-length", I have been using a scheme where the type starts with

Usually, if you depend on integral types that have a specific length,
you're writing platform-specific code rather than portable code.

In C there is already an adequate solution: simply know what the
minimum ranges of the various types are.  Then use a type which
satisfies your range requirements, and code in a way that still works if
the type is larger than what you need of it.

The C9X draft adds the header <inttypes.h> which gives the programmer
more choice.

It is more efficient for the program to be written to not depend on the
specific integer size, than for the language implementation to emulate
a specific size that the hardware doesn't ordinarily offer.

For example, you could write a compiler for a 36 bit platform which
does 32 bit arithmetic, so that badly written programs which assume
that long or unsigned long is 32 bits wide will port without changes.

But those programs will probably run less efficiently than programs
that will simply port to a 36 bit machine using the native
arithmetic.

Quote:
>'u' or 's', and then is followed by the number of bits the integer needs
>for the implementation, followed by 'bit'
>i.e.
>    //   char    8 bits
>    //   short  16 bits
>    //   long   32 bits

The C language doesn't have // comments, though C9X adds them. It
appears you have been using non-C compilers over the past few years. :)

Quote:
>    typedef unsigned char   u8bit;
>    typedef unsigned short u16bit;
>    typedef unsigned long  u32bit;

A lot of programmers have been doing this sort of thing, which is why
C9X adds <inttypes.h>. This header gives you a way to test, at compile
time, whether integers of some given size (8, 16, 32 or 64 bits) are
available, and typedefs that represent them if they are available.

It also gives you a way to choose between space and performance; you can,
essentially, choose the fastest type that can hold 32 bits (or more), or the
smallest.
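To illustrate, here is roughly what those typedefs look like.  The names
below are taken from the draft as it stands (the header also exists as
<stdint.h>); treat the exact spellings as an assumption until the
standard is final:

```cpp
#include <stdint.h>  // the C9X draft's fixed-width integer typedefs

// Exact-width typedefs exist only when the platform actually has a type
// of that exact size; the "fast" and "least" variants always exist.
typedef uint32_t       exact32_t;  // exactly 32 bits, when available
typedef uint_fast32_t  fast32_t;   // fastest type with at least 32 bits
typedef uint_least32_t least32_t;  // smallest type with at least 32 bits
```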

Before you reinvent the wheel, have a look at the existing proposed changes to
the language.

There is a newsgroup for proposals for extending or improving the C language,
namely comp.std.c.



Tue, 30 Oct 2001 03:00:00 GMT  
 C/C++ Future - A standard proposal for numbers

Quote:

> Usually, if you depend on integral types that have a specific length,
> you're writing platform-specific code rather than portable code.

I guess I didn't fully explain the problem that I'm trying to solve.

I need exact bit length integrals in C/C++, and I don't have time to be
tweaking my code for every platform.

I think an example would help make this clearer...

Let's say I'm developing a simplistic cross-platform Yet Another Bitmap
Format.

i.e.
        struct my_bitmap_t
        {
                u32bit width;
                u32bit height;
                u32bit bits_per_pixel;
                u8bit *pixelmap;
        };

I don't have time to tweak my platform-independent saving/loading code
for each platform.  I should just be able to write the code once
(assuming the member field alignment is already taken care of) and have
any platform I compile my new library on be able to save/read the file
format, then have another OS/platform use the same code and read/save
the file without any side effects.

I just picked a graphics file format example since it's probably the
most common problem where exact bit-lengths are required.

Quote:
> In C there is already an adequate solution; simply know what the minimum
> ranges of the various types are.  Then use a type which satisfies
> your range requirements, and code in a way that doesn't assume that the
> type is no larger than what you need of it.

C only specifies the minimum length, not a maximum, unless I'm mistaken.

Quote:
> The C9X draft adds the header <inttypes.h> which gives the programmer
> more choice.

I'll take another look at <inttypes.h>.  The last time I looked, the
<inttypes.h> interface was not very human-readable.  But if it does the
job, then I guess that's what I'm after.

Quote:
> It is more efficient for the program to be written to not depend on the
> specific integer size, than for the language implementation to emulate
> a specific size that the hardware doesn't ordinarily offer.

I agree, IDEALLY it would be nice to do just:
        struct my_bitmap_t
        {
                int width;
                int height;
                int bits_per_pixel;
                char *pixelmap;
        };

and gain the efficiency of the native types, but ints are different
lengths on different platforms. (i.e. 16, 32, etc.)

The other little kink is that my files need to be read/saved on both
little-endian and big-endian platforms.  (Although I have some macros
which solve this problem.)
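The macros mentioned above aren't shown, so purely as a hypothetical
sketch, a typical helper of this kind swaps unconditionally, and the
caller applies it only when the file's byte order differs from the
host's:

```cpp
#include <stdint.h>

// Unconditional 32-bit byte swap by shifting and masking.  Applying it
// twice yields the original value, so it converts in both directions.
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0xFF000000u) >> 24);
}
```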

Quote:
> For example, you could write a compiler for a 36 bit platform which
> does 32 bit arithmetic, so that badly written programs which assume
> that long or unsigned long is 32 bits wide will port without changes.

> But those programs will probably run less efficiently than programs
> that will simply port to a 36 bit machine using the native
> arithmetic.

All the more reason I like the template look: int<36>

My needs are just for portable 8-bit, 16-bit, 32-bit, and 64-bit types,
integer and floating point, in an easy-to-read, human-friendly format.

Quote:
> >'u' or 's', and then is followed by the number of bits the integer needs
> >for the implementation, followed by 'bit'
> >i.e.
> >       //   char    8 bits
> >       //   short  16 bits
> >       //   long   32 bits

> The C language doesn't have // comments, though C9X adds them. It
> appears you have been using non-C compilers over the past few years. :)

Yes, I know // comments are not in standard C (yet) although all the C
compilers I've used support them.  (Well, actually the pre-processors
did.)

Quote:
> >       typedef unsigned char   u8bit;
> >       typedef unsigned short u16bit;
> >       typedef unsigned long  u32bit;

> A lot of programmers have been doing this sort of thing, which is why
> C9X adds <inttypes.h>. This header gives you a way to test, at compile
> time, whether integers of some given size (8, 16, 32 or 64 bits) are
> available, and typedefs that represent them if they are available.

> It also gives you a way to choose between space and performance; you can,
> essentially, choose the fastest type that can hold 32 bits (or more), or the
> smallest.

> Before you reinvent the wheel, have a look at the existing proposed changes to
> the language.
> There is a newsgroup for proposals for extending or improving the C language,
> namely comp.std.c.

I thought I would "submit" a proposal and receive some feedback on the
idea.  At least I've got some feedback on how to proceed, which is what
I was partially looking for.

Thx for the constructive criticism.

Cheers



Fri, 02 Nov 2001 03:00:00 GMT  
 C/C++ Future - A standard proposal for numbers

Quote:


>> Usually, if you depend on integral types that have a specific length,
>> you're writing platform-specific code rather than portable code.

>I guess I didn't fully explain the problem that I'm trying to solve.

>I need exact bit length integrals in C/C++, and I don't have time to be
>tweaking my code for every platform.

>I think an example would help make this clearer...

>Let's say I'm developing a simplistic cross-platform Yet Another Bitmap
>Format.

>i.e.
>    struct my_bitmap_t
>    {
>            u32bit width;
>            u32bit height;
>            u32bit bits_per_pixel;
>            u8bit *pixelmap;
>    };

>I don't have time to tweak my platform-independent saving/loading code
>for each platform.  I should just be able to write the code once
>(assuming the member field alignment is already taken care of) and have
>any platform I compile my new library on be able to save/read the file
>format, then have another OS/platform use the same code and read/save
>the file without any side effects.

>I just picked a graphics file format example since it's probably the
>most common problem where exact bit-lengths are required.

But you also have to worry about byte order and compiler-specific
padding in the structure.  So you are better off declaring the thing as

        struct my_bitmap_t
        {
                unsigned long width;
                unsigned long height;
                unsigned long bpp;
                unsigned char *pixelmap;
        };

and then write a couple of portable routines for marshalling and
unmarshalling.  These routines can easily be written such that they
don't care about the local byte order; shifting and masking takes care
of that.

Suppose that ``unsigned long'' turns out to be 64 bits. Big deal; as long as
your code doesn't assume that the values are exactly 32, it doesn't have to
bother you.

If you are worried about speed, you can write special versions of the
marshalling routines targeted to certain platforms.  This would probably
be worthwhile only on machines whose byte order matches that of the
internal representation.
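The shift-and-mask approach described above can be sketched like this
(the function names and the least-significant-byte-first order are
illustrative choices, not part of any standard):

```cpp
// Portable marshalling: each 32-bit field is written in a fixed byte
// order (least-significant byte first here) by shifting and masking,
// so the host's own byte order never enters into it.  This works even
// if unsigned long is wider than 32 bits, as long as values fit.
static void put_u32(unsigned char *out, unsigned long v)
{
    out[0] = (unsigned char)( v        & 0xFF);
    out[1] = (unsigned char)((v >>  8) & 0xFF);
    out[2] = (unsigned char)((v >> 16) & 0xFF);
    out[3] = (unsigned char)((v >> 24) & 0xFF);
}

static unsigned long get_u32(const unsigned char *in)
{
    return  (unsigned long)in[0]
         | ((unsigned long)in[1] <<  8)
         | ((unsigned long)in[2] << 16)
         | ((unsigned long)in[3] << 24);
}
```

A marshalling routine for the bitmap header would simply call put_u32
once per field into the output buffer, and get_u32 to read them back.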



Sat, 03 Nov 2001 03:00:00 GMT  
 C/C++ Future - A standard proposal for numbers

Quote:
>C/C++ Future - A standard proposal for numbers

>Friday, May 14, 1999
>A remaining issue that I have not seen addressed is binary constant
>literals.  C/C++ has decimal, octal, and hexadecimal constant literals.
>i.e.
> int d = 10  ; // specified in base 10
> int o = 010 ; // 8 in base 10
> int h = 0x10; // 16 in base 10

>Personally, I have never seen the need for octal constants, but for
>completeness, I propose a simple scheme for binary constant literals:

>   <binary-constant-literal> ::= 0z { binary-digit }*

Yep, that would be really useful.  I guess most programmers are pretty
adept by now at changing between binary and hex, but that's not a good
enough reason not to make things easier.

On fixed-size ints, I suspect you may be onto a loser, for reasons that
Kaz has explained.

John Maddock
http://ourworld.compuserve.com/homepages/John_Maddock/



Sat, 03 Nov 2001 03:00:00 GMT  
 C/C++ Future - A standard proposal for numbers

Quote:



> >I think an example would help make this clearer...

> >Lets say I'm developing a simplistic cross platform Yet Another Bitmap
> >Format

> >i.e.
> >       struct my_bitmap_t
> >       {
> >               u32bit width;
> >               u32bit height;
> >               u32bit bits_per_pixel;
> >               u8bit *pixelmap;
> >       };

> But you also have to worry about byte order and compiler-specific padding in
> the structure.

I already have those taken care of.

Of course it would be _nice_ if the member padding issue was
semi-standardized, but I digress.

Quote:
> So you are better off to declare the thing as

>         struct my_bitmap_t
>         {
>                 unsigned long width;
>                 unsigned long height;
>                 unsigned long bpp;
>                 unsigned char *pixelmap;
>         };

> and then write a couple of portable routines for marshalling and
> unmarshalling.  These routines can easily be written such that they
> don't care about the local byte order; shifting and masking takes care
> of that.

I've only seen marshalling in OS design with respect to Remote
Procedure Calls (RPCs), but I see that the ideology holds here too.
I'll have to go dig up my OS books.

Quote:
> Suppose that ``unsigned long'' turns out to be 64 bits. Big deal; as long as
> your code doesn't assume that the values are exactly 32, it doesn't have to
> bother you.

If I save my bitmap on a CPU with 64-bit longs and then try to read it
in on a 16-bit CPU, it will get loaded wrong, so I still need
cross-platform persistent objects with fixed-length integral (and
floating-point) types.

Wouldn't I be better off doing:
         struct my_bitmap_t
         {
                 int width;
                 long height;
                 long bpp;
                 unsigned char *pixelmap;
         };
and letting the marshaling function handle the conversions?

So is writing the marshaling proxy stub the best compromise?

Could you please give some details on what the "Right Way" to implement
this would be?  Does the marshalling function deal with one field at a
time?

Digressing for a moment: why isn't there a <floattypes.h> in the C++
spec?  I believe floating point has the same problems.

Quote:
> If you are worried about speed, you can write special versions of the
> marshalling routines targeted to certain platforms.  This would
> probably be worthwhile only on machines whose byte order matches that
> of the internal representation.

If I had the time, yes, I could tune each lib / marshalling func for
each platform.  I need portability before speed.

Cheers



Sat, 03 Nov 2001 03:00:00 GMT  
 