C/C++ Future - A standard proposal for numbers

Friday, May 14, 1999

Abstract:

This is a call for discussion and feedback on ways that programmers,

software engineers, and computer scientists can make our lives easier by

addressing the shortcomings of specifying integers of a known bit-length

in a portable fashion in C/C++.

This document is only meant as a starting point. If solutions already

exist, please feel free to email links, documents, etc. to the

above-mentioned email address.

Discussion:

We have different implementations of C/C++ on machines with varying:

- MAUs (Minimum Addressable Units) ranging from 4 bits to 10 bits
  (chars range from 8 bits up to 10 bits; a nibble is the MAU on the
  HP48 calculator series, with 5 nibbles making up a 20-bit address space),

- address spaces ranging from 0 bits up past 128 bits, and

- ints natively supported by hardware from 4 bits up past 64 bits
  (along with programmer-implemented custom long ints of n bits).

The problem of how many values a specified bit-length can represent will

also have to be addressed, but it is skipped in this proposal.

Using hacks such as "long long" does not help readability or portability.

Do we allow 'long long long' to represent a 128-bit integer?

What about 256-bit integers?

We must simply plan for larger integer sizes before such "de facto"

standards are entrenched.

Some of the criteria we must consider are:

- readability (from a human view point)

- portability (from a compiler perspective)

- extendibility (from an engineering paradigm)

Bobby Schmidt has proposed one elegant scheme using templates.

(Refer to: C/C++ Users Journal, January 1998, "All This and C++ Too!")

i.e.

int<8> i;
unsigned<64> u;

Unfortunately C does not support templates, so we are forced

to address this issue in another way. This is a real shame since the

bschmidt-integer-template is very readable, portable, and extendible.

Personally, for the past few years when I have needed a portable

"guaranteed known bit-length", I have been using a scheme where the type

starts with 'u' or 's', is followed by the number of bits the integer

needs for the implementation, and ends with 'bit'.

i.e.

// char  8 bits
// short 16 bits
// long  32 bits
//
typedef unsigned char  u8bit;
typedef unsigned short u16bit;
typedef unsigned long  u32bit;
typedef signed char    s8bit;
typedef signed short   s16bit;
typedef signed long    s32bit;

i.e.

u8bit  max_8_bit  = 255;
u16bit max_16_bit = 65535;
u32bit max_32_bit = 4294967295;

Suffixes on constant literals should not have to be specified, as the

compiler should be able to "figure" this out by looking at the type of

the variable/constant. Although if we must include them, one could use

an intuitive scheme like this:

s8bit  test_8_bit  = -1<8bit>;
s16bit test_16_bit = -1<16bit>;
s32bit test_32_bit = -1<32bit>;
s64bit test_64_bit = -1<64bit>;

I think it is a lot cleaner than:

unsigned long long test_64_bit = -1ULL;

I have seen another scheme proposed (the numbers explicitly mean bits):

int8
int16
int32
int64

One way to specify signed or unsigned integers is:

// signed
sint8
sint16
sint32
sint64

// unsigned
uint8
uint16
uint32
uint64

Turning our attention for a minute to naming and specifying the

precision of floating-point values, we find the various sizes have also

been problematic.

sizeof( float )         // how many bits? what is the specified range?
sizeof( double )        // how many bits? what is the specified range?
sizeof( double double ) // non-portable!

Why not overhaul the floating-point naming system as well

(the numbers explicitly mean bytes):

float4;  // same as float
float8;  // same as double
float16; // same as double-double
float32; // using 32 bytes to represent a floating-point number

Or if we wanted to be pedantic, we could be consistent and use bits:

float32;
float64;
float128;
float256;

The problem of representing big numbers (i.e. using a million bits to

represent a number for certain math problems) still remains, but a

programmer is free to implement his own version taking into

consideration the constraints and precision specific to his problem.

On an 8-bit architecture, accessing a 32-bit value imposes considerable

overhead. (C/C++ runs on 'small' embedded CPUs all the way up to

supercomputers and must remain the 'portable assembler.')

There should be a way to query the compiler if a built-in type is native

or not.

i.e. // numbers represent number of bits

if (sizeof(int<8>) == MAX_INT_SIZE)
    cout << "Largest native int is 8 bits";
else if (sizeof(int<16>) == MAX_INT_SIZE)
    cout << "Largest native int is 16 bits";
else if (sizeof(int<32>) == MAX_INT_SIZE)
    cout << "Largest native int is 32 bits";
else if (sizeof(int<64>) == MAX_INT_SIZE)
    cout << "Largest native int is 64 bits";
else
    cout << "Largest native int is " << MAX_INT_SIZE;

A remaining issue that I have not seen addressed is binary constant

literals. C/C++ has decimal, octal, and hexadecimal constant literals.

i.e.

int d = 10;   // specified in base 10
int o = 010;  // 8 in base 10
int h = 0x10; // 16 in base 10

Personally, I have never seen the need for octal constants, but for

completeness, I propose a simple scheme for binary constant literals:

<binary-constant-literal> ::= 0z { binary-digit }*

i.e.

int b = 0z1101; // specified in base 2 (13 in base 10)

Currently, the best work-around solution I have come up with is:

const int Flag_one     = (1 << 0); // 00001
const int Flag_similar = (1 << 1); // 00010
const int Flag_yet     = (1 << 2); // 00100
const int Flag_another = (1 << 3); // 01000
const int Flag_example = (1 << 4); // 10000

But it would be much nicer to be able to do:

const int Flag_one     = 0z00001;
const int Flag_similar = 0z00010;
const int Flag_yet     = 0z00100;
const int Flag_another = 0z01000;
const int Flag_example = 0z10000;

I would appreciate feedback as to whether the above mentioned

considerations are indeed problematic, or are just wishful thinking on

my part.

Cheers