One of the paradoxical issues in the recurring debate about types is

Robert Harper's thesis that dynamic typing is a particular case of a

static typing. This article attempts to show the other extreme --

ultimately precise static typing. Curiously, at this extreme static

and dynamic typing converge again. This article presents an example of

a type system which lets a compiler detect a latent division by zero

in an expression, and flag it as a type error.

ML or Haskell would seem ideal languages for such an

exercise. However, their creators worked hard to ensure that their

type systems are decidable. I needed something more

expressive. Cayenne appears very appropriate -- alas I'm not familiar

with it well enough. Thus the extremely static type system is

implemented in C++. This article will show that any standard C++

_compiler_ can deal with algebraic data types, perform term

comprehension, and do type arithmetic. The C++ compiler is employed

as a term-rewriting engine -- an interpreter of untyped

lambda-calculus over type identifiers. It is the compiler that finds

an attempt to divide by zero and aborts the compilation -- before an

executable is built, let alone run. Well, as we will see, in our

static type system the distinction between compile-time and run-time

becomes a bit blurry. We rely extensively on a partial template

instantiation. It is a standard C++ feature that offers term

comprehension and resolution of polymorphic types. The code in this

article compiled with egcs-2.91.66 compiler, on FreeBSD and Linux.

A lambda-interp.cc code implements a type system which assigns a

distinct type to every number. Inference rules let the compiler

determine specific types for the results of arithmetic operations and

expressions. The ultimate specialization of types helps the compiler

catch even the most subtle errors. In particular, an expression whose

evaluation eventually includes a division by zero will be flagged as

mistyped. The C++ compiler acts as an interpreter of an untyped

term-rewriting calculus. C++ template expression expansion is

basically untyped: all template arguments have the same type --

'typename' -- at least in our example. Identifiers of terms in this

calculus are C++ _types_.

Excerpted below are type inference rules for addition. They are

specified as a set of partially-specialized template classes. These

classes have _no_ members -- they merely denote type term rewriting

rules.

template<typename T1, typename T2>

struct Add { // Traits class

Quote:

};

// (a+1) + (b+1) ==> a + (b + 1 + 1)

template<typename T1, typename T2>

struct Add<Succ<T1>, Succ<T2> > {

typedef typename Add<T1, Succ<Succ<T2> > >::sum sum;

Quote:

};

template<typename T>

struct Add<Zero,T> { // 0 + a ==> a

typedef T sum;

Quote:

};

The lambda-interp.cc code defines a few more rules dealing with type

inference for addition of negative numbers.

Here's the appropriately typed addition operation:

template <typename T1, typename T2>

typename Add<T1,T2>::sum add(T1 a, T2 b)

{

typename Add<T1,T2>::sum c;

return c;

Quote:

}

The following classes define type inference rules for division. They

infer the type of the result of div(T1 a,T2 b), by repeatedly

"subtracting" the Divisor from the Dividend, until the result becomes

zero or negative type.

template<typename T1, typename T2>

struct Div { // Traits class

Quote:

};

template<typename Dividend, typename Divisor, typename Ratio>

struct DivEval { // Traits class

Quote:

};

template<typename Divisor, typename Ratio>

struct DivEval<Zero, Divisor, Ratio> {

typedef Ratio value;

Quote:

};

template<typename T, typename Divisor, typename Ratio>

struct DivEval<Neg<T>, Divisor, Ratio> {

typedef typename Add<Ratio, Neg<Succ<Zero> > >::sum value;

Quote:

};

template<typename Dividend, typename Divisor, typename Ratio>

struct DivEval<Succ<Dividend>, Divisor, Ratio> {

typedef typename Add< Succ<Dividend>, Neg<Divisor> >::sum

new_dividend;

typedef typename DivEval< new_dividend, Divisor, Succ<Ratio> >::value

value;

Quote:

};

template<typename T1, typename T2>

struct Div<T1, Succ<T2> > {

typedef typename DivEval<T1, Succ<T2>, Zero>::value ratio;

Quote:

};

There are no data members in these Add<>, Div<>, DivEval<> template

classes. There is no conventional number arithmetics either. There is

only term rewriting, which amounts to arithmetics on types. C++

compiler acts as a term decomposition/composition engine.

Here is the main function

#define SHOWVAL(X) printf("The value of " #X " is: %d\n", X .eval())

main()

{

Succ<Succ<Zero> > two;

Succ<Zero> one;

SHOWVAL( two );

SHOWVAL( add(neg(one),neg(two)) );

SHOWVAL( add(two,neg(two)) );

SHOWVAL( div(one,two) );

SHOWVAL( div(two,one) );

SHOWVAL( div(add(one,add(two,two)),add(div(two,two),one)) );

Quote:

}

and the result of its execution:

The value of two is: 2

The value of add(neg(one),neg(two)) is: -3

The value of add(two,neg(two)) is: 0

The value of div(one,two) is: 0

The value of div(two,one) is: 2

The value of div(add(one,add(two,two)),add(div(two,two),one)) is: 2

It is instructive to take a look at the gas assembly code generated,

for example, for the last expression:

pushl $2 ; the value to print

pushl $.LC10 ; the format string

call printf

The result of the expression -- number 2 -- has been determined during

compilation. The executable code is only to print out that result.

When you add a statement:

SHOWVAL( div(add(two,two),add(div(two,two),neg(one))) );

to the main() function above, the C++ compiler says:

lambda-interp.cc:234: no type named `ratio' in `struct

Div<Succ<Succ<Succ<Succ<Zero> > > >,Zero>'

lambda-interp.cc:234: confused by earlier errors, bailing out

Look at the types of the operands! The _compiler_ has figured out that

a divisor in one of the div() operations has the type of Zero, which

makes the expression mistyped. Division by zero is a type error!

The code can be downloaded from

http://www.*-*-*.com/ ~oleg/ftp/c++-digest/lambda-interp.cc

It is a single self-contained, commented source code file.

On one end are dynamically typed languages. Their (byte) compiler is

simple and fast; it cheerfully accepts almost every garbage we can

throw at it. The run-time of these languages is inherently slower as

every operation must first run a battery of tests on its operands:

that the referenced variables are indeed defined, the operands of a

div operation are indeed numbers, and furthermore, the divisor is not

zero. If a branch of code was not executed in a particular run, the

errors in this branch, however grave or obvious, remain unreported.

The ultimately statically-typed "language" outlined above achieves the

other extreme. The run-time is now fast and trivial as it only prints

the results that have already been determined by the

compiler. Compilation on the other hand is highly involved. For

example, to make sure that no divisor in all division operations is

zero, the compiler basically has to evaluate the divisor expressions

-- in effect compute them at compile time. In general, such an

evaluation may be involved, and may even fail to terminate.

Thus the extreme static typing looks identically to dynamic typing,

only with compile- and run-time phases reversed. The optimal approach

seems to be to do some computation at compile time (e.g., making sure

all referenced variables are declared, or that operands of any

numerical operation are indeed numbers) and leave more tenuous checks

and the rest of the computation to run-time. As we know, the middle

ground is the most controversial position.

Sent via Deja.com http://www.*-*-*.com/

Before you buy.