CISC Microcode (was Re: RISC Mainframe) 
 CISC Microcode (was Re: RISC Mainframe)


Quote:


>> >I'm glad you noticed it. We're agreed then that these idiot savants generate
>> >code that is as fast as a human can write, if not the same sort of code.

>> This is not always the case.  It is quite possible that a simple idea which
>> a human can see is not within the category of those known to the compiler.
>> No compiler can change the algorithm to one that it does not know about.
>> I can, and do.
>Yes, I quite agree, different problem spaces demand different algorithms for
>solution. However, and this is the key point, it is (in 1992) extremely rare
>that you encounter a constraint on the problem space due to the hardware that
>has any significant effect on the choice of algorithm. Things like 64K segment
>limits, massively parallel hardware, or the absence of a FPU... yes. Things
>like the presence or absence of autoincrement addressing... no. It's been many
>years that microoptimisation for the instruction set has been cost-effective
>outside of tight loops.

Try designing algorithms for massively parallel, or even vector, machines.
The massively parallel SIMD machines really scream for VVVCISC, as any
conditional operation is a major bottleneck.  The conditioning in hardware
can be very cheap, as now each processor does it independently.

Things like the use of fixed-point operations on floating-point numbers, the
speed of conversion, the use of bits, and similar operations; packing and
unpacking of floats; and quite a few others are still important.  With a decent
way of writing machine instructions, I consider a couple of hundred
instructions, certainly not what I would call a tight loop, as reasonable.

I do not know how important autoincrement addressing is, but vector machines
use a special case of it to great advantage.

Quote:
>> >Because allowing the compiler to find machine-specific optimisations is more
>> >cost effective than doing it myself. Because the code I write runs on Sparcs,
>> >68000s, 80386s, 80286s, VAXen, and 68020s. Because next month it might be
>> >running on MIPS or 88000. Next year it might be running on Supersparc or Alpha.
>> >Because optimising when you don't need to is a waste of resources.
>> This means that the language should be expanded to include the alternatives.
>Why? So I can write the same code half a dozen times? Whether I do so by
>coding a loop in five different assembly dialects or by coding a bunch of
>alternatives in Herman Rubin's Perfect Language it's still a waste of time.

There is not, was not, and never will be a "perfect language."  The superfast
sub-imbecile in the compiler, and even an assembler instruction reorderer, can
try out lots of alternatives, including investigating many branches.  The
HLL gurus claim that their language will make it unnecessary for programmers
ever to use machine codes.  My proposal is for far less; let the programmer
set up translations, some of which will be for quite considerable use, and
the machine operations may or may not be invoked.  

An example, which may or may not use machine code, would be to find
the integer closest to a floating-point number.  Now this, and other
such "standard" situations, could be put in a dictionary of alternatives.
Quite some time ago, for simulation, I saw that producing vectors of
random variables, instead of separate calls, would greatly speed up
things.  I have not seen this in wide use on the standard libraries.

For a very simple operation, which can be extremely costly, consider that of
using a random bit to put a sign on a random number.  In fact, consider the
operation of putting the sign of x on y, or reversing the sign of y if x is
negative.  This is a fast operation on some machines, and a slow operation
on others, and if it is slow, the algorithm in which it is used should be
changed to avoid this problem, if at all possible.


Quote:
>> >> >I expect any good programmer to code for the abstract machine defined for
>> >> >the language. Remember, software longa, hardware brevis.
>> >> This might be reasonable if we had a language produced by people who have
>> >> some respect for the capabilities of humans.
>> >We do. We have dozens of them.
>> Name one which recognizes the various things I have previously published
>> in a syntax intended for fast use by a human being.
>The various things you're looking for are in the "tight loop" category, and
>the effort of coding them in assembler is negligible compared to the cost of
>coding the rest of the program in a language designed for microoptimisation
>rather than expression of an algorithm.
>C++ with embedded assembly and inline expansion.
>Forth.
>C with inline assembler.
>Turbo Pascal.
>Dec fortran 77.
>Most modern Lisp dialects.

I have been mainly using C with inline assembler.  But its headaches are
massive.  And my complaint that the great bulk of assembler languages are
not designed for human use definitely holds.  At least one Unix manual had
the explicit statement that the assembler was essentially for compiler
maintainers ONLY.

Forth, and in fact all stack machines, are overly restrictive.  "Superinlining,"
which essentially uses code expansion, not subroutine conversion, is what
should be done.  Even if memory <-> register is as fast as claimed, it still
involves lots of moving.  An inlined procedure, to be good, must take the
arguments where it finds them, and put the results where they are wanted,
not using any standard memory or register locations.

Quote:
>> Lisp seems to have a
>> fairly reasonable collection of primitives, but a human being should not
>> be REQUIRED to use clumsy prenex notation.
>Oddly, many human beings prefer it. HP has managed to remain the last
>successful US calculator manufacturer by using a similar notation.

I do not believe that many human beings would like to have to type
plus(x,y) for x+y, minus(x,y) [or is it minus(y,x)?] for x-y, etc.
Nor should they have to type iplus for integers, fplus for floats,
etc.

HP uses stack notation, not prenex.  For manual operation, this minimizes
work, and even in a computer, this is the actual function or subroutine
procedure, even if it is not written that way.  I am not aware of any
calling sequences where the call is issued first, and then the arguments
are obtained, or some arguments are issued, and then the instruction, and
then other arguments.  The prenex form is easier for the parser.  I have
seen the structure of a microcomputer assembler, and it uses the opcode
to branch to the parsing of the remaining part of the instruction.  This
is easy for the assembler only.

And even the HP uses registers, which are really memory, to handle arguments,
instead of having to manipulate a stack.  The current registers, not the
registers on the early machines, are really a specialized high-speed memory,
not where the computations are done.  Too few machine designers have realized
this and included registers in the addressable memory, so that registers
themselves could be indexed and addressed as variable locations.

Now I have no problems with the assembler, optimizer, etc., using things in
this manner.  But I would like to write things my way, and you would like to
write things your way, and I see no reason why we cannot be accommodated.
What we need is a versatile macro processor to translate between.  
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
Phone: (317)494-6054

{purdue,pur-ee}!pop.stat!hrubin(UUCP)



Mon, 16 Jan 1995 21:16:51 GMT  
 CISC Microcode (was Re: RISC Mainframe)

Quote:

>> HP uses stack notation, not prenex.  For manual operation, this minimizes
>> work, and even in a computer, this is the actual function or subroutine
>> procedure, even if it is not written that way.  I am not aware of any
>> calling sequences where the call is issued first, and then the arguments
>> are obtained, or some arguments are issued, and then the instruction, and
>> then other arguments.
>On the B6700 (but my memory fails me about the details) a procedure
>call like
>    foo(x, y)
>compiled into
>    MKST            % MarK the STack
>    NAMC (foo)      % NAMe Call (push the address of foo)
>    VALC (x)        % VALue Call (push the value of x)
>    VALC (y)        % (push the value of y)
>    ENTR            % ENTeR the procedure named just after MKST
>This approach introduces some interesting possibilities.  I have seen
>paper designs where there are two (or more) CPUs and the code sequence
>is
>    START   foo     % other/next CPU starts running code of foo
>    PASS    x       % provide value of x
>    PASS    y       % provide value of y
>The entry code for foo would be something like
>    ...
>    RECEIVE arg1    % the first time it is needed!
>    ...
>    RECEIVE arg2    % the first time it is needed!
>    ...
>so that other computations in foo() can overlap with argument calculation.
>The same idea can be used for returning a result.  For some languages
>where there are a lot of function calls this can result in about a 1.5x
>speed-up.  (Think register windows + lazy evaluation, sort of.)
>I don't know if anyone has ever built such a machine.
>(I suspect that one major difference between Herman Rubin and other people
>on the net is that he is trying to get the last ounce out of "supercomputers",
>while other people are rather more interested in _not_ depending on what
>brand of box they're using.  Me, I think "Crays" are an edible arthropod.)

On the contrary, I want to get out of each box what it can deliver.  Also,
I believe that a more flexible architecture could deliver a lot more.  What
you have written is partial inlining, not a straight subroutine call.  For
this to be even possible, the compiler must know how the called procedure
works.  This almost precludes separate compilation, and the maintenance
of subroutine libraries.  I say almost, because I can think of ways around
some of it, but at least some form of synchronization will be needed.

There are other ways of handling subroutines other than calls.  These
definitely should be used more widely.  Some machines have transfers
which push a return address on the stack, or in a register, and allow
a return to that address.  These "calls" do not involve a context
switch, and the "subroutine" does not necessarily have a separate
set of variables, and separate compilation is not at all possible.
I have not seen this type of operation in a HLL.  Of course, if
transfer to a register is possible (shades of the ASSIGN operator
in Fortran, not as originally used), more versatility is possible.

[For those who do not know of this much-maligned possibility, in
some dialects of Fortran one could have

        ASSIGN 385 TO K

and later

        GOTO K]

We have the question which must repeatedly come up in comp.arch:

        Has this instruction disappeared because it is not in the languages?

--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
Phone: (317)494-6054

{purdue,pur-ee}!pop.stat!hrubin(UUCP)



Sat, 21 Jan 1995 20:59:28 GMT  
 CISC Microcode (was Re: RISC Mainframe)

Quote:

> So now C has TWO distinct
> label constructs:  foo: for "local" labels, setjmp for potentially non-local
> ones.  And you can "assign" the latter, but not the former.  GCC, with its
> label addresses, could at least have unified the two - but, no.

I will be the last to claim that C is perfect, but I think this is an important
distinction. It's bad enough having a label and trying to figure out how it got
there, but if you have to look for gotos in all the subroutines as well...?

Personally, I'd prefer if C had no goto at all, but an organized system of
contexts, so you could perform setjmp/longjmp operations within a routine.

That, and switch, would cover virtually all use of goto while reasonably
restricting its scope. It would also force compiler writers to optimise
switch and longjmp instead of inventing clever hacks of limited utility.
--
Peter da Silva                                               `-_-'
$ EDIT/TECO LOVE                                              'U`
%TECO-W-OLDJOKE Not war?                        Have you hugged your wolf today?
Ferranti Intl. Ctls. Corp.      Sugar Land, TX  77487-5012       +1 713 274 5180



Mon, 30 Jan 1995 22:50:06 GMT  
 CISC Microcode (was Re: RISC Mainframe)

Quote:
> Personally, I'd prefer if C had no goto at all, but an organized system of
> contexts, so you could perform setjmp/longjmp operations within a routine.

I'm not quite sure what you mean by "organized system of contexts".  I hope
this mechanism would allow multilevel loop breaks.  If not, I'd like
more details on it.

        <mike



Thu, 02 Feb 1995 05:39:50 GMT  
 CISC Microcode (was Re: RISC Mainframe)

Quote:


> > Personally, I'd prefer if C had no goto at all, but an organized system of
> > contexts, so you could perform setjmp/longjmp operations within a routine.
> I'm not quite sure what you mean by "organized system of contexts".  I hope
> this mechanism would allow multilevel loop breaks.  If not, I'd like
> more details on it.

Basically, imagine integrating setjmp/longjmp with the language to a greater
extent. Instead of writing:

        retry:
                insert amazing mass of nested loops here {
                        goto retry;
                }
                and put in some more code here
                if something horrible happens {
                        goto fail;
                }
                and still more code
        fail:

You use a formal setup like so:

        context foo: {
                insert amazing mass of nested loops here {
                        continue foo;
                }
                and put some more code here
                if something horrible happens {
                        break foo;
                }
                and still more code
        }

Now, to generalize:

        context foo;

        ...

        context foo: {
                yadda yadda yadda
                bar(foo);
        }

void bar(foo)
context foo;
{
        blah blah blah;
        if something horrible happens {
                break foo;
        }

}

By putting it directly in the language, you allow for more optimizations.
By replacing goto, you clean up a messy part of the language without losing
any general capability (switches with good optimization and this context
construct can replace every use of goto).

As a further elaboration (my long-term fantasy, real coroutine support in C):

        context other, self;
        stack_t otherstack[MIN_STACK + 20*STACK_FRAME + 20*30*sizeof(int)];
        extern void token(context, context);

        bind other: otherstack, token, (self);
        bind self;

        while((t = pause(other, 1)) != EOF) {
                handle_token(t);
                if we need to bail out
                        (void)pause(other, 0);
        }

token(other)
context other;
{
        char buf[whatever];

        while(gets(buf)) {
                parse tokens here;
                if it looks like a foo
                        if(!pause(other, foo))
                                break;
                ...
        }

}

Yes, you can build subroutine libraries that do this, but they're inherently
non-portable. This sort of thing should be part of the language, and is much
more important to me than expanding the type structure with object-oriented
stuff.
--
Peter da Silva                                               `-_-'
$ EDIT/TECO LOVE                                              'U`
%TECO-W-OLDJOKE Not war?                        Have you hugged your wolf today?
Ferranti Intl. Ctls. Corp.      Sugar Land, TX  77487-5012       +1 713 274 5180


Sat, 04 Feb 1995 22:54:47 GMT  
 CISC Microcode (was Re: RISC Mainframe)

[Impressive demonstration by Peter removed]
I'm sorry, but aren't you just implementing Exceptions here?

Greetings, Bert
--
#include <std/disclaimer>

  Bert Laverman,  Dept. of Computing Science, Groningen University



Mon, 06 Feb 1995 15:44:23 GMT  
 CISC Microcode (was Re: RISC Mainframe)

[Elided]

| Basically, imagine integrating setjmp/longjmp with the language to a greater
| extent. ...

[More elided]

| But putting it directly in the language, you allow for more optimizations.
| By replacing goto, you clean up a messy part of the language without losing
| any general capability (switches with good optimization and this context
| can replace every use of goto).

[Still more elided]

| Yes, you can build subroutine libraries that do this, but they're inherently
| non-portable. This sort of thing should be part of the language, and is much
| more important to me than expanding the type structure with object-oriented
| stuff.

What you describe was generalised by Peter Landin in '65, who called
it the J operator. It has been adopted, in particular, by languages
like Scheme and ML which use it to define powerful control structures.
(The FP/Lisp community calls it ``call with current continuation'' or
call/cc).  For examples of the usages that you outlined, see the
Scheme stream-based IO or the Cornell work on Concurrent ML.

Cheers,
Dipankar Gupta
Hewlett-Packard Laboratories
Bristol UK

PS: I don't think that ML has continuations as a part of the Standard,
but the New Jersey implementation provides call/cc as a first-class
language element.

References:
[1] Landin, P. A correspondence between Algol 60 and Church's lambda
notation: Part 1 [C. ACM 8(2) pp89-101, Feb. 1965]

[2] Clinger, W and Rees, J (eds). Revised^4 Report on the Algorithmic
Language Scheme.
--



Tue, 07 Feb 1995 00:27:16 GMT  
 CISC Microcode (was Re: RISC Mainframe)

Quote:

> [Impressive demonstration by Peter removed]
> I'm sorry, but aren't you just implementing Exceptions here?

Well, sort of. You can *use* this to implement exceptions (for example,
set up an inheritance mechanism) but the same is true of longjmp/setjmp.
Plus, you don't traditionally combine exceptions and coroutines into a
single concept.
--
Peter da Silva                                               `-_-'
$ EDIT/TECO LOVE                                              'U`
%TECO-W-OLDJOKE Not war?                        Have you hugged your wolf today?
Ferranti Intl. Ctls. Corp.      Sugar Land, TX  77487-5012       +1 713 274 5180


Wed, 08 Feb 1995 00:20:43 GMT  
 CISC Microcode (was Re: RISC Mainframe)

        [ ... non local control transfer ... ]

laverman> [Impressive demonstration by Peter removed] I'm sorry, but
laverman> aren't you just implementing Exceptions here?

Ah, one of my pet peeves! Too bad for you! :-)

There is *absolutely no way* to implement exceptions. There is no such
thing as an exception that can be implemented. Virtually all the
research (including voluminous Ph.D. theses, proceedings, books, ...)
on exceptions is misguided and pointless.

There have been even incredibly ridiculous attempts to have exceptions
as language entities, and the ensuing and totally pointless discussions
(should they be a first class data type? should they be statically or
dynamically scoped? and so on...) have provided cannon fodder for many
a silly debate.

On the other hand there is some relationship between 'exception
handling' and non nested control transfers, and maybe it will be useful
if I make it clear.

First of all, let's talk about what an 'exception' is: it is a *fact*,
not a language construct or value; it is the fact that the state of the
computation has become undefined.

This happens iff a procedure implements a non total function, and the
procedure is applied outside that function's domain.

This happens iff the precondition of a branching construct is weaker
than the union of the preconditions of the branches. For example,
computing

        3 / 0

is undefined because in the implementation of the / procedure (whether
it be microcode, hardwired, macrocode) there is something like:

  if
    divisor == 1 ->    quotient = dividend;
  | powerof2(divisor) ->  quotient = shiftright(dividend,log2(divisor));
  | not powerof2(divisor)
    and divisor <> 1
    and divisor <> 0 ->          quotient = newt_raph_div(divisor,dividend);
  fi

So, 'exception handling' is just making all procedures implement total
functions. Unfortunately this in general cannot be done at procedure
definition time, because extending certain functions so that they become
total can be done in several different ways, and there cannot be
previous agreement as to how.

For example in some cases division by zero should return a token for
'infinite', in some case it should return zero, and so on.

Therefore, 'exception handling' should normally consist in completing
all branching statements so that their precondition is never weaker than
the union of those for their branches, and each new branch should be a
call to a *dynamically* scoped procedure name.

for example:

  if
    divisor == 0 ->    quotient = div_by_zero(divisor,dividend);
  | divisor == 1 ->    quotient = dividend;
  | powerof2(divisor) ->  quotient = shiftright(dividend,log2(divisor));
  | not powerof2(divisor)
    and divisor <> 1
    and divisor <> 0 ->          quotient = newt_raph_div(divisor,dividend);
  fi

The body of div_by_zero may then contain any arbitrary logic that the
programmer of the upper levels of the program wishes.

In particular two choices seem to be common for the body of such a
procedure, and they are returning a 'default' value, or doing a non
local control transfer to higher level handling code. In the latter case
the dynamically scoped procedure does not just substitute for the
"missing" branch, but also for a number of intervening abstraction levels
that depend on the outcome of that branch.

It is only in the second case that non local control transfers enter the
'exception' handling picture at all; in such a case they are required,
but they are useful for many other reasons.

Unfortunately there is some silly tradition by which even relatively
serious authors pretend that exceptions are 'objects' or language
'entities', or exception handling is made to coincide with non local
control transfers, or other entirely inappropriate perversions.
--

Dept of CS, University of Wales    | UUCP: ...!mcsun!ukc!aber-cs!pcg



Fri, 10 Feb 1995 02:28:04 GMT  
 CISC Microcode (was Re: RISC Mainframe)

mwalker> I'm reading the r4rs report on Scheme, and this "Call with
mwalker> current continuation" confuses the heck out of me. I can't
mwalker> figure out what it's good for. Is it just kind of a
mwalker> setjmp/longjmp? I guess I don't see the similarity people have
mwalker> been posting about. Can anybody straighten me out on this?

Here I have rack and hot irons, ready for use. :-)

First, two references: John Allen, "Anatomy of Lisp"; Abelson &
Sussman, "Structure and Interpretation of Computer Programs".

Second, a small explanation:

All languages have implicit an environment graph and a control graph.
Nodes in the environment graph are token-value associations, and nodes
in the control graph are cpu-state frames.

  Low-level explanation:
    Given a procedure, it is instantiated by creating a new environment
    node, filling it out with the arguments, creating a new control node,
    setting the PC in it to the beginning of the procedure, and then
    making it active instead of the one describing the currently running
    procedure instance.

A given procedure may have many instances, e.g.  for recursion, and
correspondingly many environment and control nodes may refer to it.

Procedure instances may be values that are passed around, or may not be.

Now, most languages have restrictions to ensure that the control and
environment graphs are really trees, and that such trees can only be
walked depth first. In other words that the control and environment
trees can be implemented as a stack, as no procedure instance outlives
the procedure instance that created it.

While the control and environment graphs are normally treated as
linearized trees, in a LIFO fashion, there are at least two cases where,
even in traditional languages, strict nesting must be violated. The two
cases are procedure parameters (downward funargs) and non local gotos
(catch/throw or longjmp).

With procedure parameters, depending on the language, an environment
frame is passed downwards, skipping several intermediate levels of the
stack. With non local gotos, control is transferred upwards, skipping
several levels of the stack.

Both facilities however are restricted to stack operation; procedure
parameters cannot be passed upwards (except for particular cases) and
control cannot be transferred downwards. The reasons are that,  in stack
based implementations, if a reference to an environment node is passed
upwards, the environment node will have been deallocated, and if a
control transfer is done downwards, the control node will also have been
deallocated.

However if environment and control nodes are made to exist in a non
stack like fashion then environment nodes can be passed 'upwards' and
control transfers can happen 'downwards'. An environment node that
persists upwards is called a closure, and a control node that
persists downwards is called a continuation point.

Once these two entities can be manipulated, an entity called a
continuation (or generator or coroutine, at times) can be assembled out
of a continuation point and a closure. A continuation is just a
procedure instance made persistent and resumable.

call/cc 'just' gives you a handle on the currently executing procedure
instance. Or rather a copy of that, usually -- most scheme systems
create procedure instances on the stack by default, and only in the
relatively rare case where one uses call/cc the procedure instance gets
copied to the heap; this is safe because procedure instance handles can
only be gotten via call/cc. In older, dynamically scoped lisps, the
function primitive could be used to obtain a closure, but that was all
(in scheme, which is lexically scoped, it's define that is mostly used
to create closures, statically that is).

Note that in scheme there is no direct way to get separate access to the
environment and control node components of a procedure instance.

Now, what is the similarity between setjmp and call/cc?

  That setjmp creates a continuation point that can be resumed by some other
  arbitrary procedure instance, as long as the resuming procedure instance
  is newer than the resumed one. In other words setjmp/longjmp are an
  escape, just like call/cc, from the strict nesting of procedure
  call/return.

And what is the difference?

  There are two of them: setjmp defines only a continuation point, not a
  closure (it even loses parts of the current closure, if 'volatile' is
  not used!), while call/cc does both, and the continuation point defined by
  setjmp cannot be passed upwards, while the continuation defined by
  call/cc can.

There is also another relationship of sorts between the two:

  In some implementations of call/cc there is some abuse of setjmp, where
  it is used to define the control node of a continuation, even in a
  downward sense, by second guessing setjmp's implementation and changing
  stacks behind its back. This works, but is hardly portable.

Finally, a small idea on what call/cc is good for. Well, to realize in
scheme a lot of nice tricks, or rather fundamental concepts, like
coroutines, streams, generators, classes, actors, exception handling,
... by letting you create arbitrary control and environment graphs.
--

Dept of CS, University of Wales    | UUCP: ...!mcsun!ukc!aber-cs!pcg



Fri, 10 Feb 1995 01:43:19 GMT  
 