S16 mimics F21 

On Sun, 30 Jul 1995 07:34 GMT,

 >\ That fine (Novix) idea won't quite fit here, I think.
 >\ There is already a call opcode, and I want to use F21's
 >\ original 27 instructions for several reasons.
 >
 >there's a call onpage, but call offpage is time-consuming, and
 >if s16 will only exist as a simulator then maybe it's worth the
 >special purpose bit...?

Hm, enticing, but...
The "call" opcode may occur in any slot. How about this:

    A "call" in slot 0 is accompanied by a 10-bit address.
    A "call" in slot 1 is accompanied by a 5-bit address.
    A "call" in slot 2 is followed by a 15- or 16-bit address.

I still hesitate to implement quite a new element. This is a steady
temptation with RISC engines. Look at the code of SWAP:



The first version is so long that it may change our habits using
"noise words". The second version corrupts the contents of the A
register. So why not implement a 28th opcode for a rapid SWAP
function? Or, because it's so easy, the 29th opcode could be a
quick OR. BTW, what about a one-step multiplier a la RTX 2000?

 >otoh, if this is just a simulator, then packing things on to
 >silicon isn't going to be a design objective...

Well, just a simulator. But please take into account the turbo
version with an add-in card driven by a non-virtual F21... :-)

Thank you for your suggestions. Still listening...

  Cheers                       Johannes Teich | Internet:    __  __c

 _________________________________ CompuServe: 100522,135 ____(_)/_(_)_
 (Murnau, Bavaria/Germany)



Fri, 16 Jan 1998 03:00:00 GMT  
 S16 mimics F21


Quote:
>On Sun, 30 Jul 1995 07:34 GMT,

> >\ That fine (Novix) idea won't quite fit here, I think.
> >\ There is already a call opcode, and I want to use F21's
> >\ original 27 instructions for several reasons.

> >there's a call onpage, but call offpage is time-consuming, and
> >if s16 will only exist as a simulator then maybe it's worth the
> >special purpose bit...?

>Hm, enticing, but...
>The "call" opcode may occur in any slot. How about this:

>    A "call" in slot 0 is accompanied by a 10-bit address.
>    A "call" in slot 1 is accompanied by a 5-bit address.
>    A "call" in slot 2 is followed by a 15- or 16-bit address.

On F21 we label the slots starting with 0 as the first.  So on F21
a branching opcode in slot 0 takes a 15 bit argument and uses
it as a 14 bit page branch or 13 bit home page branch.  In slot
1 you get a 10 bit on page branch argument.  In slot 2 or slot 3
you get a 10 bit argument, but the opcode also fills some of
those bits.  So the upper 5 bits of the branch address would also
be the branch opcode if it was in slot 2, and there is only one
address for each branch opcode if they appear in slot 3.

In 8 bit code operation only the instruction in slot 3 is executed.
So with branch instructions the opcode forms the lower 5 bits of
the address, and only the upper three bits of the 10 bit address that
are available in 8 bit memory may be set.  This means that in 8
bit mode the branch opcodes can only use 8 addresses on each page.
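
The slot rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a 20-bit word holding four 5-bit slots, with slot 0 in the most significant bits, and hypothetical helper names (`slot_opcode`, `branch_argument`, `eight_bit_targets`). It does not model page bits or any other detail of real F21 address formation.

```python
WORD_BITS = 20   # assumed word width: four 5-bit slots
SLOT_BITS = 5

def slot_opcode(word, slot):
    """Return the 5-bit opcode in `slot` (slot 0 = most significant, an assumption)."""
    shift = WORD_BITS - SLOT_BITS * (slot + 1)
    return (word >> shift) & 0b11111

def branch_argument(word, slot):
    """Return the argument bits a branch opcode in `slot` sees, per the post."""
    width = 15 if slot == 0 else 10   # slot 0: 15 bits; slots 1-3: low 10 bits
    return word & ((1 << width) - 1)

def eight_bit_targets(opcode):
    """In 8-bit mode the low 5 address bits are the opcode itself and only
    three more bits can be set, so each branch opcode reaches 8 addresses."""
    return [(hi << 5) | opcode for hi in range(8)]

# Example word with distinct values in each slot.
word = (0b10101 << 15) | (0b00011 << 10) | (0b11100 << 5) | 0b00001
print(bin(branch_argument(word, 0)))        # low 15 bits
print(bin(branch_argument(word, 2) >> 5))   # upper 5 argument bits = slot 2 opcode
print(len(eight_bit_targets(0b10010)))      # number of reachable addresses
```

Note how a branch in slot 2 gets a 10-bit argument whose upper five bits are the branch opcode itself, which is the overlap described above.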

Quote:
>I still hesitate to implement quite a new element. This is a steady
>temptation with RISC engines. Look at the code of SWAP:




I would suggest an improved definition for SWAP:

     : SWAP  ( n1 n2 -- n2 n1 )  over push push drop pop pop ;

This is slower than SWAP' but does not destroy A as your SWAP.

I had a choice between OVER and SWAP for one of the instructions
decoded by F21.  I asked for SWAP, I got OVER.  It was one of the
few things I asked for that I didn't get.  I think Chuck prefers
OVER, or OVER was easier to implement than SWAP; I am not sure.

But OVER is easy to define if you have SWAP:

     : OVER  ( n1 n2 -- n1 n2 n1 )  push dup pop swap ;

This is as fast as SWAP' without the use of the A register, while
SWAP takes six instructions.  So this is why I would have preferred
SWAP to OVER.
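
Both definitions can be checked with a tiny stack simulator. The sketch below is in Python rather than on an F21, and it assumes that push and pop move one item between the data stack and the return stack, as the names in the definitions suggest. The `run` helper is hypothetical and models nothing else about the machine.

```python
def run(words, data):
    """Execute primitive names on a copy of the data stack.
    Returns (data_stack, return_stack) with the top of stack last."""
    d, r = list(data), []
    for w in words:
        if w == "dup":
            d.append(d[-1])
        elif w == "drop":
            d.pop()
        elif w == "over":
            d.append(d[-2])
        elif w == "swap":
            d[-1], d[-2] = d[-2], d[-1]
        elif w == "push":          # data stack -> return stack
            r.append(d.pop())
        elif w == "pop":           # return stack -> data stack
            d.append(r.pop())
        else:
            raise ValueError(f"unknown primitive: {w}")
    return d, r

SWAP_FROM_OVER = "over push push drop pop pop".split()   # six instructions
OVER_FROM_SWAP = "push dup pop swap".split()             # four instructions

print(run(SWAP_FROM_OVER, [1, 2]))   # data stack should end as n2 n1
print(run(OVER_FROM_SWAP, [1, 2]))   # data stack should end as n1 n2 n1
```

Neither sequence leaves anything behind on the return stack, and neither needs the A register.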

Decoding opcodes and which operations can be assigned to which
opcodes is limited by a number of constraints in Chuck's MISC
design.  In order to make timing work the prefetch mechanism
determines instruction type by one bit.  Prefetch can only begin
after all memory references have executed.  All of the non-memory
reference opcodes are assigned.

Quote:
>The first version is so long that it may change our habits using
>"noise words". The second version corrupts the contents of the A
>register. So why not implement a 28th opcode for a rapid SWAP
>function? Or, because it's so easy, the 29th opcode could be a
>quick OR. BTW, what about a one-step multiplier a la RTX 2000?

Well, on F21 there were constraints saying that SWAP, OR, or *
could not be decoded as a 28th, 29th, or 30th opcode because
they are not memory operations.  Other than that I cannot see that SWAP
is more complex than OVER (but it might be).  And I don't think
Chuck could do a single cycle * considering that he has not yet
done a single cycle + where carry must move through more than
a few bits.  I won't say that you couldn't do a single cycle
* in 3ns in .8 micron, but I know Chuck couldn't do it right
now.  When he goes to .5 micron everything will be running at
2ns or less so a single cycle * is just as difficult.  Even if
Chuck could figure out how to make a single cycle multiply
it would certainly take up a lot of silicon.  More than the
present chip I would say.

Quote:
>Well, just a simulator. But please take into account the turbo
>version with an add-in card driven by a non-virtual F21... :-)

:-)

Let me take this chance to say that F21 was submitted to Mosis
last week for fabrication.  So if they have no problems there they
should return prototypes around October 1.  :-)

Jeff Fox
Ultra Technology
2510 10th St.
Berkeley CA 94710
(510) 848-2149

http://www.dnai.com/~jfox



Sat, 17 Jan 1998 03:00:00 GMT  
 S16 mimics F21

Quote:

>is more complex than OVER (but it might be).  And I don't think
>Chuck could do a single cycle * considering that he has not yet
>done a single cycle + where carry must move through more than
>a few bits.  I won't say that you couldn't do a single cycle
>* in 3ns in .8 micron, but I know Chuck couldn't do it right
>now.  When he goes to .5 micron everything will be running at
>2ns or less so a single cycle * is just as difficult.  Even if
>Chuck could figure out how to make a single cycle multiply
>it would certainly take up a lot of silicon.  More than the
>present chip I would say.

There's a thing called a Cray multiplier that consists of adders arranged
in a diamond pattern, like how doing "*" by hand looks:

       0101
      x1001
   --------
       0101
      0000
     0000
    0101
   ========
   00101101

If the multiplier has a "0" bit, then all zeros go into the row's adder,
else the multiplicand is gated into the row's adder.  For F21, you would
need 21*21 adders.  I'm not sure how many transistors that would equate
to.  But, it looks like the kind of regularly tiled structure that could
be laid out easily in OKAD.  The ripple carry delay would not be a lot worse
than for "+".  I believe there are lots of optimizations to this design that
can reduce the required number of gates by 1/2 or so.
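
The row-gating idea can be modelled in software. The Python sketch below only mirrors the arithmetic of the diagram (one gated, shifted copy of the multiplicand per multiplier bit, then a sum). It says nothing about transistor counts or carry timing, and the function names are made up for illustration.

```python
def partial_product_rows(multiplicand, multiplier, bits):
    """One row per multiplier bit: the shifted multiplicand if the bit is 1,
    all zeros otherwise (the gating described above)."""
    rows = []
    for i in range(bits):
        bit = (multiplier >> i) & 1
        rows.append((multiplicand << i) if bit else 0)
    return rows

def array_multiply(multiplicand, multiplier, bits=21):
    """Sum the gated rows, as the tiled adders would."""
    return sum(partial_product_rows(multiplicand, multiplier, bits))

# The 4-bit example from the diagram: 0101 x 1001.
for row in partial_product_rows(0b0101, 0b1001, 4):
    print(format(row, "08b"))
print(format(array_multiply(0b0101, 0b1001, bits=4), "08b"))

print(21 * 21)   # adder cells for a full 21 x 21 array
```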

-Dave



Sat, 17 Jan 1998 03:00:00 GMT  
 S16 mimics F21


Quote:

>There's a thing called a Cray multiplier that consists of adders arranged
>in a diamond pattern, like how doing "*" by hand looks:

>       0101
>      x1001
>   --------
>       0101
>      0000
>     0000
>    0101
>   ========
>   00101101

>If the multiplier has a "0" bit, then all zeros go into the row's adder,
>else the multiplicand is gated into the row's adder.  For F21, you would
>need 21*21 adders.  I'm not sure how many transistors that would equate
>to.  But, it looks like the kind of regularly tiled structure that could
>be laid out easily in OKAD.  The ripple carry delay would not be a lot worse
>than for "+".  I believe there are lots of optimizations to this design that
>can reduce the required number of gates by 1/2 or so.

Yes, this is what the multiply step instruction does one bit at a time. First
you shift one argument left, then shift the result right and conditionally add if
a bit is 1.  And even if it were no slower than a +, it would still not be a
one-cycle instruction in most cases.
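
For comparison, here is a minimal Python sketch of a one-bit-at-a-time multiply. It uses one common textbook formulation (conditionally add the multiplicand to the high half, then shift the combined high:low pair right) and is not claimed to match the exact semantics of the F21 multiply step opcode. The name `multiply_steps` and the register names are assumptions.

```python
def multiply_steps(multiplicand, multiplier, bits=21):
    """Unsigned multiply using `bits` conditional-add-and-shift steps."""
    mask = (1 << bits) - 1
    multiplicand &= mask
    hi, lo = 0, multiplier & mask      # lo starts out holding the multiplier
    for _ in range(bits):
        if lo & 1:                     # conditionally add if the low bit is 1
            hi += multiplicand
        # shift the hi:lo pair right by one; hi's low bit moves into lo's top bit
        lo = (lo >> 1) | ((hi & 1) << (bits - 1))
        hi >>= 1
    return (hi << bits) | lo           # double-width product

print(multiply_steps(0b0101, 0b1001, bits=4))   # the 5 x 9 example from the quote
```

Each pass of the loop corresponds to one step instruction, so a 21-bit multiply costs 21 steps rather than one cycle.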

You are right this is not a big hardware problem in the sense that it
only needs 21*21 bit adders.  However this is about the same size as the
rest of the F21 chip.  So it would add about $.10 to the cost of the die
but would not add any pins.  It would cost some for development, but as
you say it would be a fairly regularly tiled structure and so would not
be nearly as complex to design as the rest of the chip.  It would just
need about as many transistors.  It might be well worth the $.10 even
if it only gets you some improvement on *.

Jeff Fox
Ultra Technology



Sat, 17 Jan 1998 03:00:00 GMT  
 S16 mimics F21

: >is more complex than OVER (but it might be).  And I don't think
: >Chuck could do a single cycle * considering that he has not yet
: >done a single cycle + where carry must move through more than
: >a few bits.  I won't say that you couldn't do a single cycle
: >* in 3ns in .8 micron, but I know Chuck couldn't do it right
: >now.  When he goes to .5 micron everything will be running at
: >2ns or less so a single cycle * is just as difficult.  Even if
: >Chuck could figure out how to make a single cycle multiply
: >it would certainly take up a lot of silicon.  More than the
: >present chip I would say.
: There's a thing called a Cray multiplier that consists of adders arranged
: in a diamond pattern, like how doing "*" by hand looks:

:        0101
:       x1001
:    --------
:        0101
:       0000
:      0000
:     0101
:    ========
:    00101101

: If the multiplier has a "0" bit, then all zeros go into the row's adder,
: else the multiplicand is gated into the row's adder.  For F21, you would
: need 21*21 adders.  I'm not sure how many transistors that would equate
: to.  But, it looks like the kind of regularly tiled structure that could
: be laid out easily in OKAD.  The ripple carry delay would not be a lot worse
: than for "+".  I believe there are lots of optimizations to this design that
: can reduce the required number of gates by 1/2 or so.

: -Dave

I went through the same discussions with myself a few months ago, when
trying to decide whether to get a P21 for some signal processing/number
crunching I wanted to do.  Basically, I came to the conclusion that I
should buy a signal processing chip/evaluation board rather than the
P21.  I have since decided to go with the Motorola 56002 evaluation
module, and, if I can't find a Forth for it, port "eForth" or roll my own.
I don't know the original poster's circumstances/requirements, but if
hardware-speed multiply is a requirement, perhaps a similar approach
will work for him as well.  The Motorola chip does not have a divide in
hardware; that must be done with a loop in software.

Incidentally, I've looked at the "eForth" implementation for the 8086
and it does not use the 8086 multiply instructions; it uses a
shift-and-add.  Divide is done with shift-and-subtract.  So my port or
design will not match the "eForth" model in this respect.  Given the
limited market, I intend to place my results in the public domain via
the World Wide Web; I don't have the spare time to provide support to
other users.
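
For readers unfamiliar with the shift-and-subtract divide mentioned above, a minimal Python sketch follows. It illustrates the general technique for unsigned operands only. It is not a transcription of the eForth source, and the name `shift_subtract_divide` and the 16-bit default width are assumptions.

```python
def shift_subtract_divide(dividend, divisor, bits=16):
    """Unsigned division, one quotient bit per loop pass.
    Returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient, remainder = 0, 0
    for i in range(bits - 1, -1, -1):
        # shift the next dividend bit into the running remainder
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:       # subtract only when it fits
            remainder -= divisor
            quotient |= 1
    return quotient, remainder

print(shift_subtract_divide(1000, 7))
```

The loop runs once per bit of the dividend, which is why a hardware divide or multiply instruction, where one exists, is so much faster.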
--

If operating systems had truth-in-labelling laws:
"MVS -- Best if used before December 31, 1999"



Sat, 17 Jan 1998 03:00:00 GMT  
 S16 mimics F21
......... stuff .......
|>
|> Incidentally, I've looked at the "eForth" implementation for the 8086
|> and it does not use the 8086 multiply instructions; it uses a
|> shift-and-add.  Divide is done with shift-and-subtract.  So my port or
|> design will not match the "eForth" model in this respect.  Given the
|> limited market, I intend to place my results in the public domain via
|> the World Wide Web; I don't have the spare time to provide support to
|> other users.
|> --

|> If operating systems had truth-in-labelling laws:
|> "MVS -- Best if used before December 31, 1999"

Hi
 Remember that eForth was written to be most easily ported
to various uP's. One should always remember that once one
has the basic eForth up and running, such things
as better multiply words can be added.
 For DSP I highly recommend that you have some way to
do FIRs, IIRs and FFTs in assembly as the normal mode.
One could make special compiler words that were more
efficient. It is best to find ways to optimize such stuff.
Dwight



Sun, 18 Jan 1998 03:00:00 GMT  
 S16 mimics F21
......... stuff ........
|> need about as many transistors.  It might be well worth the $.10 even
|> if it only gets you some improvement on *.
|>  
|> Jeff Fox
|> Ultra Technology

Give me a fast * and a fast + and I can use
almost anything as a DSP.
Dwight



Sun, 18 Jan 1998 03:00:00 GMT  
 S16 mimics F21

 >>I still hesitate to implement quite a new element. This is a steady
 >>temptation with RISC engines. Look at the code of SWAP:
 >>


 >
 >I would suggest an improved definition for SWAP:
 >
 >     : SWAP  ( n1 n2 -- n2 n1 )  over push push drop pop pop ;

Pretty! I didn't see that.

 >     : OVER  ( n1 n2 -- n1 n2 n1 )  push dup pop swap ;

I see. I'm tempted to replace over by swap for that 16-bit virtual
beast. (Xian, hear me? %-,) I think it's nicer for newcomers. And
easy to simulate by an F21 as well... :-)

 >>The first version is so long that it may change our habits using
 >>"noise words". The second version corrupts the contents of the A
 >>register. So why not implement a 28th opcode for a rapid SWAP
 >>function? Or, because it's so easy, the 29th opcode could be a
 >>quick OR. BTW, what about a one-step multiplier a la RTX 2000?
 >
 >Well on F21 there were constraints saying that SWAP, OR, or *
 >could not be decoded as a 28th, 29th, or 30th opcode because they
 >are not memory operations.  Other than that I cannot see that
 >SWAP is more complex than OVER (but it might be).  And I don't
 >think Chuck could do a single cycle * considering that he has not
 >yet done a single cycle + where carry must move through more than
 >a few bits.

Well, I wasn't quite serious while writing my above lines. I wanted
to point out that a RISC *has* to be somewhat "reduced".

 > Even if Chuck could figure out how to make a single cycle multiply
 > it would certainly take up a lot of silicon.  More than the
 > present chip I would say.

No, please make the chip rapid & cheap. And above all available.

    ( When the RTX2000 was rather new, I tried to demonstrate its
    throughput by letting it make music. The CPU was accompanied only
    by memory and a DAC. There was a sine table which was read using
    different step widths, four tones at the same time, and each tone
    fading with "e" characteristic. This fading was done by permanently
    multiplying the four amplitudes. I first thought of adding
    logarithms - till I realized that the multiplier was much quicker.
    The demo at least impressed myself... )
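
The scheme in the parenthesis above can be sketched in Python: each voice steps through a shared sine table at its own step width, and its amplitude is multiplied by a constant slightly below 1 on every sample, which gives an exponential ("e" characteristic) fade. The table size, step widths, and decay factor here are made-up illustration values, and floating point stands in for whatever integer arithmetic the original demo used.

```python
import math

TABLE_SIZE = 256                       # assumed table length
SINE = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]

def render(steps, decay=0.999, samples=1000):
    """Mix one decaying tone per entry in `steps`; returns a list of samples."""
    phases = [0.0] * len(steps)
    amps = [1.0] * len(steps)
    out = []
    for _ in range(samples):
        total = 0.0
        for v, step in enumerate(steps):
            total += amps[v] * SINE[int(phases[v]) % TABLE_SIZE]
            phases[v] = (phases[v] + step) % TABLE_SIZE   # step width sets the pitch
            amps[v] *= decay           # one multiply per voice per sample
        out.append(total / len(steps))
    return out

signal = render([4, 5, 6, 8], samples=2000)   # four tones at the same time
print(max(abs(s) for s in signal[:256]), max(abs(s) for s in signal[-256:]))
```

After n samples each amplitude is decay**n, so the fade costs a single multiply per voice per sample with no logarithm or exponential table.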

 > Let me take this chance to say that F21 was submitted to Mosis
 > last week for fabrication.  So if they have no problems there they
 > should return prototypes around October 1.  :-)

:-)

  Cheers                       Johannes Teich | Internet:    __  __c

 _________________________________ CompuServe: 100522,135 ____(_)/_(_)_
 (Murnau, Bavaria/Germany)



Sun, 18 Jan 1998 03:00:00 GMT  
 S16 mimics F21


: >|> So my port or
: >|> design will not match the "eForth" model in this respect.  
: >Hi
: > Remember that eForth was written to be most easily ported
: >to various uP's. One should always remember that once one
: >has the basic eForth up and running, finding such things
: >as better multiply words can be added.
: That is exactly correct.  eForth is designed for education use
: and easy porting.  It comes with a minimal number of code words,
: but you see a big performance increase by writing more words
: in CODE.  Start with the math words and also VARIABLES and
: USER VARIABLES if you want to see the biggest performance
: increase.  With only one math operation as a CODE word the
: eForth porting model is very slow.

There's one small problem with adding CODE words to eForth: it doesn't
come with an assembler, either :-).  Sure, I can read the MASM source
and figure out how to hack a CODE word into the dictionary with "C," or
I can find or write an assembler, etc.  But there are other small Forths
("pygmy", "hForth") that already have an assembler.

: Bill has talked about releasing a new version for a long time.
: He made the new version ANS compliant over a year ago.  Last I
: heard he wanted to change some internals.  I don't know the
: current status.

Actually, for my purposes a slow multiply isn't a killer, because Forth
will be a control language -- all the math will be done in CODE words
anyhow.  The Motorola 56K has 16-bit unsigned integer addresses, but the
data operations use 24-bit signed fractions.  It's also a Harvard
architecture (separate instruction and data memories).  Brad Rodriguez's
CamelForth seems to have provisions for Harvard architectures, but I
don't think it's been ported to the 8086 yet :-(.
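
To make "24-bit signed fractions" concrete, here is a small Python sketch of fractional multiplication with one sign bit and 23 fraction bits. It models only the number format and ignores rounding modes and accumulator widths on the actual chip. The helper names are hypothetical.

```python
FRAC_BITS = 23                         # 24-bit word: sign bit + 23 fraction bits
MIN_RAW = -(1 << FRAC_BITS)            # represents -1.0
MAX_RAW = (1 << FRAC_BITS) - 1         # just below +1.0

def to_fraction(x):
    """Convert a float in [-1.0, 1.0) to a raw 24-bit fraction, saturating."""
    raw = int(round(x * (1 << FRAC_BITS)))
    return max(MIN_RAW, min(MAX_RAW, raw))

def to_float(raw):
    return raw / (1 << FRAC_BITS)

def frac_mul(a, b):
    """Multiply two raw fractions; the double-width product is shifted back
    down (truncating). Note that -1.0 * -1.0 does not fit the format."""
    return (a * b) >> FRAC_BITS

half = to_fraction(0.5)
print(to_float(frac_mul(half, half)))
print(to_float(frac_mul(to_fraction(-0.5), half)))
```

Addresses, by contrast, are plain 16-bit unsigned integers, so a Forth on such a chip has to keep the two kinds of cell apart.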

By the way, I was impressed with the use of the video stuff in the P21
for generating audio.  I'm still going to use a signal processing chip,

--

If operating systems had truth-in-labelling laws:
"MVS -- Best if used before December 31, 1999"



Mon, 19 Jan 1998 03:00:00 GMT  
 S16 mimics F21

                         There was a sine table which was read using
    different step widths, four tones at the same time, and each tone
    fading with "e" characteristic. This fading was done by permanently
    multiplying the four amplitudes. I first thought of adding
    logarithms - till I realized that the multiplier was much quicker.
    The demo at least impressed myself... )

What do you mean by "permanently multiplying the four amplitudes"?

Thanks,
  Dale



Mon, 19 Jan 1998 03:00:00 GMT  
 S16 mimics F21

Quote:


>|>
>|> So my port or
>|> design will not match the "eForth" model in this respect.  

>Hi
> Remember that eForth was written to be most easily ported
>to various uP's. One should always remember that once one
>has the basic eForth up and running, finding such things
>as better multiply words can be added.

That is exactly correct.  eForth is designed for education use
and easy porting.  It comes with a minimal number of code words,
but you see a big performance increase by writing more words
in CODE.  Start with the math words and also VARIABLES and
USER VARIABLES if you want to see the biggest performance
increase.  With only one math operation as a CODE word the
eForth porting model is very slow.

Bill has talked about releasing a new version for a long time.
He made the new version ANS compliant over a year ago.  Last I
heard he wanted to change some internals.  I don't know the
current status.

Jeff Fox
Ultra Technology


