Q: Parallel forth machine 
 Q: Parallel forth machine

I am wondering if there are any known, or considered, models
for a parallel forth machine?  Or if such even seems reasonable.

Derrick Shearer



Thu, 31 May 2001 03:00:00 GMT  
 Q: Parallel forth machine

John Dorband's group had Forth running on the MPP machine at NASA
Goddard a while back.  Kel Winters (I think I'm remembering his name
correctly; I think he was from Univ. Montana or Wyoming -- don't
remember for sure) had some proposals for custom parallel stack
hardware.  But I haven't heard anything recent.  I don't have specific
pointers to those, but they'd probably be in back issues of the
Rochester Forth Conference proceedings.

-- Phil Koopman

Quote:

>I am wondering if there are any known, or considered, models
>for a parallel forth machine?  Or if such even seems reasonable.

>Derrick Shearer





Fri, 01 Jun 2001 03:00:00 GMT  
 Q: Parallel forth machine

Quote:

>A Forth virtual machine exists as one or more
>stacks upon which there is a single thread of
>execution which may or may not implement
>parallelization through the use of multitasking.
>A parallel Forth virtual machine exists as one
>or more stacks upon which there is more than
>one thread of execution, which necessitates
>parallelization/multitasking.  Defining a model
>more detailed than this is where I am stuck.
>Do threads share stacks?
>If so, to what extent?
>Also if so, then how is control arbitrated?

I've thought about this some without suffering the
disadvantage of knowing anything at all about the
topic.  I'd like to share my naive ideas.

I'll look at what I'd like a Forth parallel system
to do for me, and in the process I'll simply assume
that anything I want can be done efficiently with
no thought to methods or to practicality.

My interest is, how should a virtual Forth parallel
system look to the user, and one step down from that,
how would it look to a user who peeks under the hood
at 'virtual' parallel machinery.

There's a natural hierarchy.  The word that gets
executed from the input buffer is at the top.  All
the under-the-hood parallelizing is going on to help
that word get executed more efficiently.  It isn't
*necessary* to have a word at the top but it doesn't
seem to do any harm and it helps me think about it.

Say the word at the top has a definition that contains
7 words.  It would be nice if there was an infinite pool
of unused processors available, and the top processor (or
any other) could delegate work to others.  Each processor
must have access to the whole program since it might get
delegated anything.  But each one needs its own data stack.  
Our top processor then can delegate each word to a different
processor.  We'll need to know the inputs and outputs for
each of those 7 words.  The top processor passes the known
inputs to each of them and passes tokens for the unknown
inputs.  Now each of the 7 2nd-level processors goes to
work.  Any of them that have all their inputs can get busy.
Those that lack some inputs can still note the words their
words will execute.  Say one of them contains 8 words.
The processor can delegate the work to 8 3rd-level processors
and send each of them the stack inputs they need or else
tokens for the ones that aren't available.  The ones that
have everything they need can proceed.  Etc.
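A rough sketch of this delegation scheme, in Python rather than Forth (all names here are made up for illustration; futures stand in for the "tokens" passed for unknown inputs):

```python
from concurrent.futures import ThreadPoolExecutor

# Each "word" is a function of its stack inputs; an unknown input is
# represented by a future (the "token" above) that resolves later.
def delegate(executor, word, *inputs):
    def run():
        # Resolve any token (future) inputs before the word runs.
        args = [x.result() if hasattr(x, "result") else x for x in inputs]
        return word(*args)
    return executor.submit(run)

def square(n):
    return n * n

def add(a, b):
    return a + b

with ThreadPoolExecutor() as pool:
    # The "top processor" hands each sub-word to another worker,
    # passing a token where the input is not yet known.
    t1 = delegate(pool, square, 3)    # has all its inputs: gets busy
    t2 = delegate(pool, add, t1, 4)   # lacks one input: waits on t1's token
    result = t2.result()              # 3*3 + 4
```

The worker holding t2 blocks only on the one token it lacks, which is the "those that lack some inputs can still note the words their words will execute" idea in miniature.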

At some point a processor will find it easier to simply
do the work rather than delegate further.  When it finishes
it sends its stack outputs back to the processor that
called it, and that processor can send those outputs to the
next in line that's waiting for them.  Any work which can
proceed given those results will get done; whatever must
still wait for other results from someone else continues to
wait.

None of this needs to be visible to the programmer -- the
compiler does it invisibly.  This is as much of a concept
of it as a programmer needs, except for I/O and memory.

If you have a loop you can assign processors to the 2nd
run through the loop at the same time as the first run
and if they don't need data from the first time around
they can proceed.  If they do need data from the first
time around they might still be able to do some things
early.  
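The loop case can be sketched the same way, again in Python as a stand-in (the iteration body is hypothetical): when no pass through the loop consumes a result from an earlier pass, every pass can be handed out at once.

```python
from concurrent.futures import ThreadPoolExecutor

# Independent iteration body: uses nothing from iteration i-1,
# so the 2nd run can start at the same time as the first.
def body(i):
    return i * i

with ThreadPoolExecutor() as pool:
    results = list(pool.map(body, range(4)))  # all four passes overlap
```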

There's nothing here about balancing the load.  In general
you'd probably want to give more priority to early routines
because they might provide data that later routines will
need.  Sometimes they wouldn't.  You can put as many fancy
algorithms as you like into the compiler to improve that
and the person who writes a Forth program won't have to
know anything about it.

Handling memory is much harder than stacks.  They all need
access to all the data space, but they have to know who's
already used each thing.  You might need some processors
devoted to memory.  For each space in memory some processor
needs to track which routines can write it, which routines
can read it, and in what order.  A stack.  When it isn't
certain which memory space a routine will use (say for
example that you're about to write into an array and you
have to do a complicated calculation to find out which
place in the array to use) then all must be blocked until
that's settled.  This looks like a nightmare in practice.
You could assign a separate cell for each unit of memory
that will be separately accessed, and each routine that
affects that memory unit leaves a token in the cell to
say who can access it next.  But when it isn't clear which
array item will be accessed next then you could spend a lot
more effort changing tokens than you would to just let a
single processor do the whole thing.  I guess the simple
rule for programmers would be to minimise the use of nonstack
nonlocal memory, and minimise complex addressing.
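One way to picture the per-cell bookkeeping described above, as a Python sketch (the class and the recorded access order are invented for illustration): each memory unit carries its own queue of routine tokens, and a routine blocks until its token reaches the front.

```python
from collections import deque
import threading

class Cell:
    """One unit of memory with its own access-order record of tokens."""
    def __init__(self, value=0):
        self.value = value
        self.order = deque()             # routine ids, earliest first
        self.cond = threading.Condition()

    def grant(self, routine_id):
        self.order.append(routine_id)    # order decided ahead of time

    def access(self, routine_id, writer=None):
        with self.cond:
            # Block until this routine is next in the recorded order.
            while self.order[0] != routine_id:
                self.cond.wait()
            if writer is not None:
                self.value = writer(self.value)
            out = self.value
            self.order.popleft()         # hand the cell to the next routine
            self.cond.notify_all()
            return out

c = Cell(10)
c.grant("writer"); c.grant("reader")     # writer goes first, then reader
c.access("writer", writer=lambda v: v + 5)
seen = c.access("reader")                # reader sees 15, never 10
```

The nightmare case in the text is exactly when `grant` can't be issued ahead of time because the target cell is computed at runtime.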

You'd probably also want one processor devoted to I/O output,
and anything that sends it something to print sends a token
to say where that something goes in the sequence.  
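The sequence-token idea for output can be sketched like this (Python stand-in; producers finish in any order, and the I/O side releases text only in token order):

```python
import heapq

# Producers tag each line with a sequence token saying where it
# goes; the I/O process prints strictly in token order.
pending = []
for token, text in [(2, "third"), (0, "first"), (1, "second")]:
    heapq.heappush(pending, (token, text))

output = []
next_token = 0
while pending and pending[0][0] == next_token:
    _, text = heapq.heappop(pending)
    output.append(text)                  # emit in sequence
    next_token += 1
```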

I don't really know where the technical problems are.  But
quite likely the virtual parallel machine should be something
like this.  Forth is sort-of inherently serial because the
source code comes in linear sequences of bytes.  Find the
minimum amount that programmers need to think about to make
their programs run well parallel, and let them alone.  Forth
code that depends heavily on the data stack and very little
on memory should be easy to make parallel unless the algorithm
is inherently serial, because it will usually be fairly obvious
whether any given routine depends on previously-computed
results and it will be obvious which results it depends on.



Sat, 02 Jun 2001 03:00:00 GMT  
 Q: Parallel forth machine

Quote:


>> I've thought about this some without suffering the
>> disadvantage of knowing anything at all about the
>> topic.  I'd like to share my naive ideas.
>I have been working on parallel Forth with Chuck and other
>people for the last ten years.  I would like to say a few
>things about some of your thoughts.

Great!

Quote:
>G! and RX are all you need to parallelize Forth easily and efficiently.
>Check out the four papers on parallel Forth at my site they are
>some of the oldest papers at my site.

OK, I'll look at that.

Quote:
>Linda has the wonderful feature of automatic runtime load balancing.  On
>real Linda systems you can change the number of the processors online on
>the network while things are running. You can even unplug the master
>processor and a worker takes over being master.  In real Linda systems
>they have to support a wide range of machines, compilers, operating
>systems, and networks and the code that does the runtime load
>balancing knows all about this.  On our versions of Forth-Linda we
>focused on symmetric multiprocessing where the nodes are like tuples
>themselves, duplicates of one another.  It makes things really simple.
>Then we simplified things from five words to two; it is hard to make
>it any simpler unless we just hide it completely under the hood.
>I think making parallelization completely automatic is not Forth like.
>We will see.

Automatic runtime load balancing.  That's what I was looking for.
It looked to me like it might be possible to hide it completely under
the hood, with the result that lots of Forth code with no parallelizing
words could be done parallel within reason, without rewriting.
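For readers who haven't met Linda: its runtime load balancing falls out of workers pulling tasks from a shared tuple space. A toy Python sketch of the generic model (not the two-word Forth-Linda mentioned above, whose words aren't spelled out here):

```python
import threading, queue

class TupleSpace:
    """Toy Linda-style bag: workers withdraw whatever task tuples
    exist, so the load balances itself across however many workers
    happen to be online."""
    def __init__(self):
        self.tasks = queue.Queue()
        self.results = queue.Queue()

    def out(self, tup):             # Linda 'out': drop a tuple in the space
        self.tasks.put(tup)

    def worker(self):
        while True:
            tup = self.tasks.get()  # Linda 'in': withdraw a tuple
            if tup is None:         # poison pill: shut this worker down
                break
            self.results.put(tup * tup)

space = TupleSpace()
threads = [threading.Thread(target=space.worker) for _ in range(3)]
for t in threads: t.start()
for n in range(5): space.out(n)     # five tasks, three workers
for _ in threads: space.out(None)   # one shutdown pill per worker
for t in threads: t.join()
total = sum(space.results.get() for _ in range(5))   # 0+1+4+9+16
```

Adding or removing workers changes nothing in the task-producing code, which is the "change the number of processors online while things are running" property.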

Quote:
>> Forth is sort-of inherently serial because the
>> source code comes in linear sequences of bytes.
>No.  The source code can be a linear sequence but the execution
>can have whatever level of parallelism is appropriate for
>the hardware and the problem.  

That's what I meant.  You don't have to write it parallel, all it
means to write it serial is that in case of a dispute about resources
or an output of one section being input into another, the one that
comes earlier in the source code should have priority.  And that might
get relaxed for the resource disputes.  Lots of Forth code that's
*written* serial might be executed in parallel, and the interesting thing
would be to see which ones can't and why not.

Quote:
>Most problems are embarrassingly
>parallel at many levels.  People are familiar with sequential
>computation but the most obvious example of parallelism is
>the human brain.  My interests were in building a specialized
>parallel Forth machine optimized for this class of computation
>and providing a simple Forth like mechanism to control it.

That is interesting.


Sat, 02 Jun 2001 03:00:00 GMT  
 Q: Parallel forth machine

Quote:

>My interest is, how should a virtual Forth parallel
>system look to the user, and one step down from that,
>how would it look to a user who peeks under the hood
>at 'virtual' parallel machinery.

There shouldn't be any difference.

Consider the case of actual Forth machines, for which
Forth simply becomes a macro assembler.

There is no reason, that I can think of, why a parallel
Forth machine should be any different in allowing that
level of determinacy.

Quote:
>Say the word at the top has a definition that contains
>7 words.  It would be nice if there was an infinite pool
>of unused processors available, and the top processor (or
>any other) could delegate work to others.  Each processor
>must have access to the whole program since it might get
>delegated anything.  But each one needs its own data stack.

An *infinite* pool of unused *processors*?  Neither practical,
nor, do I think, the right concept.  Scalable, yes, infinite, no.
Processes, yes, processors, no.  Processes do not carry the
connotation of being chip bound which processors have.  So,
a scalable pool of unused processes.  Additionally, since an
unused process need not exist, a scalable pool of processes.
Yet that's nothing new.

Consider that processes may be migrated to processors, and
that each processor may have more than one process at any
given time...  There should be some means for making the
operation of all processes deterministic, which would imply
a known, but possibly hidden, multitasking routine upon each
processor.

Also, claiming that "each [processor] must have access to the
whole program" and also that "each [one] needs its own data
stack" seems somewhat contradictory.  Reevaluation and
clarification?

Should a protected mode be necessitated or not?
Consider the option of protected mode on Intel x86
type architectures.  Consider the relative need for
protected mode contrasted between PC use and
embedded systems use.

Quote:
>Our top processor then can delegate each word to a different
>processor.  We'll need to know the inputs and outputs for
>each of those 7 words.  The top processor passes the known
>inputs to each of them and passes tokens for the unknown
>inputs.  Now each of the 7 2nd-level processors goes to
>work.  Any of them that have all their inputs can get busy.
>Those that lack some inputs can still note the words their
>words will execute.  Say one of them contains 8 words.
>The processor can delegate the work to 8 3rd-level processors
>and send each of them the stack inputs they need or else
>tokens for the ones that aren't available.  The ones that
>have everything they need can proceed.  Etc.

Such a model can easily be jeopardized.  With X resources,
and X+1 demand, what happens? (demand>resources)
Claiming infinite resources isn't allowable.  Folding back in
a round-robin fashion for processor use is only stalling.

Quote:
>At some point a processor will find it easier to simply
>do the work rather than delegate further.  When it finishes
>it sends its stack outputs back to the processor that
>called it, and that processor can send those outputs to the
>next in line that's waiting for them.  Any work which can
>proceed given those results will get done; whatever must
>still wait for other results from someone else continues to
>wait.

Yet why necessitate waiting?  Certainly processors could
be put to more advantageous use working on other
processes.  I would rather have a process waiting in a
blocked state than a processor waiting in a blocked state.

Quote:
>None of this needs to be visible to the programmer -- the
>compiler does it invisibly.  This is as much of a concept
>of it as a programmer needs, except for I/O and memory.

Yet there should be allowances for determinacy.
http://www.realtime-info.be/encyc/techno/terms/58/24.htm

Quote:
>If you have a loop you can assign processors to the 2nd
>run through the loop at the same time as the first run
>and if they don't need data from the first time around
>they can proceed.  If they do need data from the first
>time around they might still be able to do some things
>early.

That is more a matter of programming efficiency than of
machine design; to an extent, it would be nice to keep
things simple.

Quote:
>There's nothing here about balancing the load.  In general
>you'd probably want to give more priority to early routines
>because they might provide data that later routines will
>need.  Sometimes they wouldn't.  You can put as many fancy
>algorithms as you like into the compiler to improve that
>and the person who writes a Forth program won't have to
>know anything about it.

Again, that is more a matter of programming efficiency
than of machine design.  Remember how simple the
Forth virtual machine is.  It is only a couple of stacks with
a single thread of control.

A parallel Forth virtual machine can be one or more
stacks with multiple threads of control.  These multiple
threads of control may be processors, or they may not
be processors.  These multiple threads of control are
always processes.  How should these processes be
arbitrated?  How should their constituent threads be
arbitrated?

How should processes be multitasked on a single
processor?

Stack sharing?  To what extent?

Should there be a protected mode?  Always, never, or
only sometimes?

Quote:
>Handling memory is much harder than stacks.  They all need
>access to all the data space, but they have to know who's
>already used each thing.  You might need some processors
>devoted to memory.  For each space in memory some processor
>needs to track which routines can write it, which routines
>can read it, and in what order.  A stack.  When it isn't

Memory aside, an arbitration stack?
How about an arbitration stack...

An arbitration stack with tokens for each available stack.
If a process wishes to acquire a stack for its use, it pops
a token, pushes that token onto the stack it is using, maybe,
or possibly some other stack to which it has access, and
starts using the stack it has been assigned.  Once finished,
the token taken is returned to the arbitration stack, freeing
up the stack.
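A minimal Python sketch of that arbitration stack (the class is hypothetical; tokens name the available data stacks, and acquiring one is a pop):

```python
import threading

class ArbitrationStack:
    """A stack of tokens, one per available data stack.  A process
    pops a token to acquire a stack and pushes it back when done."""
    def __init__(self, n_stacks):
        self.tokens = list(range(n_stacks))   # token i names stack i
        self.lock = threading.Lock()
        self.free = threading.Semaphore(n_stacks)

    def acquire(self):
        self.free.acquire()                   # block if no stack is free
        with self.lock:
            return self.tokens.pop()          # pop a token off the top

    def release(self, token):
        with self.lock:
            self.tokens.append(token)         # return the token
        self.free.release()

arb = ArbitrationStack(2)
a = arb.acquire()
b = arb.acquire()        # both stacks now claimed
arb.release(a)
c = arb.acquire()        # reuses the freed stack's token
```

Because release pushes the token back on top, a just-freed stack is the first one handed out again, which keeps the arbitration structure itself a stack rather than a queue.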

If another process wishes to use a stack which another
process is not using, then there is the equivalency of a
protected mode in effect.  If another process wishes to
use a stack which another process is using, for means
of data exchange or such, then how does it make its
need known to the process it wishes to request data
from?  How best to arbitrate?

How about a dual stack arrangement, which might be
easier to work with?

The concept could be expanded to include tokenized
memory locations, also.

Quote:
>certain which memory space a routine will use (say for
>example that you're about to write into an array and you
>have to do a complicated calculation to find out which
>place in the array to use) then all must be blocked until
>that's settled.  This looks like a nightmare in practice.

Yet does the previous use of an arbitration stack suggest
a feasible implementation which might be representative
of a common parallel Forth machine?

Quote:
>You could assign a separate cell for each unit of memory
>that will be separately accessed, and each routine that
>affects that memory unit leaves a token in the cell to
>say who can access it next.  But when it isn't clear which
>array item will be accessed next then you could spend a lot
>more effort changing tokens than you would to just let a
>single processor do the whole thing.  I guess the simple
>rule for programmers would be to minimise the use of nonstack
>nonlocal memory, and minimise complex addressing.

Stacks are stacks, and non-stack memory is non-stack
memory.  Memory access should be considered separate
from a parallel Forth machine, I believe.  The basic Forth
machine certainly does not explicitly refer to memory.

Quote:
>You'd probably also want one processor devoted to I/O output,
>and anything that sends it something to print sends a token
>to say where that something goes in the sequence.

I'm not so sure of that.  Again, what of a single processor
configuration?  I think that an arbitration stack makes
more sense for a parallel Forth (virtual) machine.

There is no reason why a single processor can't be
parallel tasking, is there?

Quote:
>I don't really know where the technical problems are.  But
>quite likely the virtual parallel machine should be something
>like this.  Forth is sort-of inherently serial because the
>source code comes in linear sequences of bytes.  Find the
>minimum amount that programmers need to think about to make
>their programs run well parallel, and let them alone.  Forth
>code that depends heavily on the data stack and very little
>on memory should be easy to make parallel unless the algorithm
>is inherently serial, because it will usually be fairly obvious
>whether any given routine depends on previously-computed
>results and it will be obvious which results it depends on.

For parallel execution, each task running in parallel often
either executes sequentially within itself, or has components
that, at some level, execute sequentially.

Division of labor is an abstraction which I believe does not
directly address how a parallel Forth machine should be
modeled.

Derrick Shearer



Sat, 02 Jun 2001 03:00:00 GMT  
 Q: Parallel forth machine
<snip>

An approximation, so far:

Stack based, with multiple threads of concurrent control.
Scalable, not infinite, number of processes allowed.
Each active process must have its own stack.
Hidden (implicit?), yet deterministic, multitasking and load balancing.
Amenable to Forth language and implementation.

<snip>

Quote:
>Obviously you don't need to have process overhead for every single
>word.  But I don't have to think about where it gets divided up --
>that's a job for the compiler.

<snip>

Forth is also an interpretive language...

How well do parallelism and interpretation mesh?

Quote:
>>Should a protected mode be necessitated or not?
>>Consider the option of protected mode on Intel x86
>>type architectures.  Consider the relative need for
>>protected mode contrasted between PC use and
>>embedded systems use.

>I don't quite know what you're talking about in this context
>and I don't want to need to know.  (Though I am interested.  It
>doesn't look like it's the direction I'm heading, but I could be
>wrong about that and it's sure to be useful in other contexts.)

Protection in the sense of memory or stack protection.  So
processes do not accidentally overwrite data which shouldn't
have been overwritten.

Memory protection?
Stack protection?
Register protection?

Should registers even be acknowledged?

<snip>

Quote:
>Processes that have all their input data should have priority.
>Processes that have all their input data and their parent
>processes also have all their input data should have priority
>over processes whose parents lack some data.  I'm not clear on
>the details beyond that.  It might depend partly on how complex
>you want to make your compiler.  In an extreme case it could
>calculate the time required for each process (with a spread for
>branching etc) and give the processes that are on the critical
>path the highest priority.  Something simpler would probably be
>workable.  The old Forth block system gave the block that was
>least recently used the lowest priority.  That wasn't always
>ideal but it worked well most of the time.  I like the idea of
>a simple parallel system that speeds up my code a lot.  If my
>code averages a hundred times faster then I don't care whether
>sometimes the processors are only being used at 10% efficiency
>or even 1% efficiency.  If I need it faster still then I'll
>look at how to do that, but if it gives great results without
>demanding my attention then that's good enough for the first
>pass.

Restated, processes which can be immediately processed
should have precedence over processes which cannot be
immediately processed.  It is one scheduling algorithm
among many, and a method for implementing parallelism
when sequentially constrained.
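That scheduling rule can be stated in a few lines of Python (the process records are invented for illustration; "ready" means all declared inputs have arrived):

```python
# Processes which can be immediately processed take precedence
# over processes still waiting on inputs.
processes = [
    {"name": "p1", "needs": set(),         "have": set()},
    {"name": "p2", "needs": {"x"},         "have": set()},       # blocked
    {"name": "p3", "needs": {"x", "y"},    "have": {"x", "y"}},
]

def ready(p):
    return p["needs"] <= p["have"]   # every needed input is present

# Stable sort: ready processes first, source order preserved otherwise,
# which is the "earliest in the program first" tie-break discussed above.
schedule = sorted(processes, key=lambda p: not ready(p))
order = [p["name"] for p in schedule]
```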

Quote:
>>How should processes be multitasked on a single processor?

>If you only have one processor the lowest-overhead way is to
>treat them like a serial program.  Each process is guaranteed
>to get its input data because the previous process has
>completed.  No inter-process overhead beyond the calls for the
>"inner interpreter".  Data is passed on a single data stack
>with minimal overhead.

Yet that is no longer optimal when transitioning to multiple
processors, is it?  A parallel Forth virtual machine should
be uniform in design across all number of processing
elements, both one and many.

Quote:
>>Stack sharing?  To what extent?

>Sequential serial routines should share a stack.

Yet the questions were intended to regard parallel
routines, rather than serial routines.

Sequential and serial are synonymous.

Quote:
>>Should there be a protected mode?  Always, never, or
>>only sometimes?

>I don't want to think about that, I want it to just work.

Don't you want capability for arbitrary control?
Shouldn't there be the capability to perform 'illegal'
ops if so desired, e.g., when interpreting Forth,
possibly for debugging purposes?


Quote:
>>Memory aside, an arbitration stack?
>>How about an arbitration stack...

>>An arbitration stack with tokens for each available stack.
>>If a process wishes to acquire a stack for its use, it pops
>>a token, pushes that token onto the stack it is using, maybe,
>>or possibly some other stack to which it has access, and
>>starts using the stack it has been assigned.  Once finished,
>>the token taken is returned to the arbitration stack, freeing
>>up the stack.

>Usually, a process will have a deterministic stack effect.  You
>know its inputs and its outputs and the maximum stack space it
>uses.  In that case it can have something like a stack frame.
>I'm thinking of your arbitration stack as an implementation
>detail.

I consider it an important one.  What would Forth and its
virtual machine be without the data stack and the return
stack?  Likewise, what could a parallel Forth virtual
machine be without something along the lines of an
arbitration stack?

Otherwise, how could parallel processes be coordinated
without requiring a single, sequential, thread of control?


Quote:
>>If another process wishes to use a stack which another
>>process is not using, then there is the equivalency of a
>>protected mode in effect.  If another process wishes to
>>use a stack which another process is using, for means
>>of data exchange or such, then how does it make its
>>need known to the process it wishes to request data
>>from?  How best to arbitrate?

>A process which has farmed out its code to sub-processes,
>one for each Forth word, then has nothing left to do but
>accept output data from its sub-processes and send it to
>the sub-processes that need it, and send its output data
>to its calling process.  Just as traditional Forth words
>do nothing but calls (using a single data stack) until you
>get down to primitives, Forth processes can do nothing but
>process calls and data passing until you get down to the
>processes that do the work.  This looks to me like the
>analogous concept.  When I write parallel Forth code I
>don't want to have to think about the details of how this
>works any more than I think about whether my Forth system
>is subroutine-threaded or token-threaded.  One will be
>faster, another smaller, either way my code works.

Yet there should be the capability of tweaking ops at
all levels of abstraction.  Just because you don't take
advantage of tweaking the low level ops does not
mean that there is no merit to doing so.

Thinking in terms of stacks, how would you achieve
parallelism?  With how many stacks?  Assume that
each process can read or write a stack simultaneously
with one or more other processes.  Arbitration must be
defined.  Once this is done optimally, there is little
reason to look back in the transition to higher levels of
abstraction, in which case, why not hardcode the
design?  That is how I perceive a parallel Forth virtual
machine should be designed.

In doing so, everyone should have some consensus
as to what the ideal is, or ought be.

Quote:
>>Memory access should be considered separate
>>from a parallel Forth machine, I believe.  The basic Forth
>>machine certainly does not explicitly refer to memory.




While the operations indicated do move memory to and from
specific locations, those specific locations are somewhat
unnecessary, considering that a Forth virtual machine loads
and stores from somewhere.  The exact locations which a Forth
virtual machine loads and stores from matter less than the
fact that there are loads and stores.

Handling the specifics of locality is a matter of
implementation which can safely be ignored. Assume
that whatever location is generalized can be reached.

Quote:
>>There is no reason why a single processor can't be
>>parallel tasking, is there?

>It multitasks.  But what does that get you?  If one process
>has to wait on something external then the others can proceed.
>You can have unrelated things that don't visibly wait on each
>other.  But you pay some overhead for it.  If I write my
>program so it can be executed serially, it will run faster if
>I just run it serially.  But farming it out to other processors
>could make it faster still despite the extra overhead.

I meant parallel tasking, not just multitasking.  A human brain
parallel tasks, and it can be regarded as a single processor.


Quote:
>>For parallel execution, often each specific task in parallel
>>occurs with either itself in a sequential nature, or its
>>components occurring, at some level, in a sequential nature.

>>Division of labor is an abstraction which I believe does not
>>directly address how a parallel Forth machine should be
>>modeled.

>I have a vision of it.  The result is a parallel machine that
>I mostly don't have to pay attention to.  This is analogous to
>the multitasking Forths that people mostly don't have to pay
>attention to.  If a routine is too computation-intensive you
>insert PAUSE .  You do BLOCK regularly instead of trying to
>save the block address.  A very few simple rules and you can
>program as if yours is the only task running.  I say it should
>be possible to do parallel Forth that way.  It may run more
>efficiently after being optimised for the underlying hardware,
>just as traditional Forth ran faster when key routines were
>converted to assembly.  But a simple, robust approach that
>exploited Forth's inherent easy-to-parallelize nature would
>be a good thing.

Understood.  Realize, however, that without addressing the
low level details at some point, there cannot be an achievable
implementation.

I have suggested an arbitration stack as a means of making
a parallel Forth virtual machine easier to conceptualize.  If such
an addition can be justified, and the abstractions just above
it codified, then there is no reason why a parallel Forth virtual
machine can't be realized.

Does anyone have any ideas regarding the use of anything
other than an arbitration stack?  Something?  Please.  I hate
to adopt an idea just because ...




Sat, 02 Jun 2001 03:00:00 GMT  
 Q: Parallel forth machine

Quote:


>>My interest is, how should a virtual Forth parallel
>>system look to the user, and one step down from that,
>>how would it look to a user who peeks under the hood
>>at 'virtual' parallel machinery.
>There shouldn't be any difference.
>There is no reason, that I can think of, why a parallel
>Forth machine should be any different in allowing that
>level of determinacy.

If you have an existing parallel system it makes sense to
have a Forth that can correctly push its buttons, and it
makes sense to program it to do what it does well.

I'm coming at the problem from the other direction.  If
I could have a Forth parallel system that would just do
what I wanted, what would that be?  If I start with what
I want then maybe it will turn out I can have part of it.

Quote:
>>Say the word at the top has a definition that contains
>>7 words.  It would be nice if there was an infinite pool
>>of unused processors available, and the top processor (or
>>any other) could delegate work to others.  Each processor
>>must have access to the whole program since it might get
>>delegated anything.  But each one needs its own data stack.
>An *infinite* pool of unused *processors*?  Neither practical,
>nor, do I think, the right concept.  Scalable, yes, infinite, no.
>Processes, yes, processors, no.  Processes do not carry the
>connotation of being chip bound which processors have.  So,
>a scalable pool of unused processes.  Additionally, since an
>unused process need not exist, a scalable pool of processes.
>Yet that's nothing new.

Good.  Bear in mind that I'm ignorant and may not use the
words correctly.  Yes, a scalable pool of processes, and
beforehand I want to be able to pretend they each get their
own processor.  I don't want to have to pay close attention to
how to effectively use 128 processors and then have to rewrite
extensively if it turns out there are 256 or only 64.

Quote:
>Consider that processes may be migrated to processors, and
>that each processor may have more than one process at any
>given time...  There should be some means for making the
>operation of all processes deterministic, which would imply
>a known, but possibly hidden, multitasking routine upon each
>processor.

Yes!  And I want that to just work.  In old-fashioned block-
oriented Forth you could do extensive multitasking and if you
followed the rules you didn't have to worry about whether your
block was in memory at the moment -- the system just handled it.
I want the process migration to be like that.

Quote:
>Also, claiming that "each [processor] must have access to the
>whole program" and also that "each [one] needs its own data
>stack" seems somewhat contradictory.  Reevaluation and
>clarification?

Each active process needs its own stack.  The stack is where the
work is done and if multiple processes access the same stack they
need lots of rules for how to interact.  I say, when you start up
a process you send it the data you have and a warning about the
data you don't have.  When some of the missing data turns up you
send that.  The process does the parts it can do (on its own stack)
and waits for the data it needs.  Here's the central idea:  
Processes divide up naturally with Forth words.  A Forth word calls
a series of other Forth words, each of them gets its own stack with
the input data that's available and returns its output data -- which
the calling word sends on to the next word that uses it.  It might
send the data on to *its* calling word, which sends it on to whatever
uses it.

Obviously you don't need to have process overhead for every single
word.  But I don't have to think about where it gets divided up --
that's a job for the compiler.  
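To make that concrete, here's a toy sketch in Python (the word names,
stack effects, and the whole dataflow routine are my invention, not
anything a real Forth compiler does): symbolically execute each word's
stack effect and record which earlier word produced each of its inputs.
That's all the compiler needs to see which words are independent.

```python
# Toy sketch (invented names/effects): derive a dataflow graph from a
# Forth-style definition by symbolically executing stack effects.
def dataflow(effects, body):
    """effects: {word: (n_in, n_out)}.  body: sequence of word names.
    Returns {task index: sorted list of producer task indices}."""
    stack = []                      # symbolic stack of (producer, slot) pairs
    edges = {}
    for i, word in enumerate(body):
        n_in, n_out = effects[word]
        inputs = [stack.pop() for _ in range(n_in)]
        edges[i] = sorted({p for p, _ in inputs})
        for slot in range(n_out):
            stack.append((i, slot))
    return edges

effects = {"LIT": (0, 1), "DUP*": (1, 1), "+": (2, 1)}
# imagine  : SUMSQ  ( -- n )  LIT DUP*  LIT DUP*  + ;
body = ["LIT", "DUP*", "LIT", "DUP*", "+"]
print(dataflow(effects, body))
# {0: [], 1: [0], 2: [], 3: [2], 4: [1, 3]} -- tasks 1 and 3 are independent
```

The two DUP* tasks have disjoint producers, so they could go to
different processors without the programmer saying anything.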

Quote:
>Should a protected mode be necessitated or not?
>Consider the option of protected mode on Intel x86
>type architectures.  Consider the relative need for
>protected mode contrasted between PC use and
>embedded systems use.

I don't quite know what you're talking about in this context
and I don't want to need to know.  (Though I am interested.  It
doesn't look like it's the direction I'm heading, but I could be
wrong about that and it's sure to be useful in other contexts.)

Quote:
>>Our top processor then can delegate each word to a different
>>processor.
>Such a model can easily be jeopardized.  With X resources,
>and X+1 demand, what happens? (demand>resources)
>Claiming infinite resources isn't allowable.  Folding back in
>a round-robin fashion for processor use is only stalling.

The simplest approach is to do the things that are sequentially
earliest in the program first.  There may be other rules that work
better.  My point is that since the separation of work into
processes naturally follows Forth words, it might be possible to
get good results without having to burden the Forth programmer
with those details at all.  Of course you'll get better results if
you carefully write to meet the idiosyncrasies of your particular
parallel system.  But -- apart from memory use and maybe I/O, Forth
may be naturally easy to make parallel.
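A toy model of that rule in Python (the task numbering and dependency
sets are invented for illustration): dispatch ready tasks earliest-first,
but never more per round than there are processors.

```python
# Toy model (invented tasks; assumes deps is acyclic): ready tasks
# dispatched earliest-first, capped at the number of processors.
def schedule(deps, n_procs):
    """deps: {task: set of prerequisite tasks}.  Returns dispatch rounds."""
    done, rounds = set(), []
    while len(done) < len(deps):
        ready = sorted(t for t in deps if t not in done and deps[t] <= done)
        batch = ready[:n_procs]       # demand > resources: defer the rest
        rounds.append(batch)
        done.update(batch)
    return rounds

deps = {0: set(), 1: set(), 2: set(), 3: {0, 1}, 4: {2, 3}}
print(schedule(deps, 2))
# [[0, 1], [2, 3], [4]] -- task 2 was ready in round 1 but had to wait
```

With X+1 demand and X resources nothing breaks; the overflow just
waits a round, in program order.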

Quote:
>>At some point a processor will find it easier to simply
>>do the work rather than delegate further.  When it finishes
>>it sends its stack outputs back to the processor that
>>called it, and that processor can send those outputs to the
>>next in line that's waiting for them.  Any work which can
>>proceed given those results will get done; whatever must
>>still wait for other results from someone else continues to
>>wait.
>Yet why necessitate waiting?  Certainly processors could
>be put to more advantageous use working on other
>processes.  I would rather have a process waiting in a
>blocked state than a processor waiting in a blocked state.

What you call a process I was thinking of as a virtual processor.
You're right.

Quote:
>>None of this needs to be visible to the programmer -- the
>>compiler does it invisibly.  This is as much of a concept
>>of it as a programmer needs, except for I/O and memory.
>Yet there should be allowances for determinacy.
>http://www.realtime-info.be/encyc/techno/terms/58/24.htm

I'll look.

Quote:
>.... Remember how simple the
>Forth virtual machine is.  It is only a couple of stacks with
>a single thread of control.
>A parallel Forth virtual machine can be one or more
>stacks with multiple threads of control.  These multiple
>threads of control may be processors, or they may not
>be processors.  These multiple threads of control are
>always processes.  How should these processes be
>arbitrated?  How should their constituent threads be
>arbitrated?

Processes that have all their input data should have priority.
Processes that have all their input data and their parent
processes also have all their input data should have priority
over processes whose parents lack some data.  I'm not clear on
the details beyond that.  It might depend partly on how complex
you want to make your compiler.  In an extreme case it could
calculate the time required for each process (with a spread for
branching etc) and give the processes that are on the critical
path the highest priority.  Something simpler would probably be
workable.  The old Forth block system gave the block that was
least recently used the lowest priority.  That wasn't always
ideal but it worked well most of the time.  I like the idea of
a simple parallel system that speeds up my code a lot.  If my
code averages a hundred times faster then I don't care whether
sometimes the processors are only being used at 10% efficiency
or even 1% efficiency.  If I need it faster still then I'll
look at how to do that, but if it gives great results without
demanding my attention then that's good enough for the first
pass.
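Here's what that extreme, critical-path version might look like as a
Python sketch (the costs and dependencies are made up; a real compiler
would have to estimate them): a task's priority is its own cost plus
the heaviest chain of work that consumes its result.

```python
# Invented example: priority = own cost + heaviest downstream chain.
def critical_path_priority(deps, cost):
    """deps: {task: set of prerequisites}; cost: {task: time estimate}."""
    consumers = {t: [u for u in deps if t in deps[u]] for t in deps}
    memo = {}
    def pri(t):
        if t not in memo:
            memo[t] = cost[t] + max((pri(u) for u in consumers[t]), default=0)
        return memo[t]
    return {t: pri(t) for t in deps}

deps = {0: set(), 1: set(), 2: {0}, 3: {1}, 4: {2, 3}}
cost = {0: 5, 1: 1, 2: 1, 3: 1, 4: 1}
print(critical_path_priority(deps, cost))
# {0: 7, 1: 3, 2: 2, 3: 2, 4: 1} -- the expensive task 0 heads the queue
```

Something much simpler would probably be workable, as I said; this is
just the ceiling on how clever the compiler could get.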

Quote:
>How should processes be multitasked on a single processor?

If you only have one processor the lowest-overhead way is to
treat them like a serial program.  Each process is guaranteed
to get its input data because the previous process has
completed.  No inter-process overhead beyond the calls for the
"inner interpreter".  Data is passed on a single data stack
with minimal overhead.
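That degenerate single-processor case is just ordinary Forth execution.
A Python sketch, with the data stack as a plain list and each "process"
as a function on it:

```python
# Single processor: the "processes" just run in program order on the
# one data stack, like ordinary Forth.
def run_serial(words, stack):
    for word in words:
        word(stack)          # each process finds its inputs already there
    return stack

dup = lambda s: s.append(s[-1])
mul = lambda s: s.append(s.pop() * s.pop())
print(run_serial([dup, mul], [7]))   # [49]
```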

Quote:
>Stack sharing?  To what extent?

Sequential serial routines should share a stack.  

Quote:
>Should there be a protected mode?  Always, never, or
>only sometimes?

I don't want to think about that, I want it to just work.

Quote:
>Memory aside, an arbitration stack?
>How about an arbitration stack...
>An arbitration stack with tokens for each available stack.
>If a process wishes to acquire a stack for its use, it pops
>a token, pushes that token onto the stack it is using, maybe,
>or possibly some other stack to which it has access, and
>starts using the stack it has been assigned.  Once finished,
>the token taken is returned to the arbitration stack, freeing
>up the stack.

Usually, a process will have a deterministic stack effect.  You
know its inputs and its outputs and the maximum stack space it
uses.  In that case it can have something like a stack frame.
I'm thinking of your arbitration stack as an implementation
detail.

Quote:
>If another process wishes to use a stack which another
>process is not using, then there is the equivalency of a
>protected mode in effect.  If another process wishes to
>use a stack which another process is using, for means
>of data exchange or such, then how does it make its
>need known to the process it wishes to request data
>from?  How best to arbitrate?

A process which has farmed out its code to sub-processes,
one for each Forth word, then has nothing left to do but
accept output data from its sub-processes and send it to
the sub-processes that need it, and send its output data
to its calling process.  Just as traditional Forth words
do nothing but calls (using a single data stack) until you
get down to primitives, Forth processes can do nothing but
process calls and data passing until you get down to the
processes that do the work.  This looks to me like the
analogous concept.  When I write parallel Forth code I
don't want to have to think about the ...




Sun, 03 Jun 2001 03:00:00 GMT  
 Q: Parallel forth machine

Quote:

>But a simple, robust approach that
>exploited Forth's inherent easy-to-parallelize nature wouldn't
>be a good thing.

Typo.  I mean it would be a good thing.


Sun, 03 Jun 2001 03:00:00 GMT  
 Q: Parallel forth machine

Quote:



><snip>
>An approximation, so far:
>Stack based, with multiple threads of concurrent control.
>Scalable, not infinite, number of processes allowed.
>Each active process must have its own stack.
>Hidden (implicit?), yet deterministic, multitasking and load balancing.
>Amenable to Forth language and implementation.

Yes.

Quote:
>Forth is also an interpretive language...
>How well do parallelism and interpretation mesh?

The way the interpreter usually works, you get one *line* of
interpreted names at a time.  You look up those names in a symbol table
and find the routines to execute.

If you executed each name in sequence (handling in parallel the
routines that get called) it would be marginally slower.  Or you could
look them all up and run them all in parallel.  Or you could look up one
and start it, and then look up the second one and run that (and a lot of
resource may already be committed to the first) and then look up the
third -- so your extra processors don't wait while you interpret.  Or
you could farm out the interpretation too -- interpret each word as a
separate process.  But some words parse others, and there would be some
waste from that.  Look at it this way -- if a given word tends to call
7 others, and each of them calls 7 others, and each of them calls 7
others, and each of them calls 7 primitives that do little scraps of
machine code, it isn't going to make your system run much slower if the
7 top-level words get executed in sequence.
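The arithmetic behind that claim (fan-out and depth are of course
invented round numbers):

```python
# Invented round numbers: fan-out 7, four levels deep.
fanout, levels = 7, 4
leaves = fanout ** levels
print(leaves)                    # 2401 primitive calls at the bottom
print(round(fanout / leaves, 4)) # 0.0029 -- the 7 top-level dispatches
                                 # are a rounding error next to the leaves
```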

Quote:
>>>Should a protected mode be necessitated or not?
>>I don't quite know what you're talking about in this context
>Protection in the sense of memory or stack protection.  So
>processes do not accidentally overwrite data which shouldn't
>have been overwritten.

It would be good for the processors to have their own onboard
data stacks.  Memory use is an issue.  

Quote:
>Memory protection?    Yes.
>Stack protection?     I hope it isn't needed.
>Register protection?  No.
>Should registers even be acknowledged?

No.

Quote:
><snip>
>>Processes that have all their input data should have priority.
>Restated, processes which can be immediately processed
>should have precedence over processes which cannot be
>immediately processed.  It is one scheduling algorithm
>among many, and a method for implementing parallelism
>when sequentially constrained.

Yes.  

Quote:
>>>How should processes be multitasked on a single processor?
>>If you only have one processor the lowest-overhead way is to
>>treat them like a serial program.
>Yet that is no longer optimal when transitioning to multiple
>processors, is it?  A parallel Forth virtual machine should
>be uniform in design across all number of processing
>elements, both one and many.

Say one processor interprets a command and starts requiring
others to execute it.  It's a serial processor and tells the
others one-at-a-time.  (I guess an alternative would be for
it to broadcast a signal and everybody does what they're
supposed to when they get the signal.  I'm new at this.)
If it has to tell all the others personally then it may not
finish giving orders before the first half have finished.
My first-pass thinking was to treat this like a Forth program.
It makes a process for each word in the highest-level command,
and the processors that execute those make a process for each
word in their 2nd-level command, and so on.  The process of
delegating the commands goes in parallel.

But really this could be settled at compile-time.  Except for
branches you mostly know what processes will be active at
start, and you can just delegate them.  First you start the
processes that have all their data, then you start the ones
that depend on results from the first ones, etc.

I believe this doesn't require much modification to Forth
source code because it's easy for the compiler to recognise
which processes have their initial data.
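A sketch of that compile-time delegation in Python (assuming task ids
follow program order, so id order is already a valid topological order,
and that the dependencies are acyclic): wave 0 is everything that has
its data at start, wave 1 depends only on wave 0, and so on.

```python
# Compile-time waves (invented tasks; assumes ids are in program order
# and the dependency graph is acyclic).
def waves(deps):
    """deps: {task: set of prerequisite tasks}.  Returns delegation waves."""
    wave = {}
    for t in sorted(deps):
        wave[t] = 0 if not deps[t] else 1 + max(wave[d] for d in deps[t])
    return [[t for t in deps if wave[t] == w]
            for w in range(1 + max(wave.values()))]

deps = {0: set(), 1: set(), 2: {0}, 3: {0, 1}, 4: {2, 3}}
print(waves(deps))   # [[0, 1], [2, 3], [4]]
```

Branches spoil the static picture, but everything straight-line can be
delegated this way before the program even starts.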

Quote:
>>>Stack sharing?  To what extent?
>>Sequential serial routines should share a stack.
>Yet the questions were intended to regard parallel
>routines, rather than serial routines.

A process whose job is to delegate other processes to
multiple processors, must keep track of which processes
have their inputs.  It must communicate with the processes
it calls.  When they provide output data it must know where
to send it.  This is a specialised job.  Right now I'm
thinking of it in terms of messages.  It gets a message and
knows which process the message comes from.  The message
provides part of the output from that process.  Conceptually
this all makes sense to me.  But in reality it makes sense
that the compiler will know at compile-time whether partial
data will be useful, and if it won't be useful then the
message shouldn't be sent until everything needed is ready.

The stack idea starts to break down.  Each routine has its
stack inputs and stack outputs, but it might need to start
some of its routines before all the stack inputs are there.
(Start the ones that have all *their* stack inputs.)  It
will be handling intermediate results before all the initial
data is present.  It just isn't acting like a stack.

I think the stack model just doesn't work for this.  I can
make it work by restricting the parallel stuff.  If we
*only* start parallel routines that are completely
independent, that have all their needed inputs, then we can
make stacks for each of them and they can start working
serially.  Each of them can start processes that are
completely independent and make stacks for each of those.
The results will be collected onto parent stacks according
to some rule and more processing done.  The result is that
a lot of things that could be done in parallel won't be.
But when there are a limited number of processors anyway
that probably won't matter.
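A Python sketch of that restricted model (a thread pool stands in for
the processors, and the routines are trivial placeholders): each
independent routine gets its own fresh stack, and the parent collects
the outputs in a fixed program order.

```python
# Restricted model: only fully independent routines run concurrently,
# each on its own fresh stack; a thread pool stands in for processors.
from concurrent.futures import ThreadPoolExecutor

def run_on_own_stack(routine, inputs):
    stack = list(inputs)             # the child's private data stack
    routine(stack)
    return stack                     # the child's stack outputs

def parallel_call(parent_stack, calls):
    """calls: list of (routine, inputs) pairs, mutually independent."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(run_on_own_stack, r, ins) for r, ins in calls]
        for f in futures:            # collect in fixed program order
            parent_stack.extend(f.result())
    return parent_stack

square = lambda s: s.append(s.pop() ** 2)
print(parallel_call([], [(square, [3]), (square, [4])]))   # [9, 16]
```

Collecting in program order is what keeps the result deterministic no
matter which child finishes first.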

Quote:
>>>Should there be a protected mode?  Always, never, or
>>>only sometimes?
>>I don't want to think about that, I want it to just work.
>Don't you want capability for arbitrary control?
>Shouldn't there be the capability to perform 'illegal'
>ops if so desired, e.g., when interpreting Forth,
>possibly for debugging purposes?

Yes, but first I want it to just work.  I'll deal with the
subtle machine-dependent stuff later.

Quote:
>>>Memory aside, an arbitration stack?
>>>How about an arbitration stack...
>>>An arbitration stack with tokens for each available stack.
>>>If a process wishes to acquire a stack for its use, it pops
>>>a token, pushes that token onto the stack it is using, maybe,
>>>or possibly some other stack to which it has access, and
>>>starts using the stack it has been assigned.  Once finished,
>>>the token taken is returned to the arbitration stack, freeing
>>>up the stack.
>>Usually, a process will have a deterministic stack effect.  You
>>know its inputs and its outputs and the maximum stack space it
>>uses.  In that case it can have something like a stack frame.
>>I'm thinking of your arbitration stack as an implementation
>>detail.
>I consider it an important one.  What would Forth and its
>virtual machine be without the data stack and the return
>stack?  Likewise, what could a parallel Forth virtual
>machine be without something along the lines of an
>arbitration stack?

If you have a stack of stacks there's the chance the top one
will be too small.  We could make them a standard size and
require that no process can overflow them.

If you don't know what order your data will arrive then you can't
just push it onto a stack.  You'll need something more like an
array.  You look at the tag and put it into place.

Quote:
>Otherwise, how could parallel processes be coordinated
>without requiring a single, sequential, thread of control?

When you execute a Forth word you already have a hierarchical
sort of control.  High-level words call low-level ones and early
words pass data to later ones, not the other way around.  I'm
not sure that's enough but it may be.  

Quote:
>>>If another process wishes to
>>>use a stack which another process is using, for means
>>>of data exchange or such, then how does it make its
>>>need known to the process it wishes to request data
>>>from?  How best to arbitrate?

I was thinking of data exchange as a push rather than pull.
When it's ready you send it, tagged to say what it's for.
The other process may not be ready for it and it may need to
be stored.  I don't want that sitting on an active process's
stack.
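In Python a push-style sketch might look like this (the tags, and the
idea of a per-process "parked" store, are my invention): the sender
sends when its data is ready; anything the receiver isn't ready for
gets parked off the active stack until it's wanted.

```python
# Invented sketch: push-style, tagged data exchange with a mailbox;
# early arrivals are parked off the stack, not left on it.
import queue

mailbox = queue.Queue()
parked = {}                    # early arrivals wait here, off any stack

def send(tag, value):
    mailbox.put((tag, value))

def receive(tag):
    while tag not in parked:
        t, v = mailbox.get()
        parked[t] = v          # not what we wanted yet: park it
    return parked.pop(tag)

send("area", 42)
send("perimeter", 26)
print(receive("perimeter"))    # 26 -- "area" stays parked until asked for
```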

Quote:
>>When I write parallel Forth code I
>>don't want to have to think about the details of how this
>>works any more than I think about whether my Forth system
>>is subroutine-threaded or token-threaded.  One will be
>>faster, another smaller, either way my code works.
>Yet there should be the capability of tweaking ops at
>all levels of abstraction.  Just because you don't take
>advantage of tweaking the low level ops does not
>mean that there is no merit to doing so.

Yes, but it's a special thing.  When I write a fragment in
assembler I have to know assembler, knowing Forth isn't
enough.  And when I switch to a different processor I'll
need a different assembler.  I want something that will
work OK on top of the idiosyncratic ops.

Quote:
>Thinking in terms of stacks, how would you achieve
>parallelism?  With how many stacks?  Assume that
>each process can read or write a stack simultaneously
>with one or more other processes.  Arbitration must be
>defined.  Once this is optimally done so, there is little
>reason to look back in the transition to higher levels of
>abstraction, in which case, why not hardcode the
>design?  That is how I perceive a parallel Forth virtual
>machine should be designed.

I imagine a Forth processor that executes a Forth primitive
should start with all its data items on a stack and it will
return its outputs on a stack.  When you're assembling the
inputs they won't act like a stack then.  What you do with
the outputs may not be like a stack either.

CODE A ( x x -- x ) ... END-CODE
CODE B ( x x -- x ) ... END-CODE
CODE C ( x x -- x ...




Sun, 03 Jun 2001 03:00:00 GMT  
 Q: Parallel forth machine
On Wed, 16 Dec 1998 00:34:51 GMT, Jonah Thomas

Quote:

>I'm coming at the problem from the other direction.  If
>I could have a Forth parallel system that would just do
>what I wanted, what would that be?  If I start with what
>I want then maybe it will turn out I can have part of it.

        Throw words at it to be done in parallel, as many as I want to
do in one go, then channel the results where I want them to go. The
first part like the system (forget its name right now, I've got a
bookmark at the office) where you make parallelisable function calls,
and the functions are compiled in two versions: one to be executed in
normal sequential fashion by the main thread of execution, and one to
be executed by another processor that has stolen it from the back of
the job queue. The ordinary sequential version runs faster, because it
doesn't have the same overhead, so it runs as if it is just throwing
jobs onto a sequential job queue, except that sometime (hopefully, a
lot of the time!) the jobs are completed before the main thread of
execution gets to them. If the job queue is emptied, it can start
pulling stuff back together.

        [I'm not sure whether the same processor would go off and
steal another job from some other task's queue if it was left waiting
for a previously stolen process to complete.]

As to the channel forking and gathering, to permit

             +-----------------------\
            /         +----------------+-----------------
           /         /            +------/
--------/-------/-------\--/--\
                               \      +---\       +-----------
                                 \           +---[
                                   +------/       +----------

activities, I reckon it can be done with shared memory, but its easier
for me to imagine explaining it to the computer with a message channel
language.

(
----------
Virtually,

Bruce McFarling, Newcastle,

)



Sun, 03 Jun 2001 03:00:00 GMT  
 Q: Parallel forth machine

Quote:

>              +-----------------------\

Can you please rewrite that using a non-proportional font?

-marcel



Sun, 03 Jun 2001 03:00:00 GMT  
 Q: Parallel forth machine


Quote:


>>              +-----------------------\

>Can you please rewrite that using a non-proportional font?

        I probably could. It's not worth the trouble though. The point
is how you let parallel threads of execution split off and rejoin
together in ways other than just:

  [------------]
  [------------]
--[------------]-----
  [------------]
  [------------]

(BTW, in a proportional font that looks funny, so now the shoe's on the
other foot). I can halfway picture doing that with communication
channels, with a system of mux &/or demux socket definitions providing
the plumbing to launch and collect parallel threads of execution.
        One advantage of Forth is that any thread of execution can be
bundled up into a word, which can be docked into the mux &/or demux
sockets as what happens to the stack in between appearing and
disappearing. Something along the lines of |`| to tick an entity that
can be launched off as an independent parallel thread.

(
----------
Virtually,

Bruce McFarling, Newcastle,

)



Sun, 03 Jun 2001 03:00:00 GMT  
 