Asynchronous parallel computing 

I'm not quite sure if this is a Linux or Fortran question but I will
try here first.

Basically, I'm trying to set up an asynchronous parallel computing
cluster on a couple of machines running SuSE 9.1 - 10.1. Some of the
PCs are at college and two of them are in my home.

I will be using a simple master-slave topology. A Fortran program will
run on the master node, which will ssh with no login to the other nodes
to execute tasks. The problem is that I don't know how to detect whether
a task given to a slave node is still running or has been completed
already. I only want to assign a new task to a slave node if it has no
job running (mine or others).

The most reliable method I could think of is to run "ps aux > output.txt"
and parse "output.txt" for processes with >80% CPU utilisation (say). This
would need to be repeated periodically, perhaps every 30 seconds, using a
"do while" loop with a timer.
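
A minimal sketch of the parsing step described above, written in Python for brevity rather than Fortran. It assumes the standard 11-column "ps aux" layout (USER, PID, %CPU, ..., COMMAND). The function names busy_processes and node_is_idle and the 80% default are illustrative, not part of any existing tool. One caveat: on Linux the %CPU column of ps is CPU time divided by the time the process has been running, so a long-lived process that only recently became busy can still show a low figure.

```python
def busy_processes(ps_output, threshold=80.0):
    """Return (user, pid, cpu_percent, command) for each process above threshold.

    ps_output is the text produced by `ps aux` (assumed 11-column layout).
    """
    busy = []
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        # Split into at most 11 fields; COMMAND (the last) may contain spaces.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue  # malformed or truncated line
        try:
            cpu = float(fields[2])
        except ValueError:
            continue  # %CPU column was not numeric
        if cpu > threshold:
            busy.append((fields[0], int(fields[1]), cpu, fields[10].rstrip()))
    return busy


def node_is_idle(ps_output, threshold=80.0):
    """True if no process in the listing exceeds the CPU threshold."""
    return not busy_processes(ps_output, threshold)
```

The listing could be captured on the master by running ps over ssh on each node instead of writing output.txt on the node itself.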

Is there a better way to do this?

Thanks.



Mon, 24 Nov 2008 16:11:19 GMT  

Quote:

> I'm not quite sure if this is a Linux or Fortran question but I will
> try here first.

> Basically, I'm trying to set up an asynchronous parallel computing
> cluster on a couple of machines running SuSE 9.1 - 10.1. Some of the
> PCs are at college and two of them are in my home.

> I will be using a simple master-slave topology. A Fortran program will
> run on the master node, which will ssh with no login to the other nodes
> to execute tasks. The problem is that I don't know how to detect whether
> a task given to a slave node is still running or has been completed
> already. I only want to assign a new task to a slave node if it has no
> job running (mine or others).

> The most reliable method I could think of is to "ps aux > output.txt"
> and parse "output.txt" for >80% CPU utilisation (say). This will need
> to be periodically done perhaps every 30 seconds, perhaps using a "do
> while" loop with a timer.

> Is there a better way to do this?

I think what you want is a batch queueing system like PBS or one of
its relatives.

--

Experimentelle Physik V   http://www.physik.uni-dortmund.de/~wacker
Universitaet Dortmund     Tel.: +49 231 755 3587
D-44221 Dortmund          Fax:  +49 231 755 4547



Mon, 24 Nov 2008 16:32:34 GMT  

Quote:

> I think what you want is a batch queueing system like PBS or one of
> its relatives.

The thing is that the job a slave node needs to run varies depending on
the most recent results from other slave nodes. As new results come in,
the next job to be solved would change.

Thus, a queue would quickly become obsolete. The best thing is to only
get the master node to send a new task to a slave node iff the slave
node is idle. That way the tasks the next slave node is given will
always be based on the most recent & best known solution.
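
The scheme described here (hand a node a new task only when it falls idle, and build that task from the best result known at that moment) can be sketched as follows. This is an illustration in Python using local threads, not the poster's Fortran program. evaluate stands in for one remote job and propose for whatever rule generates the next trial point; both are hypothetical placeholders.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_async_master(evaluate, propose, start, n_workers=3, n_tasks=12):
    """Keep n_workers busy; every new task is built from the best point so far.

    evaluate(x) -> cost   stands in for one job on a slave node (hypothetical)
    propose(best_x, k)    hypothetical rule that creates task number k
    """
    best_x, best_cost = start, evaluate(start)
    issued = 0
    pending = {}  # future -> the point it is evaluating
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Give every worker an initial task.
        while issued < min(n_workers, n_tasks):
            x = propose(best_x, issued)
            pending[pool.submit(evaluate, x)] = x
            issued += 1
        while pending:
            # Block until at least one worker has finished, i.e. is idle.
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                x = pending.pop(fut)
                cost = fut.result()
                if cost < best_cost:
                    best_x, best_cost = x, cost
            # Refill idle workers using the most recent best point.
            while issued < n_tasks and len(pending) < n_workers:
                x = propose(best_x, issued)
                pending[pool.submit(evaluate, x)] = x
                issued += 1
    return best_x, best_cost
```

In a real setup evaluate might wrap a blocking "ssh host command" call, which returns only when the remote command exits, so the master would learn that its own job has finished without polling ps. That would not detect other users' jobs on the node.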



Mon, 24 Nov 2008 17:09:07 GMT  

Quote:

> The best thing is to only
> get the master node to send a new task to a slave node iff the slave
> node is idle. That way the tasks the next slave node is given will
> always be based on the most recent & best known solution.

This is probably using a sledgehammer to crack a nut, but have you looked
into MPI and/or PVM ?

Ian



Mon, 24 Nov 2008 18:22:10 GMT  

Quote:


> > The best thing is to only
> > get the master node to send a new task to a slave node iff the slave
> > node is idle. That way the tasks the next slave node is given will
> > always be based on the most recent & best known solution.

> This is probably using a sledgehammer to crack a nut, but have you looked
> into MPI and/or PVM ?

> Ian

No I haven't, but I suppose I could. Frankly, I just hate having black
boxes in my work. I wrote a whole optimiser myself because of this,
although freely available, peer tested, canonical ones are available
for download.

I had to grudgingly accept a commercial finite element code though. I
don't think I'm smart or able enough to write my own FE code and
experimentally validate it for anisotropic materials under high strain
rates in a few years.

Even now it still bugs me that all my work is useless (although maybe
not pointless) if I don't have a licence to that "black box". So, if
possible, I'd like to code the parallel computing task managing myself.



Mon, 24 Nov 2008 17:37:19 GMT  

Quote:

> I will be using a simple master-slave topology. A Fortran program will
> run on the master node, which will ssh with no login to the other nodes
> to execute tasks. The problem is that I don't know how to detect whether
> a task given to a slave node is still running or has been completed
> already. I only want to assign a new task to a slave node if it has no
> job running (mine or others).

You might consider Ara Howard's RubyQueue,

  http://raa.ruby-lang.org/project/rq/
  http://www.artima.com/rubycs/articles/rubyqueue.html

He originally developed it to help a bunch of NOAA scientists
accomplish a very similar task.

Regards,
--
Bil
http://fun3d.larc.nasa.gov



Mon, 24 Nov 2008 17:47:23 GMT  

[Snip...]

Quote:
>> cluster on a couple of machines running SuSE 9.1 - 10.1

I also apologize for straying away from Fortran, but...

Just a note about SuSE 10.1--I'm having more than the usual trouble getting
it installed to my liking. IMO, it's one of the roughest SuSE releases, and
I go back to 6.4 with them. Plenty of traffic on alt.os.linux.suse about it
if that might be of use to you.

OTOH, SuSE 10.0 has been rock solid for me.

Anyway, just a suggestion, if you can wait for 10.1 to settle down a bit.

FWIW, YMMV...

--
Regards, Weird (Harold Stevens) * IMPORTANT EMAIL INFO FOLLOWS *
Pardon any bogus email addresses (wookie) in place for spambots.
Really, it's (wyrd) at airmail, dotted with net. DO NOT SPAM IT.
Kids jumping ship? Looking to hire an old-school type? Email me.



Mon, 24 Nov 2008 18:24:51 GMT  


Quote:
> > This is probably using a sledgehammer to crack a nut, but have you looked
> > into MPI and/or PVM ?

> No I haven't, but I suppose I could. Frankly, I just hate having black
> boxes in my work. I wrote a whole optimiser myself because of this,
> although freely available, peer tested, canonical ones are available
> for download.

> I had to grudgingly accept a commercial finite element code though. I
> don't think I'm smart or able enough to write my own FE code and
> experimentally validate it for anisotropic materials under high strain
> rates in a few years.

> Even now it still bugs me that all my work is useless (although maybe
> not pointless) if I don't have a licence to that "black box". So, if
> possible, I'd like to code the parallel computing task managing myself.

MPI is open source and is an open standard, so you don't really have
license or "black box" issues with it.  I would say that there are
more man-years of effort in MPI development than there would be in a
commercial finite-element code, so if you really want to reinvent
the MPI wheel, you have a lot of work to do.

However, MPI isn't the solution for every parallel computing
problem.  If your code is SPMD (single program multiple data), then
MPI is a good fit.  That sounds like what you want to do.  Each node
will be running the same program, but it is operating on its own
input data.  You can program with MPI for both dynamic and static
task assignment.  It sounds like you want dynamic task assignment.  
MPI can be used in either a peer-to-peer mode or in a master-slave
mode, or you can switch back and forth within your code.

If you want something lighter weight, then you might consider
TCGMSG.  It is also open source.  I know the author, and I think it
took him about a year to get it in its final form.  TCGMSG predates
MPI, and these days it is probably easier to port and install MPI
than TCGMSG.

$.02 -Ron Shepard



Mon, 24 Nov 2008 22:12:33 GMT  

Quote:

>I'm not quite sure if this is a Linux or Fortran question but I will
>try here first.

>Basically, I'm trying to set up an asynchronous parallel computing
>cluster on a couple of machines running SuSE 9.1 - 10.1. Some of the
>PCs are at college and two of them are in my home.

Others have made some useful suggestions, but ....

You might get good answers by asking this same question in either
comp.distributed or comp.parallel (or both).  I think it's slightly
more on-topic in comp.distributed, but that's not a very active group,
so I don't know whether you'll get replies.

Quote:
>I will be using a simple master-slave topology. A Fortran program will
>run on the master node, which will ssh with no login to the other nodes
>to execute tasks. The problem is that I don't know how to detect whether
>a task given to a slave node is still running or has been completed
>already. I only want to assign a new task to a slave node if it has no
>job running (mine or others).

>The most reliable method I could think of is to "ps aux > output.txt"
>and parse "output.txt" for >80% CPU utilisation (say). This will need
>to be periodically done perhaps every 30 seconds, perhaps using a "do
>while" loop with a timer.

>Is there a better way to do this?

Not necessarily better, but to the best of my knowledge (which admittedly
isn't extensive), this is more or less what well-known "distributed
computing" projects do, so there should be some stuff out there --
scripts, code, something -- that would either make it unnecessary to
write your own, or could help guide you toward a good reinvention
of the wheel, if that's what you want.

--
B. L. Massingill
ObDisclaimer:  I don't speak for my employers; they return the favor.



Mon, 24 Nov 2008 22:23:14 GMT  

Quote:



>> > The best thing is to only
>> > get the master node to send a new task to a slave node iff the slave
>> > node is idle. That way the tasks the next slave node is given will
>> > always be based on the most recent & best known solution.

>> This is probably using a sledgehammer to crack a nut, but have you looked
>> into MPI and/or PVM ?

>> Ian

>No I haven't, but I suppose I could. Frankly, I just hate having black
>boxes in my work. I wrote a whole optimiser myself because of this,
>although freely available, peer tested, canonical ones are available
>for download.

I think you may not be clear on what MPI and PVM are -- these
are libraries of message-passing code, and I don't think you want
to write your own such libraries, any more than you'd write your
own ....  I was going to say "printf", but that's C.  I'm not sure
what the Fortran equivalent is, but think of some widely used and fairly
basic library routine that no sensible person would pass up in
favor of writing his/her own version.

Quote:
>I had to grudgingly accept a commercial finite element code though. I
>don't think I'm smart or able enough to write my own FE code and
>experimentally validate it for anisotropic materials under high strain
>rates in a few years.

>Even now it still bugs me that all my work is useless (although maybe
>not pointless) if I don't have a licence to that "black box". So, if
>possible, I'd like to code the parallel computing task managing myself.

You'd still be doing that with MPI (which is more widely used at this
point, AFAIK, and thus might be a better choice).  What MPI gives you
is a library of stuff you probably do *not* want to write for yourself
(e.g., "send this array of integers from process 1 to process 2").
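
To give a feel for what such a library hides, here is a hand-rolled sketch (plain sockets in Python, not MPI) of sending an array of integers from one endpoint to another. In MPI this is a single MPI_Send / MPI_Recv pair; the helper names send_ints, recv_exact and recv_ints below are invented for illustration, and the wire format (a count followed by big-endian 32-bit values) is an arbitrary choice.

```python
import socket
import struct


def send_ints(sock, values):
    """Send a list of signed 32-bit integers, prefixed by the element count."""
    header = struct.pack("!I", len(values))
    payload = struct.pack(f"!{len(values)}i", *values)
    sock.sendall(header + payload)


def recv_exact(sock, n):
    """Read exactly n bytes; a single recv() may return fewer than asked for."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_ints(sock):
    """Receive one list of integers written by send_ints()."""
    (count,) = struct.unpack("!I", recv_exact(sock, 4))
    return list(struct.unpack(f"!{count}i", recv_exact(sock, 4 * count)))
```

Even this small case has to deal with byte order, message framing and partial reads, and it says nothing yet about process startup, addressing by rank, or collective operations.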

--
B. L. Massingill
ObDisclaimer:  I don't speak for my employers; they return the favor.



Mon, 24 Nov 2008 22:29:21 GMT  

Quote:



> >I'm not quite sure if this is a Linux or Fortran question but I will
> >try here first.

> >Basically, I'm trying to set up an asynchronous parallel computing
> >cluster on a couple of machines running SuSE 9.1 - 10.1. Some of the
> >PCs are at college and two of them are in my home.

> Others have made some useful suggestions, but ....

> You might get good answers by asking this same question in either
> comp.distributed or comp.parallel (or both).  I think it's slightly
> more on-topic in comp.distributed, but that's not a very active group,
> so I don't know whether you'll get replies.

> >I will be using a simple master-slave topology. A Fortran program will
> >run on the master node, which will ssh with no login to the other nodes
> >to execute tasks. The problem is that I don't know how to detect whether
> >a task given to a slave node is still running or has been completed
> >already. I only want to assign a new task to a slave node if it has no
> >job running (mine or others).

> >The most reliable method I could think of is to "ps aux > output.txt"
> >and parse "output.txt" for >80% CPU utilisation (say). This will need
> >to be periodically done perhaps every 30 seconds, perhaps using a "do
> >while" loop with a timer.

> >Is there a better way to do this?

> Not necessarily better, but to the best of my knowledge (which admittedly
> isn't extensive), this is more or less what well-known "distributed
> computing" projects do, so there should be some stuff out there --
> scripts, code, something -- that would either make it unnecessary to
> write your own, or could help guide you toward a good reinvention
> of the wheel, if that's what you want.

On NT workstations, I've been able to synchronize distributed
applications very easily (near real time within network protocol
limitations) using pipes (CreateFile) with a background pipe monitoring
application.  Excellent performance.  Of course there are some security
issues with what I've done, but it is an isolated network that I have
complete control over.




Tue, 25 Nov 2008 00:14:07 GMT  
 