Python or OS forking/threading problem? 
 Python or OS forking/threading problem?

Running the python program below results in either:
i) Python segfaults (that is the parent process, see below)
ii) One or more processes hogging all the CPU
I have tried this on different setups:
a) Uni-processor Linux RH6.0 running kernel 2.2.9
b) Dual-processor Linux RH6.0 running kernel 2.2.5-15smp
Both running Python 1.5.2
When running on setup a) it runs for some time then it fails.
Setup b) fails almost immediately.
I know there are some issues with forking in a thread (I assume
pthreads are used in the implementation) but as far as I can
see it should only affect the child process (deadlocks, etc if
child process inherits locks from parent). The parent process
should not be affected by this.
I would appreciate any suggestions as to what is wrong here and
what I can do to fix this. I have a fairly large multi-threaded
application that calls some backend scripts. My Zope server
crashes once-in-a-while because of this problem.
Since I really don't need to continue running in Python in
the child process (only want to do exec) I am going to implement
my own fork/exec function in C. What I'm hoping to achieve with
that is to avoid the Python VM locks. I suspect they have something
to do with this because when my parent process segfaults it
happens in PyThread_aquireLock (I don't remember the exact name
of the function).
Thanks
--- THE EVIL PROGRAM ---
import threading
import os

class MyThread(threading.Thread):
    def run(self):
        for i in range(100):
            self.once()

    def once(self):
        pid = os.fork()
        if pid == 0:
            print " hello mom"
            os._exit(0)
        pid2, sts = os.waitpid(pid,0)
        print "bye baby"

threads = []
for i in range(10):
    threads.append(MyThread())
map(lambda x: x.start(), threads)
import time
time.sleep(1000)
print "DONE"
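
For anyone trying this on a current Python 3 rather than 1.5.2, the "only fork and exec" plan described above can be approximated with the standard subprocess module, which as far as I know creates the child from C code on POSIX systems instead of running Python in the forked child. This is only a sketch under that assumption, not the C function mentioned in the post; the names spawn_once and run_workers, and the use of sys.executable as the child command, are made up for the example.

```python
import subprocess
import sys
import threading


def spawn_once():
    """Run one short-lived child process and return what it printed."""
    result = subprocess.run(
        [sys.executable, "-c", "print('hello mom')"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def run_workers(n_threads=4, n_iterations=3):
    """Spawn children from several threads at once and collect their output."""
    outputs = []
    outputs_lock = threading.Lock()

    def worker():
        for _ in range(n_iterations):
            out = spawn_once()
            with outputs_lock:      # protect the shared list
                outputs.append(out)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outputs


if __name__ == "__main__":
    print(len(run_workers()), "children finished")
```

The child never re-enters the parent's interpreter here, which is the property the post is after.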




Wed, 04 Sep 2002 03:00:00 GMT  
 Python or OS forking/threading problem?

Quote:

> Running the Python program below results in either:
> i) Python segfaults (that is the parent process, see below)
> ii) One or more processes hogging all the CPU

Hmm... this is wacky.  Here's your test program, after fixing
Deja.com's mangling of the code:

import threading
import os, sys

class MyThread(threading.Thread):
    def run(self):
        for i in range(50):
            print 'calling once', i
            self.once()
        print 'Returning from run()'

    def once(self):
        print ' calling fork'
        pid = os.fork()
        print ' fork output', pid
        if pid == 0:
            print " hello mom"
            sys.stdout.flush()
            os._exit(0)
        print 'Calling waitpid', pid
        pid2, sts = os.waitpid(pid,0)
        print " bye baby"
        sys.stdout.flush()

threads = []
for i in range(10):
    threads.append(MyThread())
print threads
map(lambda x: x.start(), threads)
print 'All started'

import time
print 'Final sleep'
time.sleep(5)
print "DONE"

I've added debugging printouts and changed some of the numbers; run()
only loops 50 times instead of 100, and the final delay is only 5
seconds, not 1000.

On Solaris 2.6 and Python 1.5.2, it hangs very quickly:

[<MyThread(Thread-1, initial)>, <MyThread(Thread-2, initial)>, ... ]
calling once 0
 calling fork
 fork output 0
 hello mom
 fork output 19633
Calling waitpid 19633
 bye baby
calling once 1
 calling fork
 fork output 0
 hello mom
 fork output 19634
Calling waitpid 19634

Apparently hung in waitpid... The current CVS tree on the same
machine, however, doesn't hang; it runs through the whole sequence.
Looking through the CVS logs, I can't find a relevant checkin, so I
don't know what caused the change.  So, try the current CVS tree and
see if that improves matters.  (But Zope won't work with the current
CVS tree; still, at least you can determine if the current CVS helps,
and then look for the precise bugfix.)

Can anyone suggest what's going on here?  

--
A.M. Kuchling                   http://www.*-*-*.com/
And how often do we meet the man who prefaces his remarks with: "I was reading
a book last night..." in the too loud, overenunciated fashion of one who might
be saying: "I keep a hippogryph in my basement." Reading confers status.
    -- Robertson Davies, _A Voice from the Attic_



Sun, 08 Sep 2002 03:00:00 GMT  
 Python or OS forking/threading problem?

Quote:
>Hmm... this is wacky.

I don't know much about threading but here is my small
contribution.  The attached program locks up on my machine when I
increase the number of forking processes to more than one.  It
happens for the CVS version of Python as well as 1.5.2.

Also, there don't seem to be any major changes to threading.py,
posix.waitpid or threadmodule.c between 1.5.2 and the current CVS
source.  Andrew, are you sure it doesn't lock up on Solaris?
Maybe you need to increase the number of threads.

    Neil

import threading
import os, sys

running = threading.Semaphore(20) # about 5 is enough on my machine
forking = threading.Semaphore(1) # more than 1 seems to cause deadlocks

class MyThread(threading.Thread):
    def start(self):
        running.acquire()
        threading.Thread.start(self)

    def run(self):
        print ' calling fork'
        forking.acquire()
        pid = os.fork()
        print ' fork output', pid
        if pid == 0:
            print " hello mom"
            sys.stdout.flush()
            os._exit(0)
        forking.release()
        print 'Calling waitpid', pid
        pid2, sts = os.waitpid(pid,0)
        print " bye baby"
        sys.stdout.flush()
        running.release()

while 1:
    t = MyThread().start()



Mon, 09 Sep 2002 03:00:00 GMT  
 Python or OS forking/threading problem?

Quote:

> I don't know much about threading but here is my small
> contribution.  The attached program locks up on my machine when I
> increase the number of forking processes to more than one.  It
> happens for the CVS version of Python as well as 1.5.2.

It doesn't crash on Solaris; I increased the number of forking
processes to 10.  I wonder if all those threads might be causing so
much scheduler overhead that it only looks like a deadlock on Linux,
but really it's just spending lots of time in the kernel trying to
pick the next process to run.

Quote:
>Also, there don't seem to be any major changes to threading.py,
>posix.waitpid or threadmodule.c between 1.5.2 and the current CVS
>source.  Andrew, are you sure it doesn't lock up on Solaris?
> Maybe you need to increase the number of threads.

Pretty sure, I think; I ran it with 50 threads, and while it was
pretty slow (scheduling overhead, I imagine), the program *did*
complete.  I thought some change to signal handling, such as different
treatment of SIGCHLD, might be responsible for the fix, but there are no
relevant changes since 1.5.

Unless ... I noticed something suspicious along the way; look at this
code from floatsleep() in Modules/timemodule.c:

        Py_BEGIN_ALLOW_THREADS
        if (select(0, (fd_set *)0, (fd_set *)0, (fd_set *)0, &t) != 0) {
                Py_BLOCK_THREADS
#ifdef EINTR
                if (errno != EINTR) {
#else
                if (1) {
#endif
                        PyErr_SetFromErrno(PyExc_IOError);
                        return -1;
                }
        }
        Py_END_ALLOW_THREADS

Py_BLOCK_THREADS is for leaving a {BEGIN,END}_ALLOW_THREADS block (see
ceval.h), but this code doesn't always exit; if errno == EINTR, the
flow would be Py_BEGIN_ALLOW_THREADS; Py_BLOCK_THREADS;
Py_END_ALLOW_THREADS.  I suspect the BLOCK_THREADS should be moved to
inside the if, so it's only executed when the function actually
returns unexpectedly.  But I don't know if this might be the root of
the problem.
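
A toy model of that macro flow, for illustration only. In ceval.h, Py_BEGIN_ALLOW_THREADS releases the interpreter lock (via PyEval_SaveThread), while Py_BLOCK_THREADS and Py_END_ALLOW_THREADS both re-acquire it (via PyEval_RestoreThread). The sketch below is modern Python 3 and stands a plain threading.Lock in for the interpreter lock; ToyInterpreterLock and sleep_flow are invented names, and the acquire is non-blocking so that the double acquire is reported instead of hanging.

```python
import threading


class ToyInterpreterLock:
    """Toy stand-in for the interpreter lock; not CPython's implementation."""

    def __init__(self):
        self._lock = threading.Lock()
        self._lock.acquire()            # the running thread starts out holding it

    def begin_allow_threads(self):      # models Py_BEGIN_ALLOW_THREADS
        self._lock.release()

    def restore(self):                  # models Py_BLOCK_THREADS / Py_END_ALLOW_THREADS
        # Non-blocking, so a second acquire returns False rather than deadlocking.
        return self._lock.acquire(blocking=False)


def sleep_flow(interrupted, fixed):
    """Model floatsleep() when select() is interrupted with errno == EINTR.

    Returns the result of each attempt to re-acquire the lock, in order.
    """
    gil = ToyInterpreterLock()
    gil.begin_allow_threads()
    steps = []
    if interrupted:                     # select() returned nonzero
        if not fixed:
            # Quoted code: BLOCK_THREADS runs before errno is checked.
            steps.append(gil.restore())
        # errno == EINTR, so there is no early return; fall through.
    steps.append(gil.restore())         # Py_END_ALLOW_THREADS
    return steps


if __name__ == "__main__":
    print("quoted code:", sleep_flow(interrupted=True, fixed=False))
    print("BLOCK moved:", sleep_flow(interrupted=True, fixed=True))
```

In the quoted ordering the lock is requested twice on the EINTR path, and the second request cannot succeed; with Py_BLOCK_THREADS moved inside the errno check it is requested once.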

Any threading wizards such as Tim Peters or Greg Stein want to offer
some insight, whether into this possible bug or the original problem?

--
A.M. Kuchling                   http://starship.python.net/crew/amk/
Science itself, therefore, may be regarded as a minimal problem, consisting of
the completest possible presentment of facts with the least possible
expenditure of thought.
    -- Ernst Mach



Mon, 09 Sep 2002 03:00:00 GMT  
 Python or OS forking/threading problem?
Quote:
----- Original Message -----

Newsgroups: comp.lang.python

Sent: Thursday, March 23, 2000 9:55 AM
Subject: Re: Python or OS forking/threading problem?

> Unless ... I noticed something suspicious along the way; look at this
> code from floatsleep() in Modules/timemodule.c:

>         Py_BEGIN_ALLOW_THREADS
>         if (select(0, (fd_set *)0, (fd_set *)0, (fd_set *)0, &t) != 0) {
>                 Py_BLOCK_THREADS
> #ifdef EINTR
>                 if (errno != EINTR) {
> #else
>                 if (1) {
> #endif
>                         PyErr_SetFromErrno(PyExc_IOError);
>                         return -1;
>                 }
>         }
>         Py_END_ALLOW_THREADS

> Py_BLOCK_THREADS is for leaving a {BEGIN,END}_ALLOW_THREADS block (see
> ceval.h), but this code doesn't always exit; if errno == EINTR, the
> flow would be Py_BEGIN_ALLOW_THREADS; Py_BLOCK_THREADS;
> Py_END_ALLOW_THREADS.  I suspect the BLOCK_THREADS should be moved to
> inside the if, so it's only executed when the function actually
> returns unexpectedly.  But I don't know if this might be the root of
> the problem.

Are you using a CVS copy of Python?  Because my source for floatsleep()
doesn't look like that.  I'm using the 1.5.2 source and the code looks like
this:

 Py_BEGIN_ALLOW_THREADS
 if (select(0, (fd_set *)0, (fd_set *)0, (fd_set *)0, &t) != 0) {
     Py_BLOCK_THREADS
     PyErr_SetFromErrno(PyExc_IOError);
     return -1;
 }
 Py_END_ALLOW_THREADS

I agree that the usage you quoted is fishy, but the above looks fine.  And
forking in multiple threads locks up on my RH6.1 box also, so I don't think
the problem is in sleep.

I ran the code that you cleaned up with buffering turned off, and it seemed
to me to be locking up on the os._exit() call.  I remember Tim Peters
posting a while back about an obscure race condition with a lot of processes
being created and killed.  I'm no guru, but I'd place my bet that this is
the very problem.

My solution to the problem goes like this:

    Patient: "Doctor! My arm hurts when I go like this."

    Doctor: "So don't do that."

David



Mon, 09 Sep 2002 03:00:00 GMT  
 Python or OS forking/threading problem?
Grab the very latest CVS patches:  Andrew checked in a change that I presume
addresses this mystery.


Tue, 10 Sep 2002 03:00:00 GMT  
 Python or OS forking/threading problem?

Quote:
>It doesn't crash on Solaris; I increased the number of forking
>processes to 10.  I wonder if all those threads might be causing so
>much scheduler overhead that it only looks like a deadlock on Linux,
>but really it's just spending lots of time in the kernel trying to
>pick the next process to run.

The CPU usage is 100% but only a small percentage is system time.
Scheduling overhead should show up as system time, shouldn't it?  I
don't think sleep() or wait() is the problem either; Python seems
to hang in the same place quite consistently.  The only task
running (using 100% CPU) is doing this:

    pthread_cond_wait () from /lib/libpthread.so.0
    PyThread_acquire_lock at thread_pthread.h:318
    PyEval_AcquireThread at ceval.c:150
    t_bootstrap at ./threadmodule.c:223
    pthread_start_thread () from /lib/libpthread.so.0

None of the threads are in a call to sleep(), wait() or fork().
This comment from the pthread_atfork man page seems like it
_might_ be relevant:

    To understand the purpose of pthread_atfork, recall that
    fork(2) duplicates the whole memory space, including
    mutexes in their current locking state, but only the
    calling thread: other threads are not running in the child
    process.  Thus, if a mutex is locked by a thread other than
    the thread calling fork, that mutex will remain locked
    forever in the child process, possibly blocking the
    execution of the child process.

I don't see how this could cause a problem for Python though.
AFAIK, Python only uses one lock.  The thread that has that lock
has to be the one that is calling fork().
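
The man page's point about inherited mutexes can be seen from Python itself. The following is a minimal sketch for a current Python 3 on a POSIX system, not something taken from the 1.5.2 sources: a helper thread holds an ordinary threading.Lock while the main thread forks, and the child reports through its exit status whether it could take the lock. The function name child_sees_lock_held is invented for the example, and recent Python versions may emit a warning about forking from a multi-threaded process.

```python
import os
import threading


def child_sees_lock_held():
    """Fork while another thread holds a lock; True if the child finds it locked."""
    lock = threading.Lock()
    holding = threading.Event()
    done = threading.Event()

    def holder():
        with lock:
            holding.set()       # tell the main thread the lock is now held
            done.wait()         # keep holding it across the fork

    t = threading.Thread(target=holder)
    t.start()
    holding.wait()

    pid = os.fork()
    if pid == 0:
        # Child: the holder thread does not exist here, but the lock's
        # state was copied, so a non-blocking acquire is expected to fail.
        got_it = lock.acquire(blocking=False)
        os._exit(0 if got_it else 1)

    _, status = os.waitpid(pid, 0)
    done.set()                  # let the holder thread finish in the parent
    t.join()
    return os.WEXITSTATUS(status) == 1


if __name__ == "__main__":
    print("child found the lock held:", child_sees_lock_held())
```

A blocking acquire in the child would wait forever, since no thread exists there to release the lock.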

If someone has any theories about what is happening here I would
be eager to hear them.

    Neil

--
Real programmers don't make mistrakes



Wed, 11 Sep 2002 03:00:00 GMT  
 Python or OS forking/threading problem?

Quote:

> Grab the very latest CVS patches: Andrew checked in a change
> that I presume addresses this mystery.

Not for my problem on Linux.  My code didn't call sleep().  It
may be a bug with the pthreads in libc6 for Linux.  I can't
reproduce it with C code though.

    Neil



Wed, 11 Sep 2002 03:00:00 GMT  
 Python or OS forking/threading problem?

Quote:

> Not for my problem on Linux.  My code didn't call sleep().  It
> may be a bug with the pthreads in libc6 for Linux.  I can't
> reproduce it with C code though.

I think I can explain what happens.  Look at the following code:

#include <stdio.h>
#include <string.h>

#include <unistd.h>
#include <pthread.h>

void *
thread(void * v)
{
  for (;;) {
    char buf[40];
    int len = sprintf(buf, "%lu\n", (unsigned long) getpid());
    write(1, buf, len);
    sleep(1);
  }
}

main()
{
  pthread_t t;
  pthread_create (&t, NULL, &thread, NULL);
  fork();
  sleep(3600);
}

This clearly shows that the thread is not duplicated on fork().  

Now look at the Python source code:

static PyObject *
posix_fork(self, args)
        PyObject *self;
        PyObject *args;
{
        int pid;
        if (!PyArg_ParseTuple(args, ":fork"))
                return NULL;
        pid = fork();
        if (pid == -1)
                return posix_error();
        PyOS_AfterFork();
        return PyInt_FromLong((long)pid);
}

void
PyOS_AfterFork()
{
#ifdef WITH_THREAD
        main_thread = PyThread_get_thread_ident();
        main_pid = getpid();
#endif
}

long PyThread_get_thread_ident _P0()
{
        volatile pthread_t threadid;
        if (!initialized)
                PyThread_init_thread();
        /* Jump through some hoops for Alpha OSF/1 */
        threadid = pthread_self();
        return (long) *(long *) &threadid;
}

In the child process, all threads have disappeared, but the code doesn't
seem to be prepared to handle this.
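
The same observation as the C program, sketched at the Python level for a current Python 3 on POSIX (this is an illustration added for readers, not part of the 1.5.2 code quoted above). Several worker threads are started, the main thread forks, and the child passes back the number of threads the threading module still knows about as its exit status. The function name count_threads_after_fork is made up for the example.

```python
import os
import threading


def count_threads_after_fork(n_workers=3):
    """Return (threads seen in the parent, threads seen in the forked child)."""
    stop = threading.Event()
    workers = [threading.Thread(target=stop.wait) for _ in range(n_workers)]
    for w in workers:
        w.start()

    in_parent = threading.active_count()
    pid = os.fork()
    if pid == 0:
        # Child: only the thread that called fork() exists here.
        os._exit(threading.active_count())

    _, status = os.waitpid(pid, 0)
    stop.set()                  # release the idle workers in the parent
    for w in workers:
        w.join()
    return in_parent, os.WEXITSTATUS(status)


if __name__ == "__main__":
    parent_count, child_count = count_threads_after_fork()
    print("parent sees", parent_count, "threads; child sees", child_count)
```

Modern CPython resets its threading bookkeeping in the child after os.fork(); the code quoted above suggests 1.5.2 only re-recorded the main thread ident and pid.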


Wed, 11 Sep 2002 03:00:00 GMT  
 Python or OS forking/threading problem?

Quote:

> Grab the very latest CVS patches:  Andrew checked in a change that I presume
> addresses this mystery.

No, that's just the suspicious bit of code that I noticed in
timemodule.c; GvR confirmed that it looks wrong, so I checked in the
patch.

--
A.M. Kuchling                   http://starship.python.net/crew/amk/
But we cannot disguise our abhorrence of modern communication devices.
  -- Queen Victoria on teleconferencing, in SEBASTIAN O #1



Fri, 13 Sep 2002 03:00:00 GMT  
 Python or OS forking/threading problem?
Thanks everybody for looking into this problem. I was
working with Naris on this (Naris posted the original
question).

I have a few comments on your discussion that might be helpful.

1) According to POSIX, when a thread calls fork, only
   that thread is cloned in the new process.

   a) That's exactly why in the code below none of
      the threads are cloned (only the main thread).

   b) This is a known problem for the child process.
      If the cloned thread needs to acquire a lock that
      some other thread held at the time of the fork, it
      will deadlock (since in the child process there
      is no thread to release that lock).
      pthread_atfork is there to help the child process
      handle this problem.

2) The strange thing in our case is that it is the parent
   thread that starts behaving badly. And as I will soon
   point out, it depends on what the child process is doing.

3) We ran into this problem because we were using popen in
   a thread. popen basically only does fork and exec plus
   some file handling. The popen module is implemented
   in Python. I replaced that module with one in C and the
   problem went away (so I have a workaround :)
   I did some more experiments, and it seems that if the child
   process leaves an atomic operation (enters Python again)
   our problem may occur.
   I don't know the implementation of Python, but everything
   I have seen indicates that the child process is still sharing
   some lock with the parent process.
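
For point 1b, current Python 3 exposes a pthread_atfork-style hook as os.register_at_fork. A minimal sketch of the usual pattern follows, assuming a POSIX system and Python 3.7 or later; state_lock and fork_and_check are hypothetical names, and nothing here reflects how 1.5.2 behaved. The lock is taken just before the fork so that no other thread can be holding it at that moment, then released on both sides.

```python
import os
import threading

state_lock = threading.Lock()   # hypothetical lock guarding some shared state

# Acquire the lock right before fork(), then release it in the parent and
# in the child once the fork has happened.
os.register_at_fork(
    before=state_lock.acquire,
    after_in_parent=state_lock.release,
    after_in_child=state_lock.release,
)


def fork_and_check():
    """Fork, and have the child report whether it can take state_lock."""
    pid = os.fork()
    if pid == 0:
        got_it = state_lock.acquire(blocking=False)
        os._exit(0 if got_it else 1)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    print("child could take the lock:", fork_and_check())
```

These handlers run for forks made through Python, such as os.fork(); a fork() issued directly from C extension code would presumably need pthread_atfork itself.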

Hope this will help

Snorri

Quote:


> > Not for my problem on Linux.  My code didn't call sleep().  It
> > may be a bug with the pthreads in libc6 for Linux.  I can't
> > reproduce it with C code though.

> I think I can explain what happens.  Look at the following code:

> #include <stdio.h>
> #include <string.h>

> #include <unistd.h>
> #include <pthread.h>

> void *
> thread(void * v)
> {
>   for (;;) {
>     char buf[40];
>     int len = sprintf(buf, "%lu\n", (unsigned long) getpid());
>     write(1, buf, len);
>     sleep(1);
>   }
> }

> main()
> {
>   pthread_t t;
>   pthread_create (&t, NULL, &thread, NULL);
>   fork();
>   sleep(3600);
> }

> This clearly shows that the thread is not duplicated on fork().

> Now look at the Python source code:

> static PyObject *
> posix_fork(self, args)
>         PyObject *self;
>         PyObject *args;
> {
>         int pid;
>         if (!PyArg_ParseTuple(args, ":fork"))
>                 return NULL;
>         pid = fork();
>         if (pid == -1)
>                 return posix_error();
>         PyOS_AfterFork();
>         return PyInt_FromLong((long)pid);
> }

> void
> PyOS_AfterFork()
> {
> #ifdef WITH_THREAD
>         main_thread = PyThread_get_thread_ident();
>         main_pid = getpid();
> #endif
> }

> long PyThread_get_thread_ident _P0()
> {
>         volatile pthread_t threadid;
>         if (!initialized)
>                 PyThread_init_thread();
>         /* Jump through some hoops for Alpha OSF/1 */
>         threadid = pthread_self();
>         return (long) *(long *) &threadid;
> }

> In the child process, all threads have disappeared, but the code doesn't
> seem to be prepared to handle this.



Wed, 18 Sep 2002 03:00:00 GMT  
 