hanging on a file read in python 1.5.2, WinNT 

I'm having a bizarre problem. I have a program that, among other
things, uses urllib to read a file and parse some meta info
("publicationDate"). On every other site where I use this program,
everything runs smoothly. But on one site, the program periodically
hangs when doing the read. After this statement is executed, nothing
happens:

print "I've opened file %s" % dcurrentFile

On other sites, with identical metadata, there are no hangs. On this
site, it will run through anywhere from 0 to x files with no problems,
sometimes hesitating after that statement, and finally it will hang. I
can go out to lunch and leave it, and it's still hung when I get back.
Spidering a different site, it will run all night and through thousands
of entries (this is an intranet project).

Any ideas? I suspect I'm doing something wrong with urllib. It is used
elsewhere in the program to grab some data, but here, I had to
explicitly preface the url with "http://".
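
For comparison, one way to keep a stalled server from hanging a spider like this is to put a timeout on the fetch. The sketch below is only an illustration and assumes a modern Python 3, where urllib.request.urlopen accepts a timeout argument; urllib.urlopen in Python 1.5.2 has no such parameter. The names fetch_with_timeout and timeout_seconds are hypothetical and not part of my program.

```python
import urllib.request


def fetch_with_timeout(url, timeout_seconds=10.0):
    """Return the response body as bytes, or None if the fetch fails or stalls.

    fetch_with_timeout and timeout_seconds are hypothetical names used only
    for this illustration.
    """
    try:
        # The timeout covers the connect and each blocking socket operation,
        # so a server that accepts the connection and then goes silent raises
        # an error instead of blocking forever.
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return response.read()
    except OSError:
        # urllib.error.URLError and socket.timeout are both OSError subclasses.
        return None
```

A caller could treat a None result as "skip this URL and move on" rather than waiting indefinitely.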

Here's the entire relevant code fragment:

class seekUrl(htmllib.HTMLParser):

    def __init__(self):
        htmllib.HTMLParser.__init__(self, formatter.NullFormatter())
        self.c_pubdate = "1990-01-01"

    def do_meta(self,stuff):
        if stuff[0][1] == "publicationDate":
            self.c_pubdate = stuff[-1][1]
        try:
            bar = DateTime.ISO.ParseDate(self.c_pubdate)
        except:
            self.c_pubdate = "1990-01-01"

    def close(self):
        return self.c_pubdate

def urlDateCheck(dcurrentFileRoot,dcurrentFile):
    parser=seekUrl()
    dcurrentFile = 'http://'+dcurrentFile
    dcurrent = urllib.urlopen(dcurrentFile)
    print "I've opened file %s" % dcurrentFile
    inFile = dcurrent.read()
    if dcurrentFileRoot != 'www.foobar.com':
        print "ready for the parser feed"
        parser.feed(inFile)
        print "now to grab a pubdate from parser.close"
        pubdate = parser.close()
        print "I've got a pubdate, %s" % pubdate
    dcurrent.close()
    urllib.urlcleanup()
    print "I'm in urlDateCheck and pubdate is %s" % pubdate
    inTime = 0
    foo = DateTime.now() + DateTime.RelativeDateTime(months=-18)
    bar = DateTime.ISO.ParseDate(pubdate)
    if bar > foo :
        inTime = 1
    return inTime

and from the main line:

                        print "now getting ready to check date status"
                        inDate = urlDateCheck(currentFileRoot,nextUrl)
                        if inDate : currentState = 'NEW'
                        else : currentState = 'IGNORE'

....
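
The meta lookup and the 18-month cutoff above can also be sketched in modern Python 3, mainly to make the intended logic explicit. This is a rough illustration under assumptions, not the code I'm running: it assumes the tag looks like `<meta name="publicationDate" content="YYYY-MM-DD">`, it approximates 18 months as 548 days, and PubDateParser, is_recent and max_age_days are hypothetical names.

```python
from datetime import date, timedelta
from html.parser import HTMLParser

DEFAULT_PUBDATE = "1990-01-01"  # same fallback date as in the fragment above


class PubDateParser(HTMLParser):
    """Remember the content of a publicationDate meta tag (assumed layout)."""

    def __init__(self):
        super().__init__()
        self.pubdate = DEFAULT_PUBDATE

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)  # look attributes up by name, not position
        if attr_map.get("name") == "publicationDate":
            candidate = attr_map.get("content") or ""
            try:
                date.fromisoformat(candidate)  # validate YYYY-MM-DD
            except ValueError:
                return  # malformed date: keep the previous value
            self.pubdate = candidate


def is_recent(html_text, today, max_age_days=548):
    """Return True if the page's publication date is newer than the cutoff."""
    parser = PubDateParser()
    parser.feed(html_text)
    parser.close()
    published = date.fromisoformat(parser.pubdate)
    return published > today - timedelta(days=max_age_days)
```

Passing today in explicitly, instead of calling date.today() inside the function, keeps the cutoff check easy to exercise with fixed dates.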

--
Ari Davidow




Mon, 19 May 2003 03:00:00 GMT  
 