Lost between urllib and httplib 

Python 2.1:

I'm choosing between urllib and httplib, but I'm having trouble with both.
urllib.urlopen() hides HTTP errors (I hope this will be fixed in the next
version!), and httplib.HTTP() doesn't seem to be able to access some pages
if you supply only a domain name.

Here's the script I use for testing httplib:

import urlparse
import httplib
import sys

url = sys.argv[1]
parts = urlparse.urlparse(url)   # parse once: (scheme, host, path, ...)
host = parts[1]
path = parts[2] or '/'           # an empty path must be sent as '/'

try:
    h = httplib.HTTP(host)
    h.putrequest('GET', path)
    h.putheader('Accept', 'text/html')
    h.endheaders()
except:
    print "Host not found."
    sys.exit()

print h.getreply()[0]

On some websites, I get HTTP status 301, 302 or even 404. Try for example:

http://www.webstandards.org (301)
http://www.native-instruments.com (302)
http://www.chaos.com (404)

With urllib, however, all of these sites are fetched successfully. What
does urllib.urlopen() do that I have to add when using httplib?
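For what it's worth, the bare-domain failures are probably caused by the empty path: urlparse leaves the path component empty for a URL like "http://example.com", so putrequest() ends up sending an invalid request line. A minimal sketch of the pitfall, shown with Python 3's urllib.parse (the successor to the urlparse module), using one of the example URLs above:

```python
# urlparse leaves the path empty for a bare domain name, and an empty
# request target ("GET  HTTP/1.0") is not valid HTTP, so servers reject
# or mishandle the request.  The fix is to fall back to '/'.
from urllib.parse import urlparse   # Python 3 home of the urlparse module

parts = urlparse('http://www.webstandards.org')
path = parts[2] or '/'   # empty path -> '/'
```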

Regards,

Gustaf Liljegren



Fri, 28 Nov 2003 05:06:11 GMT  

Quote:

> Python 2.1:

> I'm choosing between urllib and httplib, but have trouble with both.
> urllib.urlopen() is hiding HTTP errors, and httplib.HTTP() doesn't seem
> to be able to access some pages if you only supply a domain name.

> [test script snipped]

> On some websites, I get HTTP error 301, 302 or even 404. Try for example:

> http://www.webstandards.org (301)
> http://www.native-instruments.com (302)
> http://www.chaos.com (404)

HTTP results 301 and 302 are redirections. You have to handle them
yourself using httplib. The good news is that httplib gives you the
flexibility to handle them as you want. I have attached one of our HTTP
clients that uses httplib. It handles all three sites you mentioned.
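For readers without the attachment, the redirect-chasing loop described above can be sketched roughly like this, using Python 3's http.client (the renamed httplib). The function names are illustrative, not taken from the attached client, and the sketch handles plain HTTP only:

```python
# A sketch of handling 301/302 by hand with http.client (Python 3's
# renamed httplib).  Plain HTTP only, for brevity; names are illustrative.
import http.client
import urllib.parse

def next_url(current_url, status, location):
    """Return the URL to retry for a redirect, or None if final."""
    if status in (301, 302) and location:
        # Location may be relative; resolve it against the current URL
        return urllib.parse.urljoin(current_url, location)
    return None

def fetch(url, max_redirects=5):
    """GET a URL, following 301/302 redirects; returns the final status."""
    for _ in range(max_redirects):
        parts = urllib.parse.urlsplit(url)
        conn = http.client.HTTPConnection(parts.netloc)
        conn.request('GET', parts.path or '/')
        resp = conn.getresponse()
        target = next_url(url, resp.status, resp.getheader('Location'))
        conn.close()
        if target is None:
            return resp.status
        url = target
    raise IOError('too many redirects')
```

The key design point is that the Location header may be relative, so it has to be resolved against the current URL before retrying.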

<shameless plug>
This code is used in our website load testing system http://www.stressmy.com
</shameless plug>

--
Senior Meat Manager
Downright Software LLC
http://www.downright.com

[Attachment: rawhtmlpage.py, 13K]


Fri, 28 Nov 2003 08:52:20 GMT  


Quote:
> HTTP results 301 and 302 are redirections.  You have to handle them
> yourself using httplib.  The good news is that httplib gives you
> the flexibility to handle them as you want.  I have attached one of
> our HTTP clients that uses httplib.  It handles all three sites
> you mentioned.

Doug, unfortunately I receive a digest of this list that does not
include attachments. Is the sample client you mentioned available
somewhere on the net? Thanks very much in any case. - Bill

Bill Bell, Software Developer



Fri, 28 Nov 2003 22:50:16 GMT  

For some reason, my news server appears to have lost Doug's article (could
it be the attachment?), but I found it on groups.google.com (without the
attachment, of course). Thanks a lot for the info anyway. It's a great
relief to know that I can still use httplib; I only need some more
knowledge of HTTP.

Nevertheless, I'd very much appreciate seeing how to handle 301 and 302
redirections, too.

Regards,

Gustaf Liljegren



Sat, 29 Nov 2003 03:27:42 GMT  


Quote:
>I'm choosing between urllib and httplib, but have trouble with both.
>urllib.urlopen() is hiding HTTP errors (hope this will be fixed in next
>version!)

urllib.urlopen is implemented using the class urllib.FancyURLopener.
That class does lots of good things (such as transparently handling
redirects), but its default behaviour is to swallow any errors it
doesn't know how to handle.

I suggest you derive a new class from FancyURLopener and override the
http_error_default method. I often use this class to re-raise those
errors as exceptions:

class URLopener(urllib.FancyURLopener):
    def http_error_default(self, url, fp, errcode, errmsg, headers):
        # Fall back to the plain URLopener behaviour, which raises IOError
        return urllib.URLopener.http_error_default(self, url, fp,
                                                   errcode, errmsg, headers)

Quote:
>and httplib.HTTP() doesn't seem to be able to access some pages
>if you only supply a domain name.

httplib is very low level, I suggest you stick with urllib.
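As a footnote for readers on modern Python: urllib.request.urlopen no longer swallows errors at all; it raises urllib.error.HTTPError, which carries the status code and is itself a readable response object. A small illustration with a hand-constructed error (the URL and body here are made up):

```python
# In Python 3, urlopen raises urllib.error.HTTPError for 4xx/5xx instead
# of hiding them.  HTTPError doubles as a file-like response; it is
# constructed by hand here (URL and body are made up) to show what a
# caller sees in the except clause.
import io
import urllib.error

err = urllib.error.HTTPError('http://example.com/missing', 404,
                             'Not Found', {}, io.BytesIO(b'gone'))
```

In practice a caller wraps urlopen in try/except urllib.error.HTTPError and inspects err.code (and, if needed, err.read() for the error body).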

Toby Dickenson



Sat, 29 Nov 2003 21:04:21 GMT  
 