htmllib samples 

Hi,

Has anyone got some samples, or can someone point me to where to find them, on how to
implement this module?

Oliver



Mon, 15 Sep 2003 16:38:38 GMT  

Quote:

> Has anyone got some samples, or can someone point me to where to find them, on how to
> implement this module?

you mean "use", I assume?

http://www.oreilly.com/catalog/pythonsl/chapter/ch05.html#HTMLLIB-MODULE
http://www.oreilly.com/catalog/pythonsl/chapter/ch05.html#FORMATTER-M...

note that when you think you need the htmllib module, you
usually want the sgmllib module:

http://www.oreilly.com/catalog/pythonsl/chapter/ch05.html#SGMLLIB-MODULE

Cheers /F



Mon, 15 Sep 2003 16:53:32 GMT  

Quote:

> Hi,

> Has anyone got some samples, or can someone point me to where to find them, on how to
> implement this module?

> Oliver

I've attached a simple parser we use.  It may help.

Note that Fredrik Lundh modestly failed to mention his upcoming book on
'The Python Library'.  There's an ebook version of his 'eff-bot' Guide,
which is the first place I look for code examples.

--

Senior Meat Manager
Downright Software LLC
http://www.dougfort.net

[ formfieldparser.py 4K ]
#!/usr/bin/env python
"""
FormFieldParser

This object parses HTML text and builds a dictionary of
dictionaries of form fields

$Id: formfieldparser.py,v 1.1 2001/01/26 15:18:30 dougfort Exp $
"""
__author__="Downright Software LLC"
__version__="$Revision: 1.1 $"[11:-2]

import sgmllib
import string
import cStringIO
import urllib
import re

import webnudge.util.misc
import webnudge.util.document

class FormFieldParserException(Exception):
    def __init__(self, message):
        self._message = message
    def __str__(self):
        return self._message

###########################################################
class FormFieldParser(sgmllib.SGMLParser):
###########################################################
    """
    FormFieldParser class. Parse a page from a website,
    creating a dictionary of dictionaries of form
    fields
    """

    #----------------------------------------------------------
    def __init__(self):
    #----------------------------------------------------------
        """
        Constructor
        """
        sgmllib.SGMLParser.__init__(self)

        self._formcount = 0
        self._formdict = {}

    #----------------------------------------------------------
    def parse(self, text):
    #----------------------------------------------------------
        """
        parse some text, without trashing javascript
        """
        self.feed(text)
        self.close()
        return self._formdict

    #----------------------------------------------------------
    def start_form(self,attributes):
    #----------------------------------------------------------
        """
        start a form
        """
        self._formdict[self._formcount] = {}
    #----------------------------------------------------------
    def end_form(self):
    #----------------------------------------------------------
        """
        end a form
        """
        self._formcount += 1

    #----------------------------------------------------------
    def _storeformfield(self,attributes,multivalue=0):
    #----------------------------------------------------------
        """
        Capture name and value attributes of a form field
        """
        tagname = None
        tagvalue = ""
        selected = 0
        for key, value in attributes:
            if key == "name":
                tagname = value
                continue
            if key == "value":
                tagvalue = value
                continue
            if key == "selected":
                selected = 1
                continue
        if multivalue and not selected:
            return

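        # Ignore fields outside any <form>; there is no dictionary for them.
        if tagname and self._formcount in self._formdict:
            self._formdict[self._formcount][tagname] = tagvalue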

    #----------------------------------------------------------
    def do_input(self,attributes):
    #----------------------------------------------------------
        """
        Capture <input> element
        """
        self._storeformfield(attributes)

    #----------------------------------------------------------
    def do_option(self,attributes):
    #----------------------------------------------------------
        """
        Capture <option> element
        """
        self._storeformfield(attributes, multivalue=1)

    #----------------------------------------------------------
    def do_select(self,attributes):
    #----------------------------------------------------------
        """
        Capture <select> element
        """
        self._storeformfield(attributes, multivalue=1)

    #----------------------------------------------------------
    def do_textarea(self,attributes):
    #----------------------------------------------------------
        """
        Capture <textarea> element
        """
        self._storeformfield(attributes)

#----------------------------------------------------------
if __name__ == "__main__":
#----------------------------------------------------------
    """
    Code for commandline testing
    """
    import sys
    if len(sys.argv) != 2:
        print "Usage:  formfieldparser.py <url>"
        sys.exit(-1)

    import webnudge.util.rawhtmlpage
    page = webnudge.util.rawhtmlpage.RawHTMLPage()
    page.load(sys.argv[1])
    if not page:
        print "*** Error *** %s" % (page._message)
        sys.exit(-1)

    result = FormFieldParser().parse(page._data)

    sys.stdout.write(repr(result))
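
For readers on Python 3, where sgmllib and htmllib no longer exist, here is a rough sketch of the same idea built on the standard html.parser module instead. It is not the attached parser: the names FormFieldCollector and parse_form_fields are made up for illustration, it only looks at <input> and <textarea>, and it leaves out the <select>/<option> handling. It needs nothing from webnudge.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect {form_index: {field_name: value}} from HTML text.

    Hypothetical Python 3 sketch, not the original FormFieldParser.
    """

    def __init__(self):
        super().__init__()
        self.forms = {}
        self._current = None  # index of the form being parsed, if any

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            # Forms are numbered 0, 1, 2, ... in document order.
            self._current = len(self.forms)
            self.forms[self._current] = {}
        elif tag in ("input", "textarea") and self._current is not None:
            attrs = dict(attrs)
            name = attrs.get("name")
            if name:
                # A missing value attribute is stored as an empty string.
                self.forms[self._current][name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            # Fields seen after this point belong to no form and are skipped.
            self._current = None


def parse_form_fields(text):
    """Feed text to a fresh collector and return its dictionary of forms."""
    parser = FormFieldCollector()
    parser.feed(text)
    parser.close()
    return parser.forms
```

Calling parse_form_fields(page_text) on a page returns one inner dictionary per <form>, keyed by the form's position on the page, much like the parse() method above.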



Mon, 15 Sep 2003 20:24:13 GMT  
OK, I'll bite - why would you want to use the significantly greater
complexity of SGML if you think you want to do HTML?

Regards,

Dave LeBlanc

P.S. Looked at the book niblet on O'Reilly - looks good. Can't wait to
see it in print!

On Thu, 29 Mar 2001 08:53:32 GMT, "Fredrik Lundh"

<snip>

Quote:
>note that when you think you need the htmllib module, you
>usually want the sgmllib module:
<snip>
>Cheers /F



Tue, 16 Sep 2003 04:29:19 GMT  

Quote:

> OK, I'll bite - why would you want to use the significantly greater
> complexity of SGML if you think you want to do HTML?

htmllib is based on sgmllib, but the classes in htmllib are designed
for HTML formatting rather than HTML parsing.  in my experience,
most real-life problems are parser related (e.g. extract all images,
look for title tags, etc), not formatting related.

Cheers /F
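
To make the "parser related" case concrete, here is a small sketch that pulls out the image sources and the title of a page. It assumes a current Python 3, where sgmllib and htmllib have been removed, so it uses html.parser.HTMLParser from the standard library rather than the modules discussed in this thread. The names ImageAndTitleFinder and find_images_and_title are invented for the example.

```python
from html.parser import HTMLParser


class ImageAndTitleFinder(HTMLParser):
    """Record every <img src=...> and the text of the <title> element.

    Hypothetical sketch; html.parser stands in for sgmllib here.
    """

    def __init__(self):
        super().__init__()
        self.images = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip <img> tags that carry no src attribute
                self.images.append(src)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several pieces, so accumulate it.
        if self._in_title:
            self.title += data


def find_images_and_title(text):
    """Return (title, list_of_image_sources) for the given HTML text."""
    finder = ImageAndTitleFinder()
    finder.feed(text)
    finder.close()
    return finder.title.strip(), finder.images
```

No formatter or writer object is involved; the parser just reacts to the tags it cares about and ignores everything else.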



Tue, 16 Sep 2003 06:10:23 GMT  

Quote:

> I've attached a simple parser we use.  It may help.

> Note that Fredrik Lundh modestly failed to mention his upcoming book on
> 'The Python Library'.  There's an ebook version of his 'eff-bot' Guide,
> which is the first place I look for code examples.

It looks very interesting but what is this webnudge module you are using? I
can't find it anywhere...

Thank you,

--
Romuald Texier



Tue, 16 Sep 2003 16:11:12 GMT  

    [snip -- moved here an effbot quote that Dave has afterwards]:

Quote:
> >note that when you think you need the htmllib module, you
> >usually want the sgmllib module:
    [snip]
> OK, I'll bite - why would you want to use the significantly greater
> complexity of SGML if you think you want to do HTML?

sgmllib only covers a small subset of SGML, so you do NOT
incur any "significantly greater complexity" whatsoever.

Alex



Tue, 16 Sep 2003 20:44:53 GMT  

Quote:
> Hi,

> Has anyone got some samples, or can someone point me to where to find them, on how to
> implement this module?

http://diveintopython.org/dialect_divein.html

--
-M
You're smart; why haven't you learned Python yet?
http://diveintopython.org/



Wed, 17 Sep 2003 03:40:27 GMT  
 