HtmlPrag: Pragmatic Parsing of HTML to SXML 

HtmlPrag is a permissive HTML parser that emits SXML.  The idea is that
SXML tools by Oleg Kiselyov, Kirill Lisovsky, and others can then be
used to process the oft-erroneous HTML of real-world Web pages.
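
To give a rough feel for what "permissive parsing to SXML" means, here is a
minimal sketch in Python.  It is not HtmlPrag's algorithm or API (HtmlPrag
is a Scheme library); the names html_to_sxml, SxmlBuilder, and VOID are
made up for this illustration, and SXML lists are approximated with nested
Python lists.  The sketch assumes two simple recovery rules: an end tag
closes every element still open inside it, and an end tag with no matching
open element is ignored.

```python
from html.parser import HTMLParser

# Elements that never have content, so they are not pushed on the stack.
VOID = {"br", "hr", "img", "input", "meta", "link"}


class SxmlBuilder(HTMLParser):
    """Builds an SXML-like nested list, tolerating unclosed and stray tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ["*TOP*"]
        self.stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = [tag]
        if attrs:
            # SXML-style attribute list: (@ (name "value") ...)
            node.append(["@"] + [[k, v if v is not None else k] for k, v in attrs])
        self.stack[-1].append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, along with anything
        # left open inside it.  A stray end tag matches nothing and is ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        if data.strip():  # drop whitespace-only text nodes
            self.stack[-1].append(data)


def html_to_sxml(text):
    builder = SxmlBuilder()
    builder.feed(text)
    builder.close()
    return builder.root


print(html_to_sxml("<p>Hello <b>world</p></i>"))
```

In this input the <b> element is never closed and the </i> has no matching
start tag, yet the result is still a well-formed tree that downstream tools
can walk.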

HtmlPrag is currently tested with 7 Scheme implementations; let me know
if your favorite one is not on the list.  The license is LGPL.

    http://www.*-*-*.com/

Future versions will be announced to the "ssax-sxml" email list.

--
                                              http://www.*-*-*.com/



Fri, 12 Aug 2005 08:59:24 GMT  
I saw you using an inline document format, please where could I find
more information on that? Thank you!


Fri, 12 Aug 2005 14:55:05 GMT  

Quote:

> I saw you using an inline document format, please where could I find
> more information on that? Thank you!

Sorry, I should've included a note about the doc format...

The format is largely Texinfo, run through a preprocessor called
Funcelit that adds some new constructs and inserts some boilerplate that
causes docs to be typeset as articles rather than books (and HTML
one-pagers, rather than HTML multi-pagers).  Funcelit was kludged up as
one of my first programs in Guile, for which Texinfo was particularly
appropriate.  I'm currently only using Funcelit as a stopgap measure until
I find or make a better literate programming format (probably one that
uses XML as an intermediate representation, but not XML syntax as a
source format).  However, I will try to clean up and release Funcelit,
just in case anyone's curious.

--
                                             http://www.neilvandyke.org/



Fri, 12 Aug 2005 16:25:39 GMT  

Quote:
> [...] I'm currently only using Funcelit as a stopgap measure til I find or
> make a better literate programming format (probably one that uses XML as
> an intermediate representation, but not XML syntax as a source format).

Have you seen Mole from Kirill Lisovsky?
http://www196.pair.com/lisovsky/scheme/lit/index.html

I'd quite like a Texinfo backend for that, but it's not itchy enough for me
to do yet.

MJR



Fri, 12 Aug 2005 22:14:05 GMT  
 