HTML Tidy Wrapper for Functional Developer 

I've uploaded to my site[1] a wrapper around the HTML Tidy COM
component[2].

I used the FD COM interface wizard and then added a couple of
higher-level usage functions so clients don't need to use COM routines.

Basically it takes HTML as a string (or as a file, using the
low-level COM wrapper) and returns the tidied string.

Useful when parsing HTML that has overlapping tags or is otherwise not
well formed. I pass it through HTML Tidy and then run an HTML or XML
parser on it.

Works something like:

  let page = get-web-page(#"http", "www.double.co.nz", 80, "/");
  let (tidied-page, warnings, errors) = html-tidy(page);
  // Only #f is false in Dylan, so these tests assume html-tidy
  // returns #f (not an empty sequence) when there is nothing to report.
  if (warnings)
    format-out("\nWarnings:\n");
    for (warning in warnings)
      format-out("  %s\n", warning);
    end for;
  else
    format-out("No Warnings.\n");
  end if;

  if (errors)
    format-out("Errors:\n");
    for (error in errors)
      format-out("  %s\n", error);
    end for;
  else
    format-out("No Errors.\n");
  end if;
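
The warning and error branches do the same work, so the reporting
could be folded into one helper. This is only a sketch:
report-messages is a hypothetical name and not part of the wrapper. It
treats both #f and an empty sequence as "nothing to report", because
what html-tidy returns in that case is an assumption here.

```dylan
// Hypothetical helper, not part of the tidycom wrapper.
// Prints each message under a label, or a "No <label>." line when
// messages is #f or an empty sequence.
define method report-messages
    (label :: <string>, messages) => ()
  if (messages & ~empty?(messages))
    format-out("%s:\n", label);
    for (message in messages)
      format-out("  %s\n", message);
    end for;
  else
    format-out("No %s.\n", label);
  end if;
end method report-messages;
```

With that in place, the example reduces to calling
report-messages("Warnings", warnings) and
report-messages("Errors", errors) after the call to html-tidy.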

[1] http://www.double.co.nz/dylan
[2] http://perso.wanadoo.fr/ablavier/TidyCOM/index.html

Chris.
--
http://www.double.co.nz/dylan



Wed, 09 Apr 2003 08:11:01 GMT  
This is great! (if it works)
Klemens





Fri, 02 May 2003 03:00:00 GMT  

Quote:

> This is great! (if it works)

Seems to work fine in the projects where I've used it. It's being used
in a little freeware application I wrote that a number of people have
downloaded and used without problems. Hope it works well for you!

Chris.
--
http://www.double.co.nz/dylan



Sat, 03 May 2003 02:37:42 GMT  
I have problems compiling TidyCOM. I get lots of unresolved externals,
I guess because tidy.lib cannot be found. Could this come from the
different configurations of Tidy and TidyCOM?
Chris, can you help with this?

I have another problem when I pass the resulting XML file to e.g. XML
Notepad. It refuses to load the file because of duplicate attributes.
Some special & expressions like &uuml; &middot; &copy; are not
handled by tidy (not by your code) and therefore cause errors in XML
parsers. Do you know offhand where to handle this?

Thanks in advance & Regards,
Klemens





Mon, 05 May 2003 03:00:00 GMT  

Quote:

> I have problems compiling TidyCOM. I get lots of unresolved
> externals, I guess because tidy.lib cannot be found. Could this
> come from the different configurations of Tidy and TidyCOM?

As far as I know you don't need tidy.lib. All I did on a clean
machine to test the library was put the TidyCOM.zip files in a
directory, run regsvr32.exe on TidyCOM.DLL and then build the
library in FD.

Have you got the COM libraries for FD available from the fun-o
website? I'm also using the MS linker if that makes a difference. If
this still doesn't work can you send me the log file of the link stage
or a list of the FD errors and I'll see what I can find.

Quote:
> I have another problem when I pass the resulting XML file to
> e.g. XML Notepad. It refuses to load the file because of duplicate
> attributes. Some special & expressions like &uuml; &middot; &copy;
> are not handled by tidy (not by your code) and therefore cause
> errors in XML parsers. Do you know offhand where to handle this?

My tidycom Dylan library only converts the HTML to well-formed HTML
(fixing things like overlapping tags), not to XML or XHTML. I then
parse the result using a Dylan HTML parser. To convert to XML you need
to set some of the TidyCOM options. This involves doing something like:

  // Placeholder input; substitute the HTML to be tidied.
  let s = "<html>...</html>";
  let tidy = make(<ITidyObject>, class-id: $TidyObject-class-id);
  // Ask Tidy for XML output instead of plain HTML.
  ITidyOptions/OutputXML(tidy.ITidyObject/Options) := #t;
  let result = bstr-to-byte-string(ITidyObject/TidyMemToMem(tidy, s));

See the implementation of my html-tidy() method for more
details. Eventually I plan to extend the Dylan wrapper so it has an
easy way to set the various HTML Tidy options like XML output, etc.
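
Until then, the steps above could be wrapped in a small function of
your own. This is only a sketch using the same calls as the snippet:
html-tidy-as-xml is a hypothetical name, it is not how html-tidy()
itself is implemented, and COM initialization and releasing the
interface are left out.

```dylan
// Hypothetical convenience function, not part of the tidycom wrapper.
// Assumes COM is already initialized and that the generated TidyCOM
// bindings (<ITidyObject>, $TidyObject-class-id, etc.) are in scope.
define method html-tidy-as-xml
    (html :: <string>) => (xml :: <string>)
  let tidy = make(<ITidyObject>, class-id: $TidyObject-class-id);
  // Switch Tidy from HTML output to XML output.
  ITidyOptions/OutputXML(tidy.ITidyObject/Options) := #t;
  // Tidy the in-memory string and convert the returned BSTR.
  bstr-to-byte-string(ITidyObject/TidyMemToMem(tidy, html))
end method html-tidy-as-xml;
```

A call would then look like let xml = html-tidy-as-xml(page); with
the result handed to an XML parser.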

Hope that helps,
Chris.
--
http://www.double.co.nz/dylan



Tue, 06 May 2003 02:47:06 GMT  



I compile using Microsoft C++. What do I have to consider here to get
it to compile?

Regards,
Klemens




Thu, 22 May 2003 03:00:00 GMT  

Quote:

> I compile using Microsoft C++. What do I have to consider here to
> get it to compile?

I don't understand what you are doing. There is no C++ code in the
HTML Tidy wrapper for Functional Developer that I provide. I wrote
the Dylan code, not the HTML Tidy code itself; that is available from
the original web site linked from my site. You should contact the
author of HTML Tidy if you are having problems rebuilding it.

The wrapper I provide requires Functional Developer 2.0 from
http://www.functionalobjects.com to compile.

Chris.
--
http://www.double.co.nz/dylan



Fri, 23 May 2003 03:55:43 GMT  
 