Bug in HTML::Parser or HTML::Tree 

[I tried to send this bug report to the package author, but it bounced
with

Quote:
> online.no unable to deliver mail to the following recipient(s):

so, as the package is well known and presented in books like the Cookbook,
I am asking for help here. Maybe someone knows the author, or maybe someone
else knows how to correct this.]

It seems that HTML::Parser does not allow a <p> to be placed inside a
table cell. The parse results are then wrong.

Example HTML file:
--------------------------------------------------------
<html>
  <head>
    <title>TEST</title>
  </head>

  <body>

  <table>
     <tr>
        <td>
          La la la
          <p>
            Le le le
        <td>
     </tr>
  </table>

  </body>
</html>
--------------------------------------------------------

Dump of the tree created from this file:
--------------------------------------------------------
<HTML>
  "  "
  <HEAD>
    " "
    <TITLE>
      "TEST"
    " "
  " "
  <BODY>
    " "
    <P>
      <TABLE>
        " "
        <TR>
          " "
          <TD>
            " La la la "
    <P>
      " Le le le "
      <TABLE>
        <TD>
          "  "
      " "
  " "            
--------------------------------------------------------

As you can see, "Le le le" went to the top level, and then a new table
was started.
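
The same check can be made programmatically instead of by reading the
dump. What follows is only a sketch under assumptions: it needs a recent
HTML::TreeBuilder that provides new_from_content, look_down and look_up
(the 0.51 release mentioned in this post predates them), and
all_p_inside_td is a hypothetical helper name, not part of the module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: returns 1 if every <p> in $html ends up below a
# <td> in the parsed tree, and 0 if at least one <p> has no <td> ancestor.
sub all_p_inside_td {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # Count the <p> elements for which no enclosing <td> can be found.
    my $stray = grep { !$_->look_up(_tag => 'td') }
                $tree->look_down(_tag => 'p');

    $tree->delete;    # release the tree explicitly
    return $stray ? 0 : 1;
}

my $sample = '<html><body><table><tr><td>La la la<p>Le le le<td>'
           . '</tr></table></body></html>';
print all_p_inside_td($sample) ? "every <p> is inside a <td>\n"
                               : "found a <p> outside any <td>\n";
```

Which message is printed for the sample depends on the installed version
of the module, which is the point of the check.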

The program I used to dump it:
--------------------------------------------------------
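#!/usr/bin/perl

use strict;
use warnings;
use HTML::TreeBuilder;

# Parse every file named on the command line and dump its tree.
foreach my $file (@ARGV) {
  my $t = HTML::TreeBuilder->new;

  # parse_file opens and reads the file itself
  $t->parse_file($file) or die "Cannot parse $file: $!";

  $t->dump;
}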

--------------------------------------------------------

I noticed this behaviour in the packages shipped with the Debian hamm
distribution, so I got the newest ones from CPAN. They behave
incorrectly too. The newest versions I used are HTML-Parser-2.22 and
HTML-Tree-0.51.

-- Marcin Kasperski     Marcin.Kasperski<at>softax.com.pl
--                      marckasp<at>friko6.onet.pl
-- (My opinions are just my opinions; they represent no one but me.)



Mon, 15 Oct 2001 03:00:00 GMT  

4/29/99, Marcin Kasperski got a bad parse for:

        <html>
          <head>
            <title>TEST</title>
          </head>

          <body>

          <table>
             <tr>
                <td>
                  La la la
                  <p>
                    Le le le
                <td>
             </tr>
          </table>

          </body>
        </html>

Dear Marcin,

I'm not familiar with HTML::TreeBuilder, nor with ->dump, but I ran your
example through HTML::Parser, using my own dump script, and got a correct
parse.  Your HTML::TreeBuilder output (below) also lacked </head>,
</title>, and a few other things.  It would appear that HTML::Parser is a
more solid parser than TreeBuilder.
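
For readers who want to repeat that comparison, a dump script of this kind
might look roughly like the sketch below. It is not the script referred to
above. It assumes only the subclassing interface of HTML::Parser (the
start, end and text methods that the parser calls for each token);
EventDump and dump_events are hypothetical names chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# EventDump is a hypothetical subclass name, not part of any distribution.
package EventDump;
use parent 'HTML::Parser';

# HTML::Parser calls these methods on the subclass for each token it sees.
sub start { my ($self, $tag) = @_; push @{ $self->{_events} }, "start $tag"; }
sub end   { my ($self, $tag) = @_; push @{ $self->{_events} }, "end $tag"; }

sub text {
    my ($self, $text) = @_;
    return unless $text =~ /\S/;      # skip whitespace-only text
    $text =~ s/^\s+|\s+$//g;          # trim surrounding whitespace
    push @{ $self->{_events} }, "text \"$text\"";
}

package main;

# Return the list of token events reported for an HTML string.
sub dump_events {
    my ($html) = @_;
    my $p = EventDump->new;
    $p->{_events} = [];
    $p->parse($html);
    $p->eof;                          # flush anything still buffered
    return @{ $p->{_events} };
}

print "$_\n"
    for dump_events('<table><tr><td>La la la<p>Le le le<td></tr></table>');
```

Note that this lists tokens in document order and builds no tree, so it
cannot show where an element such as <p> would be attached.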

So far, I've found Parser to handle straight HTML satisfactorily, but it
can't handle all the <script> data that exists out there and which Netscape
and Explorer are able to handle.

To really keep up with the evolving frontiers of HTML usage would take an
ongoing development effort devoted to Parser, and perhaps liaison with
vendors of scripting languages.  It's not so much a matter of "bugs" as
a matter of keeping up with new stuff.

best regards,
rkm

----------------------------------------------------------------------------

Quote:
><HTML>
>  "  "
>  <HEAD>
>    " "
>    <TITLE>
>      "TEST"
>    " "
>  " "
>  <BODY>
>    " "
>    <P>
>      <TABLE>
>        " "
>        <TR>
>          " "
>          <TD>
>            " La la la "
>    <P>
>      " Le le le "
>      <TABLE>
>        <TD>
>          "  "
>      " "
>  " "
>--------------------------------------------------------



Mon, 15 Oct 2001 03:00:00 GMT  
 