Bug in HTML::Parser or HTML::Tree
[I tried to send this bug report to the package author, but it bounced
with
> online.no unable to deliver mail to the following recipient(s):
so, as the package is well known and featured in books like the
Cookbook, I am asking for help here. Maybe someone knows the author, or
knows how to correct this.]
It seems that HTML::Parser does not allow a <p> inside a table cell.
The resulting parse tree is then wrong.
Example HTML file:
--------------------------------------------------------
<html>
<head>
<title>TEST</title>
</head>
<body>
<table>
<tr>
<td>
La la la
<p>
Le le le
<td>
</tr>
</table>
</body>
</html>
--------------------------------------------------------
Dump of the tree created from this file:
--------------------------------------------------------
<HTML>
" "
<HEAD>
" "
<TITLE>
"TEST"
" "
" "
<BODY>
" "
<P>
<TABLE>
" "
<TR>
" "
<TD>
" La la la "
<P>
" Le le le "
<TABLE>
<TD>
" "
" "
" "
--------------------------------------------------------
As you can see, "Le le le" ended up at the top level, and then a second
table was started.
The program I used to dump it:
--------------------------------------------------------
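#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Parse each file named on the command line and dump its tree.
foreach my $file (@ARGV) {
    my $t = HTML::TreeBuilder->new;
    $t->parse_file($file) or die "Cannot parse $file: $!";
    $t->dump;
    $t->delete;    # release the tree's memory
}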
--------------------------------------------------------
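A full dump is tedious to read, so a narrower check may help: parse the
HTML from a string and print the tag of the element that ends up as the
parent of the <p>. This is only a sketch. The helper name p_parent_tag
is made up for illustration, and look_down is assumed to be available,
which may require a newer HTML-Tree release than the 0.51 mentioned in
this post. The sample closes the cell with </td> so that only the effect
of <p> is tested.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: return the tag name of the parent of the first
# <p> in the given HTML, or undef if no <p> element is found.
sub p_parent_tag {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new;
    $tree->parse($html);
    $tree->eof;                                # signal end of input
    my $p   = $tree->look_down(_tag => 'p');   # first <p> element, if any
    my $tag = ($p && $p->parent) ? $p->parent->tag : undef;
    $tree->delete;                             # release the tree's memory
    return $tag;
}

my $html = <<'END_HTML';
<html><head><title>TEST</title></head>
<body>
<table><tr><td>
La la la
<p>
Le le le
</td></tr></table>
</body></html>
END_HTML

my $parent = p_parent_tag($html);
print "parent of <p>: ", (defined $parent ? $parent : "(no <p> found)"), "\n";
```

If the <p> is kept inside the cell, the parent reported should be the
table cell; if it is body, the paragraph was moved out of the table as
described above.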
I first noticed this behaviour in the packages shipped with the Debian
hamm distribution, so I fetched the newest versions from CPAN; they
behave incorrectly too. The newest versions I tried are HTML-Parser-2.22
and HTML-Tree-0.51.
-- Marcin Kasperski Marcin.Kasperski<at>softax.com.pl
-- marckasp<at>friko6.onet.pl
-- Moje poglądy są moimi poglądami, nikogo poza mną nie reprezentują.
-- (My opinions are just my opinions.)