Parsing HTML with HTML::Tree 

Hi,

  I am trying to parse the following HTML content:

-- first part

  <a href="/GeneralContent/MySearch.aspx?PagePrefix=IN&amp;

    "Chicago"

-- second part


    "Something here"

I am using HTML::Tree to parse the HTML. Whenever there is no
<a href=.....> segment, as in the second part of the HTML, I would like
to print something else, such as "Error occurred". Notice that both the
first and second parts of the HTML share the text
"<td class="storyTitle">", which I use as the search criterion.

My problem is that I don't know what the following code returns when
<a href=...> is not found. I tried testing against "" and undef, but
that doesn't seem to work.

The following is some of my code and it doesn't work as I wish.

use strict;
use LWP::Simple;
use HTML::Tree;

if ($td->attr('class') eq 'storyTitle')
{
  if (my $sym = $td->find('a'))
  {
    if ($sym->as_text() ne '')
    {
      print $sym->as_text() . "\n";
    }
    else
    {
      print "Error Occurred" . "\n";
    }
  }

}



Fri, 17 Aug 2012 01:29:59 GMT  

Quote:

> I am using HTML::Tree to parse the HTML and what I would like to do is
> that whenever there isn't any  <a href=.....> segment as in the second
> part of the HTML, I will print something else, such as "Error
> occurred".
> My problem is that I don't know what the following code will return
> whenever <a href=...> is not found.

You have a logic problem.

You have written:

   if ( found a <a> )
       # do something
   __END__

So your code cannot do anything if an <a> is not found.

Quote:
> if ($td->attr('class') eq 'storyTitle')
> {
>   if (my $sym = $td->find('a'))

If an <a> is not found then this if-condition is false and the program
is done; none of the code below here will be executed. So you want your
code to be structured something like this:

    if (my $sym = $td->find('a')) {
        print $sym->as_text(), "\n";
    }
    else {
        print "Error Occurred\n";
    }
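
For something you can run end to end, here is a minimal sketch of the same
if/else structure. The sample markup and the @lines array are made up for
illustration, since the original page isn't shown in full. It relies on
find() returning undef in scalar context when no matching element exists,
which is why the plain truth test is enough.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample markup, not the poster's actual page.
my $html = <<'HTML';
<table><tr>
  <td class="storyTitle"><a href="/example">Chicago</a></td>
  <td class="storyTitle">Something here</td>
</tr></table>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

my @lines;
for my $td ($tree->look_down(_tag => 'td', class => 'storyTitle')) {
    # In scalar context find() gives the first <a> element, or undef if none.
    if (my $sym = $td->find('a')) {
        push @lines, $sym->as_text();
    }
    else {
        push @lines, 'Error Occurred';
    }
}

print "$_\n" for @lines;

$tree->delete;    # release the parse tree
```

The else branch belongs to the find() test, not to a test on the link
text, so a cell without any <a> is reported instead of silently skipped.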

--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"



Fri, 17 Aug 2012 04:33:46 GMT  
Tad,

  Thanks for your advice. You hit the nail on the head and it works
well now.

  Nick



Fri, 17 Aug 2012 16:37:42 GMT  
 