
Extracting Hyperlink Title
Quote:
> I'm trying to find a way of extracting the descriptive text between
> the <A> and </A> anchors of a hyperlink in a page of HTML.
Alec,
extract_links actually returns a reference to an array of arrays.
Each inner array is [ URL, reference to an HTML::Element object ].
In the odd case where the text of an anchor tag is itself wrapped in
tags (example: <a href="..."><b>SecurID</b></a>), one needs
to descend through the element's content until a scalar is found.
This type of detail should be available in the second edition
of Web Client Programming.
Regards,
Clinton
------modified version of your code below-------------------------
#!/usr/local/bin/perl -w
use LWP::Simple;
use HTML::Parse;
use HTML::Element;
use URI::URL;
use strict;
my $page = 'http://www.perl.com/';
my $html = get $page;
my $parsed_html = HTML::Parse::parse_html($html);
print "Content-type: text/html\n\n";
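# extract_links returns a reference to an array of [ URL, element ] pairs;
# ask only for <a> tags and walk each link in turn.
for (@{ $parsed_html->extract_links('a') }) {
    my $link = $_->[0];
    my $description = $_->[1];
    # descend through nested tags until plain text (a scalar) is found.
    # content_list gives an empty list for an empty element, so the
    # result is undef there instead of a fatal dereference.
    do {
        $description = ($description->content_list)[0];
    } until (! ref $description);
    # if there isn't plain text between the anchor tags, use empty string.
    if (! defined $description) { $description = '' }
    my $url = URI::URL->new($link);
    my $full_url = $url->abs($page);
    print "<A HREF=\"$full_url\">$description</A>\n\n";
}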
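
For anyone who wants to try the "descend until a scalar is found" step without fetching a page, here is a minimal sketch of just that traversal. It is an illustration under assumptions, not part of the script above: the Node package is a hypothetical stand-in for HTML::Element that offers only a content() accessor, and first_text is a name made up for this example. Like the loop above, it looks only at the first child at each level.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for HTML::Element: just a content() accessor
# returning an array reference of children, or undef when there are none.
package Node;
sub new     { my ($class, @kids) = @_; return bless { kids => [@kids] }, $class }
sub content { my $self = shift; return @{ $self->{kids} } ? $self->{kids} : undef }

package main;

# Follow the first child at each level until a plain scalar turns up.
# Returns '' when the chain ends without finding any text.
sub first_text {
    my ($node) = @_;
    while (ref $node) {
        my $kids = $node->content;
        $node = $kids ? $kids->[0] : undef;
    }
    return defined $node ? $node : '';
}

my $nested = Node->new( Node->new('SecurID') );   # like <a><b>SecurID</b></a>
my $plain  = Node->new('plain');                  # like <a>plain</a>
my $empty  = Node->new();                         # like <a></a>

print first_text($nested), "\n";
print first_text($plain), "\n";
print "[", first_text($empty), "]\n";
```

The explicit check on $kids is the point of the sketch: an anchor with nothing inside it falls through to the empty string rather than tripping over an undefined array reference.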