Check if a URL exists

You could use webget to do the URL verification - here's a little perl script
that reads a file full of URLs, fetches their title tag and uses it.

Frankly, I would love to find a way to speed things up - so if someone
knows of better ways to do this, I would love to know.

#!/usr/perl5/bin/perl -w
# Script reads a file containing URLs one per line and generates an HTML
# page where the links are titled after their title tag.  WARNING!
# a significant number of pages have weird or no titles.
# Authors: Larry W. Virden and Chris Unger
# Copyright 1997

$| = 1;

print "<html>\n";
print "<title>My URLs</title>\n";
print "<body><ul>\n";

while (<>) {
    chomp;
    $url = `webget -timeout 120 -q -nf -nr $_`;
    # $url = `/projects/xopsrc/Tclsrc/v7/bin/timed-run 120 url_get $_`;
    $url =~ s/\r\n|\r|\n/ /g;

    # In theory, an HTML page should consist of:
    # ...misc text...<title>a title sequence</title>...other text...
    # This means that if we split on <title> and </title>, the 2nd entry
    # in the array should contain the title info.  Unfortunately, poorly
    # written HTML won't have all that we need.
    @title = split( m#</?title>#i, $url );

    # An alternative is a single substitution on $url instead of the split:
    # $url =~ s#.*(<title>\s*([^<]*)\s*</title>)+?.*#$2#i;
    # $url =~ s/(.*)\s+/$1/;
    # print "<LI><A HREF=\"" . $_ . "\">" . $url . "</A>\n" if ( $url =~ /[^\s]+/ );

    # Fall back to the URL itself when no usable title was found.
    # $title[1] = $_ if ( $title[1] =~ /^$/ );
    $title[1] = $_ unless ( defined $title[1] && $title[1] );
    $title[1] =~ s/^\s+|\s+$//g;
    print "<LI><A HREF=\"" . $_ . "\">" . $title[1] . "</A>\n"
        if ( $title[1] =~ /\S/ );
}

print "</ul></body>\n";
print "</html>\n";
--

<URL: http://www.*-*-*.com/ %7Elvirden/> <*> O- "We are all Kosh."
Unless explicitly stated to the contrary, nothing in this posting should
be construed as representing my employer's opinions.



Fri, 18 Feb 2000 03:00:00 GMT  
 Check if a URL exists


Quote:
> You could use webget to do the URL verification - here's a little perl
> script that reads a file full of URLs, fetches their title tag and uses
> it.

> Frankly, I would love to find a way to speed things up - so if someone
> knows of better ways to do this, I would love to know.

How about LWP?
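
For example, a rough sketch along those lines - LWP::Simple's head() issues
an HTTP HEAD request and returns true on a successful response, so it can
verify a URL without fetching the whole page:

#!/usr/perl5/bin/perl -w
use strict;
use LWP::Simple qw(head);

# Read URLs one per line and report whether each one is reachable.
while ( my $url = <> ) {
    chomp $url;
    next unless $url;
    print head($url) ? "ok      $url\n" : "missing $url\n";
}

Since HEAD only transfers headers, this should also be faster than
downloading each page just to look at its title.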

--
Tom Phoenix           http://www.teleport.com/~rootbeer/

Randal Schwartz Case:  http://www.rahul.net/jeffrey/ovs/



Sat, 19 Feb 2000 03:00:00 GMT  
 Check if a URL exists

On Tue, 2 Sep 1997 10:07:07 -0700, in



-}> You could use webget to do the URL verification - here's a little perl
-}> script that reads a file full of URLs, fetches their title tag and uses
-}> it.
-} How about LWP?

And what about temporarily unreachable destinations?  Although it SEEMS
that the net has improved its throughput over the past few months,
there are still occasional breakdowns where a major node gets overloaded.

If you're validating links (e.g. extracted with the indexer in htmlchek),
you may want to allow several failures before you invalidate a foreign
link.  If the link is local to the site running the check, however, odds
are that you want to invalidate it immediately when it's unreachable -
depending on your server load.
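
That retry policy might be sketched like this, assuming LWP is available;
the retry count (3) and pause (5 seconds) here are arbitrary choices:

use strict;
use LWP::Simple qw(head);

# Return true if the URL answers an HTTP HEAD request.  Foreign links
# get several attempts before being declared dead; local links get one.
sub link_alive {
    my ( $url, $is_local ) = @_;
    my $tries = $is_local ? 1 : 3;
    for my $attempt ( 1 .. $tries ) {
        return 1 if head($url);
        sleep 5 if $attempt < $tries;    # brief pause before retrying
    }
    return 0;
}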

     Bruce Gingery



Sun, 20 Feb 2000 03:00:00 GMT  
 

 

 