Getting the contents of a URL. 

: What he wants to do is have a script that when supplied with a URL will
: return the contents of the page in text format.

Ultra-easy.  Get libwww-perl (LWP) from a CPAN near you, or try:
http://www.*-*-*.com/

You'll have to read the excellent docs to get the most out of LWP.  :-)
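If you want the shortest possible taste before diving into the docs, the
usual LWP::Simple idiom looks roughly like this (a sketch only -- it assumes
the module is installed, and the URL is just a placeholder):

    use LWP::Simple;

    my $url = "http://www.example.com/";
    my $content = get($url);                  # returns undef on failure
    defined $content or die "Couldn't fetch $url\n";
    print $content;

Note that get() gives you the raw HTML; turning it into plain text is a
separate step.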

--
Nathan V. Patwardhan



Mon, 20 Sep 1999 03:00:00 GMT  
 Getting the contents of a URL.

Quote:

>Hi,

>In my efforts to convert a friend to Perl (he is a bit obsessed with
>Java) I am trying to find out how to do something. Having never used the
>HTML/WWW part of Perl I am not sure where to begin.

>What he wants to do is have a script that when supplied with a URL will
>return the contents of the page in text format.

>I am going to try and look into the HTML/WWW side of Perl but if anyone
>has any suggestions I would appreciate it, even if it is only to tell me
>it is not really very easy to do.

Take a look at the LWP module. It'll return the resource found at a
URL. If you want it in plain text rather than HTML, look for Tom C's
striphtml script.

--
Jason C. Bodnar

Internet Programmer
Cox Interactive Media



Mon, 20 Sep 1999 03:00:00 GMT  
 Getting the contents of a URL.

 [courtesy cc of this posting sent to cited author via email]


:What he wants to do is have a script that when supplied with a URL will
:return the contents of the page in text format.

From the perlfaq, part 9.  See http://www.perl.com/perl/faq/index.html for
details and other cool information that will help you in your journey.

  How do I fetch an HTML file?

    Use the LWP::Simple module available from CPAN, part of the
    excellent libwww-perl (LWP) package. On the other hand, if you
    have the lynx text-based HTML browser installed on your
    system, this isn't too bad:

        $html_code = `lynx -source $url`;
        $text_data = `lynx -dump $url`;

The FAQ doesn't say this (but should; Gnat, please add a rendition of
the following to part 9), but all you have to do is something like this,
once you have the LWP suite of modules:

    perl -MLWP::Simple -e 'getprint "http://www.sn.no/libwww-perl/"'

One advantage to this approach is that it works through a proxy, assuming
everything is set up for that.  See lwpcook(1) and friends for details.

Hm.. that just gets it as HTML.  If you want it as text, you'll have
to do something more like this:

  perl -MHTML::Parse -MLWP::Simple -MHTML::FormatText \
      -e 'print HTML::FormatText->new->format(parse_html(get("http://perl.com/")))'

Or as a program:

    #!/usr/bin/perl -w
    use LWP::Simple;
    use HTML::Parse;
    use HTML::FormatText;
    print HTML::FormatText->new->format(parse_html(get("http://perl.com/")));

Error checking and argument parameterization have been left as an exercise
for the reader.
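
For the record, one way that exercise might come out -- a sketch under the
same module assumptions as above, not the canonical answer, with the URL
taken from the command line:

    #!/usr/bin/perl -w
    use strict;
    use LWP::Simple;
    use HTML::Parse;
    use HTML::FormatText;

    my $url = shift or die "usage: $0 <url>\n";
    my $html = get($url);                     # undef if the fetch fails
    defined $html or die "Couldn't fetch $url\n";
    print HTML::FormatText->new->format(parse_html($html));

Run it as "./fetchtext http://perl.com/" and you get the page rendered as
plain text rather than raw HTML.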

--tom
--

    /* we have tried to make this normal case as abnormal as possible */
            --Larry Wall in cmd.c from the perl source code



Mon, 20 Sep 1999 03:00:00 GMT  