How to import URL contents! 

Hello, helpful people!

I am trying to find out how to import the contents of a URL from another
web site of mine and parse it into a data structure such as an array or
hash, or even copy it into a file on my server so that I can access the
contents and modify them.

Has anyone come across this before?

I'd appreciate any help on this. I have 4 web sites and am trying to
create a 5th that would serve as a portal for the other 4. For this, I
want to be able to run a script and have my 5th web site create a page
on the fly that would collect the headlines from the other 4 sites and
format them correctly. Then the new page would link the headlines to the
original web site they came from... HELP!

Jose' L.



Wed, 18 Jun 1902 08:00:00 GMT  
 How to import URL contents!

Quote:

> Hello, helpful people!

I guess you haven't heard what most people say about this group.

Quote:
> I am trying to find out how to import the contents of a URL from another
> web site of mine and parse it into a data structure such as an array or
> hash, or even copy it into a file on my server so that I can access the
> contents and modify them.

You may want to start with the LWP::Simple or LWP::UserAgent modules.
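As a minimal, hedged sketch (the URL and file path below are placeholders, not anything from this thread), fetching a page with LWP::Simple looks roughly like this:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get getstore);

my $url = 'http://www.example.com/';   # placeholder URL

# get() returns the document body, or undef on failure.
my $html = get($url);
die "Couldn't fetch $url\n" unless defined $html;

# Or drop it straight into a file on the server instead.
getstore($url, '/tmp/page.html');      # placeholder path

# An array of lines, if that is the structure you want.
my @lines = split /\n/, $html;
```

LWP::UserAgent gives finer control (timeouts, headers, proxies) when plain get() is not enough.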

If you need to parse the returned data, there are several parsers in
Perl.  You might want to check out the HTML::Parse and HTML::Parser
modules first, even though neither really groks HTML.
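For pulling headline text out of a fetched page, HTML::TokeParser (shipped in the same distribution as HTML::Parser) is often the easiest route; the markup below is invented for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Invented sample markup standing in for a fetched page.
my $html = '<html><body><h1>Top Story</h1><p>Body text.</p>'
         . '<h2>Also in the news</h2></body></html>';

my $p = HTML::TokeParser->new(\$html);
my @headlines;
while (my $tag = $p->get_tag('h1', 'h2')) {
    # Collect the trimmed text up to the matching end tag.
    push @headlines, $p->get_trimmed_text('/' . $tag->[0]);
}
print "$_\n" for @headlines;
```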

Quote:
> Has anyone come across this before?

When you are searching for a potential module, go to CPAN and look.
Try this URL:
    search.cpan.org

Quote:
> I'd appreciate any help on this. I have 4 web sites and am trying to
> create a 5th that would serve as a portal for the other 4. For this, I
> want to be able to run a script and have my 5th web site create a page
> on the fly that would collect the headlines from the other 4 sites and
> format them correctly. Then the new page would link the headlines to the
> original web site they came from... HELP!

It sounds like the CGI.pm module might help you with this.
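As a hedged sketch of that idea (the site URLs and headlines are invented), CGI.pm's HTML shortcuts can emit the portal page:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI qw(:standard);

# Invented headline data; in practice this would come from the
# pages fetched and parsed from the other four sites.
my %headlines = (
    'http://site1.example.com/' => 'Headline from site one',
    'http://site2.example.com/' => 'Headline from site two',
);

print header,
      start_html('Portal'),
      h1('Latest headlines');

# Link each headline back to the site it came from.
for my $url (sort keys %headlines) {
    print p(a({ -href => $url }, $headlines{$url}));
}

print end_html;
```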

David
--

Senior Computing Specialist
mathematical statistician



Wed, 18 Jun 1902 08:00:00 GMT  
 How to import URL contents!

MCMXCIII in <URL::">
~~ Hello helpful people!,
~~
~~ I am trying to find out how to import the contents of a URL from another
~~ web site of mine and parse it into a data structure such as an array or
~~ hash, or even copy it into a file on my server so that I can access the
~~ contents and modify them.
~~
~~ Has anyone come across this before?

    use LWP;

or

    use LWP::Simple;

Abigail
--







Wed, 18 Jun 1902 08:00:00 GMT  
 How to import URL contents!

Quote:

> If you need to parse the returned data, there are several parsers in
> Perl.  You might want to check out the HTML::Parse and HTML::Parser
> modules first, even though neither really groks HTML.

Please tell me what HTML::Parser fails to grok.  I might be able to
fix it for you.

--
Gisle Aas



Wed, 18 Jun 1902 08:00:00 GMT  
 How to import URL contents!

Quote:


> > If you need to parse the returned data, there are several parsers in
> > Perl.  You might want to check out the HTML::Parse and HTML::Parser
> > modules first, even though neither really groks HTML.

> Please tell me what HTML::Parser fails to grok?  I might be able to
> fix it for you.

Don't fix anything.  I like it the way it is.  My point, as poorly
worded as it is, was more along the lines of HTML::Parser not being
strictly a parser of HTML as it is coded.  But as you say, it is not
quite up to being renamed SGML::Parser.

If you construed my original comments as criticism, I apologize.

David
--

Senior Computing Specialist
mathematical statistician



Wed, 18 Jun 1902 08:00:00 GMT  
 How to import URL contents!

Quote:



> > > If you need to parse the returned data, there are several parsers in
> > > Perl.  You might want to check out the HTML::Parse and HTML::Parser
> > > modules first, even though neither really groks HTML.

> > Please tell me what HTML::Parser fails to grok?  I might be able to
> > fix it for you.

> Don't fix anything.  I like it the way it is.  My point, as poorly
> worded as it is, was more along the lines of HTML::Parser not being
> strictly a parser of HTML as it is coded.

But this has changed with HTML-Parser-3.xx.  I now consider it to be
strictly a parser of HTML.  The 2.xx releases had problems with CDATA
elements like <script>, <style>, <xmp> and would not accept as many
strange tag and attribute names as Netscape/MSIE did, but this is
fixed now.  We even support marked sections for those that think that
is cool.  And the new parser is much faster too.

The only real problem I know about with the new parser is that it
leaves <plaintext> mode when it sees </plaintext>.
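To make the 3.xx interface concrete, here is a hedged sketch of its event-driven API (the markup is invented; `dtext` is the argspec that delivers entity-decoded text):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();

# Invented markup exercising entity decoding and a CDATA element.
my $html = '<h1>Top &amp; bottom stories</h1><script>ignored();</script>';

my @text;
my $p = HTML::Parser->new(
    api_version => 3,
    text_h      => [ sub { push @text, shift }, 'dtext' ],
);

# Suppress the contents of CDATA elements we don't want as text.
$p->ignore_elements(qw(script style));
$p->parse($html);
$p->eof;

print "@text\n";
```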

Quote:
> If you construed my original comments as criticism, I apologize.

No offence taken.  I just got the impression that you were talking
about the state of affairs before the 3.xx series came out.

--
Gisle Aas



Wed, 18 Jun 1902 08:00:00 GMT  
 How to import URL contents!
[snip of my usual drivel]

Quote:
> No offence taken.  I just got the impression that you were talking
> about the state of affairs before the 3.xx series came out.

Uh-oh.  I was.  I was messing with the version I got from ActiveState,
which is 2.23.  The most recent version in the ActiveState repository
is 2.25, I believe.  I guess I'm going to have to do some CPANning...

David
--

Senior Computing Specialist
mathematical statistician



Wed, 18 Jun 1902 08:00:00 GMT  
 How to import URL contents!

Quote:

> [snip of my usual drivel]
>> No offence taken.  I just got the impression that you were talking
>> about the state of affairs before the 3.xx series came out.

> Uh-oh.  I was.  I was messing with the version I got from ActiveState,
> which is 2.23.  The most recent version in the ActiveState repository
> is 2.25, I believe.  I guess I'm going to have to do some CPANning...

I will be trying an experiment for a while wherein I will post examples
using both versions - I have been trying out the 3.* thing for a week
now and am very excited about it (as far as I can get excited about
anything except for beer ... )

/J\
--

< http://www.*-*-*.com/ >
** Uri Guttman - Have You CPANed Backward.pm Yet ? **



Wed, 18 Jun 1902 08:00:00 GMT  
 
 [ 8 posts ] 


 