Partial URL Reads 
 Partial URL Reads

Hi,
I'm trying to do a partial read from a URL, check for some values, and
move on to another URL. Since LWP::Simple's get and getstore functions
download the whole page, which takes a lot of time, I need to find a
way to do just a partial read from a URL instead of wasting time
downloading the complete contents of the page, because the info I'm
looking for is in the first part of the page. For example, I just need
to read the first 100 bytes from a url file and store the contents in
a variable, and then search the contents. I tried writing something
like this, but it doesn't work.

$myvalue = "something";
$url = " http://www.*-*-*.com/ ";
open(URLFILE, "<$url")|| warn "unable to open $url";
read (URLFILE, $contents, 100)|| warn "unable to read ";
if ($contents =~/$myvalue/)
{print("got it!");}

I searched tons of info about this, but couldn't find anything that
would do that. I appreciate any input on this.
Thanks,
Pavel.



Sun, 04 Jul 2004 18:41:48 GMT  
 Partial URL Reads

Quote:

> [snip] For example, I just need to read the first 100 bytes from a
> url file and store the contents in a variable, and then search the
> contents. I tried writing something like this, but it doesn't work.

> $myvalue = "something";
> $url = "http://www.perl.com";
> open(URLFILE, "<$url")|| warn "unable to open $url";
> read (URLFILE, $contents, 100)|| warn "unable to read ";
> if ($contents =~/$myvalue/)
> {print("got it!");}

What do you mean by a "url" file?
Give an example of its contents....


Sun, 04 Jul 2004 21:51:05 GMT  
 Partial URL Reads

Quote:


> > [snip]

> What do you mean by a "url" file?
> Give an example of its contents....

Thanks for your interest Viktor!
$url contains a link to an HTML file. I want to read the first 4096
bytes from that file into $contents and then search for the
information I am looking for in $contents. This will speed up the
process if I need to search, for example, hundreds of pages. Now I'm
trying something like this (it will be included in the for loop):

$url = "http://www.perl.com"
$ua = LWP::UserAgent->new();
$request = HTTP::Request->new('GET', $url);    
$response = $ua->request($request, \&callback, 5472);

sub callback{

if ($data =~/title/){#title is found in the first chunk
   #capture the url
   print("found");
   die();#stop downloading
   }#end if

   #title was not found in the first chunk
   die(); #stop downloading, no reason to waste the time anyway

Quote:
}#end callback

But this doesn't allow me to increase the chunk size for some reason.
Whether I put 5472 or 8000000, when I check the contents of $data on
the first run, it has the same amount of text. What might I be doing
wrong here? Is it an issue with timeouts or something, and how do I
set that up?
Any help is appreciated.
Thanks,
Pavel.


Mon, 05 Jul 2004 06:31:37 GMT  
 Partial URL Reads
[snip]

Quote:
> [snip]

> But this doesn't allow me to increase the chunk size for some reason.
> whether I put 5472 or 8000000, when I check the contents of $data in
> the first run, it has the same amount of text. What might I be doing
> wrong here? Is it an issue with timeouts or something and how to set
> it up?

No, it's not timeouts.  It's because the chunk size is the maximum
amount you are willing to process per callback invocation, not a
minimum; how much data actually arrives in each chunk is up to the
network.  What you need to do is keep a counter of how many bytes
you've read, and abort once you have enough.  (die()ing inside the
callback is how you abort the transfer; LWP catches the die and stops
reading.)  Eg:

use LWP::UserAgent;
use HTTP::Request;
my $ua = LWP::UserAgent->new;
# use LWP::Parallel::UserAgent qw(:CALLBACK);
# my $ua = LWP::Parallel::UserAgent->new;

# 'urls.txt' is a placeholder: a file with one URL per line
open(FILE_OF_URLS, '<', 'urls.txt') or die "can't open url list: $!";
while( <FILE_OF_URLS> ) {
   chomp(my $uri = $_);
   my $got = 0;
   $ua->request( HTTP::Request->new( GET => $uri ),
   sub {
      my ($chunk, $response, $protocol) = @_; # LWP hands each chunk to the callback
      print("Found title in $uri\n"), die
      # print("Found title in $uri\n"), return C_ENDCON
         if $chunk =~ /title/;
      $got += length $chunk;
      print("Didn't find title in $uri\n"), die
      # print("Didn't find title in $uri\n"), return C_ENDCON
         if $got >= 4096;
   }, 4096 );
}

# $ua->wait();

Using LWP::Parallel::UserAgent (available from CPAN) will give much
faster performance when you have many URLs to check, since it fetches
them concurrently.  The comments should make obvious the changes
you'd need to convert from LWP::UserAgent to LWP::Parallel::UserAgent.

--
DATA COMPRESSION: What you get when you squish an android



Mon, 05 Jul 2004 07:48:51 GMT  
 Partial URL Reads
daf

thanks

---------------------------------

Quote:

> [snip]



Mon, 05 Jul 2004 17:04:10 GMT  
 Partial URL Reads
Thanks a lot Ben!
It works great!

~Pavel.



Thu, 08 Jul 2004 00:01:14 GMT  
 Partial URL Reads
I left out a bit...

[snip]

Quote:
> use LWP::UserAgent;
> my $ua = LWP::UserAgent->new
> # use LWP::Parallel::UserAgent qw(:CALLBACK);
> # my $ua = LWP::Parallel::UserAgent->new;
> while( <FILE_OF_URLS> ) {
>    chomp(my $uri = $_);
>    my $got = 0;
>    $ua->request( HTTP::Request->new( GET => $uri ),

If you are writing this with LWP::Parallel::UserAgent, it should be
$ua->register, not $ua->request.  I should have included a comment
saying that, but forgot.

[snip]

Quote:
> Using the LWP::Parallel::UserAgent (available from CPAN) will result
> in much faster performance.  The comments should make obvious the
> changes you'd need to convert from LWP::UserAgent to
> LWP::Parallel::UserAgent.
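
Putting both posts together, the parallel version would look roughly
like this.  It's an untested sketch based on the commented lines above
and LWP::Parallel::UserAgent's register/wait interface; 'urls.txt' is
a placeholder for your own list of URLs, one per line:

use LWP::Parallel::UserAgent qw(:CALLBACK); # :CALLBACK exports C_ENDCON
use HTTP::Request;

my $pua = LWP::Parallel::UserAgent->new;

open(FILE_OF_URLS, '<', 'urls.txt') or die "can't open url list: $!";
while( <FILE_OF_URLS> ) {
   chomp(my $uri = $_);
   my $got = 0;
   # register() only queues the request; nothing is fetched yet
   $pua->register( HTTP::Request->new( GET => $uri ),
   sub {
      my ($chunk, $response, $protocol, $entry) = @_;
      if ($chunk =~ /title/) {
         print "Found title in $uri\n";
         return C_ENDCON; # close this connection, keep the others going
      }
      $got += length $chunk;
      if ($got >= 4096) {
         print "Didn't find title in $uri\n";
         return C_ENDCON; # give up on this URL after 4096 bytes
      }
   }, 4096 );
}
$pua->wait; # now everything registered is fetched in parallel

Each callback closes over its own $uri and $got, so the per-URL byte
counts stay separate across connections.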

--
Found on a door in the MSU music building:
This door is baroquen, please wiggle Handel.
(If I wiggle Handel, will it wiggle Bach?)


Sat, 10 Jul 2004 22:19:21 GMT  
 