Is it possible to automatically download from a site that insists on frames? 

I'm sure this is a very common problem.  I am used to using snarf,
wget, and/or lynx to grab information from WWW sites.  However, I now need
to get stuff from a site that insists on your browser having frames.

Is there any way to fake it out and have it work?  I've tried all 3 of the
mentioned tools and have used snarf's ability to tell the host that it is a
certain browser (I've tried both the "I'm Netscape" and "I'm Internet
Explorer" options, but neither works.)

It must be possible to do it, since obviously the GUI browsers are able to
navigate the terrain and get the data.  But I need to do it in Unix, from a
shell script.  Are there any commonly available tools to do this?



Fri, 31 May 2002 03:00:00 GMT  

Quote:
> I'm sure this is a very common problem. I am used to using snarf,
> wget, and/or lynx to grab information from WWW sites. However, I now need
> to get stuff from a site that insists on your browser having frames.

> Is there any way to fake it out and have it work? I've tried all 3 of the
> mentioned tools and have used snarf's ability to tell the host that it is a
> certain browser (I've tried both the "I'm Netscape" and "I'm Internet
> Explorer" options, but neither works.)

> It must be possible to do it, since obviously the GUI browsers are able to
> navigate the terrain and get the data. But I need to do it in Unix, from a
> shell script. Are there any commonly available tools to do this?


Quote:
> There is a general idea that the reason we have multiple newsgroups
> is so that we can discuss topics in newsgroups dedicated to the
> given topic. It is called "Staying on topic".

> I am fully aware that this is kind of an old-fashioned notion.
> And, for as long as there have been computer
> BBSs/newsgroups/chat-rooms/discussion-fora/whatever, there has
> been a tug-of-war between ppl who think it is a good idea to keep
> things in separate areas and those who think that all topics are
> acceptable in all fora. The latter view quickly leads to the
> conclusion that there need be only one newsgroup, and begs the
> question as to why we have over 80,000 of them.

The answer to your question is: Use Perl.

--
Jim Monty

Tempe, Arizona USA



Fri, 31 May 2002 03:00:00 GMT  

Quote:

> I'm sure this is a very common problem.  I am used to using snarf,
> wget, and/or lynx to grab information from WWW sites.  However, I now
> need to get stuff from a site that insists on your browser having frames.

> Is there any way to fake it out and have it work?  I've tried all 3 of
> the mentioned tools and have used snarf's ability to tell the host that
> it is a certain browser (I've tried both the "I'm Netscape" and "I'm
> Internet Explorer" options, but neither works.)

> It must be possible to do it, since obviously the GUI browsers are
> able to navigate the terrain and get the data.  But I need to do it in
> Unix, from a shell script.  Are there any commonly available tools to
> do this?

A commonly available tool is Perl. Now, Kenny, I know you have sworn
never to look at this tool, but anyway. With Perl come the LWP modules,
and they can be used to download just about any web site. Mr Clinton Wong
has written an entire book on how to use Perl in general and LWP in
particular for tasks like these: Web Client Programming with Perl,
published by O'Reilly & Associates. Many of the chapters are more
general and can be applied to any language where you can "talk" TCP/IP.
Check out the code snippets at:

ftp://ftp.ora.com/published/oreilly/nutshell/web-client/

For your particular task, maybe the hcat script in Chapter 6 is usable.

I'm sure Java could be a workable alternative as well.

Regards,
/Peter
--
-= Spam safe(?) e-mail address: pez68 at netscape.net =-




Sat, 01 Jun 2002 03:00:00 GMT  

Quote:



>> I'm sure this is a very common problem.  I am used to using snarf,
>>wget, and/or lynx to grab information from WWW sites.  However, I now
>>need to get stuff from a site that insists on your browser having
>>frames.

>> Is there any way to fake it out and have it work?  I've tried all 3
>>of the mentioned tools and have used snarf's ability to tell the host
>>that it is a certain browser (I've tried both the "I'm Netscape" and
>>"I'm Internet Explorer" options, but neither works.)

>> It must be possible to do it, since obviously the GUI browsers are
>>able to navigate the terrain and get the data.  But I need to do it in
>>Unix, from a shell script.  Are there any commonly available tools to
>>do this?
>A commonly available tool is Perl. Now, Kenny I know you have sworn to
>never look at this tool but anyway.

I don't mind using working, pre-compiled programs written in ugly languages
like C, Assembler, or Perl.  If they come off-the-shelf and working and do
what I need done, I'm fine with that.  In fact, I use pre-compiled stuff
written in C and Assembler every day; I suspect you do, too.  I just don't
enjoy doing recreational programming in any of these write-only languages.

Note also that I don't offer up programming solutions in any of these
ugly languages in an AWK newsgroup.  I consider that akin to suggesting, in
a newsgroup dedicated to the discussion of luxury automobiles, that someone
take the subway.  Mind you, nothing wrong with the subway, I ride it myself
when I'm in the city - and it is, as they say, "the cheapest, fastest,
dirtiest way to get anywhere".  The subway is also critical to the economic
underpinnings of society, as was shown by the recent, narrowly avoided,
threatened NYC transit strike.

Quote:
>With Perl comes the LWP modules and they can be used to download
>just about any web site. Mr Clinton Wong has written an entire book
>on how to use Perl in general and LWP in particular for tasks like
>these. Web Client Programming (with Perl).  Published by O'Reilly &
>Associates. Many of the chapters are more general and can be applied to
>any language where you can "talk" tcp/ip.  Check out the code snippets
>at:

>ftp://ftp.ora.com/published/oreilly/nutshell/web-client/

>For your particular task, maybe the hcat script in Chapter 6 is usable.

I tried it - downloaded it, edited out the configuration errors, etc - but
it doesn't work any better than lynx or wget.  Still says "You need a
browser with frames".  So, your solution doesn't meet the criteria mentioned
above (proposed solutions in ugly languages must work out-of-the-box).


Sun, 02 Jun 2002 03:00:00 GMT  
I've been trying not to jump in, but I can't resist any
longer. Were you looking for a gawk 3.1 (network gawk)
solution? If not, why post this in comp.lang.awk?




Sun, 02 Jun 2002 03:00:00 GMT  

I think you shouldn't snap at people trying to help you. In the long
run that's not productive.

Quote:
> >> I'm sure this is a very common problem.  I am used to using snarf,
> >>wget, and/or lynx to grab information from WWW sites.  However, I now
> >>need to get stuff from a site that insists on your browser having
> >>frames.

> >> Is there any way to fake it out and have it work?  I've tried all 3
> >>of the mentioned tools and have used snarf's ability to tell the host
> >>that it is a certain browser (I've tried both the "I'm Netscape" and
> >>"I'm Internet Explorer" options, but neither works.)

> >> It must be possible to do it, since obviously the GUI browsers are
> >>able to navigate the terrain and get the data.  But I need to do it in
> >>Unix, from a shell script.  Are there any commonly available tools to
> >>do this?

> >A commonly available tool is Perl. Now, Kenny I know you have sworn to
> >never look at this tool but anyway.

> I don't mind using working, pre-compiled programs written in ugly languages
> like C, Assembler, or Perl.  If they come off-the-shelf and working and do
> what I need done, I'm fine with that.  In fact, I use pre-compiled stuff
> written in C and Assembler every day; I suspect you do, too.  I just don't
> enjoy doing recreational programming in any of these write-only languages.

> Note also that I don't offer up programming solutions in any of these
> ugly languages in an AWK newsgroup.  I consider that akin to suggesting, in
> a newsgroup dedicated to the discussion of luxury automobiles, that someone
> take the subway.  Mind you, nothing wrong with the subway, I ride it myself
> when I'm in the city - and it is, as they say, "the cheapest, fastest,
> dirtiest way to get anywhere".  The subway is also critical to the economic
> underpinnings of society, as was shown by the recent, narrowly avoided,
> threatened NYC transit strike.

> >With Perl comes the LWP modules and they can be used to download
> >just about any web site. Mr Clinton Wong has written an entire book
> >on how to use Perl in general and LWP in particular for tasks like
> >these. Web Client Programming (with Perl).  Published by O'Reilly &
> >Associates. Many of the chapters are more general and can be applied to
> >any language where you can "talk" tcp/ip.  Check out the code snippets
> >at:

> >ftp://ftp.ora.com/published/oreilly/nutshell/web-client/

> >For your particular task, maybe the hcat script in Chapter 6 is usable.

> I tried it - downloaded it, edited out the configuration errors, etc - but
> it doesn't work any better than lynx or wget.  Still says "You need a
> browser with frames".  So, your solution doesn't meet the criteria mentioned
> above (proposed solutions in ugly languages must work out-of-the-box).

I said "maybe". If you wanted a guaranteed out-of-the-box solution you
shouldn't have tried something that had the label "maybe" on it. I
don't understand your anger, Kenny. Honestly, I just tried to be helpful,
even though, had I followed your selectively applied "netiquette"
rules, I should have ignored such an off-topic question. Perl with LWP
is probably your best shot at solving your problem. It would require,
though, that you or someone else write (or have already written) a script
to address the problem.

Do you know what's going on at that site, since it somehow rejects
certain user agents? Is it JavaScript, maybe? If so, you could try
view-source:url in Navigator to check the script and get a clue on
how to trick the site. Or maybe you could try to "listen" in on the
conversation between the web server and the client? I don't really know
how to do that, but I guess Perl with LWP can be put to such use, and
maybe lynx or Navigator can as well. Maybe if you gave us the URL of
the site, someone might get curious enough about the problem to help you
solve it.

Also, I would like to know what you consider ugly about assembler, C
and Perl. When I look at them I see only pure beauty. I have seen some,
and even written some, {*filter*}ugly scripts in all those languages and
many languages more. But the languages as such are without exception
beautiful! Full of joys and wonders as well.

Regards,
/Peter
--
-= Spam safe(?) e-mail address: pez68 at netscape.net =-




Mon, 03 Jun 2002 03:00:00 GMT  


   >>> I'm sure this is a very common problem.  I am used to using
   >>>snarf, wget, and/or lynx to grab information from WWW sites.
   >>>However, I now need to get stuff from a site that insists on your
   >>>browser having frames.
 I don't know why this was crossposted to comp.lang.awk, but in lynx, if
a page uses frames, it may display a message saying "This page requires
frames, get a real browser, you loser!", but there will be one or more
links at the top of the page; those are the frames.  One of those links
should lead to the information you seek.  If that does not work you may
have to look at the source code of the page.  If a page requires Java,
you can read the source code and try to interpret it yourself.
Net-Tamer V 1.08X - Test Drive


Mon, 03 Jun 2002 03:00:00 GMT  

Quote:
> I'm sure this is a very common problem.  I am used to using snarf,
> wget, and/or lynx to grab information from WWW sites.  However, I now need
> to get stuff from a site that insists on your browser having frames.

> Is there any way to fake it out and have it work?  I've tried all 3 of the
> mentioned tools and have used snarf's ability to tell the host that it is a
> certain browser (I've tried both the "I'm Netscape" and "I'm Internet
> Explorer" options, but neither works.)

Well, that won't make a difference, as the user agent is probably not what's
being looked at but the capability of the "browser" itself.  I must admit I
don't have a full working solution, although I'm sure one's possible (if you
really want to get dirty you could download the Mozilla source and see how
they tell a server they support frames).  However, if you're just intending
to keep using the same site(s), simply go to the site in a frames-capable
browser first, grab the URIs in the FRAME SOURCE, and go directly to those
URIs in Lynx/LWP or whatever you intend to use.  Most checking for frame
support is done in the page that contains the frameset, so hopefully you'll
bypass this requirement using my method.
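
A rough sketch of the "grab the URIs in the FRAME SOURCE" step, in Python rather than any tool named in this thread. It assumes you have already saved the frameset page's source from a frames-capable browser; the sample HTML, the example.com address, and the names FrameSourceParser and frame_urls are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameSourceParser(HTMLParser):
    """Collect the SRC attribute of every <FRAME> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag and attribute names in lower case.
        if tag == "frame":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def frame_urls(frameset_html, base_url):
    """Return absolute URLs for the frames named in frameset_html."""
    parser = FrameSourceParser()
    parser.feed(frameset_html)
    # Frame sources are often relative, so resolve them against the
    # address of the page that contained the frameset.
    return [urljoin(base_url, src) for src in parser.sources]


# Hypothetical frameset page, as "view source" might show it.
SAMPLE = """
<html><frameset cols="20%,80%">
  <frame src="menu.html" name="menu">
  <frame src="/data/report.html" name="main">
  <noframes><body>You need a browser that supports frames</body></noframes>
</frameset></html>
"""

if __name__ == "__main__":
    for url in frame_urls(SAMPLE, "http://www.example.com/site/index.html"):
        print(url)
```

The printed addresses are the ones to hand to snarf, wget, or lynx directly, which skips the frameset page and its "no frames" message.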

--
Sincerely,

Craig Vincent
Senior Webmaster/Programmer
Web Dream Inc.



Mon, 03 Jun 2002 03:00:00 GMT  

Quote:
> I've been trying not to jump in, but I can't resist any
> longer. Were you looking for a gawk 3.1 (network gawk)
> solution? If not, why post this in comp.lang.awk?

Did you not take a look at the newsgroups posted to?  He did post to that
group.

--
Sincerely,

Craig Vincent
Senior Webmaster/Programmer
Web Dream Inc.



Mon, 03 Jun 2002 03:00:00 GMT  

...

Quote:
>Well that won't make a difference as the useragent is probably not what's
>being looked at but the capability of the "browser" itself.  I must admit I
>don't have a full working solution although I'm sure one's possible (if you
>really want to get dirty you could download the Mozilla source and see how
>they tell a server they support frames)...however if you're just intending
>on using the same site(s) simply go to the site in a frames capable browser
>first...grab the URIs in the FRAME SOURCE and go directly to those URIs in
>Lynx/LWP or whatever you intend to use.  Most (checking if a user has
>frames) is done in the URI containing the frameset so hopefully you'll
>bypass this requirement using my method.

Thank you.  In fact, this turned out to be the solution (I figured it out
myself earlier today, using this method).  I can now snarf the page I'm
actually interested in w/o problems.

Still, I'd like to know a general solution - since, as you say, this only
solves the problem for a given site (actually, page).

And to the other poster, who suggested lynx - Yes, that had been my
experience with most other "Go get a real browser, you loser" pages as
well - that is, that you could still get there with lynx, even if the road
is a little bumpy.  But, that doesn't work with the page in question, which
just gives you a curt "You need a browser that supports frames" and exits.



Mon, 03 Jun 2002 03:00:00 GMT  





%
% I think you shouldn't snap at people trying to help you. In the long
% run that's not productive.

I don't agree with this. I think everybody ought to have more tolerance
for people snapping at them. On the other hand:

% > I tried it - downloaded it, edited out the configuration errors, etc -
% > but it doesn't work any better than lynx or wget.  Still says "You need a
% > browser with frames".  So, your solution doesn't meet the criteria
% > mentioned above (proposed solutions in ugly languages must work
% > out-of-the-box).
%
% I said "maybe". If you wanted a guaranteed out-of-the-box solution you
% shouldn't have tried something that had the label "maybe" on it. I
% don't understand your anger Kenny. Honestly I just tried to be helpful.

First, the world would have to already be a better place because people
didn't take offense so easily. I didn't detect any anger in Kenny's
off-topic posting.

% rules I should have ignored such an off topic question. Perl with LWP
% is probably your best shot on solving your problem. It would require

It's not, because it doesn't work, and won't work without re-writing
the web support.

% Do you know what's going on on that site since it somehow rejects
% certain user agents?

Yes. It is a site which uses frames and doesn't provide an alternative.
This is a well-known issue. If you use frames and don't provide a non-frame
alternative, then web-crawling software can't get in to your site.

% Also I would want to know what you consider is ugly about assembler, C
% and Perl? When I look at them I see only pure beauty. I have seen some,

You can't talk about assembler in general terms. There is no such thing
as 'assembler', although there are things such as x86 assembler (but not
just one!), 68050 assembler, system/360 assembler and so on. They're all
{*filter*}as languages. There are such things as perl (but not just one!),
and they're all {*filter*}languages. Even the language designer has been known
to admit that.
--

Patrick TJ McPhee
East York  Canada



Tue, 04 Jun 2002 03:00:00 GMT  


Quote:








> % rules I should have ignored such an off topic question. Perl with LWP
> % is probably your best shot on solving your problem. It would require

> It's not, because it doesn't work, and won't work without re-writing
> the web support.

Oh, I didn't know you had tried Perl with LWP on the particular site.

Quote:
> % Do you know what's going on on that site since it somehow rejects
> % certain user agents?

> Yes. It is a site which uses frames and doesn't provide an alternative.
> This is a well-known issue. If you use frames and don't provide a
> non-frame alternative, then web-crawling software can't get in to your
> site.

That's plain and simple bullshit. As Kenny so correctly noted: if the
GUI web browsers can navigate the site, then so can a web crawler. For
instance, on my Palm V I have web-crawling software that automatically
downloads websites I am interested in for later offline reading. It has
no problems whatsoever with frames. (The software is AvantGo.) The site
must take some action in order not to show the page. That action must
be based on some knowledge the site collects from the user agent. This
is very probably some information the user agent provides via the HTTP
headers. You have full control over these headers using the LWP library
of modules.
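
For readers who want to see what "full control over the headers" looks like in practice, here is a small sketch using Python's standard urllib instead of LWP (a substitution, not what the poster used). The header values and the example.com address are placeholders. Note that the original poster had already reported that changing the browser identification alone did not get past this particular site, so treat this only as an illustration of how a script sets its own request headers.

```python
from urllib.request import Request, urlopen

# Placeholder identification string; any value can be sent.
DEFAULT_AGENT = "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"


def build_request(url, user_agent=DEFAULT_AGENT):
    """Build a request whose HTTP headers are chosen by the script."""
    headers = {
        "User-Agent": user_agent,
        "Accept": "text/html, */*",
    }
    return Request(url, headers=headers)


def fetch(url):
    """Send the request and return the raw response body."""
    with urlopen(build_request(url)) as response:
        return response.read()


if __name__ == "__main__":
    request = build_request("http://www.example.com/")
    # Show the headers that would be sent, without touching the network.
    for name, value in request.header_items():
        print(name + ": " + value)
```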

Quote:
> % Also I would want to know what you consider is ugly about assembler, C
> % and Perl? When I look at them I see only pure beauty. I have seen some,

> You can't talk about assembler in general terms. There is no such thing
> as 'assembler', although there are things such as x86 assembler (but not
> just one!), 68050 assembler, system/360 assembler and so on. They're all
> {*filter*}as languages. There are such things as perl (but not just one!),
> and they're all {*filter*}languages. Even the language designer has been
> known to admit that.

I can too talk about assembler in general terms. {*filter*}languages? Now
that was a big improvement over "ugly" languages.

/Peter
--
-= Spam safe(?) e-mail address: pez68 at netscape.net =-




Tue, 04 Jun 2002 03:00:00 GMT  
Quote:

> Thank you.  In fact, this turned out to be the solution (I figured it out
> myself earlier today, using this method).  I can now snarf the page I'm
> actually interested in w/o problems.

> Still, I'd like to know a general solution - since, as you say, this only
> solves the problem for a given site (actually, page).

    The general solution is "implement frame support" :)

    Ok, I'll explain it better.

    There is nothing magic that a browser sends to a server to tell it
    that it supports frames.

    The trick is that the server sends a page with frames: a <FRAMESET>
    tag, which is the container, and a set of <FRAME> tags, each one
    pointing to the web page that will populate the frame (SRC attribute,
    as in images).

    If the page creator was nice enough to think about browsers that would
    not support frames, (s)he will add a <NOFRAMES> section whose <BODY>
    has more or less the same content as the frame version, but all in one
    page.

    Most of the time the page creator only adds a line like
    "Get a better browser! You loser!" :)

    A browser that doesn't support frames (like one of the web suckers you
    describe :) will ignore the <FRAMESET> and <FRAME> tags and only
    display the content of that <BODY> (the nice message that tells you
    what to do).

    So, the general solution is: modify one of those tools, the one written
    in the language you like the most (or hate the least :) to recognize
    the <FRAME> tags and also grab those URLs, because that's where the
    content you are looking for is.
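
    As an illustration of that general solution, here is a minimal sketch
    in Python (chosen only for brevity; it is not one of the tools named in
    this thread). It fetches a page, looks for <FRAME> tags, and follows
    each SRC, including framesets nested inside frames. The function names,
    the depth limit, the latin-1 decoding, and the fake example.com site
    are all assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class FrameFinder(HTMLParser):
    """Remember the SRC of each <FRAME> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "frame":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def http_fetch(url):
    """Default fetcher: download url and return its body as text."""
    with urlopen(url) as response:
        # latin-1 never fails to decode; a real tool would honor the charset.
        return response.read().decode("latin-1")


def fetch_with_frames(url, fetch=http_fetch, max_depth=3):
    """Return {url: html} for url and every page reachable through frames."""
    pages = {}

    def visit(current, depth):
        # Skip pages already fetched and stop runaway nesting.
        if current in pages or depth > max_depth:
            return
        html = fetch(current)
        pages[current] = html
        finder = FrameFinder()
        finder.feed(html)
        for src in finder.sources:
            visit(urljoin(current, src), depth + 1)

    visit(url, 0)
    return pages


# A made-up site held in memory, so the sketch runs without a network.
FAKE_SITE = {
    "http://example.com/": '<frameset><frame src="a.html">'
                           '<frame src="sub/b.html"></frameset>',
    "http://example.com/a.html": "<body>menu</body>",
    "http://example.com/sub/b.html": '<frameset><FRAME SRC="c.html"></frameset>',
    "http://example.com/sub/c.html": "<body>the content</body>",
}

if __name__ == "__main__":
    for address in fetch_with_frames("http://example.com/", fetch=FAKE_SITE.__getitem__):
        print(address)
```

    Passing the fetcher in as a parameter keeps the frame-following logic
    separate from the download step, so the same loop could sit on top of
    whatever actually retrieves the pages.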

Hope this helps,

    Raffaele

----------------------------------------------------

        http://www.aromatic.org/



Sun, 09 Jun 2002 03:00:00 GMT  
 