Tracking down a memory-eating bug 

I posted this question to the LWP list, but have had no
response.  Complete working code example is listed there.

I've got a bug that is only showing up under FreeBSD, and I need
direction in how to track it down.

I have a spider program that works basically like:

spider( $url );
sub spider {
   my $doc = get( shift );
   my $links = extract_links( \$doc );


}

Under FreeBSD 2.2.8, perl 5.6.1 and current LWP lib, if spider is passed
a URI object (and extract_links() returns a ref to a list of URI
objects) memory climbs quickly.  This does not happen under Linux and
5.6.0.

If instead spider() takes a scalar URL string (and extract_links()
returns a list of $uri->as_string URLs), then memory holds steady.

With URI objects, the process on the FreeBSD machine climbs past 85M of
virtual memory after just 70 docs, whereas with scalars it uses only
about 8M after 2000 docs and holds steady.

My limited knowledge of perl internals (and lack of time) means I'm a
bit stuck at this point in finding out where the problem is.  I'm not
trying to get my script to work (I have a work-around); rather, I'm
trying to find out what the problem is.

What does one do at this point?  (besides just use scalars...)

Thanks,

--
Bill Moseley



Fri, 28 Nov 2003 23:06:55 GMT  
[A complimentary Cc of this posting was sent to Bill Moseley]

Quote:
> sub spider {
>    my $doc = get( shift );
>    my $links = extract_links( \$doc );


> }
> IF instead spider() takes a scalar URL string (and extract_links()
> returns a list of $uri->as_string URLs), then memory holds steady.

How much things change if you use


?

Ilya



Sat, 29 Nov 2003 03:31:05 GMT  
On Mon, 11 Jun 2001 19:31:05 +0000 (UTC) Ilya Zakharevich (nospam-

Quote:
> [A complimentary Cc of this posting was sent to
> Bill Moseley

> > sub spider {
> >    my $doc = get( shift );
> >    my $links = extract_links( \$doc );


> > }

> > IF instead spider() takes a scalar URL string (and extract_links()
> > returns a list of $uri->as_string URLs), then memory holds steady.

> How much things change if you use



Not at all.  Still eats memory like crazy on FreeBSD, but not on Linux.

What's next?

--
Bill Moseley



Sun, 30 Nov 2003 11:25:55 GMT  
[A complimentary Cc of this posting was sent to Bill Moseley]

Quote:
> > > IF instead spider() takes a scalar URL string (and extract_links()
> > > returns a list of $uri->as_string URLs), then memory holds steady.

> > How much things change if you use


> Not at all.  Still eats memory like crazy on FreeBSD, but not on Linux.

> What's next?

Depends on how much you want to contribute.  The simplest way for a
guts-ignorant person is to converge the two systems (leaking and
non-leaking) until you find a minimal change which causes the leak.

Try compiling identical versions, changing -Dusemymalloc, using the
same versions of LWP etc. (many tries may be needed; be patient, and do
the builds in different dirs, but on modern systems Perl can be built
in under 10 minutes).  Once you have found it, simplify the logic of
your script until, again, a *minimal* change causes the leak (best not
involving monsters like LWP, but this may not be possible if LWP is
the culprit).

After this investigation somebody who knows the guts *may* get a
chance to participate in the debugging.

Hope this helps,
Ilya
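
To make the "converge until a minimal change causes the leak" advice concrete, here is a rough sketch of a harness that could be run unchanged under each perl build being compared. It is only an illustration: vsz_kb(), sample_growth() and the FakeURI workload are hypothetical names, not code from the thread, and it assumes a ps(1) that accepts "-o vsz= -p PID".

```perl
use strict;
use warnings;

# Virtual size of this process in kilobytes, as reported by ps(1).
# Returns undef when ps is unavailable or prints nothing usable.
sub vsz_kb {
    my $out = `ps -o vsz= -p $$ 2>/dev/null`;
    return ( defined($out) && $out =~ /(\d+)/ ) ? $1 : undef;
}

# Run $code $rounds times, sampling the process size after each round.
sub sample_growth {
    my ( $code, $rounds ) = @_;
    my @samples;
    for my $round ( 1 .. $rounds ) {
        $code->();
        push @samples, vsz_kb();
    }
    return @samples;
}

# Hypothetical workload: build and drop 1000 small objects per round.
# Replace this with the smallest piece of the real script that still leaks.
my $workload = sub {
    my @objs;
    for my $n ( 1 .. 1000 ) {
        my %fields = ( url => "http://example.com/$n" );
        push @objs, bless( \%fields, 'FakeURI' );
    }
    return;
};

my @samples = sample_growth( $workload, 5 );
print join( ' ', map { defined $_ ? $_ : '?' } @samples ), "\n";
```

If the numbers keep rising on one build and level off on the other, the workload is a candidate for the minimal test case; if they level off on both, the leak lies in something the workload left out.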



Mon, 01 Dec 2003 04:51:21 GMT  
[A complimentary Cc of this posting was sent to Bill Moseley]

Quote:
> > How much things change if you use


> Not at all.

Stupid me.  DWIM hitting again!  *Of course* undef does not act on $_
by default, so it should be written as


BTW, investigating why you did not get warnings for this, I see

    > perl -wle "sin, 5 for 1..4"
    Useless use of sin in void context at -e line 1.
    Useless use of a constant in void context at -e line 1.
    Useless use of sin in void context at -e line 1.

    > perl -wle "sin, undef for 1..4"
    Useless use of sin in void context at -e line 1.
    Useless use of sin in void context at -e line 1.

I see two problems: a duplicate warning on sin, and no warning on undef
in void context.  Are they bugs?

Ilya



Mon, 01 Dec 2003 06:06:39 GMT  
On Wed, 13 Jun 2001 22:06:39 +0000 (UTC) Ilya Zakharevich (nospam-

Quote:
> [A complimentary Cc of this posting was sent to
> Bill Moseley

> > > How much things change if you use


> > Not at all.

> Stupid me.  DWIM hitting again!  *Of course* undef does not act on $_
> by default, so it should be written as



That didn't help either -- still eating memory.

At

>> +Fetched 22 Cnt: 45

ps -aux -p 1680
USER       PID %CPU %MEM   VSZ  RSS  TT  STAT STARTED       TIME COMMAND
root      1680 55.5 56.5 72640 35576  p0  D+    9:36AM    0:26.45 perl spider39.pl

Besides being a "guts-ignorant" person, I don't have FreeBSD to test
with.  When you ask me to test something, I write up the test
code, and then send it to the person that's having the problem.  They
run it, and send back the results.  Not a great debugging cycle.

What would be helpful is if someone else who's running FreeBSD
2.2.8 and perl 5.6.1 could test it on their platform to see if it shows
the same thing.

--
Bill Moseley



Tue, 02 Dec 2003 01:15:43 GMT  


Quote:
>[A complimentary Cc of this posting was sent to
>Bill Moseley

>> sub spider {
>>    my $doc = get( shift );
>>    my $links = extract_links( \$doc );


>> }

>> IF instead spider() takes a scalar URL string (and extract_links()
>> returns a list of $uri->as_string URLs), then memory holds steady.

>How much things change if you use



What's the purpose behind your ", undef" suggestion?

I think I'll learn something general about Perl
from your answer.

Thanks

David



Sat, 06 Dec 2003 07:06:54 GMT  

Quote:



>>[A complimentary Cc of this posting was sent to
>>Bill Moseley

>>> sub spider {
>>>    my $doc = get( shift );
>>>    my $links = extract_links( \$doc );


>>> }

>>> IF instead spider() takes a scalar URL string (and extract_links()
>>> returns a list of $uri->as_string URLs), then memory holds steady.

>>How much things change if you use


>What's the purpose behind your ", undef" suggestion?


been used.  If the element is a complex object, it gets DESTROY'ed,
freeing up memory that it may have been referring to.
        -Joe

--
See http://www.inwap.com/ for PDP-10 and "ReBoot" pages.
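
The two points made in this sub-thread (a bare undef does nothing to $_, while undef applied to an aliased element releases the object and triggers DESTROY) can be seen in a small sketch. The Tracked class and its live counter are hypothetical stand-ins for URI objects, not code from the thread.

```perl
use strict;
use warnings;

package Tracked;    # hypothetical stand-in for a URI object

my $live = 0;       # number of Tracked objects currently alive

sub new     { my ($class) = @_; $live++; return bless {}, $class }
sub DESTROY { $live-- }
sub live    { return $live }

package main;

my @links = map { Tracked->new } 1 .. 3;
print "live before:        ", Tracked::live(), "\n";    # 3

# A bare "undef" only yields the undefined value; it does not touch $_,
# so the objects in @links are still alive afterwards.
for (@links) { my $ignored = undef; }
print "after bare undef:   ", Tracked::live(), "\n";    # 3

# Inside "for", $_ is an alias to the array element, so "undef $_"
# clears the element itself.  The last reference goes away and
# DESTROY runs immediately.
undef $_ for @links;
print "after undef \$_:     ", Tracked::live(), "\n";    # 0
```

Note that this only frees objects nothing else refers to; if another reference exists (or the objects refer to each other in a cycle), DESTROY is not called and the memory stays allocated.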



Thu, 11 Dec 2003 17:35:17 GMT  

Quote:





[snip]
> >>How much things change if you use


> >What's the purpose behind your ", undef" suggestion?


> been used.  If the element is a complex object, it gets DESTROY'ed,
> freeing up memory that it may have been referring to.


(the one on 06/13/2001 4:06 pm):

IZ> Stupid me.  DWIM hitting again!  *Of course* undef does not act on $_
IZ> by default, so it should be written as
IZ>

--
The longer a man is wrong, the surer he is that he's right.



Fri, 12 Dec 2003 11:40:19 GMT  

Quote:

> I posted this question to the LWP list, but have had no
> response.  Complete working code example is listed there.

> I've got a bug that is only showing up under FreeBSD, and I need
> direction in how to track it down.

> I have a spider program that works basically like:

> spider( $url );
> sub spider {
>    my $doc = get( shift );
>    my $links = extract_links( \$doc );


> }

> ...

Either LWP or your program is consuming the memory.  Try checking your
program by writing a non-recursive sub and undef'ing $doc :-)

sub spider {

        do {

                my $links = extract_links( \$doc );


}

Zur

P.S. IMHO the spider should use "push" rather than "unshift", to make
it friendlier.  The unshift version "attacks" the first sites
with many "get" requests.

P.P.S. Check LWP if this does not solve the problem.
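
As an illustration of the non-recursive approach suggested above, here is a minimal sketch that uses an explicit queue with push and releases $doc as soon as the links are extracted. It is not the code from the original post: the in-memory %web table, get_doc() and this version of extract_links() are hypothetical stand-ins so that the example runs without LWP or a network.

```perl
use strict;
use warnings;

# Hypothetical in-memory "web": URL => URLs that page links to.
my %web = (
    'http://a.example/' => [ 'http://b.example/', 'http://c.example/' ],
    'http://b.example/' => [ 'http://c.example/', 'http://a.example/' ],
    'http://c.example/' => [],
);

# Stand-in for LWP's get(): the "document" is a space-separated link list.
sub get_doc {
    my ($url) = @_;
    return join ' ', @{ $web{$url} || [] };
}

# Stand-in for extract_links(): takes a ref to the document text and
# returns a ref to a list of URL strings.
sub extract_links {
    my ($doc_ref) = @_;
    return [ split ' ', $$doc_ref ];
}

# Non-recursive spider: a queue replaces the recursive call.
sub spider {
    my ($start) = @_;
    my @queue = ($start);
    my %seen  = ( $start => 1 );
    my @order;

    while ( defined( my $url = shift @queue ) ) {
        my $doc   = get_doc($url);
        my $links = extract_links( \$doc );
        undef $doc;    # drop the document text as early as possible

        push @order, $url;

        # push (not unshift) appends new links to the end of the queue,
        # so pages are visited breadth-first.
        push @queue, grep { !$seen{$_}++ } @$links;
    }
    return @order;
}

my @visited = spider('http://a.example/');
print "$_\n" for @visited;
```

Because each iteration's $doc and $links go out of scope before the next page is fetched, nothing accumulates on a call stack the way it can in the recursive version.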



Tue, 16 Dec 2003 07:23:45 GMT  
 