RFC: FAQ3 update -- Using less memory 
 RFC: FAQ3 update -- Using less memory

Having recently seen yet another question on reducing memory usage
posted to c.l.p.misc, I reviewed the FAQ entry and decided that the
answer given there isn't very helpful for most people. (Or at least the
people most likely to ask this question.) Instead of grumbling about
it, I decided to fix it. In the spirit of the Perl community, I'd
appreciate any comments/additions/corrections before I submit this. The
first three paragraphs are the current text, the rest is my addendum.

[POD follows]

=head2 How can I make my Perl program take less memory?

When it comes to time-space tradeoffs, Perl nearly always prefers to
throw memory at a problem.  Scalars in Perl use more memory than strings
in C, arrays take more than that, and hashes use even more.  While
there's still a lot to be done, recent releases have been addressing
these issues.  For example, as of 5.004, duplicate hash keys are shared
amongst all hashes using them, so require no reallocation.

In some cases, using substr() or vec() to simulate arrays can be
highly beneficial.  For example, an array of a thousand booleans will
take at least 20,000 bytes of space, but it can be turned into one
125-byte bit vector for a considerable memory savings.  The standard
Tie::SubstrHash module can also help for certain types of data
structure.  If you're working with specialist data structures
(matrices, for instance) modules that implement these in C may use
less memory than equivalent Perl modules.
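
For instance (just a sketch; the variable name is made up), a thousand
flags fit into a single 125-byte string:

        my $flags = '';
        vec($flags, 999, 1) = 1;            # flag 999 on; string grows to 125 bytes
        vec($flags, 42, 1)  = 1;            # flag 42 on
        print "42 is set\n" if vec($flags, 42, 1);
        print length($flags), " bytes\n";   # prints "125 bytes"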

Another thing to try is learning whether your Perl was compiled with
the system malloc or with Perl's builtin malloc.  Whichever one it
is, try using the other one and see whether this makes a difference.
Information about malloc is in the F<INSTALL> file in the source
distribution.  You can find out whether you are using perl's malloc by
typing C<perl -V:usemymalloc>.
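
For instance, on a perl built with the system malloc the output looks
something like this:

        $ perl -V:usemymalloc
        usemymalloc='n';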

Of course, the best way to save memory is to not do anything to waste
it in the first place. Good programming practices can go a long way
toward this:

=over 4

=item * Don't slurp!

Don't read an entire file into memory if you can process it line
by line. Whenever possible, use this:

        while (<FILE>) {
           # ...
        }

instead of this:

        @lines = <FILE>;
        foreach (@lines) {
            # ...
        }

When the files you're processing are small, it doesn't much matter which
way you do it, but it makes a huge difference when they start getting
larger. The latter method keeps eating up more and more memory, while
the former method scales to files of any size.

If you do need the whole file in memory, read it directly into the data
structure where it will be used; that way you don't have multiple copies
of data clogging up RAM.
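
For instance (a sketch only; the tab-separated layout and variable names
are made up), build a lookup table as you read rather than slurping the
file and splitting it afterwards:

        my %lookup;
        while (<FILE>) {
            chomp;
            my ($key, $value) = split /\t/, $_, 2;
            $lookup{$key} = $value;
        }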

=item * Localize!

Don't make anything global that doesn't have to be. Use my() prodigously
to localize variables to the smallest possible scope. Memory freed by
variables that have gone out of scope can be reused elsewhere,
preventing the need for additional allocations.

=item * Pass by reference

Pass arrays and hashes by reference, not by value. For one thing, it's
the only way to pass multiple lists or hashes (or both) in a single
call/return. It also avoids creating a copy of all the contents. This
requires some judgement, however, because any changes will be propagated
back to the original data. If you really want to mangle (er, modify) a
copy, you'll have to sacrifice the memory needed to make one.
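
For instance (a sketch; the subroutine and variable names are made up),
one call can hand over an array and a hash without copying their
contents:

        sub tally {
            my ($words, $count) = @_;       # one array ref, one hash ref
            $count->{$_}++ for @$words;     # updates the caller's hash in place
        }

        tally(\@words, \%count);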

=item * Tie large variables to disk.

For "big" data stores (i.e. ones that exceed available memory) consider
using one of the DB modules to store it on disk instead of in RAM. This
will incur a penalty in access time, but that's probably better that
causing your hard disk to thrash due to massive swapping.
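
For instance (a sketch; the file name is arbitrary), the DB_File module
keeps the hash on disk:

        use DB_File;
        my %count;
        tie %count, 'DB_File', 'counts.db'
            or die "Cannot tie counts.db: $!";
        while (<FILE>) {
            chomp;
            $count{$_}++;                   # stored in the DB file, not in RAM
        }
        untie %count;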

=back

-mjc



Fri, 22 Aug 2003 22:35:41 GMT  
 RFC: FAQ3 update -- Using less memory


Quote:
>Having recently seen yet another question on reducing memory usage
>posted to c.l.p.misc, I reviewed the FAQ entry and decided that the
>answer given there isn't very helpful for most people. (Or at least the
>people most likely to ask this question.) Instead of grumbling about
>it, I decided to fix it. In the spirit of the Perl community, I'd
>appreciate any comments/additions/corrections before I submit this. The
>first three paragraphs are the current text, the rest is my addendum.

I've added your changes to my copy of the perl-5.6.1-TRIAL2 perlfaq3.pod
with the following very minor change. I'll post the changes to perl5porters
and the pumpking if there is no further discussion.

diff -c faqaddition.orig faqaddition
*** faqaddition.orig    Mon Mar  5 12:37:26 2001
--- faqaddition Mon Mar  5 12:36:25 2001
***************
*** 56,65 ****

  =item * Localize!

! Don't make anything global that doesn't have to be. Use my() prodigously
! to localize variables to the smallest possible scope. Memory freed by
! variables that have gone out of scope can be reused elsewhere,
! preventing the need for additional allocations.

  =item * Pass by reference

--- 56,66 ----

  =item * Localize!

! Don't make anything global that doesn't have to be. Use my()
! prodigiously to localize variables to the smallest possible scope.
! Memory freed by variables that have gone out of scope can be reused
! elsewhere in the current program, preventing the need for additional
! allocations from system memory.

  =item * Pass by reference

***************
*** 78,81 ****
--- 79,83 ----
  causing your hard disk to thrash due to massive swapping.

  =back
+
--
    This space intentionally left blank



Sat, 23 Aug 2003 04:05:17 GMT  
 RFC: FAQ3 update -- Using less memory
[A complimentary Cc of this posting was sent to Chris Fedde]


Quote:
> ! Don't make anything global that doesn't have to be. Use my()
> ! prodigiously to localize variables to the smallest possible scope.
> ! Memory freed by variables that have gone out of scope can be reused
> ! elsewhere in the current program, preventing the need for additional
> ! allocations from system memory.

This ignores the fact that memory used by locals *is* reused, but one
used by lexicals *is not*.

But I'm not surprized.  Perl's FAQ is much more a political document
than a reliable document....

Hope this helps,
Ilya



Sat, 23 Aug 2003 15:30:46 GMT  
 RFC: FAQ3 update -- Using less memory

Quote:


>> ! Don't make anything global that doesn't have to be. Use my()
>> ! prodigiously to localize variables to the smallest possible scope.
>> ! Memory freed by variables that have gone out of scope can be reused
>> ! elsewhere in the current program, preventing the need for additional
>> ! allocations from system memory.

> This ignores the fact that memory used by locals *is* reused, but one
> used by lexicals *is not*.

Whoa there! That's news to me.

So if I do this:

SOME_BLOCK: {
    my @big_array = (1 .. 100_000);
    # ...
}

then even after @big_array goes out of scope, the memory
allocated to it still won't be freed up for use elsewhere in the program
after I leave the block?

Quote:
> But I'm not surprized.  Perl's FAQ is much more a political document
> than a reliable document....

Apparently, because I've read that memory is reused in other parts of
the FAQ. I thought that I'd seen it elsewhere as well, but as I can't
find a book reference to it right now I may just be recalling reading it
here.

If this is true, then I want to strike that addition. (Okay, Chris?)
Localizing is still good advice, of course, but I'd hate to recommend it
for reasons that are inaccurate.

Side note: Admittedly, my "Pass by reference" addition is a little fuzzy
as well, but I was trying to keep things fairly short and didn't want to
get into all the details and subtle points.

-mjc



Sun, 24 Aug 2003 01:09:51 GMT  
 RFC: FAQ3 update -- Using less memory
: [A complimentary Cc of this posting was sent to Chris Fedde]


: > ! Don't make anything global that doesn't have to be. Use my()
: > ! prodigiously to localize variables to the smallest possible scope.
: > ! Memory freed by variables that have gone out of scope can be reused
: > ! elsewhere in the current program, preventing the need for additional
: > ! allocations from system memory.

: This ignores the fact that memory used by locals *is* reused, but one
: used by lexicals *is not*.

: But I'm not surprized.  Perl's FAQ is much more a political document
: than a reliable document....

On the issue of memory, you should be careful when using map or grep.
This may not be a problem anymore, but in earlier versions (probably 5.003
or 5.004), it appeared that map and grep would potentially cause an entire
file to be slurped.

I forget the exact reason I tested this and the exact syntax of my tests,
but from memory they were similar to these examples:

        for (grep { /pattern/ } <FILE>) {    # -1-
            # ...
        }

        while (<FILE>) {                     # -2-
            next unless /pattern/;
            # ...
        }

I could bring my machine to a grinding halt by running -1- on very large
files.  It appeared from the OS memory stats that the entire file must
have been slurped.  -2- had no such problem.

This was probably on 5.003 or 5.004.  



Sun, 24 Aug 2003 01:41:59 GMT  
 RFC: FAQ3 update -- Using less memory

  [ This is a repost of a followup I posted yesterday that seems to
    have been lost.  Apologies if you've read the original. ]  

[...]

Quote:

> =item * Don't slurp!

> Don't read an entire file into memory if you can process it line
> by line. Whenever possible, use this:

>    while (<FILE>) {
>       # ...
>    }

> instead of this:

>    @lines = <FILE>;
>    foreach (@lines) {
>        # ...
>    }

  and B<never> use this:

        for (<FILE>) {
           # ...
        }

[...]

Quote:
> =item * Localize!

> Don't make anything global that doesn't have to be. Use my() prodigously
> to localize variables to the smallest possible scope. Memory freed by
> variables that have gone out of scope can be reused elsewhere,
> preventing the need for additional allocations.

  =item * Avoid unnecessary quotes and stringification

  Don't quote large strings unless absolutely necessary:

        my $copy = "$large_string";

  makes 2 copies of $large_string (one for $copy and another for
  the quotes), whereas

        my $copy = $large_string;

  only makes one copy.

  Ditto for stringifying large arrays:

        {
            local $, = "\n";
            print @large_array;
        }

  is much more memory-efficient than either

        print join "\n", @large_array;

  or

        {
            local $" = "\n";
            print "@large_array";
        }

  If you need to initialize a large variable in your code, you
  might consider doing it with an eval statement like this:

        my $large_string = eval ' "a" x 5_000_000 ';

  This allows perl to immediately free the memory allocated to the
  eval statement, but carries a (small) performance penalty.

Quote:
> =item * Pass by reference

> Pass arrays and hashes by reference, not by value. For one thing, it's
> the only way

  (sans prototyping)

Quote:
> to pass multiple lists or hashes (or both) in a single call/return. It
> also avoids creating a copy of all the contents.

<correction>

  Array elements are passed by reference, not copied (like hash entries
are). The differences between


and


are




There's probably more, that's all I can think of off the top of my head.
I think you should rework this section a bit.

</correction>

Quote:
> This requires some judgement, however, because any changes will be
> propagated back to the original data. If you really want to mangle
> (er, modify) a copy, you'll have to sacrifice the memory needed to
> make one.

        ... If your copy consumes a large amount of RAM, you may want
  to explicitly undef() your copy once you no longer need it. Perl
  might then return the additional memory back to the OS.

[...]

Otherwise it looks good to me.

HTH

--
Joe Schaefer    "Not everything that counts can be counted, and not everything
                                 that can be counted counts."
                                               --Albert Einstein



Sun, 24 Aug 2003 01:43:02 GMT  
 RFC: FAQ3 update -- Using less memory

Quote:

> On the issue of memory, you should be careful when using map or grep.
> This may not be a problem anymore, but in earlier versions
> (probably 5.003 or 5.004), it appeared that map and grep would
> potentially cause an entire file to be slurped.



Good point. It still does this in 5.6, which makes perfect sense, as map
and grep both expect a list. It would require some special magic to make
this loop over <FILE> instead of slurping it.

-mjc



Sun, 24 Aug 2003 02:45:52 GMT  
 RFC: FAQ3 update -- Using less memory
[A complimentary Cc of this posting was sent to Malcolm Dew-Jones]


Quote:
> On the issue of memory, you should be careful when using map or grep.
> This may not be a problem anymore, but in earlier versions (probably 5.003
> or 5.004), it appeared that map and grep would potentially cause an entire
> file to be slurped.

Since map and grep have no relationship to files, this is meaningless.

Here <FILE> is in an array context.

Hope this helps,
Ilya



Sun, 24 Aug 2003 02:57:25 GMT  
 RFC: FAQ3 update -- Using less memory
(snip)

: On the issue of memory, you should be careful when using map or grep.
: This may not be a problem anymore, but in earlier versions (probably 5.003
: or 5.004), it appeared that map and grep would potentially cause an entire
: file to be slurped.

: I forget the exact reason I tested this and the exact syntax of my tests,
: but from memory they were similar to these examples



: I could bring my machine to a grinding halt by running -1- on very large
: files.  It appeared from the OS memory stats that the entire file must
: have been slurped.  -2- had no such problem.

: This was probably on 5.003 or 5.004.  

It has been pointed out to me that the above behaviour has nothing to do
with map or grep.  It is because <FILE> is used in an array context, and
is therefore simply a slurp.

However, the point still remains.

        Whereas a shell script might sensibly use something like this...
        E.g.
                make_data | grep condition | cut columns > newfile

        which doesn't need much memory,

        the intuitively equivalent perl
        E.g.
                open IN, "make_data |";  open OUT, "> newfile";
                print OUT map { (split)[1] . "\n" } grep { /condition/ } <IN>;

        may not be a good idea because it may use lots of memory.
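
        A line-by-line version of the same pipeline keeps memory flat;
        a rough sketch (the field index and handle names are invented):

                open IN,  "make_data |" or die "make_data: $!";
                open OUT, "> newfile"   or die "newfile: $!";
                while (<IN>) {
                    next unless /condition/;
                    my @fields = split;
                    print OUT "$fields[1]\n";
                }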



Sun, 24 Aug 2003 04:17:47 GMT  
 RFC: FAQ3 update -- Using less memory

Quote:
> ...is used in an array context...

I should point out that there's no such thing as array context in
perl--it's either scalar or list. (Ilya mis-spoke when he used
"array context" earlier)

                                Dan



Sun, 24 Aug 2003 05:57:22 GMT  
 RFC: FAQ3 update -- Using less memory

Quote:

> I decided to fix [perlfaq3.17]. In the spirit of the Perl community,
> I'd appreciate any comments/additions/corrections before I submit this.

Based on the comments I've received so far (thanks!), here's the current
revision:

=head2 How can I make my Perl program take less memory?

When it comes to time-space tradeoffs, Perl nearly always prefers to
throw memory at a problem.  Scalars in Perl use more memory than strings
in C, arrays take more than that, and hashes use even more.  While
there's still a lot to be done, recent releases have been addressing
these issues.  For example, as of 5.004, duplicate hash keys are shared
amongst all hashes using them, so require no reallocation.

In some cases, using substr() or vec() to simulate arrays can be
highly beneficial.  For example, an array of a thousand booleans will
take at least 20,000 bytes of space, but it can be turned into one
125-byte bit vector for a considerable memory savings.  The standard
Tie::SubstrHash module can also help for certain types of data
structure.  If you're working with specialist data structures
(matrices, for instance) modules that implement these in C may use
less memory than equivalent Perl modules.

Another thing to try is learning whether your Perl was compiled with
the system malloc or with Perl's builtin malloc.  Whichever one it
is, try using the other one and see whether this makes a difference.
Information about malloc is in the F<INSTALL> file in the source
distribution.  You can find out whether you are using perl's malloc by
typing C<perl -V:usemymalloc>.

Of course, the best way to save memory is to not do anything to waste
it in the first place. Good programming practices can go a long way
toward this:

=over 4

=item * Don't slurp!

Don't read an entire file into memory if you can process it line
by line. Whenever possible, use this:

        while (<FILE>) {
           # ...
        }

instead of this:

        @lines = <FILE>;
        foreach (@lines) {
            # ...
        }

and B<never> use this:

        for (<FILE>) {
           # ...
        }

When the files you're processing are small, it doesn't much matter which
way you do it, but it makes a huge difference when they start getting
larger. The latter method keeps eating up more and more memory, while
the former method scales to files of any size.

If you do need the whole file in memory, read it directly into the data
structure where it will be used; that way you don't have multiple copies
of data clogging up RAM.

=item * Use map and grep selectively

Remember that both map and grep expect a LIST argument, so doing this:

        @matches = grep { /pattern/ } <FILE>;

will cause the entire file to be slurped. For large files, it's better
to loop:

        while (<FILE>) {
                push @matches, $_ if /pattern/;
        }

=item * Avoid unnecessary quotes and stringification

Don't quote large strings unless absolutely necessary:

        my $copy = "$large_string";

makes 2 copies of $large_string (one for $copy and another for the
quotes), whereas

        my $copy = $large_string;

only makes one copy.

Ditto for stringifying large arrays:

        {
                local $, = "\n";
                print @large_array;
        }

is much more memory-efficient than either

        print join "\n", @large_array;

or

        {
                local $" = "\n";
                print "@large_array";
        }

=item * Consider using C<eval BLOCK>

If you need to initialize a large variable in your code, you
might consider doing it with an eval statement like this:

        my $large_string = eval ' "a" x 5_000_000 ';

This allows perl to immediately free the memory allocated to the
eval statement, but carries a (small) performance penalty.

=item * Pass by reference

Pass arrays and hashes by reference. Perl always passes references, but
calling





judgement, however, because any changes will be propagated back to the
original data. If you really want to mangle (er, modify) a copy, you'll
have to sacrifice the memory needed to make one.

Note: This is also the only way (sans prototyping) to pass multiple
lists and/or hashes in a single call.
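
For illustration only (the subroutine and array names are invented):
passing the array itself floods @_ with one alias per element, and any
C<my @copy = @_> inside the sub then copies them all, while passing a
reference puts just one scalar in @_:

        sub by_value { my @copy = @_;   }   # copies every element
        sub by_ref   { my ($aref) = @_; }   # copies a single reference

        by_value(@big_array);               # whole array flattened into @_
        by_ref(\@big_array);                # one reference passed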

=item * Tie large variables to disk.

For "big" data stores (i.e. ones that exceed available memory) consider
using one of the DB modules to store it on disk instead of in RAM. This
will incur a penalty in access time, but that's probably better that
causing your hard disk to thrash due to massive swapping.

=item * Clean out the trash

If you have a variable which consumes a large amount of RAM, you may
want to explicitly undef() it once it's no longer needed. Perl might then
return the additional memory back to the OS.
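
For instance (a sketch; the helper names are placeholders):

        my $blob = slurp_whole_file($path);     # something big
        process($blob);
        undef $blob;                            # hand the memory back (perhaps)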

=back



Sun, 24 Aug 2003 06:59:56 GMT  
 RFC: FAQ3 update -- Using less memory

Quote:


>> =item * Don't slurp!

>> [...]

>   and B<never> use this:

>         for (<FILE>) {
>            # ...
>         }

Does that really hit memory harder than the explicit slurp, or is it
just ugly?

Quote:
> =item * Avoid unnecessary quotes and stringification
> [...]
> Ditto for stringifying large arrays:

Interesting; I'd never thought about that before.

Quote:
> If you need to initialize a large variable in your code, you
> might consider doing it with an eval statement like this:

>       my $large_string = eval ' "a" x 5_000_000 ';

> This allows perl to immediately free the memory allocated to the
> eval statement, but carries a (small) performance penalty.

You're just full of ideas, aren't you?

Quote:
>> =item * Pass by reference

>> Pass arrays and hashes by reference, not by value. For one thing, it's
>> the only way

>   (sans prototyping)

Noted.

Quote:
>> to pass multiple lists or hashes (or both) in a single call/return. It
>> also avoids creating a copy of all the contents.

> <correction>

> Array elements are passed by reference, not copied
> [...]
> I think you should rework this section a bit.

Yes, I know, but I was trying to avoid having so much detail that when a
newbie checks the docs (hooray!) they can't see the forest for the
trees.
I agree it needs a little work, and will try to come up with something
better. (Accurate but concise.)

Quote:
>       ... If your copy consumes a large amount of RAM, you may want
> to explicitly undef() your copy once you no longer need it. Perl
> might then return the additional memory back to the OS.

Hmm. I'd agree with you if not for what Ilya said in another branch of
this thread. It *should* help, but will it?

Thanks for your comments.

-mjc



Sun, 24 Aug 2003 07:00:54 GMT  
 RFC: FAQ3 update -- Using less memory

Quote:



> >> =item * Don't slurp!

> >> [...]

> >   and B<never> use this:

> >         for (<FILE>) {
> >            # ...
> >         }

> Does that really hit memory harder than the explicit slurp, or is it
> just ugly?

Dunno, but it's certainly worse than

        while (<FILE>) {
            #...
        }

which is what you recommended to use.  I can't think of a
situation where "for(<FILE>)" would be reasonable, yet I've
seen it appear in clp.misc on occasion.  I just thought an
explicit "don't do this" was warranted for the FAQ.

[...]

Quote:
> >       ... If your copy consumes a large amount of RAM, you may want
> > to explicitly undef() your copy once you no longer need it. Perl
> > might then return the additional memory back to the OS.

> Hmm. I'd agree with you if not for what Ilya said in another branch of
> this thread. It *should* help, but will it?

If I understood Ilya correctly, he was discussing perl's internal reuse
(or lack thereof) for memory allocated to lexicals. undef()'ing a
_large_ variable usually (1) causes perl to return the memory to the OS.  
Generally I don't think there's anything to be gained (2) by undef()ing
lots of "normal-sized" variables, and it's certainly not a very Perl-ish
thing to do.

(1) - anecdotal to be sure, but it works for me on linux with 5.005_03
or better.  There was a thread about a month ago where Jerome Abela and
I were trying to flesh this out via trial and error, and most of the
remarks I made here were based on that discussion:

http://groups.google.com/groups?hl=en&lr=&safe=off&ic=1&th=4cd94c0e0c...

Of course, an expert like Ilya who is intimately familiar with the gc
could certainly do a better job than I did.

(2) a Silvio Dante-ism

Best.
--
Joe Schaefer   "If you pick up a starving dog and make him prosperous, he will
               not bite you. This is the principal difference between a dog and
                                           a man."
                                               --Mark Twain



Sun, 24 Aug 2003 09:13:44 GMT  
 RFC: FAQ3 update -- Using less memory

Quote:

>=item * Consider using C<eval BLOCK>

>If you need to initialize a large variable in your code, you
>might consider doing it with an eval statement like this:

>    my $large_string = eval ' "a" x 5_000_000 ';

Hmm.. surely that's C<eval EXPR> rather than C<eval BLOCK>?

--
Ilmari Karonen

Please ignore Godzilla / Kira -- do not feed the troll.



Sun, 24 Aug 2003 07:30:30 GMT  
 RFC: FAQ3 update -- Using less memory

Quote:


>> =item * Consider using C<eval BLOCK>

>> If you need to initialize a large variable in your code, you
>> might consider doing it with an eval statement like this:

>>       my $large_string = eval ' "a" x 5_000_000 ';

> Hmm.. surely that's C<eval EXPR> rather than C<eval BLOCK>?

Oops, I said BLOCK because I was thinking of the compiled-at-compile-time
form rather than the compiled-at-runtime one; never mind what the
example showed. :) Since one probably can't generalize to say that the
BLOCK form is always feasible, I'll change the header to say simply
C<eval>.
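
For the record, the two forms look like this (the value is arbitrary);
only the string form defers compilation until runtime:

        my $s1 = eval ' "a" x 5_000_000 ';   # eval EXPR: compiled at runtime
        my $s2 = eval { "a" x 5_000_000 };   # eval BLOCK: compiled with the rest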

-mjc



Sun, 24 Aug 2003 21:53:55 GMT  
 