Perl Internals 
 Perl Internals

Hello All!!
Is there any way to get the sizes, in bytes, that perl uses to
store refs, strings, arrays, hashes, etc.?


Thu, 21 Aug 2003 05:23:48 GMT  
 Perl Internals

Quote:

> Hello All!!
> Is there any way to get the sizes, in bytes, that perl uses to
> store refs, strings, arrays, hashes, etc.?

There is, but if you think you need to know it, you probably want to
Rethink Your Design. The size 'ivsize' (provided by Config.pm) is
guaranteed to be large enough to hold both an integer value and a
pointer (i.e. a reference). Strings rather depend on the length of
the string you're storing.

An SV (Scalar Value) consists of (and come on, it's easy enough for
you to check this out in sv.h) 32 bits of reference count, 32 bits of
flags, and then a variable number of bits for data. Let's take a
string value (internally, we call the string bit of the SV an "xpv")
as an example. Because Perl keeps track of how long strings are and
enables them to grow and shrink without many expensive calls to malloc
and free, it holds the string buffer itself. Hence, there's a pointer
to the string buffer (which is probably 32 bits, right?) plus
two length fields holding the current length of the string and the
allocated size of the string buffer - this lets the *current* string
grow and shrink while we also keep track of the largest *potential*
string. OK.

So, to store a string in an SV you need 64 bits of SV metadata, 32
bits of character pointer and 64 bits of length fields for that char
data. That's 160 bits. Plus whatever text you want in there. If you
feel you need to care about how many bits are representing your data,
you're probably working in the wrong language.

--
Familiarity breeds facility.
        -- Megahal (trained on asr), 1998-11-06



Thu, 21 Aug 2003 12:26:26 GMT  
 Perl Internals

Quote:


>> Hello All!!
>> Is there any way to get the sizes, in bytes, that perl uses to
>> store refs, strings, arrays, hashes, etc.?

(Eduard, when posting to more than one newsgroup you should cross post,
not post multiple copies. You've got separate answers in multiple
groups at this point)

Quote:
> There is, but if you think you need to know it, you probably want to
> Rethink Your Design. The size 'ivsize' (provided by Config.pm) is a
> size guaranteed to hold both an integer value and a pointer. (reference)
> Strings rather depend on the length of the string you're storing.

That's an awfully broad statement--there's a fair number of reasons to
want to know, not the least of which is being unable to change the
underlying design. Knowing how perl allocates and uses memory is also
the first step towards rational design choice--you can't really know
in advance whether you're going to blow through some resource unless
you have some idea what your resource usage is likely to be.

One hidden memory cost that tends to pop up is perl's array and hash
preallocation. When perl needs to make an array or hash larger, it
usually reallocates the structures about twice as large as they were,
whether you ultimately use the memory or not. For small structures
it's no big deal. For large ones it can be a significant drain on
memory.

Perl also tends to keep lexical structures around, so if you
enter a block, allocate a monster lexical array, and then exit the
block, that memory's held on to in case you reenter the block (at
which point it'll be reused). Sooner or later it'll be freed, but it
can take quite a while.

Also don't overlook the cost of flattening huge arrays or hashes.
Perl will grow its internal stack to be big enough to manage, but
that space will likely not get freed again. (Though it will be
reused) Push a million scalars onto the stack and, when it's
finally finished, you'll have a chunk of memory allocated that
you'll probably never use again.

                                        Dan



Fri, 22 Aug 2003 04:22:11 GMT  
 Perl Internals
[A complimentary Cc of this posting was sent to Dan Sugalski]


Quote:
> One hidden memory cost that tends to pop up is perl's array and hash
> preallocation. When perl needs to make an array or hash larger, it
> usually reallocates the structures about twice as large as they were,
> whether you ultimately use the memory or not.

For arrays this is just plain not true.  For hashes this is true, but
irrelevant: the overhead of AvARRAY (which is doubled indeed) is
negligible compared to the per-entry overheads.

Quote:
> Also don't overlook the cost of flattening huge arrays or hashes.
> Perl will grow its internal stack to be big enough to manage, but
> that space will likely not get freed again. (Though it will be
> reused) Push a million scalars onto the stack and, when it's
> finally finished, you'll have a chunk of memory allocated that
> you'll probably never use again.

Usually other kinds of growing happen too, so you end up with three
huge chunks of memory that you'll probably never use again.  Run under
env PERL_DEBUG_MSTATS=1 to see this.  IIRC, one other chunk is the
mortals stack; I do not remember what the third one is.

Compare:

  env PERL_DEBUG_MSTATS=1 \

  Name "main::x" used only once: possible typo at -e line 1.
  Memory allocation statistics after execution:   (buckets 4(4)..69624(65536)
     16836 free:   187    69    55    21    11   1   2     1   0 0 0 0 0 0
                449   116    46    41    16
    233376 used:    68    58    71    41     5   7   2   131   0 1 1 0 0 1
                 62    54   124    85     9
  Total sbrk(): 289736/32:180. Odd ends: pad+heads+chain+tail: 968+1692+28672+8192.

  env PERL_DEBUG_MSTATS=1 \

  Name "main::x" used only once: possible typo at -e line 1.
  Memory allocation statistics after execution:   (buckets 4(4)..135160(131072)
     15596 free:   187    69    51    21    11   1   2     0   0 0 0 0 0 0 0
                449   116    46    39    16
    681208 used:    68    58    75    41     5   7   2   370   0 1 1 0 0 2 1
                 62    54   124    87     9
  Total sbrk(): 756680/47:195. Odd ends: pad+heads+chain+tail: 968+3612+34816+20480.

The only difference is the call to f in an array context.

Ilya



Fri, 22 Aug 2003 06:36:04 GMT  
 Perl Internals

Quote:

> [A complimentary Cc of this posting was sent to Dan Sugalski]


>> One hidden memory cost that tends to pop up is perl's array and hash
>> preallocation. When perl needs to make an array or hash larger, it
>> usually reallocates the structures about twice as large as they were,
>> whether you ultimately use the memory or not.
> For arrays this is just plain not true.

I see you're as tactful and helpful as always, Ilya. :) It looks like
the worst-case resize for an array is to the index you're extending
the array out to, plus 1/5 of the number of elements currently in
the array.

Quote:
> For hashes this is true, but
> irrelevant: the overhead of AvARRAY (which is doubled indeed) is
> negligible compared to the per-entry overheads.

Not irrelevant at all. The array overhead is taken as a single
chunk of contiguous memory. For smaller arrays it's not a big deal as
they'll probably fit in existing holes in memory. With larger
arrays you're going to end up grabbing new memory from the OS
every time. While the released memory will get reused by new
scalars and such, you still need to be able to snag a chunk of
memory that's potentially large. In this case (if the 15
million element number was correct) that could be up to
a 64M chunk.

A single 64M piece of memory's not an insignificant allocation.

Quote:
>> Also don't overlook the cost of flattening huge arrays or hashes.
>> Perl will grow its internal stack to be big enough to manage, but
>> that space will likely not get freed again. (Though it will be
>> reused) Push a million scalars onto the stack and, when it's
>> finally finished, you'll have a chunk of memory allocated that
>> you'll probably never use again.
> Usually other kinds of growing happen too, so you end with 3 huge
> chunks of memory that you'll probably never use again.  Peruse
> env PERL_DEBUG_MSTATS=1 to see this.  IIRC, one other chunk is the
> mortals stack, do not remember about the third one.

I keep forgetting about the debugging stuff built into the
memory allocator. (Probably because I don't usually use it) We
really need to get this better documented somewhere, along with
the rest of the stuff you can get out of a DEBUGGING build.

                                Dan



Sat, 23 Aug 2003 01:06:41 GMT  
 Perl Internals

Quote:
> [A complimentary Cc of this posting was sent to Dan Sugalski]


> > One hidden memory cost that tends to pop up is perl's array and hash
> > preallocation. When perl needs to make an array or hash larger, it
> > usually reallocates the structures about twice as large as they were,
> > whether you ultimately use the memory or not.

> For arrays this is just plain not true.  For hashes this is true, but
> irrelevant: the overhead of AvARRAY (which is doubled indeed) is
> negligible compared to the per-entry overheads.

I discovered that groups.google.com has a convenient "view thread"
link on the list of found articles, so I can finally see which replies
to my posts did not propagate to our site...

  Not irrelevant at all. The array overhead is taken as a single
  chunk of contiguous memory. For smaller arrays it's not a big deal as
  they'll probably fit in existing holes in memory. With larger
  arrays you're going to end up grabbing new memory from the OS
  every time. While the released memory will get reused by new
  scalars and such, you still need to be able to snag a chunk of
  memory that's potentially large. In this case (if the 15
  million element number was correct) that could be up to
  a 64M chunk.

  A single 64M piece of memory's not an insignificant allocation.

This is hardly relevant if you remember that the *elements* of the
hash are going to take *at least* 600M of memory (unless some of them
are undefs ;-).  During allocation of these elements all the holes in
memory will be filled.  No reason to worry unless you have an
extremely bad malloc().

  I keep forgetting about the debugging stuff built into the
  memory allocator. (Probably because I don't usually use it) We
  really need to get this better documented somewhere, along with
  the rest of the stuff you can get out of a DEBUGGING build.

Until the grand dumbification of the debugging support this was
clearly documented in - guess where? - perldoc perldebug.

Ilya



Wed, 03 Sep 2003 09:36:39 GMT  
 Perl Internals

Quote:


>> [A complimentary Cc of this posting was sent to Dan Sugalski]


>> > One hidden memory cost that tends to pop up is perl's array and hash
>> > preallocation. When perl needs to make an array or hash larger, it
>> > usually reallocates the structures about twice as large as they were,
>> > whether you ultimately use the memory or not.

>> For arrays this is just plain not true.  For hashes this is true, but
>> irrelevant: the overhead of AvARRAY (which is doubled indeed) is
>> negligible compared to the per-entry overheads.

>   Not irrelevant at all. The array overhead is taken as a single
>   chunk of contiguous memory. For smaller arrays it's not a big deal as
>   they'll probably fit in existing holes in memory. With larger
>   arrays you're going to end up grabbing new memory from the OS
>   every time. While the released memory will get reused by new
>   scalars and such, you still need to be able to snag a chunk of
>   memory that's potentially large. In this case (if the 15
>   million element number was correct) that could be up to
>   a 64M chunk.
>   A single 64M piece of memory's not an insignificant allocation.
> This is hardly relevant if you remember that the *elements* of the
> hash are going to take *at least* 600M of memory (unless some of them
> are undefs ;-).   During allocation of these elements all the holes in
> memory will be filled.  No reason to worry unless you have an
> extremely bad malloc().

Nonsense. It's entirely possible to end up with a situation where you
have a lot of free memory, yet no chunk large enough to allocate for
the contiguous piece that perl needs for the array. Some malloc
implementations exacerbate this, certainly, but the situation can
and will arise.

In this particular case there's also the potentially large memory
overhead imposed by malloc itself, tracking all the memory allocations
for the string data in the scalars, but that's a separate problem
entirely from the lack of contiguous space issue.

All this definitely makes me want to have a copying collector and
better memory tracking and statistics as part of perl 6, but that's
a separate issue as well.

                                        Dan



Tue, 09 Sep 2003 23:33:29 GMT  