Memory behavior of String#dup 

Hi all,

Does String#dup also copy the byte sequence of the string, or does it only
copy a reference and do a copy on write?

    robert



Fri, 16 Dec 2005 15:41:08 GMT  
Hi,

In message "Memory behavior of String#dup"

|Does String#dup also copy the byte sequence of the string, or does it only
|copy a reference and do a copy on write?

When memory is already shared between strings, it does copy-on-write,
otherwise it copies.  From my observation, many duped strings are
modified right after the dup, so I felt it was wise to avoid
making new internal copy-on-write entries for duping.

                                                        matz.
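
The observable semantics described above can be sketched as follows. Whether the byte buffer is physically shared is an interpreter-internal detail; this sketch only relies on what String#dup guarantees at the language level: the duplicate is a distinct object, and modifying it leaves the original untouched. The variable names are illustrative only.

```ruby
# Observable behaviour of String#dup, independent of whether the
# interpreter shares the underlying bytes internally.
original = String.new("foo")   # String.new guarantees a mutable string
copy     = original.dup

puts copy == original          # same content
puts copy.equal?(original)     # but not the same object

copy << "bar"                  # mutating the copy ...
puts original                  # ... does not affect the original
puts copy
```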



Fri, 16 Dec 2005 16:36:34 GMT  



Quote:
> In message "Memory behavior of String#dup"

> |Does String#dup also copy the byte sequence of the string, or does it
> |only copy a reference and do a copy on write?

> When memory is already shared between strings, it does copy-on-write,
> otherwise it copies.  From my observation, many duped strings are
> modified right after the dup, so I felt it was wise to avoid
> making new internal copy-on-write entries for duping.

s1 = "foo"
s2 = s1.dup

So, if I understand you correctly, s1 and s2 don't share the same byte
sequence, since s1 is the only string referring to the sequence "foo" when
the dup occurs (i.e. the sequence is not shared).  Is that correct?

The reason I'm asking is that for hashes where an entry shares the
key (either directly, because it is the same string as in h[s1]=s1, or
indirectly, because the value is an instance that refers to the key)
there would be enormous memory consumption if all those
dup'ed hash key strings also contained a copy of the byte sequence.  The
problem I have with this duping is that I can't prevent it.  So there's at
least the overhead of a newly created String instance, because apparently (v
1.7.3) the Hash doesn't honor the freeze state of the string.

If that change has not been incorporated I suggest doing the dup only if a
string is not frozen.  Otherwise the user has no chance to avoid the dup
for strings.

Regards

    robert

h = Hash.new

s1 = "key 1"
s2 = "key 2"
s2.freeze

h[s1]=s1
h[s2]=s2

h.each do |k,v|
  puts "#{k}=>#{v}"
  puts "#{k.object_id}=>#{v.object_id}"
  case k
    when s1
      p [k.equal?( s1 ), v.equal?( s1 )]
    when s2
      p [k.equal?( s2 ), v.equal?( s2 )]
  end
end

yields

key 1=>key 1
22381332=>22394808
[false, true]
key 2=>key 2
22376868=>22390356
[false, true]
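
For comparison, here is a sketch of the same experiment as it might be written today. It assumes a recent MRI in which Hash#[]= stores an already frozen String key as-is and dups only unfrozen ones; that behavior is not stated in this thread. String.new is used only to guarantee mutable strings regardless of any literal-freezing setting.

```ruby
# Same experiment with one mutable and one frozen key.
# Assumption: a recent MRI, where Hash#[]= stores an already-frozen
# String key as-is and dups (and freezes) only unfrozen ones.
h = {}

s1 = String.new("key 1")          # mutable
s2 = String.new("key 2").freeze   # frozen before insertion

h[s1] = s1
h[s2] = s2

k1, k2 = h.keys                   # insertion order is preserved

p [k1.equal?(s1), k1.frozen?]     # mutable key: stored as a frozen copy
p [k2.equal?(s2), k2.frozen?]     # frozen key: stored without a dup
```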



Fri, 16 Dec 2005 17:24:52 GMT  
Hi,

In message "Re: Memory behavior of String#dup"

|> When memory is already shared between strings, it does copy-on-write,
|> otherwise it copies.  From my observation, many duped strings are
|> modified right after the dup, so I felt it was wise to avoid
|> making new internal copy-on-write entries for duping.
|
|s1 = "foo"
|s2 = s1.dup
|
|So, if I understand you correctly, s1 and s2 don't share the same byte
|sequence, since s1 is the only string referring to the sequence "foo" when
|the dup occurs (i.e. the sequence is not shared).  Is that correct?

In this case, fortunately, memory is shared, since all literal strings
have copy-on-write entries internally.

|The reason I'm asking is that for hashes where an entry shares the
|key (either directly, because it is the same string as in h[s1]=s1, or
|indirectly, because the value is an instance that refers to the key)
|there would be enormous memory consumption if all those
|dup'ed hash key strings also contained a copy of the byte sequence.  The
|problem I have with this duping is that I can't prevent it.  So there's at
|least the overhead of a newly created String instance, because apparently (v
|1.7.3) the Hash doesn't honor the freeze state of the string.

String hash keys are duped and frozen with their memory _shared_.  Is
this what you want to hear?

                                                        matz.



Fri, 16 Dec 2005 17:45:51 GMT  



Quote:
> Hi,

> In message "Re: Memory behavior of String#dup"

> |> When memory is already shared between strings, it does copy-on-write,
> |> otherwise it copies.  From my observation, many duped strings are
> |> modified right after the dup, so I felt it was wise to avoid
> |> making new internal copy-on-write entries for duping.
> |
> |s1 = "foo"
> |s2 = s1.dup
> |
> |So, if I understand you correctly, s1 and s2 don't share the same byte
> |sequence, since s1 is the only string referring to the sequence "foo"
> |when the dup occurs (i.e. the sequence is not shared).  Is that correct?

> In this case, fortunately, memory is shared, since all literal strings
> have copy-on-write entries internally.

Ok, then I possibly didn't understand you correctly.

Quote:
> |The reason I'm asking is that for hashes where an entry shares the
> |key (either directly, because it is the same string as in h[s1]=s1, or
> |indirectly, because the value is an instance that refers to the key)
> |there would be enormous memory consumption if all those
> |dup'ed hash key strings also contained a copy of the byte sequence.
> |The problem I have with this duping is that I can't prevent it.  So
> |there's at least the overhead of a newly created String instance,
> |because apparently (v 1.7.3) the Hash doesn't honor the freeze state
> |of the string.

> String hash keys are duped and frozen with their memory _shared_.  Is
> this what you want to hear?

:-))  Yeah, that sounds good.

Though I still worry about the overhead of one more Ruby instance (there
must be some bookkeeping done etc.).  Is this negligible?

    robert



Fri, 16 Dec 2005 18:14:54 GMT  
Hi,

In message "Re: Memory behavior of String#dup"

|Though I still worry about the overhead of one more Ruby instance (there
|must be some bookkeeping done etc.).  Is this negligible?

I guess so.  It's only 20 bytes per object on a 32-bit CPU.

                                                        matz.



Fri, 16 Dec 2005 22:13:32 GMT  



Quote:
> Hi,

> In message "Re: Memory behavior of String#dup"

> |Though I still worry about the overhead of one more Ruby instance
> |(there must be some bookkeeping done etc.).  Is this negligible?

> I guess so.  It's only 20 bytes per object on a 32-bit CPU.

Hm, that amounts to 2 million bytes for 100000 instances, which is not too
much IMHO.  Plus, there will be some overhead for object lookups, I guess.
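
The estimate can be checked with a few lines. This is only back-of-the-envelope arithmetic using the 20-bytes-per-object figure quoted above; the variable names are illustrative, and the real per-object size depends on the Ruby version and platform.

```ruby
# Overhead of the extra String objects, using the figure quoted above
# (20 bytes per object on a 32-bit CPU).
bytes_per_object = 20
instances        = 100_000

total_bytes = bytes_per_object * instances
puts total_bytes                      # 2000000
puts total_bytes / (1024.0 * 1024.0)  # a little under 2 MiB
```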

I'd like to propose the change to not dup frozen strings as Hash keys.
Should I enter an RCR?  Do we discuss this here?

Regards

    robert



Sun, 18 Dec 2005 18:30:47 GMT  
Hi,

In message "Re: Memory behavior of String#dup"

|Hm, that amounts to 2 million bytes for 100000 instances, which is not too
|much IMHO.  Plus, there will be some overhead for object lookups, I guess.
|
|I'd like to propose the change to not dup frozen strings as Hash keys.
|Should I enter an RCR?  Do we discuss this here?

Early optimization is the source of all evil. ;-)

Joking aside, a frozen key string is very useful for finding
bugs.  So I think the optimization should be done differently.

                                                        matz.
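
A minimal sketch of why a frozen key helps with finding bugs: the caller can keep mutating its own string without disturbing the hash, while an accidental attempt to mutate the stored key fails loudly instead of silently corrupting lookups. The names are illustrative, and the exact exception class depends on the Ruby version, so the sketch rescues StandardError.

```ruby
h = {}
name = String.new("alice")   # a mutable string used as a key
h[name] = 1

name << "!"                  # mutating the caller's string is harmless:
p h["alice"]                 # the hash holds its own frozen copy of the key

key = h.keys.first
error = nil
begin
  key << "!"                 # mutating the stored key is rejected
rescue StandardError => e
  error = e
end
puts error.class             # FrozenError on recent Rubies
```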



Sun, 18 Dec 2005 20:57:16 GMT  
Hi,

In message "Re: Memory behavior of String#dup"

|Joking aside, a frozen key string is very useful for finding
|bugs.  So I think the optimization should be done differently.

Your suggestion inspired a new dup-freeze optimization.  It'll be
available in CVS soon.  Thank you.

                                                        matz.



Sun, 18 Dec 2005 22:30:40 GMT  



Quote:
> Hi,

> In message "Re: Memory behavior of String#dup"

> |Joking aside, a frozen key string is very useful for finding
> |bugs.  So I think the optimization should be done differently.

You lost me here.  Maybe I wasn't clear enough and we have a
misunderstanding.  I meant - quite informally:

class Hash
  def []=(key, val)
    # dup and freeze only strings that are not frozen yet
    if key.kind_of?(String) && !key.frozen?
      key = key.dup
      key.freeze
    end

    # now insert key and value
  end
end
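
A runnable variant of this informal proposal might look like the sketch below. FrozenKeyHash is a hypothetical subclass used purely for illustration; it is not how Hash is implemented, and it delegates the actual insertion to the built-in Hash#[]= via super.

```ruby
# Hypothetical subclass illustrating the proposal; not how Hash is
# actually implemented.
class FrozenKeyHash < Hash
  def []=(key, val)
    if key.kind_of?(String) && !key.frozen?
      key = key.dup.freeze   # private frozen copy for mutable strings
    end
    super(key, val)          # frozen strings and other keys pass through
  end
end

h = FrozenKeyHash.new
mutable = String.new("a")
frozen  = String.new("b").freeze
h[mutable] = 1
h[frozen]  = 2

p h.keys.map(&:frozen?)      # every stored string key is frozen
p h.keys[1].equal?(frozen)   # the already-frozen key was not duped
```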

Quote:
> Your suggestion inspired a new dup-freeze optimization.  It'll be
> available in CVS soon.  Thank you.

You're welcome!  Do you mean a specialized dup method that returns self if
frozen, like this?

class Object
  def dupFreeze
    frozen? ? self : dup
  end
end
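
The idea can be tried out without reopening Object. In this sketch, dup_unless_frozen is a hypothetical helper name, not a core method; it reuses frozen objects and copies mutable ones.

```ruby
# Hypothetical helper (not a core method): reuse frozen objects,
# copy mutable ones.
def dup_unless_frozen(obj)
  obj.frozen? ? obj : obj.dup
end

frozen  = String.new("x").freeze
mutable = String.new("y")

p dup_unless_frozen(frozen).equal?(frozen)    # same object is reused
p dup_unless_frozen(mutable).equal?(mutable)  # a fresh copy is returned
```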

Kind regards

    robert



Mon, 19 Dec 2005 00:18:58 GMT  
Hi,

In message "Re: Memory behavior of String#dup"

|> Your suggestion inspired a new dup-freeze optimization.  It'll be
|> available in CVS soon.  Thank you.
|
|You're welcome!  Do you mean a specialized dup method that returns self if
|frozen, like this?

<snip>

Yes.  This specialized dup also returns the hidden shared string, if one
is available, without making a copy.

                                                        matz.



Mon, 19 Dec 2005 01:13:50 GMT  



Quote:
> Hi,

> In message "Re: Memory behavior of String#dup"

> |> Your suggestion inspired a new dup-freeze optimization.  It'll be
> |> available in CVS soon.  Thank you.
> |
> |You're welcome!  Do you mean a specialized dup method that returns self
> |if frozen, like this?

> <snip>

> Yes.  This specialized dup also returns the hidden shared string, if one
> is available, without making a copy.

Sounds great!  Thanks a lot!

    robert



Mon, 19 Dec 2005 16:33:13 GMT  
 