rb_gc_register_address problem 
Author Message
 rb_gc_register_address problem

We ran into a problem today with the garbage collector (caused by our
own bug).  We had a function that did something like this:
  std::vector<char *> foo;
  void func(int argc, VALUE * argv, VALUE self) {
    foo.push_back(STR2CSTR(argv[0]));
    rb_global_variable(&argv[0]);
  }

This is a bug, because once the function returns, argv[0] is invalid,
and global_List contains an invalid VALUE* that may or may not point to
the VALUE we want to keep global.  When the GC tries to mark this
VALUE*, it ends up marking an invalid object.

So my question is: why does rb_gc_register_address() store VALUE*'s
instead of VALUE's?  I suppose that if it stored a VALUE, then it would
be possible to register an object as a global but not keep a reference
to that object; this may not be desirable.  In our case, though, we just
wanted the string to remain valid.

If I do have a bug like this in the future, what techniques can I use to
track it down?

Paul



Sat, 09 Apr 2005 02:54:24 GMT  
 rb_gc_register_address problem

Quote:

> If I do have a bug like this in the future, what techniques can I use to
> track it down?

Not sure if this would help, but most memory debugging tools (like
Purify on Windows, or Valgrind on Linux) should trap it as a dangling
pointer and give you some information about where the original reference
was created -- in your case, during the call to rb_gc_register_address().


Sat, 09 Apr 2005 03:48:32 GMT  
 rb_gc_register_address problem

Quote:

>   std::vector<char *> foo;
>   void func(int argc, VALUE * argv, VALUE self) {
>     foo.push_back(STR2CSTR(argv[0]));
>     rb_global_variable(&argv[0]);
>   }

can i see more of this?

Quote:
> So my question is: why does rb_gc_register_address() store VALUE*'s
> instead of VALUE's?  I suppose that if it stored a VALUE, then it would
> be possible to register an object as a global but not keep a reference
> to that object; this may not be desirable.  In our case, though, we just
> wanted the string to remain valid.

i'm not _positive_, but if you did not store pointers to pointers to objects,
you would not be able to change, in the gc, what that pointerer was pointing
to!  you would only be able to change the thing pointed to itself! eg.  the gc
would never be able to set an object pointer to nil, or anywhere else...

the difference is illustrated here :

  #include <stdlib.h>
  #include <stdio.h>

  void
  i_can_change_what_memory_you_point_to (ptr)
    char **ptr;
  {
    *ptr = "barfoo";
  }

  void
  i_can_change_the_memory_you_point_to (ptr)
    char *ptr;
  {
    *ptr = 'b';
  }

  int
  main (int argc, char **argv, char **env)
  {
    char *buf = malloc(42);
    char *ptr;

    ptr = buf;
    strcpy (ptr, "foobar");
    printf("%s\n", ptr);                  /* -->> "foobar" */
    i_can_change_what_memory_you_point_to (&ptr);
    printf("%s\n", ptr);                  /* -->> "barfoo" */

    ptr = buf;
    strcpy (ptr, "foo");
    printf("%s\n", ptr);                  /* -->> "foo" */
    i_can_change_the_memory_you_point_to (ptr);
    printf("%s\n", ptr);                  /* -->> "boo"    */

    return 0;
  }

the gc must have a prototype like 'i_can_change_what_memory_you_point_to'

Quote:
> If I do have a bug like this in the future, what techniques can I use to
> track it down?

gcc has built in memory leak debugging - but it only works with c.  a good
reason not to use c++ imo. ;-)

i actually have a modification, somewhere, which works with c++, want it?

-a

--

 ====================================
 | Ara Howard
 | NOAA Forecast Systems Laboratory
 | Information and Technology Services
 | Data Systems Group
 | R/FST 325 Broadway
 | Boulder, CO 80305-3328

 | Phone:  303-497-7238
 | Fax:    303-497-7259
 ====================================



Sat, 09 Apr 2005 05:39:10 GMT  
 rb_gc_register_address problem
Hi Paul,

I was puzzled by the example itself; hopefully it is really just an
example:

1) Why do you have a vector of char* instead of std::vector<string> in
C++?  In this case you don't get the advantage of the string copy
constructor when you call the push_back() on the vector, which then
requires you to store the corresponding VALUE.

2) Why do you want store the char* of the VALUE instead of the VALUE
itself?  You can then store the VALUE in a global ruby array instead of a
global vector so that you don't need rb_global_variable(&argv[0]).

3) By its name, rb_global_variable() is supposed to store a C global
variable (of type VALUE); so when we call rb_global_variable() for a
temporary C variable, at that point the C code design is questionable.

Therefore, based on 1), 2), and 3) I don't see any problem that
rb_global_variable() (or rb_gc_register_address()) stores VALUE* instead
of VALUE.

Finally, if there is a technique that can track this kind of bug down
easily, please let us all know, since it will make C programming in
Ruby much safer (and much more fun :) ).

Regards,

Bill
====================================================================

Quote:

> We ran into a problem today with the garbage collector (caused by our
> own bug).  We had a function that did something like this:
>   std::vector<char *> foo;
>   void func(int argc, VALUE * argv, VALUE self) {
>     foo.push_back(STR2CSTR(argv[0]));
>     rb_global_variable(&argv[0]);
>   }
> This is a bug, because once the function returns, argv[0] is invalid,
> and global_List contains an invalid VALUE* that may or may not point to
> the VALUE we want to keep global.  When the GC tries to mark this
> VALUE*, it ends up marking an invalid object.
> So my question is: why does rb_gc_register_address() store VALUE*'s
> instead of VALUE's?  I suppose that if it stored a VALUE, then it would
> be possible to register an object as a global but not keep a reference
> to that object; this may not be desirable.  In our case, though, we just
> wanted the string to remain valid.
> If I do have a bug like this in the future, what techniques can I use to
> track it down?
> Paul



Sat, 09 Apr 2005 06:14:51 GMT  
 rb_gc_register_address problem
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

| I was about to fire up valgrind to find the problem.  I've been told
| that it doesn't play nicely with Ruby's GC, though... is this true?

Yes, but Tanaka Akira posted a valgrind suppressions file that you can
use with Ruby to tell it (valgrind) to shut up about those problems. See:

        http://www.ruby-talk.org/52065

Hope this helps,

Lyle
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQE9tJCVRy8FwLkdsRcRAhvHAKCFx0sfJZG2Kp5qdbrUXTDaw1xqbACeJnS5
fxAC3kMB//keo7tQQtwe7Lg=
=JByd
-----END PGP SIGNATURE-----



Sat, 09 Apr 2005 07:41:09 GMT  
 rb_gc_register_address problem
Hi,

At Tue, 22 Oct 2002 03:54:24 +0900,

Quote:

> We ran into a problem today with the garbage collector (caused by our
> own bug).  We had a function that did something like this:
>   std::vector<char *> foo;
>   void func(int argc, VALUE * argv, VALUE self) {
>     foo.push_back(STR2CSTR(argv[0]));
>     rb_global_variable(&argv[0]);
>   }

> This is a bug, because once the function returns, argv[0] is invalid,
> and global_List contains an invalid VALUE* that may or may not point to
> the VALUE we want to keep global.  When the GC tries to mark this
> VALUE*, it ends up marking an invalid object.

If this "func" will be called from ruby as a method, you don't
need to call rb_global_variable() here.

Quote:
> So my question is: why does rb_gc_register_address() store VALUE*'s
> instead of VALUE's?  I suppose that if it stored a VALUE, then it would
> be possible to register an object as a global but not keep a reference
> to that object; this may not be desirable.  In our case, though, we just
> wanted the string to remain valid.

And introduce memory bugs?  Just store the VALUE in a local
variable instead.

--
Nobu Nakada



Sat, 09 Apr 2005 09:02:50 GMT  
 rb_gc_register_address problem

Quote:

> Hi Paul,

> I was puzzled by the example itself; hopefully it is really just an
> example:

> 1) Why do you have a vector of char* instead of std::vector<string> in
> C++?  In this case you don't get the advantage of the string copy
> constructor when you call the push_back() on the vector, which then
> requires you to store the corresponding VALUE.

I'm storing command-line arguments for later use, to pass to a function
that expects an array of char*.

Quote:
> 2) Why do you want store the char* of the VALUE instead of the VALUE
> itself?  You can then store the VALUE in a global ruby array instead of a
> global vector so that you don't need rb_global_variable(&argv[0]).

See above.

Quote:
> 3) By its name, rb_global_variable() is supposed to store a C global
> variable (of type VALUE); so when we call rb_global_variable() for a
> temporary C variable, at that point the C code design is questionable.

Agreed.

But if the GC stores a VALUE* instead of a VALUE, this is an extra (and
seemingly unnecessary) indirection.  I was curious if there was a reason
for it, other than to prevent me from doing what I did.

Paul



Sat, 09 Apr 2005 22:01:17 GMT  
 rb_gc_register_address problem

Quote:

> At Tue, 22 Oct 2002 03:54:24 +0900,

> > We ran into a problem today with the garbage collector (caused by our
> > own bug).  We had a function that did something like this:
> >   std::vector<char *> foo;
> >   void func(int argc, VALUE * argv, VALUE self) {
> >     foo.push_back(STR2CSTR(argv[0]));
> >     rb_global_variable(&argv[0]);
> >   }

> If this "func" will be called from ruby as a method, you don't
> need to call rb_global_variable() here.

foo is global, so the call is necessary, so that the string object will
still be valid after func() returns.

Quote:
> > So my question is: why does rb_gc_register_address() store VALUE*'s
> > instead of VALUE's?  I suppose that if it stored a VALUE, then it would
> > be possible to register an object as a global but not keep a reference
> > to that object; this may not be desirable.  In our case, though, we just
> > wanted the string to remain valid.

> And introduce memory bugs?  Just store the VALUE in a local
> variable instead.

What memory bugs would be introduced?

(My goal here isn't to get the interpreter to change, but to understand
why it is written the way it is).

Paul



Sat, 09 Apr 2005 22:04:28 GMT  
 rb_gc_register_address problem

P> But if the GC stores a VALUE* instead of a VALUE, this is an extra (and
P> seemingly unnecessary) indirection.  I was curious if there was a reason
P> for it, other than to prevent me from doing what I did.

    a = Qnil;
    rb_gc_register_address(a);
    a = rb_ary_new();

Guy Decoux



Sat, 09 Apr 2005 22:11:09 GMT  
 rb_gc_register_address problem

Quote:

> i'm not _positive_, but if you did not store pointers to pointers to
> objects, you would not be able to change, in the gc, what that
> pointerer was pointing to!  you would only be able to change the thing
> pointed to itself! eg.  the gc would never be able to set an object
> pointer to nil, or anywhere else...

I see!  That makes perfect sense.

I had not considered creating global variables that actually change.  I
generally leave them alone once they are created.  I guess I should be
using a constant instead.

Quote:
> gcc has built in memory leak debugging - but it only works with c.  a
> good reason not to use c++ imo. ;-)

Where can I read more about this facility?

Quote:
> i actually have a modification, somewhere, which works with c++, want
> it?

What does the modification involve?

Paul



Sat, 09 Apr 2005 22:13:00 GMT  
 rb_gc_register_address problem

Quote:

> > gcc has built in memory leak debugging - but it only works with c.  a
> > good reason not to use c++ imo. ;-)

> Where can I read more about this facility?

i told a little lie, it works with glibc, whcih on our systems is pretty much
synonymous with gcc (eg our linux systems).  you can read about it in

http://www.gnu.org/manual/glibc-2.2.3/html_mono/libc.html#SEC37

it's **really** usefull, and worth finding a machine with glibc on, at least
to debug with.

Quote:
> > i actually have a modification, somewhere, which works with c++, want
> > it?

> What does the modification involve?

you'll see in the docs above, that turning tracing on outputs a file which a
perl postprocessor then handles.  it consists of lines like

i malloc'd 10 bytes to ptr 0x377
i free'd 0x377

etc..

the postprocessor scans this files and makes sure mallocs and frees are
balanced.  it also uses some backtracing functions to note *where* the leaks
occur.  the problem with c++ is that the leaks will always appear to occur
from the method 'new' (or whatever the internal name is).  note that this is
only because new and delete are implemented in terms of malloc and free in
glibc - maybe this is normal - i don't really know.  anyhow, i added some
checking that works using this logic

  if im called from new
    trace who called me
  else
    normal

in other words, when called from new i output information from further up the
call stack, so you can see where your new is leaking...

this modification requires rebuilding glibc!  however, the file affected is
the mtrace.c file, which is only ever used for memory debugging, so i assure
you nothing would break!  anyhow, it's been six months since i looked at that,
but if you want i'll scan my notes and send you a tar ball of what i did.
when time allows i will contrib this to gcc - but time has not yet allowed!

-a

--

 ====================================
 | Ara Howard
 | NOAA Forecast Systems Laboratory
 | Information and Technology Services
 | Data Systems Group
 | R/FST 325 Broadway
 | Boulder, CO 80305-3328

 | Phone:  303-497-7238
 | Fax:    303-497-7259
 ====================================



Sat, 09 Apr 2005 22:28:56 GMT  
 rb_gc_register_address problem
Hi Paul,

Quote:

> I'm storing command-line arguments for later use, to pass to a function
> that expects an array of char*.

In this case then what you want is probably something like

  std::vector<char *> foo;
  void func(int argc, VALUE * argv, VALUE self) {
    int length;
    char *str;
    rb_str2cstr (argv[0], &length);
    str = malloc (sizeof(char) * (length + 1));
    strcpy (str, STR2CSTR(argv[0]));
    foo.push_back(str);
  }

Of course, I have created a macro/function that performs a task as above
so that I can simply write

  std::vector<char *> foo;
  void func(int argc, VALUE * argv, VALUE self) {
    foo.push_back(create_str_from_rubystr (argv[0]));
  }

My philosophy in Ruby-C integration: separate the C-part and the Ruby-part
as much and as far as possible; we are better-off in the long run this
way.

(deleted)

Quote:
> But if the GC stores a VALUE* instead of a VALUE, this is an extra (and
> seemingly unnecessary) indirection.  I was curious if there was a reason
> for it, other than to prevent me from doing what I did.

I think this is because all Ruby VALUE's of type pointer (not immediate
object) is allocated from a single heap.  As you call
rb_global_variable(), Matz created a linked list containing all the
addresses of the global variables.  During the gc marking process, Matz
simply goes through this list and marks the VALUE pointed by each
VALUE*.  If you had given Matz a VALUE instead, during the gc marking
process Matz has to go through the heap to find the exact VALUE and mark
it for each global variable, and now the process is of O(N**2) instead of
O(N).

Regards,

Bill



Sat, 09 Apr 2005 22:47:38 GMT  
 rb_gc_register_address problem
Well, as long as after each new object creation we call
rb_gc_register_address() (and optionally call rb_gc_unregister_address()),
this is fine:

     a = Qnil;
     rb_gc_register_address(a);
     rb_gc_unregister_address(a);    /* optional */
     a = rb_ary_new();
     rb_gc_register_address(a);

Of course, we are here assuming rb_gc_register_address(VALUE) instead of
rb_gc_register_address(VALUE*) and Matz has to do the O(N**2) process in
rb_gc().

Bill
===========================================================================

Quote:

>     a = Qnil;
>     rb_gc_register_address(a);
>     a = rb_ary_new();
> Guy Decoux



Sat, 09 Apr 2005 23:00:27 GMT  
 rb_gc_register_address problem

Quote:

> (deleted)
> > But if the GC stores a VALUE* instead of a VALUE, this is an extra (and
> > seemingly unnecessary) indirection.  I was curious if there was a reason
> > for it, other than to prevent me from doing what I did.

> I think this is because all Ruby VALUE's of type pointer (not immediate
> object) is allocated from a single heap.  As you call
> rb_global_variable(), Matz created a linked list containing all the
> addresses of the global variables.  During the gc marking process, Matz
> simply goes through this list and marks the VALUE pointed by each
> VALUE*.  If you had given Matz a VALUE instead, during the gc marking
> process Matz has to go through the heap to find the exact VALUE and mark
> it for each global variable, and now the process is of O(N**2) instead of
> O(N).

sounds complicated - why not this simple explanation illustrated here :

  #include <stdlib.h>
  #include <stdio.h>

  void
  i_can_change_what_memory_you_point_to (ptr)
    char **ptr;
  {
    *ptr = "barfoo";
  }

  void
  i_can_change_the_memory_you_point_to (ptr)
    char *ptr;
  {
    *ptr = 'b';
  }

  int
  main (int argc, char **argv, char **env)
  {
    char *buf = malloc(42);
    char *ptr;

    ptr = buf;
    strcpy (ptr, "foobar");
    printf("%s\n", ptr);                  /* -->> "foobar" */
    i_can_change_what_memory_you_point_to (&ptr);
    printf("%s\n", ptr);                  /* -->> "barfoo" */

    ptr = buf;
    strcpy (ptr, "foo");
    printf("%s\n", ptr);                  /* -->> "foo" */
    i_can_change_the_memory_you_point_to (ptr);
    printf("%s\n", ptr);                  /* -->> "boo"    */

    return 0;
  }

the gc _must_ have a prototype like 'i_can_change_what_memory_you_point_to' so
it may, for example, point an object to nil.  otherwise it could only change
the objects pointed to themselves!

-a

--

 ====================================
 | Ara Howard
 | NOAA Forecast Systems Laboratory
 | Information and Technology Services
 | Data Systems Group
 | R/FST 325 Broadway
 | Boulder, CO 80305-3328

 | Phone:  303-497-7238
 | Fax:    303-497-7259
 ====================================

--

 ====================================
 | Ara Howard
 | NOAA Forecast Systems Laboratory
 | Information and Technology Services
 | Data Systems Group
 | R/FST 325 Broadway
 | Boulder, CO 80305-3328

 | Phone:  303-497-7238
 | Fax:    303-497-7259
 ====================================



Sat, 09 Apr 2005 23:14:35 GMT  
 rb_gc_register_address problem

Quote:

> In this case then what you want is probably something like

>   std::vector<char *> foo;
>   void func(int argc, VALUE * argv, VALUE self) {
>     int length;
>     char *str;
>     rb_str2cstr (argv[0], &length);
>     str = malloc (sizeof(char) * (length + 1));
>     strcpy (str, STR2CSTR(argv[0]));
>     foo.push_back(str);
>   }

This is one possible solution.  We went a different route, but thanks
for the suggestion.

Note, though, that this is not exception-safe.  vector::push_back can
throw an exception (std::bad_alloc).  If it does, then str is left
allocated.  One solution is to push a dummy onto the vector before
allocating memory.  Using your helper fuction:

  std::vector<char *> foo;
  void func(int argc, VALUE * argv, VALUE self) {
    foo.push_back(0);
    foo[foo.length()-1] = create_str_from_rubystr (argv[0]));
  }

The solution we went with was to leave the memory allocation to Ruby.  I
prefer to let Ruby's GC handle as much memory as possible, so that I
don't have to worry quite so hard about exception-safety.  The only
exceptions to this rule I see are:
  1) If I have many objects being allocated, I allocate them myself
  2) If I have resources allocated other than memory (such as open files
     or sockets), I also handle that myself.

Quote:
> My philosophy in Ruby-C integration: separate the C-part and the Ruby-part
> as much and as far as possible; we are better-off in the long run this
> way.

I agree.  This is particularly important if the API ever changes.  It
also allows porting extensions easily to other languages.  This is part
of the philosophy behind SWIG.

Paul



Sat, 09 Apr 2005 23:30:16 GMT  
 
 [ 28 post ]  Go to page: [1] [2]

 Relevant Pages 

1. Problems, problems, problems

2. Eiffel Problems, Problems, Problems

3. Finn Idiom problems and Re: {rho} problem

4. Combinatorial Problem [ & a new Combinatorial Problem ]

5. Database problem/Memory problem??

6. CW 2003 - Focus problem & a select problem

7. 32bit problem - one problem solved & another found

8. GForth problem or my problem?

9. REXX DLL problems solved (was Re: Problems getting C subcommand to work)

10. HTTP-Access2 problem (was Ruby Google problem)

11. Problem with this excercise problem.

12. RSX problems and texture problems

 

 
Powered by phpBB® Forum Software