Structures of indefinite size 
 Structures of indefinite size

As a simple example, I would like to use what is sometimes called a
"counted string".  Instead of its extent being indicated by a sentinel
\0, I would like the first two bytes to contain the length, and the
other bytes the value.

The maximum length, theoretically, of such a structure could be 65537
bytes, but one wouldn't want to preallocate that much space for every
variable of that type that might be used.  Rather, a function creating
such a variable would get only as much memory as it needs, via
malloc() or whatever, for each particular instance and provide for its
being referenced with a pointer.

How can I define such a structure with a struct or typedef?

Is there a way for C to do this without fudging?



Tue, 06 Dec 2005 12:13:30 GMT  
 Structures of indefinite size

Quote:

> As a simple example, I would like to use what is sometimes called a
> "counted string".  Instead of its extent being indicated by a sentinel
> \0, I would like the first two bytes to contain the length, and the
> other bytes the value.

> The maximum length, theoretically, of such a structure could be 65537
> bytes, but one wouldn't want to preallocate that much space for every
> variable of that type that might be used.  Rather, a function creating
> such a variable would get only as much memory as it needs, via
> malloc() or whatever, for each particular instance and provide for its
> being referenced with a pointer.

> How can I define such a structure with a struct or typedef?

There are two issues associated with your question.  First of
all, your definition of a counted string seems incomplete for
efficient use with C.  This is because C provides no way to find
out the space allocated for a malloc()'d block.  Thus, the choice
is between *always* calling realloc() whenever the length of a
counted string changes, or storing the amount of space allocated
along with the length in the counted string structure.  The
latter is vastly preferable if there is no strict requirement
that the counted string structure only contain a length.

That said, the idea of including a variable-length array in a C
structure has a long and spotted history.  The typical way to do
it in C90 is this:
        struct string {
                unsigned short capacity;
                unsigned short length;
                char text[1];
        };
Then a string with capacity for N characters can be allocated
with malloc(sizeof(struct string) + (N - 1)).  The problem is
that this is, strictly speaking, not allowed by the standard.
Many people just don't care, though, because it normally works.
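
For concreteness, here is a minimal sketch of that allocation.  The
helper name hack_string_new and its error handling are assumptions made
for illustration, not part of the original post, and the caveat above
still applies: indexing text past its one declared element is not
strictly conforming.

```c
#include <assert.h>   /* assert */
#include <limits.h>   /* USHRT_MAX */
#include <stdlib.h>   /* malloc, size_t */
#include <string.h>   /* memcpy */

struct string {
    unsigned short capacity;
    unsigned short length;
    char text[1];     /* struct hack: storage for capacity bytes follows */
};

/* Hypothetical helper: returns a counted string holding a copy of the
   first n bytes of src, or NULL if n is too large or malloc() fails. */
struct string *hack_string_new(const char *src, size_t n)
{
    struct string *s;

    assert(src != NULL || n == 0);
    if (n > USHRT_MAX)
        return NULL;
    /* text[1] already accounts for one character, hence N - 1;
       for n == 0 the bare struct is enough. */
    s = malloc(sizeof(struct string) + (n > 0 ? n - 1 : 0));
    if (s == NULL)
        return NULL;
    s->capacity = (unsigned short)n;
    s->length = (unsigned short)n;
    if (n > 0)
        memcpy(s->text, src, n);
    return s;
}
```

A caller would write something like hack_string_new("hello", 5) and
release the whole object with a single free().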

C99 has a solution.  In C99, you can instead write this:
        struct string {
                unsigned short capacity;
                unsigned short length;
                char text[];
        };
Then a string with capacity for N characters can be allocated
with malloc(sizeof(struct string) + N).  The problem is
that this is not supported yet by many actual compilers.
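
A minimal sketch of the C99 version, assuming a compiler that accepts
flexible array members.  The helper name fam_string_new is hypothetical
and chosen only for this example.

```c
#include <assert.h>   /* assert */
#include <limits.h>   /* USHRT_MAX */
#include <stdlib.h>   /* malloc, size_t */
#include <string.h>   /* memcpy */

struct string {
    unsigned short capacity;
    unsigned short length;
    char text[];      /* C99 flexible array member: must be last */
};

/* Hypothetical helper: returns a counted string holding a copy of the
   first n bytes of src, or NULL if n is too large or malloc() fails. */
struct string *fam_string_new(const char *src, size_t n)
{
    assert(src != NULL || n == 0);
    if (n > USHRT_MAX)
        return NULL;

    /* sizeof(struct string) excludes text, so add exactly n bytes. */
    struct string *s = malloc(sizeof(struct string) + n);
    if (s == NULL)
        return NULL;

    s->capacity = (unsigned short)n;
    s->length = (unsigned short)n;
    if (n > 0)
        memcpy(s->text, src, n);
    return s;
}
```

Because sizeof does not count the flexible member, no "N - 1"
adjustment is needed, and accessing text[0] through text[N - 1] is
well defined.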

In practice, you might as well use the first method for now.  It
will probably work (i.e. it works everywhere I can think of).  In
the future, as C99 becomes prevalent, you can switch to the C99
method.

Here is what the FAQ says:

2.6:    I came across some code that declared a structure like this:

                struct name {
                        int namelen;
                        char namestr[1];
                };

        and then did some tricky allocation to make the namestr array
        act like it had several elements.  Is this legal or portable?

A:      This technique is popular, although Dennis Ritchie has called it
        "unwarranted chumminess with the C implementation."  An official
        interpretation has deemed that it is not strictly conforming
        with the C Standard, although it does seem to work under all
        known implementations.  (Compilers which check array bounds
        carefully might issue warnings.)

        Another possibility is to declare the variable-size element very
        large, rather than very small; in the case of the above example:

                ...
                char namestr[MAXSIZE];

        where MAXSIZE is larger than any name which will be stored.
        However, it looks like this technique is disallowed by a strict
        interpretation of the Standard as well.  Furthermore, either of
        these "chummy" structures must be used with care, since the
        programmer knows more about their size than the compiler does.
        (In particular, they can generally only be manipulated via
        pointers.)

        C9X will introduce the concept of a "flexible array member",
        which will allow the size of an array to be omitted if it is
        the last member in a structure, thus providing a well-defined
        solution.

        References: Rationale Sec. 3.5.4.2; C9X Sec. 6.5.2.1.
--
"Some programming practices beg for errors;
 this one is like calling an 800 number
 and having errors delivered to your door."
--Steve McConnell



Tue, 06 Dec 2005 12:32:52 GMT  
 Structures of indefinite size
Thank you for your lucid explanation.

This is what I thought, and what I have always done when needing such
a structure (and I seem to need them often).

I'm glad that a cleaner solution is in the works for a future
standard.



Tue, 06 Dec 2005 12:56:37 GMT  
 Structures of indefinite size

Quote:

> As a simple example, I would like to use what is sometimes called
> a "counted string".  Instead of its extent being indicated by a
> sentinel \0, I would like the first two bytes to contain the
> length, and the other bytes the value.

> The maximum length, theoretically, of such a structure could be
> 65537 bytes, but one wouldn't want to preallocate that much space
> for every variable of that type that might be used.  Rather, a
> function creating such a variable would get only as much memory
> as it needs, via malloc() or whatever, for each particular
> instance and provide for its being referenced with a pointer.

> How can I define such a structure with a struct or typedef?

> Is there a way for C to do this without fudging?

There have been many suggestions, and most of those that predate C99's
flexible array members require some sort of fudge.  What I prefer is:

   struct countedstring {
      size_t  capacity;
      size_t  length;
      char   *data;
   };

where the data field can be malloc'd and realloc'd as needed,
keeping track of available space in the capacity field.  If kept
up to date, the length field can avoid possibly inefficient
scanning of data.  data can be a normal '\0' terminated field, and
thus is compatible with existing routines.  Properly used this is
completely standard, works with C89 and C99, and requires no
fudges whatsoever.
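
As a rough sketch of how the capacity field avoids a realloc() on every
change, here is one possible append routine.  The name cs_append, the
starting size of 16 and the doubling policy are all assumptions made
for this example, and overflow checks on the size arithmetic are left
out for brevity.

```c
#include <assert.h>   /* assert */
#include <stdlib.h>   /* realloc, size_t */
#include <string.h>   /* memcpy */

struct countedstring {
    size_t  capacity;   /* bytes allocated for data, including the '\0' */
    size_t  length;     /* characters in use, excluding the '\0' */
    char   *data;
};

/* Hypothetical helper: appends n bytes from src, growing data only when
   the current capacity is too small.  Returns 0 on success, or -1 if
   realloc() fails, in which case the string is left unchanged. */
int cs_append(struct countedstring *cs, const char *src, size_t n)
{
    size_t needed;

    assert(cs != NULL && (src != NULL || n == 0));
    needed = cs->length + n + 1;            /* +1 for the '\0' */
    if (needed > cs->capacity) {
        size_t newcap = cs->capacity ? cs->capacity : 16;
        char *p;

        while (newcap < needed)
            newcap *= 2;                    /* grow geometrically */
        p = realloc(cs->data, newcap);      /* realloc(NULL, n) acts like malloc(n) */
        if (p == NULL)
            return -1;
        cs->data = p;
        cs->capacity = newcap;
    }
    if (n > 0)
        memcpy(cs->data + cs->length, src, n);
    cs->length += n;
    cs->data[cs->length] = '\0';            /* stay compatible with str*() routines */
    return 0;
}
```

Starting from an empty string initialized as { 0, 0, NULL }, repeated
calls reallocate only when the text outgrows the current capacity, and
the caller frees data when done.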

Since there is already considerable overhead to using such, it
would be useful to include an additional general purpose pointer
field, typed as void *.  This allows creation of lists of strings,
sorting, all sorts of interesting games.  Or it can be ignored.

--

   Available for consulting/temporary embedded and systems.
   <http://cbfalconer.home.att.net>  USE worldnet address!



Tue, 06 Dec 2005 13:31:36 GMT  
 Structures of indefinite size

Quote:

> That said, the idea of including a variable-length array in a C
> structure has a long and spotted history.  The typical way to do
> it in C90 is this:
>         struct string {
>                 unsigned short capacity;
>                 unsigned short length;
>                 char text[1];
>         };
> Then a string with capacity for N characters can be allocated
> with malloc(sizeof(struct string) + (N - 1)).  The problem is
> that this is, strictly speaking, not allowed by the standard.
> Many people just don't care, though, because it normally works.

> C99 has a solution.  In C99, you can instead write this:
>         struct string {
>                 unsigned short capacity;
>                 unsigned short length;
>                 char text[];
>         };
> Then a string with capacity for N characters can be allocated
> with malloc(sizeof(struct string) + N).  The problem is
> that this is not supported yet by many actual compilers.

    A third method that works with both C90 and C99 is
to omit the `text' part of the struct altogether and just
rely on the knowledge that the characters follow the
struct.  A macro can hide the details:

        struct string {
            unsigned short capacity;
            unsigned short length;
        };
        #define TEXT(stringptr) ((char*)((stringptr) + 1))

    As with the C99 method, you allocate `sizeof(struct string)'
plus `capacity' characters.  (BTW, in the C90 hack it is marginally
better to allocate `offsetof(struct string, text) + capacity'.
The method Ben outlines will work, but may waste a few bytes.)

        struct string *p;
        p = malloc(sizeof(struct string) + capacity);
        assert (p != NULL);
        p->capacity = capacity;
        p->length = length;
        memcpy (TEXT(p), source, length);
        ...
        if (TEXT(p)[42] == '?') ...




Tue, 06 Dec 2005 22:22:05 GMT  
 Structures of indefinite size

Quote:
>That said, the idea of including a variable-length array in a C
>structure has a long and spotted history.  The typical way to do
>it in C90 is this:

As you say, I've never seen an implementation that doesn't work
properly with this, but I agree with Ritchie about the "unwarranted
chumminess with the C implementation."  In practice, I've never seen a
situation where

        struct string {
                unsigned short capacity;
                unsigned short length;
                char *text;
        };

didn't work just as well, and it allows changing the capacity without
reallocating the entire structure.

--
Al Balmer
Balmer Consulting



Wed, 07 Dec 2005 01:21:04 GMT  
 Structures of indefinite size


Quote:

> > That said, the idea of including a variable-length array in a C
> > structure has a long and spotted history.  The typical way to do
> > it in C90 is [struct hack ...] In C99, [FAM...]
>     A third method that works with both C90 and C99 is
> to omit the `text' part of the struct altogether and just
> rely on the knowledge that the characters follow the
> struct.  A macro can hide the details:

>    struct string {
>        unsigned short capacity;
>        unsigned short length;
>    };
>    #define TEXT(stringptr) ((char*)((stringptr) + 1))

Just to note:  this method is only guaranteed to work for (the three
flavors of) char, because they cannot have nontrivial alignment; it
can be adjusted for other types, but not as easily as the struct-hack
or FAM methods.  The OP did ask for strings (which must be char, or
maybe wchar_t) although the subject line is phrased more generally.
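
To illustrate the kind of adjustment that would be needed for another
element type, here is a sketch for an array of double placed after the
header.  Everything named here (vec_header, double_align_probe,
DOUBLE_ALIGN, DATA_OFFSET, DATA, vec_new) is hypothetical; the idea is
only to round the header size up to a multiple of the element's
alignment before placing the data.

```c
#include <assert.h>   /* assert */
#include <stddef.h>   /* offsetof, size_t */
#include <stdlib.h>   /* malloc */

struct vec_header {
    unsigned short capacity;
    unsigned short length;
};

/* The offset of d is a multiple of double's alignment requirement. */
struct double_align_probe { char c; double d; };
#define DOUBLE_ALIGN offsetof(struct double_align_probe, d)

/* Header size rounded up to a multiple of DOUBLE_ALIGN. */
#define DATA_OFFSET \
    ((sizeof(struct vec_header) + DOUBLE_ALIGN - 1) / DOUBLE_ALIGN * DOUBLE_ALIGN)

/* Pointer to the first double stored after the padded header. */
#define DATA(hp) ((double *)(void *)((char *)(hp) + DATA_OFFSET))

/* Hypothetical helper: allocates a header followed by room for
   capacity doubles.  Returns NULL if malloc() fails. */
struct vec_header *vec_new(unsigned short capacity)
{
    struct vec_header *h;

    assert(DATA_OFFSET % DOUBLE_ALIGN == 0);
    h = malloc(DATA_OFFSET + capacity * sizeof(double));
    if (h == NULL)
        return NULL;
    h->capacity = capacity;
    h->length = 0;
    return h;
}
```

With plain (hp) + 1 the data would start at sizeof(struct vec_header),
which need not be suitably aligned for double; the struct-hack and FAM
forms let the compiler insert that padding itself.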

- David.Thompson1 at worldnet.att.net



Mon, 12 Dec 2005 11:17:27 GMT  
 