using <string.h> functions on non-character objects?
Eric Smit #1 / 9
|
 using <string.h> functions on non-character objects?
I posted code similar to this on sci.crypt:

    void foo (int x, int y)
    {
        volatile int bar [SIZE];
        do_some_stuff ();
        bar [x] = y;
        do_some_other_stuff ();
        memset (bar, 0, sizeof (bar));
    }
Quote: > It's not *semantically* valid.
The two relevant sections of ISO/IEC 9899 seem to be:

Section 6.2.6.1 paragraph 5:
    Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation.

Section 7.21.1 paragraph 1:
    The header <string.h> declares one type and several functions, and defines one macro useful for manipulating arrays of character type and other objects treated as arrays of character type. [...]

Can there be a trap representation for a character? I can't find anything that says that there can't. If there can, then no operations defined in <string.h> are useful on "other objects treated as arrays of character type."

Can an all-binary-zeros representation be a trap representation for an integer?

Does calling memset() with a pointer to an integer array as the first argument constitute a "side effect that modifies all or any part of the object by an lvalue expression that does not have character type"?

And it sounds like passing any object of a non-character type to a string function may have undefined behavior because of the second sentence of 6.2.6.1 paragraph 5.

Is it really the intent that writing a trap representation to an object using <string.h> functions should have undefined behavior? (I can certainly see why reading those objects afterward would have undefined behavior.)

If this is really the case, doesn't that mean that calloc() can only be used on characters and character arrays?

Section 7.20.3.1 paragraph 2:
    The calloc function allocates space for an array of nmemb objects, each of whose size is size. The space is initialized to all bits zero. 252)

    252) Note that this need not be the same as the representation of floating-point zero or a null pointer constant.

Therefore, it sounds like doing a calloc() and storing the result as a pointer to an array of integers (or most other types) can result in undefined behavior, as soon as the resulting object is read.

If this is all true, it sounds like the only way to "zero" a non-integer array is to iterate over all the elements, storing zero into them. I suppose that's not too bad for an array of integers, because the compiler might be smart enough to optimize it, but trying to zero an array of structs will be abysmal.
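For reference, the element-by-element approach mentioned at the end of this post could look like the sketch below. The helper name zero_ints and its length parameter are hypothetical and do not appear in the original code; the point is only that assigning the value 0 through an int lvalue lets the implementation write whatever object representation it uses for zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the original post): store the value 0
 * into each element through an lvalue of type volatile int, rather than
 * setting every byte to all-bits-zero as memset would. */
void zero_ints(volatile int *a, size_t n)
{
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        a[i] = 0;
    }
}
```

Inside foo() this would be called as zero_ints(bar, SIZE) in place of the memset() call.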
|
Sun, 27 Jun 2004 10:42:07 GMT |
|
 |
David Rubi #2 / 9
|
 using <string.h> functions on non-character objects?
[snip - using memset to initialize types other than array of char] Quote: > If this is all true, it sounds like the only way to "zero" a non-integer > array is to iterate over all the elements, storing zero into them. I suppose > that's not too bad for an array of integers, because the compiler might > be smart enough to optimize it, but trying to zero an array of structs > will be abysmal.
Unless you use an initializer or make the variable static. However, it certainly seems like the original intent of memset was to initialize all types... unless the only type at the time was char.

david
--
If 91 were prime, it would be a counterexample to your conjecture.
-- Bruce Wheeler
|
Sun, 27 Jun 2004 08:30:16 GMT |
|
 |
Richard Heathfiel #3 / 9
|
 using <string.h> functions on non-character objects?
Quote:
> I posted code similar to this on sci.crypt:
>     void foo (int x, int y)
>     {
>         volatile int bar [SIZE];
>         do_some_stuff ();
>         bar [x] = y;
>         do_some_other_stuff ();
>         memset (bar, 0, sizeof (bar));
>     }
> > It's not *semantically* valid. > The two relevant sections of ISO/IEC 9899 seem to be:
<snip> Yes, those were both relevant. Quote: > Can there be a trap representation for a character? I can't find anything > that says that there can't.
I'm reasonably sure there can, but for /characters/ all-bits-zero cannot be a trap representation. There was a thread on this recently in comp.lang.c entitled "Extracing a substring (fast)" - complete with typo! - in which the following conclusion was reached (delimited by +++ signs): +++++++++++++ On Thursday, in article
Quote:
><snip>
>> C99 6.2.6.2p5 says
>> "The values of any padding bits are unspecified. A valid (non-trap) object representation of a signed integer type where the sign bit is zero is a valid object representation of the corresponding unsigned type, and shall represent the same value."
>> unsigned char cannot have padding bits but signed char can if its range of values is sufficiently small. For example UCHAR_MAX==65535 and SCHAR_MAX==127 is valid and allows for 8 padding bits in a signed char. However for nonnegative values that they have in common signed char and unsigned char must use the same representation. So in the example here, for any value 0-127 written to the signed char all 8 padding bits must be zeroed. When reading the value of an object using a signed char lvalue the padding bits can be ignored.
>If I am reading this aright, it implies that (even though signed char may have padding bits) memset(signedchararray, 0, sizeof signedchararray) gives you the expected array-full of '\0' characters, and thus memset(a, 0, sizeof a) works for char, signed char, and unsigned char arrays.
Correct. +++++++++++++ Quote: > If there can, then no operations defined > in <string.h> are useful on "other objects treated as arrays of character > type."
Not so. The memcpy, memmove, memcmp, and memchr functions can all be used safely in such a way. It's just memset that has problems, and that's because it doesn't just look at or copy bits - it actually /sets/ them, without any knowledge of their underlying object type. (See below, paragraph starting "Except with memset".) Quote: > Can an all-binary-zeros representation be a trap representation for an > integer?
Yes (except for the three types char, signed char, and unsigned char). The Standard allows integers to have padding bits, and the implementation is allowed to use those bits (for example, for parity checking). Consider an implementation which mandates odd parity for its integers, and traps if it discovers even parity in any integer. memset(myintarray, 0, sizeof myintarray) would trap on such an implementation. Quote: > Does calling memset() with a pointer to an integer array as the first > argument constitute a "side effect that modifies all or any part > of the object by an lvalue expression that does not have character type"?
The honest answer here is "I don't know and I don't have time right now to find out", so I'll pass on this one. :-) Quote: > And it sounds like passing any object of a non-character type to a string > function may have undefined behavior because of the second sentence > of 6.2.6.1 paragraph 5.
Except with memset(), you're actually all right here because the other mem* functions work by copying whole bytes, including any padding bits. So, provided those bytes were set correctly to start with, copying them is well-defined. Quote: > Is it really the intent that writing a trap representation to an object > using <string.h> functions should have undefined behavior?
That's really a question for comp.std.c IMHO.
Quote:
> (I can certainly see why reading those objects afterward would have undefined behavior.)
Right. Quote: > If this is really the case, doesn't that mean that calloc() can only > be used on characters and character arrays?
Yes, if you want well-defined behaviour. Quote: > Section 7.20.3.1 paragraph 2: > The calloc function allocates space for an array of nmemb > objects, each of whose size is size. The space is initialized > to all bits zero. 252) > 252) Note that this need not be the same as the representation > of floating-point zero or a null pointer constant. > Therefore, it sounds like doing a calloc() and storing the result > as a pointer to an array of integers (or most other types) can > result in undefined behavior, as soon as the resulting object is > read.
Correct, except that you don't have to wait that long. :-) Quote: > If this is all true, it sounds like the only way to "zero" a non-integer > array is to iterate over all the elements, storing zero into them.
Or you can initialise them at declaration: int blankarray[100] = {0}; /* guaranteed to zero out everything */ If you need to blank them later, you can do so with memcpy at the expense of some memory: memcpy(workingarray, blankarray, sizeof workingarray); Quote: > I suppose > that's not too bad for an array of integers, because the compiler might > be smart enough to optimize it, but trying to zero an array of structs > will be abysmal.
Again, the {0} trick works beautifully. --
"Usenet is a strange place." - Dennis M Ritchie, 29 July 1999. C FAQ: http://www.eskimo.com/~scs/C-faq/top.html K&R answers, C books, etc: http://users.powernet.co.uk/eton
|
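A minimal sketch of the {0}-plus-memcpy technique described in this post, under stated assumptions: the names blankarray and workingarray come from the post, while the SIZE macro and the blank_ints wrapper are hypothetical additions for illustration. Because blankarray is initialised by value, its bytes are a valid representation of zero, and memcpy copies them unchanged.

```c
#include <assert.h>
#include <string.h>

#define SIZE 100  /* the post uses 100 elements; the macro name is assumed */

/* {0} initialises every element to the value 0, whatever bit pattern
 * the implementation uses to represent it. */
static const int blankarray[SIZE] = {0};

/* Hypothetical helper: workingarray must point to at least SIZE ints.
 * Copies known-valid zero representations byte for byte. */
void blank_ints(int *workingarray)
{
    assert(workingarray != NULL);
    memcpy(workingarray, blankarray, sizeof blankarray);
}
```

The cost is the one mentioned in the post: a second array of the same size is kept around purely as a template.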
Sun, 27 Jun 2004 18:20:59 GMT |
|
 |
Lawrence Kir #4 / 9
|
 using <string.h> functions on non-character objects?
Quote:
>I posted code similar to this on sci.crypt:
>    void foo (int x, int y)
>    {
>        volatile int bar [SIZE];
>        do_some_stuff ();
>        bar [x] = y;
>        do_some_other_stuff ();
>        memset (bar, 0, sizeof (bar));
>    }
>> It's not *semantically* valid. >The two relevant sections of ISO/IEC 9899 seem to be: > Section 6.2.6.1 paragraph 5: > Certain object representations need not represent a value of > the object type. If the stored value of an object has such a > representation and is read by an lvalue expression that does > not have character type, the behavior is undefined. If such a > representation is produced by a side effect that modifies all > or any part of the object by an lvalue expression that does > not have character type, the behavior is undefined. Such a > representation is called a trap representation. > Section 7.21.1 paragraph 1: > The header <string.h> declares one type and several functions, and > defines one macro useful for manipulating arrays of character type > and other objects treated as arrays of character type. [...] >Can there be a trap representation for a character?
There cannot be trap representations for unsigned char. 6.2.6.1p3 implies this. 5.2.4.2.1p2 says "UCHAR_MAX shall equal (2 (to the power of) CHAR_BIT)-1." For that to be possible every bit pattern in a byte must represent a value as an unsigned char. There is no room for trap representations.
Quote:
> I can't find anything that says that there can't. If there can, then no operations defined in <string.h> are useful on "other objects treated as arrays of character type."
6.2.6.1p5 indicates that accessing any object with a character typed lvalue does not produce undefined behaviour. So character types can't trap. Quote: >Can an all-binary-zeros representation be a trap representation for an >integer?
Yes, for any integer type other than character types. Apart from the arguments above we know that all-bits-zero is a valid representation of zero for unsigned char. Because of 6.2.5p9 (also 6.2.6.2p5) it must also be a valid representation of zero for signed char. Taking these together the same must also be true for plain char. Quote: >Does calling memset() with a pointer to an integer array as the first >argument constitute a "side effect that modifies all or any part >of the object by an lvalue expression that does not have character type"?
No, 7.21.1p1 says objects are treated as arrays of character type.
Quote:
>And it sounds like passing any object of a non-character type to a string function may have undefined behavior because of the second sentence of 6.2.6.1 paragraph 5.
Again, the functions behave as if they access objects character by character.
Quote:
>Is it really the intent that writing a trap representation to an object using <string.h> functions should have undefined behavior? (I can certainly see why reading those objects afterward would have undefined behavior.)
6.2.6.1p4 and p5 (also see the footnotes) show that that is not the intent. Quote: >If this is really the case, doesn't that mean that calloc() can only >be used on characters and character arrays?
Yes, that is true along with memset(). This is pointed out quite regularly in comp.lang.c. :-) Quote: > Section 7.20.3.1 paragraph 2: > The calloc function allocates space for an array of nmemb > objects, each of whose size is size. The space is initialized > to all bits zero. 252) > 252) Note that this need not be the same as the representation > of floating-point zero or a null pointer constant. >Therefore, it sounds like doing a calloc() and storing the result >as a pointer to an array of integers (or most other types) can >result in undefined behavior, as soon as the resulting object is >read.
Correct. Quote: >If this is all true, it sounds like the only way to "zero" a non-integer >array is to iterate over all the elements, storing zero into them. I suppose >that's not too bad for an array of integers, because the compiler might >be smart enough to optimize it, but trying to zero an array of structs >will be abysmal.
Initialisation can be used to correctly set structure members to 0, 0.0, null as appropriate. One portable way of correctly zeroing the members of an array of structures would be to initialise an instance of the structure to zero and then copy that to each element of the array. -- -----------------------------------------
-----------------------------------------
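The portable approach suggested at the end of this post (initialise one instance, then copy it to each element) might be sketched as follows. The struct record type and the zero_records helper are hypothetical, chosen only to show an integer, a floating-point member and a pointer together.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type, for illustration only. */
struct record {
    int    count;
    double weight;
    char  *name;
};

/* Hypothetical helper: zero n elements by structure assignment. */
void zero_records(struct record *a, size_t n)
{
    /* Initialisation sets the members to 0, 0.0 and a null pointer,
     * independent of how those values are represented in memory. */
    static const struct record zero = {0};

    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        a[i] = zero;  /* copies valid values member by member */
    }
}
```

Structure assignment avoids writing a separate store for each member, which addresses the concern about zeroing arrays of structs by hand.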
|
Sun, 27 Jun 2004 20:46:40 GMT |
|
 |
Pai-Yi HSIA #5 / 9
|
 using <string.h> functions on non-character objects?
Quote:
> >Can there be a trap representation for a character? > There cannot be trap representations for unsigned char. 6.2.6.1p3 > implies this. 5.2.4.2.1p2 says "UCHAR_MAX shall equal > (2 (to the power of) CHAR_BIT)-1. For that to be possible every bit > pattern in a byte must represent a value as an unsigned char. There is > no room for trap representations.
Right. Quote: > 6.2.6.1p5 indicates that accessing any object with a character typed > lvalue does not produce undefined behaviour. > So character types can't trap.
The question is whether there can be a trap representation for an object of type signed char. Is 6.2.6.1p5 sufficient to imply that signed char has no trap representation?
Quote:
> >Can an all-binary-zeros representation be a trap representation for an integer?
> Yes, for any integer type other than character types.
No, there is no trap representation of the form all-binary-zeros. When the sign bit is zero, the value of the signed integer is not affected by the rule described in 6.2.6.2p2. The integer value of all-binary-zeros is therefore the same as the value of the unsigned integer with all-binary-zeros, which is zero.
|
Sun, 27 Jun 2004 22:53:03 GMT |
|
 |
Pai-Yi HSIA #6 / 9
|
 using <string.h> functions on non-character objects?
Quote:
> >Can an all-binary-zeros representation be a trap representation for an > >integer? > Yes, for any integer type other than character types. Apart from the > arguments above we know that all-bits-zero is a valid representation > of zero for unsigned char. Because of 6.2.5p9 (also 6.2.6.2p5) it > must also be a valid representation of zero for signed char. Taking > these together the same must also be true for plain char.
After having read the other thread, "Extracing a substring (fast)", which Richard Heathfield pointed to, I stand corrected on my previous reply. An integer type other than a character type can have all-binary-zeros as a trap representation. Sorry for my ignorance. :-)

I have a question about what C99 6.2.5p9 says:

"The range of nonnegative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the representation of the same value in each type is the same."

If a system uses 1's complement or sign/magnitude integer representation, 0 has two representations in the signed type. How can the representation of the same value in each type be the same? Does it imply that, for a system with 1's complement or sign/magnitude integer representation, "-0" needs to be a trap representation?

Thank you.
paiyi
|
Mon, 28 Jun 2004 02:14:17 GMT |
|
 |
Lawrence Kir #7 / 9
|
 using <string.h> functions on non-character objects?
On Wednesday, in article
... Quote: >I have a question for what C99 6.2.5p9 said: >"The range of nonnegative values of a signed integer type is a subrange > of the corresponding unsigned integer type, and the representation of > the same value in each type is the same." >If a system uses 1's complement or sign/magnitude integer representation, >0 has two representations in signed type.
It can have 2 representations of 0, or the one with the sign bit set can be a trap representation. Quote: >How can the representation of the same value in each type be the same?
When there are 2 representations of 0, one is referred to as the positive representation and the other as the negative representation (i.e. with the sign bit clear and set respectively). The quote above refers to nonnegative values and must be taken to exclude the "negative zero". The wording could perhaps be better but there's really no other way to interpret it.
Quote:
>Does it imply that for a system with 1's complement or sign/magnitude integer representation, "-0" need to be a trap?
No, two representations of zero are explicitly allowed by 6.2.6.2p2, which also defines the term "negative zero".
--
-----------------------------------------
-----------------------------------------
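To make the two representations of zero concrete, here is a small simulation that decodes 8-bit patterns under sign/magnitude and ones' complement rules. It is only a sketch: the function names and the 8-bit width are assumptions for illustration, and it models the encodings in ordinary arithmetic rather than relying on the host's own integer format.

```c
#include <assert.h>

/* Decode an 8-bit pattern as sign/magnitude: bit 7 is the sign,
 * bits 0..6 hold the magnitude. */
int decode_sign_magnitude(unsigned bits)
{
    assert(bits <= 0xFFu);
    int magnitude = (int)(bits & 0x7Fu);
    return (bits & 0x80u) ? -magnitude : magnitude;
}

/* Decode an 8-bit pattern as ones' complement: when the sign bit is
 * set, the value is the negation of the bitwise complement. */
int decode_ones_complement(unsigned bits)
{
    assert(bits <= 0xFFu);
    if (bits & 0x80u) {
        return -(int)(~bits & 0xFFu);
    }
    return (int)bits;
}
```

In this model the patterns 0x00 and 0x80 both decode to zero under sign/magnitude, and 0x00 and 0xFF both decode to zero under ones' complement. Only the pattern with the sign bit clear is shared with the unsigned type, which is the reading of 6.2.5p9 given above; a real implementation may instead treat the sign-bit-set pattern as a trap representation.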
|
Mon, 28 Jun 2004 22:03:44 GMT |
|
 |
Lawrence Kir #8 / 9
|
 using <string.h> functions on non-character objects?
On Wednesday, in article
Quote:
>> >Can there be a trap representation for a character? >> There cannot be trap representations for unsigned char. 6.2.6.1p3 >> implies this. 5.2.4.2.1p2 says "UCHAR_MAX shall equal >> (2 (to the power of) CHAR_BIT)-1. For that to be possible every bit >> pattern in a byte must represent a value as an unsigned char. There is >> no room for trap representations. >Right. >> 6.2.6.1p5 indicates that accessing any object with a character typed >> lvalue does not produce undefined behaviour. >> So character types can't trap. >Question is that > can there be a trap representation for signed char type object?
I'll answer your question with a question: what would be the significance of a trap representation that can't trap? Quote: >Is 6.2.6.1p5 sufficient to imply no trap representation for signed char?
I would say so unless you can think of implications of trap representations that don't involve undefined behaviour. -- -----------------------------------------
-----------------------------------------
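The UCHAR_MAX argument quoted in this post can be checked directly on any given implementation. The sketch below uses a hypothetical helper name and assumes CHAR_BIT does not exceed the width of unsigned long long; it builds 2^CHAR_BIT - 1 one bit at a time and compares the result with UCHAR_MAX.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 when UCHAR_MAX equals 2^CHAR_BIT - 1,
 * meaning every bit pattern of a byte is a distinct unsigned char value
 * and none is left over to serve as a trap representation. */
int uchar_uses_all_bits(void)
{
    unsigned long long all_ones = 0;
    for (int i = 0; i < CHAR_BIT; i++) {
        all_ones = (all_ones << 1) | 1u;  /* append one more 1 bit */
    }
    return all_ones == UCHAR_MAX;
}
```

On a conforming implementation this is expected to return 1, since 5.2.4.2.1p2 requires the equality.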
|
Mon, 28 Jun 2004 21:59:24 GMT |
|
 |
Pai-Yi HSIA #9 / 9
|
 using <string.h> functions on non-character objects?
Quote:
> >> 6.2.6.1p5 indicates that accessing any object with a character typed > >> lvalue does not produce undefined behaviour. > >> So character types can't trap. > >Question is that > > can there be a trap representation for signed char type object? > I'll answer your question with a question: what would be the significance > of a trap representation that can't trap?
So there is a contradiction. The way to avoid it is for objects of type signed char to have no trap representations.
Quote:
> >Is 6.2.6.1p5 sufficient to imply no trap representation for signed char?
> I would say so unless you can think of implications of trap representations that don't involve undefined behaviour.
I cannot find any such implications. As you have pointed out, a signed char may contain padding bits, so no bit pattern in the padding can cause a trap.

paiyi
|
Tue, 29 Jun 2004 20:51:02 GMT |
|