Aliasing through union, C++ vs. C
Martin Vuill #1 / 28
|
 Aliasing through union, C++ vs. C
It is my understanding that the C++ Standard makes it "undefined behaviour" to store one type in a member of a union, and then to refer to the contents of the union through a different member of the union, unless the latter is a char or unsigned char type.

What I would like to know is whether C does or ever did make this "defined" behaviour?

In other words, was this idiom ever correct, or is it just a habit programmers have gotten into because it "always seemed to work."

MV
--
Do not send e-mail to the above address. I do not read e-mail sent there.
[ about comp.lang.c++.moderated. First time posters: do this! ] --
|
Sun, 05 Sep 2004 01:39:03 GMT |
|
 |
Hans-Bernhard Broeke #2 / 28
|
 Aliasing through union, C++ vs. C
> What I would like to know is whether C does or ever
> did make this "defined" behaviour?

No. It's just as undefined in C as it is in C++.

> In other words, was this idiom ever correct, or is it just a habit
> programmers have gotten into because it "always seemed to work."

The latter.
--
Even if all the snow were burnt, ashes would remain. --
|
Mon, 06 Sep 2004 14:30:57 GMT |
|
 |
Jerry Coffi #3 / 28
|
 Aliasing through union, C++ vs. C
> It is my understanding that the C++ Standard makes
> it "undefined behaviour" to store one type in a member
> of a union, and then to refer to the contents of the
> union through a different member of the union, unless
> the latter is a char or unsigned char type.
> What I would like to know is whether C does or ever
> did make this "defined" behaviour?

For all practical purposes, it's always been undefined behavior. The possible exception is that if you look early enough in C's development, back when there was only one C compiler on earth, you could argue that "C" was defined as whatever that compiler accepted, and pretty much anything you could get away with using that compiler was "defined" behavior.

OTOH, this was widely recognized as a dirty trick long before the C standard came along and officially said it was undefined.

--
Later, Jerry.
The Universe is a figment of its own imagination.
--
|
Mon, 06 Sep 2004 14:31:30 GMT |
|
 |
Barry Schwar #4 / 28
|
 Aliasing through union, C++ vs. C
> It is my understanding that the C++ Standard makes
> it "undefined behaviour" to store one type in a member
> of a union, and then to refer to the contents of the
> union through a different member of the union, unless
> the latter is a char or unsigned char type.
> What I would like to know is whether C does or ever
> did make this "defined" behaviour?
> In other words, was this idiom ever correct, or is it
> just a habit programmers have gotten into because it
> "always seemed to work."

Section 6.5.15, paragraph 3: "If the value being stored in an object is accessed from another object that overlaps in any way the storage of the first object, then the overlap shall be exact and the two objects shall have qualified or unqualified versions of a compatible type; otherwise, the behavior is undefined."

<<Remove the del for email>>
--
|
Mon, 06 Sep 2004 14:31:47 GMT |
|
 |
Geoff Fiel #5 / 28
|
 Aliasing through union, C++ vs. C
> It is my understanding that the C++ Standard makes
> it "undefined behaviour" to store one type in a member
> of a union, and then to refer to the contents of the
> union through a different member of the union, unless
> the latter is a char or unsigned char type.

"Undefined behaviour" in this context means that the results quite simply depend on the implementation. In other words, it depends on the endianness of the platform, alignment issues and padding, to name but a few things that differ from system to system. Even something like an unqualified char type may have different behaviours on different platforms.

Regardless, a pointer to the first byte of any member of the union will always point to the first byte of all other members of the union. Whether this yields useful information is another matter that is very dependent on how the compiler and platform implement each data type.

> What I would like to know is whether C does or ever
> did make this "defined" behaviour?

No. It has never been fully "defined" behaviour to the best of my knowledge.

> In other words, was this idiom ever correct, or is it
> just a habit programmers have gotten into because it
> "always seemed to work."

It is a useful trick for platform-dependent code. As soon as you move to a different platform, you'll have to revisit the code to allow for the issues mentioned above.

Geoff
--
Geoff Field, Professional geek, amateur stage-levelling gauge.
My band's web page: http://www.geocities.com/southernarea/
--
|
Mon, 06 Sep 2004 14:32:16 GMT |
|
 |
Douglas A. Gwy #6 / 28
|
 Aliasing through union, C++ vs. C
> It is my understanding that the C++ Standard makes
> it "undefined behaviour" to store one type in a member
> of a union, and then to refer to the contents of the
> union through a different member of the union, unless
> the latter is a char or unsigned char type.
> What I would like to know is whether C does or ever
> did make this "defined" behaviour?

Actually C++ followed C's lead. union members *overlap*, so without platform-specific restrictions on representation for the various types, reading a value as a different type than was stored should be expected to yield nonsense at best and an exception (trap) at worst, which is *why* it was put in the "undefined behavior" category.

An exception is made for accessing as "raw bytes", which in C99 means as (array of) unsigned char type; any object storage with any contents can be safely accessed as raw bytes.

> In other words, was this idiom ever correct, or is it
> just a habit programmers have gotten into because it
> "always seemed to work."

I was unaware that it is in wide use. The canonical example was:

    union u { long l; short s[2]; } x;
    // ...
    x.l = 0x11110000;
    lo_word = x.s[0];
    big_endian = lo_word != 0;

which is not perfectly portable, but worked on enough platforms that one encountered it occasionally. By now one hopes that such code has been changed to be portable.
--
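[Editor's sketch, not part of the original post.] The union-based endianness check above can probably be replaced by one that only looks at the object representation through unsigned char, which is the access the thread agrees is permitted. The helper names is_little_endian and is_big_endian, the probe value, and the use of std::memcpy and std::uint32_t are assumptions for illustration; the sketch also assumes 8-bit bytes.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the thread): true when the least
// significant byte of a 32-bit integer sits at the lowest address.
bool is_little_endian()
{
    const std::uint32_t probe = 0x01020304u;
    unsigned char bytes[sizeof probe];
    // Copying an object's bytes into an unsigned char array is permitted.
    std::memcpy(bytes, &probe, sizeof probe);
    return bytes[0] == 0x04;
}

// Hypothetical helper: true when the most significant byte comes first.
bool is_big_endian()
{
    const std::uint32_t probe = 0x01020304u;
    unsigned char bytes[sizeof probe];
    std::memcpy(bytes, &probe, sizeof probe);
    return bytes[0] == 0x01;
}
```

Both functions can return false on a mixed-endian machine, so a caller that needs a definite answer should handle that case rather than assume one of the two holds.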
|
Mon, 06 Sep 2004 14:32:44 GMT |
|
 |
Jack Klei #7 / 28
|
 Aliasing through union, C++ vs. C
> It is my understanding that the C++ Standard makes
> it "undefined behaviour" to store one type in a member
> of a union, and then to refer to the contents of the
> union through a different member of the union, unless
> the latter is a char or unsigned char type.

Correct.

> What I would like to know is whether C does or ever
> did make this "defined" behaviour?

No.

> In other words, was this idiom ever correct, or is it
> just a habit programmers have gotten into because it
> "always seemed to work."
> MV

The behavior is undefined in C if you access a member of a union by a different type than the one you used to store it, unless that access is by the type unsigned char.

Note that even though neither standard specifically states it, access via a signed char type, or via plain char if plain char happens to be signed, might still result in undefined behavior, because both languages allow trap representations in signed character types.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++ ftp://snurse-l.org/pub/acllc-c++/faq
--
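[Editor's sketch, not part of the original post.] One way to reinterpret a value as another type without reading a different union member is to copy its bytes with std::memcpy. That technique is not mentioned in the thread; it is swapped in here as an illustration. The function names float_bits and bits_to_float are hypothetical, and the sketch assumes float and std::uint32_t have the same size (checked at compile time, which needs C++11).

```cpp
#include <cstdint>
#include <cstring>

// Assumption: float occupies exactly 32 bits on this platform.
static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Hypothetical helper: returns the object representation of a float
// as an unsigned 32-bit integer, without going through a union.
std::uint32_t float_bits(float value)
{
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Hypothetical helper: the inverse copy, from bits back to a float.
float bits_to_float(std::uint32_t bits)
{
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

The bit pattern returned is still platform-specific; only the act of obtaining it is moved onto a byte copy.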
|
Mon, 06 Sep 2004 14:32:46 GMT |
|
 |
James Kan #8 / 28
|
 Aliasing through union, C++ vs. C
> It is my understanding that the C++ Standard makes it "undefined
> behaviour" to store one type in a member of a union, and then to
> refer to the contents of the union through a different member of the
> union, unless the latter is a char or unsigned char type.
> What I would like to know is whether C does or ever did make this
> "defined" behaviour?

It probably depends on what you define as C. No ISO standard has ever permitted it. On the other hand, K&R 1 don't even have the concept of "undefined behavior", and in the early days, it was one of the accepted ways of type punning.

> In other words, was this idiom ever correct, or is it just a habit
> programmers have gotten into because it "always seemed to work."

Given the amount of existing code which depends on it, it's a pretty good bet that no implementation will dare break it.

--
Beratung in objektorientierter Datenverarbeitung
Conseils en informatique orientée objet
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany, Tél.: +49 (0)69 19 86 27
--
|
Mon, 06 Sep 2004 14:34:53 GMT |
|
 |
Lothar Schumache #9 / 28
|
 Aliasing through union, C++ vs. C
> It is my understanding that the C++ Standard makes
> it "undefined behaviour" to store one type in a member
> of a union, and then to refer to the contents of the
> union through a different member of the union, unless
> the latter is a char or unsigned char type.
> What I would like to know is whether C does or ever
> did make this "defined" behaviour?
> In other words, was this idiom ever correct, or is it
> just a habit programmers have gotten into because it
> "always seemed to work."

From the rather up-to-date "C: A Reference Manual" (http://www.careferencemanual.com/), p.165:

"5.7.4 (Mis)using Union Types
Unions are used in a nonportable fashion any time a union component is referenced when the last assignment to the union was not through the same component."

So there is no difference between C and C++ in this point.
--
--
|
Mon, 06 Sep 2004 14:34:58 GMT |
|
 |
Andrea Mur #10 / 28
|
 Aliasing through union, C++ vs. C
Does even this code cause undefined behaviour?

    union Union
    {
        int* a_pointer;
        const int* a_const_pointer;
    };

    void AFunction(const int*);
    //...
    int x;
    Union an_union;
    an_union.a_pointer = &x; // Assign a_pointer
    AFunction(an_union.a_const_pointer); // Uses a_const_pointer
--
|
Tue, 07 Sep 2004 12:11:03 GMT |
|
 |
Kenneth Brod #11 / 28
|
 Aliasing through union, C++ vs. C
> > It is my understanding that the C++ Standard makes
> > it "undefined behaviour" to store one type in a member
> > of a union, and then to refer to the contents of the
> > union through a different member of the union, unless
> > the latter is a char or unsigned char type.
[...]
> > In other words, was this idiom ever correct, or is it
> > just a habit programmers have gotten into because it
> > "always seemed to work."
> It is a useful trick for platform-dependant code. As soon
> as you move to a different platform, you'll have to revisit
> the code to allow for the issues mentioned above.

On a similar note...

Is the following "legal"?

    union foobar
    {
        long foo;
        unsigned char bar[sizeof(long)];
    };

Can you then legally (ie: with defined behavior) access the bytes of "long foo" via bar[] ? If you copied the bytes out of bar[] and then later copied them back, are you guaranteed to have the same value in foo as before?

(Yes, I know that the actual data in the bytes is system-dependent.)

--
+---------+----------------------------------+-----------------------------+
| Kenneth | kenbrody at spamcop.net          | "The opinions expressed     |
| J.      |                                  |  herein are not necessarily |
| Brody   | http://www.hvcomputer.com        |  those of fP Technologies." |
+---------+----------------------------------+-----------------------------+
GCS (ver 3.12) d- s+++: a C++$(+++) ULAVHSC^++++$ P+>+++ L+(++) E-(---)
DI+(++++) D---() G e* h---- r+++ y?
--
|
Tue, 07 Sep 2004 12:12:32 GMT |
|
 |
Branimir Maksimovi #12 / 28
|
 Aliasing through union, C++ vs. C
> It is my understanding that the C++ Standard makes
> it "undefined behaviour" to store one type in a member
> of a union, and then to refer to the contents of the
> union through a different member of the union, unless
> the latter is a char or unsigned char type.
>
> What I would like to know is whether C does or ever
> did make this "defined" behaviour?
>
> In other words, was this idiom ever correct, or is it
> just a habit programmers have gotten into because it
> "always seemed to work."

Well, it seems to work and is very useful (for low level programming), eg:

    #include <iostream>
    using namespace std;

    union IER {
        unsigned char data;
        struct {
            unsigned data_avail_interrupt:1; // 1 enable, 0 disable
            unsigned THRE_interrupt:1;       // 1 enable, 0 disable
            unsigned line_status_report:1;   // 1 enable, 0 disable
            unsigned modem_status_change:1;  // 1 enable, 0 disable
            unsigned reserved:4;             // always 0
        };
        IER(unsigned da=0, unsigned thre=0, unsigned lsr=0, unsigned msc=0)
            : data_avail_interrupt(da), THRE_interrupt(thre),
              line_status_report(lsr), modem_status_change(msc), reserved(0)
        { }
    };

    int main()
    {
        IER p(1,0,1);
        cout<<hex<<(int)p.data<<'\n';
    }

Greetings, Bane.
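[Editor's sketch, not part of the original post.] The same register byte can be assembled with explicit masks instead of overlaying bit-fields on a byte, which avoids relying on how the compiler lays out bit-fields. The constant names, the function make_ier, and the bit positions (first flag in bit 0, and so on) are assumptions for illustration, not something the post states.

```cpp
// Assumed bit positions: the post does not say which bit each flag uses,
// so bit 0 for the first flag is a guess made for this sketch.
const unsigned char IER_DATA_AVAIL   = 1u << 0;
const unsigned char IER_THRE         = 1u << 1;
const unsigned char IER_LINE_STATUS  = 1u << 2;
const unsigned char IER_MODEM_STATUS = 1u << 3;

// Hypothetical helper: builds the register value from four enable flags.
// The upper four (reserved) bits are always left at zero.
unsigned char make_ier(bool da, bool thre, bool lsr, bool msc)
{
    unsigned char value = 0;
    if (da)   value |= IER_DATA_AVAIL;
    if (thre) value |= IER_THRE;
    if (lsr)  value |= IER_LINE_STATUS;
    if (msc)  value |= IER_MODEM_STATUS;
    return value;
}
```

Under these assumed positions, make_ier(true, false, true, false) yields 0x05 regardless of the compiler's bit-field ordering.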
|
Tue, 07 Sep 2004 12:12:34 GMT |
|
 |
Geoff Fiel #13 / 28
|
 Aliasing through union, C++ vs. C
[snip]

> On a similar note...
> Is the following "legal"?
> union foobar
> {
> long foo;
> unsigned char bar[sizeof(long)];
> };
> Can you then legally (ie: with defined behavior) access the bytes of
> "long foo" via bar[] ?

Yes, but as you note below it's *highly* system-dependent.

> If you copied the bytes out of bar[] and then
> later copied them back, are you guaranteed to have the same value in
> foo as before?

I don't know about "guaranteed", but it's highly likely that you will on most platforms.

> (Yes, I know that the actual data in the bytes is system-dependent.)

Extremely.

Geoff
--
Geoff Field, Professional geek, amateur stage-levelling gauge.
My band's web page: http://www.geocities.com/southernarea/
--
|
Tue, 07 Sep 2004 22:18:09 GMT |
|
 |
Anthony William #14 / 28
|
 Aliasing through union, C++ vs. C
> > > It is my understanding that the C++ Standard makes
> > > it "undefined behaviour" to store one type in a member
> > > of a union, and then to refer to the contents of the
> > > union through a different member of the union, unless
> > > the latter is a char or unsigned char type.
> Is the following "legal"?
> union foobar
> {
> long foo;
> unsigned char bar[sizeof(long)];
> };
> Can you then legally (ie: with defined behavior) access the bytes of
> "long foo" via bar[] ? If you copied the bytes out of bar[] and then
> later copied them back, are you guaranteed to have the same value in
> foo as before?

Yes. This is the special case --- it is always possible to access union data through an unsigned char or array of unsigned char member. Since only PODs can be union members, and copying the memory occupied by a POD away and back again preserves its value, you are guaranteed to have the same value in foo as before.

Anthony
--
Anthony Williams
Software Engineer, Nortel Networks Optical Components Ltd
The opinions expressed in this message are not necessarily those of my employer
--
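[Editor's sketch, not part of the original post.] The "copy away and back again" round trip described above can be illustrated with a separate unsigned char buffer and std::memcpy rather than a union member. The function name round_trip is hypothetical, and using memcpy in place of the union is a substitution made for this sketch.

```cpp
#include <cstring>

// Hypothetical helper: copies the bytes of a long into an unsigned char
// buffer and back again, then returns the restored value.
long round_trip(long original)
{
    unsigned char saved[sizeof(long)];
    std::memcpy(saved, &original, sizeof original);   // copy the bytes away

    long restored = 0;
    std::memcpy(&restored, saved, sizeof restored);   // copy them back
    return restored;
}
```

Because the bytes are copied unchanged, the restored object should hold the value that was saved, which is the guarantee the post attributes to POD types.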
|
Tue, 07 Sep 2004 22:19:02 GMT |
|
 |
Hans-Bernhard Broeke #15 / 28
|
 Aliasing through union, C++ vs. C
[...]

> Is the following "legal"?
> union foobar
> {
> long foo;
> unsigned char bar[sizeof(long)];
> };

Yes, it's a well-defined type of type definition.

> Can you then legally (ie: with defined behavior) access the bytes of
> "long foo" via bar[] ?

Yes. But the result you get is implementation-defined. There may be garbage bits in bar[] that you wouldn't have been able to see in foo.

> If you copied the bytes out of bar[] and then later copied them
> back, are you guaranteed to have the same value in foo as before?

AFAIK: no. Because the moment you read foo after having written to bar[], you're causing undefined behaviour. Your machine may rightfully jump into your face the instant you do that.
--
Even if all the snow were burnt, ashes would remain. --
|
Tue, 07 Sep 2004 22:19:13 GMT |
|
|