Aliasing through union, C++ vs. C 

It is my understanding that the C++ Standard makes
it "undefined behaviour" to store one type in a member
of a union, and then to refer to the contents of the
union through a different member of the union, unless
the latter is a char or unsigned char type.

What I would like to know is whether C does or ever
did make this "defined" behaviour?

In other words, was this idiom ever correct, or is it
just a habit programmers have gotten into because it
"always seemed to work."

MV

--



Sun, 05 Sep 2004 01:39:03 GMT  

Quote:
> What I would like to know is whether C does or ever
> did make this "defined" behaviour?

No.  It's just as undefined in C as it is in C++.

Quote:
> In other words, was this idiom ever correct, or is it just a habit
> programmers have gotten into because it "always seemed to work."

The latter.
--

Even if all the snow were burnt, ashes would remain.
--



Mon, 06 Sep 2004 14:30:57 GMT  

Quote:
> It is my understanding that the C++ Standard makes
> it "undefined behaviour" to store one type in a member
> of a union, and then to refer to the contents of the
> union through a different member of the union, unless
> the latter is a char or unsigned char type.

> What I would like to know is whether C does or ever
> did make this "defined" behaviour?

For all practical purposes, it's always been undefined behavior.

The possible exception is that if you look early enough in C's
development, back when there was only one C compiler on earth, you
could argue that "C" was defined as whatever that compiler accepted,
and pretty much anything you could get away with using that compiler
was "defined" behavior.  OTOH, this was widely recognized as a dirty
trick long before the C standard came along and officially said it
was undefined.

--
    Later,
    Jerry.

The Universe is a figment of its own imagination.
--



Mon, 06 Sep 2004 14:31:30 GMT  

Quote:
>It is my understanding that the C++ Standard makes
>it "undefined behaviour" to store one type in a member
>of a union, and then to refer to the contents of the
>union through a different member of the union, unless
>the latter is a char or unsigned char type.

>What I would like to know is whether C does or ever
>did make this "defined" behaviour?

>In other words, was this idiom ever correct, or is it
>just a habit programmers have gotten into because it
>"always seemed to work."

Section 6.5.16.1 (simple assignment), paragraph 3, of C99; C90 has the
same rule in 6.3.16.1:

"If the value being stored in an object is accessed from another
object that overlaps in any way the storage of the first object, then
the overlap shall be exact and the two objects shall have qualified or
unqualified versions of a compatible type; otherwise, the behavior is
undefined."

--



Mon, 06 Sep 2004 14:31:47 GMT  

Quote:
> It is my understanding that the C++ Standard makes
> it "undefined behaviour" to store one type in a member
> of a union, and then to refer to the contents of the
> union through a different member of the union, unless
> the latter is a char or unsigned char type.

"Undefined behaviour" in this context means that the
results quite simply depend on the implementation.
In other words, it depends on the endianness of the
platform, alignment issues and padding, to name but
a few things that differ from system to system.

Even something like an unqualified char type may have
different behaviours on different platforms.

Regardless, a pointer to the first byte of any member
of the union will always point to the first byte of all
other members of the union.  Whether this yields
useful information is another matter that is very
dependent on how the compiler and platform implement
each data type.

Quote:
> What I would like to know is whether C does or ever
> did make this "defined" behaviour?

No.  It has never been fully "defined" behaviour to the
best of my knowledge.

Quote:
> In other words, was this idiom ever correct, or is it
> just a habit programmers have gotten into because it
> "always seemed to work."

It is a useful trick for platform-dependent code.  As soon
as you move to a different platform, you'll have to revisit
the code to allow for the issues mentioned above.

Geoff

--
Geoff Field,    Professional geek, amateur stage-levelling gauge.


My band's web page: http://www.geocities.com/southernarea/
--



Mon, 06 Sep 2004 14:32:16 GMT  

Quote:

> It is my understanding that the C++ Standard makes
> it "undefined behaviour" to store one type in a member
> of a union, and then to refer to the contents of the
> union through a different member of the union, unless
> the latter is a char or unsigned char type.
> What I would like to know is whether C does or ever
> did make this "defined" behaviour?

Actually C++ followed C's lead.  Union members *overlap*,
so without platform-specific restrictions on representation
for the various types, reading a value as a different type
than was stored should be expected to yield nonsense at
best and an exception (trap) at worst, which is *why* it
was put in the "undefined behavior" category.

An exception is made for accessing as "raw bytes", which
in C99 means as (array of) unsigned char type; any object
storage with any contents can be safely accessed as raw
bytes.
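
A minimal sketch of the raw-byte access described above, written in C++.
The helper name bytes_hex is hypothetical, and the sketch assumes 8-bit
bytes; the digits it produces depend on the platform's representation
and byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: hex dump of the n bytes at obj, lowest address first.
// Assumes 8-bit bytes, so each byte prints as exactly two hex digits.
std::string bytes_hex(const void* obj, std::size_t n) {
    assert(obj != nullptr);
    static const char digits[] = "0123456789abcdef";
    // Any object's storage may be read through unsigned char.
    const unsigned char* p = static_cast<const unsigned char*>(obj);
    std::string out;
    for (std::size_t i = 0; i < n; ++i) {
        out += digits[(p[i] >> 4) & 0x0F];
        out += digits[p[i] & 0x0F];
    }
    return out;
}
```

For a long v, bytes_hex(&v, sizeof v) gives the bytes of v in address
order, which will differ between little-endian and big-endian machines.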

Quote:
> In other words, was this idiom ever correct, or is it
> just a habit programmers have gotten into because it
> "always seemed to work."

I was unaware that it is in wide use.  The canonical
example was:
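        union u {
                long l;
                short s[2];
        } x;
        short lo_word;          /* the short at the lowest address */
        int big_endian;
        // ...
        x.l = 0x11110000;       /* assumes 32-bit long, 16-bit short */
        lo_word = x.s[0];       /* reads a member other than the one stored */
        big_endian = lo_word != 0;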
which is not perfectly portable, but worked on enough
platforms that one encountered it occasionally.  By now
one hopes that such code has been changed to be portable.
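
One way such a check might be written without reading a different union
member is to copy the object representation into unsigned char storage.
This is a sketch, not the code the post refers to: is_big_endian is a
hypothetical name, and mixed-endian layouts and padding bits are ignored.

```cpp
#include <cstring>

// Returns true when the least significant byte of an unsigned int is NOT
// stored at the lowest address. Assumes sizeof(unsigned int) > 1.
bool is_big_endian() {
    const unsigned int probe = 1u;
    unsigned char bytes[sizeof probe];
    std::memcpy(bytes, &probe, sizeof probe);  // copy out the raw bytes
    return bytes[0] == 0;
}
```

The result is computed at run time from the actual storage layout, so
the same source gives the right answer on either kind of machine.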
--



Mon, 06 Sep 2004 14:32:44 GMT  

comp.lang.c.moderated:

Quote:
> It is my understanding that the C++ Standard makes
> it "undefined behaviour" to store one type in a member
> of a union, and then to refer to the contents of the
> union through a different member of the union, unless
> the latter is a char or unsigned char type.

Correct.

Quote:
> What I would like to know is whether C does or ever
> did make this "defined" behaviour?

No.

Quote:
> In other words, was this idiom ever correct, or is it
> just a habit programmers have gotten into because it
> "always seemed to work."

> MV

The behavior is undefined in C if you access a member of a union by a
different type than the one you used to store it, unless that access
is by the type unsigned char.  Note that even though neither standard
specifically states it, access via a signed char type, or via plain
char if plain char happens to be signed, might still result in
undefined behavior, because both languages allow trap representations
in signed character types.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++ ftp://snurse-l.org/pub/acllc-c++/faq
--



Mon, 06 Sep 2004 14:32:46 GMT  

Quote:
> It is my understanding that the C++ Standard makes it "undefined
> behaviour" to store one type in a member of a union, and then to
> refer to the contents of the union through a different member of the
> union, unless the latter is a char or unsigned char type.
> What I would like to know is whether C does or ever did make this
> "defined" behaviour?

It probably depends on what you define as C.  No ISO standard has ever
permitted it.  On the other hand, K&R 1 doesn't even have the concept of
"undefined behavior", and in the early days, it was one of the
accepted ways of type punning.

Quote:
> In other words, was this idiom ever correct, or is it just a habit
> programmers have gotten into because it "always seemed to work."

Given the amount of existing code which depends on it, it's a pretty
good bet that no implementation will dare break it.
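
For comparison, a way to reinterpret a value's bits that does not rely
on reading a different union member is to copy the bytes with memcpy.
This is a substitute technique, not the union idiom discussed above;
float_bits is a hypothetical name, and the sketch assumes float and
std::uint32_t have the same size.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: the bit pattern of a float as a 32-bit unsigned value.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // copy the object representation
    return bits;
}
```

On an implementation that uses IEEE 754 single precision,
float_bits(1.0f) yields 0x3F800000.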

--

Beratung in objektorientierter Datenverarbeitung --
                             -- Conseils en informatique orientée objet
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany, Tél.: +49 (0)69 19 86 27
--



Mon, 06 Sep 2004 14:34:53 GMT  

Quote:

> It is my understanding that the C++ Standard makes
> it "undefined behaviour" to store one type in a member
> of a union, and then to refer to the contents of the
> union through a different member of the union, unless
> the latter is a char or unsigned char type.

> What I would like to know is whether C does or ever
> did make this "defined" behaviour?

> In other words, was this idiom ever correct, or is it
> just a habit programmers have gotten into because it
> "always seemed to work."

From the rather up-to-date "C: A Reference Manual"
(http://www.careferencemanual.com/), p.165:

"5.7.4 (Mis)using Union Types
Unions are used in a nonportable fashion any time a union component is
referenced when the last assignment to the union was not through the
same component."

So there is no difference between C and C++ in this point.

--

--



Mon, 06 Sep 2004 14:34:58 GMT  
Does even this code have undefined behaviour?

union Union
{
  int* a_pointer;
  const int* a_const_pointer;
};

void AFunction(const int*);

//...

int x;
Union an_union;

an_union.a_pointer = &x;              // Assign a_pointer
AFunction(an_union.a_const_pointer);  // Uses a_const_pointer
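
For this particular case the union is not needed, because int* converts
implicitly to const int*. A small sketch, in which ReadValue is a
hypothetical stand-in for AFunction and call_without_union is likewise
an invented name:

```cpp
// Hypothetical stand-in for AFunction: reads the int it is pointed at.
int ReadValue(const int* p) { return *p; }

int call_without_union() {
    int x = 42;
    int* a_pointer = &x;
    // Adding const is an implicit conversion; no cast or union is required.
    const int* a_const_pointer = a_pointer;
    return ReadValue(a_const_pointer);
}
```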
--



Tue, 07 Sep 2004 12:11:03 GMT  

Quote:



> > It is my understanding that the C++ Standard makes
> > it "undefined behaviour" to store one type in a member
> > of a union, and then to refer to the contents of the
> > union through a different member of the union, unless
> > the latter is a char or unsigned char type.
[...]
> > In other words, was this idiom ever correct, or is it
> > just a habit programmers have gotten into because it
> > "always seemed to work."

> It is a useful trick for platform-dependent code.  As soon
> as you move to a different platform, you'll have to revisit
> the code to allow for the issues mentioned above.

On a similar note...

Is the following "legal"?

    union foobar
        {
        long foo;
        unsigned char bar[sizeof(long)];
        };

Can you then legally (ie: with defined behavior) access the bytes of
"long foo" via bar[] ?  If you copied the bytes out of bar[] and then
later copied them back, are you guaranteed to have the same value in
foo as before?

(Yes, I know that the actual data in the bytes is system-dependent.)

--

+---------+----------------------------------+-----------------------------+
| Kenneth |     kenbrody at spamcop.net      | "The opinions expressed     |
|    J.   |                                  |  herein are not necessarily |
|  Brody  |    http://www.hvcomputer.com     |  those of fP Technologies." |
+---------+----------------------------------+-----------------------------+
GCS (ver 3.12) d- s+++: a C++$(+++) ULAVHSC^++++$ P+>+++ L+(++) E-(---)

    DI+(++++) D---() G e* h---- r+++ y?
--



Tue, 07 Sep 2004 12:12:32 GMT  

 > It is my understanding that the C++ Standard makes
 > it "undefined behaviour" to store one type in a member
 > of a union, and then to refer to the contents of the
 > union through a different member of the union, unless
 > the latter is a char or unsigned char type.
 >
 > What I would like to know is whether C does or ever
 > did make this "defined" behaviour?
 >
 > In other words, was this idiom ever correct, or is it
 > just a habit programmers have gotten into because it
 > "always seemed to work."

Well, it seems to work and is very useful (for low-level programming),
e.g.:

#include <iostream>
using namespace std;

union IER{
         unsigned char data;
         // Named struct: anonymous structs are a compiler extension in C++.
         struct Bits{
                  unsigned data_avail_interrupt:1; // 1 enable, 0 disable
                  unsigned THRE_interrupt:1; // 1 enable, 0 disable
                  unsigned line_status_report:1; // 1 enable, 0 disable
                  unsigned modem_status_change:1;// 1 enable, 0 disable
                  unsigned reserved:4; // always 0
         } bits;
         IER(unsigned da=0, unsigned thre=0, unsigned lsr=0, unsigned msc=0)
         {
                  bits.data_avail_interrupt = da;
                  bits.THRE_interrupt = thre;
                  bits.line_status_report = lsr;
                  bits.modem_status_change = msc;
                  bits.reserved = 0;
         }
};

int main()
{
         IER p(1,0,1);
         // Reads data after bits was written: the aliasing under discussion.
         // Bit-field layout is implementation-defined, so the output can vary.
         cout<<hex<<(int)p.data<<'\n';
}
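
A sketch of the same register byte built with masks instead of
bit-fields, which avoids depending on how the compiler lays out
bit-fields or on reading a different union member. The constant and
function names are hypothetical, and the bit positions assume the first
field in the struct above is meant to be bit 0.

```cpp
// Hypothetical mask names; bit 0 is assumed to be the first field above.
const unsigned char kDataAvailInterrupt = 0x01;
const unsigned char kThreInterrupt      = 0x02;
const unsigned char kLineStatusReport   = 0x04;
const unsigned char kModemStatusChange  = 0x08;

// Builds the register value; the upper four (reserved) bits stay 0.
unsigned char make_ier(bool da, bool thre, bool lsr, bool msc) {
    unsigned char value = 0;
    if (da)   value |= kDataAvailInterrupt;
    if (thre) value |= kThreInterrupt;
    if (lsr)  value |= kLineStatusReport;
    if (msc)  value |= kModemStatusChange;
    return value;
}
```

make_ier(true, false, true, false) returns 0x05 regardless of the
compiler's bit-field layout.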

Greetings, Bane.


--



Tue, 07 Sep 2004 12:12:34 GMT  

[snip]

Quote:
> On a similar note...

> Is the following "legal"?

>     union foobar
>         {
>         long foo;
>         unsigned char bar[sizeof(long)];
>         };

> Can you then legally (ie: with defined behavior) access the bytes of
> "long foo" via bar[] ?

Yes, but as you note below it's *highly* system-dependent.

Quote:
>  If you copied the bytes out of bar[] and then
> later copied them back, are you guaranteed to have the same value in
> foo as before?

I don't know about "guaranteed", but it's highly likely that you will on
most platforms.

Quote:
> (Yes, I know that the actual data in the bytes is system-dependent.)

Extremely.

Geoff

--
Geoff Field,    Professional geek, amateur stage-levelling gauge.


My band's web page: http://www.geocities.com/southernarea/
--



Tue, 07 Sep 2004 22:18:09 GMT  

Quote:



> > > It is my understanding that the C++ Standard makes
> > > it "undefined behaviour" to store one type in a member
> > > of a union, and then to refer to the contents of the
> > > union through a different member of the union, unless
> > > the latter is a char or unsigned char type.
> Is the following "legal"?

>     union foobar
>         {
>         long foo;
>         unsigned char bar[sizeof(long)];
>         };

> Can you then legally (ie: with defined behavior) access the bytes of
> "long foo" via bar[] ?  If you copied the bytes out of bar[] and then
> later copied them back, are you guaranteed to have the same value in
> foo as before?

Yes. This is the special case --- it is always possible to access union data
through an unsigned char or array of unsigned char member. Since only PODs
can be union members, and copying the memory occupied by a POD away and back
again preserves its value, you are guaranteed to have the same value in foo
as before.
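
The copy-out, copy-back round trip can be sketched with memcpy and a
separate unsigned char buffer rather than the union member. The name
round_trip is hypothetical.

```cpp
#include <cstring>

// Copies the bytes of value into an unsigned char buffer and back again.
long round_trip(long value) {
    unsigned char bytes[sizeof(long)];
    std::memcpy(bytes, &value, sizeof value);        // copy the bytes out
    long restored;
    std::memcpy(&restored, bytes, sizeof restored);  // copy them back
    return restored;
}
```

The bytes held in the buffer are system-dependent, but the value that
comes back is the one that went in.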

Anthony
--
Anthony Williams
Software Engineer, Nortel Networks Optical Components Ltd
The opinions expressed in this message are not necessarily those of my
employer
--



Tue, 07 Sep 2004 22:19:02 GMT  

[...]

Quote:
> Is the following "legal"?
>     union foobar
>         {
>         long foo;
>         unsigned char bar[sizeof(long)];
>         };

Yes, the type definition itself is well-defined.

Quote:
> Can you then legally (ie: with defined behavior) access the bytes of
> "long foo" via bar[] ?  

Yes.  But the result you get is implementation-defined.  There may be
garbage bits in bar[] that you wouldn't have been able to see in foo.

Quote:
> If you copied the bytes out of bar[] and then later copied them
> back, are you guaranteed to have the same value in foo as before?

AFAIK: no.  Because the moment you read foo after having written to
bar[], you're causing undefined behaviour.  Your machine may
rightfully jump into your face the instant you do that.

--

Even if all the snow were burnt, ashes would remain.
--


