--- A question about struct --- 

Hi all:

I am trying to re-learn 'C' the right way and I have some questions
regarding the implementation of variables and structures etc.

I sort of know that when the compiler begins to analyze source code,
it creates a list of all unique identifiers or tags that it
finds. All these tags are then analyzed and associated with addresses
internally. The address allocation for each variable depends on the
size of the variable. Therefore:

long x;
long y;

means the symbol "x" is associated with some address, say 1000.

The next symbol "y" would have the address 1004, and so on..

Whenever the compiler sees the symbol "x" in an expression,
it emits code to read from the address associated with x
or write to the address associated with x. ( I'll use the
convention that [someaddress] means to read or write to the address
"someaddress", i.e. [20] means read or write (depending on context) address 20. )

so in my source code:

x = 10  means [1000] <- 10
and
y = x means   [1004]  <- [1000]

From what I gather, the important thing to remember is that x, y etc.
are actually addresses ( inside a compiler's table ) and can therefore be
considered as "first-level" pointers ( or hidden pointers: whenever we
use x, y etc. in an expression, the dereferencing is done automatically by
the compiler ).

Also, suppose we declare:

long *z;
z = &x;

Say the compiler stores z in address 1008, then x's address is fetched from
the compiler's internal table and then assigned to z. ( at address 1008 )

i.e [1008] <- 1000 , ( x's address is 1000 )

*z:

*z can be considered a "second-level" pointer, because whenever the compiler
sees *z in an expression, it emits code to do the following:

   [   [1008]  ]  , i.e read/write to the address (the value) at
                    address 1008.

In this case,   [ [1008] ] => [ 1000 ], because [1008] = the address of
                                                x = 1000

Now we come to arrays and structures:

Suppose I have an array:

long myarray[10];

When the compiler sees myarray[i] in an expression, it emits code to read/write
to the address associated with "myarray" in the compiler table ( say 1100 )
plus an offset of ( i * size of an array item in bytes ). Therefore

myarray[5]  means    [ 1100 + 5 * 4 ] or [ 1120 ]

i.e read/write ( depending on the context ) at the address 1120.

This is similar to x in an expression meaning read/write [1000] where the
address of x is 1000. ( The only difference is that there is obviously
no scaling in this case ). HOWEVER:

myarray, by itself, *IS* the address associated with myarray in the compiler
table. That is, myarray ( by itself ) in an expression is replaced with
1100.

i.e

long *ptr; suppose the compiler stores ptr at address 1200

ptr = myarray means  [1200] <- 1100

This is strange. If I use x, y etc in an expression, the compiler
emits code to [1000], [1004] etc where x is located at 1000, y at 1004.

But if I use myarray in an expression, the compiler Does Not emit code
to [1100], where myarray[i] is located at 1100.

To actually mean [1100] I have to say myarray[0]

To summarize: myarray = &myarray[0]. This is not that weird when I
think about it. K & R could have defined "myarray" to mean myarray[0]
or [1100] in this case, as opposed to meaning 1100 ( the address itself ).
But they chose not to, I'm guessing, to make the syntax for getting array
addresses easier.

This is what K & R mean by saying a[i] is implemented as
* ( a + i ) i.e *(1100 + i) or [1100 + i].

if we have,   long *ptr  ( and ptr is located at 1200  by the compiler)

ptr = a means
[1200]  <-  1100

then ptr[i] means:

*( ptr + i )  means * ( [ptr] + i )
              means  [  [ptr] + i ]
              means  [  [1200]  + i ]
              means  [  1100 + i ], same

as a[i] in the end result but notice that extra indirection.

Structures.

This is what is causing me endless confusion. If you have read so far then
please help me out here !  :

Structures are conceptually the same as arrays. They are a collection
of things in memory. The only difference is that the things can be
of different size. ( in arrays, all things in a given array are of the
same size ).

So the implementation, syntax of structures etc should have been
the same as that of arrays. Yet this is not the case.

Suppose I have a structure defined as:

struct foo
{
   long m;
   long n;
} mystruct;

        ( Also suppose the compiler stores mystruct at address 1300 )

Then "mystruct" by itself does not mean the address associated with
mystruct.

i.e if I have

struct foo *ptr;

then I CANNOT say  ptr = mystruct. ( I could in the case of arrays. )
Why is this the case? Why does "mystruct" not mean the address associated
with mystruct ? Why were structs and arrays implemented differently ? (
They are conceptually the SAME ).

Also if I do say:
ptr = &mystruct

then I would expect *ptr to give me the value of the first element
in the structure i.e  [1300].

And it does ! ( on one compiler )
and it Doesn't ! ( on another compiler ).

So if mystruct.m = 10 then

*ptr gives me 10 ( Value of first element of struct ) on the first compiler
*ptr gives me an ADDRESS ( &mystruct + 30 = 1330 ) on the second compiler.

Both compilers are ANSI C! The first compiler ( macintosh codewarrior )
makes sense i.e if pointer contains address then *ptr gives me value
at that address. )

The second compiler ( gcc on UltraSPARC ) makes no sense at all i.e
ptr contains an address but *ptr gives me *another* address in memory.

Also if I declare 2 such structures, I would expect *( ptr + 1 ) to give
me the value of the first element in the second instance of the structure
but in this case, both compilers give me an address for *( ptr + 1 ).

Also the syntax (*ptr).m to access element m in the structure makes no
sense. Remember, ptr = &mystruct.

Therefore *ptr should be the VALUE of the first element of the structure
and not the address. (*ptr).m implies that (*ptr) is an address even
though ptr contains an address itself....

Can someone please explain ?

Thanks for your help........

Hursh

-----------------------------------------------------------------------

P.S 2: I was going through the book "Deep C secrets" by Mr. Peter Van
Der Linden. It is excellent. I also saw his name on a posting recently
so I just wanted to say, great book! When is the Macintosh and PowerPC
specific chapter coming out though?



Sat, 19 Jun 1999 03:00:00 GMT  

Quote:
>Can someone please explain ?

The big problem is trying to understand and explain C in terms of
specific implementations.  High level languages are not defined that
way, in fact you *will* see a lot of things that act differently on
different compilers and systems, precisely because implementation is
not defined.  You can even find some C systems that do not use
addresses the way you described, so that "x" does *not* refer to an
address that holds the value of that variable (in particular,
implementations that aren't compilers, or compilers for non-von Neumann
architectures, say a Lisp machine version of C).

So therefore, "mystruct" is a *variable*, and should be thought of
that way, it is not an address.  It is a single unit that has several
components.  The components are referred to via "mystruct.m", but the
entire unit is referred to as "mystruct".

With an array, the fact that "myarray" when passed as an argument to a
function actually passes "&myarray[0]" is a convenience for programming
only.

Arrays and structs don't act differently, except when you peer at the
implementation and assume that structs are like arrays!  In C, an
array and a struct are not the same, despite what assembler code gets
generated by the compiler.  Just because both do "address + offset" is
irrelevant.

The way to think of it is, an array is a collection of variables of
the same type.  A struct is a single variable.  Thus, when you pass
"mystruct" to a function, the entire variable is passed, just like a
variable of type int or char or whatever (early C compilers often
didn't support passing or returning structures by the way).  An array
however, is a different matter - when you pass it, it is inconvenient
to pass the entire array (inefficient for one, and you don't know the
size of the array either).  Thus, when you pass "myarray", C conveniently
treats it as if you passed a pointer to the first element instead.
It's not inconsistent, because this is only done for collections, and
a struct is not a collection.

--
Darin Johnson



Mon, 21 Jun 1999 03:00:00 GMT  

Oops, should add a little to clarify later questions.

Quote:
>Suppose I have a structure defined as:

>struct foo
>{
>   long m;
>   long n;
>} mystruct;
>struct foo *ptr;
>then I CANNOT say  ptr = mystruct. ( I could in the case of arrays. )
>Why is this the case? Why does "mystruct" not mean the address associated
>with mystruct ? Why were structs and arrays implemented differently ? (
>They are conceptually the SAME ).

Because "mystruct" is of type "struct foo", and ptr is of type "struct foo *".
Thus they can't be assigned to each other.

Mystruct does not mean the address, because C doesn't use addresses,
only implementations of C use addresses.  Yes, implementations may
have both ptr and mystruct generate the same address in some
instances, but that address is a number without a context!  The
context is tossed out in the process of generating assembler, but
C won't let you ignore the context in the high level code.

And structs and arrays are conceptually very very different.

Quote:
>Also if I do say:
>ptr = &mystruct

>then I would expect *ptr to give me the value of the first element
>in the structure i.e  [1300].

>And it does ! ( on one compiler )
>and it Doesn't ! ( on another compiler ).

Nope, that's not what it should do!  "*ptr" will give you "mystruct",
it does not give you the first component of mystruct.  Remember,
"*ptr" has type "struct foo", and the first component of "mystruct"
is not that type.

Quote:
>*ptr gives me 10 ( Value of first element of struct ) on the first compiler
>*ptr gives me an ADDRESS ( &mystruct + 30 = 1330 ) on the second compiler.

Erk, BOTH are wrong - but I suspect you're giving this all out of context.
Show the exact code.  *ptr should give mystruct.  But if you printed it out
as:

  printf("%ld\n", *ptr);

then the answer you get depends upon the implementation!  Thus, you
can get either 10 or an address, depending upon the compiler.  This is
I suspect what you actually did.  What is happening is that on most
machines, this will put "mystruct" on the stack (how this is done is
irrelevant to C), and then printf looks at the top of stack assuming
it is an integer.  But the top of stack is not an integer at all!

Quote:
>Also if I declare 2 such structures, I would expect *( ptr + 1 ) to give
>me the value of the first element in the second instance of the structure
>but in this case, both compilers give me an address for *( ptr + 1 ).

This is somewhat confusing here.  If you say *(ptr+1), then you are
doing pointer arithmetic, as if ptr pointed to an ARRAY of structures.
But earlier you just said "ptr = &mystruct".  And that's not an array
of structures...  And I suspect, you have the same problem as before
if you try to printf things.

Quote:
>Also the syntax (*ptr).m to access element m in the structure makes no
>sense. Remember, ptr = &mystruct.

Ok, since "ptr = &mystruct", this means that *ptr refers to mystruct.
And "(*ptr).m" refers to the same thing as "(mystruct).m".

In other words, the first thing to do when evaluating this expression
is to evaluate what is inside the parentheses first.  Inside the
parentheses is "*ptr".  And "*ptr" has the type "struct foo" and refers
to "mystruct".

Quote:
>Therefore *ptr should be the VALUE of the first element of the structure
>and not the address. (*ptr).m implies that (*ptr) is an address even
>though ptr contains an address itself....

"*ptr" is not an address, it is a structure.  Remember, addresses are
things that assembler deals with, C deals with variables.

Since "*ptr" is of type "struct foo", this means that "*ptr" can not
possibly be a "long" (the type of "m").

The whole confusion it seems is in trying to think of these things in
terms of addresses.  In most implementations of C, "mystruct" and
"mystruct.m" have the same starting address, yet in C these are two
entirely different things.

When compiled, the address generated for "ptr" and "&mystruct" and
"&mystruct.m" may be equal, but this does not mean they're equal
in C (you can't do "ptr == &mystruct.m").

--------

As an analogy, say I live at 577 Elm Avenue.  That's my address.  And
the address of my roommates, the cat, and so forth.  It is also the
address of a house.  The house is like a structure, and the things
inside are the components (I hate saying element here, since to me that
implies an array or collection).

Now, "mystruct" would be the house.  "&mystruct" would be a piece of
paper saying "the house at 577 Elm".  "ptr" would be a photocopy of
that piece of paper.

"mystruct.m" would be myself say.  Then "&mystruct.m" would be a piece
of paper saying "the geeky guy at 577 Elm".

If we look at "*ptr" that can only refer to the house at 577 Elm, it
can't refer to the geeky guy, because the piece of paper says "house".
To refer to me, you would have to have a typecast (unportable in this
situation by the way), as in crossing out "house" and scribbling in
"geeky guy" instead.

To get to the final point, your confusion seems to be that you're only
putting "577 Elm" down on that piece of paper.  And indeed, in an
implementation, that may be what happens.  If you tell someone to
paint "the house at 577 Elm", they won't bother writing down that they
should paint the house and not the cat.  That's part of the context.
So the "assembler" at this point, just writes down "577 Elm"; the
person coming to visit me also only writes down "577 Elm", the
"compiler" remembers the context, but doesn't put it out in assembler.
Now if you remove all the context, and assume that these are all the
same, you end up with someone painting the cat, and my friend standing
outside chatting with the house.  And when it's all over, you can't
just excuse it all by saying "at least they all got to the right
address" :-)

And that's what happens if you do: printf("%ld", *ptr).

--
Darin Johnson



Mon, 21 Jun 1999 03:00:00 GMT  

Quote:

>Hi all:
>I am trying to re-learn 'C' the right way and I have some questions
>regarding the implementation of variables and structures etc.

[ Descriptions of low level addressing of variables deleted. ]

I am unsure why you have to approach the subject from a compiler
implementor's perspective!

While much of the text was about how access to variables may be implemented
at a lower level, the question actually being asked is: Why is the syntax
for arrays and structures different, although they are conceptually
collective objects kept in memory?

Quote:
>Suppose I have a structure defined as:
>struct foo
>{
>   long     m;
>   long n;
>} mystruct;
>    ( Also suppose the compiler stores mystruct at address 1300 )
>Then "mystruct" by itself does not mean the address associated with
>mystruct.

This is because mystruct only defines a type, not a variable; that is,
it gives a view of what variables of that type should look like.

Quote:
>Why were structs and arrays implemented differently ? (
>They are conceptually the SAME ).

Arrays and structures (or records) are not conceptually the same.  For
an array, it is conceptually a sequence of objects of the same type
in memory.  Its representation (or picture) is already known.  Hence
there is no need to define it.  However, this is not the case for a
structure, which is rather a concept that defines a collective object
of different types.  The exact form of a particular structure is not
yet defined.

Quote:
>Also if I do say:
>ptr = &mystruct
>then I would expect *ptr to give me the value of the first element
>in the structure i.e  [1300].
>And it does ! ( on one compiler )
>and it Doesn't ! ( on another compiler ).

The statement ptr = &mystruct invokes an undefined behavior, that in
fact anything could happen!  mystruct typically only has its existence
at compile time, but not at run time.  The reason was explained.

Lin
--

Department of Computer Science       http://yallara.cs.rmit.edu.au/~lin/
RMIT, GPO Box 2476V, Melbourne 3001, Australia              



Mon, 21 Jun 1999 03:00:00 GMT  

Quote:

> >struct foo
> >{
> >   long        m;
> >   long n;
> >} mystruct;

> >       ( Also suppose the compiler stores mystruct at address 1300 )

> >Then "mystruct" by itself does not mean the address associated with
> >mystruct.

> This is because mystruct only defines a type, not a variable; that is,
> it gives a view of what variables of that type should look like.

Eh?  The above declaration, unless I'm going prematurely senile,
declares a structure type containing two longs, gives it the tag `foo',
and then declares an object of type `struct foo' named `mystruct'.

Quote:
> The statement ptr = &mystruct invokes an undefined behavior, that in
> fact anything could happen!  mystruct typically only has its existence
> at compile time, but not at run time.  The reason was explained.

The statement `ptr = &mystruct' is valid, and its behaviour is well-
defined.  `mystruct' has object type, its address may be taken and
assigned to a pointer to the correct type.  If `ptr' is declared as
`void *ptr' or `struct foo *ptr' then this is all hunky dory.
--
[mdw]

`When our backs are against the wall, we shall turn and fight.'
                -- John Major



Tue, 22 Jun 1999 03:00:00 GMT  



Quote:
>Hi all:

>I am trying to re-learn 'C' the right way and I have some questions
>regarding the implementation of variables and structures etc.

>I sort of know that when the compiler begins to analyze source code,
>it creates a list of all unique identifiers or tags that it
>finds. All these tags are then analyzed and associated with addresses
>internally. The address allocation for each variable depends on the
>size of the variable. Therefore:

>long x;
>long y;

>means the symbol "x" is associated with some address, say 1000.

>The next symbol "y" would have the address 1004, and so on..

Correct for the most part, but there's a possible dangerous assumption
here. There is no requirement for the variables to be stored
sequentially or contiguously. If they are declared in common, y often
_will_ be stored immediately following x, but there's no guarantee.

<SNIP>


Quote:
>Now we come to arrays and structures: Suppose I have an array:
>long myarray[10];
>When the compiler sees myarray[i] in an expression, it emits code to read/write
>to the address associated with "myarray" in the compiler table ( say 1100 )
>plus an offset of ( i * size of an array item in bytes ). Therefore
>myarray[5]  means    [ 1100 + 5 * 4 ] or [ 1120 ]
>i.e read/write ( depending on the context ) at the address 1120.
>This is similar to x in an expression meaning read/write [1000] where the
>address of x is 1000. ( The only difference is that there is obviously
>no scaling in this case ). HOWEVER:
>myarray, by itself, *IS* the address associated with myarray in the compiler
>table. That is, myarray ( by itself ) in an expression is replaced with
>1100.
>i.e
>long *ptr; suppose the compiler stores ptr at address 1200
>ptr = myarray means  [1200] <- 1100
>This is strange. If I use x, y etc in an expression, the compiler
>emits code to [1000], [1004] etc where x is located at 1000, y at 1004.
>But if I use myarray in an expression, the compiler Does Not emit code
>to [1100], where myarray[i] is located at 1100.
>To actually mean [1100] I have to say myarray[0]
>To summarize: myarray = &myarray[0]. This is not that weird when I
>think about it. K & R could have defined "myarray" to mean myarray[0]
>or [1100] in this case, as opposed to meaning 1100 ( the address itself)
>But they chose not to, I'm guessing, to make getting the syntax for array
>addresses easier.

Allowing myarray to substitute for myarray[0] (instead of &myarray[0]) would
be even stranger if you think about it. Consider:

int myarray[10];

myarray = 5;

This code would be visually very misleading. I appear to be assigning
a value to the entire array, but I'm actually only assigning to a
single int.

Or how about this:

if(sizeof(myarray) == sizeof(myarray[0]))
        DoSomething();

myarray has type array_of_int, not type int.


Quote:

>This is what K & R mean by saying a[i] is implemented as
>* ( a + i ) i.e *(1100 + i) or [1100 + i].

>if we have,   long *ptr  ( and ptr is located at 1200  by the compiler)

>ptr = a means
>[1200]  <-  1100

>then ptr[i] means:

>*( ptr + i )  means * ( [ptr] + i )
>          means  [  [ptr] + i ]
>          means  [  [1200]  + i ]
>          means  [  1100 + i ], same

>as a[i] in the end result but notice that extra indirection.
>Structures.
>This is what is causing me endless confusion. If you have read so far then
>please help me out here !  :
>Structures are conceptually the same as arrays. They are a collection
>of things in memory. The only difference is that the things can be
>of different size. ( in arrays, all things in a given array are of the
>same size ).

Structures are NOT conceptually the same as arrays. They share some
superficial similarities, but are fundamentally quite different. This
is true in the way the compiler deals with them, and also with the way
they are normally used.

Physically, arrays are guaranteed to be a set of identical data types
contiguously located in memory. Structs are a collection of possibly
differing data types. Many compilers will pad structures with empty
bytes in order to align the data members on byte boundaries. This makes
calculating the location of a member of a struct much less
straight-forward than calculating the location of a member of an
array.

Conceptually, arrays are _usually_ a collection of multiple examples
of a single object. Structs are _usually_ a collection of differing
data types which make up or describe a single object.

Quote:
>So the implementation, syntax of structures etc should have been
>the same as that of arrays. Yet this is not the case.

The syntax is different because structs and arrays are different.

Quote:
>Suppose I have a structure defined as:
>struct foo
>{
>   long     m;
>   long n;
>} mystruct;
>    ( Also suppose the compiler stores mystruct at address 1300 )
>Then "mystruct" by itself does not mean the address associated with
>mystruct.
>i.e if I have
>struct foo *ptr;
>then I CANNOT say  ptr = mystruct. ( I could in the case of arrays. )
>Why is this the case? Why does "mystruct" not mean the address associated
>with mystruct ? Why were structs and arrays implemented differently ? (
>They are conceptually the SAME ).

I'm not sure why K&R made this design decision. I do know that in my
experience I've used array names as addresses much, much more often
than I've needed to use &StructName.

Quote:

>Also if I do say:
>ptr = &mystruct
>then I would expect *ptr to give me the value of the first element
>in the structure i.e  [1300].

Why would you expect that? ptr is a pointer to a structure. Why should
dereferencing it give you some other data type?

Quote:
>And it does ! ( on one compiler )
>and it Doesn't ! ( on another compiler ).

>So if mystruct.m = 10 then
>*ptr gives me 10 ( Value of first element of struct ) on the first compiler
>*ptr gives me an ADDRESS ( &mystruct + 30 = 1330 ) on the second compiler.

When you say it "gives" you this value, what do you mean? How are you
examining the value? Are you printing it out with printf() or what?

Without knowing what you're doing, my initial guess would be that this
is a difference in the evaluation implementation, not in the
dereferencing implementation. ptr is a pointer to a structure. *ptr
should dereference to that structure. My guess is that you'd see the
same differences if you examined the structures themselves the same
way. IOW, if *ptr "gives" you 10, then mystruct should also "give" you
10. If *ptr "gives" you 1330, then mystruct should "give" you 1330.

Quote:
>Both compilers are ANSI C! The first compiler ( macintosh codewarrior )
>makes sense i.e if pointer contains address then *ptr gives me value
>at that address. )

Most likely, the method you're using to examine the variables is not
ANSI.

Quote:
>The second compiler ( gcc on UltraSPARC ) makes no sense at all i.e
>ptr contains an address but *ptr gives me *another* address in memory.

>Also if I declare 2 such structures, I would expect *( ptr + 1 ) to give
>me the value of the first element in the second instance of the structure
>but in this case, both compilers give me an address for *( ptr + 1 ).

Once again, I'm at a loss as to why you expect this behavior. ptr is a
pointer to a structure, not a pointer to a data member. The compiler
knows the size of mystruct. It will increment the pointer by the size
of mystruct.

Quote:
>Also the syntax (*ptr).m to access element m in the structure makes no
>sense. Remember, ptr = &mystruct.
>Therefore *ptr should be the VALUE of the first element of the structure
>and not the address. (*ptr).m implies that (*ptr) is an address even
>though ptr contains an address itself....

int x;
int *p;

x is an integer. p is a pointer to an integer. Dereferencing p gives
you an integer.

p = &x;
*p = 5; /* Same as x = 5 */

mystruct is a structure. ptr is a pointer to a structure.
Dereferencing ptr gives you a structure.

ptr = &mystruct;
(*ptr).m = 5; /* Same as mystruct.m = 5 */

Of course, I'd seldom if ever use this syntax. The -> operator is
generally clearer.

One thing that may make it clearer is to realize that an array is
multiple variables. A struct is a SINGLE variable.

Regards,

Dan



Tue, 22 Jun 1999 03:00:00 GMT  
 