address of structure == address of first member? (long)
Author |
Message |
Carl Paukst #1 / 13
|
 address of structure == address of first member? (long)
In my H&S, 5.7.3 (p. 108) says "... C compilers are constrained to assign components increasing memory addresses in a strict left-to-right order, with the first component starting at the beginning address of the structure itself." Fine. I want to write a general-purpose search function. I have several arrays of (different) structures; each structure contains a pointer to a char string identifier as its first member. The search function takes a character (string) pointer to a search argument, a pointer to one of the arrays of structures, the size the structure entry, and the number of entries in the table. It finds the table entry containing the (pointer to) string specified, and returns a pointer to the matching table entry. By H&S, that part of the code which gets the pointer to the string to do the compare must work. And the following compiles and runs clean under gcc (1.30) under Ultrix and another (semi-)ANSI(draft)compliant compiler on a SYS V system. What other portability problems will I run into with this code? How can I fix them? 1 #include <stdio.h> 2 3 struct mumble { 4 char *dopey; 5 float sneezy; 6 int doc; 7 }; 8 9 struct fumble { 10 char *bashful; 11 char grumpy; 12 float sleepy; 13 }; 14 15 const struct mumble m[10] = { 16 { "abc", 1.3, 4 }, 17 { "def", 2.5, 2 }, 18 { "ghi", 2.6, 1 }, 19 { "ghj", 2.67, 0 }, 20 { "zzzkl", 9.4, -2 } 21 }; 22 23 const struct fumble f[12] = { 24 { "annapolis", 'b', 907.4 }, 25 { "brainerd", 'c', 821.0 }, 26 { "collinston", 'd', 1243.6}, 27 { "detroit", 'e', 3.2 }, 28 { "evanston", 'f', 1.1054 }, 29 { "fargo", 'g', 1204.0 }, 30 { "gonzaga", 'z', 3.1415926 }, 31 { "zulu", 'a', 92203.1 } 32 }; 33 34 main() 35 { 36 void * mybsearch(const char *, void const *, const int, const int); 37 struct mumble *pm; 38 struct fumble *pf; 39 40 pm = (struct mumble *)mybsearch ("ghj", 41 (void *)m, 42 sizeof(m)/sizeof(struct mumble), 43 sizeof(struct mumble)); 44 45 if (0 == pm) 46 { 47 printf("Not found in m!\n"); 48 exit(1); 49 } 50 printf("Value for ghj is %f\n",pm->sneezy); 51 52 pf = (struct fumble *)mybsearch ("evanston", 53 (void *)f, 54 sizeof(f)/sizeof(struct fumble), 55 sizeof(struct fumble)); 56 57 if (0 == pf) 58 { 59 printf("Not found in f!\n"); 60 exit(1); 61 } 62 printf("Value for evanston is %f\n",pf->sleepy); 63 64 printf("OK, found both!\n"); 65 exit(0); 66 } 67 68 /** 69 ** simple-minded binary search routine: 70 ** Find a table entry which matches the desired string. 71 ** Table entries are assumed to be structures, the first 72 ** member of which is a pointer to a character string key. 73 **/ 74 75 void * mybsearch (const char * key, /* value to match against */ 76 void const * table, /* first table entry */ 77 const int count, /* number of table entries */ 78 const int size) /* size (char's) of each entry */ 79 { 80 int l, m, r; 81 int code; 82 for (l = 0, r = count - 1; l <= r;) 83 { 84 m = (l + r) / 2; 85 86 code = strcmp (key, *(char **)((char *)table + (m * size))); 87 88 if (code == 0) /* found it! */ 89 return ((void *)((char *)table + (m * size))); 90 91 if (code < 0) 92 r = m - 1; /* search left half */ 93 else 94 l = m + 1; /* search right half */ 95 } 96 97 return ((void *) 0); /* not found */ 98 } 99 Line 86 bothers me. Is it permissible to cast a void* into a char* and do arithmetic on it like this? Line 89 would have the same problem (if any). On machines (environments?) with non-byte-sized chars, is there a better way to do the necessary arithmetic? Or what else have I missed? -- Carl Paukstis +1 509 927 5600 x5321 |"I met a girl who sang the blues | and asked her for some happy news
...uunet!iscuvc!carlp | - Don MacLean
|
Mon, 19 Apr 1993 19:45:00 GMT |
|
 |
Doug Gwy #2 / 13
|
 address of structure == address of first member? (long)
Quote:
>What other portability problems will I run into with this code?
1. Use of prototypes constrains you to a modern compiler. 2. You implicitly declare exit() as returning int, which is incorrect. 3. sizeof results in a size_t but mybsearch() wants an int. Although the type is automatically converted in the scope of the prototype, if you tried to use a really large struct the size might not fit. 4. You really ought to #include <string.h> to declare strcmp(). Quote: >86 code = strcmp (key, *(char **)((char *)table + (m * size))); >Line 86 bothers me. Is it permissible to cast a void* into a char* and do >arithmetic on it like this? Line 89 would have the same problem (if any).
Line 86 is okay, but you really don't need to cast to a (char**) then dereference to get the (char*) key. That's extra work that amounts to a no-op. Quote: >On machines (environments?) with non-byte-sized chars, is there a better >way to do the necessary arithmetic? Or what else have I missed?
That seems about as good as any, if you don't want to tailor individual versions to specific systems.
|
Mon, 19 Apr 1993 09:29:00 GMT |
|
 |
Carl Paukst #3 / 13
|
 address of structure == address of first member? (long)
(Maybe I should take this to private mail, but it seems a couple of points here may be of general interest..)
Quote:
>>What other portability problems will I run into with this code? ... >2. You implicitly declare exit() as returning int, which is incorrect.
Hmmm. Since exit() _doesn't_ return, I never paid any attention to this. Call me sloppy, I guess. I'll have to remember that. ... Quote: >4. You really ought to #include <string.h> to declare strcmp().
Oops! Actually, I thought of this after I nicely formatted the listing, and didn't want to mess up the line numbers. (You believe that? :-) Quote: >>... void * table ... >>86 code = strcmp (key, *(char **)((char *)table + (m * size))); >Line 86 is okay, but you really don't need to cast to a (char**) then >dereference to get the (char*) key. That's extra work that amounts to >a no-op.
Huh? The type of the expression ((char *)table + (m * size)) is "pointer to char", no? And the usage on line 86 requires "pointer to pointer to char". If I dereference the subexpression above, e.g. *((char *)table + (m * size)), the result type is "char", not "pointer to char" as required by strcmp(). Or am I just being dense? Anyway, thanks to Doug (and Chris Torek in private mail) who both sort of confirmed that this code is reasonably portable. -- Carl Paukstis +1 509 927 5600 x5321 |"The right to be heard does not | automatically include the right
...uunet!iscuva!carlp | - H. H. Humphrey
|
Mon, 19 Apr 1993 16:59:00 GMT |
|
 |
Doug Gwy #4 / 13
|
 address of structure == address of first member? (long)
->>86 code = strcmp (key, *(char **)((char *)table + (m * size))); ->Line 86 is okay, but you really don't need to cast to a (char**) then ->dereference to get the (char*) key. That's extra work that amounts to ->a no-op. - Huh? The type of the expression ((char *)table + (m * size)) is - "pointer to char", no? And the usage on line 86 requires "pointer to - pointer to char". If I dereference the subexpression above, e.g. - *((char *)table + (m * size)), the result type is "char", not "pointer - to char" as required by strcmp(). Or am I just being dense? That's not what I said! You already had a (char *). You then cast it to (char **), and then dereferenced that using the * operator. There was no need for these extra steps when the original (char *) was just what you needed. (strcmp takes (char *) arguments.)
|
Mon, 19 Apr 1993 06:38:00 GMT |
|
 |
Jim Vlc #5 / 13
|
 address of structure == address of first member? (long)
A little note on Carl Paukstis' problems with a (char **) cast and dereferencing of a (void *) pointer. Just to refresh your memories, a brief insert from Doug Gwyn's last posting on the matter:
Quote: *Paukstis) writes:
*->>86 code = strcmp (key, *(char **)((char *)table + (m * size))); *->Line 86 is okay, but you really don't need to cast to a (char**) then *->dereference to get the (char*) key. That's extra work that amounts to *->a no-op. *- Huh? The type of the expression ((char *)table + (m * size)) is *- "pointer to char", no? And the usage on line 86 requires "pointer to *- pointer to char". If I dereference the subexpression above, e.g. *- *((char *)table + (m * size)), the result type is "char", not "pointer *- to char" as required by strcmp(). Or am I just being dense? * *That's not what I said! * *You already had a (char *). You then cast it to (char **), and then *dereferenced that using the * operator. There was no need for these *extra steps when the original (char *) was just what you needed. *(strcmp takes (char *) arguments.) Carl is in fact completely right on this one, and the way in which he did the operation is exactly the right way. Consider: ``table'' is a pointer to an array of structures, each of size ``size'', and whose first element is a pointer to char (char *). He needs to pass one of these pointers to strcmp(). Since it is the pointers which are contained in ``table,'' not the actual strings to be compared, obviously a dereference needs to be performed to be able to pass the correct value to strcmp(). Since the routine to which ``table'' is passed has no idea what kind of structures comprise this array, it is given the size of each structure as ``size.'' Hence, casting ``table'' to (char *) and adding m*size to it are perfectly correct; this then yields a pointer to the m-th array entry (structure-wise) in ``table.'' Since the address of the first element of any structure is also the address of the structure itself (suitably cast), this value is then also the address of the first element in the structure -- this so happens to be a pointer to char. Hence, the cast to (char **) is appropriate. One then dereferences this to get the actual pointer value, passes it to strcmp(), and voila! People here paid too much attention to the pointer types, and not enough attention to the values! The original cast to (char *) is only done to be able to correctly perform the necessary pointer arithmetic, for in fact ``table'' is in no sense really a pointer to char. I'm suprised that the notion of *(char **) amounting to a no-op managed to breeze by Doug Gwyn! -- Jim Vlcek
!{harvard,rutgers}!mit-eddie!mit-caf!vlcek
|
Mon, 19 Apr 1993 05:58:00 GMT |
|
 |
Chris Tor #6 / 13
|
 address of structure == address of first member? (long)
[Amazing that no one else has jumped on this yet.] Quote:
>>86 code = strcmp (key, *(char **)((char *)table + (m * size)));
[much edited]
Quote: >You already had a (char *). You then cast it to (char **), and then >dereferenced that using the * operator. There was no need for these >extra steps when the original (char *) was just what you needed.
But the original `char *' is not what he needs, and this does not compile
we write: char *p; p = (char *)table + (m * size); we have p pointing at (not `to': I am reserving that for a pointer of the appropriate type) an object of `struct <something>'. This structure is defined as: struct <something> { char *name; ... }; To be completely type-correct, we should convert `p' into a pointer to the correct structure type-name, and follow that to its `name' field. The problem, of course, is that in this general-purpose search routine, we have no idea *which* structure `p' is standing for [to end the sentence a preposition with]. There are two alternatives: define a structure that contains only a name; or cheat. struct name { char *name; }; result = strcmp(key, ((struct name *)p)->name); or result = strcmp(key, *(char **)p); The former is in some sense superiour, particularly under K&R rules, where any pair of structures that have an initial common set of names and types are, in that common part, compatible. (This rule is left over from the days before unions, and does not appear in the dpANS.) -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
|
Mon, 19 Apr 1993 23:35:00 GMT |
|
 |
Doug Gwy #7 / 13
|
 address of structure == address of first member? (long)
Quote:
>To be completely type-correct, we should convert `p' into a pointer to >the correct structure type-name, and follow that to its `name' field. >The problem, of course, is that in this general-purpose search routine, >we have no idea *which* structure `p' is standing for [to end the >sentence a preposition with]. There are two alternatives: define a >structure that contains only a name; or cheat.
Technically it is not necessary to do this via a purely "type correct" method; there are sufficient requirements imposed by the very latest proposed C standard to ensure that any addressable data object forms a linear address subspace within itself, such that (char *)-arithmetic suffices to correctly address its pieces. This guarantee was necessary to make memcpy() etc. usable for their intended purposes. Cheating is guaranteed to suffice. A pointer to a structure converted to (char *) must compare equal to a pointer to the first member of the structure converted to (char *), and in this particular example casting that back to (char **) will produce a valid pointer to the first structure whose type is (char *). Then * will pick up the contents of that (char *) for use with strcmp(). (I made a mistake about this in my previous posting. strcmp() does need the (char *) itself, not just its address, so the additional *(char**) is necessary. Sorry.) I admit to being surprised fairly recently when I found that this "cheating" was supposed to be officially sanctioned, but it's useful. The wording about "same representation and alignment" was also fixed so that one can safely pass a nonnegative int to a function expecting an unsigned int, or a non-qualified type to a function with parameter declared as a qualified version of that type, etc. This (along with a fix to the definition for fprintf() %u format) seems to complete the work of making C object representations behave as usefully as possible without undue burden for implementors on segmented, ones- complement, etc. architectures. A further nice thing is that the K&R 2nd Ed. example where a pointer to a function of a slightly different type than expected was passed (for a sorting comparison function, I think) is now guaranteed to actually work the way K&R used it. This had been bothering me..
|
Mon, 19 Apr 1993 12:03:00 GMT |
|
 |
Doug Gwy #8 / 13
|
 address of structure == address of first member? (long)
-People here paid too much attention to the pointer types, and not -enough attention to the values! The original cast to (char *) is only -done to be able to correctly perform the necessary pointer arithmetic, -for in fact ``table'' is in no sense really a pointer to char. I'm -suprised that the notion of *(char **) amounting to a no-op managed to -breeze by Doug Gwyn! Yes, I apologize. I was concentrating on the types issue and didn't pay attention to the fact that the struct contained a pointer to the string to be compared not the string (char array) itself. Sorry. The *(char**) is indeed necessary to pick up the pointer to the string. If you check out the wording in the articles to which I was responding, it should be easy to see how I was misled: Quote: >*- Huh? The type of the expression ((char *)table + (m * size)) is >*- "pointer to char", no? And the usage on line 86 requires "pointer to >*- pointer to char". If I dereference the subexpression above, e.g. >*- *((char *)table + (m * size)), the result type is "char", not "pointer >*- to char" as required by strcmp(). Or am I just being dense?
The referenced discussion is all about types and not about semantics. It is true, as I said, that starting from a (char *) one doesn't need to apply *(char**) to it to produce a (char *), but one DOES need to do that to access the pointer member of the array in order to find the string in question. If the operation had been on the struct member itself then the original (char *) would already have been suitable for accessing it via a function taking (char *) pointers.
|
Mon, 19 Apr 1993 12:57:00 GMT |
|
 |
Wayne A. Thro #9 / 13
|
 address of structure == address of first member? (long)
Quote:
>>>>86 code = strcmp (key, *(char **)((char *)table + (m * size))); >>>Line 86 is okay, but you really don't need to cast to a (char**) then >>>dereference to get the (char*) key. That's extra work that amounts to >>>a no-op. >> Huh? [... It sure seems necessary to me ...] > You already had a (char *). You then cast it to (char **), and then > dereferenced that using the * operator. There was no need for these > extra steps when the original (char *) was just what you needed. > (strcmp takes (char *) arguments.)
Oooooh, Doug, you ought to know better. The stuff you count as "extra work that amounts to a no-op" is not a no-op at all, but the dereference of a crucial pointer. Look at those structure definitions again. The first elements of those structures are not (char []), but (char *). What is gotten by ((char *)table + (m*size)) is a pointer to this pointer (but of the wrong type (1)). The fact that this pointer to the desired pointer happens to be typed in the same way as the desired pointer itself is just a coincidence, required for reasons of doing byte-granular pointer arithmetic in C. Hence, the cast and dereference is in NO WAY a no-op, and is correct and necessary for Carl's code to work properly. The alternate (and more portable) way to do what Carl is trying to do is to pass in routines to do subscripting and comparison. But I suspect the porting problems of Carl's code as it is are miniscule. (1) To pick nits, that's a pointer to a structure containing the desred pointer as its first element, which is what the subject of this discussion thread is all about. -- In Loglan, time flies like an arrow precisely the way fruit flies like a banana; there isn't any other way of saying it. --- D.E. Cortesi in Dr Dobbs Journal -- Wayne Throop <the-known-world>!mcnc!rti!xyzzy!throopw
|
Mon, 19 Apr 1993 19:03:00 GMT |
|
|
|