Is i = *(int*) *str; /*char* str */ defined? 

I ran across the following hash function:

   unsigned hash (char *str) {
      unsigned ret_val = 0;
      int i = 0;

      while (*str) {
         i = *( int*)str;
         ret_val ^= i;
         ret_val <<= 1;
         str ++;
      }
      return ret_val;
   }

and I am concerned that i = *(int*) str; is not defined. In fact,
on one hardware platform, this function returns different values
for the same string. Not always, but occasionally. Could it be
the hardware (a recent HP workstation)? That doesn't seem possible.



Tue, 04 Jan 2000 03:00:00 GMT  

Quote:

> I ran across the following hash function:

>    unsigned hash (char *str) {
>       unsigned ret_val = 0;
>       int i = 0;

>       while (*str) {
>          i = *( int*)str;
>          ret_val ^= i;
>          ret_val <<= 1;
>          str ++;
>       }
>       return ret_val;
>    }

> and I am concerned that i = *(int*) str; is not defined. In fact,
> on one hardware platform, this function returns different values
> for the same string. Not always, but occasionally. Could it be
> the hardware (a recent HP workstation)? That doesn't seem possible.

You should be concerned.  This is garbage.

First, on many platforms you will get a runtime error because str does
not meet the alignment requirements for an int*.

On others you will get a runtime error because sizeof(int) > 2, so on
the last character (or earlier) you will have an access violation as
it tries to read memory beyond the end of the array.

On others (your HP workstation apparently is one of these) you will be
very unlucky and it will not bomb out.  Instead you will get
unreliable values as it accesses memory beyond the string.  You end up
hashing not only the string, but a few bytes of whatever happens to
follow it in memory.

About the only machines this will work on are low-end systems in which
sizeof(int) == 2 and there are no alignment requirements, or systems
in which sizeof(int) == 1.  I don't know of any of the latter, but the
standard does permit it.

This code results in undefined behavior (unless sizeof(int) == 1), so
it's not the fault of your HP -- anything it does is permitted by the
standard.
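
For comparison, here is a minimal sketch of a version that touches only the bytes of the string itself. The name hash_bytes is made up for illustration, and it will not return the same values as the original, since it mixes in one character per pass instead of a whole int. It is only meant to show that the loop can be written without the cast.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical replacement: mixes in one byte per pass and never
   reads past the terminating '\0'. */
unsigned hash_bytes(const char *str)
{
    unsigned ret_val = 0;

    assert(str != NULL);
    while (*str != '\0') {
        /* Go through unsigned char so the result does not depend on
           whether plain char is signed on this platform. */
        ret_val ^= (unsigned char)*str;
        ret_val <<= 1;
        str++;
    }
    return ret_val;
}
```

Because nothing beyond the terminator is read, two copies of the same string give the same value no matter what follows them in memory. Like the original, it shifts early characters out of the top of the word on long strings, so it is still not a strong hash.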

Michael M Rubenstein



Tue, 04 Jan 2000 03:00:00 GMT  

|> I ran across the following hash function:
|>
|>    unsigned hash (char *str) {
|>       unsigned ret_val = 0;
|>       int i = 0;
|>
|>       while (*str) {
|>          i = *( int*)str;
|>          ret_val ^= i;
|>          ret_val <<= 1;
|>          str ++;
|>       }
|>       return ret_val;
|>    }
|>
|> and I am concerned that i = *(int*) str; is not defined.

   "Morally", it's not. You'll have to wait for one of the Standard
   mavens to happen by to get references to the Standard that prove
   this assertion, but ... Basically, if sizeof(int) = n > 1, then
   the code above invokes undefined behavior on its last n-2 trips
   through the loop (since every one of those passes uses 'str' to
   access bytes at least one character position _beyond_ where 'str'
   is [safely] known to be valid).
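
   The "last n-2 trips" count can be checked by brute force. This is
   only a sketch: overreading_passes is a hypothetical helper, and it
   assumes the string occupies exactly len + 1 bytes (the characters
   plus the terminating '\0') with nothing known to be valid after it.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the loop passes that would read beyond the terminating '\0'
   of a string of length len, if each pass reads int_size bytes
   starting at the current character. */
size_t overreading_passes(size_t len, size_t int_size)
{
    size_t k, count = 0;

    assert(int_size >= 1);
    for (k = 0; k < len; k++) {
        /* Pass k touches bytes k .. k + int_size - 1; the last valid
           index is len, which holds the '\0'. */
        if (k + int_size > len + 1)
            count++;
    }
    return count;
}
```

   For a 5-character string and a 4-byte int this gives 2, which is
   n-2. A string shorter than n-2 characters over-reads on every pass.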

|>                                                          In fact,
|> on one hardware platform, this function returns different values
|> for the same string. Not always, but occasionally. Could it be
|> the hardware (a recent HP workstation)? That doesn't seem possible.
|>

   It's highly unlikely that this is caused by the hardware. 'hash()'
   picks up whatever garbage is found in memory just past the end
   of its input string and incorporates it in the return value. Since
   garbage is unpredictable, one should not be surprised when this
   function turns out to be nondeterministic.

|>
|>

--
 Ed Hook                              |       Copula eam, se non posit
 Computer Sciences Corporation        |         acceptera jocularum.
 NASA Langley Research Center         | Me? Speak for my employer?...<*snort*>



Tue, 04 Jan 2000 03:00:00 GMT  

|> I ran across the following hash function:
|>
|>    unsigned hash (char *str) {
|>       unsigned ret_val = 0;
|>       int i = 0;
|>
|>       while (*str) {
|>          i = *( int*)str;
|>          ret_val ^= i;
|>          ret_val <<= 1;
|>          str ++;
|>       }
|>       return ret_val;
|>    }
|>
|> and I am concerned that i = *(int*) str; is not defined. In fact,
|> on one hardware platform, this function returns different values
|> for the same string. Not always, but occasionally. Could it be
|> the hardware (a recent HP workstation)? That doesn't seem possible.

While it is permissible to convert a pointer to an arbitrary object
type to a pointer to some *other* arbitrary object type when using
an appropriate cast, the result is only guaranteed to be usable if it
is suitably aligned for the new type, and the language does not say
what happens when you dereference a misaligned one.  For example, when
I attempt to use the above code on an SGI O2000 running IRIX64 6.4,
the result (which I pretty much expected) is:

    l7iasdev 167% testit
    Bus error (core dumped)

which is a perfectly reasonable result of invoking undefined behavior.

Of course, it's fairly easy to come up with reasons for this result.
The language states that (char *) has the least strict alignment
requirements of the object pointer types.  This being the case, it's
very likely that str is not pointing to a location that is properly
aligned on an "int boundary", if you will.  This would be the
obvious explanation for the above error.
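
If the bytes really do have to be viewed as an int, the usual way around the alignment problem is to copy them into a real int object instead of casting the pointer. A minimal sketch follows; load_int_unaligned is a name invented here, and the caller is assumed to guarantee that sizeof(int) bytes are actually readable at p, which the hash function in question cannot guarantee near the end of the string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: reads an int from any address by copying its
   bytes into a properly aligned int object.  The caller must make
   sure that sizeof(int) bytes are readable at p. */
int load_int_unaligned(const void *p)
{
    int value;

    assert(p != NULL);
    memcpy(&value, p, sizeof value);
    return value;
}
```

This only addresses alignment. The value obtained still depends on byte order and on sizeof(int), so it is not portable as a hash input.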

Furthermore, there's not even any guarantee that (int *) and (char *)
are of the same *size*.  The good news is that well-written code
(which the above-quoted hash function is not) rarely needs to rely on
the behavior of dereferencing a pointer that has been converted from
some other pointer type. [*]

The main thing to remember is that just because a pointer is valid
doesn't make dereferencing it a safe, legal, moral, or wholesome thing
to do.  As another somewhat common example, consider

    char c[1];
    char *p = c + 1;

Here, p is a valid pointer; it points "one past" the end of c, and this
is required by the language to be representable and valid.  However,
dereferencing it invokes undefined behavior.
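
The legitimate use of such a pointer is as a bound to compare against. A small sketch, with sum_chars as a made-up example function:

```c
#include <assert.h>
#include <stddef.h>

/* Sums n chars starting at first.  'end' points one past the last
   element; it is only compared against, never dereferenced. */
long sum_chars(const char *first, size_t n)
{
    const char *end = first + n;
    const char *p;
    long total = 0;

    assert(first != NULL);
    for (p = first; p != end; p++)
        total += *p;
    return total;
}
```

The loop stops as soon as p equals end, so *p is evaluated only for the n elements that exist.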

Regards,

[*] Except, of course, in the common case of (something *) to (void *)
    back to (something *), which *is* safe and well-defined because
    the language requires that the final pointer be identical to the
    original, before its conversion to (void *).
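
    A minimal sketch of that round trip, using a hypothetical helper
    roundtrip_is_identity and int as the "something":

```c
#include <assert.h>
#include <stddef.h>

/* Converts an int pointer to void * and back, then reports whether
   the result compares equal to the original (1 if so, 0 if not). */
int roundtrip_is_identity(int *original)
{
    void *generic;
    int *back;

    assert(original != NULL);
    generic = original;   /* int * to void *: no cast needed in C */
    back = generic;       /* void * back to int * */
    return back == original;
}
```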

--
Chris Engebretson --- Hughes STX Corporation | Ph#: (605)594-6829
USGS EROS Data Center, Sioux Falls, SD 57198 | Fax: (605)594-6490
Landsat 7 IAS Engineering Team -- http://ltpwww.gsfc.nasa.gov/IAS

Opinions here are not those of Hughes Aircraft, STX, or the USGS.



Tue, 04 Jan 2000 03:00:00 GMT  
 