reverse string - UB 
Author Message
 reverse string - UB

A few days back this was posted as a solution for revsering a string
#include <stdio.h>
char str[] = "ABCDE";
void rev(char *);
int main(void) {
  printf("Before: %s\n", str);
  rev(str);
  printf(" After: %s\n", str);
  return 0;
Quote:
}

void rev(char *s) {
  char *d = s;
  char c;
  while (*d) ++d;
  --d;
  while (d > s) {
    c = *s;
    *s++ = *d;
    *d-- = c;    
  }

Quote:
}

it was pointed out that if *s = '\0' then it is invokes UB.am not able
to get how? i was thinking that d > s would be false and the loop
would terminate. is it not guaranteed?
-sridhar


Tue, 05 Apr 2005 00:27:27 GMT  
 reverse string - UB

Quote:
> A few days back this was posted as a solution for revsering a string
> #include <stdio.h>
> char str[] = "ABCDE";
> void rev(char *);
> int main(void) {
>   printf("Before: %s\n", str);
>   rev(str);
>   printf(" After: %s\n", str);
>   return 0;
> }
> void rev(char *s) {
>   char *d = s;
>   char c;
>   while (*d) ++d;
>   --d;
>   while (d > s) {
>     c = *s;
>     *s++ = *d;
>     *d-- = c;    
>   }
> }
> it was pointed out that if *s = '\0' then it is invokes UB.am not able
> to get how? i was thinking that d > s would be false and the loop
> would terminate. is it not guaranteed?
> -sridhar

If *s == '\0', the while loop will be skipped, thus keeping d pointing
at the same address as s. --d will then make d point to the previous
byte, which quite probably does not belong to your program's memory
space. There's the undefined behaviour.

--

| Kingpriest of "The Flying Lemon Tree" G++ FR FW+ M- #108 D+ ADA N+++|
| http://www.helsinki.fi/~palaste       W++ B OP+                     |
\----------------------------------------- Finland rules! ------------/
"And according to Occam's Toothbrush, we only need to optimise the most frequent
instructions."
   - Teemu Kerola



Tue, 05 Apr 2005 00:30:36 GMT  
 reverse string - UB

Quote:
> A few days back this was posted as a solution for revsering a string
> #include <stdio.h>
> char str[] = "ABCDE";
> void rev(char *);
> int main(void) {
>   printf("Before: %s\n", str);
>   rev(str);
>   printf(" After: %s\n", str);
>   return 0;
> }
> void rev(char *s) {
>   char *d = s;
>   char c;
>   while (*d) ++d;
>   --d;
>   while (d > s) {
>     c = *s;
>     *s++ = *d;
>     *d-- = c;    
>   }
> }

> it was pointed out that if *s = '\0' then it is invokes UB.am not able
> to get how? i was thinking that d > s would be false and the loop
> would terminate. is it not guaranteed?
> -sridhar

I'm not quite sure about this, but I'ld expect that this invoked
UB because pointer comparision is only defined for pointers
pointing into the same array (or exactly one poition past the end
of the array). But if s pointed to an "" (empty string), than d
would at the point where d and s are compared point to some
address befor the actual array and here we go UB...

Probably somebody else can assure my assumption.
--

"LISP  is worth learning for  the profound enlightenment  experience
you will have when you finally get it; that experience will make you
a better programmer for the rest of your days."   -- Eric S. Raymond



Tue, 05 Apr 2005 00:38:32 GMT  
 reverse string - UB

Quote:

>> A few days back this was posted as a solution for revsering a string
>> #include <stdio.h>
>> char str[] = "ABCDE";
>> void rev(char *);
>> int main(void) {
>>   printf("Before: %s\n", str);
>>   rev(str);
>>   printf(" After: %s\n", str);
>>   return 0;
>> }
>> void rev(char *s) {
>>   char *d = s;
>>   char c;
>>   while (*d) ++d;
>>   --d;
>>   while (d > s) {
>>     c = *s;
>>     *s++ = *d;
>>     *d-- = c;    
>>   }
>> }

>> it was pointed out that if *s = '\0' then it is invokes UB.am not able
>> to get how? i was thinking that d > s would be false and the loop
>> would terminate. is it not guaranteed?
>> -sridhar

> If *s == '\0', the while loop will be skipped, thus keeping d pointing
> at the same address as s. --d will then make d point to the previous
> byte, which quite probably does not belong to your program's memory
> space. There's the undefined behaviour.

As long as d isn't dereferenced, I don't see how this is UB.

--

"LISP  is worth learning for  the profound enlightenment  experience
you will have when you finally get it; that experience will make you
a better programmer for the rest of your days."   -- Eric S. Raymond



Tue, 05 Apr 2005 00:39:45 GMT  
 reverse string - UB

Quote:


>>> A few days back this was posted as a solution for revsering a string
>>> #include <stdio.h>
>>> char str[] = "ABCDE";
>>> void rev(char *);
>>> int main(void) {
>>>   printf("Before: %s\n", str);
>>>   rev(str);
>>>   printf(" After: %s\n", str);
>>>   return 0;
>>> }
>>> void rev(char *s) {
>>>   char *d = s;
>>>   char c;
>>>   while (*d) ++d;
>>>   --d;
>>>   while (d > s) {
>>>     c = *s;
>>>     *s++ = *d;
>>>     *d-- = c;    
>>>   }
>>> }

>>> it was pointed out that if *s = '\0' then it is invokes UB.am not able
>>> to get how? i was thinking that d > s would be false and the loop
>>> would terminate. is it not guaranteed?
>>> -sridhar

>> If *s == '\0', the while loop will be skipped, thus keeping d pointing
>> at the same address as s. --d will then make d point to the previous
>> byte, which quite probably does not belong to your program's memory
>> space. There's the undefined behaviour.
> As long as d isn't dereferenced, I don't see how this is UB.

IHNRTS, but AFAIK even forming a non-null pointer into non-accessible
memory invokes UB. But then, EICBW.
I suppose the real gurus on this newsgroup (Heathfield, Pfaff, Pop,
Torek and a bunch of others) can shed more light on this.

--

| Kingpriest of "The Flying Lemon Tree" G++ FR FW+ M- #108 D+ ADA N+++|
| http://www.helsinki.fi/~palaste       W++ B OP+                     |
\----------------------------------------- Finland rules! ------------/
"'It can be easily shown that' means 'I saw a proof of this once (which I didn't
understand) which I can no longer remember'."
   - A maths teacher



Tue, 05 Apr 2005 00:52:48 GMT  
 reverse string - UB

Quote:


> >> A few days back this was posted as a solution for revsering a string
> >> #include <stdio.h>
> >> char str[] = "ABCDE";
> >> void rev(char *);
> >> int main(void) {
> >>   printf("Before: %s\n", str);
> >>   rev(str);
> >>   printf(" After: %s\n", str);
> >>   return 0;
> >> }
> >> void rev(char *s) {
> >>   char *d = s;
> >>   char c;
> >>   while (*d) ++d;
> >>   --d;
> >>   while (d > s) {
> >>     c = *s;
> >>     *s++ = *d;
> >>     *d-- = c;
> >>   }
> >> }

> >> it was pointed out that if *s = '\0' then it is invokes UB.am not able
> >> to get how? i was thinking that d > s would be false and the loop
> >> would terminate. is it not guaranteed?
> >> -sridhar

> > If *s == '\0', the while loop will be skipped, thus keeping d pointing
> > at the same address as s. --d will then make d point to the previous
> > byte, which quite probably does not belong to your program's memory
> > space. There's the undefined behaviour.

> As long as d isn't dereferenced, I don't see how this is UB.

It results in UB. The pointer arithmetic is defined only in an array
object including one past its last element.

int a[10];
int *p;

p = a+10;    /* okay */
*p;          /* UB */
p = a-1;     /* UB */

--

Dept. of Physics, Univ. of Seoul



Tue, 05 Apr 2005 01:11:33 GMT  
 reverse string - UB

Quote:



>>> A few days back this was posted as a solution for revsering a string
>>> #include <stdio.h>
>>> char str[] = "ABCDE";
>>> void rev(char *);
>>> int main(void) {
>>>   printf("Before: %s\n", str);
>>>   rev(str);
>>>   printf(" After: %s\n", str);
>>>   return 0;
>>> }
>>> void rev(char *s) {
>>>   char *d = s;
>>>   char c;
>>>   while (*d) ++d;
>>>   --d;
>>>   while (d > s) {
>>>     c = *s;
>>>     *s++ = *d;
>>>     *d-- = c;    
>>>   }
>>> }

>>> it was pointed out that if *s = '\0' then it is invokes UB.am not able
>>> to get how? i was thinking that d > s would be false and the loop
>>> would terminate. is it not guaranteed?
>>> -sridhar

>> If *s == '\0', the while loop will be skipped, thus keeping d pointing
>> at the same address as s. --d will then make d point to the previous
>> byte, which quite probably does not belong to your program's memory
>> space. There's the undefined behaviour.

> As long as d isn't dereferenced, I don't see how this is UB.

According to my understanding of section 6.5.6 of the standard, it is
undefined behavior even if d is not dereferenced. Specifically:

| (8) When an expression that has integer type is added to or
| subtracted from a pointer, the result has the type of the pointer
| operand. [...] If both the pointer operand and the result point to
                                                 ^^^^^^^^^^
| elements of the same array object, or one past the last element of
| the array object, the evaluation shall not produce an overflow;
| otherwise, the behavior is undefined. If the result points one past
| the last element of the array object, it shall not be used as the
| operand of a unary * operator that is evaluated.

Since the result of the expression --d points before the object, I
think it invokes UB.

Martin



Tue, 05 Apr 2005 01:36:47 GMT  
 reverse string - UB

[ snip ]

Quote:
>>If *s == '\0', the while loop will be skipped, thus keeping d pointing
>>at the same address as s. --d will then make d point to the previous
>>byte, which quite probably does not belong to your program's memory
>>space. There's the undefined behaviour.

> As long as d isn't dereferenced, I don't see how this is UB.

At the risk of grossly oversimplifying things, pointer arithmetic
in C is only permitted within array objects.  (Regions of memory
set aside by *alloc() are considered to be arrays for this purpose,
and non-array objects are special cases of "arrays of one element.")

With that in mind, you can use pointer arithmetic to walk through
an array, and you are also allowed to use said arithmetic to create
a pointer that points *one element past* the end of an array
(though you are not allowed to dereference such a pointer, for
obvious reasons.)  With any pointer arithmetic beyond that, all
bets are off.  This includes cases where you have a pointer that
points to one element "before" the start of an array, which is what
we are dealing with in this case.

Incidentally, this is the reason that the following (ugly) code is
not legal C:

     int a[10];
     int *ptr_a = a - 1;

     /* ptr_a can now be used to treat a as a one-based array */
     ptr_a[1] = ...

This "clever" trick was commonly-used in early editions of the
book "Numerical Recipes in C."  This book is a great reference for
mathematical algorithms, but not a particularly good reference for
the C programming language.

Regards,

--
Chris Engebretson - Raytheon Systems Company | Ph#: (605)594-6829
USGS EROS Data Center, Sioux Falls, SD 57198 | Fax: (605)594-6940
Software Project Lead -- Landsat 7 Level 1 Sustaining Engineering



Tue, 05 Apr 2005 01:26:50 GMT  
 reverse string - UB

Quote:
> Incidentally, this is the reason that the following (ugly) code is
> not legal C:
>      int a[10];
>      int *ptr_a = a - 1;
>      /* ptr_a can now be used to treat a as a one-based array */
>      ptr_a[1] = ...
> This "clever" trick was commonly-used in early editions of the
> book "Numerical Recipes in C."  This book is a great reference for
> mathematical algorithms, but not a particularly good reference for
> the C programming language.

We once had a glorious flamewar about that thing... it was called
"Different array indexing" or something like that.

--

| Kingpriest of "The Flying Lemon Tree" G++ FR FW+ M- #108 D+ ADA N+++|
| http://www.helsinki.fi/~palaste       W++ B OP+                     |
\----------------------------------------- Finland rules! ------------/
"And according to Occam's Toothbrush, we only need to optimise the most frequent
instructions."
   - Teemu Kerola



Tue, 05 Apr 2005 01:39:39 GMT  
 reverse string - UB



Quote:



> >>> A few days back this was posted as a solution for revsering a string
> >>> #include <stdio.h>
> >>> char str[] = "ABCDE";
> >>> void rev(char *);
> >>> int main(void) {
> >>>   printf("Before: %s\n", str);
> >>>   rev(str);
> >>>   printf(" After: %s\n", str);
> >>>   return 0;
> >>> }
> >>> void rev(char *s) {
> >>>   char *d = s;
> >>>   char c;
> >>>   while (*d) ++d;
> >>>   --d;
> >>>   while (d > s) {
> >>>     c = *s;
> >>>     *s++ = *d;
> >>>     *d-- = c;
> >>>   }
> >>> }

> >>> it was pointed out that if *s = '\0' then it is invokes UB.am not able
> >>> to get how? i was thinking that d > s would be false and the loop
> >>> would terminate. is it not guaranteed?
> >>> -sridhar

> >> If *s == '\0', the while loop will be skipped, thus keeping d pointing
> >> at the same address as s. --d will then make d point to the previous
> >> byte, which quite probably does not belong to your program's memory
> >> space. There's the undefined behaviour.

> > As long as d isn't dereferenced, I don't see how this is UB.

> IHNRTS, but AFAIK even forming a non-null pointer into non-accessible
> memory invokes UB. But then, EICBW.
> I suppose the real gurus on this newsgroup (Heathfield, Pfaff, Pop,
> Torek and a bunch of others) can shed more light on this.

----------------------------------------------------------------

ISO/IEC 9899:1999 (E)

  6.5.3.1 Prefix increment and decrement operators

  Constraints

1 The operand of the prefix increment or decrement operator
  shall have qualified or unqualified real or pointer type
  and shall be a modifiable lvalue.

  Semantics

2 The value of the operand of the prefix ++ operator is
  incremented. The result is the new value of the operand
  after incrementation. The expression ++E is equivalent to   <<=====
  (E+=1). See the discussions of additive operators and
  compound assignment for information on constraints, types,
  side effects, and conversions and the effects of
  operations on pointers.

3 The prefix -- operator is analogous to the prefix ++
  operator, except that the value of the operand is
  decremented.

  Forward references: additive operators (6.5.6), compound
  assignment (6.5.16.2).

  6.5.6 Additive operators

8 When an expression that has integer type is added to or
  subtracted from a pointer, the result has the type of the
  pointer operand. If the pointer operand points to an
  element of an array object, and the array is large enough,
  the result points to an element offset from the original
  element such that the difference of the subscripts of the
  resulting and original array elements equals the integer
  expression. In other words, if the expression P points to
  the i-th element of an array object, the expressions (P)+N
  (equivalently, N+(P)) and (P)-N (where N has the value n)
  point to, respectively, the i+n-th and i-n-th elements of
  the array object, provided they exist. Moreover, if the
  expression P points to the last element of an array object,
  the expression (P)+1 points one past the last element of
  the array object, and if the expression Q points one past
  the last element of an array object, the expression (Q)-1
  points to the last element of the array object. If both the   <<======
  pointer operand and the result point to elements of the same
  array object, or one past the last element of the array
  object, the evaluation shall not produce an overflow;
  otherwise, the behavior is undefined. If the result points
  one past the last element of the array object, it shall not
  be used as the operand of a unary * operator that is evaluated.

  6.5.16.2 Compound assignment

  Constraints

1 For the operators += and -= only, either the left operand
  shall be a pointer to an object type and the right shall
  have integer type, or the left operand shall have
  qualified or unqualified arithmetic type and the right
  shall have arithmetic type.

2 For the other operators, each operand shall have
  arithmetic type consistent with those allowed by the
  corresponding binary operator.

  Semantics

3 A compound assignment of the form E1 op =E2 differs from
  the simple assignment expression E1 = E1 op (E2) only in
  that the lvalue E1 is evaluated only once.

----------------------------------------------------------------

-Mike



Tue, 05 Apr 2005 01:54:08 GMT  
 reverse string - UB

Quote:



> [ snip ]

>>> If *s == '\0', the while loop will be skipped, thus keeping d pointing
>>> at the same address as s. --d will then make d point to the previous
>>> byte, which quite probably does not belong to your program's memory
>>> space. There's the undefined behaviour.

>> As long as d isn't dereferenced, I don't see how this is UB.

> At the risk of grossly oversimplifying things, pointer arithmetic
> in C is only permitted within array objects.  

Also, if you have a pointer to any object, you can cast
this pointer to char* (or maybe you need a char pointer...)
and increment it. This does not allow you to treat the
object as a char array, but you can walk the array char by
char (or byte by byte as the case may be).

--
Thomas.



Tue, 05 Apr 2005 01:05:30 GMT  
 reverse string - UB

Quote:

> A few days back this was posted as a solution for revsering a string
> #include <stdio.h>
> char str[] = "ABCDE";
> void rev(char *);
> int main(void) {
>   printf("Before: %s\n", str);
>   rev(str);
>   printf(" After: %s\n", str);
>   return 0;
> }
> void rev(char *s) {
>   char *d = s;
>   char c;
>   while (*d) ++d;
>   --d;
>   while (d > s) {
>     c = *s;
>     *s++ = *d;
>     *d-- = c;
>   }
> }

> it was pointed out that if *s = '\0' then it is invokes UB.am not able
> to get how? i was thinking that d > s would be false and the loop
> would terminate. is it not guaranteed?

No, it isn't.

If *s is 0, then while(*d) ++d; won't affect d at all, and --d will
therefore decrement the pointer, so that it points to the byte *before*
the start of s. Okay, enough of the empty string. Let's focus on the
behaviour question now, which is: "is the behaviour of a program defined
by the Standard if we point a pointer to the place just *before* the
start of an object?", and to answer that question, let's pick a string
we can see: "hello".

Now, s is a pointer to a character, but in practice we can deduce that
it's a pointer to the first element of an array of characters. Let's
draw that:

+------+------+------+------+------+------+------+------+------+
|      |      |      |      |      |      |      |      |      |
| ???  |  h   |  e   |  l   |  l   |  o   | \0   | ???  | ???  |
|      |      |      |      |      |      |      |      |      |
+------+------+------+------+------+------+------+------+------+
          ^
          |
          s

Now, the Standard defines the behaviour if d points at the same place as
s, or at any place up to the end of that array, or one past the end:

+------+------+------+------+------+------+------+------+------+
|      |      |      |      |      |      |      |      |      |
| ???  |  h   |  e   |  l   |  l   |  o   | \0   | ???  | ???  |
|      |      |      |      |      |      |      |      |      |
+------+------+------+------+------+------+------+------+------+
          ^
  [1]     |                                               [2]
          s
        <<<---- behaviour defined for d pointing ---->>>
                    to any of these characters.

But if d points to the character marked [1], or the character marked
[2], the Standard does not define the behaviour of the program.
Therefore, the behaviour is undefined.

I can't give you a citation because there isn't one, except possibly
this extract from the section on Conformance:

If a shall or shall not requirement that appears outside of a
constraint is violated, the behavior is undefined. Undefined behavior is
otherwise indicated in this International Standard by the words
undefined behavior or by the omission of any explicit definition of
behavior. There is no difference in emphasis among these three; they all
describe behavior that is undefined.

HTH.

--

"Usenet is a strange place." - Dennis M Ritchie, 29 July 1999.
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
K&R answers, C books, etc: http://users.powernet.co.uk/eton



Tue, 05 Apr 2005 01:43:04 GMT  
 reverse string - UB


Quote:


>[ snip ]

>>>If *s == '\0', the while loop will be skipped, thus keeping d pointing
>>>at the same address as s. --d will then make d point to the previous
>>>byte, which quite probably does not belong to your program's memory
>>>space. There's the undefined behaviour.

>> As long as d isn't dereferenced, I don't see how this is UB.

>At the risk of grossly oversimplifying things, pointer arithmetic
>in C is only permitted within array objects.  (Regions of memory
>set aside by *alloc() are considered to be arrays for this purpose,
>and non-array objects are special cases of "arrays of one element.")

This brings up an interesting question: is one allowed to produce a
pointer "one past" a non-array object?  E.g.,

    int x;
    int * p = &x;

    ++p;  /*  allowed?  */

(I realize that one can't dereference the pointer, a separate issue.)

--Ben

--



Tue, 05 Apr 2005 02:22:44 GMT  
 reverse string - UB

Quote:

> This brings up an interesting question: is one allowed to produce a
> pointer "one past" a non-array object?  E.g.,

>    int x;
>    int * p = &x;

>    ++p;  /*  allowed?  */

> (I realize that one can't dereference the pointer, a separate issue.)

Yes. Section 6.5.6:

| (7) For the purposes of these operators, a pointer to an object that
| is not an element of an array behaves the same as a pointer to the
| first element of an array of length one with the type of the object
| as its element type.

Martin



Tue, 05 Apr 2005 02:49:58 GMT  
 reverse string - UB

...

Quote:

> As long as d isn't dereferenced, I don't see how this is UB.

Thank you all, for clearing this up. I thought that only
comparison of pointers was restricted to array bounds but not
all pointer arithmetic.

Thanx again.

--

"LISP  is worth learning for  the profound enlightenment  experience
you will have when you finally get it; that experience will make you
a better programmer for the rest of your days."   -- Eric S. Raymond



Tue, 05 Apr 2005 15:35:28 GMT  
 
 [ 25 post ]  Go to page: [1] [2]

 Relevant Pages 

1. String literals and UB

2. Reversing strings in place with pointers

3. reversing strings

4. REVERSE A STRING ?

5. Smarter compare (string reverse?)

6. Reversing string with pointers

7. Help - reverse string

8. Reversing a string

9. Reversing a string in place with pointers

10. Reversing strings in place with pointers

11. Parsing in reverse through a string?

12. Reversing a string word by word

 

 
Powered by phpBB® Forum Software