String literals and UB 

Is there any undefined behaviour in the last two lines of this code?

char *a = "aaa\0bbb";
char *b = "aaa";
char *c = "bbb";

if (a == b) printf ("b %c%c%c \n", b[4], b[5], b[6]);
if (a+4 == c) printf ("c %c%c%c \n", c[-4], c[-3], c[-2]);

If the conditions are true, the first should print "b bbb\n" and the second
should print "c aaa\n".

--
Simon.



Wed, 19 May 2004 02:23:11 GMT  
 String literals and UB

Quote:

> Is there any undefined behaviour in the last two lines of this code?

No.

Quote:
> char *a = "aaa\0bbb";
> char *b = "aaa";
> char *c = "bbb";

> if (a == b) printf ("b %c%c%c \n", b[4], b[5], b[6]);
> if (a+4 == c) printf ("c %c%c%c \n", c[-4], c[-3], c[-2]);

> If the conditions are true, the first should print "b bbb\n" and the second
> should print "c aaa\n".

--
 pete


Thu, 20 May 2004 20:35:30 GMT  
 String literals and UB


Quote:
> Is there any undefined behaviour in the last two lines of this code?

> char *a = "aaa\0bbb";
> char *b = "aaa";
> char *c = "bbb";

Correct, but since a string literal is read-only, it should be written:

char const *a = "aaa\0bbb";
char const *b = "aaa";
char const *c = "bbb";

and personally, I would do ...

char const *const a = "aaa\0bbb";
char const *const b = "aaa";
char const *const c = "bbb";

... until further notice.

Quote:
> if (a == b) printf ("b %c%c%c \n", b[4], b[5], b[6]);

Correct.

Quote:
> if (a+4 == c) printf ("c %c%c%c \n", c[-4], c[-3], c[-2]);

Correct.

Quote:
> If the conditions are true, the first should print "b bbb\n" and the
> second should print "c aaa\n".

BC 3.1 with "duplicate strings merged" option deactivated:
D:\CLC\B\BIBER>bc proj.prj
[nothing]

BC 3.1 with "duplicate strings merged" option activated:
D:\CLC\B\BIBER>bc proj.prj
b bbb
c aaa

Test code:
#include <stdio.h>
int main (void)
{
   char const *const a = "aaa\0bbb";
   char const *const b = "aaa";
   char const *const c = "bbb";

   if (a == b)
   {
      printf ("b %c%c%c \n", b[4], b[5], b[6]);
   }

   if (a + 4 == c)
   {
      printf ("c %c%c%c \n", c[-4], c[-3], c[-2]);
   }
   return 0;
}

--
-ed- emdel at noos.fr
c.l.c.-FAQ http://www.eskimo.com/~scs/C-faq/top.html
C-library: http://www.dinkumware.com/htm_cl/index.html
FAQ de f.c.l.c : http://www.isty-info.uvsq.fr/~rumeau/fclc/


Thu, 20 May 2004 23:43:24 GMT  
 String literals and UB


Quote:
>> Is there any undefined behaviour in the last two lines of this code?

>> char *a = "aaa\0bbb";
>> char *b = "aaa";
>> char *c = "bbb";

>Correct, but since a string literal is read-only, it should be written:
>char const *a = "aaa\0bbb";

Strictly, "can be" rather than "should be", I think. The use of const
here is not required but could be considered a handy safety mechanism.

Mind you, from reviewing your past postings, you seem to be
slightly paranoid about using const everywhere...

Quote:
>and personally, I would do ...

>char const *const a = "aaa\0bbb";

...as this demonstrates!!

--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>



Fri, 21 May 2004 00:31:32 GMT  
 String literals and UB

Quote:

>Is there any undefined behaviour in the last two lines of this code?

>char *a = "aaa\0bbb";
>char *b = "aaa";
>char *c = "bbb";

>if (a == b) printf ("b %c%c%c \n", b[4], b[5], b[6]);
>if (a+4 == c) printf ("c %c%c%c \n", c[-4], c[-3], c[-2]);

>If the conditions are true, the first should print "b bbb\n" and the second
>should print "c aaa\n".

I see no undefined behavior here, because if c == a+4, then c[-4]
is the same as a[0].

I would be somewhat surprised if a compiler managed to make b and
c *both* equal to a and a+4.  When I wrote up an algorithm to merge
strings -- not for a C compiler, but for the same kind of effect
that a C compiler might achieve by sharing "hello world" with plain
"world" -- I did it by working backwards from the terminating '\0'.
Since all string literals end in this 0, it is a simple matter of
matching the strings backwards until one of them ends.  The algorithm
is therefore linear in the number of string literals involved (or
better, if you make a tree or hash table from the reversed strings).

This algorithm will make c point to a+4, but will make a separate
string literal for b to point to, because {'a', 'a', 'a', 0} does
not match backwards from {0, 'b', ...}.
--
In-Real-Life: Chris Torek, Wind River Systems (BSD engineering)





Fri, 21 May 2004 00:10:43 GMT  
 String literals and UB


Quote:
>Is there any undefined behaviour in the last two lines of this code?

Yes, if the conditions are true you are accessing outside the bounds of
the defined objects.

Quote:
>char *a = "aaa\0bbb";
>char *b = "aaa";
>char *c = "bbb";

>if (a == b) printf ("b %c%c%c \n", b[4], b[5], b[6]);
>if (a+4 == c) printf ("c %c%c%c \n", c[-4], c[-3], c[-2]);

>If the conditions are true, the first should print "b bbb\n" and the second
>should print "c aaa\n".

That will probably be the case on most implementations. However, it is
perfectly within a compiler's rights to store bounds information with a
pointer, i.e. b and c can "know" that they are pointing to 4-byte objects
and cause traps if they are used to access outside this range. A compiler
can even use some sort of "short" indexing if it knows the object is
small enough, and going outside that range can then produce incorrect
results.



Thu, 20 May 2004 22:03:29 GMT  
 String literals and UB


Quote:
> Mind you, from reviewing your past postings, you seem to be
> slightly paranoid about using const everywhere...

>>and personally, I would do ...

>>char const *const a = "aaa\0bbb";

> ...as this demonstrates!!

Yes, I am, and it has saved my life a great number of times!

--
-ed- emdel at noos.fr
c.l.c.-FAQ http://www.eskimo.com/~scs/C-faq/top.html
C-library: http://www.dinkumware.com/htm_cl/index.html
FAQ de f.c.l.c : http://www.isty-info.uvsq.fr/~rumeau/fclc/



Fri, 21 May 2004 01:58:21 GMT  
 String literals and UB


Quote:
>> Mind you, from reviewing your past postings, you seem to be
>> slightly paranoid about using const everywhere...

>>>and personally, I would do ...

>>>char const *const a = "aaa\0bbb";

>> ...as this demonstrates!!

>Yes, I am, and it has saved my life a great number of times!

Probably depends how much you use string literals. I don't use them
too much, tend to remember they're literal anyway, and thus don't need
to protect myself.

In fact it's fair to say I don't use const that much at all in C.
Generally, if I need a real constant, I use a macro, and I'm
not big on writing functions whose arguments are not allowed to be
modified.

--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>



Fri, 21 May 2004 05:11:58 GMT  
 String literals and UB


Quote:
>I see no undefined behavior here, because if c == a+4, then c[-4]
>is the same as a[0].

Quote:
>Yes, if the conditions are true you are accessing outside the bounds of
>the defined objects.

Argh! Who to believe???

--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>



Fri, 21 May 2004 05:13:35 GMT  
 String literals and UB


Quote:
> Probably depends how much you use string literals. I don't use them
> too much, tend to remember they're literal anyway, and thus don't need
> to protect myself.

I use them a lot (a sort of AT command interpreter, things like that),
and in traces...

Quote:
> In fact it's fair to say I don't use const that much at all in C.
> Generally, if I need a real constant, I use a macro, and I'm
> not big on writing functions whose arguments are not allowed to be
> modified.

In C, const doesn't mean constant but read-only. I consider it a
design checker.

--
-ed- emdel at noos.fr
c.l.c.-FAQ http://www.eskimo.com/~scs/C-faq/top.html
C-library: http://www.dinkumware.com/htm_cl/index.html
FAQ de f.c.l.c : http://www.isty-info.uvsq.fr/~rumeau/fclc/



Fri, 21 May 2004 05:25:35 GMT  
 String literals and UB


Quote:
>> Generally, if I need a real constant, I use a macro, and I'm
>> not big on writing functions whose arguments are not allowed to be
>> modified.

>In C, const doesn't mean constant but read-only.

- a constant can't be changed
- a const object can't be changed.... hmmm ....
Difficult to define the obvious behavioural difference between those
to someone writing C.

I understand what you mean, though, and since I rarely use read-only
objects, I rarely use const.

--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>



Fri, 21 May 2004 05:37:03 GMT  
 String literals and UB

Quote:
>>> Generally, if I need a real constant, I use a macro, and I'm
>>> not big on writing functions whose arguments are not allowed to be
>>> modified.

>>In C, const doesn't mean constant but read-only.

>- a constant can't be changed
>- a const object can't be changed.... hmmm ....
>Difficult to define the obvious behavioural difference between those
>to someone writing C.

An auto const object is instantiated and destroyed each time the block
is executed afresh, and can be initialized with a different value each
time.


Fri, 21 May 2004 06:19:31 GMT  
 String literals and UB

Quote:
>>>In C, const doesn't mean constant but read-only.

>>- a constant can't be changed
>>- a const object can't be changed.... hmmm ....
>>Difficult to define the obvious behavioural difference between those
>>to someone writing C.
> An auto const object is instantiated and destroyed each time the block
> is executed afresh, and can be initialized with a different value each
> time.

and don't forget volatile const objects, which can change values all over
the place.

--
 /"\                                                 m i k e   b u r r e l l

  X        AGAINST HTML MAIL,
 / \      AND NEWS TOO, dammit



Fri, 21 May 2004 06:45:16 GMT  
 String literals and UB

Quote:
>>>>In C, const doesn't mean constant but read-only.

>>>- a constant can't be changed
>>>- a const object can't be changed.... hmmm ....
>>>Difficult to define the obvious behavioural difference between those
>>>to someone writing C.

>> An auto const object is instantiated and destroyed each time the block
>> is executed afresh, and can be initialized with a different value each
>> time.

AFAICT this is a new constant being created each time, with
coincidentally the same name.

Quote:
>and don't forget volatile const objects, which can change values all over
>the place.

But then with volatile all bets are off, aren't they?

Anyway, I'm well aware that const and constant are quite different. My
point was that it's a little hard to explain to someone without diving
into deep detail, such as the above...
--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>



Fri, 21 May 2004 08:14:19 GMT  
 String literals and UB

Quote:
>>I see no undefined behavior here, because if c == a+4, then c[-4]
>>is the same as a[0].

>>Yes, if the conditions are true you are accessing outside the bounds of
>>the defined objects.

Quote:
>Argh! Who to believe???

Lawrence Kirby is talking about the latitude compilers have to
make assumptions about pointers.  I am not sure it applies in
this case -- in fact, I am fairly sure it does *not* apply.

The "usual" (although I have never observed it myself, which some
might say makes it rare at best :-) ) case where this might matter
is in pointers that point to array objects.  Consider:

    unsigned char chessboard[8][8];
    unsigned char (*boardp)[8] = &chessboard[0];

Each board position in boardp[i][j], where 0 <= i < 8 and 0 <= j < 8, denotes
a square on the board.  We "know" (from C arithmetic and object
properties) that chessboard[][] is exactly 64 contiguous bytes,
and something like memcpy() can copy the board to or from a 64-byte
buffer elsewhere.

This implies that boardp[0][k], where 0 <= k < 64, "ought" in some
sense to access the k'th byte in that flat region of memory.  The
C standard implies, however, that a compiler is allowed to "know"
-- via boardp's type -- that boardp[0] itself is only 8 bytes long,
and therefore assume that k is between 0 and 7 inclusive -- and in
turn, generate code that fails at runtime if k is outside this
range.

We might say, then, that &boardp[i][0], for any valid i, is allowed
to be viewed as having a "limitation" of pointing to (the first
of) exactly 8 "char"s.  This "limitation", as I call it, arises
not from the actual target object(s) involved, but from the type
of the variable "boardp".

In this case, however, we had something like this:

    char *a = "aaa\0bbb";
    char *b = "aaa";
    char *c = "bbb";

This produces one to three array objects (depending on how many
string literals are merged) and three pointers, each of type
"char *".  I think the only limitation, as it were, that a compiler
may attach to all three pointer objects is that they point to (the
first of zero or more) "char"(s).

I have cross-posted this to comp.std.c, where the readers of Standard
Tea Leaves may ponder the fine print and attempt to unscrew the
inscrutable. :-)
--
In-Real-Life: Chris Torek, Wind River Systems (BSD engineering)





Fri, 21 May 2004 08:11:59 GMT  
 