Author |
Message |
Simon Bibe #1 / 24
|
 String literals and UB
Is there any undefined behaviour in the last two lines of this code?

char *a = "aaa\0bbb";
char *b = "aaa";
char *c = "bbb";
if (a == b) printf ("b %c%c%c \n", b[4], b[5], b[6]);
if (a+4 == c) printf ("c %c%c%c \n", c[-4], c[-3], c[-2]);

If the conditions are true, the first should print "b bbb\n" and the second should print "c aaa\n".

-- Simon.
|
Wed, 19 May 2004 02:23:11 GMT |
|
 |
pete #2 / 24
|
 String literals and UB
Quote:
> Is there any undefined behaviour in the last two lines of this code?
No.
Quote:
> char *a = "aaa\0bbb";
> char *b = "aaa";
> char *c = "bbb";
> if (a == b) printf ("b %c%c%c \n", b[4], b[5], b[6]);
> if (a+4 == c) printf ("c %c%c%c \n", c[-4], c[-3], c[-2]);
> If the conditions are true, the first should print "b bbb\n" and the second
> should print "c aaa\n".
-- pete
|
Thu, 20 May 2004 20:35:30 GMT |
|
 |
Emmanuel Delahay #3 / 24
|
 String literals and UB
Quote:
> Is there any undefined behaviour in the last two lines of this code?
> char *a = "aaa\0bbb";
> char *b = "aaa";
> char *c = "bbb";
Correct, but since a string literal is read-only, it should be written:

char const *a = "aaa\0bbb";
char const *b = "aaa";
char const *c = "bbb";

and personally, I would do ...

char const *const a = "aaa\0bbb";
char const *const b = "aaa";
char const *const c = "bbb";

... until further notice.
Quote:
> if (a == b) printf ("b %c%c%c \n", b[4], b[5], b[6]);
Correct.
Quote:
> if (a+4 == c) printf ("c %c%c%c \n", c[-4], c[-3], c[-2]);
Correct.
Quote:
> If the conditions are true, the first should print "b bbb\n" and the
> second should print "c aaa\n".
BC 3.1 with "duplicate strings merged" option deactivated:

D:\CLC\B\BIBER>bc proj.prj
[nothing]

BC 3.1 with "duplicate strings merged" option activated:

D:\CLC\B\BIBER>bc proj.prj
b bbb
c aaa

Test code:

#include <stdio.h>

int main (void)
{
   char const *const a = "aaa\0bbb";
   char const *const b = "aaa";
   char const *const c = "bbb";

   if (a == b)
   {
      printf ("b %c%c%c \n", b[4], b[5], b[6]);
   }

   if (a + 4 == c)
   {
      printf ("c %c%c%c \n", c[-4], c[-3], c[-2]);
   }
   return 0;
}
-- -ed- emdel at noos.fr c.l.c.-FAQ http://www.eskimo.com/~scs/C-faq/top.html C-library: http://www.dinkumware.com/htm_cl/index.html FAQ de f.c.l.c : http://www.isty-info.uvsq.fr/~rumeau/fclc/
|
Thu, 20 May 2004 23:43:24 GMT |
|
 |
Mark McIntyr #4 / 24
|
 String literals and UB
Quote:
>> Is there any undefined behaviour in the last two lines of this code?
>> char *a = "aaa\0bbb";
>> char *b = "aaa";
>> char *c = "bbb";
>Correct, but a string literal being read-only, it should be written:
>char const *a = "aaa\0bbb";
Strictly "can be" rather than "should be", I think. The use of const here is not required but could be considered a handy safety mechanism. Mind you, from reviewing your postings in the past you seem to be slightly paranoid about using const everywhere...
Quote:
>and personally, I would do ...
>char const *const a = "aaa\0bbb";
....as this demonstrates !!
-- Mark McIntyre CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
|
Fri, 21 May 2004 00:31:32 GMT |
|
 |
Chris Tore #5 / 24
|
 String literals and UB
Quote:
>Is there any undefined behaviour in the last two lines of this code?
>char *a = "aaa\0bbb";
>char *b = "aaa";
>char *c = "bbb";
>if (a == b) printf ("b %c%c%c \n", b[4], b[5], b[6]);
>if (a+4 == c) printf ("c %c%c%c \n", c[-4], c[-3], c[-2]);
>If the conditions are true, the first should print "b bbb\n" and the second
>should print "c aaa\n".
I see no undefined behavior here, because if c == a+4, then c[-4] is the same as a[0].

I would be somewhat surprised if a compiler managed to make b and c *both* equal to a and a+4. When I wrote up an algorithm to merge strings -- not for a C compiler, but for the same kind of effect that a C compiler might achieve by sharing "hello world" with plain "world" -- I did it by working backwards from the terminating '\0'. Since all string literals end in this 0, it is a simple matter of matching the strings backwards until one of them ends. The algorithm is therefore linear in the number of string literals involved (or better, if you make a tree or hash table from the reversed-strings).

This algorithm will make c point to a+4, but will make a separate string literal for b to point to, because {'a', 'a', 'a', 0} does not match backwards from {0, 'b', ...}.
-- In-Real-Life: Chris Torek, Wind River Systems (BSD engineering)
|
Fri, 21 May 2004 00:10:43 GMT |
|
 |
Lawrence Kir #6 / 24
|
 String literals and UB
Quote:
>Is there any undefined behaviour in the last two lines of this code?
Yes, if the conditions are true you are accessing outside the bounds of the defined objects.
Quote:
>char *a = "aaa\0bbb";
>char *b = "aaa";
>char *c = "bbb";
>if (a == b) printf ("b %c%c%c \n", b[4], b[5], b[6]);
>if (a+4 == c) printf ("c %c%c%c \n", c[-4], c[-3], c[-2]);
>If the conditions are true, the first should print "b bbb\n" and the second
>should print "c aaa\n".
That will probably be the case on most implementations. However, it is perfectly within a compiler's rights to store bounds information with a pointer, i.e. b and c can "know" that they are pointing to 4-byte objects and cause traps if they are used to access outside this range. A compiler can even use some sort of "short" indexing if it knows the size of the object is small enough, and going outside that range can produce incorrect results.
|
Thu, 20 May 2004 22:03:29 GMT |
|
 |
Emmanuel Delahay #7 / 24
|
 String literals and UB
Quote:
> Mind you from reviewing your postings in the past you seem to be
> slightly paranoid about using const everywhere...
>>and personnaly, I would do ...
>>char const *const a = "aaa\0bbb";
> ....as this demonstrates !!
Yes I am, and it saved my life a great number of times! -- -ed- emdel at noos.fr c.l.c.-FAQ http://www.eskimo.com/~scs/C-faq/top.html C-library: http://www.dinkumware.com/htm_cl/index.html FAQ de f.c.l.c : http://www.isty-info.uvsq.fr/~rumeau/fclc/
|
Fri, 21 May 2004 01:58:21 GMT |
|
 |
Mark McIntyr #8 / 24
|
 String literals and UB
Quote:
>> Mind you from reviewing your postings in the past you seem to be
>> slightly paranoid about using const everywhere...
>>>and personnaly, I would do ...
>>>char const *const a = "aaa\0bbb";
>> ....as this demonstrates !!
>Yes I am, and it saved my life a great number of times!
Probably depends how much you use string literals. I don't use them too much, tend to remember they're literal anyway, and thus don't need to protect myself.

In fact it's fair to say I don't use const that much at all in C. Generally if I have a need for a real constant I use a macro, and I'm not big on writing functions whose arguments are not allowed to be modified.
-- Mark McIntyre CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
|
Fri, 21 May 2004 05:11:58 GMT |
|
 |
Mark McIntyr #9 / 24
|
 String literals and UB
Quote: >I see no undefined behavior here, because if c == a+4, then c[-4] >is the same as a[0].
Quote:
>Yes, if the conditions are true you are accessing outside the bounds of >the defined objects.
Argh! Who to believe ??? -- Mark McIntyre CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
|
Fri, 21 May 2004 05:13:35 GMT |
|
 |
Emmanuel Delahay #10 / 24
|
 String literals and UB
Quote:
> Probably depends how much you use string literals. I don't use them
> too much, tend to remember they're literal anyway, and thus don't need
> to protect myself.
I use them a lot (sort of AT commands interpreter, things like that...), traces...
Quote:
> In fact its fair to say I don't use const that much at all in C.
> Generally if I have a need for a real constant I use a macro, and I'm
> not big on writing functions whose arguments are not allowed to be
> modified.
In C, const doesn't mean constant but read-only. I consider it like a design checker.
-- -ed- emdel at noos.fr c.l.c.-FAQ http://www.eskimo.com/~scs/C-faq/top.html C-library: http://www.dinkumware.com/htm_cl/index.html FAQ de f.c.l.c : http://www.isty-info.uvsq.fr/~rumeau/fclc/
|
Fri, 21 May 2004 05:25:35 GMT |
|
 |
Mark McIntyr #11 / 24
|
 String literals and UB
Quote:
>> Generally if I have a need for a real constant I use a macro, and I'm
>> not big on writing functions whose arguments are not allowed to be
>> modified.
>In C, const doesn't mean constant but read-only.
- a constant can't be changed
- a const object can't be changed.... hmmm ....

Difficult to define the obvious behavioural difference between those to someone writing C. I understand what you mean tho, and since I rarely use readonly objects I rarely use const..
-- Mark McIntyre CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
|
Fri, 21 May 2004 05:37:03 GMT |
|
 |
Kaz Kylhe #12 / 24
|
 String literals and UB
Quote:
>>> Generally if I have a need for a real constant I use a macro, and I'm
>>> not big on writing functions whose arguments are not allowed to be
>>> modified.
>>In C, const doesn't mean constant but read-only.
>- a constant can't be changed
>- a const object can't be changed.... hmmm ....
>Difficult to define the obvious behavioural difference between those
>to someone writing C.
An auto const object is instantiated and destroyed each time the block is executed afresh, and can be initialized with a different value each time.
|
Fri, 21 May 2004 06:19:31 GMT |
|
 |
mike burrel #13 / 24
|
 String literals and UB
Quote:
>>>In C, const doesn't mean constant but read-only. >>- a constant can't be changed >>- a const object can't be changed.... hmmm .... >>Difficult to define the obvious behavioural difference between those >>to someone writing C. > An auto const object is instantiated and destroyed each time the block > is executed afresh, and can be initialized with a different value each > time.
and don't forget volatile const objects, which can change values all over the place. -- /"\ m i k e b u r r e l l
X AGAINST HTML MAIL, / \ AND NEWS TOO, dammit
|
Fri, 21 May 2004 06:45:16 GMT |
|
 |
Mark McIntyr #14 / 24
|
 String literals and UB
Quote:
>>>>In C, const doesn't mean constant but read-only.
>>>- a constant can't be changed
>>>- a const object can't be changed.... hmmm ....
>>>Difficult to define the obvious behavioural difference between those
>>>to someone writing C.
>> An auto const object is instantiated and destroyed each time the block
>> is executed afresh, and can be initialized with a different value each
>> time.
AFAICT this is a new constant being created each time, with coincidentally the same name.
Quote:
>and don't forget volatile const objects, which can change values all over
>the place.
But then with volatile all bets are off, aren't they?

Anyway, I'm well aware that const and constant are quite different. My point was, it's a little hard to explain to someone without diving into deep detail, such as above.....
-- Mark McIntyre CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
|
Fri, 21 May 2004 08:14:19 GMT |
|
 |
Chris Tore #15 / 24
|
 String literals and UB
Quote:
>>I see no undefined behavior here, because if c == a+4, then c[-4] >>is the same as a[0].
>>Yes, if the conditions are true you are accessing outside the bounds of >>the defined objects.
Quote: >Argh! Who to believe ???
Lawrence Kirby is talking about the latitude compilers have to make assumptions about pointers. I am not sure it applies in this case -- in fact, I am fairly sure it does *not* apply.

The "usual" (although I have never observed it myself, which some might say makes it rare at best :-) ) case where this might matter is in pointers that point to array objects. Consider:

unsigned char chessboard[8][8];
unsigned char (*boardp)[8] = &chessboard[0];

Each board position in boardp[i][j], where 0 <= (i,j) < 8, denotes a square on the board. We "know" (from C arithmetic and object properties) that chessboard[][] is exactly 64 contiguous bytes, and something like memcpy() can copy the board to or from a 64-byte buffer elsewhere. This implies that boardp[0][k], where 0 <= k < 64, "ought" in some sense to access the k'th byte in that flat region of memory.

The C standard implies, however, that a compiler is allowed to "know" -- via boardp's type -- that boardp[0] itself is only 8 bytes long, and therefore assume that k is between 0 and 7 inclusive -- and in turn, generate code that fails at runtime if k is outside this range. We might say, then, that &boardp[i][0], for any valid i, is allowed to be viewed as having a "limitation" of pointing to (the first of) exactly 8 "char"s. This "limitation", as I call it, arises not from the actual target object(s) involved, but from the type of the variable "boardp".

In this case, however, we had something like this:

char *a = "aaa\0bbb";
char *b = "aaa";
char *c = "bbb";

This produces one to three array objects (depending on how many string literals are merged) and three pointers, each of type "char *". I think the only limitation, as it were, that a compiler may attach to all three pointer objects is that they point to (the first of zero or more) "char"(s).

I have cross-posted this to comp.std.c, where the readers of Standard Tea Leaves may ponder the fine print and attempt to unscrew the inscrutable. :-)
-- In-Real-Life: Chris Torek, Wind River Systems (BSD engineering)
|
Fri, 21 May 2004 08:11:59 GMT |
|
|