Question about character arrays & pointers
Author |
Message |
Stephen Orze #1 / 8
|
 Question about character arrays & pointers
Let's say that I want to declare two character arrays as:

    char s[] = "Str1";
    char t[] = "Longer string";

and a pointer to char for each, which points to the first char in the array:

    char *p = s;
    char *q = t;

I was wondering why it seems I can store more chars onto the end of s. Here is a routine that concatenates t onto the end of s using pointer arithmetic to access the array elements:

    int i = 0, j = 0;
    while ( *(p+i) != '\0') i++;       /* set i to width of s */
    while ( *(p+i++) = *(q+j++) );     /* copy t into s */

Why doesn't this cause a segmentation fault? I thought that if I tried to access p beyond the \0 in s, especially here since I'm assigning it a value, I would get a segmentation fault. Why is it that I can store values into memory locations that I have not set aside storage for? What is going on in memory and in the pointers that allows this to happen? If someone could give me some insight into this problem I would really appreciate it.

Some of you might recognize this as one of the exercises from K&R2. It is ex.5-3 on p. 107. Here is a more complete version of what I was trying:

    #include <stdio.h>

    int main()
    {
        char s[] = "Str1";
        char t[] = "Longer string";
        char *p = s;
        char *q = t;
        int i;
        int j;

        i=0;
        j=0;
        while ( *(p+i) != '\0' ) i++;   /* set i to width of s */
        while ( *(p+i++) = *(q+j++) )   /* copy t into s */
            ;
        printf("s = %s t = %s\n", s, t);
        return 0;
    }
Thanks, Steve
|
Mon, 23 Aug 2004 14:05:15 GMT |
|
 |
Daniel Fo #2 / 8
|
 Question about character arrays & pointers
Quote:
> Let's say that I want to declare two character arrays as:
>     char s[] = "Str1";
>     char t[] = "Longer string";
> and a pointer to char for each, which points to the first char in the
> array:
>     char *p = s;
>     char *q = t;
> I was wondering why it seems I can store more chars onto the end of s.
You can merrily write past the end of the string and not get a segmentation fault... until you happen to touch memory your program isn't allowed to modify. Don't expect to get a segmentation fault the moment you write past the bounds of a particular object. In fact, don't expect anything at all; it is undefined behavior. -Daniel
|
Mon, 23 Aug 2004 14:17:37 GMT |
|
 |
The Magical Pon #3 / 8
|
 Question about character arrays & pointers
Quote:
> Let's say that I want to declare two character arrays as:
>     char s[] = "Str1";
>     char t[] = "Longer string";
> and a pointer to char for each, which points to the first char in the
> array:
>     char *p = s;
>     char *q = t;
> <snip>
>     int i = 0, j = 0;
>     while ( *(p+i) != '\0') i++;       /* set i to width of s */
>     while ( *(p+i++) = *(q+j++) );     /* copy t into s */
> Why doesn't this cause a segmentation fault?
I am The Magical Pony!

If you access a random memory location, you might get a segmentation fault. But if the random memory location you access just happens to be allocated by your program, everything is just fine. And dandy.

An easy way to test this in the real world is to get five baseballs and a bat. Go to the mall and drop all the baseballs in random spots. Now, blindfold yourself, begin screaming (which is optional), and start running around swinging the bat. You will find that chances are you jack somebody up good, but once in a while you'll actually hit one of your balls. (The baseballs.)

In this case, running off the end of the storage for s probably puts you into the storage for t, or for some other automatic variable. Something is most probably allocated contiguously with s and you are trouncing it with the code, but as far as the OS is concerned there's nothing wrong.

Pony.
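
To put numbers on the overrun described above, here is a minimal sketch that compares the bytes s actually owns with the bytes the concatenation needs. The helper names concat_size and concat_fits are hypothetical, introduced only for illustration; they are not library functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes needed to hold a followed by b, including the terminating '\0'.
   concat_size is a hypothetical helper, not part of the C library. */
size_t concat_size(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strlen(a) + strlen(b) + 1;
}

/* Returns 1 if a buffer of `capacity` bytes can hold a followed by b,
   0 otherwise. */
int concat_fits(size_t capacity, const char *a, const char *b)
{
    return concat_size(a, b) <= capacity;
}
```

With `char s[] = "Str1";`, `sizeof s` is 5, while `concat_size(s, "Longer string")` is 18. The remaining 13 bytes are written outside s, and what lives there (t, another automatic variable, or nothing the program owns) is up to the implementation.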
|
Mon, 23 Aug 2004 14:18:50 GMT |
|
 |
Daniel Fo #4 / 8
|
 Question about character arrays & pointers
Quote:
> An easy way to test this in the real world is to get five baseballs and a
> bat. Go to the mall and drop all the baseballs in random spots. Now,
> blindfold yourself, begin screaming (which is optional), and
> start running around swinging the bat. You will find that chances are
> you jack somebody up good, but once in a while you'll actually hit one of
> your balls. (The baseballs.)
What the hell are you smoking? -Daniel
|
Mon, 23 Aug 2004 14:47:18 GMT |
|
 |
Stephen Orze #5 / 8
|
 Question about character arrays & pointers
Quote:
> > Let's say that I want to declare two character arrays as:
> >     char s[] = "Str1";
> >     char t[] = "Longer string";
> > and a pointer to char for each, which points to the first char in the
> > array:
> >     char *p = s;
> >     char *q = t;
> > <snip>
> >     int i = 0, j = 0;
> >     while ( *(p+i) != '\0') i++;       /* set i to width of s */
> >     while ( *(p+i++) = *(q+j++) );     /* copy t into s */
> > Why doesn't this cause a segmentation fault?
> I am The Magical Pony!
> If you access a random memory location, you might get a segmentation
> fault. But if the random memory location you access just happens to be
> allocated by your program, everything is just fine. And dandy.
> An easy way to test this in the real world is to get five baseballs and a
> bat. Go to the mall and drop all the baseballs in random spots. Now,
> blindfold yourself, begin screaming (which is optional), and
> start running around swinging the bat. You will find that chances are
> you jack somebody up good, but once in a while you'll actually hit one of
> your balls. (The baseballs.)
> In this case, running off the end of the storage for s probably puts you
> into the storage for t, or for some other automatic variable. Something
> is most probably allocated contiguously with s and you are trouncing it
> with the code, but as far as the OS is concerned there's nothing wrong.
> Pony.
I think I understand what's going on now. So when my main() starts there is extra storage allocated onto the local stack. Let's just say that in memory, the beginning of t is at the end of s. For simplicity, say that s starts at address 0x0000 and t starts at 0x0100 using byte-addressing. When I go through the first while loop, p+i should end up pointing to 0x0004, the null character at the end of s. The first time through the second while copies the data at 0x0100 into 0x0004, the second time through copies 0x0101 into 0x0005, and so on. The reason there is no segfault is because there is extra room on the stack that is not allocated by any variable.

But say t was more than 0xFF (255) bytes. Once I try to copy something into 0x0100 then I get a segmentation fault, if I hadn't trounced over some other variable's storage first. Is this where the undefined behavior comes from?

Is there a better way to write this code so that I can avoid all possible segmentation faults or undefined behavior? Allocating enough storage for a character array comes to mind, but is there any other way to do it?

Steve
|
Tue, 24 Aug 2004 03:55:13 GMT |
|
 |
l01yu #6 / 8
|
 Question about character arrays & pointers
Stephen Orzel rambled on saying:
Quote:
> I think I understand what's going on now. So when my main() starts there
> is extra storage allocated onto the local stack. Let's just say that in
> memory, the beginning of t is at the end of s. For simplicity, say that s
> starts at address 0x0000 and t starts at 0x0100 using byte-addressing.
> When I go through the first while loop, p+i should end up pointing to
> 0x0004, the null character at the end of s. The first time through the
> second while copies the data at 0x0100 into 0x0004, the second time
> through copies 0x0101 into 0x0005, and so on.
> The reason there is no segfault is because there is extra room on the
> stack that is not allocated by any variable.
> But say t was more than 0xFF (255) bytes. Once I try to copy something
> into 0x0100 then I get a segmentation fault, if I hadn't trounced over
> some other variable's storage first.
Almost. You would probably be able to write over the contents of your variable t, so with more than 0xFF bytes to copy you would overwrite the first elements of the t array in your while loop. You would only get a segmentation fault if you tried to write to memory that your program is not allowed to modify, such as read-only memory or memory not designated for your program's use.

The reason it is called undefined behaviour is because the C standard does not specify how the memory is laid out, therefore one implementation may be different from another, and it is not possible to say how the OS will behave. Behaviour is undefined.

Try writing code with bounds checking to see if you are at the end of the array. If you want to copy the first elements of the longer array over the smaller, then test using strlen(s), so you could have

    /* Not tested. */
    size_t i, n;
    n = strlen(s);                  /* measure s once, before overwriting it */
    for(i=0;i<n && t[i]!='\0';i++)  /* stop at the end of s or the end of t */
    {
        s[i] = t[i];
    }
    s[i] = '\0';                    /* i <= n, so this stays inside s */

--
[root]# rm -rf /*
You know it makes sense!
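
A minimal sketch of the bounds-checked copy suggested above, assuming the caller passes the destination's size explicitly. The name copy_truncated and its return convention are hypothetical choices for illustration, not a standard function.

```c
#include <assert.h>
#include <stddef.h>

/* Copy src into dst, writing at most dst_size bytes including the
   terminating '\0'. Extra characters of src are dropped.
   Returns the number of characters copied (not counting the '\0'). */
size_t copy_truncated(char *dst, size_t dst_size, const char *src)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL);
    if (dst_size == 0)
        return 0;                   /* no room even for the '\0' */

    /* Leave one byte free for the terminator. */
    while (i + 1 < dst_size && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
    return i;
}
```

Called as `copy_truncated(s, sizeof s, t)` with the arrays from the question, this leaves s holding "Long": four characters plus the terminator, exactly the five bytes s owns.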
|
Tue, 24 Aug 2004 06:42:10 GMT |
|
 |
Lawrence Kir #7 / 8
|
 Question about character arrays & pointers
...
Quote:
>I think I understand what's going on now. So when my main() starts there is
>extra storage allocated onto the local stack. Let's just say that in memory,
>the beginning of t is at the end of s. For simplicity, say that s starts at
>address 0x0000 and t starts at 0x0100 using byte-addressing. When I go
>through the first while loop, p+i should end up pointing to 0x0004, the null
>character at the end of s. The first time through the second while copies
>the data at 0x0100 into 0x0004, the second time through copies 0x0101 into
>0x0005, and so on.
>The reason there is no segfault is because there is extra room on the stack
>that is not allocated by any variable.
Or perhaps you are accessing or corrupting memory that is used by other variables or internally by C's runtime system.

Quote:
>But say t was more than 0xFF (255) bytes. Once I try to copy something
>into 0x0100 then I get a segmentation fault, if I hadn't trounced over some
>other variable's storage first. Is this where the undefined behavior comes
>from?
Undefined behaviour simply means that the C language no longer makes any requirements about how your program will behave; the program is in error and the implementation is at liberty to crash, lock up, start acting funny in random ways, or apparently continue executing the program as if nothing untoward had happened (or anything else for that matter). The state of the program might have been terminally corrupted or it might not; strange effects might start happening immediately or at some arbitrary point in the future.

Quote:
> Is there a better way to write this code so that I can avoid all
>possible
>segmentation faults or undefined behavior? Allocating enough storage
>for a character array comes to mind, but is there any other way to do it?
Check your boundaries and ensure that you don't exceed them.
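
One way to "check your boundaries" for the concatenation in the original question is sketched below, keeping the pointer style of the K&R exercise. It assumes the caller knows how many bytes the destination array holds; the name bounded_strcat and the 1/0 return convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Append t to the string in s only if the result, including the
   terminating '\0', fits in `capacity` bytes.
   Returns 1 on success, 0 if it does not fit (s is left unchanged). */
int bounded_strcat(char *s, size_t capacity, const char *t)
{
    char *p = s;
    const char *q = t;
    size_t used = 0;    /* characters already in s */
    size_t extra = 0;   /* characters in t */

    assert(s != NULL && t != NULL);

    while (used < capacity && p[used] != '\0')
        used++;
    if (used == capacity)
        return 0;       /* s is not terminated inside its own storage */

    while (q[extra] != '\0')
        extra++;
    if (extra >= capacity - used)
        return 0;       /* need used + extra + 1 <= capacity */

    p += used;          /* p now points at the '\0' of s */
    while ((*p++ = *q++) != '\0')
        ;               /* copy t, including its '\0' */
    return 1;
}
```

With `char s[] = "Str1";` the call `bounded_strcat(s, sizeof s, t)` refuses to copy and returns 0. Declaring the destination with room to spare, for example `char s[32] = "Str1";`, lets the same call succeed.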
|
Wed, 25 Aug 2004 23:21:17 GMT |
|
 |
Umesh P Nai #8 / 8
|
 Question about character arrays & pointers
On Thu, 07 Mar 2002 06:05:15 GMT, Stephen Orzel
Quote:
> Let's say that I want to declare two character arrays as:
>     char s[] = "Str1";
>     char t[] = "Longer string";
> and a pointer to char for each, which points to the first
> char in the array:
>     char *p = s;
>     char *q = t;
> I was wondering why it seems I can store more chars onto the
> end of s. Here is a routine that
> concatenates t onto the end of s using pointer arithmetic to
> access the array elements:
>     int i = 0, j = 0;
>     while ( *(p+i) != '\0') i++;       /* set i to width of s */
>     while ( *(p+i++) = *(q+j++) );     /* copy t into s */
> Why doesn't this cause a segmentation fault?
<snip>

The result of what you are doing is not guaranteed to be a segmentation fault. It is "undefined behavior". It can be anything from not causing any obvious problems to crashing your system.

Your compiler, which got only a pointer without any information on how much memory is allocated for it, cannot give an error when you do it. Your runtime system may give an error, but not necessarily.

You need to make sure this will not happen. That is one painful thing in C.

- Umesh
--
Umesh P Nair
Remove 'z's from my e-mail ID
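
The point about a pointer carrying no size information can be sketched as follows. Inside a function, `sizeof buf` on a `char *` parameter gives the size of the pointer, not of the array it came from, so the capacity has to travel as a separate argument. The name free_bytes is hypothetical, and the sketch assumes buf holds a properly terminated string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes still unused in a buffer of `capacity` bytes that currently
   holds the string buf. The capacity cannot be recovered from buf
   itself, so the caller passes it, typically as `sizeof array`. */
size_t free_bytes(const char *buf, size_t capacity)
{
    size_t used;

    assert(buf != NULL);
    used = strlen(buf) + 1;         /* characters plus the '\0' */
    return used <= capacity ? capacity - used : 0;
}
```

For `char s[] = "Str1";`, `free_bytes(s, sizeof s)` is 0: the array is exactly full, so every byte written past the terminator lands outside s.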
|
Mon, 23 Aug 2004 14:49:38 GMT |
|