Author |
Message |
Michael L. Hostbaek #1 / 9
|
 strip string
Hi,
How would I strip, e.g., all HTML tags from a char string? I would like to do it without including the regexp library. Any help much appreciated.
--
Regards, Michael L. Hostbaek
-= So long, and thanks for all the fish.. =-
|
Sun, 27 Jun 2004 18:11:24 GMT |
|
 |
Micah Cowan #2 / 9
|
 strip string
Quote:
> Hi,
> How would I strip, e.g., all HTML tags from a char string?
> I would like to do it without including the regexp library.
> Any help much appreciated.
Here is one algorithm:
1. Find the first '<'.
2. Now find the first occurrence of '>' after that.
3. Copy each character after the '>' to where the last '<' was, until another '<' is encountered, at which point go back to step 2.
4. If a '<' occurs without a following '>', emit an error.
Micah
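For readers following along, the steps above could be sketched in C roughly as follows. This is only an illustration, not code from the thread: the name `strip_tags_inplace` and the return-code convention are assumptions, and the sketch keeps a separate write position instead of literally remembering where the last '<' was.

```c
#include <assert.h>
#include <stddef.h>

/* Remove everything from each '<' up to the next '>' in s, in place.
 * Returns 0 on success, or -1 if a '<' has no following '>' (step 4);
 * in that case the text before the unterminated '<' is kept.
 * The function name is hypothetical. */
int strip_tags_inplace(char *s)
{
    assert(s != NULL);
    char *out = s;          /* where the next kept character is written */
    const char *in = s;     /* next character to examine */

    while (*in != '\0') {
        if (*in == '<') {
            /* Step 2: look for the '>' that closes this tag. */
            while (*in != '\0' && *in != '>')
                in++;
            if (*in == '\0') {      /* Step 4: no closing '>' */
                *out = '\0';
                return -1;
            }
            in++;                   /* skip the '>' itself */
        } else {
            *out++ = *in++;         /* Step 3: copy text down over the tag */
        }
    }
    *out = '\0';
    return 0;
}
```

Because the write position never gets ahead of the read position, no second buffer is needed; the string must be writable, though, so a string literal cannot be passed directly.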
|
Sun, 27 Jun 2004 19:05:11 GMT |
|
 |
k.. #3 / 9
|
 strip string
Quote:
>> Hi,
>> How would I strip, e.g., all HTML tags from a char string?
>> I would like to do it without including the regexp library.
>> Any help much appreciated.
> Here is one algorithm:
> 1. Find the first '<'.
> 2. Now find the first occurrence of '>' after that.
> 3. Copy each character after the '>' to where the last '<' was, until
> another '<' is encountered, at which point go back to step 2.
> 4. If a '<' occurs without a following '>', emit an error.
<a href="http://madhouse.nosuch.co.uk/<spoo>/fruitbat">grungewallah</a> -- Chris "it could be a RISC OS machine, <name> for system variables" Dollin C FAQs at: http://www.faqs.org/faqs/by-newsgroup/comp/comp.lang.c.html
|
Sun, 27 Jun 2004 19:20:00 GMT |
|
 |
Michael L. Hostbaek #4 / 9
|
 strip string
Micah Cowan tried to tell us something, and all I got was: Quote:
> Here is one algorithm:
> 1. Find the first '<'.
> 2. Now find the first occurrence of '>' after that.
> 3. Copy each character after the '>' to where the last '<' was, until
> another '<' is encountered, at which point go back to step 2.
> 4. If a '<' occurs without a following '>', emit an error.
Hmm.. and how would I actually do that? Say my string is: "<pre>this is a test</pre>"
Thank you.
--
Regards, Michael L. Hostbaek
-= So long, and thanks for all the fish.. =-
|
Sun, 27 Jun 2004 19:24:41 GMT |
|
 |
k.. #5 / 9
|
 strip string
Quote: > Micah Cowan tried to tell us something, and all I got was:
>> Here is one algorithm:
>> 1. Find the first '<'.
>> 2. Now find the first occurrence of '>' after that.
>> 3. Copy each character after the '>' to where the last '<' was, until
>> another '<' is encountered, at which point go back to step 2.
>> 4. If a '<' occurs without a following '>', emit an error.
> Hmm.. and how would I actually do that? Say my string is:
> "<pre>this is a test</pre>"
You have to do *some* of the work yourself. Your C book will surely tell you what a string is, how to access characters in a string, how to compare characters, how to assign characters, and what existing library functions are available in C.

Try coding Micah's algorithm up using those primitive operations. If you have specific problems, post them & your code attempt (and don the fireproof undergarments) and we'll try and help you improve them.

I suggest writing a function

    void a{*filter*}icah( char *dest, char *source ) { ... }

where `source` is the source string and `dest` points to a character array big enough to hold the stripped string (i.e. at least as big as the source string), and fill in the ...; you can then test it in `main` either by passing strings from the command line or by specific test strings, e.g.

    a{*filter*}icah( result, "<pre>this is a test</pre>" );
    printf( "result for test 42: %s\n", result );

Stir to taste.
--
Chris "bootstrap" Dollin
C FAQs at: http://www.*-*-*.com/
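One possible way to fill in the `...` is sketched below. The function name in the post is censored by the archive, so `strip_tags_copy` is a stand-in, and `run_test` is a hypothetical helper; neither comes from the thread. The sketch assumes `dest` has room for at least `strlen(source) + 1` bytes, and it silently drops the text after an unterminated '<' instead of reporting an error.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copy source to dest, leaving out everything between '<' and '>'.
 * dest must hold at least strlen(source) + 1 bytes.
 * Hypothetical name; the original function name is not recoverable. */
void strip_tags_copy(char *dest, const char *source)
{
    assert(dest != NULL && source != NULL);
    int in_tag = 0;                 /* nonzero while between '<' and '>' */

    for (; *source != '\0'; source++) {
        if (*source == '<')
            in_tag = 1;
        else if (*source == '>' && in_tag)
            in_tag = 0;
        else if (!in_tag)
            *dest++ = *source;      /* ordinary text: keep it */
    }
    *dest = '\0';
}

/* Strip one test string and print the result in the suggested format.
 * Returns 0 on success, -1 if the result buffer cannot be allocated. */
int run_test(int number, const char *input)
{
    char *result = malloc(strlen(input) + 1);   /* as big as the source */
    if (result == NULL)
        return -1;
    strip_tags_copy(result, input);
    printf("result for test %d: %s\n", number, result);
    free(result);
    return 0;
}
```

Calling `run_test(42, "<pre>this is a test</pre>");` from `main`, or looping over `argv`, gives the kind of test described above.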
|
Sun, 27 Jun 2004 19:36:16 GMT |
|
 |
Default Use #6 / 9
|
 strip string
Quote:
> Micah Cowan tried to tell us something, and all I got was:
> > Here is one algorithm:
> > 1. Find the first '<'.
> > 2. Now find the first occurrence of '>' after that.
> > 3. Copy each character after the '>' to where the last '<' was, until
> > another '<' is encountered, at which point go back to step 2.
> > 4. If a '<' occurs without a following '>', emit an error.
> Hmm.. and how would I actually do that? Say my string is:
> "<pre>this is a test</pre>"
Take a look at the standard functions strchr(), strcpy(), strcat(), strncpy(), etc. That should give you some ideas on how to implement the algorithm (those are the ones I used to write my version). Fancier things can be added later, like converting <br> tags to newlines, if you desire.
Brian Rodenborn
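As a rough illustration of how those library functions might fit together (this is not Brian's version, which was not posted), the sketch below uses strchr() to find the delimiters and strncat()/strcat() to build the output, and turns a literal lowercase `<br>` into a newline. The name `strip_tags_br` is hypothetical, and `dest` is assumed to have room for `strlen(src) + 1` bytes.

```c
#include <assert.h>
#include <string.h>

/* Copy src to dest with tags removed; a lowercase "<br>" becomes '\n'.
 * dest must hold at least strlen(src) + 1 bytes.
 * Returns 0 on success, -1 if a '<' has no following '>'.
 * Hypothetical name, for illustration only. */
int strip_tags_br(char *dest, const char *src)
{
    assert(dest != NULL && src != NULL);
    dest[0] = '\0';

    for (;;) {
        const char *lt = strchr(src, '<');
        if (lt == NULL) {                       /* no more tags */
            strcat(dest, src);                  /* append the tail */
            return 0;
        }
        strncat(dest, src, (size_t)(lt - src)); /* text before the tag */

        const char *gt = strchr(lt, '>');
        if (gt == NULL)
            return -1;                          /* unterminated tag */

        if (strncmp(lt, "<br>", 4) == 0)
            strcat(dest, "\n");                 /* line break tag */

        src = gt + 1;                           /* continue after the tag */
    }
}
```

Repeated strcat() calls rescan `dest` each time, which is fine for short strings; keeping a pointer to the end of the output would avoid that for long inputs.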
|
Mon, 28 Jun 2004 01:00:38 GMT |
|
 |
Micah Cowan #7 / 9
|
 strip string
Quote:
> >> Hi,
> >> How would I strip, e.g., all HTML tags from a char string?
> >> I would like to do it without including the regexp library.
> >> Any help much appreciated.
> > Here is one algorithm:
> > 1. Find the first '<'.
> > 2. Now find the first occurrence of '>' after that.
> > 3. Copy each character after the '>' to where the last '<' was, until
> > another '<' is encountered, at which point go back to step 2.
> > 4. If a '<' occurs without a following '>', emit an error.
> <a href="http://madhouse.nosuch.co.uk/<spoo>/fruitbat">grungewallah</a>
That is a very malformed tag. Since that snippet is not valid SGML/XML, it invokes "undefined" behavior - not in the context of the C standard, but in the context of this program's expectations. However, if you would like a more robust implementation, you could include some minor state-based information which would cause an error to be emitted if it found a tag within a tag. Micah
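The "minor state-based information" could look something like the two-state scanner below. It is a sketch under assumptions: the names `strip_tags_strict` and `enum strip_status` are invented for illustration, and treating any '<' inside a tag as an error is just one reading of "a tag within a tag".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes for the strict stripper. */
enum strip_status {
    STRIP_OK = 0,
    STRIP_NESTED = -1,        /* found '<' while already inside a tag */
    STRIP_UNTERMINATED = -2   /* string ended inside a tag */
};

/* Strip tags from s in place, tracking whether we are inside a tag.
 * On STRIP_NESTED the output is cut off at the text copied so far. */
enum strip_status strip_tags_strict(char *s)
{
    assert(s != NULL);
    enum { TEXT, TAG } state = TEXT;
    char *out = s;

    for (const char *in = s; *in != '\0'; in++) {
        if (state == TEXT) {
            if (*in == '<')
                state = TAG;            /* entering a tag */
            else
                *out++ = *in;           /* ordinary text: keep it */
        } else {
            if (*in == '<') {           /* tag within a tag */
                *out = '\0';
                return STRIP_NESTED;
            }
            if (*in == '>')
                state = TEXT;           /* leaving the tag */
        }
    }
    *out = '\0';
    return state == TAG ? STRIP_UNTERMINATED : STRIP_OK;
}
```

With this check, the `<a href="...<spoo>...">` example quoted above is rejected with `STRIP_NESTED` rather than producing mangled output.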
|
Mon, 28 Jun 2004 14:48:55 GMT |
|
 |
Micah Cowan #8 / 9
|
 strip string
Quote:
> > Micah Cowan tried to tell us something, and all I got was:
> >> Here is one algorithm:
> >> 1. Find the first '<'.
> >> 2. Now find the first occurrence of '>' after that.
> >> 3. Copy each character after the '>' to where the last '<' was, until
> >> another '<' is encountered, at which point go back to step 2.
> >> 4. If a '<' occurs without a following '>', emit an error.
> > Hmm.. and how would I actually do that? Say my string is:
> > "<pre>this is a test</pre>"
> You have to do *some* of the work yourself. Your C book will surely
> tell you what a string is, how to access characters in a string, how
> to compare characters, how to assign characters, and what existing
> library functions are available in C.
> Try coding Micah's algorithm up using those primitive operations.
> If you have specific problems, post them & your code attempt (and
> don the fireproof undergarments) and we'll try and help you improve
> them.
Most people here don't like to provide code for questions until you provide at least a first attempt written in C. (Often, after that point, people get more prolific.) We don't feel that you learn C very well by getting complete-package answers thrust at you - you learn it by doing your homework (both literally and figuratively), and at least trying.

I'll even give you a few hints, though:

1. Find the first '<'.
   strchr() will be handy for this.
2. Now find the first occurrence of '>' after that.
   And for this.
3. Copy each character after the '>' to where the last '<' was, until another '<' is encountered, at which point go back to step 2.
   You could use strchr() combined with strcpy() for this, but I would prefer individual char-by-char evaluation and copy using simple assignment.

HTH,
Micah
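A sketch of the strchr()-based route from the hints is given below, with one substitution that should be named plainly: it uses memmove() rather than strcpy(), because the source and destination overlap within the same string, and strcpy() has undefined behavior for overlapping objects. The name `strip_tags_memmove` is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Strip tags from s in place using strchr() to find the delimiters.
 * Returns 0 on success, -1 if a '<' has no following '>'; on error,
 * tags removed so far stay removed and the unterminated rest is left
 * untouched. Hypothetical name, for illustration only. */
int strip_tags_memmove(char *s)
{
    assert(s != NULL);
    char *lt = strchr(s, '<');              /* hint 1 */

    while (lt != NULL) {
        char *gt = strchr(lt, '>');         /* hint 2 */
        if (gt == NULL)
            return -1;

        /* Hint 3: slide the rest of the string, including its '\0',
         * down over the tag. memmove() is safe for overlapping ranges. */
        memmove(lt, gt + 1, strlen(gt + 1) + 1);

        lt = strchr(lt, '<');               /* look for the next tag */
    }
    return 0;
}
```

Each tag triggers a move of the whole remaining string, so this does more copying than the char-by-char approach preferred above, though for short strings the difference is unlikely to matter.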
|
Mon, 28 Jun 2004 14:57:17 GMT |
|
 |
k.. #9 / 9
|
 strip string
Quote:
>> >> Hi,
>> >> How would I strip, e.g., all HTML tags from a char string?
>> >> I would like to do it without including the regexp library.
>> >> Any help much appreciated.
>> > Here is one algorithm:
>> > 1. Find the first '<'.
>> > 2. Now find the first occurrence of '>' after that.
>> > 3. Copy each character after the '>' to where the last '<' was, until
>> > another '<' is encountered, at which point go back to step 2.
>> > 4. If a '<' occurs without a following '>', emit an error.
>> <a href="http://madhouse.nosuch.co.uk/<spoo>/fruitbat">grungewallah</a>
> That is a very malformed tag. Since that snippet is not valid
> SGML/XML, it invokes "undefined" behavior - not in the context of the
> C standard, but in the context of this program's expectations.
Yes, afterwards I thought, "does HTML demand that attribute strings use entity references for the magic characters?" and decided that (a) my brain wasn't ready to discover that it did (it can only take so much strain), and (b) no matter what the standard said, a realistic program running over real data would probably find strings such as the one I wrote.
Quote:
> However, if you would like a more robust implementation, you could
> include some minor state-based information which would cause an error
> to be emitted if it found a tag within a tag.
The nice thing about the SGML family of notations is that humans and machines are equally inept at handling it.
--
Chris "equality before the law, and after lunch" Dollin
C FAQs at: http://www.faqs.org/faqs/by-newsgroup/comp/comp.lang.c.html
|
Mon, 28 Jun 2004 17:18:46 GMT |
|
|
|