strip string 

Hi,

How would I strip, e.g., all HTML tags from a char string?

I would like to do it without including the regexp library.

Any help much appreciated.

--
Regards,
Michael L. Hostbaek
-= So long, and thanks for all the fish.. =-



Sun, 27 Jun 2004 18:11:24 GMT  

Quote:
> Hi,

> How would I strip, e.g., all HTML tags from a char string?

> I would like to do it without including the regexp library.

> any help much appreciated..

Here is one algorithm:

1. Find the first '<'
2. Now find the first occurrence of '>' after that.
3. Copy each character after the '>' to where the last '<' was, until
   another '<' is encountered, at which point go back to step 2.
4. If a '<' occurs without a following '>', emit an error.
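
A minimal sketch of these four steps in C, working in place on a
modifiable string. The function name and the return convention (0 for
success, -1 for an unterminated '<') are assumptions made for
illustration, not something given in the thread.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: removes every <...> run from s in place.
   Returns 0 on success, or -1 if a '<' has no following '>' (step 4);
   in that case s keeps only the stripped text that came before the
   unterminated '<'. */
int strip_tags_in_place(char *s)
{
    char *write;   /* where the next kept character goes */
    char *read;    /* next character to examine */

    assert(s != NULL);

    /* Step 1: find the first '<'. Nothing to do if there is none. */
    write = strchr(s, '<');
    if (write == NULL) {
        return 0;
    }
    read = write;

    /* At the top of each iteration, read points at a '<'. */
    while (*read != '\0') {
        /* Step 2: find the first '>' after that '<'. */
        char *close = strchr(read, '>');
        if (close == NULL) {
            *write = '\0';
            return -1;          /* Step 4: unterminated tag */
        }
        read = close + 1;

        /* Step 3: copy down to where the '<' was, until the next '<'. */
        while (*read != '\0' && *read != '<') {
            *write++ = *read++;
        }
    }
    *write = '\0';
    return 0;
}
```

Note that the argument must be a writable array (for example
`char s[] = "<pre>this is a test</pre>";`), not a string literal.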

Micah



Sun, 27 Jun 2004 19:05:11 GMT  


Quote:

>> Hi,

>> How would I strip, e.g., all HTML tags from a char string?

>> I would like to do it without including the regexp library.

>> any help much appreciated..

> Here is one algorithm:

> 1. Find the first '<'
> 2. Now find the first occurrence of '>' after that.
> 3. Copy each character after the '>' to where the last '<' was, until
>    another '<' is encountered, at which point go back to step 2.
> 4. If a '<' occurs without a following '>', emit an error.

<a href="http://madhouse.nosuch.co.uk/<spoo>/fruitbat">grungewallah</a>

--
Chris "it could be a RISC OS machine, <name> for system variables" Dollin
C FAQs at: http://www.faqs.org/faqs/by-newsgroup/comp/comp.lang.c.html



Sun, 27 Jun 2004 19:20:00 GMT  
Micah Cowan tried to tell us something, and all I got was:

Quote:

>  Here is one algorithm:

>  1. Find the first '<'
>  2. Now find the first occurrence of '>' after that.
>  3. Copy each character after the '>' to where the last '<' was, until
>     another '<' is encountered, at which point go back to step 2.
>  4. If a '<' occurs without a following '>', emit an error.

hmm.. and how would I actually do that ? Say, my string is:
 "<pre>this is a test</pre>"

Thank you.

--
Regards,
Michael L. Hostbaek
-= So long, and thanks for all the fish.. =-



Sun, 27 Jun 2004 19:24:41 GMT  


Quote:
> Micah Cowan tried to tell us something, and all I got was:

>>  Here is one algorithm:

>>  1. Find the first '<'
>>  2. Now find the first occurrence of '>' after that.
>>  3. Copy each character after the '>' to where the last '<' was, until
>>     another '<' is encountered, at which point go back to step 2.
>>  4. If a '<' occurs without a following '>', emit an error.

> hmm.. and how would I actually do that ? Say, my string is:
>  "<pre>this is a test</pre>"

You have to do *some* of the work yourself. Your C book will surely
tell you what a string is, how to access characters in a string, how
to compare characters, how to assign characters, and what existing
library functions are available in C?

Try coding Micah's algorithm up using those primitive operations.
If you have specific problems, post them & your code attempt (and
don the fireproof undergarments) and we'll try and help you improve
them.

I suggest writing a function

    void a{*filter*}icah( char *dest, char *source ) { ... }

where `source` is the source string and `dest` points to a
character array big enough to hold the stripped string (i.e. at
least as big as the source string, terminating null included), and
fill in the ...; you can then test it in `main` either by passing
strings from the command line or by specific test strings, e.g.

    a{*filter*}icah( result, "<pre>this is a test</pre>" );
    printf( "result for test 42: %s\n", result );

Stir to taste.
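
A minimal sketch of how such a skeleton might be filled in. The names
`strip_to` and `run_test` are made up for illustration (they are not
the function name from the post), `source` is taken as `const char *`,
and an unterminated '<' silently drops the rest of the string instead
of reporting an error, since the function returns nothing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical filter: copies source to dest, leaving out every
   <...> run. dest must have room for strlen(source) + 1 characters. */
void strip_to(char *dest, const char *source)
{
    int in_tag = 0;   /* nonzero while between '<' and '>' */

    assert(dest != NULL && source != NULL);

    for (; *source != '\0'; source++) {
        if (*source == '<') {
            in_tag = 1;
        } else if (*source == '>' && in_tag) {
            in_tag = 0;
        } else if (!in_tag) {
            *dest++ = *source;   /* ordinary text, or a stray '>' */
        }
    }
    *dest = '\0';
}

/* Hypothetical test helper in the style of the printf above.
   Assumes the input fits in the fixed-size buffer. */
void run_test(int number, const char *input)
{
    char result[256];

    assert(strlen(input) < sizeof result);
    strip_to(result, input);
    printf("result for test %d: %s\n", number, result);
}
```

Calling `run_test(42, "<pre>this is a test</pre>")` from `main` prints
"result for test 42: this is a test".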

--
Chris "bootstrap" Dollin
C FAQs at: http://www.*-*-*.com/



Sun, 27 Jun 2004 19:36:16 GMT  

Quote:

> Micah Cowan tried to tell us something, and all I got was:

> >  Here is one algorithm:

> >  1. Find the first '<'
> >  2. Now find the first occurrence of '>' after that.
> >  3. Copy each character after the '>' to where the last '<' was, until
> >     another '<' is encountered, at which point go back to step 2.
> >  4. If a '<' occurs without a following '>', emit an error.

> hmm.. and how would I actually do that ? Say, my string is:
>  "<pre>this is a test</pre>"

Take a look at the standard functions strchr(), strcpy(), strcat(),
strncpy() etc.

That should give you some ideas on how to implement the algorithm (those
are the ones I used to write my version).
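
For illustration only (this is not the version mentioned above, which
was not posted), here is a rough sketch built on strchr(), strncat()
and strcat(). The function name and the 0 / -1 return values are
assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: builds in dest a copy of src without its
   <...> runs. dest must have room for strlen(src) + 1 characters.
   Returns 0 on success, -1 if a '<' has no closing '>' (dest then
   holds whatever was collected before the bad tag). */
int strip_tags_strchr(char *dest, const char *src)
{
    const char *open;

    assert(dest != NULL && src != NULL);

    dest[0] = '\0';                       /* start with an empty string */
    while ((open = strchr(src, '<')) != NULL) {
        const char *close = strchr(open, '>');
        if (close == NULL) {
            return -1;
        }
        /* Append the text between the previous tag and this one. */
        strncat(dest, src, (size_t)(open - src));
        src = close + 1;                  /* continue after the tag */
    }
    strcat(dest, src);                    /* trailing text after the last tag */
    return 0;
}
```

strncat() rescans dest on every call, so this is simple rather than
fast; keeping a pointer to the end of dest would avoid the rescans.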

Fancier things can be added later, like converting <br> tags to new
lines and things like that, if you desire.

Brian Rodenborn



Mon, 28 Jun 2004 01:00:38 GMT  

Quote:




> >> Hi,

> >> How would I strip, e.g., all HTML tags from a char string?

> >> I would like to do it without including the regexp library.

> >> any help much appreciated..

> > Here is one algorithm:

> > 1. Find the first '<'
> > 2. Now find the first occurrence of '>' after that.
> > 3. Copy each character after the '>' to where the last '<' was, until
> >    another '<' is encountered, at which point go back to step 2.
> > 4. If a '<' occurs without a following '>', emit an error.

> <a href="http://madhouse.nosuch.co.uk/<spoo>/fruitbat">grungewallah</a>

That is a very malformed tag.  Since that snippet is not valid
SGML/XML, it invokes "undefined" behavior - not in the context of the
C standard, but in the context of this program's expectations.
However, if you would like a more robust implementation, you could
include some minor state-based information which would cause an error
to be emitted if it found a tag within a tag.
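
A sketch of what that state-based check could look like: a single
in-tag flag, with a '<' seen while the flag is set treated as an
error. The enum, its constants and the function name are all
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes for the sketch below. */
enum strip_status { STRIP_OK = 0, STRIP_NESTED_TAG, STRIP_UNCLOSED_TAG };

/* Copies src to dest without <...> runs. dest needs room for
   strlen(src) + 1 characters. A '<' inside a tag stops processing
   and leaves dest empty. */
enum strip_status strip_tags_checked(char *dest, const char *src)
{
    char *out = dest;
    int in_tag = 0;                 /* the "minor state" */

    assert(dest != NULL && src != NULL);

    for (; *src != '\0'; src++) {
        if (*src == '<') {
            if (in_tag) {           /* tag within a tag */
                *dest = '\0';
                return STRIP_NESTED_TAG;
            }
            in_tag = 1;
        } else if (*src == '>' && in_tag) {
            in_tag = 0;
        } else if (!in_tag) {
            *out++ = *src;
        }
    }
    *out = '\0';
    return in_tag ? STRIP_UNCLOSED_TAG : STRIP_OK;
}
```

With the quoted `<a href=...>` line as input, this returns
STRIP_NESTED_TAG as soon as it reaches the '<' of `<spoo>`.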

Micah



Mon, 28 Jun 2004 14:48:55 GMT  

Quote:



> > Micah Cowan tried to tell us something, and all I got was:

> >>  Here is one algorithm:

> >>  1. Find the first '<'
> >>  2. Now find the first occurrence of '>' after that.
> >>  3. Copy each character after the '>' to where the last '<' was, until
> >>     another '<' is encountered, at which point go back to step 2.
> >>  4. If a '<' occurs without a following '>', emit an error.

> > hmm.. and how would I actually do that ? Say, my string is:
> >  "<pre>this is a test</pre>"

> You have to do *some* of the work yourself. Your C book will surely
> tell you what a string is, how to access characters in a string, how
> to compare characters, how to assign characters, and what existing
> library functions are available in C?

> Try coding Micah's algorithm up using those primitive operations.
> If you have specific problems, post them & your code attempt (and
> don the fireproof undergarments) and we'll try and help you improve
> them.

Most people here don't like to provide code for questions, until you
provide at least a first attempt written in C.  (Often, after that
point, people get more prolific).  We don't feel that you learn C very
well by getting complete-package answers thrust at you - you learn it
by doing your homework (both literally and figuratively), and at least
trying.

I'll even give you a few hints, though:

  1. Find the first '<'

strchr() will be handy for this.

  2. Now find the first occurrence of '>' after that.

and this.

  3. Copy each character after the '>' to where the last '<' was, until
     another '<' is encountered, at which point go back to step 2.

You could use strchr() combined with strcpy() for this, but I would
prefer individual char-by-char evaluation and copy using simple
assignment.
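
To make the strchr() hint concrete without giving the whole thing
away, here is a small sketch that only locates the first tag and
reports where it starts and how long it is. The name `find_tag` and
its out-parameters are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: finds the first '<' in s and the first '>'
   after it. On success stores the offset of the '<' in *start and the
   tag length (both brackets included) in *len, and returns 1.
   Returns 0 if there is no '<', or no '>' after it. */
int find_tag(const char *s, size_t *start, size_t *len)
{
    const char *open;
    const char *close;

    assert(s != NULL && start != NULL && len != NULL);

    open = strchr(s, '<');              /* hint for step 1 */
    if (open == NULL) {
        return 0;
    }
    close = strchr(open + 1, '>');      /* hint for step 2 */
    if (close == NULL) {
        return 0;
    }
    *start = (size_t)(open - s);
    *len = (size_t)(close - open) + 1;
    return 1;
}
```

For "<pre>this is a test</pre>" this reports offset 0 and length 5;
the char-by-char copy of step 3 is left as the exercise.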

HTH,
Micah



Mon, 28 Jun 2004 14:57:17 GMT  


Quote:




>> >> Hi,

>> >> How would I strip, e.g., all HTML tags from a char string?

>> >> I would like to do it without including the regexp library.

>> >> any help much appreciated..

>> > Here is one algorithm:

>> > 1. Find the first '<'
>> > 2. Now find the first occurrence of '>' after that.
>> > 3. Copy each character after the '>' to where the last '<' was, until
>> >    another '<' is encountered, at which point go back to step 2.
>> > 4. If a '<' occurs without a following '>', emit an error.

>> <a href="http://madhouse.nosuch.co.uk/<spoo>/fruitbat">grungewallah</a>

> That is a very malformed tag.  Since that snippet is not valid
> SGML/XML, it invokes "undefined" behavior - not in the context of the
> C standard, but in the context of this program's expectations.

Yes, afterwards I thought, "does HTML demand that attribute strings
use entity references for the magic characters?" and decided that (a)
my brain wasn't ready to discover that it did (it can only take so
much strain), and (b) no matter what the standard said, a realistic
program running over real data would probably find strings such as the
one I wrote.
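
One way a stripper could cope with such strings is to ignore '<' and
'>' while inside a quoted attribute value. The sketch below assumes
that rule and nothing more: it is not a real HTML parser, the function
name is made up, and an unquoted apostrophe inside a tag would confuse
it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical quote-aware filter: copies src to dest without tags,
   treating everything between matching quotes inside a tag as part of
   the tag. dest needs room for strlen(src) + 1 characters. */
void strip_tags_quoted(char *dest, const char *src)
{
    int in_tag = 0;      /* nonzero between '<' and its closing '>' */
    char quote = '\0';   /* the open quote character, or '\0' if none */

    assert(dest != NULL && src != NULL);

    for (; *src != '\0'; src++) {
        char c = *src;

        if (!in_tag) {
            if (c == '<') {
                in_tag = 1;
            } else {
                *dest++ = c;
            }
        } else if (quote != '\0') {
            if (c == quote) {
                quote = '\0';        /* end of the attribute value */
            }
        } else if (c == '"' || c == '\'') {
            quote = c;               /* start of an attribute value */
        } else if (c == '>') {
            in_tag = 0;
        }
    }
    *dest = '\0';
}
```

Fed the `<a href=...>` line above, this yields "grungewallah".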

Quote:
> However, if you would like a more robust implementation, you could
> include some minor state-based information which would cause an error
> to be emitted if it found a tag within a tag.

The nice thing about the SGML family of notations is that humans and
machines are equally inept at handling it.

--
Chris "equality before the law, and after lunch" Dollin
C FAQs at: http://www.faqs.org/faqs/by-newsgroup/comp/comp.lang.c.html



Mon, 28 Jun 2004 17:18:46 GMT  
 