remove begin and end tags and its content in between 

Hi,

Is there a way to search for a specific HTML tag and remove its begin and end
tags, including the content between them (the begin tag can be several lines
before the end tag)?

example:

replace:

this text is before the begin title tags and <title>dddfdfdkfdjfdjfldjfl
dkfdlfjdlf jdkjfdk
djkfd
dfdkfdkf</title> this text is
at the end of the
end title tag

to:

this text is before the begin title tags and this text is
at the end of the
end title tag

jj
Thank you



Mon, 22 Sep 2003 23:38:54 GMT  

Quote:
> Hi,

> Is there a way to search for a specific HTML tag and remove its begin and end
> tags, including the content between them (the begin tag can be several lines
> before the end tag)?

> example:

> replace:

> this text is before the begin title tags and <title>dddfdfdkfdjfdjfldjfl
> dkfdlfjdlf jdkjfdk
> djkfd
> dfdkfdkf</title> this text is
> at the end of the
> end title tag

> to:

> this text is before the begin title tags and this text is
> at the end of the
> end title tag

Not clear exactly what tags qualify here, but, assuming you want
to remove everything that IS between ANY tags, and only leave
what is not, this might work:

import sgmllib

class afilter(sgmllib.SGMLParser):
    def __init__(self):
        sgmllib.SGMLParser.__init__(self)
        self.inTag = 0
        self.data = []
    def unknown_starttag(self, tag, attributes):
        self.inTag += 1
    def unknown_endtag(self, tag):
        self.inTag -= 1
    def handle_data(self, data):
        if self.inTag: return
        self.data.append(data)

if __name__=='__main__':
    sometext =  """
this text is before the begin title tags and <title>dddfdfdkfdjfdjfldjfl
dkfdlfjdlf jdkjfdk
djkfd
dfdkfdkf</title> this text is
at the end of the
end title tag"""
    filt = afilter()
    filt.feed(sometext)
    filt.close()
    print ''.join(filt.data)

The only difference wrt your desired output in your example is
that, of course, TWO spaces will be between 'and' and 'this',
since that is the number of spaces outside of tags in the
string being processed:-).
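
For anyone on Python 3, where sgmllib no longer exists, the same idea can be sketched with html.parser from the standard library. This is only a rough adaptation, not a drop-in replacement for the code above: the names TagStripper and strip_tags are made up for illustration, and unlike the original it drops only the tags you name (here "title" by default) instead of every tag. Like the original, it collects text only, so any other markup in the input is lost from the output while its text is kept.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collect the text that lies outside the given tags (hypothetical helper)."""

    def __init__(self, tags):
        super().__init__(convert_charrefs=True)
        self.tags = set(tags)   # lowercase names of the tags to drop
        self.depth = 0          # how many dropped tags are currently open
        self.parts = []         # text kept so far

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self.depth += 1

    def handle_endtag(self, tag):
        # Ignore stray end tags so the depth never goes negative.
        if tag in self.tags and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text only while we are outside every dropped tag.
        if not self.depth:
            self.parts.append(data)


def strip_tags(text, tags=("title",)):
    """Return text with the named tags and everything inside them removed."""
    parser = TagStripper(tags)
    parser.feed(text)
    parser.close()   # flush any text still buffered by the parser
    return "".join(parser.parts)


if __name__ == "__main__":
    sometext = """this text is before the begin title tags and <title>dddfdfdkfdjfdjfldjfl
dkfdlfjdlf jdkjfdk
djkfd
dfdkfdkf</title> this text is
at the end of the
end title tag"""
    print(strip_tags(sometext))
```

As with the sgmllib version, two spaces are left between 'and' and 'this', because both sit outside the removed tag.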

Alex



Tue, 23 Sep 2003 18:24:18 GMT  
 