Help, Pattern used in Regex.Matches(...)
Ting C #1 / 5
Folks,

I have tried millions of times to define a pattern that finds all tags in HTML/XML, including the '<', '/' and '>' characters and the first word in each tag. For example, given

<book attribute>
<chapter1><title attribute>The Beginning</title>It was the worst of times</chapter1>
</book>

Regex.Matches() should return <book>, <chapter1>, <title>, </title>, </chapter1> and </book>. I just can't figure it out. What should the pattern look like?

Many thanks,
TC
Sun, 21 Nov 2004 21:34:18 GMT

Ken Alverso #2 / 5
That'll strip off the <>s, but it will leave the / on closing tags.

Ken
Sun, 21 Nov 2004 22:02:04 GMT

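One possible pattern along the lines Ken describes is sketched below in C#. It is a hedged illustration, not his exact pattern: it captures the optional slash and the tag name in separate groups, so each match can be rebuilt as <name> or </name> with the attributes dropped. It assumes simple markup (no comments, CDATA sections, or '>' characters inside attribute values) and C# 9 top-level statements; the helper name FindTags is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns each tag rebuilt as "<name>" or "</name>".
// Assumes simple markup: no comments, CDATA, or '>' inside attribute values.
static List<string> FindTags(string markup)
{
    // <\s*        opening bracket, optional whitespace
    // (/?)        group 1: optional slash, present only on closing tags
    // \s*         optional whitespace
    // ([\w:.-]+)  group 2: the tag name (first word in the tag)
    // [^>]*>      skip any attributes up to the closing bracket
    const string pattern = @"<\s*(/?)\s*([\w:.-]+)[^>]*>";

    var tags = new List<string>();
    foreach (Match m in Regex.Matches(markup, pattern))
    {
        // Rebuild the tag from the slash and the name only.
        tags.Add("<" + m.Groups[1].Value + m.Groups[2].Value + ">");
    }
    return tags;
}

string sample = "<book attribute>\n<chapter1><title attribute>The Beginning</title>" +
                "It was the worst of times</chapter1>\n</book>";
Console.WriteLine(string.Join(", ", FindTags(sample)));
// prints: <book>, <chapter1>, <title>, </title>, </chapter1>, </book>
```

Keeping the slash in its own group is what lets one pattern report both opening and closing tags; a pattern that simply captures everything between the brackets gives the behaviour Ken mentions, with the / left in.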
John Reynold #3 / 5
That, or just use the XmlTextReader class provided by System.Xml instead. It lets you read the document node by node, and the name of each element is simply a property of the reader (reader.Name).

Cheers,
John
Mon, 22 Nov 2004 00:09:27 GMT

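A minimal sketch of John's XmlTextReader suggestion follows. It walks the input node by node and rebuilds each start and end tag from reader.Name. The sample uses id="1" and lang="en" in place of the question's bare "attribute" placeholders, because an attribute without a value is not well-formed XML and XmlTextReader would reject it. ListTagsWithReader is a hypothetical helper name, and C# 9 top-level statements are assumed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper: lists start and end tags using XmlTextReader.
// Assumes well-formed XML (or XHTML); malformed input throws XmlException.
static List<string> ListTagsWithReader(string xml)
{
    var tags = new List<string>();
    using (var reader = new XmlTextReader(new StringReader(xml)))
    {
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element)
                tags.Add("<" + reader.Name + ">");   // start tag, attributes ignored
            else if (reader.NodeType == XmlNodeType.EndElement)
                tags.Add("</" + reader.Name + ">");  // end tag
            // Text and whitespace nodes are skipped. An empty element such as
            // <br/> yields only an Element node, so no end tag is listed for it.
        }
    }
    return tags;
}

string sample = "<book id=\"1\">\n<chapter1><title lang=\"en\">The Beginning</title>" +
                "It was the worst of times</chapter1>\n</book>";
Console.WriteLine(string.Join(", ", ListTagsWithReader(sample)));
// prints: <book>, <chapter1>, <title>, </title>, </chapter1>, </book>
```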
Ken Alverso #4 / 5
That's a good idea, except that not all HTML (even "valid" HTML) is valid XML. If you know you are working with XHTML input, that's an excellent solution.

Ken
Mon, 22 Nov 2004 02:18:56 GMT

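To illustrate Ken's caveat, the sketch below reads an input to the end with XmlTextReader and reports whether an XmlException was thrown. An HTML-style <br> with no closing tag fails, while the XHTML form <br/> parses. IsWellFormedXml is a hypothetical helper name, and C# 9 top-level statements are assumed.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper: true if the input parses as well-formed XML.
static bool IsWellFormedXml(string markup)
{
    try
    {
        using (var reader = new XmlTextReader(new StringReader(markup)))
        {
            // Read every node; any well-formedness error surfaces as XmlException.
            while (reader.Read()) { }
        }
        return true;
    }
    catch (XmlException)
    {
        return false;
    }
}

// HTML allows <br> without a closing tag; XML does not.
Console.WriteLine(IsWellFormedXml("<p>Line one<br>Line two</p>"));   // False
// The XHTML form of the same markup is well-formed.
Console.WriteLine(IsWellFormedXml("<p>Line one<br/>Line two</p>"));  // True
```

The question's own sample fails the same way, since a bare attribute such as <book attribute> has no value.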
Ting C #5 / 5
It all works. Thanks again.

TC
Mon, 22 Nov 2004 21:23:14 GMT