I am trying to write a parser for XML-style markup (HTML in my case).
First I scan through the document to verify that all the tags are
correct and properly nested.  Then I want to store the HTML document
in a tree data structure in C, and that is where I am running into
problems.  If anyone has any tips or suggestions on how the data
structure should look, please let me know.  Thanks.


        If this isn't a speed-critical application, use Perl, a far nicer language
for such things (sorry if I offended anyone).

        Parsers, compilers, whatever you want to call them, are basically "state
machines", eh?  Quite simple really: an "HTML" tag acts as a trigger to another
expectant state.  For example, after an open "HTML" tag (e.g. <A NAME="here">)
you would expect either another open tag or a matching close tag (e.g. </A>).
What you need to do is keep track of nested tags.  Try using a stack-like
structure (e.g. some sort of linked list); it doesn't really matter what, as
long as it can be made to operate in a "last in, first out" manner.
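
        One way the "last in, first out" structure mentioned above might look in C is
a singly linked list where new entries always go on the front.  This is only a
sketch: the names tag_node, tag_push and tag_pop are made up for illustration,
and it stores nothing but a copy of each tag name.

```c
#include <stdlib.h>
#include <string.h>

/* One stack entry: a copy of an open tag's name and a link to the entry
 * below it.  (Hypothetical names, chosen for this sketch only.) */
struct tag_node {
    char *name;
    struct tag_node *next;
};

/* Push a copy of name onto the stack.
 * Returns 0 on success, -1 if memory runs out. */
int tag_push(struct tag_node **top, const char *name)
{
    struct tag_node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->name = malloc(strlen(name) + 1);
    if (n->name == NULL) {
        free(n);
        return -1;
    }
    strcpy(n->name, name);
    n->next = *top;     /* the old top now sits underneath the new entry */
    *top = n;
    return 0;
}

/* Pop the most recently pushed name.
 * Returns NULL if the stack is empty; otherwise the caller must free()
 * the returned string. */
char *tag_pop(struct tag_node **top)
{
    struct tag_node *n = *top;
    char *name;

    if (n == NULL)
        return NULL;
    name = n->name;
    *top = n->next;
    free(n);
    return name;
}
```

Start with an empty stack (struct tag_node *top = NULL;) and pass &top to both
functions; whatever was pushed last is what tag_pop hands back first.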

        As I see it, your program would "push" (i.e. put onto the stack) each open
"HTML" tag, then "pop" (i.e. take off the stack) on each close "HTML" tag.  If
the two tags "match" (i.e. are an appropriate pair), continue; if they don't,
go eeek.
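
        The push/pop check described above could be sketched in C roughly as follows.
It rests on several simplifying assumptions that real HTML does not satisfy:
every open tag needs a close tag (so no <BR>-style tags), there are no comments,
and no '>' appears inside an attribute value.  The function name tags_balanced
and the limits MAX_DEPTH and MAX_NAME are invented for this example, and a
fixed-size array stands in for the stack to keep it short.

```c
#include <ctype.h>
#include <string.h>

#define MAX_DEPTH 64   /* assumed limit on nesting depth */
#define MAX_NAME  32   /* assumed limit on tag-name length, including '\0' */

/* Returns 1 if every open tag in doc is closed in last-in-first-out
 * order, 0 otherwise. */
int tags_balanced(const char *doc)
{
    char stack[MAX_DEPTH][MAX_NAME];
    int depth = 0;
    const char *p = doc;

    while ((p = strchr(p, '<')) != NULL) {
        char name[MAX_NAME];
        size_t len = 0;
        int closing = 0;

        p++;                                /* step past '<' */
        if (*p == '/') {
            closing = 1;
            p++;
        }

        /* Copy the tag name, upper-cased so that <a> matches </A>.
         * Names longer than MAX_NAME - 1 are truncated. */
        while (isalnum((unsigned char)*p) && len < MAX_NAME - 1)
            name[len++] = (char)toupper((unsigned char)*p++);
        name[len] = '\0';

        /* Skip any attributes up to the '>' that ends the tag. */
        p = strchr(p, '>');
        if (p == NULL || len == 0)
            return 0;                       /* malformed tag */

        if (!closing) {
            if (depth == MAX_DEPTH)
                return 0;                   /* nested too deeply for this sketch */
            strcpy(stack[depth++], name);   /* push the open tag */
        } else {
            if (depth == 0 || strcmp(stack[depth - 1], name) != 0)
                return 0;                   /* "eeek": close tag does not match */
            depth--;                        /* pop the matched open tag */
        }
    }
    return depth == 0;                      /* leftover open tags are an error too */
}
```

With these assumptions, tags_balanced("<HTML><A NAME=\"here\">x</A></HTML>")
returns 1, while tags_balanced("<B><I>x</B></I>") returns 0 because </B> arrives
while <I> is still on top of the stack.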

Hope I was of some help,

--- --- ---
  .~.   the way of the Sacred Penguin is the path of
  /V\   the truly righteous...
 // \\  


Tue, 24 Apr 2001 03:00:00 GMT  
