ANN: REXML 2.5.4 
 ANN: REXML 2.5.4

Hello everybody,

Sorry for the long silence.  I've been slowly accruing bugfixes and
whatnot, and got some time to make a release of the development
branch.  So much to do, so little time...

puts RAA.entry['rexml'].description()

REXML is an XML 1.0 compliant, reasonably fast, non-validating XML
parser that supports multiple encodings. It has an API that is
designed to be intuitive, straightforward, and terse. REXML includes a
tree model parser, a SAX2 streaming parser, and a pull parser. It also
includes a full XPath implementation. All of REXML's parsers pass 100%
of the OASIS XML non-validating tests.

Changelog:
* On request, changed the format of printed elements so that
whitespace appears before the close of empty tags; e.g., "<tag/>" ->
"<tag />"
* Improved parsing speed by a chunk
* Bug #29: SAX2Parser was not processing XML declarations or
processing instructions.
* Bug #29: REXML pull parser and SAX2 parser both now report
:processing_instruction, rather than :instruction. This is less
consistent with REXML, which tends to be more minimal, but is more
consistent with the SAX2 parser API. Let me know if you disagree with
this decision before I go to 2.6.0.
* Bug #30: In some cases, REXML would refuse to delete an attribute.
This has been fixed.
* Bug #31: Ignored element fix applied.
* Fixed a whitespace parsing bug that resulted in some documents
causing parse errors.
* Fixed a tutorial error
* Added Shift_JIS encoding. This is the same as Shift-JIS, but the
correct IANA-registered name is Shift_JIS. I've left Shift-JIS in for
backwards compatibility.
* Fixed a non-conformance bug in XPath, WRT whitespace in predicates.
* Changed the unit tests to use the new test/unit API.
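
The :processing_instruction change from Bug #29 can be seen with a small
pull-parser sketch. The sample document and the helper name
pull_event_types are made up for illustration, and the exact list of
events may differ between REXML versions.

```ruby
require 'rexml/document'
require 'rexml/parsers/pullparser'

# Sample document (made up for this sketch): an XML declaration and one
# processing instruction ahead of an empty root element.
SAMPLE_XML = '<?xml version="1.0"?><?display mode="compact"?><root/>'

# Collect the type of every event the pull parser reports for +xml+.
def pull_event_types(xml)
  parser = REXML::Parsers::PullParser.new(xml)
  types = []
  while parser.has_next?
    types << parser.pull.event_type
  end
  types
end

# The list should contain :processing_instruction, not :instruction.
p pull_event_types(SAMPLE_XML)
```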



Fri, 05 Aug 2005 15:14:33 GMT  
 ANN: REXML 2.5.4

Quote:

> REXML is an XML 1.0 compliant, reasonably fast, non-validating XML
> parser that supports multiple encodings.

Hmm. It occurs to me that you don't need a validating parser.

You can parse and then validate.

Especially if instead of a DTD, you use James Clark's Relax NG schema.

i.e. XML doc + Relax NG Schema + REXML + a bit of Ruby magic and you have a
validating parser.

All that is missing is the "bit of ruby magic", which would probably be
quite small.

There are already DTD to Relax NG converters.

Although personally I can't think why they haven't just thrown out DTD as
a sorry mistake and moved to Relax NG ages ago....

John Carter                             Phone : (64)(3) 358 6639
Tait Electronics                        Fax   : (64)(3) 359 4632

New Zealand

John's law :-

All advances in computing have arisen through the creation of an
additional level of indirection, the trick is to work out which
indirection is actually useful.



Sun, 07 Aug 2005 09:13:46 GMT  
 ANN: REXML 2.5.4

Quote:
> Although personally I can't think why they haven't just thrown out DTD as
> a sorry mistake and moved to Relax NG ages ago....

If you have endless free time, try following the xml-dev mailing list
for all the details.  Largely it was due to a desire to
maintain some SGML compatibility.

James



Sun, 07 Aug 2005 09:48:43 GMT  
 ANN: REXML 2.5.4

Quote:

> Hmm. It occurs to me that you don't need a validating parser.

> You can parse and then validate.

Well, if you have a large XML source, you may want to abort on validation errors before loading the entire document.

James



Sun, 07 Aug 2005 09:50:20 GMT  
 ANN: REXML 2.5.4
I'm not following you here. Your schema sets up the rules you want the data
to follow, whether it's a DTD or Relax NG or W3C XML Schemas. Once it's
parsed, you would then have to see that the data matched your rules.

Instead of a "bit", it looks to me like you'd need code that
translates your schema's grammar into Ruby and REXML, and then checks
every node for the right sequence and the acceptable children.

Although a Relax NG schema might be easier to translate than a DTD, it still
seems significant.

Or am I missing something?

Roger Sperberg

Quote:


> Hmm. It occurs to me that you don't need a validating parser.

> You can parse and then validate.

> Especially if instead of a DTD, you use James Clark's Relax NG schema.

> ie. XML doc + Relax NG Schema + REXML + A bit of ruby magic
> and you have a validating parser.

> All that is missing is the "bit of ruby magic", which would probably be
> quite small.



Sun, 07 Aug 2005 23:28:20 GMT  
 ANN: REXML 2.5.4

Quote:

> I'm not following you here. Your schema sets up the rules you want the data
> to follow, whether it's a DTD or Relax NG or W3C XML Schemas. Once it's
> parsed, you would then have to see that the data matched your rules.

Well, there are three problems with writing a validating XML parser.

Problem 1, parsing the XML, has been done. That's REXML, and it does it
very, very nicely, thank you.

Problem 2, parsing the DTD. That's yucky. It can be done, but it's no joy.
Solution: don't do it. Feed the DTD through one of the several DTD to
Relax NG converters, and then you can slurp it in using REXML. After all,
the really nifty thing about Relax is that it is in XML.

Problem 3, check that the XML conforms to the Schema (DTD). This is the
missing bit.

But not really hard.

It merely means you need to traverse the DOM (which REXML makes really
easy) and at each point check whether this item is valid here. REXML
makes querying the Relax Schema easy too.
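
A rough sketch of that traversal, assuming a hand-written table of
permitted child elements in place of a real Relax NG schema. The RULES
table, the validation_errors helper, and the sample documents are all
hypothetical; a real validator would also have to check ordering,
cardinality, attributes, and text content.

```ruby
require 'rexml/document'

# Hypothetical stand-in for a schema: element name => permitted child names.
RULES = {
  'library' => %w[book],
  'book'    => %w[title author],
  'title'   => [],
  'author'  => []
}.freeze

# Walk the tree depth-first and collect a message for every element
# that the rule table does not permit at its position.
def validation_errors(element, rules = RULES, errors = [])
  allowed = rules[element.name]
  if allowed.nil?
    errors << "unknown element <#{element.name}>"
    return errors
  end
  element.each_element do |child|
    if allowed.include?(child.name)
      validation_errors(child, rules, errors)
    else
      errors << "<#{child.name}> is not allowed inside <#{element.name}>"
    end
  end
  errors
end

good = REXML::Document.new(
  '<library><book><title>T</title><author>A</author></book></library>'
)
bad = REXML::Document.new(
  '<library><book><title>T</title><price>9</price></book></library>'
)

puts validation_errors(good.root).empty? ? 'Aye! It fits' : 'Nay!'
puts validation_errors(bad.root)
```

This version only reports element names. Line and column numbers are not
tracked here, since the tree has already been built by the time the check
runs.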

One of the marks of a good compiler is that it can make sensible fixups
and keep going. But for the purposes of validating XML you usually just
want to say, "Aye! It fits", or "Nay! It doesn't, expected XXX at line NNN col MM".

I bet it won't be a large program at all.

I would write it now, except I have other things to do first. I mention
it now, since I can foresee a future where I will need it, and will have to
write it.

I just hoped that by the time I need it, someone else would have done so
first. (Larry Wall of Perl says the primary virtues of a programmer are
Laziness, Impatience and Hubris. I don't have the pressing need right now,
so I'm not Impatient on this one, but Laziness tells me I will need it.
Hubris tells me my idea is a "Better way".)

John Carter                             Phone : (64)(3) 358 6639
Tait Electronics                        Fax   : (64)(3) 359 4632

New Zealand

John's law :-

All advances in computing have arisen through the creation of an
additional level of indirection, the trick is to work out which
indirection is actually useful.



Mon, 08 Aug 2005 04:32:08 GMT  
 ANN: REXML 2.5.4
...

Quote:
> Problem 3, check that the XML conforms to the Schema (DTD). This is the
> missing bit.
...
> It merely means you need to traverse the DOM, (which REXML makes really
> easy), and at each point check whether this item is valid here. REXML
> makes querying the Relax Schema easy too.

> One of the marks of a good compiler is that it can make sensible fixups
> and keep going. But for the purposes of validating XML you usually just
> want to say, "Aye! It fits", or "Nay! It doesn't, expected XXX at line NNN col MM"

> I bet it won't be a large program at all.

> I would write it now, except I have other things to do first. I mentioned
> it now, since I can foresee a future where I will need it, and will have to
> write it.

> I just hoped that by the time I need it, someone else would have done so
> first. (Larry Wall of Perl says the primary virtues of a programmer are
> Laziness, Impatience and Hubris. I don't have the pressing need right now,
> so I'm not Impatient on this one, but Laziness tells me I will need it.
> Hubris tells me my idea is a "Better way".)

John, shoot me an email, and we'll collaborate on this.

My plan, as it has stood for the past 5 months, is to write a RelaxNG
-> Ruby state machine generator, then slap a SAX2 interface on it.  To
validate, users will instantiate the validator and pass it to the
parser.  The parser will call its usual listener notification events;
as far as REXML will be concerned, the validator will be just another
listener.  The validator will just make sure that the events it
receives put the state machine into a valid state.  I'll need to add
some hooks into REXML to support validation in tree parsing, unless I
can think of a less invasive solution, but I'm more than willing to do
that work to get validation.
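
As a sketch only of the "validator as just another listener" idea: the
class below is attached to REXML's SAX2 parser and raises on the first
event that does not fit. It is not the planned RelaxNG-generated state
machine; the CHILD_RULES table, StreamValidator, ValidationError, and
first_violation are hypothetical names, and the rules are hand-written
for the example.

```ruby
require 'rexml/document'
require 'rexml/parsers/sax2parser'
require 'rexml/sax2listener'

# Raised by the listener on the first event that breaks the rules.
class ValidationError < StandardError; end

# Hypothetical rules standing in for a generated state machine:
# element name => permitted child names.
CHILD_RULES = {
  'library' => %w[book],
  'book'    => %w[title author],
  'title'   => [],
  'author'  => []
}.freeze

class StreamValidator
  include REXML::SAX2Listener # supplies no-op handlers for other events

  def initialize(rules, root_name)
    @rules = rules
    @root_name = root_name
    @open = [] # names of the elements that are currently open
  end

  # Check each start tag against what the enclosing element permits.
  def start_element(uri, localname, qname, attributes)
    expected = @open.empty? ? [@root_name] : @rules.fetch(@open.last, [])
    unless expected.include?(qname)
      where = @open.empty? ? 'the document root' : "<#{@open.last}>"
      raise ValidationError, "<#{qname}> is not allowed in #{where}"
    end
    @open.push(qname)
  end

  def end_element(uri, localname, qname)
    @open.pop
  end
end

# Returns nil if the stream fits the rules, else the first violation.
def first_violation(xml, rules = CHILD_RULES, root_name = 'library')
  parser = REXML::Parsers::SAX2Parser.new(xml)
  parser.listen(StreamValidator.new(rules, root_name))
  parser.parse
  nil
rescue ValidationError => e
  e.message
end

p first_violation('<library><book><title>T</title></book></library>')
p first_violation('<library><book><price>9</price></book></library>')
```

Because the listener raises as soon as an event is rejected, parsing stops
at that point rather than reading the rest of a large document.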

I'm hoping that other people will follow the RelaxNG example, and
write DTD->FSM and W3C XML Schema->FSM converters.  I'd like this
mostly because I have zero interest in writing DTD /or/ W3C XSD
parsers, so if someone else doesn't do them, they probably will never
be done.  I'm happy to give contributors repository accounts and as
much support as they need to do this work.

I've started on this a number of times, and then gotten distracted
with other things (like bug fixes).  My main task right now is to
improve namespace handling in the streaming and pull parsers; I think
I *just* fixed the last namespace non-conformance in REXML in this
next release.  I've also got a dozen other "important" sub-projects,
such as trying to separate the various parsers so that people can trim
down REXML to a minimal API subset, for inclusion in their own
applications.  If REXML is ever bundled with Ruby, this won't be as
important, but I'd still like to do it to clean up the code base.  The
API documentation is in total chaos, I need to extend XPath support to
do more than return nodes, and then there's XPath 2.0 looming --
that'll be a huge job.

Anyway, if anyone is interested in collaborating on the RelaxNG
validator, let me know and we'll get started.



Fri, 12 Aug 2005 01:59:59 GMT  
 