Markus' ruby-parser 

Hi Markus and folks,

I checked out ruby-parser (aka RubyInRubyParser 0.1-alpha) from the ruby
cvs (repository /src; directory ruby-parser; this is not included in Ruby
1.7).

Note: I didn't choose Rulator/Rockit because I've been told (by Robert)
that it is still too experimental, but that was a while ago. If this is
not true anymore I'd like to know. But there's also a reason I chose
Markus' parser in particular... it is that it only supports MetaRuby's
RubyAST.rb, and it is also the only parser to support it now. However our
versions have forked and I want to merge them back together.

Note that I have the intention of bundling one parser with MetaRuby. I'd
like to know which one would be the best, and why.

I hope the cvs-head version is up-to-date, because i'm starting to modify
it now.

The migrate.rb file is missing; I presume it only contains this line:

        require "RubySchema-old.rb"

BEFORE MERGING:

The differences in the Contract file are:

  i renamed Contract to Assert; Contract is an alias; i will introduce a
   distinction in the future.

The differences in the Type file are:

  i renamed Tuple to Form (this is an important change)
  i changed Nameable#const_name_is

  i made Form extend Nameable
  i removed Boolean (already in Contract.rb)
  i added Template#specializations,#normalize_args
  i added calls to Nameable#const_name_is
  i added ChoiceType#base, MultiChoiceType, All
  you moved SymbolSubset over there
  i parenthesised some method calls for Ruby-1.7 compatibility.

The differences in the RubySchema file are:

  i added SymbolSubset #to_s, include Nameable
  i fixed RRescue, DefArgs, DefClass
  you moved SymbolSubset to Type.rb
  you fixed Ident, IdentSel
  you added VarSpecial, GVar, etc.; i added those differently;
    in particular, they are not Forms, they are directly SymbolSubsets.
    i don't know (or remember) why you did that.
    __FILE__,__LINE__ are called XVar.
    VarSpecial merged into GVar.
  you added Self<Form
  you changed Alias

DURING THE MERGE:

  updated your files until they could be replaced by my files.
  I still made a few further changes on both sides:

The differences in the Contract file are:

  i added DoCheckTypes relying on env var RUBY_CONTRACT (no more X11_LOG)

The differences in the Type file are:

  i removed DoCheckTypes and Fast.

The differences in the RubySchema file are:

  i rewrote the variables (LVar,etc) my way.
  i rewrote Alias; split into Alias (methods) and Alias2 (GVars)
  i fixed Rescue (added :var)

there are also cases where you allowed nil when [] was already allowed and
sufficient, and similar things. i still have difficulty convincing myself
of why a particular slot should be typed either Body or
Any.of(NilClass,Body), but things will evolve. I began using MExpr instead
of :args,:rest slot-pairs (Break,Next).

Please see metaruby CVS files:
  Contract.rb 1.3
  Type.rb 1.7
  RubySchema.rb 1.10

________________________________________________________________
Mathieu Bouchard                   http://www.*-*-*.com/ ~matju



Thu, 01 Jul 2004 13:51:46 GMT  

Quote:

> Note: I didn't choose Rulator/Rockit because I've been told (by Robert)
> that it is still too experimental, but that was a while ago. If this is
> not true anymore I'd like to know. But there's also a reason I chose
> Markus' parser in particular... it is that it only supports MetaRuby's
> RubyAST.rb, and it is also the only parser to support it now. However our
> versions have forked and I want to merge them back together.

> Note that I have the intention of bundling one parser with MetaRuby. I'd
> like to know which one would be the best, and why.

Let me just summarize the state of Rulator as compared to Markus' parser:

* Lexer is pretty well tested but not all constructs in the language have
been tested with the parser so some bugs are probably still there. What is
needed is to write some more tests for the parser (singletons for example).

* Lexer is written with Rockit's parser combinators while Markus directly
translated Matz's lexer. So there might be more deviations in Rulator's lexer
although it's shorter and may be simpler to change. Currently it's probably
slower though.

* Rulator does not use RubySchema but the similar but simpler Terms I use
in Rockit. The main difference is higher resolution (more constructs,
ie. "a+b" is Plus[Id["a"], Id["b"]] instead of Expr[Id["a"], :+,
Id["b"]]), no typing (which IMHO, is ok for parsers but may be needed when
used to manually construct Ruby code) and that Terms can be
pattern-matched (this is useful when writing translation rules ie
converting from one language to another). Differences are not large though
so we should probably work to merge the two approaches.

* The parser calls a handler object instead of directly constructing the
AST. By plugging in your own handler object you can turn it into an
event-based parser (ie. the handler is the event handler and the default
handler builds an AST). So Rulator can use RubySchema if someone writes a
handler that builds RubySchema's instead of RubyTerm's. Calling the
handler buys flexibility for a small performance penalty.

* Comments are added as attributes to the RubyTerms. They're pre_comment
for comments directly preceding a language construct and post_comment in the
symmetric way.
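
The handler-object design described in the list above can be sketched roughly as follows. This is only an illustration under assumptions: TinyParser, TreeHandler, LogHandler and the on_id/on_plus callbacks are hypothetical names, not Rulator's actual API, and the "grammar" is reduced to identifiers joined by "+".

```ruby
# Hypothetical toy parser for "a + b"-style input; NOT Rulator's real API.
class TinyParser
  def initialize(handler)
    @handler = handler
  end

  # Parses identifiers joined by "+", left-associatively, and reports each
  # construct to the handler instead of building nodes itself.
  def parse(source)
    names = source.split("+").map { |s| s.strip }
    result = @handler.on_id(names.shift)
    names.each { |name| result = @handler.on_plus(result, @handler.on_id(name)) }
    result
  end
end

# Default-style handler: builds a nested-array AST.
class TreeHandler
  def on_id(name)
    [:Id, name]
  end

  def on_plus(left, right)
    [:Plus, left, right]
  end
end

# Event-style handler: records what it saw and builds nothing.
class LogHandler
  attr_reader :events

  def initialize
    @events = []
  end

  def on_id(name)
    @events << "id #{name}"
    nil
  end

  def on_plus(_left, _right)
    @events << "plus"
    nil
  end
end

tree = TinyParser.new(TreeHandler.new).parse("a + b")
log = LogHandler.new
TinyParser.new(log).parse("a + b")
```

The same parser produces a tree with one handler and a flat event log with the other; a handler that builds RubySchema nodes would plug in the same way.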

So I think a fair summary would be:
* Markus' parser is a bit more tested.
* Rulator's lexer is a bit more tested but not as close to Matz's lexer as
Markus'. It's shorter though so might be simpler to debug/maintain.
* Rulator is a bit more flexible but slower.
* They use similar but slightly different representations of the
AST. The community should probably settle for one solution here since it
can be used in many situations (not only parsers). Matju and I have
already discussed how the resolution of RubySchema can actually be
increased if one wants to (Plus = Expr.of(:+) for example). IMHO, that
might be needed to get pretty-prints close to original, simpler
specification of translations (example: I have different code to
execute based on the operator of an expression; I shouldn't need to
access and compare it again) etc.

That's about it. Our intention has been to merge them but it hasn't
happened yet. My impetus for getting Rulator out slowed when Dave
found another solution for the RDoc parsing (and the deadline for my PhD
project started coming really close... ;-)).

Regards,

Robert



Fri, 02 Jul 2004 17:55:16 GMT  

Quote:


> > Note that I have the intention of bundling one parser with MetaRuby. I'd
> > like to know which one would be the best, and why.
> Let me just summarize the state of Rulator as compared to markus parser:

I'm sorry that I lost the log of my first irc conversation with you...

Quote:
> * Lexer is pretty well tested but not all constructs in the language has
> been tested with the parser so some bugs still there. What is needed is to
> write some more tests for the parser (singletons for example).

How about a shareable test suite ? The main issue I see is there are
several equivalent ways of parsing some expressions. For example,
semicolons parse to Body structures, which have no predefined
associativity, and for which flattening has no effect. If two different
parsers can give two slightly unequal - but equivalent - expressions, how
can all parsers have common tests? Or maybe this is a non-issue and all
parsers should give exactly the same output.

Quote:
> handler builds an AST). So Rulator can use RubySchema if someone writes a
> handler that builds RubySchema's instead of RubyTerm's. Calling the
> handler buys flexibility for a small performance penalty.

yeah, got that. i'll eventually get to that with a potential partial
merging of RubyTerm.

Quote:
> Matju and I have already discussed how the resolution of RubySchema
> can actually be increased if one wants to (Plus = Expr.of(:+) for
> example).

That would be Plus = M.of(:+), my mistake; also, a+b is
M[LVar[:a],:+,[LVar[:b]],nil,nil] instead of Expr[Id["a"],:+,Id["b"]].

But the increased resolution would only work with case-exprs,
because Types in general only work with case-statements; you need Modules
if you need decentralized dispatch. plus, case-exprs are slow, while
method-dispatch is fast.
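
A rough sketch of that distinction, under assumptions: Msg here is a hypothetical three-field Struct (not RubySchema's five-slot M), and SelectorType, Describable and describe_by_case are made-up names for illustration. A predicate-style type only answers ===, so it can be used in a case expression but cannot carry methods for the nodes it matches; a module can.

```ruby
# Hypothetical message node; field names are illustrative only.
Msg = Struct.new(:receiver, :selector, :args)

# A "type" that is only a predicate: it answers === and nothing else.
class SelectorType
  def initialize(selector)
    @selector = selector
  end

  def ===(node)
    node.is_a?(Msg) && node.selector == @selector
  end
end

Plus  = SelectorType.new(:+)
Minus = SelectorType.new(:-)

# Centralized dispatch: every consumer writes its own case expression,
# and each "when" runs a predicate until one matches.
def describe_by_case(node)
  case node
  when Plus  then "addition"
  when Minus then "subtraction"
  else "other message"
  end
end

# Decentralized dispatch: behaviour lives in a module that node classes
# include, so the method is found by ordinary method lookup.
module Describable
  def describe
    "message #{selector}"
  end
end

class Msg
  include Describable
end

node = Msg.new(:a, :+, [:b])
puts describe_by_case(node)
puts node.describe
```

Note that node is never an instance of Plus here, which is why Plus cannot supply a specialized describe of its own.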

Quote:
> That's about it. Our intention has been to merge them but it hasn't
> happened yet.

I'd like to have both of them more tested and conforming to the same spec.

Quote:
> (and the deadline for my PhD project started coming really close...
> ;-)).

How did it go?

________________________________________________________________
Mathieu Bouchard                   http://hostname.2y.net/~matju



Fri, 02 Jul 2004 21:48:57 GMT  

Quote:

> > * Lexer is pretty well tested but not all constructs in the language has
> > been tested with the parser so some bugs still there. What is needed is to
> > write some more tests for the parser (singletons for example).

> How about a shareable test suite ? The main issue I see is there are

Yes, that would be nice. Rich Kilmer (who wrote some of the tests) and I
used RubyUnit while Markus used a home-grown variant (based on some
of your stuff?).

Quote:
> several equivalent ways of parsing some expressions. For example,
> semicolons parse to Body structures, which have no predefined
> associativity, and for which flattening has no effect. If two different
> parsers can give two slightly unequal - but equivalent - expressions, how
> can all parsers have common tests? Or maybe this is a non-issue and all
> parsers should give exactly the same output.

IMHO, we should strive for one parse for each program but it won't happen
until we have a common AST representation so we need to start there.

Quote:
> > Matju and I have already discussed how the resolution of RubySchema
> > can actually be increased if one wants to (Plus = Expr.of(:+) for
> > example).

> That would be Plus = M.of(:+), my mistake; also, a+b is
> M[LVar[:a],:+,[LVar[:b]],nil,nil] instead of Expr[Id["a"],:+,Id["b"]].

> But the increased resolution would only would work with case-exprs,
> because Types in general only work with case-statements; you need Modules
> if you need decentralized dispatch. plus, case-exprs are slow, while
> method-dispatch is fast.

Then we still need to solve the issue of resolution. Let me try and
summarize the pros and cons (correct me where I got it wrong):

The main benefit with having few but general constructs (low
resolution) is that there are fewer classes that you need to learn and
handle when extending etc. The drawback is that you may need to "parse
once more" ie. decide what kind of method call you have based on the
method id and arguments etc. Since the lower resolution implies more
general classes you have also lost information ie. was it written "a+b" or
"a.+(b)".

With higher resolution "a+b" would be Plus[Id["a"], Id["b"]] while the
latter would be Call[Id["a"], "+", Id["b"], nil, nil] so no info is
lost ie. we can pretty-print them back to their original form etc. But
there are many more classes, so it is harder to grasp etc.

I think ideally we should merge these approaches so that Plus is a
subclass of Call (or M in RubySchema) that specializes it. The parsers
should generate the highest resolution constructs but the user need only
see them at the lower-resolution level (since Plus is-a Call).
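
A minimal sketch of that idea, assuming hypothetical node classes: the names Call and Plus follow the thread, but the fields and the to_source and selectors helpers are invented for illustration and do not claim to match RubySchema or Rockit's Terms.

```ruby
# Hypothetical low-resolution node: any method call.
class Call
  attr_reader :receiver, :selector, :args

  def initialize(receiver, selector, args)
    @receiver, @selector, @args = receiver, selector, args
  end

  # Low-resolution rendering: explicit method-call syntax.
  def to_source
    "#{receiver}.#{selector}(#{args.join(', ')})"
  end
end

# High-resolution node: still a Call, but remembers the infix spelling.
class Plus < Call
  def initialize(left, right)
    super(left, :+, [right])
  end

  def to_source
    "#{receiver}+#{args.first}"
  end
end

# A low-resolution consumer that only knows about Call.
def selectors(nodes)
  nodes.select { |n| n.is_a?(Call) }.map { |n| n.selector }
end

infix    = Plus.new("a", "b")
explicit = Call.new("a", :+, ["b"])
puts infix.to_source
puts explicit.to_source
```

Code written against Call keeps working when the parser emits Plus, while a pretty-printer can use the subclass to reproduce the original spelling.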

This is my current view. I'd appreciate if you point out any
flaws/omissions or have opinions on what we should do. I've probably forgotten
some of Matju's previous good motivations/explanations; sorry...

Quote:
> > (and the deadline for my PhD project started coming really close...
> > ;-)).

> How did it go?

Still 6 weeks to go...

/Robert



Fri, 02 Jul 2004 22:31:24 GMT  

Quote:


> > How about a shareable test suite ? The main issue I see is there are
> Yes, that would be nice. Me and Rich Kilmer (who wrote some of the
> tests) used RubyUnit while Marcus used a home-grown variant (based on some
> of your stuff?).

It should be easy to convert everything to RubyUnit. I can take care of
that the day I finally take the time to learn RubyUnit.

Quote:
> IMHO, we should strive for one parse for each program but it won't happen
> until we have a common AST representation so we need to start there.

I meant even supposing that we agree on a common AST representation...
Some things may happen depending on what degree of commonality there
is. There is nothing preventing a RubySchema parser from introducing
slight divergences like unnecessary Body[[x]] instead of just x in some
places, etc.
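
One way a shared test suite could cope with such divergences is to normalize trees before comparing them. The sketch below assumes a hypothetical array encoding, [:Body, [statement, ...]], rather than RubySchema's real Body class; body? and normalize are made-up helper names.

```ruby
# Hypothetical stand-in for a Body node: [:Body, [statement, ...]].
# This is not RubySchema's actual class, only an illustration.
def body?(node)
  node.is_a?(Array) && node.first == :Body
end

# Flattens nested Body nodes and unwraps a Body holding one statement,
# so that equivalent parses reduce to the same canonical tree.
# Non-Body nodes are returned untouched (their children are not visited).
def normalize(node)
  return node unless body?(node)
  statements = node[1].flat_map do |child|
    child = normalize(child)
    body?(child) ? child[1] : [child]
  end
  statements.size == 1 ? statements.first : [:Body, statements]
end

left_nested  = [:Body, [[:Body, [:a, :b]], :c]]   # (a; b); c
right_nested = [:Body, [:a, [:Body, [:b, :c]]]]   # a; (b; c)
wrapped      = [:Body, [:x]]                      # Body[[x]] instead of x

puts normalize(left_nested) == normalize(right_nested)
```

A common test could then compare normalize(expected) with normalize(actual), or compare the raw trees when exactly one parse per construct is required.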

Quote:
> The main benefit with having few but general constructs (low
> resolution) is that there are fewer classes that you need to learn and
> handle when extending etc. The drawback is that you may need to "parse
> once more" ie. decide what kind of method call you have based on the
> method id and arguments etc. Since the lower resolution implies more
> general classes you have also lost information ie. was it written "a+b" or
> "a.+(b)".

True. I may add BinOp (or "Op2") to RubySchema while keeping it out of
MicroRubySchema; this is because the former schema has a fidelity
requirement (and the latter hasn't)

I don't want to go to high resolution because it will flood the list of
RubySchema Forms (there are already 61 of them). I don't envision much
code that will work on Plus expressions: there will be all kinds of
software working at very-low ("Form") and low/medium resolutions, but the
Plus expressions will be limited quite specifically to: Optimising
Compilers That Use Type Inference Or Declarations And Lots Of Smart
Inlining To The Point It's As Fast As C. Correct me if i'm wrong.

Quote:
> > > (and the deadline for my PhD project started coming really close...
> > > ;-)).
> Still 6 weeks to go...

courage!

________________________________________________________________
Mathieu Bouchard                   http://hostname.2y.net/~matju



Sun, 04 Jul 2004 11:50:39 GMT  

Quote:



> > > How about a shareable test suite ? The main issue I see is there are
> > Yes, that would be nice. Me and Rich Kilmer (who wrote some of the
> > tests) used RubyUnit while Marcus used a home-grown variant (based on some
> > of your stuff?).

> It should be easy to convert everything to RubyUnit. I can take care of
> that the day I finally take the time to learn RubyUnit.

Great!

Check out Markus' tests and the ones in

http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/rockit/src/examples/ru...

Quote:
> > IMHO, we should strive for one parse for each program but it won't happen
> > until we have a common AST representation so we need to start there.

> I meant even supposing that we agree on a common AST representation...
> Some things may happen depending on what degree of commonality there
> is. There is nothing preventing a RubySchema parser from introducing
> slight divergences like unneccessary Body[[x]] instead of just x in some
> places, etc.

Ok, I think it would be unfortunate. If we strive for a common test I
think it should only allow one parse for each construct. If it's not easily
done it should at least highlight the differences (so it's
deterministic).

Quote:
> > The main benefit with having few but general constructs (low
> > resolution) is that there are fewer classes that you need to learn and
> > handle when extending etc. The drawback is that you may need to "parse
> > once more" ie. decide what kind of method call you have based on the
> > method id and arguments etc. Since the lower resolution implies more
> > general classes you have also lost information ie. was it written "a+b" or
> > "a.+(b)".

> True. I may add BinOp (or "Op2") to RubySchema while keeping it out of
> MicroRubySchema; this is because the former schema has a fidelity
> requirement (and the latter hasn't)

I like the idea of splitting it into several layers of different
resolution. If it's possible to do it fully in a class hierarchy so that
BinOp really inherits M (which I think is a bit too non-descriptive; any
chance we can find a slightly more descriptive one that is not
meaningless?) that would probably solve it. But don't you agree that the
RubyInRuby parsers should generate schemas/terms/trees at the highest
resolution by default?

Quote:
> I don't want to go to high resolution because it will flood the list of
> RubySchema Forms (there are already 61 of them). I don't envision much
> code that will work on Plus expressions: there will be all kinds of
> software working at very-low ("Form") and low/medium resolutions, but the
> Plus expressions will be limited quite specifically to: Optimising
> Compilers That Use Type Inference Or Declarations And Lots Of Smart
> Inlining To The Point It's As Fast As C. Correct me if i'm wrong.

No that was the application I had in mind... :-)

It's probably a good point although I'm not sure about it (for example I
think pretty-printing/formatting Ruby code might be an important
application and it will need the highest possible resolution so as not to
alter the code more than necessary).

Quote:
> > > > (and the deadline for my PhD project started coming really close...
> > > > ;-)).
> > Still 6 weeks to go...

> courage!

Thanks!

/Robert



Sun, 04 Jul 2004 19:05:00 GMT  

Quote:


> Check out Markus tests and the ones in
> http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/rockit/src/examples/ru...
> > slight divergences like unneccessary Body[[x]] instead of just x in some
> > places, etc.
> Ok, I think it would be unfortunate. If we strive for a common test I
> think it should only allow one parse for each construct.

Okay, let's try it then.

Quote:
> > True. I may add BinOp (or "Op2") to RubySchema while keeping it out of
> > MicroRubySchema; this is because the former schema has a fidelity
> > requirement (and the latter hasn't)
> I like the idea of splitting it into several layers of different
> resolution. If its possible to do it fully in a class hierarchy so that
> BinOp really inherits M

BinOp will be called Op2; unary operator expressions will be Op1. They
will inherit from Call, not from M, because M specifies fields, while Call
does not.

Quote:
> (which I think is a bit too non-descriptive; any chance we can find a
> slightly more descriptive one that is not meaningless?)

Originally I thought that M[] would be a *very* common expression type. It
occurs to me now that it won't be, at least not as much as the Var types.
Using the rule that very common names should be shorter, and very
infrequent names should be longer (in a Huffman-coding kind of way...)
then these are more changes I'm making today:

LVar = LV
IVar = IV
CVar = CV
GVar = SV # now called special variables
XVar = XV # very special variables that won't vary...
M = Msg
SliceVar = Slice
LAtom = Lit # real literals (all other kinds aren't really)

Quote:
> that would probably solve it. But don'tyou agree that the
> RubyInRuby parsers should generate schemas/terms/trees at the highest
> resolution by default?

Well, medium resolution is already quite high. Maybe a file separate from
RubySchema could define all those higher resolution types in terms of
those already there? You would add types to that file as you'd need
them. The point of medium resolution is to have a good balance between
what are classes, and what are fields in those classes.

Other changes I made:
  * Expr,Arg,Statement,Literal,Primary merged as module Expr.
  * Var=Any.of changed to abstract class Var.

________________________________________________________________
Mathieu Bouchard                   http://hostname.2y.net/~matju



Tue, 06 Jul 2004 06:24:38 GMT  
 