Announce: OpenToken 2.0 released 

Release 2.0 of OpenToken has now been placed on the website (
http://www.*-*-*.com/ ).
Highlights of this new version include an LALR(1) parser and an HTML
analyzer submitted by the ever-helpful Christoph Green. This is the
first version to include parsing capability. The existing packages
underwent a major reorganization to accommodate the new
functionality. As some of the restructuring that was done is
incompatible with old code, the major revision has been bumped up to 2.
A partial list of changes is below:

   * Renamed the top level of the hierarchy from Token to OpenToken.
   * Moved the analyzer underneath the new OpenToken.Token hierarchy.
   * Renamed the Token recognizers from Token.* to
     OpenToken.Recognizer.*
   * Changed the text feeder procedure pointer into a text feeder
     object. This allows full re-entrancy in analyzers, which the old
     global text feeders prevented.
   * Updated the SLOC counter to read a list of files to process from a
     file. It also handles files with errors in them a bit better.
   * Added LALR(1) parsing capability and numerous packages to support
     it. A structure is in place for building other parsers as well.
   * Created a package hierarchy to support parse tokens. The word
     "Token" in OpenToken now refers to objects of this type, rather
     than to token recognizers.
   * An HTML lexer has been added to the language lexers.
   * OpenToken.Recognizer.Bracketed_Comment now works properly with
     single-character terminators.
   * Rewrote the text feeder and analyzer to minimize data copying.

With this release OpenToken now gains status as a viable replacement for
lex/yacc. In many ways it is more powerful, and there are plans to add
even more power to it. For those of you not already familiar with
OpenToken, I encourage you to visit the website and look around. But in
the meantime, here's a blurb about it from the readme file:

     The OpenToken package is a facility for performing token
     analysis and parsing within the Ada language. It is designed
     to provide all the functionality of a traditional lexical
     analyzer/parser generator, such as lex/yacc. But due to the
     magic of inheritance and runtime polymorphism it is
     implemented entirely in Ada as withed-in code. No
     precompilation step is required, and no messy tool-generated
     source code is created.

     Additionally, the technique of using classes of recognizers
     promises to make most token specifications as simple as an
     easy-to-read procedure call. The most error-prone part of
     generating analyzers, the token pattern matching, has been
     taken out of the typical user's hands and placed into reusable
     classes. Over time I hope to see the addition of enough
     reusable recognizer classes that very few users will ever need
     to write a custom one. Parse tokens themselves also use this
     technique, so they ought to be just as reusable in principle,
     although there currently aren't a lot of predefined parse
     tokens included in OpenToken.

     Ada's type safety features should also make misbehaving
     analyzers and parsers easier to debug. All this will hopefully
     add up to token analyzers and parsers that are much simpler
     and faster to create, easier to get working properly, and
     easier to understand.

--
T.E.D.


WWW  - http://www.*-*-*.com/   ICQ  - 10545591



Mon, 15 Jul 2002 03:00:00 GMT  
 Announce: OpenToken 2.0 released
The RPMs for Red Hat and SuSE GNU/Linux are available at http://www.gnuada.org

For glibc-2.1 based systems (RH 6.x, SuSE 6.{2,3}):
http://www.gnuada.org/rpms312p.html#OPENTOKEN

For glibc-2.0 based systems (RH 5.x, SuSE 6.{0,1}):
http://www.gnuada.org/rpms312p_0.html#OPENTOKEN

Cheers
Jürgen



Tue, 16 Jul 2002 03:00:00 GMT  
 Announce: OpenToken 2.0 released

Quote:

> The RPMs for Red Hat and SuSE GNU/Linux are available at http://www.gnuada.org

Cool. I'll make note of that on the website. Thanks, Jürgen.

--
T.E.D.


WWW  - http://www.telepath.com/dennison/Ted/TED.html  ICQ  - 10545591



Tue, 16 Jul 2002 03:00:00 GMT  
 Announce: OpenToken 2.0 released

Quote:

> Release 2.0 of OpenToken has now been placed on the website

From a quick look at opentoken.ads, I see a declaration for an
EOF_Character, set to Ada.Characters.Latin_1.EOT. Does this mean
that OpenToken cannot parse binary files that happen to contain
this character? It's a rather odd choice in any case, given that
no system that I know of uses EOT as an end-of-file marker.


Fri, 19 Jul 2002 03:00:00 GMT  
 Announce: OpenToken 2.0 released


Quote:

> > Release 2.0 of OpenToken has now been placed on the website

> From a quick look at opentoken.ads, I see a declaration for an
> EOF_Character, set to Ada.Characters.Latin_1.EOT. Does this mean
> that OpenToken cannot parse binary files that happen to contain
> this character? It's a rather odd choice in any case, given that
> no system that I know of uses EOT as an end-of-file marker.

That's the marker that the OpenToken text feeders agree to put on text
to indicate that there is no more text to read. If you have to parse
text which contains an EOT, it's a simple matter to change EOF_Character
to something else.

As for parsing binaries: to my knowledge OT has not been used that way
before. However, I see only one real impediment. EOF_Character is used
in OpenToken:
   o  In the line comment recognizer (line comments make no sense in
binaries anyway)
   o  In the Text_IO-based text feeder. Using this feeder also makes no
sense in binaries. You'd want to write one based on Sequential_IO or
something.
   o  In the End_Of_File token recognizer. This also makes no sense for
binaries, as a sentinel character which can be tokenized clearly won't
do the job.
   o  By you, the user, to make sure you don't attempt to read past the
end of the file after a token analysis or parse returns. In this case,
no problem for binaries exists. You just use a different method to
prevent reading past the end of the file.
   o  In the analyzer to prevent reading past the end of file when
matching a token. This *would* be a problem for you, unless none of your
"binary" tokens span an EOT. My suggestions for working around this
problem are as follows:
Modify EOF_Character to be a variable so that it can be set by your
custom text feeder. Normally it would hold some good terminating value:
a byte value that cannot appear anywhere in a token except at the end.
When the feeder reads the last character from the file, it emits that
value to mark the end of the input.

A better option, with a bit more work, would be the following:
Modify the root text_feeder package to have a primitive operation for
returning whether we are at the end of the input. Implement that routine
in your custom text feeder (as well as any others that you may use).
Modify the one line in the Analyzer that checks EOF_Character to instead
call that routine on its text feeder.
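A sketch of what that second option might look like; these declarations
are a guess at the general shape, not OpenToken's actual spec:

```ada
--  Sketch only: the real OpenToken text feeder package may differ
--  from this in names and details.
package OpenToken.Text_Feeder is

   type Instance is abstract tagged null record;

   --  Fill New_Text with more input; Text_End is the index of the
   --  last valid character written.
   procedure Get
     (Feeder   : in out Instance;
      New_Text :    out String;
      Text_End :    out Natural) is abstract;

   --  Proposed new primitive: True when the input is exhausted.  A
   --  binary-capable feeder (say, one built on Sequential_IO) answers
   --  this from its file state rather than from a sentinel character.
   function End_Of_Input (Feeder : Instance) return Boolean is abstract;

end OpenToken.Text_Feeder;
```

The analyzer's single comparison against EOF_Character then becomes a
dispatching call to End_Of_Input on whatever feeder it holds.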

Proper binary support is not in OT because it has just never come up
before. But as you can see, it could be modified fairly easily to
support parsing binaries. But using a sentinel character for the end of
file has always seemed like a nice simplification. So what are the uses
of parsing binaries? I kinda thought that binaries are, by their very
nature, already parsed.

--
T.E.D.

http://www.telepath.com/~dennison/Ted/TED.html

Sent via Deja.com http://www.deja.com/
Before you buy.



Sat, 20 Jul 2002 03:00:00 GMT  
 Announce: OpenToken 2.0 released

Quote:

> Proper binary support is not in OT because it has just never come up
> before. But as you can see, it could be modified fairly easily to
> support parsing binaries. But using a sentinel character for the end of
> file has always seemed like a nice simplification. So what are the uses
> of parsing binaries? I kinda thought that binaries are, by their very
> nature, already parsed.

Well, at one point I was writing code to parse Adobe PDF files.
They have a binary format, where arbitrary 8-bit bytes can appear,
and a structure which I think lends itself well to syntax-oriented
parsing.

In general, I like to avoid arbitrary restrictions in tools. Before
GNU, most classic UNIX utilities had arbitrary limits, especially
on line size. This led to unexpected and sometimes silent breakage
when the tools were fed files with lines that were too long. And
having the tool report the problem isn't much help when I still have
that file to process and the tool won't work on it.

By the way, the normal C/C++ style for handling EOF is to have the
return type of the character reader be such that it can hold any
value of the character set, plus an out-of-band value representing
EOF. The usual is '#define EOF -1' and 'int getchar()'.



Sat, 20 Jul 2002 03:00:00 GMT  
 Announce: OpenToken 2.0 released

Quote:
>By the way, the normal C/C++ style for handling EOF is to have the
>return type of the character reader be such that it can hold any
>value of the character set, plus an out-of-band value representing
>EOF. The usual is '#define EOF -1' and 'int getchar()'.

Which means that c: Character; c := getchar(); is illegal (at least
in Ada; in C it would get silently truncated.) It's a major pain
in C, and a well known source of bugs. How about making it like
procedure GetChar (EOF: in out boolean; Char: in out character);?

--

If you wish to strive for peace of soul then believe;
if you wish to be a devotee of truth, then inquire.
   -- Friedrich Nietzsche



Sat, 20 Jul 2002 03:00:00 GMT  
 Announce: OpenToken 2.0 released

Quote:

> >By the way, the normal C/C++ style for handling EOF is to have the
> >return type of the character reader be such that it can hold any
> >value of the character set, plus an out-of-band value representing
> >EOF. The usual is '#define EOF -1' and 'int getchar()'.

> Which means that c: Character; c := getchar(); is illegal (at least
> in Ada; in C it would get silently truncated.) It's a major pain
> in C, and a well known source of bugs. How about making it like
> procedure GetChar (EOF: in out boolean; Char: in out character);?

There are a few other approaches I can think of to this issue

(1) Exceptions: raise a Not_Found when input is exhausted. Some people
    hate this because "Exceptions are only for error handling, not
    control flow!". OCaml (and SML too I think) use exceptions for this,
    and Ada sometimes does (try reading a file stream without using
    File_Type...)

(2) Provide a query on the sequence, like in Java, so you have code like

    while Has_More_Elements(Seq) loop
        Char := Get_Next_Element(Seq);
        ...
    end loop;

    I find this very readable.

(3) Provide an option type like in (OCa|S)ML which wraps returned elements
    and forces the reader to unwrap them, like this

    type Option_T is (Some, None);

    generic
        type Element_T is private;
    package Options is
        --  The default discriminant lets an Optional_T object be
        --  reassigned between Some and None values.
        type Optional_T (Option : Option_T := None) is record
            case Option is
                when Some =>
                    Data : Element_T;
                when None =>
                    null;
            end case;
        end record;
    end Options;

    loop
        Elem := Get_Next_Element (Seq);
        case Elem.Option is
            when Some => ...
            when None => exit;
        end case;
    end loop;

This is too inefficient for reading chars and is very verbose in Ada; much
less so in ML.

-- Brian



Sat, 20 Jul 2002 03:00:00 GMT  
 Announce: OpenToken 2.0 released

Quote:

> (1) Exceptions: raise a Not_Found when input is exhausted. Some people
>     hate this because "Exceptions are only for error handling, not
>     control flow!". OCaml (and SML too I think) use exceptions for this,
>     and Ada sometimes does (try reading a file stream without using
>     File_Type...)

I don't like this much.
Exceptions are for error handling, not control flow :-)

Quote:
> (2) Provide a query on the sequence, like in Java, so you have code like
>     while Has_More_Elements(Seq) loop
>         Char := Get_Next_Element(Seq);
>         ...
>     end loop;
>     I find this very readable.

Unfortunately, when it comes to input, it is impossible on most systems
to divorce a test for end of input from the attempt to read the input.
This is the classic Pascal file input problem that made I/O in that
language so despised.

Quote:
> (3) Provide an option type like in (OCa|S)ML which wraps returned elements
>     and forces the reader to unwrap them, like this

This is the integer/character thing dressed up in high-falutin' clothes.
It probably adds more overhead than people would want. But it's a fine
technique.

I think the approach best suited for Ada is a procedure with two out
parameters, a boolean for end-of-file, and a character for the data.



Sat, 20 Jul 2002 03:00:00 GMT  
 Announce: OpenToken 2.0 released

Quote:

> If a language has a well-defined and well-constructed exception
> mechanism without much overhead, then there is nothing wrong in
> using exceptions as condition signals or events.

Because of the way exceptions work, unwinding the stack and calling
finalizers on controlled objects along the way, it's rarely the case
that they are implemented "without much overhead". I understand that
some Ada compilers will watch for exceptions which are thrown and
caught locally, and turn them into efficient code, but that's not
something to rely on for portability. The general implementation
strategy for exceptions is to make their use as cheap as possible
as long as they are not actually thrown, but to allow considerable
overhead once they are thrown.


Sat, 20 Jul 2002 03:00:00 GMT  
 Announce: OpenToken 2.0 released

Quote:

> > (1) Exceptions: raise a Not_Found when input is exhausted. Some people
> >     hate this because "Exceptions are only for error handling, not
> >     control flow!". OCaml (and SML too I think) use exceptions for this,
> >     and Ada sometimes does (try reading a file stream without using
> >     File_Type...)

> I don't like this much.
> Exceptions are for error handling, not control flow :-)

De gustibus non est disputandum. I've gotten used to this technique, and
lived to talk about it.

Quote:
> > (2) Provide a query on the sequence, like in Java, so you have code like
> >     while Has_More_Elements(Seq) loop
> >         Char := Get_Next_Element(Seq);
> >         ...
> >     end loop;
> >     I find this very readable.

> Unfortunately, when it comes to input, it is impossible on most systems
> to divorce a test for end of input from the attempt to read the input.

Exactly true, and as you say unfortunate too.

Quote:
> > (3) Provide an option type like in (OCa|S)ML which wraps returned elements
> >     and forces the reader to unwrap them, like this

> This is the integer/character thing dressed up in high-falutin' clothes.

True, though the high-falutin thing is more general and much less prone
to error, since it expresses the intent clearly. It's also easily
expressible in C++ (your favorite language?) and other languages which
have some form of parametric polymorphism and variant types. I suppose
you can do it in Eiffel too, but faking variants (tagged unions) with
classes is an extra level of ugliness IMO.

I prefer using exceptions here, but I bet most Ada programmers agree
with you and would like a procedure with two out params.

-- Brian



Sat, 20 Jul 2002 03:00:00 GMT  
 Announce: OpenToken 2.0 released

Quote:


>> (1) Exceptions: raise a Not_Found when input is exhausted. Some people
>>     hate this because "Exceptions are only for error handling, not
>>     control flow!". OCaml (and SML too I think) use exceptions for this,
>>     and Ada sometimes does (try reading a file stream without using
>>     File_Type...)

>I don't like this much.
>Exceptions are for error handling, not control flow :-)

It seems to me that this is a somewhat narrow view of it.

More generally, exceptions can be viewed as a mechanism that gives the
user a tool to signal outward that some condition is true, along with
the ability to handle this signal in a place that is not known in
advance. Or they may be viewed as a kind of program event. In an
event-driven system we have the ability to choose the level/scope at
which this event will be handled.

If a language has a well-defined and well-constructed exception
mechanism without much overhead, then there is nothing wrong in using
exceptions as condition signals or events.

Regards,
Vladimir Olensky



Sun, 21 Jul 2002 03:00:00 GMT  
 Announce: OpenToken 2.0 released

Quote:


> > (1) Exceptions: raise a Not_Found when input is exhausted. Some people
> >     hate this because "Exceptions are only for error handling, not
> >     control flow!". OCaml (and SML too I think) use exceptions for this,
> >     and Ada sometimes does (try reading a file stream without using
> >     File_Type...)

> I don't like this much.
> Exceptions are for error handling, not control flow :-)

The Ada design team expressed a preference for shorter names when
possible ("task" rather than "process"), so why didn't they use "error"
instead of "exception"? The answer is that exceptions are for handling
exceptional situations; not all exceptional situations are errors.

--
Jeff Carter
"We call your door-opening request a silly thing."
Monty Python and the Holy Grail



Sun, 21 Jul 2002 03:00:00 GMT  
 Announce: OpenToken 2.0 released


Quote:
> Well, at one point I was writing code to parse Adobe PDF files.
> They have a binary format, where arbitrary 8-bit bytes can appear,
> and a structure which I think lends itself well to syntax-oriented
> parsing.

You along with some emailers have convinced me. I'll make the change to
the analyzer I mentioned in the previous message. That should be
sufficient to allow binaries to be parsed.

--
T.E.D.

http://www.telepath.com/~dennison/Ted/TED.html




Sun, 21 Jul 2002 03:00:00 GMT  
 Announce: OpenToken 2.0 released

Quote:

> True, though the high-falutin thing is more general and much less prone
> to error, since it expresses the intent clearly. It's also easily
> expressible in C++ (your favorite language?) and other languages which
> have some form of parametric polymorphism and variant types. I suppose
> you can do it in Eiffel too, but faking variants (tagged unions) with
> classes is an extra level of ugliness IMO.

Yup (C++). I've seen it referred to as "Fallible<T>". I've also seen
an amusing variant which forces you to test error return codes from
functions. The function returns a "MustRead<T>" object, which will
throw an exception if it is destructed before the value it holds is
extracted.


Sun, 21 Jul 2002 03:00:00 GMT  
 