Updating AST's in a functional language (was: 8-queens contest) 
Author Message
 Updating AST's in a functional language (was: 8-queens contest)

Some time ago I posted a little contest: who could code the
8-queens puzzle in as less as possible statements. Several
interesting solutions were posted, and especially the advocates
of functional languages manifested themselves strongly. So here
comes a new problem, especially for them, and this time it's not
a game, but a Real Big Problem for me.

I have to parse source code written in a certain rather
unstructured language, and I want to fill several big AST's
(abstract syntax trees) with the information that the parser
reads. Thus, the AST's are updated in an order that is not
known beforehand, and with list-members of mixed type.

I am trying to do this now in Erlang, which is a marvellous
strict single-assignment untyped language. It is also pure,
with the exeption that a kind of global variables are provided
that can be updated destructively.

I can think of no way to perform the creation of the AST's
efficiently in Erlang. For each small component that I add to
an AST, I would be oblighed to copy the whole AST, and add the
new component during this copy. I suppose that e.g. in Haskell
it is as hard, and that only impure functional languages like
Lisp and Scheme can do it easily.

Functional language experts, please supply The Big Tric.

Regards,
Wouter Boeke



Sun, 29 Jul 2001 03:00:00 GMT  
 Updating AST's in a functional language (was: 8-queens contest)
[...]

Quote:
> I have to parse source code written in a certain rather
> unstructured language, and I want to fill several big AST's
> (abstract syntax trees) with the information that the parser
> reads. Thus, the AST's are updated in an order that is not
> known beforehand, and with list-members of mixed type.
> I am trying to do this now in Erlang, which is a marvellous
> strict single-assignment untyped language. It is also pure,
> with the exeption that a kind of global variables are provided
> that can be updated destructively.
> I can think of no way to perform the creation of the AST's
> efficiently in Erlang. For each small component that I add to
> an AST, I would be oblighed to copy the whole AST, and add the
> new component during this copy. I suppose that e.g. in Haskell
> it is as hard, and that only impure functional languages like
> Lisp and Scheme can do it easily.

[...]

It may not be quite as bad as copying the whole AST, if these are
tree-like structures.  One need only copy the nodes in the path from
the root to the small changed component, leaving everything else
shared; roughly speaking, log N time instead of the constant update
time that mutable nodes would afford.

Otherwise, it might be possible to adopt a two-phase approach, first
accumulating the update sequence as some other form of data and then
reformatting into the required ASTs, but it is difficult to say from
the general description.  Is a small example possible?

--
# A E Bass                                      Tel: (01473) 645305

# Martlesham Heath, Ipswich, Suffolk, IP5 3RE   Do not rely on From: line
#                                               Opinions are my own



Sun, 29 Jul 2001 03:00:00 GMT  
 Updating AST's in a functional language (was: 8-queens contest)

Quote:

>I am trying to do this now in Erlang, which is a marvellous
>strict single-assignment untyped language. It is also pure,
>with the exeption that a kind of global variables are provided
>that can be updated destructively.

>I can think of no way to perform the creation of the AST's
>efficiently in Erlang. For each small component that I add to
>an AST, I would be oblighed to copy the whole AST, and add the
>new component during this copy.

As someone else noted, if you use a tree structure rather than a list,
then this copying need only be O(log N) rather than O(N).

But if you need it to be O(1) rather than O(log N), then
why not use that above-mentioned Erlang facility for global variables
that can be updated destructively?  Surely that is exactly
what it is there for?

Quote:
>I suppose that e.g. in Haskell
>it is as hard, and that only impure functional languages like
>Lisp and Scheme can do it easily.

In Haskell as defined by the Haskell report, yes, it is hard.
However, most (all?) real Haskell implementations provide
some means for destructive update, without abandoning purity.
For example Hugs and ghc provide the STRef and STArray types
in their ST module.

--

WWW: <http://www.cs.mu.oz.au/~fjh>  |   but source code lives forever"



Mon, 30 Jul 2001 03:00:00 GMT  
 Updating AST's in a functional language (was: 8-queens contest)

Quote:
> As someone else noted, if you use a tree structure rather than a list,
> then this copying need only be O(log N) rather than O(N).

My data structures are already trees of lists. Each list component itself
also can be a tree. Were functional languages not especially good with
respect to complicated data structures?

Quote:
> But if you need it to be O(1) rather than O(log N), then
> why not use that above-mentioned Erlang facility for global variables
> that can be updated destructively?  Surely that is exactly
> what it is there for?

Alas, only the top node of the tree can be assigned to a global variable.
The rest is "single assignment" like any normal Erlang variable.

Tony Bass offered as a solution:

Quote:
> It may not be quite as bad as copying the whole AST, if these are
> tree-like structures.  One need only copy the nodes in the path from
> the root to the small changed component, leaving everything else
> shared;

This is obviously true. However, it implies that for each tree component
a different update function must be written. Not very convenient.

Wouter Boeke



Mon, 30 Jul 2001 03:00:00 GMT  
 Updating AST's in a functional language (was: 8-queens contest)


Quote:
>I have to parse source code written in a certain rather
>unstructured language, and I want to fill several big AST's
>(abstract syntax trees) with the information that the parser
>reads. Thus, the AST's are updated in an order that is not
>known beforehand, and with list-members of mixed type.

I had to solve this exact problem for a client who needed a
custom translator for his domain-specific language to C++.
The data structure I ended up choosing was a tree that was
formed by having a list of mutable structs, each element of
which contained a list of children.  Then I destructively
updated the structs when I needed to change the AST.  After
implementing this solution, I posted my experience to this
newsgroup, and was advised by certain SML/NJ advocates that
I should have simply copied the AST in its entirety every time
I wanted to add a node.  Their point was that this kind of
thing is very very fast in SML, and probably in my language as
well (I used Ocaml).  But even though I understand their position,
I disagree.  When writing a compiler I want to be able to do
AST surgery in-place.  I don't want there to be any question
about efficiency; in other words, if I'm thinking about putting
in a little AST optimization, I don't want there to be any
thought in there like "geez, I don't want to copy the whole AST
again, just to look for this one case."  And when you get to
the next step - turning trees into DAGs and doing lots of tree
transformations to improve your output code - the natural,
efficient way to do it is to scan the tree/dag and fix it up
as you go.  Yes, I know that the purely functional approach
would copy the whole tree for each such scan ... but I'm sorry,
that is just too kludgey for me.

Dwight



Mon, 30 Jul 2001 03:00:00 GMT  
 Updating AST's in a functional language (was: 8-queens contest)

Quote:



> >I have to parse source code written in a certain rather
> >unstructured language, and I want to fill several big AST's
> >(abstract syntax trees) with the information that the parser
> >reads. Thus, the AST's are updated in an order that is not
> >known beforehand, and with list-members of mixed type.

> I had to solve this exact problem for a client who needed a
> custom translator for his domain-specific language to C++.
> The data structure I ended up choosing was a tree that was
> formed by having a list of mutable structs, each element of
> which contained a list of children.  Then I destructively
> updated the structs when I needed to change the AST.  After
> implementing this solution, I posted my experience to this
> newsgroup, and was advised by certain SML/NJ advocates that
> I should have simply copied the AST in its entirety every time
> I wanted to add a node.  Their point was that this kind of
> thing is very very fast in SML, and probably in my language as
> well (I used Ocaml).  

You seem to be still missing the point.  The reason functional update
can be fast is that if your tree doesn't contain any mutable elements,
then you don't need to actually copy the *whole* tree to get a logical
copy; you just need to copy at most O(log n) nodes of the tree.  In
addition, whatever data the nodes point to doesn't need to be copied,
even for the nodes that get copied; only the references to the data
get copied.  All this sharing is safe because none of the data is
mutable.

For example you said you've used O'Caml.  If you look in its standard
library you'll see that its Map and Set datatypes are functional;
adding a new element returns a new Map or Set containing the new
element.  If the whole Map, including the data, had to actually be
copied this would be horendous, but in reality it's quite efficient.

Before you write off this whole idea based on your current intuitions
about efficiency, try comparing O'Caml's Map datatype to your own
which destructively updates the map.  Compare your code on both
efficiency and clarity and see what you come up with.

Quote:
> But even though I understand their position, I disagree.  When
> writing a compiler I want to be able to do AST surgery in-place.  I
> don't want there to be any question about efficiency; in other
> words, if I'm thinking about putting in a little AST optimization, I
> don't want there to be any thought in there like "geez, I don't want
> to copy the whole AST again, just to look for this one case."  And
> when you get to the next step - turning trees into DAGs and doing
> lots of tree transformations to improve your output code - the
> natural, efficient way to do it is to scan the tree/dag and fix it
> up as you go.  Yes, I know that the purely functional approach would
> copy the whole tree for each such scan ... but I'm sorry, that is
> just too kludgey for me.

Updating in place seems more kludgey to me, even if it is more
efficient at times.

Quote:

> Dwight

--
Adam P. Jenkins



Mon, 30 Jul 2001 03:00:00 GMT  
 Updating AST's in a functional language (was: 8-queens contest)

Quote:

>> But if you need it to be O(1) rather than O(log N), then
>> why not use that above-mentioned Erlang facility for global variables
>> that can be updated destructively?  Surely that is exactly
>> what it is there for?

>Alas, only the top node of the tree can be assigned to a global variable.
>The rest is "single assignment" like any normal Erlang variable.

I don't know Erlang very well at all.  But I think there will
be some way of doing this kind of thing in Erlang, even if
the above-mentioned global variable construct isn't it.

From a conversation I had a few years ago with Joe Armstrong,
I understand that Erlang has some kind of "table" or "database"
construct that basically gives you some kind of hash tables with
destructive update.  You should be able to easily use that to implement
"pointers" (a pointer is just an int, dereferencing a pointer is just
looking up the int in the table, assigning via a pointer is just storing
a value in the table), and then you can implement data structures
with destructive update using these "pointers", just as you would in C.

--

WWW: <http://www.cs.mu.oz.au/~fjh>  |   but source code lives forever"



Tue, 31 Jul 2001 03:00:00 GMT  
 Updating AST's in a functional language (was: 8-queens contest)

  Wouter> Upon the difficulties I had with updating an AST, Fergus

  >>  As someone else noted, if you use a tree structure rather than a
  >> list, then this copying need only be O(log N) rather than O(N).

  Wouter> My data structures are already trees of lists. Each list
  Wouter> component itself also can be a tree. Were functional
  Wouter> languages not especially good with respect to complicated
  Wouter> data structures?

  >>  But if you need it to be O(1) rather than O(log N), then why not
  >> use that above-mentioned Erlang facility for global variables
  >> that can be updated destructively?  Surely that is exactly what
  >> it is there for?

  Wouter> Alas, only the top node of the tree can be assigned to a
  Wouter> global variable.  The rest is "single assignment" like any
  Wouter> normal Erlang variable.

There is a library module for storing tree structures in ETS. It's
called digraph, and is pretty efficient.

On http://www.erlang.org, you will also find a module, dynarray.erl,
which essentially implements an array that grows dynamically to
accomodate your largest index. It's memory-efficient and faster than
ETS in many cases.

  Wouter> Tony Bass offered as a solution:

  >>  It may not be quite as bad as copying the whole AST, if these
  >> are tree-like structures.  One need only copy the nodes in the
  >> path from the root to the small changed component, leaving
  >> everything else shared;

  Wouter> This is obviously true. However, it implies that for each
  Wouter> tree component a different update function must be
  Wouter> written. Not very convenient.

I'm confused by this last statement. Perhaps I've missed some
fundamental aspect of your particular problem.

Actually, what you'd do is to write a generic "update-in-place"
function, to which you can pass a function object. I'll illustrate
on a simple list structure:

update_in_place([H|T],Key,KeyPos,F) when element(KeyPos,H)==Key ->
    [F(H)|T];
update_in_place([H|T],Key,KeyPos,F) ->
    [H|update_in_place(T,Key,KeyPos,F)];
update_in_place([],_,_,_) ->
    [].

The single-assignment aspect only adds the minor complication that
you will get a new variable for your tree after each update:

handle_objects([Object|Rest], Tree) ->
   NewTree = update_in_place(Tree,key(Object),keypos(),updatefun(Object)),
   handle_objects(Rest, NewTree);
handle_objects([], Tree) ->
   Tree.

/Uffe
--
Ulf Wiger, Chief Designer AXD 301
Ericsson Telecom AB                          tfn: +46  8 719 81 95
Varuv?gen 9, ?lvsj?                          mob: +46 70 519 81 95
S-126 25 Stockholm, Sweden                   fax: +46  8 719 43 44



Tue, 31 Jul 2001 03:00:00 GMT  
 Updating AST's in a functional language (was: 8-queens contest)

Quote:




> ...

> You seem to be still missing the point.  The reason functional update
> can be fast is that if your tree doesn't contain any mutable elements,
> then you don't need to actually copy the *whole* tree to get a logical
> copy; you just need to copy at most O(log n) nodes of the tree.  In
> ...
> Updating in place seems more kludgey to me, even if it is more
> efficient at times.

Unless I'm still missing the point, compilers such as SISAL's OSC
and interpreters such as J and SHARP APL give the programmer
functional semantics for update (hence preserving elegance, purity,
and apple pie), but generate update-in-place code when possible,
based on reference count analysis. In the case of OSC, it may reorder
code to achieve this end. OSC, by the way, uses similar analysis
methods to remove most reference counts from the run-time code.

The sad part of the tale is that LLNL killed the SISAL project in order
to concentrate on their other area of expertise -- creating weapons of
mass destruction.

Bob



Tue, 31 Jul 2001 03:00:00 GMT  
 Updating AST's in a functional language (was: 8-queens contest)
Quote:
Robert Bernecky writes:

 >
 >

 >



 > > ...
 >
 > >
 > > You seem to be still missing the point.  The reason functional update
 > > can be fast is that if your tree doesn't contain any mutable elements,
 > > then you don't need to actually copy the *whole* tree to get a logical
 > > copy; you just need to copy at most O(log n) nodes of the tree.  In
 > > ...
 > > Updating in place seems more kludgey to me, even if it is more
 > > efficient at times.
 >
 > Unless I'm still missing the point, compilers such as SISAL's OSC
 > and interpreters such as J and SHARP APL give the programmer
 > functional semantics for update (hence preserving elegance, purity,
 > and apple pie), but generate update-in-place code when possible,
 > based on reference count analysis. In the case of OSC, it may reorder
 > code to achieve this end. OSC, by the way, uses similar analysis
 > methods to remove most reference counts from the run-time code.

That sounds very interesting!  I never thought of that particular
optimization.  Though what's there to stop any functional programming
language from doing that?  Is it something specific to the languanges
you mentioned?  I.e.  if I write

replace 1 2 [3,2,1]

in SML, where replace replaces 1s with 2s in this case, SML could see
that I'm using a literal list so there's no other references to it,
and actually modify the list.  This wouldn't break the functional
semantics so it's possible here.  Noone claims that functional
languages have to be functional behind the scenes.

In any case, this optimization doesn't change my point, it just
strengthens it, since it sounds like these languages still provide
functional semantics; the updating only takes place behind the scenes
when it can be done invisibly.  The original poster was talking about
having explicit side-effects at the language level which is a
different story.

--
Adam P. Jenkins



Tue, 31 Jul 2001 03:00:00 GMT  
 Updating AST's in a functional language (was: 8-queens contest)

Quote:

> I have to parse source code written in a certain rather
> unstructured language, and I want to fill several big AST's
> (abstract syntax trees) with the information that the parser
> reads. Thus, the AST's are updated in an order that is not
> known beforehand, and with list-members of mixed type.

> ...

> I can think of no way to perform the creation of the AST's
> efficiently in Erlang. For each small component that I add to
> an AST, I would be oblighed to copy the whole AST, and add the
> new component during this copy. I suppose that e.g. in Haskell
> it is as hard, and that only impure functional languages like
> Lisp and Scheme can do it easily.

> Functional language experts, please supply The Big Tric.

I think that *laziness* is the trick. By exploiting laziness you can
often construct efficient, purely functional solutions to problems that
would require updates in strict languages.

As an example of the kind of thing you can do, suppose you have a binary
tree of numbers and you should (for some reason) replace all of the
numbers with the minimum of the numbers in the tree. How many times do
you need to traverse the tree? With laziness: only once. (In a strict,
pure functional language: twice).

I am not sure about the details of your specific parsing problem, but if
you can express what you want to do using attribute grammars, then you
can do it in a lazy functional language without using updates or
copying.

Thomas Johnnson has written a paper titled "Attribute Grammars as a
Functional Programming Paradigm", which seems relevant to this. You can
get it from his home page,

    http://www.cs.chalmers.se/~johnsson/

Regards,
Thomas Hallgren



Fri, 03 Aug 2001 03:00:00 GMT  
 
 [ 26 post ]  Go to page: [1] [2]

 Relevant Pages 

1. Updating AST's in a functional language (was: 8-queens contest)

2. ICFP'99 Functional Programming Contest

3. ICFP'98 Functional Programming Contest

4. ICFP'99 Functional Programming Contest

5. ICFP'99 Functional Programming Contest

6. 8 queens: a little contest

7. Contests for 8 queens problem

8. functional language archive update

9. functional language archive update

10. update regarding my functional language archive

11. functional language archive (update)

12. Ruby 10'th most popular ICFP contest language

 

 
Powered by phpBB® Forum Software