Updating AST's in a functional language (was: 8-queens contest) 
Author Message
 Updating AST's in a functional language (was: 8-queens contest)

Quote:
> As an example of the kind of thing you can do, suppose you have a binary
> tree of numbers and you should (for some reason) replace all of the
> numbers with the minimum of the numbers in the tree. How many times do
> you need to traverse the tree? With laziness: only once. (In a strict,
> pure functional language: twice).

Yes, this is one of the Seven Wonders of (Lazy) Functional
Programming.  Unfortunately, the resulting code allocates more heap
and takes more time to run (I ran a quick test with Hugs 1.4 and GHC
3.02), because of tupling. But it's a neat trick.

Regards,
  Laszlo



Sat, 04 Aug 2001 03:00:00 GMT  
 Updating AST's in a functional language (was: 8-queens contest)

Quote:


> >> But if you need it to be O(1) rather than O(log N), then
> >> why not use that above-mentioned Erlang facility for global variables
> >> that can be updated destructively?  Surely that is exactly
> >> what it is there for?

> >Alas, only the top node of the tree can be assigned to a global variable.
> >The rest is "single assignment" like any normal Erlang variable.

> I don't know Erlang very well at all.  But I think there will
> be some way of doing this kind of thing in Erlang, even if
> the above-mentioned global variable construct isn't it.

> From a conversation I had a few years ago with Joe Armstrong,
> I understand that Erlang has some kind of "table" or "database"
> construct that basically gives you some kind of hash tables with
> destructive update.  You should be able to easily use that to implement
> "pointers" (a pointer is just an int, dereferencing a pointer is just
> looking up the int in the table, assigning via a pointer is just storing
> a value in the table), and then you can implement data structures
> with destructive update using these "pointers", just as you would in C.

    Correct. Erlang has inbuilt  linear hash tables (accessible though
a  module called  ets) these  have  *constant* access  time for  large
number of keys (I am talking in terms of hundreds of millions of keys)
- you   can build  Key-Value  tables  with  constant time  destructive
read/writes. (and the constant in "constant time" is small :-).

   ets tables are designed to be used in real-world situtations when
"pure" solutions just would not work.

   Most Erlang programs are pure and don't need impure facilities. But
serious applications often need a balance of pure and impure
facilities. Try doing a load balancing DNS server in a pure way when
it has to handle tens of quadzillions of domain names - not easy.

 [download a load balancing DNS server in Erlang from
www.eddieware.org to see how]

        /Joe

--
Joe Armstrong  Computer Science Laboratory  +46 8 719 9452
AT2/ETX/DN/SU  Ericsson Telecom SE-126 25 Stockholm Sweden



Sun, 05 Aug 2001 03:00:00 GMT  
 Updating AST's in a functional language (was: 8-queens contest)

Quote:

>     Correct. Erlang has inbuilt  linear hash tables (accessible though
> a  module called  ets) these  have  *constant* access  time for  large
> number of keys (I am talking in terms of hundreds of millions of keys)
> - you   can build  Key-Value  tables  with  constant time  destructive
> read/writes. (and the constant in "constant time" is small :-).

It should be noted that is also possible to have constant access time
in a implementation of aggregates such as arrays, hashtables and
buffers, which is "behavioural" pure. One such technique is known
under the catchword of "version arrays", another one relates to the
"mbuf's" used in TCP/IP implementations of BSD unices for implementing
IP envelopes (there at least I saw it around 10 years ago, I don't
know of its actual origin).

The basic idea is that constant access is guaranteed as long as one
works on the "newest" version of an aggregate. Only if an "older"
version of the aggregate needs to be accessed again, then copying
may become necessary. So, as long as data structures are used
"single-threaded" (always working on the newest version), the
performance is quite well.

I have used this scheme for implementing immutable aggregates for the
Pizza language (a combination of functional programming concepts and
Java), and are quite happy with its performance in every-day usage
(see http://uebb.cs.tu-berlin.de/zeta/AGLDesignDecisions.html for a
some loose benchmarks).

The implementation of these aggregates is called "behavioural pure",
since one internally needs side-effects, that is either hand-coding or
executing a monad inside of a function which has no monadic
type. However, regarding the interface, no side-effects are observable
by the user.

The constant in "constant time" is not optimal small, and I don't know
how it behaves with hundreds millions of mapped keys in a hashtable ;).
In particular, memory needs to be allocated which keeps track of
the history of the modification of aggregates. However, at least
with copying GC, this memory needs to be never touched again if
an aggregate is used single-threaded.

Regards
  Wolfgang

--



Mon, 06 Aug 2001 03:00:00 GMT  
 Updating AST's in a functional language (was: 8-queens contest)

I've done a lot of this sort of thing over the years, in several different
languages, mainly functional ones. I'm currently doing it in Java, and
coming to a rather distressing conclusion: the imperative way is better.

It's not primarily efficiency. Depending on exactly what you're doing,
and what language/implementation you're using,
the cost of doing it functionally may be nothing (the compiler plants
the updates for you), a constant (copying vs. updating, but the same
overall traversal), a log cost, usually with a fairly large constant
(especially when dealing with cyclic structures), or worse.

The more important reason is that I believe that the imperative code is
actually "better", in two ways:

- It's simpler (not necssarily shorter because of syntax issues) because
there's no need to construct and pull apart tuples or other structured
return values.
- Copying gives you a version control problem. If you have multiple
versions of a data structure,or parts of a data structure, around it's
easy to pick up the "wrong" one. I've found this to be a sigificant source
of bugs, sometimes quite long-lived ones. In the imperative version,
everybody shares the same instance of a thing, so everybody consistently
sees the latest state, unless you explicitly decide to make a copy, which
is rare and explicit in the code.

I know this isn't going to be popular on comp.lang.functional, but
honestly that is my experience. I'm about to write a paper about this
issue amongst others. Whether I can get anybody to accept it is another
matter :-)

  John



Tue, 07 Aug 2001 03:00:00 GMT  
 Updating AST's in a functional language (was: 8-queens contest)

Quote:

> I know this isn't going to be popular on comp.lang.functional, but
> honestly that is my experience. I'm about to write a paper about this
> issue amongst others. Whether I can get anybody to accept it is another
> matter :-)

This is not a suprise for me. I just state in another thread some weeks
ago that IMO the best solution is an OO/FP language. In each region some
paradigm should take the lead, and Datastructures (with state and all
the like ) or the "Architekture" should be based on the OO side. And all
computation should be done functional. IMO that's the best solution.

Regards
Friedrich



Tue, 07 Aug 2001 03:00:00 GMT  
 Updating AST's in a functional language (was: 8-queens contest)

Quote:

>- Copying gives you a version control problem. If you have multiple
>versions of a data structure,or parts of a data structure, around it's
>easy to pick up the "wrong" one. I've found this to be a sigificant source
>of bugs

FWIW, we've had the same experience with this problem
while writing the Mercury compiler.

--

WWW: <http://www.cs.mu.oz.au/~fjh>  |   but source code lives forever"



Tue, 07 Aug 2001 03:00:00 GMT  
 Updating AST's in a functional language (was: 8-queens contest)

Quote:

> The more important reason is that I believe that the imperative code is
> actually "better", in two ways:

> - It's simpler (not necssarily shorter because of syntax issues) because
> there's no need to construct and pull apart tuples or other structured
> return values.

This is a striking argument. In particular, if one just wants to
attribute an AST, for example to add type information, the functional
style is cumbersome. But also turning around this inevitable
"environments" of the algorithm.

Quote:
> - Copying gives you a version control problem. If you have multiple
> versions of a data structure,or parts of a data structure, around it's
> easy to pick up the "wrong" one. I've found this to be a sigificant source
> of bugs, sometimes quite long-lived ones.

What do you mean with "versions around"? There is a common problem of
the kind where one unintentionally introduces dead code:

        LET x2 == f(x1)
            x3 == h(x2)
        IN
        x2

This typically happens if code is incrementally "refined". However, a
compiler should be able to point you to such bugs. (The OPAL compiler
does this.)

My experience with imperative programming of AST-problems in
Pizza/Java is just the other way around: in particular, attributes
attached to AST nodes which are represented by aggregates (such as the
"set of used names"), which are passed to other algorithms which
unintentionally modify them, turned out to be a significant source of
bugs.

The mean I used in the second try is to use selective update on
attribute fields of AST nodes, but to treat the other AST fields and
the attribute values themselves as immutable. And, of course, implementing
the environment of an AST algorithm as a state, to keep the parameter
lists and return types handy.

Quote:
> I'm about to write a paper about this
> issue amongst others. Whether I can get anybody to accept it is another
> matter :-)

This seems to be a serious problem at the moment.

Regards
  Wolfgang

--



Tue, 07 Aug 2001 03:00:00 GMT  
 Updating AST's in a functional language (was: 8-queens contest)

Quote:


>> - Copying gives you a version control problem. If you have multiple
>> versions of a data structure,or parts of a data structure, around it's
>> easy to pick up the "wrong" one. I've found this to be a sigificant source
>> of bugs, sometimes quite long-lived ones.

>What do you mean with "versions around"? There is a common problem of
>the kind where one unintentionally introduces dead code:

>    LET x2 == f(x1)
>        x3 == h(x2)
>    IN
>    x2

>This typically happens if code is incrementally "refined". However, a
>compiler should be able to point you to such bugs. (The OPAL compiler
>does this.)

The Mercury compiler usually catches bugs like that too,
via singleton variable warnings.  So that's not the sort
of problem we encountered.  The ones we encountered were
things like this:

        foo index subindex data_0 = data_3 where
                field_0 = get_field index data_0
                subfield_0 = get_subfield subindex field_0
                (subfield_1, data_1) = bar subfield_0 data_0
                data_2 = baz data_1
                field_1 = set_subfield field_0 subindex subfield_1
                data_3 = set_field data_2 index field_1

The bugs here should be obvious, right? ;-)

--

WWW: <http://www.cs.mu.oz.au/~fjh>  |   but source code lives forever"



Wed, 08 Aug 2001 03:00:00 GMT  
 Updating AST's in a functional language (was: 8-queens contest)


Quote:
>> As an example of the kind of thing you can do, suppose you have a binary
>> tree of numbers and you should (for some reason) replace all of the
>> numbers with the minimum of the numbers in the tree. How many times do
>> you need to traverse the tree? With laziness: only once. (In a strict,
>> pure functional language: twice).

>Yes, this is one of the Seven Wonders of (Lazy) Functional
>Programming.  Unfortunately, the resulting code allocates more heap
>and takes more time to run (I ran a quick test with Hugs 1.4 and GHC
>3.02), because of tupling. But it's a neat trick.

Actually, it's not a wonder of lazy fp at all; it works just as well
(namely, not all that well, as Laszlo observes) in an eager fl. Just define
the function

  repmin :: Tree a -> (b -> Tree b, a)

which takes a tree t and returns a pair: the first component is a function,
which takes a value b and returns a tree of the same shape as t but with
all the elements replaced by b; the second component of the pair is the
minimum value in the tree t. In other words,

  repmin t = ( ( \ b -> map (const b) t ), treemin t )

Then you apply the first component to the second. That only traverses the
input tree once, and works just as well in a pure eager fl as in say
Haskell. (Of course, some "traversing" is involved in the first component
of the result, but that is generating a new tree and so is technically not
traversing the original one!)

Jeremy

--
Jeremy Gibbons <jgibbons at brookes dot ac dot uk>
  School of Computing and Mathematical Sciences,    tel +44 1865 483660
  Oxford Brookes University, Headington Campus,     fax +44 1865 483666
  Oxford OX3 0BP, UK.      http://www.brookes.ac.uk/~p0071749/home.html



Tue, 14 Aug 2001 03:00:00 GMT  
 
 [ 26 post ]  Go to page: [1] [2]

 Relevant Pages 
 

 
Powered by phpBB® Forum Software