Looking for help with Rattlesnake (more bite for Python) 
Author Message
 Looking for help with Rattlesnake (more bite for Python)

[Turning the pure stack based python VM into a full-blown register
 VM... Rattlesnake]

I like the approach. What I would change, though, is to make the
optimizer run as external program and not (only) hooked from inside
a running Python process.

To get a group working on the code, I'd suggest creating an eList
mailing list and inviting as many gurus as you can get :-)

I would also be especially interested in ways to inline
functions/methos. Function calls are still too slow in Python.

Aside: I have a hacked up version of ceval.c too, so you might want
to integrate those patches as well (mostly breaking the big switch
statement in smaller ones and restructuring the calling scheme).

--
Marc-Andre Lemburg                               Y2000: 404 days left
---------------------------------------------------------------------
          : Python Pages >>> http://www.*-*-*.com/ ~lemburg/  :
           ---------------------------------------------------------



Thu, 10 May 2001 03:00:00 GMT  
 Looking for help with Rattlesnake (more bite for Python)

Quote:

>I've been working on a peephole optimizer for Python for some time.  I

Cool.

Quote:
>One of the basic limitations of the work I reported on is that the
>stack-based virtual machine does an awful lot of data shuffling, much of
>which can't be eliminated by your basic peephole optimizer.  What is needed
>is an entirely different virtual machine architecture.

This isn't really true.  Stack-based optimizers have been less studied than
register-based ones, but there has been enough to give good results.  Forth
is one of the classical uses of such optimizers (although, of course, Forth
isn't crippled by being limited to 256 bytecodes; it uses wordcodes, so the
peephole optimizer can insert 'anonymous' wordcodes to replace an
inefficient sequence).

Um... Let's see.  Check out ThisForth, at www.taygeta.com; at that site
you'll also find a lot of papers on stack optimizations.

Stack machines vs. register machines does involve some interesting
tradeoffs, but there's absolutely nothing which makes one superior to the
other.

Interesting question: if changing the format of the bytecode to
register-based is interesting, how interesting owuld it be to change the
format to wordcode?  Every instruction would take up 16 bits -- that's 16K
instructions available to the optimizer.

That might be even better than the memory consumption for the register
machine, since the stack wordcodes don't need to store register numbers.

Quote:
>Skip Montanaro

-Billy


Thu, 10 May 2001 03:00:00 GMT  
 Looking for help with Rattlesnake (more bite for Python)

Quote:

>Billy,
>Thanks for the pointer to the Forth page. I didn't look at the ThisForth
>source code, though I did locate a copy of Koopman's "A Preliminary
>Exploration of Optimized Stack Code Generation" at
>   http://www.cs.cmu.edu/~koopman/stack_compiler/index.html

Anton Ertl has written more about these issues; see

http://www.complang.tuwien.ac.at/projects/forth.html
http://www.complang.tuwien.ac.at/projects/backends.html

under the subjects

Code generation for stack machines
Forth native code generation
Stack caching for interpreters
  (This one in particular deals with register-machine versus
   stack-machine tradeoffs)
Patterns in interpreted Forth execution

Also, Rob Pike and Phil Winterbottom's ``Design of the Inferno Virtual
Machine'' agrees with you about register machines being a better idea
today:

http://plan9.bell-labs.com/cm/cs/who/rob/hotchips.html

(I dunno about that and think there's room for more work.)

Quote:
>One thing that's worth noting is that Koopman appears to be generating code
>for a real stack machine (Harris RTX 2000).  Presumably access to that
>machine's stack is going to be substantially faster than main memory access.
>Such is not the case with the Python VM, since the stack and the local
>variables are right next to one another in memory.  A LOAD_FAST instruction
>does about as much work as a DUP_TOP or ROT_THREE instruction, so the effort
>of making a bottom-of-the-stack copy of a variable you want to reuse so you
>can recall it with a stack operation isn't going to gain much, if anything.
>Saving a copy of a global variable might be useful, but with multi-threading
>and the possibility of function calls it's not clear that the global
>variable's value won't be invalidated.
>I'm not sure what to make of the wordcode/bytecode distinction, but I'm not a
>Forth person, so perhaps I'm missing something.  I would appreciate a bit
>more elaboration on that point. I don't feel particularly limited by only
>having 256 byte codes, especially since the two VMs are completely separate.

I've worked on two 16-bit wordcode interpreters that others designed,
and don't think it's an especially great idea.  One of them only went
for 16 bits because the language used Unicode and it was convenient to
put bytecode programs in string objects -- though it also meant less
pressure to have short special-case opcodes for branches, etc.  The
other was in Java and had over 1000 opcodes, mostly combinations of
simpler actions, but we ended up using other tricks for real speed
(like direct JVM code generation).  Er, that's real speed by Java
standards, anyway.

my-first-c.l.p.-post'ly yrs,
Darius

--
Darius Bacon    http://www.well.com/~djello



Sun, 13 May 2001 03:00:00 GMT  
 Looking for help with Rattlesnake (more bite for Python)

Quote:

>> I'm not sure what to make of the wordcode/bytecode distinction, but I'm
>> not a Forth person, so perhaps I'm missing something.  I would
>> appreciate a bit more elaboration on that point. I don't feel
>> particularly limited by only having 256 byte codes, especially since
>> the two VMs are completely separate.

The bytecode/wordcode thing is not part of the ANS Forth specification.
My version is on a 32 bit machine and uses 32 bits for its dictionary
spaces and for its data stack.
The top of the data stack is actually a register, which makes a definition
very quick if its input and output are both a single stack value.

There is essentially only one thing I really miss when programming
in Python rather than in Forth. That is the ability to define
fragments of machine code using Forth as a macro-assembler, and
test them literally 2 seconds later.
My Forth system doesn't use about 7 of the 32 bit registers, so
I don't even have to wrap my machine code in a 'save registers / restore
registers' packet.
Whatever way you look at it, its a heck of a lot quicker than using
C.

Or is a macro-assembler actually possible in Python ? It would be nice to
be able to save the fragment of code as a normal Pythonish object.

Ken,     __O
       _-\<,_
      (_)/ (_)                                                                
    Virtuale Saluton.



Sun, 13 May 2001 03:00:00 GMT  
 Looking for help with Rattlesnake (more bite for Python)

Quote:
> Also, Rob Pike and Phil Winterbottom's ``Design of the Inferno Virtual
> Machine'' agrees with you about register machines being a better idea
> today:

> http://plan9.bell-labs.com/cm/cs/who/rob/hotchips.html

> (I dunno about that and think there's room for more work.)

Note that they are talking about a different kind of VM though --
their operands are more traditional data types like 32bit ints.  (They
talk a lot about the JVM and JIT in their paper.)

I'm not yet convinced that it makes much sense to apply the usual CISC
machine optimization techniques to Python and expect to get similar
results.  I don't really have the time to look into the Forth papers
but I suspect that they might provide a more fertile source of ideas.

I do think that reducing the number of instructions needed to execute
a given piece of Python make a lot of sense though, and that's why I
still think that Skip's plan to use 3-address opcodes makes sense.

--Guido van Rossum (home page: http://www.python.org/~guido/)



Sun, 13 May 2001 03:00:00 GMT  
 Looking for help with Rattlesnake (more bite for Python)

Quote:

> I do think that reducing the number of instructions needed to execute
> a given piece of Python make a lot of sense though, and that's why I
> still think that Skip's plan to use 3-address opcodes makes sense.

What are '3-address opcodes'?

Michael
--
     ''''\     Michael Scharf

    `    >     http://www.TakeFive.com



Sun, 13 May 2001 03:00:00 GMT  
 
 [ 9 post ] 

 Relevant Pages 

1. Rattlesnake progress

2. Help me overcome yet another bit of Java/Python brain damage

3. Help, Compile / debug Python 2.2.1 64 bit on AIX

4. Help needed - 16-bit Bit Serial Fixed POint Sine Function

5. 32-bit addr in 16-bit segments - help!!!

6. Help!! Writing binary bit by bit

7. HELP !! converting 16 bit code to 32 bit

8. Help Micro Focus 32 bit calling 16 bit c

9. Looking for IRIX Python programmers for help

10. Looking for help to compile Python

11. Looking for help in updating an old python program

12. Division 32-Bit/32-Bit with 16-Bit Register

 

 
Powered by phpBB® Forum Software