MachineForth disassembly question 

Hello,

If I develop code in MachineForth and then issue a command
like see <word>, would I get back the same text as the
original source code?
If so, then why do we need the source code anymore?

There are probably problems with conditional jumps; I have
read about that now and then. But is that also true for MachineForth?

I would be very happy if we could remove the source code.

--
Andreas Klimas



Sat, 07 Feb 2004 23:10:08 GMT  

Quote:

> Hello,

> If I develop code in MachineForth and then issue a command
> like see <word>, would I get back the same text as the
> original source code?
> If so, then why do we need the source code anymore?

What you describe was Chuck's sourceless programming
experiment.  He said the minimal source code is none.

OK and OKAD were implemented on Novix, ShBoom, Apple IIc,
386 (maybe something else I forget) all without source.
There was a code disassembler as part of the OK tools,
and the code was bootstrapped through debuggers or
monitors to start and then built with the OK tools.
A font editor, screen editor, hex editor, code editor,
a couple of tiny disk I/O routines, and a 7 function
key table interpreter make up the OK system.

When I wrote a simulator for P20 and added a monitor-like
interface I independently came up with the same basic
user interface that Chuck had put in OKAD for building
and testing P20 code in the simulator.  With only 25
five-bit opcodes you would toggle instruction fields with
something like a thumbwheel viewing disassembled
opcodes by name.  It is part of the user interface in
the S21 (P21) and F21sim simulators.  The one I did
for ITV also loaded a symbol table from the compiled
source so it could show all the word names when
decompiling the code.

This technique worked reasonably well for the first
generation of development tools, but Chuck abandoned
sourceless programming after a few years and first went
back to source for OKAD, then later back to Forth, Forth
source, a Forth compiler, etc.   At first ColorForth was
used for CAD scripting in OKAD under OK.  Later Chuck
reversed things, rewriting ColorForth II and rewriting
OKAD II in ColorForth.   So instead of Forth being
a scripting layer on top of OK, Forth became the OS
and platform for the CAD software.

As for the sourceless programming era...
The biggest problem that he noted was that things
that were computed at compile time and left as
data tables or literals were not easy to tweak
in hex because what they were and why they had
the bit patterns they had was not obvious without
source.  Also some sequences of compiled code
were ambiguous because they could have been
generated by a number of different high level
word sequences.  You can show the opcodes that
got compiled, but it may not be very close to
the source unless the source happens to be
a sequence of non-ambiguous opcodes.
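
The ambiguity can be illustrated with a small sketch in Python. The
macro names and expansions below are made up for the illustration and
are not the actual MachineForth or OK definitions. Two different
source texts compile to the same opcode list, so a decompiler that
sees only the object code cannot tell which one was written.

```python
# Hypothetical macro table: names and expansions are illustrative only,
# not the actual MachineForth or OK definitions.
MACROS = {
    "2dup": ["over", "over"],
    "double": ["dup", "+"],
}


def compile_words(source):
    """Expand macros; plain opcode names pass through unchanged."""
    obj = []
    for word in source.split():
        obj.extend(MACROS.get(word, [word]))
    return obj


def decompile(obj):
    """Naive decompiler: it can only list the opcodes it finds."""
    return " ".join(obj)


src_a = "2dup +"
src_b = "over over +"

# Both sources produce identical object code.
assert compile_words(src_a) == compile_words(src_b)
print(decompile(compile_words(src_a)))   # over over +
```

Source made only of plain opcode names round-trips exactly; anything
that went through a macro comes back as its expansion.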

I decided to take advantage of this in Aha
by combining ideas from traditional Forth,
sourceless programming (the ones that worked),
and Chuck's more recent ideas in ColorForth.
The editor tokenizes source and identifies
different classes that can be tokenized in
different ways.  In the case of unambiguous
5 bit opcode sequences the tokens chosen
to represent the source are the object code
binary sequences.  So this class of token
is both source and object code.  But other
classes of source are still needed to
solve the problems that we found with sourceless
programming.

I wanted full source, that could be normal ANS
Forth if I want with comments, and to be able
to represent it in a compact form and compile
it into object code as fast as possible with
a small program.  It should also be able to
provide source level debugging and code browsing
with a minimal representation of everything
in source and everything that goes on at
compile time.

I have a couple of essays and a nice video about the
history and stages of Chuck's approach to programming
over the last fifteen years and how the ideas fit
together in Aha.

Quote:
> There are probably problems with conditional jumps; I have
> read about that now and then. But is that also true for MachineForth?

Conditional jumps are not really the problem. We only have
two, jump if T=0 and jump if carry=0.  They get compiled
by IF -IF  UNTIL -UNTIL  WHILE and -WHILE.  The compiled
code can be resolved back to source pretty easily in most
cases.  The bigger problem was data tables and comments.
Some opcode sequences are unambiguous but because of macro
expansion and code optimizations not all source can
be reconstructed from object code.

Quote:
> I would be very happy if we could remove the source code.

I spent a couple of years experimenting with sourceless
programming before I moved on.  However I did try to
reclaim the best ideas and combine them with other
ideas to make Aha.

But all of this stuff matches well to Chuck's chips
where there are few and tiny opcodes.  It didn't make
a good match to the Pentium on Chuck's CAD code.
Using a thumbwheel interface to cycle through all
the possible Pentium opcodes to insert the one you
want into a slot is not practical.  It works with
25 or 27 opcodes but not on Pentium where it would
take quite a long time to cycle through all possible
opcode bit combinations in a user interface.



Sat, 07 Feb 2004 23:07:58 GMT  

Quote:

> I have a couple of essays and a nice video about the
> history and stages of Chuck's approach to programming
> over the last fifteen years and how the ideas fit
> together in Aha.

which we might find on your home page, right?

Quote:
> I spent a couple of years experimenting with sourceless
> programming before I moved on.  However I did try to
> reclaim the best ideas and combine them with other
> ideas to make Aha.
> But all of this stuff matches well to Chuck's chips
> where there are few and tiny opcodes.  It didn't make
> a good match to the Pentium on Chuck's CAD code.
> Using a thumbwheel interface to cycle through all
> the possible Pentium opcodes to insert the one you
> want into a slot is not practical.  It works with
> 25 or 27 opcodes but not on Pentium where it would
> take quite a long time to cycle through all possible
> opcode bit combinations in a user interface.

I have read some papers about Aha and ColorForth. It's really
great and I want to do it the same way. Actually I'm
a newbie, but that will change in time :-)

I hate the i-architecture, so I do my Forth stuff
on an old Atari ST; it's fast enough. Maybe I will
implement ColorForth (ColorlessForth, because the
monitor is only B/W) on top of a virtual MachineForth.

--
Andreas Klimas



Sun, 08 Feb 2004 01:01:18 GMT  
Hi Jeff,

Quote:
>Using a thumbwheel interface to cycle through all
>the possible Pentium opcodes to insert the one you
>want into a slot is not practical.  It works with
>25 or 27 opcodes but not on Pentium where it would
>take quite a long time to cycle through all possible
>opcode bit combinations in a user interface.

Re sourceless programming :
I can see the problem of object code not always mapping to only one source
string, but I am not sure about this last comment.
If the requirement is to allow the user to specify the next word to be
compiled, isn't it possible to use an input string and ' ( tick ) to find
the opcode from its name?
I am probably missing something here...

Regards

Howerd



Sun, 08 Feb 2004 06:05:03 GMT  

Quote:

> Hi Jeff,

> >Using a thumbwheel interface to cycle through all
> >the possible Pentium opcodes to insert the one you
> >want into a slot is not practical.  It works with
> >25 or 27 opcodes but not on Pentium where it would
> >take quite a long time to cycle through all possible
> >opcode bit combinations in a user interface.
> If the requirement is to allow the user to specify the
> next word to be compiled, isn't it possible to use an
> input string and ' ( tick ) to find the opcode from its name?
> I am probably missing something here...

Well the technique I described was used first by Chuck
in OKAD to edit the code to be executed by the simulated
chip inside of his simulator and at almost the same
time I independently put the same interface into a
chip software simulator.

Left and right keys moved between the four five bit
slots in a word that contained each of the opcodes.
The up and down arrow keys would toggle the opcodes.
So you could point at a word, tap the up/down and
left/right keys a few times and edit the word
pointed to into the pattern you wanted.
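
A minimal sketch of that slot editing in Python, assuming a 20 bit
word holding four 5 bit slots with slot 0 as the leftmost field. The
function names, the slot ordering, and the wrap-around behaviour of
the keys are assumptions made for the illustration; the actual OKAD
and S21 code is not shown in this thread.

```python
SLOT_BITS = 5
SLOTS_PER_WORD = 4
SLOT_MASK = (1 << SLOT_BITS) - 1   # 0b11111


def _shift(slot):
    # Assumption: slot 0 is the leftmost (most significant) field.
    return SLOT_BITS * (SLOTS_PER_WORD - 1 - slot)


def get_slot(word, slot):
    """Return the 5 bit pattern held in one slot of a 20 bit word."""
    return (word >> _shift(slot)) & SLOT_MASK


def set_slot(word, slot, value):
    """Return a copy of word with one slot replaced."""
    cleared = word & ~(SLOT_MASK << _shift(slot))
    return cleared | ((value & SLOT_MASK) << _shift(slot))


def move(slot, delta):
    """Left/right key: select a neighbouring slot, wrapping at the ends."""
    return (slot + delta) % SLOTS_PER_WORD


def tap(word, slot, delta):
    """Up/down key: step the selected slot by one pattern, wrapping at 32."""
    return set_slot(word, slot, get_slot(word, slot) + delta)


word, slot = 0, 0
word = tap(word, slot, +1)      # slot 0 goes 0 -> 1
slot = move(slot, -1)           # wraps around to slot 3
word = tap(word, slot, -1)      # slot 3 goes 0 -> 31
print([get_slot(word, s) for s in range(SLOTS_PER_WORD)])  # [1, 0, 0, 31]
```

A disassembler view would then just look up each slot value in a
table of opcode names and redraw the word after every key press.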

Now remember that on OKAD at this time there was
a chorded keyboard, later replaced by seven keys
on a PC keyboard.  The user had left/right
and up/down keys and three "button" keys at any
time.  The user was not typing in text strings
and at no time were text strings (other than
a single character) being processed.

Now it is true that OKAD had a crude dictionary
with some markers, constructed in the hex editor
that was used by the decompiler to associate
names with some routines, but at no time did
it compile anything using this dictionary.

The S21 interface also used key based menus
(larger than 7 keys) on a PC keyboard.  But
it had no dictionary other than the instruction
disassembler being able to display opcodes
by name.  There was no facility to assign
a unique key to each possible opcode and
enter opcodes by name.

With only 25 or 27 opcodes the up or down
arrow key could get you from any opcode to
any other opcode in a dozen taps or less
if you went the right direction.  The
five bit slot toggle was capable of editing
literal values in a clumsy way and one
could enter 20 bit binary values as numbers
as well.

Of course it would be possible for these things
to have dictionaries and convert from a source
representation to object representation, but
the point was that they didn't.  They simply
had nothing resembling source.

The thumbwheel type opcode editor worked OK
on the MISC chips themselves because with
5 bit opcodes a few taps would cycle you
through all possible binary opcode patterns.
If you missed the one you wanted tap a few
more times and it would come by again.

On Pentium the opcodes are bigger than 5 bits.
To increment a binary number representing all
possible Pentium opcodes is not a good match
to this style of interface.  Tapping the cursor
key a few billion times to increment or decrement
a 32 bit number through all possible Pentium
opcodes wouldn't do.  Chuck did
create one version of OKAD for 386 by hacking
away with a debugger or the hex editor in OK,
but the sourceless programming concepts did not
match as well to the Intel host machine as
to his simple chips.  With a few 5 bit instructions,
toggling opcode fields and displaying them by
name are trivial things.
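
The difference in scale can be worked out with a few lines of Python.
This is only a sketch of the arithmetic, assuming the wheel wraps
around and that the user always goes the shorter way; whether the real
editors cycled through 25, 27, or all 32 bit patterns of a slot is not
stated here, so all three are shown.

```python
def taps(src, dst, n):
    """Fewest up or down taps to go from src to dst on a wheel of n entries."""
    d = (dst - src) % n
    return min(d, n - d)


def worst_case(n):
    """Largest number of taps ever needed when going the short way round."""
    return n // 2


for n in (25, 27, 32):
    # Brute-force check of the closed form for the small wheels.
    assert worst_case(n) == max(taps(0, dst, n) for dst in range(n))
    print(n, worst_case(n))          # 25 12, then 27 13, then 32 16

print(worst_case(2 ** 32))           # 2147483648
```

So a 25 entry wheel never needs more than a dozen taps, while treating
a 32 bit field as one wheel needs up to about two billion.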

The later F21sim simulator and F21emu emulator
added MachineForth compilers and used the FPC
editor to move between source files that you
could edit, and compiled executable object
code (images of which could also be saved
or loaded).  The user still had the opcode
toggle interface in the F21sim simulator, but
it did include source code, a compiler,
and an editor.  They also got ROM
emulation so that if a ROM image file was
present they would load it and boot
from simulated ROM.  So after a program
was debugged for DRAM or SRAM some 8 bit ROM
boot code could be added to load the
images from ROM into RAM and run them.

And of course, one can have a dictionary with
names and convert them. But sourceless programming
was about zero source.

The way I took advantage of the sourceless
programming idea of the high correlation between
source and object on this architecture was that
in Aha some source is tokenized into tokens that
just happen to also be the five bit opcodes that
the Forth words represent.  This class of token
simply gets moved from source to object.  Another
class of token is macro words.  They expand at compile
time to make opcode sequences.  They are the predefined
macros in MachineForth, so assigned six bit tokens
represent both opcode tokens and macro tokens.  The
macro tokens require an execution jump table.  The
next type of token are defined words that compile to
callable subroutines. The preparsed source is compressed
by using counted packed linked strings.  So the words
are represented by pointers to CFA slots in the source
that are set and used in the compile process.  There
are also binary tokens and comment tokens.  All
dictionary building, searching, and error checking
takes place in an editor or in a converter that
converts from, say, ASCII source code representation
to tokenized source representation.  Nothing is lost
in the tokenization process but the source is
preparsed and compressed so that a few hundred words
of code can compile object code and link it to
source at very high speed.  Editors for compressed
source do become more complex because lots of
parsing and error checking takes place at edit time.
I tried to take ColorForth one more step and mix
it with some older ideas from the sourceless
programming and MachineForth.  Something closely
matched to the chip design.
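
As a rough illustration of the token classes described above, here is
a sketch in Python. It is not Aha's actual format: the class names,
the opcode numbers, the single macro, and the tuple-based "object
code" are all hypothetical. The point it tries to show is that the
compile loop does no parsing or dictionary search, because the editor
has already resolved every token.

```python
from enum import Enum


class Kind(Enum):
    OPCODE = 1    # token value is the opcode bit pattern itself
    MACRO = 2     # token value indexes a jump table of expanders
    CALL = 3      # token value refers to a defined word's CFA slot
    BINARY = 4    # literal data
    COMMENT = 5   # stays in source, emits no object code


# Hypothetical opcode numbers, for illustration only.
OVER, PLUS = 0x12, 0x11


def expand_2dup(out):
    # Hypothetical macro: expands to two OVER opcodes.
    out.extend([("op", OVER), ("op", OVER)])


MACRO_JUMP_TABLE = [expand_2dup]


def compile_tokens(tokens):
    """Turn preparsed (kind, value) tokens into a list of object items."""
    out = []
    for kind, value in tokens:
        if kind is Kind.OPCODE:
            out.append(("op", value))        # copied straight across
        elif kind is Kind.MACRO:
            MACRO_JUMP_TABLE[value](out)     # expanded at compile time
        elif kind is Kind.CALL:
            out.append(("call", value))      # subroutine call via CFA slot
        elif kind is Kind.BINARY:
            out.append(("lit", value))
        # Kind.COMMENT emits nothing
    return out


source = [
    (Kind.COMMENT, "add the top two items, keeping both"),
    (Kind.MACRO, 0),
    (Kind.OPCODE, PLUS),
    (Kind.CALL, 3),
    (Kind.BINARY, 0xFFFFF),
]
print(compile_tokens(source))
```

The comment token survives in the source stream, so nothing is lost
for browsing, yet it costs the compiler only one skipped item.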

Sean Pringle used the callable words represented by dictionary
links in source idea in his FLUX implementation,
another variation on ColorForth ideas.



Sun, 08 Feb 2004 07:06:34 GMT  
Hi Jeff,

Thanks for the explanation!
I have not yet fully explored the "sourceless" route, but it's good to see
that others are pushing into uncharted territory.
One of the most important aspects of Forth, for me, is the ability to decide
exactly which parts of the system go where.
My current thinking is to move all of the compiler into the editor ( i.e. no
source ).
ColorForth seems to have roughly half of it there ( pre-parsed source ).
Holon sort of pre-sorts the source.
All of these approaches are very interesting...
I probably won't understand the problems that you and Chuck have had with
"sourceless" systems until I hit my own head against them!

Regards

Howerd



Mon, 09 Feb 2004 01:26:02 GMT  

Quote:

> I probably won't understand the problems that you and
> Chuck have had with "sourceless" systems until I hit
> my own head against them!

Chuck considered his sourceless programming as a sort
of dead end experiment.  He had to go back to Forth
with source code.  Lots of MachineForth ideas went
into ColorForth, but few if any sourceless phase
ideas.  Aha picked up some of the discarded sourceless
phase pieces to create an odd source representation.

Chuck originally liked the idea that if the underlying
architecture were simple enough, and designed to
support it and if the high level language constructs
matched the hardware very closely that decompile from
object code would be all that was needed.  It is one of
those things that was close with his own machines
and his own language.  It is not a good match at
all to most other architectures or other languages.

It is also a good example of real bare metal
programming, using monitors, debuggers or hex
editors to bootstrap a system.  Been there,
done that a number of times.  I prefer to
compile from source if possible.



Mon, 09 Feb 2004 01:40:44 GMT  

Quote:

> On Pentium the opcodes are bigger than 5 bits.
> To increment a binary number representing all
> possible Pentium opcodes is not a good match
> to this style of interface.

The 386 instruction set is huge, but it's also mostly useless.
I wouldn't be surprised if once all the lard is eliminated
there aren't that many more than 32 useful instructions. You
just have to remember which ones you want to cycle through.
--
lysse at lysse dot co dot uk
"Why are your problems always so much bigger than everyone else's?"
  "Because they're mine."  -- Ally McBeal


Mon, 09 Feb 2004 09:31:32 GMT  

Quote:


> > On Pentium the opcodes are bigger than 5 bits.
> > To increment a binary number representing all
> > possible Pentium opcodes is not a good match
> > to this style of interface.

> The 386 instruction set is huge, but it's also mostly useless.
> I wouldn't be surprised if once all the lard is eliminated
> there aren't that many more than 32 useful instructions. You
> just have to remember which ones you want to cycle through.

Yes, that is true.  Once you define a small set of routines
to do the primitives of the virtual machine you can then
write in Forth.  Chuck's virtual machine in his ColorForth
has more or less the 27 opcodes in his machines and a
few other primitives that are macros on his machine.

What Chuck was doing in the hex editor in OK was building
386 code so he hadn't specified anything like the Forth
core words he used there.  He just used the thumbwheel
style interface in his code to edit the simulated
code for the simulated chip being designed in OKAD.
He had made the point that he didn't use such an
interface in OK to build 386 code as building 386
code and building code for his chips were quite
different.  You are saying that it _could_ be
used for something like Forth code editing on a 386.

Yes, the same thumbwheel opcode interface that was in early
OKAD and the S21 simulator could be used to construct
Forth code without source on any machine where the compiled
opcode sequences could be identified and each slot set to any
entry in a table.

But on Pentium the opcode sequences would not all be
the same length and could not be thumbwheeled in slots
unless they were all padded to the same minimal length.
But it could be done.  You could construct Forth code
with these big slots and even edit it with a program
with such an interface.  But I wouldn't.

It was used to construct and edit object code.  If it
were used on tokenized source then thumbwheeling
tokens to edit source would be a better fit.  It is an
odd idea in the ColorForth and Aha directions.  In fact
an Aha editor would have to have all the same functions,
the same functions that I wrote with those thumbwheel
style interfaces.

I kind of like the type ahead error checking editor
idea more these days.  Chuck isn't doing type ahead
in ColorForth but it would be a nice fit in Aha.



Mon, 09 Feb 2004 09:15:03 GMT  


Quote:

>> Hello,

>> If I develop code in MachineForth and then issue a command
>> like see <word>, would I get back the same text as the
>> original source code?
>> If so, then why do we need the source code anymore?

>What you describe was Chuck's sourceless programming
>experiment.  He said the minimal source code is none.

Many years ago I wrote a Forth compiler in Turbo Prolog. The source
became a Prolog database. Each word became a list of Prolog atoms
(string tokens). For each Forth word there was an atom. SEE <word>
just displayed the list. The result was nearly as good as the original
source. But I preferred the source nevertheless (readability,
comments, et cetera).

BTW the compiler produced 8086 opcodes as a "side effect" while
solving questions with the database. The resulting programs were very
compact, even smaller than those built with TCOM. But it was also a
bit less flexible than TCOM.

I still like that combination of Forth and Prolog. A Forth program
often resembles a word tree: a word is a list of other words or it is
a primitive. And Prolog is ideal for working with trees.
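
The idea can be sketched in a few lines, here in Python rather than
Prolog. The word names and the dictionary layout are invented for the
illustration and are not taken from the original compiler. Each
defined word is stored as the list of tokens in its definition, SEE
just prints that list, and walking the tree down to primitives gives
the flat sequence a compiler would emit.

```python
# Hypothetical word database: each defined word maps to the list of
# tokens in its definition; anything not listed is treated as a primitive.
WORDS = {
    "square": ["dup", "*"],
    "cube": ["dup", "square", "*"],
}


def see(name):
    """Display a definition exactly as it is stored."""
    return ": " + name + " " + " ".join(WORDS[name]) + " ;"


def flatten(name):
    """Walk the word tree down to primitives."""
    if name not in WORDS:
        return [name]
    out = []
    for tok in WORDS[name]:
        out.extend(flatten(tok))
    return out


print(see("cube"))       # : cube dup square * ;
print(flatten("cube"))   # ['dup', 'dup', '*', '*']
```

Because the stored list is the token sequence of the source, SEE
loses only layout and comments, which matches the remark above about
still preferring the original source.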

Andreas



Mon, 09 Feb 2004 20:30:05 GMT  

Quote:
>lysse

>> On Pentium the opcodes are bigger than 5 bits.
>> To increment a binary number representing all
>> possible Pentium opcodes is not a good match
>> to this style of interface.

>The 386 instruction set is huge, but it's also mostly useless.
>I wouldn't be surprised if once all the lard is eliminated
>there aren't that many more than 32 useful instructions. You

Given a register machine, capable of true multi-user operation via several
different schemes, there are more than 32 useful variants of MOV on 386+.
There are a number of ops you might only use once in an entire system, but
when you need em you need em. The 8008 ASCII instructions are, AFAICT,
truly useless, but that's just a few.

Rick Hohensee



Tue, 10 Feb 2004 04:05:52 GMT  

Quote:

>> On Pentium the opcodes are bigger than 5 bits.
>> To increment a binary number representing all
>> possible Pentium opcodes is not a good match
>> to this style of interface.

>The 386 instruction set is huge, but it's also mostly useless.
>I wouldn't be surprised if once all the lard is eliminated
>there aren't that many more than 32 useful instructions. You
>just have to remember which ones you want to cycle through.

One should also remember that most compilers DO NOT take advantage of
the huge instruction sets provided by CISC processors.  Therefore,
there is very little advantage in having a large instruction set, and
many disadvantages, such as having to use a microcode engine, having
very long pipelines to decode all sorts of different complex
addressing modes (on modern superscalar processors, long pipelines
mean having very large mispredict penalties; this is why Intel's
lengthening the pipeline to ramp up the GHz in the new Pentium 4 is
NOT a good idea; there's more to speed than MHz/GHz, and this all
comes at the price of having steep mispredict penalties, which can
quickly demolish any other speed advantages a processor has).  Thus,
even if they don't have as much MHz/GHz, a good RISC processor is much
better than an equivalent CISC processor; one does not need to use as
tiny of a fab process, which means that fewer chips per batch are bad
(whenever you shrink the features on a chip, the production of that
chip is less reliable, and more nonfunctional chips are caught in the
testing phase), the chips produced dissipate less power (just look at
how much power an Intel or AMD x86 chip sucks and how much heat it
dissipates; then compare this to various RISC chips, such as ARM (yes,
this isn't pure RISC because of the relatively limited (only 15)
general purpose registers and the existence of the LDM and STM
instructions), PowerPC, (Ultra)Sparc, MIPS, SuperH, HP-RISC, etc.),
etc.  In conclusion, CISC processors are just not worth it; I'd rather
spend my money on huge caches rather than on microcode engines and
ridiculously long pipelines.

--
Yes, I know my enemies.
They're the teachers who tell me to fight me.
Compromise, conformity, assimilation, submission, ignorance,
hypocrisy, brutality, the elite.
All of which are American dreams.

              - Rage Against The Machine



Tue, 10 Feb 2004 04:56:17 GMT  

Quote:

> One should also remember that most compilers DO NOT take advantage of
> the huge instruction sets provided by CISC processors.

Worse than that, for anything post-486 it's actually more efficient
to use simple instructions; they can be hardware-decoded, whereas
the more complex ones escape to microcode.

Quote:
> testing phase), the chips produced dissipate more power (just look at
> how much power an Intel or AMD x86 chip sucks and how much heat it
> dissipates; then compare this to various RISC chips, such as ARM (yes,

I know. I've just watched a friend of mine spend about twice as much
on his new computer as he had intended, because of heat problems.
I've sworn off new x86s because of it.

Quote:
> In conclusion, CISC processors are just not worth it; I'd rather
> spend my money on huge caches rather than on microcode engines and
> ridiculously long pipelines.

Problem is, where do you *get* alternatives? Apples are overpriced,
and they don't make RISC PCs any more. What I want is a cool,
fanless computer for $300, but nobody's making them. :-(
--
lysse at lysse dot co dot uk
"Why are your problems always so much bigger than everyone else's?"
  "Because they're mine."  -- Ally McBeal


Tue, 10 Feb 2004 07:31:50 GMT  

Quote:

>>The 386 instruction set is huge, but it's also mostly useless.
>>I wouldn't be surprised if once all the lard is eliminated
>>there aren't that many more than 32 useful instructions. You
> Given a register machine, capable of true multi-user operation via several
> different schemes, there are more than 32 useful variants of MOV on 386+.

Depends. If you're restricting yourself to the set of instructions
needed to implement a model like the P21, you're down to about the
same number of instructions. If you want to write the best code for
the largest number of post-386 architectures... once again, you're
down to the 1-cycle instructions, which are surprisingly regular.
I made myself up a crib sheet for the 386 instruction set a couple
of days ago, and it's just 2 sides of A4.
--
lysse at lysse dot co dot uk
"Why are your problems always so much bigger than everyone else's?"
  "Because they're mine."  -- Ally McBeal


Tue, 10 Feb 2004 07:31:37 GMT  

Quote:

>>>The 386 instruction set is huge, but it's also mostly useless.
>>>I wouldn't be surprised if once all the lard is eliminated
>>>there aren't that many more than 32 useful instructions. You

>> Given a register machine, capable of true multi-user operation via several
>> different schemes, there are more than 32 useful variants of MOV on 386+.

>Depends. If you're restricting yourself to the set of instructions
>needed to implement a model like the P21, you're down to about the
>same number of instructions. If you want to write the best code for
>the largest number of post-386 architectures... once again, you're
>down to the 1-cycle instructions, which are surprisingly regular.
>I made myself up a crib sheet for the 386 instruction set a couple
>of days ago, and it's just 2 sides of A4.

And you're writing a paged virtual memory multi-user OS? With 3D graphics?

Rick Hohensee
                                                www.clienux.com

Quote:
>--
>lysse at lysse dot co dot uk
>"Why are your problems always so much bigger than everyone else's?"
>  "Because they're mine."  -- Ally McBeal



Tue, 10 Feb 2004 10:12:55 GMT  
 