Abbreviations in Forth 

Rick Hohensee  po box 11340 Wash., D.C. 20008   USA

It doesn't look like I'm going to be able to attend the Rochester
Conference, so I will attempt to add the following essay to the discussion
of the future of Forth electronically.

Doubly-Linked Dictionary for Interpreting Abbreviations

Richard Allen Hohensee
June 1993

ABSTRACT: A doubly linked dictionary combined with an alternate word search
algorithm for tokens ending in a period allows succinct implementation of
interpretation of natural-language-like abbreviations of already-defined
words, in which earlier definitions will have the shortest valid
abbreviations. This facility can ameliorate the tradeoff between terseness
and clarity.

OVERVIEW
I am working on a Forth-like language I call `bana'. This essay covers the
feature of bana that is the least platform-specific, the most clearly
conceived so far, and the most easily presented apart from the rest of bana.

In the design of computer languages there is a tendency to make frequently
used keywords short, i.e. shorter than they would be spelled as they are
typically pronounced. Thus, the most important words tend to be the most
cryptic. Forth is not immune to this malady. A succinct method of
abbreviating words in an instruction set, lexicon, or vocabulary, that acted
predictably over a large set of words/symbols/names, could alleviate the
tradeoff between the terseness that `one who knows' prefers and the clarity
that `one who needs to know' prefers.

The method presented here interprets a token ending with a period as an
abbreviation of an existing word whose name begins with the characters before
the period, e.g. [ qwe. ] might be interpreted as `qwe' or `qwert' or
`qwertyu', but would not be interpreted as `zaqwe'. This method also results
in a particular abbreviation being interpreted as the oldest word it is a
legitimate abbreviation for. If the interpreter encounters [ qwe. ] and the
dictionary contains `qwerty' and `qwez', the word executed will be the one
that was defined first. This is why a doubly linked dictionary is proposed:
so that words chronologically and semantically close to the kernel will have
very short valid abbreviations.

IMPLEMENTATION
This method requires three things that are not typical of a Forth: a
doubly-linked dictionary, an extra algorithm for interpreting tokens, and a
slight complication of the fundamental Forth syntax. The Forth syntax is
often stated as `words are strings of characters separated by spaces'. The
syntax of bana appends an `if' to the Forth syntax, i.e.
  `Words are character strings separated by spaces. If a word ends in a
   period it is interpreted to be an abbreviation of an existing word.'
A word that contains a period but doesn't end with one is not an
abbreviation. Note that this is largely a superset of the Forth syntax, and
could be ignored by a user who prefers not to deal with it. I'd also like to
mention in passing that in my work on bana I have developed the habit of
referring to bana `words' as `werds'. I find this helpful in distinguishing
between werds and words.
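
The syntax rule above can be sketched in a few lines. This is only an
illustration in Python, not bana code: the function name `classify' is
hypothetical, and treating a lone `.' as an ordinary word rather than as an
abbreviation with an empty stem is an assumption the essay does not settle.

```python
def classify(token):
    """Apply the rule quoted above: a trailing period marks an abbreviation."""
    # Assumption: a bare "." has no stem to match, so it is not an abbreviation.
    if len(token) > 1 and token.endswith("."):
        return "abbreviation"
    return "word"


# Tokens are character strings separated by spaces.
for token in "so. newg. a.b vlist".split():
    print(token, "->", classify(token))
```

Here `a.b' contains a period but does not end with one, so it is classified
as an ordinary word.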

Our syntactic add-on now requires its own dictionary search algorithm. When
the interpreter gets an abbreviation, it immediately passes it to the
abbreviation search algorithm, without checking to see if it is a number.
Let's call our abbreviation search algorithm `suffind'. Suffind starts at the
oldest dictionary entry and checks for a match between the token in question
and the names in the dictionary. The matching routine begins with the first
letter of the sought token and continues until there is a mismatch between
letters, in which case the next name field is checked, or until the period in
the token is reached, in which case an abbreviation match has been found. If
every name has been checked without a match, the search fails.
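
A minimal sketch of the search just described, written in Python rather than
Forth so that it stays independent of any particular dictionary layout. The
dictionary is modelled as a plain list of names ordered oldest first, which is
an assumption made for illustration; the sample names come from the overview.

```python
def suffind(token, names_oldest_first):
    """Return the oldest name that `token` abbreviates, or None."""
    stem = token[:-1]  # everything before the trailing period
    for name in names_oldest_first:
        # A name matches when it begins with the stem; a mismatch or a
        # name shorter than the stem moves the search to the next entry.
        if name.startswith(stem):
            return name
    return None


dictionary = ["qwerty", "qwez", "zaqwe"]  # oldest definition first
print(suffind("qwe.", dictionary))   # qwerty (defined before qwez)
print(suffind("qwez.", dictionary))  # qwez
print(suffind("aqw.", dictionary))   # None (zaqwe does not begin with aqw)
```

Because the scan stops at the first hit, the oldest definition always wins,
which is what gives kernel words the shortest valid abbreviations.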

Suffind, to work as described, requires dictionary links from oldest
definitions to newest, i.e. in the opposite sense from standard Forth
practice. Other features that become possible with the presence of double
linking may further justify the space consumption. Doing all word searches
from oldest to newest might speed compilation, for example.
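
To make the double linking concrete, here is a small Python model. It is a
sketch under assumptions, not a description of bana's actual headers: the
class names and the fields `older', `newer' and `body' are all hypothetical.
Each entry keeps the usual link toward older definitions plus the extra link
toward newer ones.

```python
class Entry:
    def __init__(self, name, body=None):
        self.name = name
        self.body = body
        self.older = None  # conventional link, toward earlier definitions
        self.newer = None  # extra link, toward later definitions


class Dictionary:
    def __init__(self):
        self.oldest = None
        self.newest = None

    def define(self, name, body=None):
        """Append a new entry and maintain both links."""
        entry = Entry(name, body)
        entry.older = self.newest
        if self.newest is None:
            self.oldest = entry
        else:
            self.newest.newer = entry
        self.newest = entry
        return entry

    def find(self, name):
        """Exact-match search from newest to oldest."""
        entry = self.newest
        while entry is not None:
            if entry.name == name:
                return entry
            entry = entry.older
        return None

    def suffind(self, token):
        """Abbreviation search from oldest to newest."""
        stem = token[:-1]
        entry = self.oldest
        while entry is not None:
            if entry.name.startswith(stem):
                return entry
            entry = entry.newer
        return None


d = Dictionary()
d.define("qwerty", "first version")
d.define("qwez", "another word")
d.define("qwerty", "redefinition")
print(d.find("qwerty").body)   # redefinition
print(d.suffind("qwe.").body)  # first version
```

Note that with a redefined name the two searches land on different entries:
the exact search reaches the newest `qwerty', the abbreviation search the
oldest.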

EXAMPLE
Let's consider how this feature would affect one's interaction with a system
that was a fairly standard Forth, but with the abbreviation feature added. In
this example we are in the interpreter. We type in...


Weeks pass. `newguy' proves useful and stays in the dictionary. We need to
refresh our memory about newguy. The Forth in question has a `source' word.
We type in...

so. newg.

and the system responds with...

newguy     / is a colon definition

where vprecision and *pytes are existing application words.

SUPPORT WORDS
A useful utility word for a system like this could be named `abbreviate'.
`abbreviate' takes the word following itself and checks for a period at the
end. If abbreviate finds a period, i.e. if the next word is an abbreviation,
abbreviate prints out the full name of the word that will be asserted by the
abbreviation under the current vocabulary search order. If the word following
abbreviate is not an abbreviation, abbreviate prints out the shortest
abbreviation that will assert that word. Two examples...

ok
ab. v.
vocabulary ok

ab. vlist
vl. ok
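
Both behaviours of `abbreviate' can be sketched as follows. This is Python
standing in for Forth, with the dictionary again modelled as a list of names
ordered oldest first. The word order in `names' is hypothetical, chosen only
so that the two examples above come out the same way.

```python
def suffind(token, names_oldest_first):
    """Return the oldest name that `token` abbreviates, or None."""
    stem = token[:-1]
    for name in names_oldest_first:
        if name.startswith(stem):
            return name
    return None


def shortest_abbreviation(name, names_oldest_first):
    """Return the shortest abbreviation that resolves to `name`, or None."""
    for length in range(1, len(name) + 1):
        candidate = name[:length] + "."
        if suffind(candidate, names_oldest_first) == name:
            return candidate
    return None  # unknown word, or shadowed by an older, longer name


def abbreviate(token, names_oldest_first):
    if token.endswith("."):
        return suffind(token, names_oldest_first)
    return shortest_abbreviation(token, names_oldest_first)


names = ["vocabulary", "variable", "vlist", "abbreviate"]  # oldest first
print(abbreviate("v.", names))     # vocabulary
print(abbreviate("vlist", names))  # vl.
```

A word can end up with no valid abbreviation at all if an older word's name
begins with its full name; the sketch returns None in that case.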

MESSY DETAILS
What about [ . ], the Forth word that prints the top stack item? What about `u.' and
`f.' and other common words that end with a period? Well, this feature is
part of a project that I consider distinct enough from Forth to be considered
another language, albeit an offspring of Forth. This feature could be added
to a Forth without wholesale changes, but some words would have to be
renamed. I would disallow word names ending with a period. Note that if you
had to change `.' to `print' the abbreviation would probably be `p.' .

SOME ASSERTIONS
The ideal that this feature is aimed at is code that is entirely
self-documenting. Reducing the inconvenience of large word names goes a long
way in this direction. If the system includes an unthreader such as `def'
or `source' that produces recompilable output, and if that output can be
redirected to non-volatile storage, then the production of `source code'
becomes a mere adjunct to interactive experimentation with the system. Words
could be fully tested, and then the `source file' generated with a few words.
You would never have to exit the interactive Forth environment to an editor.

The signal always expands to fill the bandwidth. HesForth for the Commodore
64 came with about 400 words. My old version of JForth for my old Amiga came
with a 1400 word `standard image' and about 800k of potential includes. In
this kind of a world it seems that any Forth user is doomed to oscillate
continuously between being `one who knows' and `one who needs to know'.
Systematic abbreviations let the user oscillate between terseness and
verboseness accordingly.

More natural names for frequently used words also make a language easier to
learn initially. This could help expand the user base for threaded languages,
and help to demonstrate their inherent superiority.

Rick Hohensee  po box 11340 Wash., D.C. 20008   USA

--



Sun, 03 Dec 1995 03:17:42 GMT  
 Abbreviations in Forth

[...]
: Doubly-Linked Dictionary for Interpreting Abbreviations
[...]
: IMPLEMENTATION
: This method requires three things that are not typical of a Forth; a
: doubly-linked dictionary, an extra algorithm for interpreting tokens, and a
: slight complication of the fundamental Forth syntax. [...]
:   `Words are character strings separated by spaces. If a word ends in a
:    period it is interpreted to be an abbreviation of an existing word.'
[...]
: Suffind, to work as described, requires dictionary links from oldest
: definitions to newest, i.e. in the opposite sense from standard Forth
: practice. Other features that become possible with the presence of double
: linking may further justify the space consumption. Doing all word searches
: from oldest to newest might speed compilation, for example.
:
For laziness, this is interesting.
I would make this feature switchable on/off.
However, I would not require abbreviations to be marked with a dot.
All Forth standards, including dpANS, state that the newest definition with
the same name *must* be found.

I would do: search as normal for the latest exact match, if any; else
find a word name (possibly the newest version of it) starting with what is
given, and if it is not a unique abbreviation, complain (e.g. ABORT").
For example (not in Forth, but doesn't matter), here on VAX/VMS I get:
$ a
%DCL-W-ABVERB, ambiguous command verb - supply more characters
 \A\

I might get into trouble after defining 1234 CONSTANT 1234 and then looking
for 123 ...

BTW, whether to use a singly-linked list, a hash table or whatever is
implementation dependent anyway, so you can't *require* doubly-linked lists.

But it should be possible to do this with a single scan of a singly-linked
list from newest to oldest, while storing info about *one* possible
abbreviation and incrementing a counter, until end-of-list or an exact match
is found.
If, at end-of-list, the counter is zero, check for a number as usual.
If, at end-of-list, the counter is one, make sure it isn't digits only (can't
abbreviate numbers, see above), else you've got it.
If, at end-of-list, the counter is greater than one, and it is not a number,
request the user to supply more characters...
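
A rough sketch of that single scan, in Python rather than Forth. The
dictionary is modelled as a list of names ordered newest first. The names
`lookup', `parse_number' and `AmbiguousAbbreviation' are hypothetical, and so
are two decisions not spelled out above: an older definition with the same
name as the stored candidate is not counted again, and a "number" is anything
Python's int() accepts.

```python
class AmbiguousAbbreviation(Exception):
    """Raised when more than one word name starts with the token."""


def parse_number(token):
    try:
        return int(token)
    except ValueError:
        return None


def lookup(token, names_newest_first):
    candidate = None  # the *one* stored possible abbreviation
    count = 0
    for name in names_newest_first:
        if name == token:
            return ("word", name)  # newest exact match wins outright
        # Assumption: an older version of the stored candidate's name
        # does not count as a second, competing match.
        if name.startswith(token) and name != candidate:
            if candidate is None:
                candidate = name
            count += 1
    # End of list reached without an exact match.
    number = parse_number(token)
    if number is not None:
        return ("number", number)  # numbers are never abbreviations
    if count == 1:
        return ("word", candidate)
    if count > 1:
        raise AmbiguousAbbreviation(token + ": supply more characters")
    return ("unknown", token)


names = ["1234", "vlist", "vocabulary", "variable"]  # newest first
print(lookup("vl", names))    # ('word', 'vlist')
print(lookup("123", names))   # ('number', 123)
print(lookup("1234", names))  # ('word', '1234')
```

With these names, lookup("v", names) raises AmbiguousAbbreviation, much like
the DCL message shown earlier.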




Tue, 05 Dec 1995 20:00:34 GMT  
 