Simple syntax? A definition? 
 Simple syntax? A definition?

Following the recent thread on perl vs. Icon, and posts about UNIX
*tools*, I started thinking about how one can define the good
qualities a syntax should have.
It appears to me that what counts is simplicity. A simple syntax would be
one that can easily be built (NOT parsed) by programs (so that the
language can be used as a tool interface). It would also be easy for
humans (that's us!) to learn and use.

Does anyone have suggestions on how to define this in a more mathematical
or precise way? I thought about YACC clauses/sentence. But I am not sure
it's enough because the YACC approach narrows you down to LALR.

I thought one criterion can be PROXIMITY - that is that semantically
related elements should be near each other.  Any others?




Tue, 04 Feb 1997 02:43:10 GMT  
 Simple syntax? A definition?

Quote:

>Following the recent thread on perl vs. Icon, and posts about UNIX
>*tools*, I started thinking about how one can define the good
>qualities a syntax should have.
>It appears to me that what counts is simplicity. A simple syntax would be
>one that can easily be built (NOT parsed) by programs (so that the
>language can be used as a tool interface). It would also be easy for
>humans (that's us!) to learn and use.

 There is a trade-off to be considered.
 I am learning perl5 now for 2 reasons:
 1) it is functionally far superior to all the other shells and default
    filter tools
 2) Once mastered, it allows for concise specification of data structures
    and procedures.

 The problem is that to achieve the mastery required for 2), you
 have to become not merely intimate with perl syntax, but *more* than
 intimate!  It's an *nary requirement: you need to learn
 perl-arcanery, perl-chicanery and some other *(star-caneries)!

 However, once one gets there (I hope I can get there before xmas!),
 one can write concise and powerful algorithms.

 If one wants the kind of syntactic simplicity which you are
 describing (and I also appreciate the strong points of that),
 then you want something like 'scheme', I think.
 But then you have to spit out a lot of car's and cdr's in order to
 get <>!



Tue, 04 Feb 1997 04:49:19 GMT  
 Simple syntax? A definition?

Quote:


>> Following the recent thread on perl vs. Icon, and posts about UNIX
>> *tools*, I started thinking about how one can define the good
>> qualities a syntax should have.
>> It appears to me that what counts is simplicity. A simple syntax would be
>> one that can easily be built (NOT parsed) by programs (so that the
>> language can be used as a tool interface). It would also be easy for
>> humans (that's us!) to learn and use.

>  There is a trade-off to be considered.
>  I am learning perl5 now for 2 reasons:
>  1) it is functionally far superior to all the other shells and default
>     filter tools
>  2) Once mastered, it allows for concise specification of data structures
>     and procedures.

>  The problem is that to achieve the mastery required for 2), you
>  have to become not merely intimate with perl syntax, but *more* than
>  intimate!  It's an *nary requirement: you need to learn
>  perl-arcanery, perl-chicanery and some other *(star-caneries)!

I think you should have a look at python; IMHO it is as expressive as, if
not more expressive than, perl, without all the *(star-caneries). It has a
very nice syntax and is an (almost) OO scripting language that can
interface with 'C'.

Python can be found at

    Home:          ftp pub/python from ftp.cwi.nl
    N.America:     ftp pub/plan/python/cwi from gatekeeper.dec.com
    Europe:        ftp pub/unix/languages/python from ftp.fu-berlin.de

You should take a look; there are some documents there which you can pull
down to look at first. I think you will be impressed.

Regards
   -Alun
--
| A.Champion                | > I'm incomunicado.

|                           | > About 3 miles from Cognito. (10)



Tue, 04 Feb 1997 16:50:32 GMT  
 Simple syntax? A definition?

Zvi> Following the recent thread on perl vs. Icon, and posts about
Zvi> UNIX *tools* I started thinking about how one can define the
Zvi> good qualities a syntax should have.  It appears to me that
Zvi> what counts is simplicity. A simple syntax would be one that can
Zvi> easily be built (NOT parsed) by programs (so that the language
Zvi> can be used as a tool interface). It would also be easy for
Zvi> humans (that's us!) to learn and use.

Zvi> Does anyone have suggestions on how to define this in a more
Zvi> mathematical or precise way? I thought about YACC
Zvi> clauses/sentence. But I am not sure it's enough because the YACC
Zvi> approach narrows you down to LALR.

Zvi> I thought one criterion can be PROXIMITY - that is that
Zvi> semantically related elements should be near each other.  Any
Zvi> others?

I've been thinking about this somewhat lately, because my dissertation
work involves adding parallel programming constructs to sequential
languages, and I want to do so in a *clear* way.

The things that I've come up with so far are:

<1> Unambiguous - for a human reader, it helps a *lot* if there's only one
   correct reading of an element.

<2> Orthogonal - different constructs should look different. (Closely
  related to <1>. If I look at a construct, I should be able to tell what
  it does with minimal knowledge about context. If x = a is an assignment
  statement, then x=a shouldn't also be a comparison in a different
  context. See the C sketch after this list.)

<3> Minimal - the less "noise" there is in the syntax the better. (I.e.,
  COBOL is *bad*, because it's very hard to isolate the meaningful
  text from all of the excess garbage.)

<4> Proximity - I agree with you, it's very important that related elements
   be near each other.

<5> Separation - clauses should be visibly separated. (This applies to
   things like the different branches of an if statement. Lisp-style
   ifs can be bad, because it's hard to see where the "then"
   part ends and the "else" part begins.)

<6> Legible - using a large number of bizarre symbols makes a program
  much harder to read. (Look at a one-liner in APL, compared to a
  program in J with the primitives aliased to English words. Both
  can provide a beautiful, elegant solution to a problem. But almost
  anyone can read the J program given a 5-minute introduction to the
  language, while the APL program can take an expert some time to decode.)
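
A concrete C rendering of the <2> hazard (C is only my illustration
here; the point isn't specific to any one language):

    #include <stdio.h>

    int main(void)
    {
        int x = 0, a = 5;

        if (x = a)        /* assignment: x becomes 5, which counts as true */
            printf("taken: x is now %d\n", x);

        if (x == a)       /* comparison: the reading a human expects */
            printf("equal\n");

        return 0;
    }

Both messages print, because the first "condition" silently rewrote x.
One glyph of difference, two wildly different meanings, and no context
to tell you which was intended.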

        <MC>



Tue, 04 Feb 1997 21:35:02 GMT  
 Simple syntax? A definition?
A lot of these issues have been covered long ago.  Consider "On the Design
of Programming Languages", Niklaus Wirth (Information Processing 74, pp.
386-393) or "Hints on Programming Language Design", C.A.R. Hoare (Stanford
University Computer Science Department Technical Report No. CS-73-403,
Dec. 1973, available from IEEE).  Both are widely reprinted in various
books on language design (I have both in "Tutorial: Programming Language
Design", A.I. Wasserman Ed., IEEE Catalog No. EHO 164-4).  It's rather
interesting that after 20 years people are still arguing the same issues
(and coming to the same conclusions).

A recent trend in functional languages is to design such simplistic syntax
that the whole syntax diagram fits on one printed page (with room for
commentary).  This causes programs written in the language to have a flat,
featureless appearance.  It's rather like trying to navigate in a
featureless desert - even a compass only tells you direction and gives no
information about where you are.  The only way you can read such programs
is by carefully reading every definition sequentially.  You can't scan or
jump around because there are no syntactic landmarks to guide your way.
And you must memorize *all* prior definitions or you'll lose your way
entirely.  Very bad designs like this are often proudly displayed by their
designers (on this very newsgroup at times).  Oh well.  Wirth learned this
lesson with EUCLID twenty years ago - a case of "those who don't
understand history are doomed to repeat it"?

Cheers.



Wed, 05 Feb 1997 02:12:03 GMT  
 Simple syntax? A definition?

Quote:

><3> Minimal - the less "noise" there is in the syntax the better. (I.e.,
>  COBOL is *bad*, because it's very hard to isolate the meaningful
>  text from all of the excess garbage.)

However, you have to distinguish between "noise" (which is BAD) and
"redundancy" (which is GOOD), i.e. there is such a thing as TOO
minimal.

A lot of style books, for example, recommend not using variable names
that differ by only one character. They can be misread too easily.

Excess verbosity that doesn't add anything is noise.
But a certain amount of redundancy helps these other points:

Quote:

><1> Unambiguous - for a human reader, it helps a *lot* if there's only one
>   correct reading of an element.

><2> Orthogonal - different constructs should look different. (Closely
>  related to <1>. If I look at a construct, I should be able to tell what
>  it does with minimal knowledge about context. If x = a is an assignment
>  statement, then x=a shouldn't also be a comparison in a different context.)
><6> Legible - using a large number of bizarre symbols makes a program
>  much harder to read. (Look at a one-liner in APL, compared to a
>  program in J with the primitives aliased to English words. Both
>  can provide a beautiful, elegant solution to a problem. But almost
>  anyone can read the J program given a 5-minute introduction to the
>  language, while the APL program can take an expert some time to decode.)

But you can also have bad, misleading redundancy, as when one of the
elements is non-functional. For example, the use of both brackets and
indentation to delimit code sections can be misleading: the eye reads
the indentation, while the compiler ignores it and reads only the
brackets. (But if you use a code-formatter or editor to ensure
that they both agree, then I'm willing to consider that an essential
part of your programming environment, if NOT of the compiler.)
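
To make that concrete, here is a small C sketch (my own illustration,
not code from anyone's post) where the eye and the compiler disagree:

    #include <stdio.h>

    int main(void)
    {
        int x = -1;

        if (x > 0)
            printf("positive\n");
            printf("always printed\n");  /* the indentation lies: this
                                            statement is NOT in the if */

        return 0;
    }

The second printf lines up as though it were guarded, but without braces
the compiler attaches only the first statement to the condition.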

[ comp.lang.perl & comp.lang.icon removed from Newsgroups:
  general discussion to comp.lang.misc only ]


- UVA Department of Molecular Physiology and Biological Physics



Wed, 05 Feb 1997 01:58:13 GMT  
 Simple syntax? A definition?

Let me put in a word for FORTH here. If simplicity is desired, this is
the simplest language I know. Moreover, it really has little in the way
of syntax, so that is simple too.

In FORTH there is only one sort of construct, and that is a subroutine.
It happens to be called a "word" in FORTH's jargon, but a subroutine is
what it is. What I like about this is that everything--operators (like
+, -, *, /, etc.), data structures, functions, programs--is a word,
and they all work the same way: you execute a word by naming it.

Communication is generally managed through the data stack: arguments
are pushed on the stack and consumed by a word (or words), leaving the
result. The language is supplied with a collection of primitive words
which can be strung together to define more complex ones.

The system is interactive, so testing of new definitions is immediate,
which greatly simplifies debugging and speeds program development. To get
an idea of what FORTH does, imagine you are sitting before a machine
running FORTH, and type in

        3 5 * <cr>

The machine will respond with "ok". It looks as though you input two
numbers and multiplied them, a la Hewlett-Packard, but where is the
product? Still on the stack. You can display the top item on the
stack (consuming it in the process) by typing in

        . <cr>  15  ok

Here "." is a word that prints the top of the stack to the display.
The 15 and the ok are what you see after the carriage return. You could
of course perform more complicated maneuvers:

        3 4 5 * + . <cr>  15 ok

A little thought tells you what just happened. Now, suppose I felt a
need for a new word that would do the above, which (since there is
no restriction on the characters I can use in names) I will name "*+".
I define it via

        : *+   *  +  ;  <cr>  ok

and test it

        3 4 5 *+  . <cr>  23 ok

I see it works. The only syntactic convention in FORTH is that words are
pieces of text separated by ASCII spaces (32d). So *+ is one word, but
* + is (are?) two words.

What happened when I defined *+ ? The word : takes the text immediately
following it as the new name and creates a dictionary entry with that name.
Then it switches the system into compiling mode (formerly it was in
interpret mode). Now as new strings of text are encountered, they are looked
up in the dictionary (where else would you compile words?) and when
found, rather than being executed, are compiled into the body of the
new word being defined. When ; is encountered, something new happens:
FORTH allows a word to be marked, after definition (usually by setting
a bit in its dictionary entry), as IMMEDIATE. Such a word is executed
even when the system is in its compiling mode. The word ; is IMMEDIATE,
and its action is to install the terminating code (involving popping
addresses from stacks, etc.) for the new word, and then switching back
to interpret mode.

This enormous simplicity of the compiling mechanism permits two things:
first, the compiler and its components are part of the language and are
available to the programmer for his own uses; and second, the compiler
is much smaller than in any other language. Complete FORTHs have been
written that fit easily into 8K of memory. (To be sure, I use a fairly
sophisticated FORTH with many bells and whistles for doing numerical
work, so its code image is about 120K. OTOH, my FORmula TRANslator that
I wrote to give a more fortranish interface compiles to 7K including a
1K scratch buffer--and it permits mixed-type, mixed precision arithmetic
just like "real" FORTRAN.)

You expressed a certain antipathy to "noise" words. Unfortunately, the
primitives in FORTH really are primitive. Hence you might have to include
words for fetching and storing data, as you define the solution to your
particular programming problem. However, the end result tends to
be pretty English-like (well, Germanic--verbs tend to come at the end),
as with

        : }}solve   initialize  triangularize  backsolve  ;

which is an in-place linear equations solver used as

        A{{ b{ }}solve

Here A{{ is the name of the (square) matrix, b{ that of the (vector)
inhomogeneous term, and the solution vector is written over b{ .

I do not think you will find much trouble following the algorithmic
statement of }}solve -- it almost needs no documentation. On the other
hand, the primitive word } , used to compute matrix element addresses
as in b{ I } (which leaves the address of the i'th element of b{ on the
stack), involves more primitive operations such as fetches and stores,
i.e. has more "noise".

I hope this gives you some feeling for the simplicity of FORTH. The
image of a 1.2 Meg diskette we were giving away last year to ACM student
members can be found on ftp.cygnus.com /pub/forth as primer.zip . It
has 2 public domain FORTHs, one very simple, the other quite
sophisticated, that you can have some fun with.

Ciao.
--
Julian V. Noble



Wed, 05 Feb 1997 04:54:31 GMT  
 Simple syntax? A definition?

Dang, I AM getting old or losing my marbles or whatever. I meant to say

        3 4 5 * + . <cr> 23 ok,

NOT

        3 4 5 * + . <cr> 15 ok    :-(

Mental languish, I guess...
--
Julian V. Noble



Thu, 06 Feb 1997 01:36:48 GMT  
 Simple syntax? A definition?

Quote:

> It appears to me that what counts is simplicity.

Agreed.

Quote:
>                                             A simple syntax would be
> one that can easily be built (NOT parsed) by programs

                               ^^^^^^^^^^^^
No.  Any language is easily emitted by programs.  A simple syntax must
also be easily parsed.

Quote:
>                                            It would also be easy for
> humans (that's us!) to learn and use.

Right.  And that means excluding things that add to the learning curve
but do nothing for the expressive power of the language: like precedence,
and (less importantly) multiple, independent global name spaces.

The language should also be easy to read; that is, easy for a human to
parse, as well as for a machine.

Quote:
> Does anyone have suggestions on how to define this in a more mathematical
> or precise way?

For readability, it should be top-down recursive-descent (LL) parsable.

Quote:
>   I thought about YACC clauses/sentence. But I am not sure
> it's enough because the YACC approach narrows you down to LALR.

The problem is not that YACC parsability limits what you can parse, but,
rather, that it does not limit it ENOUGH!  A simple language is not just
LR or LALR parsable, it is also LL parsable.  Otherwise, the reader of
the language has to "look ahead" and read language tokens without/before
knowing what is being specified.

Out of consideration for the *humans* that have to read and write the
languages, no language should be designed that is not LL (top-down)
parsable.
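
To see what LL buys the human reader, consider a toy recursive-descent
recognizer (a C sketch of mine, for an invented grammar) for

        expr -> digit ('+' digit)*

Each procedure matches one production, and a single character of
lookahead is always enough to decide what to do next:

    #include <ctype.h>
    #include <stdio.h>

    static const char *p;               /* cursor into the input */

    static int digit(void)              /* digit -> '0' | ... | '9' */
    {
        if (!isdigit((unsigned char)*p))
            return 0;
        p++;
        return 1;
    }

    static int expr(void)               /* expr -> digit ('+' digit)* */
    {
        if (!digit())
            return 0;
        while (*p == '+') {             /* one character decides */
            p++;
            if (!digit())
                return 0;
        }
        return 1;
    }

    int main(void)
    {
        p = "3+4+5";
        printf("%s\n", expr() && *p == '\0' ? "accepted" : "rejected");
        return 0;
    }

A reader scans the program the same way the parser does: at every point
you know which construct you are inside before you read its parts.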





Fri, 07 Feb 1997 20:08:36 GMT  
 Simple syntax? A definition?

Quote:
(David Burton) writes:
>>                                            It would also be easy for
>> humans (that's us!) to learn and use.

> Right.  And that means excluding things that add to the learning curve
> but do nothing for the expressive power of the language: like precedence,
> and (less importantly) multiple, independent global name spaces.

Actually precedence is an important concession *toward* human
understanding.  I strongly dislike languages in which addition and
multiplication have the same precedence.  It forces me to write *much*
less legible numerical expressions.

That being said, I would like to point out that precedence rules for
anything outside of numerical operators, relational operators, and logical
operators are undesirable.  Not because of precedence itself but because
there are no natural conventions to emulate.  
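
A small C contrast (my own example) of why I want precedence for the
numerical operators:

    #include <stdio.h>

    int main(void)
    {
        double a = 1, b = 2, c = 3, x = 4;

        /* conventional precedence: reads like the mathematics */
        double y1 = a*x*x + b*x + c;

        /* what a flat-precedence language forces you to spell out */
        double y2 = ((a*x)*x) + ((b*x) + c);

        printf("%g %g\n", y1, y2);      /* both print 27 */
        return 0;
    }

The two expressions compute the same value; only the first can be read
at a glance.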

As for the global name space: I agree there should only be one.  It should
be reserved for the names of `modules' or `packages' and should have no
single elements in it at all.

Quote:
> The language should also be easy to read; that is, easy for a human to
> parse, as well as for a machine.

This is a much more important criterion than simplifying the machine's
work.  The machine (or language or system) should be tailored to the needs
of people - not the other way around.

Quote:
> Out of consideration for the *humans* that have to read and write the
> languages, no language should be designed that is not LL (top-down)
> parsable.

Yes and no.  Humans are very good at performing small-scale (very local)
look-ahead, just as they are for small-scale look-back.   Witness the
number of natural languages in which the meaning of a sentence is deferred
(sometimes to the very end)*.  It's large-scale context that gives humans
problems.  Just being LL-parsable doesn't prevent this because it allows
things to depend on a *lot* of left-context.

Language design is an exercise in compromise.  There is no automatic tool
or mathematical property which will generate a successful design.

Cheers.

*-English is one of these languages.  The pronunciation of a printed word
is nearly always dependent upon the whole word - sounding it out from the
beginning is only a first cut (consider the silent -e).  The meaning of
English noun phrases also has this property.  Etc....



Sat, 08 Feb 1997 08:29:02 GMT  
 Simple syntax? A definition?

Quote:

>(David Burton) writes:
>>>                                            It would also be easy for
>>> humans (that's us!) to learn and use.

>> Right.  And that means excluding things that add to the learning curve
>> but do nothing for the expressive power of the language: like precedence,
>> and (less importantly) multiple, independent global name spaces.

>Actually precedence is an important concession *toward* human
>understanding.  I strongly dislike languages in which addition and
>multiplication have the same precedence.  It forces me to write *much*
>less legible numerical expressions.

>That being said, I would like to point out that precedence rules for
>anything outside of numerical operators, relational operators, and logical
>operators are undesirable.  Not because of precedence itself but because
>there are no natural conventions to emulate.  

Actually, PL/I and APL taught us a valuable lesson: humans (even
experts) can't remember more than 2-3 levels of precedence.  A
substantial fraction of PL/I errors could be traced back to
precedence problems (C programmers continue to make many of the same
errors), and APL therefore threw in the towel completely.

Quote:
>As for the global name space: I agree there should only be one.  It should
>be reserved for the names of `modules' or `packages' and should have no
>single elements in it at all.

I think you may be mixing two different things: syntax and namespaces.
Namespaces definitely add complexity to a language, but I wouldn't classify
this complexity as syntactic.

This having been said, your wish for a global name space is hopelessly
naive.  _Every_ language which has been moderately successful has been
forced to use some kind of scoping to keep names from colliding.  You
can either do it clumsily, as in file-scoped names in Fortran and C, or you
can do it elegantly, as in Algol, Pascal, (Common) Lisp, etc.
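
(For reference, the clumsy flavor looks like this in C; the example is
mine, not from any of the posts.  A static at file scope is essentially
the only privacy mechanism C offers above the function level:

    /* counter.c */

    static int count = 0;          /* invisible outside this file */

    int next_id(void)              /* the file's one exported name */
    {
        return ++count;
    }

Any other file may declare and call next_id, but nothing outside
counter.c can name count.)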

Quote:
>> The language should also be easy to read; that is, easy for a human to
>> parse, as well as for a machine.

>This is a much more important criterion than simplifying the machines
>work.  The machine (or language or system) should be tailored to the needs
>of people - not the other way around.

Any system with > 50,000 lines of code will require some sort of
browser to keep the SW people sane.  Once you have a good browser, the
necessity for human readability goes down considerably, since the
browser is in a position to provide all kinds of aids (e.g.,
cross-referencing) which syntax could never handle, no matter how
baroque.  Screwing up the syntax of a language for the sake of passive
publication-style programs is penny-wise and pound-foolish.  If you
prefer to read programs printed out on passive paper, your days as a
professional programmer are going to be very short.

Quote:
>Language design is an exercise in compromise.  There is no automatic tool
>or mathematical property which will generate a successful design.

This may be true, but it is also true that language design which
ignores automatic tools and mathematical properties will be expensive
to implement, expensive to maintain, expensive to teach, impossible to
extend, and a dead end.  To the extent that it achieves these
inefficiencies, it will also be wildly successful, because it will
offer jobs to a lot of people who delight in inanities. ;-)


Sat, 08 Feb 1997 23:27:50 GMT  
 Simple syntax? A definition?

Quote:
Baker) writes:
>>> This having been said, your wish for a global name space is hopelessly
>>> naive.  _Every_ language which has been moderately successful has been
>>> forced to use some kind of scoping to keep names from colliding.  You
>>> can either do it clumsily, as in file-scoped names in Fortran and C, or
>>> you can do it elegantly, as in Algol, Pascal, (Common) Lisp, etc. <<<

Where it gets confusing is in something like C where typedefs have a
separate name space.  (Or is it structs that have a separate name space?
And what is the difference anyway?  Another thing that can be confusing is
when you have several concepts that are SIMILAR but not quite identical.)
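
A C sketch of the collision rules, as best I understand them (my
example, not from the thread):

    #include <stdio.h>

    struct list { int head; };     /* "list" goes in the tag name space */
    int list = 42;                 /* legal: ordinary names are separate */

    /* typedef struct list list;      this, by contrast, would clash with
                                      the variable, because typedef names
                                      are ordinary identifiers */

    int main(void)
    {
        struct list l = { 7 };
        printf("%d %d\n", l.head, list);
        return 0;
    }

So a struct tag and a variable can share a name, but a typedef and a
variable cannot - exactly the sort of similar-but-not-identical pair
that makes this confusing.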



Sun, 09 Feb 1997 11:59:07 GMT  
 Simple syntax? A definition?

Quote:

>Baker) writes:
>> This having been said, your wish for a global name space is hopelessly
>> naive.  _Every_ language which has been moderately successful, has been
>> forced to use some kind of scoping to keep names from colliding.  You
>> can either do it clumsily, as in file-scoped names in Fortran and C, or
>> you can do it elegantly, as in Algol, Pascal, (Common) Lisp, etc.

>First of all, Fortran doesn't have file scoping, only C does that.  In
>Fortran 77 (and before) the only global names were procedure names and
>common block names. In Fortran 90, Ada, and more modern languages, the
>only global names are procedure names, package (or module) names, and
>common block names (Fortran only, of course).  This is now most widely
                                                ^^^^^^^^^^^^^^^^^^^^^^^
>held to be superior to the nesting of scopes that Algol, Pascal, etc. made
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>use of.  It's easier to write, read, debug, and maintain programs when the
>different components are separate and individually testable packages
>rather than imbedded in a nested structure where they can't be easily
>considered separately.  Smart programmers of the Algol heritage made sure
>their interfaces were clean and not imbedded anyway and can move to the
>more modern paradigm fairly easily.

I beg to differ regarding the 'superiority' of packages/modules v.
arbitrarily-nested lexical scoping.  They are either subsumed by lexical
scoping w/ closures, or are orthogonal to lexical scoping.

First of all, the type of lexical closures created by 'packages/modules'
_can_ be created by lexical scoping, if the language offers proper
closures.  For example, in Scheme one can 'define' a function at other
than the top-most lexical level (although the function name itself _is_
at the top-most lexical level) and the function will inherit from the
'package' of variables & functions created at that subsidiary level.
If one defines a number of functions at this level, this is completely
equivalent to a package in Ada.

(Scheme's way of doing this is actually substantially superior to
the usual package/module mechanism, since during the creation of the
closure, one can perform an arbitrary computation, instead of the
highly constrained 'initialization' capabilities usually provided
for packages/modules.  Roylance's paper on expressing mathematical
subroutines constructively, at the Lisp Conference '88 (??), shows the
elegance of this approach.)
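
(C can't express this directly, having no closures, but a rough C
analogue of "arbitrary computation at package-creation time", sketched
by me purely for contrast, is a constructor that computes the package
state before handing it out:

    #include <math.h>
    #include <stdio.h>

    struct sine_pkg {                   /* the "package" state */
        double tab[91];                 /* sine of 0..90 degrees */
    };

    /* construction may run arbitrary code - the capability that the
       closure approach gives you for free */
    static void make_sine_pkg(struct sine_pkg *s)
    {
        int i;
        for (i = 0; i <= 90; i++)
            s->tab[i] = sin(i * 3.14159265358979 / 180.0);
    }

    int main(void)
    {
        struct sine_pkg s;
        make_sine_pkg(&s);
        printf("sin 30 = %g\n", s.tab[30]);    /* prints 0.5 */
        return 0;
    }

In Scheme the table and the functions over it are simply closed over
together; no separate constructor discipline is needed.)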

Secondly, even in languages which do go to the trouble of providing a
distinct package/module capability, it is still extremely useful to
have arbitrary lexical scoping within and without these
packages/modules.  (Ada9X has actually gone to the trouble of
introducing 'child units' (I don't recall the exact terminology) which
provide for scoped access to packages themselves!)

I can show you some examples of the advantages of nesting in Ada if
you wish.  These are covered in the following papers:

"Object-Oriented Programming in Ada83--Genericity Rehabilitated".  ACM
Ada Letters XI,9 (Nov/Dec 1991), 116-127.

"Structured Programming with Limited Private Types in Ada: Nesting is
for the Soaring Eagles".  ACM Ada Letters XI,5 (Jul/Aug 1991), 79-90.



Mon, 10 Feb 1997 00:29:20 GMT  
 