Properties of CL symbols (probably an RTFM question) 
Author Message
 Properties of CL symbols (probably an RTFM question)

I was asked by someone if it's OK to put properties on symbols in CL.
Both the asker and I thought that it wouldn't be.  But I can't find
such a restriction.  If there was one I'd have expected to find it in
11.1.2.1.2, but it's not there.  So maybe it's OK to do this, but I
can't find a statement that it is, either.  I'm not sure if is an
omission, if it is OK or if I need to RTFM (RTFHS?) better.  Probably
the latter.

Intuitively, it seems to me that it should be OK to add properties
that the user controls to CL symbols - (setf (get 'cons 'mypack:foo)
1) for instance.  But (setf (get 'cons 'type) 1) might be a bit
dubious, and (setf (symbol-plist 'cons) '()) strikes me as positively
foolhardy.  So I'm, slightly nervously, thinking about going with a
restriction to non-CL property names.

Can anyone clarify this, or point out what I should have read?

Thanks

--tim



Fri, 12 Dec 2003 20:31:08 GMT  
 Properties of CL symbols (probably an RTFM question)

Quote:
>Intuitively, it seems to me that it should be OK to add properties
>that the user controls to CL symbols - (setf (get 'cons 'mypack:foo)
>1) for instance.  But (setf (get 'cons 'type) 1) might be a bit
>dubious, and (setf (symbol-plist 'cons) '()) strikes me as positively
>foolhardy.  So I'm, slightly nervously, thinking about going with a
>restriction to non-CL property names.

>Can anyone clarify this, or point out what I should have read?

As long as either the symbol you're adding the property to or the symbol
being used as the property name are in your own package, you should be
fine.  The point of most of the restrictions on what you can do with CL
symbols is to prevent conflicts between unrelated packages, and using a
property name in your own package should achieve that.

So even though the spec doesn't specifically prohibit (setf (get 'cons
'type) 1), don't do it.

--

Genuity, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.



Fri, 12 Dec 2003 22:55:54 GMT  
 Properties of CL symbols (probably an RTFM question)

Not at all an RTFM question.  Quite the contrary.  I think it was McCarthy
at one of the very early X3J13 meetings who asked a very intresting
question about our charter
  http://www.*-*-*.com/ ~pitman/CL/x3j13-86-020.html
What is it to "establish normative Common Lisp programming practice"?
It sounded good when we wrote it in, but how were we going to accomplish
it.  And it's still the part that I think we occasionally fell short on.

As I recall, one example that was given at the time was that it might
mean "don't add operators to the language that appear to be the thing
to use, but that are really traps".  APPEND was discussed, and the fact
that almost all uses of it were an efficiency trap, for example.  I think
there were not good answers; no one wanted to leave out APPEND at that
time, nor did they want to document it as inefficent or unusable per se.
It was left for the teaching texts how to resolve this.  But it isn't the
only operator that suffers from the need for careful explanation about
how to use it.  Maybe the answer is that this didn't belong in our charter.

Quote:
> >Intuitively, it seems to me that it should be OK to add properties
> >that the user controls to CL symbols - (setf (get 'cons 'mypack:foo)
> >1) for instance.  But (setf (get 'cons 'type) 1) might be a bit
> >dubious, and (setf (symbol-plist 'cons) '()) strikes me as positively
> >foolhardy.  So I'm, slightly nervously, thinking about going with a
> >restriction to non-CL property names.

> >Can anyone clarify this, or point out what I should have read?

> As long as either the symbol you're adding the property to or the symbol
> being used as the property name are in your own package, you should be
> fine.  The point of most of the restrictions on what you can do with CL
> symbols is to prevent conflicts between unrelated packages, and using a
> property name in your own package should achieve that.

> So even though the spec doesn't specifically prohibit (setf (get 'cons
> 'type) 1), don't do it.

Nothing Barry says is linguistically false, and I've dispensed this same
advice over time, but I'd go a different path here...

The following two are about computationally equivalent:

 (defun foo (x) (get x 'foo))

or

 (defvar *foo-table* (make-hash-table))
 (defun foo (x) (gethash x *foo-table*))

The latter, modulo the use of specials (where they'd use a closed lexical)
is more like what you'd do in Scheme, and I tend to use the Schemish
solution these days so as not to clutter the property list AND to add a
slight degree of extra data privacy (not the least of which is keeping
some idiot from using SETF of SYMBOL-PLIST to{*filter*}things.  You should
definitely NEVER EVER using SETF of SYMBOL-PLIST  to do anything other
than *augment* a symbol.  

[Well, I used to once-in-a-while, in single-processing lisps, use
SETF of SYMBOL-PLIST to implement things like fast SUBLIS by pushing and
popping the plist with gensymed properties to make sure the property was
at the head or not there at all.  The problem is that you have to lock out
interrupts in a multiprocessing Lisp, or at least synchronize the use of
SETF of GET/SYMBOL-PLIST/etc, and you can't.  So that makes applications
which try to play with the plist order less useful in modern lisps.  It'
obscure trivia now, so I'll leave it at that, but Ask if I haven't made
that clear and you actually care.]

The reason I tend to go for hash tables, though, is (ironically) that they
are a little slower than the plist.  The plist is directly attached to a
symbol and provides really quite a fast way to get to certain data.

Note, though, that the plist is not threadsafe.  Well, neither is a hash
table for that matter, but it's much more common that a plist would be used
cross-process than a hash table would, since the plist serves multiple masters.
Usually you know when your hash table will need to be threadsafe, but knowing
whether the plist is threadsafe is trickier.  So you're always at some
risk of lossage.  Some people say don't do updates  of plists EXCEPT at
load time and don't do LOAD except when a multiprocessing  system is
quiescent.  I heard such lore for the Lisp Machine, and it probably related
to this.

Anyway, to go on, I tend to try to save properties in reserve for
things I need to super-optimize, because the number of instructions to
execute (get 'foo 'bar) can be really tiny compared to executing a hash
key and searching a hash table, IF the plist is short.  If you've used lots
of properties, you may have screwed your ability to use this as a last
ditch speed bum against a hash table.  (Also, of course, GET will not take
the occasional irksome speed hit that GETHASH will due to GC cyclage, and
sometimes one needs to work past that.)

[Ask if you don't know about the "ironic" speed hit GETHASH can take.
 Everyone should know about that, but I'm too lazy to mention it now.
 It's due to a mis-design in MAKE-HASH-TABLE, which should take a
 :OPTIMIZE-PERFORMANCE :REALTIME or :OPTIMIZE-PERFORMANCE :TOTAL-TIME,
 but since it doesn't, it tries to guess [usually total-time] and screws
 real-time applications.]

Bottom line: Use GETHASH unless your application is real-time or you need
 a last-minute speed bum.  Do not use GET on symbols otherwise.

So goes my personal style rule.

Your mileage may vary, as with all style rules.



Sat, 13 Dec 2003 00:49:26 GMT  
 Properties of CL symbols (probably an RTFM question)

Kent> [Ask if you don't know about the "ironic" speed hit GETHASH can take.
Kent>  Everyone should know about that, but I'm too lazy to mention it now.
Kent>  It's due to a mis-design in MAKE-HASH-TABLE, which should take a
Kent>  :OPTIMIZE-PERFORMANCE :REALTIME or :OPTIMIZE-PERFORMANCE :TOTAL-TIME,
Kent>  but since it doesn't, it tries to guess [usually total-time] and screws
Kent>  real-time applications.]

I would like to hear more about this.

I am working on an application that uses quite a lot of hashtables,
and allthough this is neither hard realtime nor in the stage where we
need to really squeeze it for speed, we are a bit concerned whether we
are in fact doing something silly.

------------------------+-----------------------------------------------------
Christian Lynbech       | Ericsson Telebit, Skanderborgvej 232, DK-8260 Viby J

Fax:   +45 8938 5101    | web:   www.ericsson.com
------------------------+-----------------------------------------------------
Hit the philistines three times over the head with the Elisp reference manual.



Sat, 13 Dec 2003 18:20:41 GMT  
 Properties of CL symbols (probably an RTFM question)

Quote:
> [Interesting things]

I think I agree with everything you say, although my personal
tradeoffs might be slightly different I think (maybe not).

At the level of `must' rather than `ought' though, I was slightly
surprised that there isn't a requirement in the standard not to, say,
do (setf (symbol-plist 'car) nil).  In at least one implementation
something like this is catastrophic.  I guess the problem is
describing what you are and aren't allowed to do in such a way that
it's not restrictive but still useful - for instance you don't want to
restrict changing properties that are in CL, you also want to prevent
ones that are in implementation-internal packages. Or maybe it's just
because that's such an obviously stupid thing to do...

The issue with hashtables that you mentioned: is this that there's no
way of asking the system `I want operations on hashtables to take
bounded time'? So, for instance, it might elect not to do expensive
rehashes or representation changes which mean that, say, a particular
GETHASH will take a long time, even though it thinks that will make
the amortised cost lower.

--tim



Sat, 13 Dec 2003 18:48:33 GMT  
 Properties of CL symbols (probably an RTFM question)

Quote:


> Kent> [Ask if you don't know about the "ironic" speed hit GETHASH can take.
> Kent>  Everyone should know about that, but I'm too lazy to mention it now.
> Kent>  It's due to a mis-design in MAKE-HASH-TABLE, which should take a
> Kent>  :OPTIMIZE-PERFORMANCE :REALTIME or :OPTIMIZE-PERFORMANCE :TOTAL-TIME,
> Kent>  but since it doesn't, it tries to guess [usually total-time] and screws
> Kent>  real-time applications.]

> I would like to hear more about this.

> I am working on an application that uses quite a lot of hashtables,
> and allthough this is neither hard realtime nor in the stage where we
> need to really squeeze it for speed, we are a bit concerned whether we
> are in fact doing something silly.

EQ hash tables have to be "rehashed" on every gc when a relocating gc
is used, since the pointers might have moved.  However, often such a
table is not used between one gc cycle and the next, so many
implementations do you the "favor" of not rehashing.  This definitely
optimizes "total time" since the time to rehash is simply avoided (the
hash table is marked "dirty" and the hash table is rehashed instead on
demand the next time a gethash is done, which might be several gc
cycles later).  HOWEVER, it ignores the fact that most people use
GETHASH thinking they are getting more-or-less O(1) fast lookup time,
and a rehash of a table is not necessarily an O(1) "feel"... It's
*possible* that the gc would have occurred while the machine (or at
least the Lisp process) was idle and that rehash at that time, while
increasing "total time", would not have compromised realtime performance.
IMO, and this is another one of those stupid banners that I seem to be the
only one ever to wave, the Lisp system should CEASE AND DESIST making this
"helpful" decision for users and vendors should offer a keyword
option at MAKE-HASH-TABLE time which allows you to either select or avoid
this "optimization" in a way that is right for the application.  
Especially if you're doing things like writing CLIM interfaces, where the
CLIM interface is often already doing a ton of stuff and running slowly,
there's nothing like finding you clicked on a pull-down menu and you
went to access something in a not-recently-used hash-table only to find
it says "before I pull down this menu, how about I rehash this hash table".
Bleah...  I think right now just about all implementations unconditionally
assume you want to optimize "total time" and don't really understand the
"realtime response" issue.


Sat, 13 Dec 2003 23:16:57 GMT  
 Properties of CL symbols (probably an RTFM question)

Quote:


> > Kent> [Ask if you don't know about the "ironic" speed hit GETHASH
> > Kent> can take. ...]

> > I would like to hear more about this.

> EQ hash tables have to be "rehashed" on every gc when a relocating gc
> is used, since the pointers might have moved.  However, often such a
> table is not used between one gc cycle and the next, so many
> implementations do you the "favor" of not rehashing.  This definitely
> optimizes "total time" since the time to rehash is simply avoided (the
> hash table is marked "dirty" and the hash table is rehashed instead on
> demand the next time a gethash is done, which might be several gc
> cycles later).  HOWEVER, it ignores the fact that most people use
> GETHASH thinking they are getting more-or-less O(1) fast lookup time,
> and a rehash of a table is not necessarily an O(1) "feel"... It's
> *possible* that the gc would have occurred while the machine (or at
> least the Lisp process) was idle and that rehash at that time, while
> increasing "total time", would not have compromised realtime performance.
> IMO, and this is another one of those stupid banners that I seem to be the
> only one ever to wave, the Lisp system should CEASE AND DESIST making this
> "helpful" decision for users and vendors should offer a keyword
> option at MAKE-HASH-TABLE time which allows you to either select or avoid
> this "optimization" in a way that is right for the application.  
> Especially if you're doing things like writing CLIM interfaces, where the
> CLIM interface is often already doing a ton of stuff and running slowly,
> there's nothing like finding you clicked on a pull-down menu and you
> went to access something in a not-recently-used hash-table only to find
> it says "before I pull down this menu, how about I rehash this hash table".
> Bleah...  I think right now just about all implementations unconditionally
> assume you want to optimize "total time" and don't really understand the
> "realtime response" issue.

Btw, I wanted to add a couple of remarks.

It's not that I'm not sensitive to the fact that sometimes a gc could happen
in a place that disturbs realtime  response, so it's not like some
implementations couldn't ignore my request to reshash at a different time.

One way to handle this would be for the "idle process" (or the
last-ditch-before-idle process :-) to do rehashing rather than nothing
on tables known to be needed for realtime response.  Another way to handle
it would be to have a (with-realtime ...) that hinted to the system that
this would be a bad time to be trading total time for realtime response.

Mostly what I'm arguing for is not countermanding careful metering of
a given implementation, but for the language offering enough information to
know what's in the programmer's head.  Certain gethash's need to be
especially fast.  Part of the problem may be using hash-tables at all.
We should have just called them tables and let the system pick the
implementation.  It might be that linear search could be faster and this
might queue the system to know that...  For very small tables, most everyone
knows an alist is probably faster anyway, but it's per-implementation
where the dividing line is, so having a generic table type would hide this
in the implementation's domain...

Just thinking out loud, but I figured if I was going to say half of the stuff,
I should say the other half.

What really bugs me about make-hash-table/gethash as they are now is that
the system makes all its decisions on the basis of no cues from the user,
when the user probably has cues to offer but has no gesture available
for offering them.



Sat, 13 Dec 2003 23:34:01 GMT  
 Properties of CL symbols (probably an RTFM question)

Quote:
>At the level of `must' rather than `ought' though, I was slightly
>surprised that there isn't a requirement in the standard not to, say,
>do (setf (symbol-plist 'car) nil).  In at least one implementation
>something like this is catastrophic.

The cleanup issue LISP-SYMBOL-REDEFINITION says: At the March 89 X3j13
meeting, a proposed additional constraint ("Altering its property list")
was removed. Presumably this means that conformal programs are allowed to
alter the property list of symbols in the COMMON-LISP package.

This means that implementations are not supposed to put anything they
depend on in the property list.

--

Genuity, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.



Sat, 13 Dec 2003 23:41:54 GMT  
 Properties of CL symbols (probably an RTFM question)

Quote:

> The cleanup issue LISP-SYMBOL-REDEFINITION says: At the March 89 X3j13
> meeting, a proposed additional constraint ("Altering its property list")
> was removed. Presumably this means that conformal programs are allowed to
> alter the property list of symbols in the COMMON-LISP package.

> This means that implementations are not supposed to put anything they
> depend on in the property list.

I believe I can remember something about the reasoning at that meeting,
or perhaps some later meeting where the plist issue came up:  There was
sentiment that the standard _should_ codify proper implementation and
application treatment of symbol property lists, but those present at the
time realized that the issues, while not extremely difficult, were
difficult enough that we were unlikely to get it right the first time
around, and that we were supposed to be finishing up the standard at
whichever meeting it was that I'm remembering.  This is fom someone who
has difficulty remembering what he ate for breakfast yesterday.

I think the central point is fairly clear, that well-behaved program
modules/threads shouldn't mess with the symbol-plist of symbols that
other modules might be using, other than by using cl:set and setf
thereof.  All property indicators used should be symbols in some package
under the control of the particular module.

These rules permit even a multiprocessing implementation to provide
applications with safe plist usage.  Probably most people would agree
with the intent of this specification.

But there are some fine points:  What about a program that uses symbol
property lists for some kind of temporary storage?  That program might
be able to guarantee that all the symbols it uses have no possibility of
use by any other program -- gensyms and private packages are your friends
--  and should not such a program be allowed to bash or pop symbol-plist
when it wants to flush accumulated properties?  Although this style of
Lisp programming feels about 35 years old, I would not want to prohibit
it.

So, if anyone thinks he can draft a workable specification for use of
symbol-plists, write them down and we can get them in the next version
of the ANS.



Tue, 16 Dec 2003 11:04:30 GMT  
 Properties of CL symbols (probably an RTFM question)

Quote:

> EQ hash tables have to be "rehashed" on every gc when a relocating gc
> is used, since the pointers might have moved.

I don't mean to pick an argument here, but I want to point out that
hash table implementation is more subtle than realized in 103% of the
messages posted about the subject to comp.lang.lisp.

It is in general true that an EQ (or EQL and maybe even EQUAL/P,
depending on the implementation) hashtable needs to be rehashed before
it can be accessed after any GC, there are possible optimizations to
this requirement that, indeed, make a difference to some applications.

For any hashtable predicate (whether or not one of the standard four
predicates in the ANS) the set of possible Lisp objects can be partitioned
in an implementation-dependent way according to their properties as hash
keys (see ANS 18.1.2 Modifying Hash Table Keys):

 (1) Some objects change their hash lookup as a result of GC.
 (2) Some objects don't.

Now, the implementation of a hash table certainly can know the difference
between these two kinds of objects, and it follows that the hash table can
know (sometimes) that it contains no keys from the first class.  If it
contains none, then it need not rehash after GC.  If it contains some, or
previously contained some and _might_ still contain some, then it can
trivially and efficiently recompute this binary attribute the next time it
is forced to rehash by access after a GC.

Now this might at first seem a tedious and useless optimization applicable
only in absurdly rare circumstances, but this is not the case.  Consider
for example an implementation that assigns immutable hash codes to each
symbol upon its creation.  If all the keys in the hash table are symbols
(or perhaps fixnums or other eq--under-gc-preserving hashcodes) then the
hash table does not need to rehash after GC.  And the hash table can
maintain this attribute with nearly zero cost.

ACL performs this optimization (the implementation is mostly due to Duane)
and it is very important to certain applications.  But there is a meta
point here:  The performance of hash tables is very subtle, and it is
_very_ hard for a naive application to predict and optimize the efficiency
of particular hash table usage.  So, portable applications can't depend
upon this or any other hash table optimization.  For many applications,
efficiency-critical hash table use remains in the domain of portions of
the application tha may need to be customized to the implementation.  This
is, of course, not a good thing, but it is the condition on the ground.

Now, during the X3J13 process Kent was eloquent that the standard should
not merely define certain guaranteed semantics, it must also delineate
(so far as possible) the proper usage that quality implementations with
achieve with efficiency, and not suggest code usage that (while
semantically portable) would not be practically portable because of
uncertain performance behavior between implementations.  (Kent might not
remember making this point, but I remember it well and I remember being
impressed with the point at the time and agreeing with him.  I still
agree.  If Kent wants to deny the point now, that's his privilege, but
I'll point out that Kent probably _also_ doesn't remember what I ate for
breakfast yesterday...)

It is really _really_ hard to capture the possibility of this kind of
optimization in a language standard.  The conditions of applicability
are much too hard to nail down. (Memory addresses change in any sort of
copying GC, but never change in freelist gc implementations.  Few GC
implementations these days use freelists, but it occurs to me that my
monthly salary would today easily buy more semiconductor memory than
ever plugged in all 36xx/LM-xx chassis every built, don't discount the
notion that GC strategies may yet change again.  Maybe there will again
even be viable implementations that never GC.  But I digress...)
I'm here publicizing the optimization because if enough other
implementations provided it, there would be that much less variability
in hash table performance, and the language would be improved.




Tue, 16 Dec 2003 11:45:39 GMT  
 
 [ 16 post ]  Go to page: [1] [2]

 Relevant Pages 

1. ANSI CL spec question: FORMAT ~f 'symbol

2. RTFM question...

3. RTFM and/or parser question

4. Use blinking property node with a pipe indicator from Automation Symbols toolkit

5. TinyScheme, trying to set symbol properties

6. Symbol Properties or Generic Functions

7. Eliminating a property of a Lisp symbol.

8. Setting a property in a symbol.

9. CL standardization (identifiers'semantics and symbols)

10. Tricky indexing question (probably impossible)

11. Extremely naive (and probably stupid) question

12. I have a very simple..probably stupid..clipper question

 

 
Powered by phpBB® Forum Software