Module Packages - package-search-strategy controversy 

At the first python workshop (nov '94) i proposed an extension of the
python 'import' mechanism, to support nested-module "packages".  We
discussed it, and i then implemented a prototype of what we discussed,
in time for the second python workshop this May.  Since that time i
have been reworking newimp, to work out (serious) bugs and refine the
structure for further enhancements.  In resolving some details of some
changes with guido, he proposed an alternative to the package-search
strategy i've implemented.  We would like to open up the discussion to
the group, in order to help decide which way to go.

(I should mention off-the-bat that i have a clear preference between
the two schemes.  I will try to minimize the amount that this will
slant my presentation, and i hope it does not obscure any meaningful
issues.)

The import extension provides module "packages" - modules within which
other modules are defined.  In python, packages are simply module
objects containing their constituent modules.  In the filesystem,
packages are directories or collections of directories (depending on
the path associated with the module) containing modules.  The
difference between the two schemes we're considering concerns
specifically how the module names in an import statement are
interpreted, and how the corresponding modules are sought in the
filesystem.

In particular, python's current import mechanism uses a path-based
search of directories listed on sys.path.  With packages, we need to
consider how to designate searches within a package versus searches
along the root path (ie, sys.path).  The two schemes we are
considering basically differ in how imports originating within a
package are interpreted and resolved.

                           The two schemes

Alternative #1: "expanding search"

   On import, by default, the named target modules are first sought
   within the containing package, then within its containing package,
   and so on out to the "root" - the sys.path-dictated package.

   In a special case, module names prefixed with "__." are sought only
   along sys.path - the contents of intervening packages are ignored.

   Modules imported in this scheme are established in the importing
   namespace sans the '__.' prefix, if any.  The actual names of the
   module objects, though, reflect the package nesting that contains
   the name, eg 'package.subpackage.name'.

   Examples:

     1  import utils            # Search current package outwards, until
                                # 'utils' is found.
     2  import filesys.utils    # Search outwards for 'filesys' package
                                # which has 'utils' module component.
     3  import __.utils         # Search sys.path for 'utils' module.
     4  import __.filesys.utils # Search sys.path for 'filesys' package
                                # which has 'utils' module component

  1 and 3 would bind the name 'utils' to the imported module, 2 and 4
  would bind to the name 'filesys.utils'.
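
   To make the resolution order concrete, here is a minimal,
   illustrative sketch - not newimp code; the helper and the
   dict-based package representation are invented - of how an
   expanding search resolves a plain module name:

        # Illustrative sketch only.  A package is modelled as a dict of
        # member names; 'chain' runs from the importing package outward,
        # ending with the root (sys.path-level) package.
        def expanding_search(name, chain):
            for pkg in chain:                   # innermost package first
                if name in pkg:
                    return pkg[name]
            raise ImportError("no module named %s" % name)

        root = {'utils': '<root utils>'}
        package = {'utils': '<package utils>'}

        # 'import utils' from inside 'package' finds the sibling first:
        print(expanding_search('utils', [package, root]))   # <package utils>
        # 'import __.utils' skips the intervening packages entirely:
        print(expanding_search('utils', [root]))            # <root utils>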

Alternative #2: Explicit, relative reference - "explicit-relative"

   *Almost* the converse of the expanding search, the default search
   (when no '__' prefix is specified) is to search only on sys.path.

   The special case, with the '__.' prefix, means search in the
   current package.  Doubled '__.__' means search in the package's
   package, and so on.

   The imported modules are bound with their full names (including
   their package qualifications, ie relative to the root path) - the
   same as the actual name of the corresponding module object.

   A variable named '__' is automatically bound in every module to the
   package which contains the module.  Hence all of a package's loaded
   modules are reachable via a conventional variable attribute
   reference, '__.<modname>'.  Likewise on up the hierarchy.

   Examples:

     1  import __.utils         # Search current package only for utils module
     2  import __.filesys.utils # Search for 'filesys' package with 'utils'
                                # module, only in current package
     3  import utils            # Search sys.path for utils
     4  import filesys.utils    # Search sys.path for filesys package
                                # which has 'utils' module component
     5  import __.__.utils      # Search only the package _containing_
                                # the current package for 'utils'
     6  import __.__.filesys.utils      # Search for 'filesys' package with
                                        # 'utils' module, only in package
                                        # _containing_ current package.

   The local names bound to the modules resulting from the '__'-qualified
   imports depend upon the name of the package from which they are loaded.
   However, once imported, they are inherently reachable by all modules
   within that package using the '__' package variable, eg example 1's
   module via '__.utils'.  The module can also be bound under its
   relative name only, using, eg, 'from __ import utils'.
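
   Again purely as illustration (the function and the dict-based
   package representation are invented, not newimp's), the
   explicit-relative rule can be read as: count the leading '__'
   components to pick a starting package, then look only there:

        # Illustrative sketch only.  'chain' runs from the importing
        # package outward; chain[-1] is the root (sys.path) level.
        def explicit_relative(dotted, chain):
            parts = dotted.split('.')
            ups = 0
            while parts and parts[0] == '__':   # each '__' climbs a level
                ups = ups + 1
                parts.pop(0)
            if ups == 0:
                scope = chain[-1]       # no prefix: sys.path level only
            else:
                scope = chain[ups - 1]  # '__' = current, '__.__' = parent
            for part in parts:          # descend, with no further search
                scope = scope[part]
            return scope

        chain = [{'utils': '<package utils>'},   # current package
                 {'utils': '<parent utils>'},    # its container
                 {'utils': '<root utils>'}]      # sys.path level

        print(explicit_relative('__.utils', chain))      # <package utils>
        print(explicit_relative('__.__.utils', chain))   # <parent utils>
        print(explicit_relative('utils', chain))         # <root utils>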

                        My Preference, and Why

I strongly prefer alternative #2, the explicit-relative scheme, for
a few primary reasons:

Determined identification of target packages:

   In the explicit-relative ref scheme, implicit searches do not go
   across package boundaries.  The programmer must overtly express, in
   their code, in which package the imported module is supposed to be
   found.  In the expanding search scheme, only loads from the "root"
   package *can* be unambiguously spelled out.

   In expanding-search, the reader must search the filesystem
   themselves to determine which module was obtained.  Unless they use
   the rooted, sys.path-based imports, the programmer cannot control
   where in the intervening packages the desired module is supposed to
   be found.  People reading the code cannot tell, from the code,
   whether or not the right one was, in fact, found - there is no
   redundancy; the result is determined only by the arrangement of the
   filesystem.

   With long, substantial experience as a system administrator, i
   recognize that many of the problems i had to help people untangle
   stemmed from an unbridled search of this kind.  Much of my time helping
   people sort out problems came down to their environment
   customizations, eg, getting executables from different "packages"
   than they expected - /usr/bin/mail instead of /usr/ucb/mail for a
   rudimentary one, or /usr/5bin/cc instead of /usr/bin/cc.  I think
   the explicit-relative style would avoid this problem, without
   getting in the way.

   Explicit-relative still involves a search, within the context of each
   target package, but that search is controlled by the package and is not
   sensitive to the calling context.  The package can set several
   directories in its search, but different people will get the same
   searches once they designate the package as their search target.
   And they can designate the target package unambiguously.

   In general, i think the programmer should be given maximum
   opportunity to be clear about what components they are using, so as
   much as possible is determined in the code.  Programmers can use
   the 'from __ import ...' form to avoid sprinkling their code with
   local '__.'  references, but at least that local naming will be
   spelled out in the code, so it is clear and determined.
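
   For instance, in the proposed notation (the module names are
   invented for illustration), a module could write:

        from __ import utils, swanee   # explicit, package-relative import
        x = utils.spam()               # body references stay short
        swanee.whistle()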

Coherent '__' denotation of package, and handle on siblings and ancestors:

   '__' coherently connotes the parent package, both within the
   context of variable references and in the context of import
   statements.  I think this will prove to be extremely useful.  For
   instance, a module can force importation of its entire containing
   package, via 'import __', and can then refer to any of the loaded
   siblings, using '__.sibling'.  It need not spell out the loads, or
   else the names, for every sibling.

   In the expanding search scheme, on the contrary, the module must
   either import every one of the specific siblings, or load the
   entire package.  (I think that loading the package can be tricky in
   the expanding-search scheme - imagine how it would be if the
   package had the same name as one of its constituents!)  If it
   went with loading the entire package, the module would then have to
   use the package name together with the sibling name to refer to the
   siblings.

   In explicit-relative, regularly imported (ie, not 'from
   ... import') modules are always bound into their importing
   namespace with their full-nesting names.  Thus, a module loading
   two different 'utils' modules will not yield name collisions, as
   would happen in expanding-search.  Moreover, the module can handle
   the imported modules using the same name that it used to import
   them.  In expanding search, name collisions will require finagling,
   either by stashing a separate reference before doing the second
   import, or by using an unambiguous absolute import from the root.
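
   For example, in the proposed notation, a module needing both its
   own package's 'utils' and a root-level 'utils' could say:

        import __.utils    # bound under its full name, eg 'mypkg.utils'
        import utils       # the sys.path-level utils, bound as 'utils'

   and the two bindings coexist.  Under expanding search, both imports
   would compete for the single local name 'utils'.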

                                In Sum

Due to my system administration experience, i cannot overstate how
much i think unambiguously identified target packages will contribute
to clear code.  I think the explicit-relative-ref mechanism i've
implemented provides a tidy, powerful means of achieving exactly
that.

Considered comments are welcome.  I'll post the code that implements
the explicit-relative scheme immediately after posting this one.

Cheers!

ken



Tue, 23 Dec 1997 03:00:00 GMT  
 Module Packages - package-search-strategy controversy
Ok, I'll bite: I don't like either scheme, for a number of
more-or-less unrelated reasons.

First, I agree with Ken that the expanding-search is probably not a
good idea, since you may pick up modules in unexpected
places. However, I think that his alternative will lead to incredible
amounts of __.__ stuff in code, not really increasing the legibility.

Second (or is this third by now?) I don't think that the hierarchy
you're proposing is the right one. There is clearly a hierarchy of
packages, but I think that it is much more a dynamic hierarchy than a
static one.

Next, as soon as the stuff is imported I see little more use for the
extra qualifiers. I really don't want to go around saying
UI.Tk.Tkinter.Button, and Ken's semantics seem to lead to a style
where that would be the normal case. (I may have a strange style here:
I *do* like one level of qualification. You won't see many 'from xxx
import *' in my code).

I think that my preference would go to a system whereby the importer
specifies from which environments the importee can import
modules. Think of it as a scoped sys.path: a module can add stuff
to sys.path, but that will disappear again as soon as we leave the
module.

I haven't yet thought much about the details, but they're probably
pretty gory. You'd need a per-module copy of the path, say __path__,
but I'm not sure that that is enough.

Would it be acceptable to have to code something like
        __path__ = addpack('Xextension')
        import X11
        import Xm

--
Jack Jansen        | If I can't dance I don't want to be part of

uunet!cwi.nl!jack    G=Jack;S=Jansen;O=cwi;PRMD=surf;ADMD=400net;C=nl



Wed, 24 Dec 1997 03:00:00 GMT  
 Module Packages - package-search-strategy controversy

Quote:

> I guess I'd find it easier to think about how the more complex cases
> should work if I understood how and why they would occur.  What kind
> of packages would be nested?  What's the advantage, compared to independent
> non-nested packages?  Is it possible that the primary reason to nest
> packages would be to get exactly this transparent name space overlay
> effect?

The purpose of packages, and how they might be used, are good
questions, and ones i should have expressed at the outset.  (Then
again, my initial message was long enough that it probably already
discouraged some readers!  Oh well.  Here goes.)

Probably the foremost purpose for packages is to enable people to
gather associated module components of a unitary system into a unitary
collection.  Examples might include a numeric-methods package, an
editor, a web browser or server, and so on.  Each of these systems
would probably consist of a suite of distinct modules which use
each other, as well as modules from the standard library and, perhaps,
from other packages.  In general, packages are intended to enable
organization of associated suites of modules into a coherent, unitary
arrangement.

In a more technical vein, python modules currently inhabit a flat
namespace, both in the filesystem and on sys.modules.  This is a real
problem, because you can't get at both of two different modules that
happen to have the same name.  In general, having a flat namespace for
modules is like having a flat namespace for all variable names - ie,
just as functions should not have to worry about naming their local
variables uniquely from all other local and global variables, so
system developers should not worry about giving a *globally* unique
name to every module in their system!  (Modules may be less common
things than local variables, but python is an extensible language,
with extensions commonly expressed in modules and suites of modules.)

The example i've used before is having a 'mailbox' module associated
with both email handling and shared-memory handling.  If both
'mailbox' modules are on sys.path, only one can be reached, and only
one at a time can be loaded into python, because they will each take
over the module name 'mailbox'.  However, if one is situated within a
'shmem' package and one is within a 'mail' package (possibly itself
contained within an 'inet' package), then there will be no conflicts
between 'shmem.mailbox' and 'inet.mail.mailbox'.
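
In other words, both can be loaded side by side (proposal notation,
using the package names from the example):

        import shmem.mailbox        # the shared-memory mailbox
        import inet.mail.mailbox    # the email mailbox
        # sys.modules holds 'shmem.mailbox' and 'inet.mail.mailbox' as
        # distinct entries - neither takes over a bare 'mailbox' slot.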

(There are many, more general examples.  Many systems include suites
of routines that are shared across the system.  These routines can
typically be bundled into a 'misc' or 'utils' module, common names
that would collide across packages.  Or a system may want to have its
own overload of the standard 'os' module, loaded under the same name,
and importing the standard 'os' module somewhere inside.  Whatever the
reason, a system should only be burdened with arriving at *locally*
unique names for its modules - like a function should only be burdened
with arriving at locally unique names, not globally unique ones
w.r.t. all current and future modules in the library!)

Finally, packages offer a means for organizing modules in a rational,
findable way.  One example that already exists is in the file-path-
oriented routines pseudo-"packaged" in 'os.path'.  I envision
organizing the standard library into some major categories.  Here's a
beginning of an organization i put together one day, which helped
provoke the proposal for newimp:

sys package:
  - current contents of sys - sys.path, sys.modules, sys.ps1, ...
  - .compile subpackage:
    - compile, dis, compileall, importall (ie, eg: 'sys.compile.dis')
  - .debug subpackage:
    - bdb/pdb/wdb, profile, pstats, traceback ('from sys.debug import pdb')
  - .data subpackage:
    - pickle, shelve, dump, copy, array, struct, marshal, matrix
gui:
  - ...
os:
  - regular contents of os
  - path, stat, mutex, pipes, subproc subpackages
inet:
 - .web
   - httplib, urllib, gopher, cgilib
 - .mail
   - mimetools, rfc822, mailbox, mhlib, packmail
 - ftplib, rpc, telnet, pty, server, dnslib
string
 - string, regex, regsub
 - .cryptography
   - md5, des, rsa4, rsa5, .........
db
 - dbm, sql, msql, oracle,
apps
  - .dancer (web browser)
  - rcs
  - cvs
  - etc, etc, etc

Not only will it be easy to remember to go to the inet.web package
when looking for the httplib module, but people who don't already know
what's available for dealing with the web can go and browse the
contents of inet.web.  (I realize that there is never a perfect way of
categorically sorting substantial sets of things in the real world,
but i also know that a reasonable degree of categorization is both
possible and can go a long way toward making things more manageable.)

(Incidentally, i know that substantial lisps have had packages for a
while, and gather that perl has a package arrangement.  I'd be
interested to hear about the organization of it, and whether there are
any lessons we can take in the organization of python's standard
library packages.)

To get back to donn's question:

Quote:
> non-nested packages?  Is it possible that the primary reason to nest
> packages would be to get exactly this transparent name space overlay
> effect?

I'm not certain i understand what you mean by that, but i don't think
it's where i envisioned all this going.  I don't think the
inheritance paradigm that you mention in other parts of your message
holds very well, except perhaps in cases where you have a module sort
of "overloading" a standard module, eg the package-specific version of
the 'os' module i allude to above.  As you can probably tell from my
explanation, this is not what i see as the primary use for packages.

Though an analogy with the division of python namespaces into local
and global partitions is a little closer, it still doesn't equate
well.  The package hierarchy is unitary, recursively nested, and
shared across all modules, while, in python, each module has its own
local and global namespaces, and they're only two deep.  I think
packages are a different kind of beast, one that needs to be
considered according to its own particular utility.

Quote:
> For the simple case where only one package is involved and there's no
> need to borrow modules not written for that package, either alternative
> sounds fine.  If I put together a package of modules, and then forget
> they're there and accidentally import one when I wanted a sys.path module
> by the same name, then I should probably adjust my coffee intake or
> something.  Because evidently the name I used is known to me as part
> of Python, so the name collision with my own module must have been
> deliberate.

What if someone else put together the package, and you're reading
their code?  I think it would be helpful and proper for readers to not
have to know the names of all the modules in the package, and in
containing packages, in order to tell whether imports come from inside
or outside package containers.  With the explicit-relative scheme, you
can tell from the code which modules they intend to get from their
package and which modules they don't.  With the expanding-search, you
need to know the other contents of the package, and its containers.

I believe this kind of redundancy is crucial to the programmer, as
well.  You're right, the programmer does have to know which module
they mean to be getting.  Well, why not have them explicitly express
that in the code?!  That way, if the package structure is changed,
the encoded dependency will express the structural dependency, and
raise an error if the dependency was broken.  This is a good thing!

BTW, i do **not** intend for every one of my responses on this issue
to be this long!-)  This one is because i took the opportunity, in the
guise of answering a direct question, to present my view of the
potential employment of packages in python.  I do think we should
defer discussion of things like the prospective packaging of the
standard libraries to a separate thread, and maybe delay it until we
get the current issue tackled.  That was a bit of a side trip, but i
did want to give the flavor of it, in order to flesh out thinking
about the two different search schemes being proposed.

ken



Wed, 24 Dec 1997 03:00:00 GMT  
 Module Packages - package-search-strategy controversy

Quote:

> Ok, I'll bite: I don't like either scheme, for a number of
> more-or-less unrelated reasons.

> First, I agree with Ken that the expanding-search is probably not a
> good idea, since you may pick up modules in unexpected places.

.

Quote:
> places. However, I think that his alternative will lead to incredible
> amounts of __.__ stuff in code, not really increasing the legibility.

I don't think that is a problem!  I also like a single level of
qualification, and expect that i would be using:

        from __.__ import swanee, percolator, utils

in order to get the utils module (and others) from the containing
package's container.

Then i would refer to utils.spam(), rather than __.__.utils.spam().
(I agree that the double '__.__' would be grotesque, but i happen to
like the single '__.' qualifier for referring to sibling packages,
which is automatic once the package itself has been loaded.)

If you just want the spam module, and want to avoid forcing import of
the entire __.__ package (which 'from __.__ import whatever' does
cause), you could do:

        import __.__.spam
        spam = __.__.spam

The important thing, in either case, is that you've clearly identified
*which* spam you're getting, and that 'spam' is just a prettier local
handle for it.

Quote:
> Second (or is this third by now?) I don't think that the hierarchy
> you're proposing is the right one. There is clearly a hierarchy of
> packages, but I think that it is much more a dynamic hierarchy than a
> static one.

This i do not understand.   ??

Quote:
> I think that my preference would go to a system whereby the importer
> specifies from which environments the importee can import
> modules. Think of it as a scoped sys.path: a module can add stuff
> to sys.path, but that will disappear again as soon as we leave the
> module.

> I haven't yet thought much about the details, but they're probably
> pretty gory. You'd need a per-module copy of the path, say __path__,
> but I'm not sure that that is enough.

The current newimp implementation, as shipped yesterday, enables you
to do exactly this, among other things.  It does associate a path
with every module (and a bit more), and the load-path for packages can
be adjusted to include more than, or even other than, just the package
directory contents.

In fact, as guido and i recently discussed (i believe in response to
some things you asked him about), i'm going to soon be adding a hook
so that the existence of a directory in a package with a
platform-specific name will cause that directory to automatically be
added to a package's path, so the contents of the platform directory
are effectively projected into the contents of the package's immediate
directory.

Package modules, particularly the __init__ module, can already
explicitly do this kind of path extension.  (This is the kind of thing
for which the __init__ module is intended - to customize the package
loadup, instead of just doing the vanilla "load all the constituent
modules".)

Quote:
> Next, as soon as the stuff is imported I see little more use for the
> extra qualifiers. I really don't want to go around saying
> UI.Tk.Tkinter.Button, and Ken's semantics seem to lead to a style
> where that would be the normal case. (I may have a strange style here:
> I *do* like one level of qualification. You won't see many 'from xxx
> import *' in my code).

        from UI.Tk import Tkinter

        b = Tkinter.Button(...)

ken



Wed, 24 Dec 1997 03:00:00 GMT  
 Module Packages - package-search-strategy controversy
...
|                          The two schemes
|
| Alternative #1: "expanding search"
|
|   On import, by default, the named target modules are first sought
|   within the containing package, then within its containing package,
|   and so on out to the "root" - the sys.path-dictated package.
...
| Alternative #2: Explicit, relative reference - "explicit-relative"
|
|   *Almost* the converse of the expanding search, the default search
|   (when no '__' prefix is specified) is to search only on sys.path.

Good points (that I deleted) about the advantages of requiring explicit
reference for non-standard imports.  Since there wasn't much said about
the reasons for proposing #1, for the sake of discussion let me give it
a guess.

Alternative #1, the expanding search, strikes me as very natural within
Python's class inheritance paradigm.  If I derive one class from another,
I expect the base's member functions to be transparently available
in the derived class.  Granted that the analogy has some flaws, still
I can imagine getting the same kind of benefits.  For example, suppose
that I have developed some support for Mayan hieroglyphs and numbering
system.  This support addresses pervasive issues like strings, so I want
to essentially re-implement as much as I can with this Mayan twist.
With the expanding search, a lot of that's transparent for modules
written in Python - just re-import them as part of the package, and
transparently they'd take the string (and other) modules I've developed
as Mayan substitutes.

For the simple case where only one package is involved and there's no
need to borrow modules not written for that package, either alternative
sounds fine.  If I put together a package of modules, and then forget
they're there and accidentally import one when I wanted a sys.path module
by the same name, then I should probably adjust my coffee intake or
something.  Because evidently the name I used is known to me as part
of Python, so the name collision with my own module must have been
deliberate.

I guess I'd find it easier to think about how the more complex cases
should work if I understood how and why they would occur.  What kind
of packages would be nested?  What's the advantage, compared to independent
non-nested packages?  Is it possible that the primary reason to nest
packages would be to get exactly this transparent name space overlay
effect?

        Donn Cave, University Computing Services, University of Washington



Wed, 24 Dec 1997 03:00:00 GMT  
 Module Packages - package-search-strategy controversy
[Whoops - my previous reply had an early line consisting only of a
'.', which apparently signaled an end-of-message to some mail agent in
the process.  Here's a resend.]

Quote:

> Ok, I'll bite: I don't like either scheme, for a number of
> more-or-less unrelated reasons.

> First, I agree with Ken that the expanding-search is probably not a
> good idea, since you may pick up modules in unexpected places.

Ok!

Quote:
> places. However, I think that his alternative will lead to incredible
> amounts of __.__ stuff in code, not really increasing the legibility.

I think this is not the problem you think it is.

I also like a single level of qualification, and expect that i would
be using:

        from __.__ import utils
        x = utils.spam()

in order to get, eg, the utils module from the containing package's
container.  Then i would invoke utils.spam(), rather than
__.__.utils.spam().  (I agree that the double '__.__' would be
grotesque.  However, i happen to like the prospect of the single '__.'
qualifier for referring to sibling packages.)

If you just want the utils module, without precipitating import of the
entire __.__ package (which 'from __.__ import whatever' does cause),
you could do:

        import __.__.utils
        utils = __.__.utils
        x = utils.spam()

The important thing, in either case, is that you've clearly identified
*which* utils you're getting, and that you're introducing a prettified
'utils' handle for it.

Quote:
> Second (or is this third by now?) I don't think that the hierarchy
> you're proposing is the right one. There is clearly a hierarchy of
> packages, but I think that it is much more a dynamic hierarchy than a
> static one.

This i do not understand.   ??

Quote:
> I think that my preference would go to a system whereby the importer
> specifies from which environments the importee can import
> modules. Think of it as a scoped sys.path: a module can add stuff
> to sys.path, but that will disappear again as soon as we leave the
> module.

I don't understand the first sentence.  From the second, i gather
you're suggesting that a package designate a load-path of directories
which are considered containers of the package constituents.  If
that's correct, then newimp provides *almost* exactly what you're
saying, except that newimp provides a default path setting, which
consists of the package directory itself.  These package load-paths
can be adjusted to include more than, or even other than, just the
package directory contents.

(In fact, i will soon work on implementing a refinement, suggested by
guido, and, i believe, inspired by you, where a platform-specific
directory found in the package directory is automatically included on
the package path, so the platform-specific directory contents are
effectively included among contents of the package's immediate
directory.  I had already designed in the package load-path; fitting
in the provisions for a platform-specific dir is trivial.)

Quote:
> I haven't yet thought much about the details, but they're probably
> pretty gory. You'd need a per-module copy of the path, say __path__,
> but I'm not sure that that is enough.

The current newimp implementation in fact does associate a path with
every module, and a bit more.

Quote:
> Next, as soon as the stuff is imported I see little more use for the
> extra qualifiers. I really don't want to go around saying
> UI.Tk.Tkinter.Button, and Ken's semantics seem to lead to a style
> where that would be the normal case. (I may have a strange style here:

        from UI.Tk import Tkinter
        ...
        b = Tkinter.Button(...)

ken



Wed, 24 Dec 1997 03:00:00 GMT  
 Module Packages - package-search-strategy controversy
| The purpose of packages, and how they might be used, are good
| questions, and ones i should have expressed at the outset.
...

I think your example package hierarchy is the answer to my question.
Every last module fits into this hierarchy somewhere - it's a taxonomy.
In a taxonomy, packages grouped together may be functionally related,
but they're likely not functionally inter-related.  Their relationship
is the concern of the taxonomist, not the author.

I hadn't thought about that, I guess I was imagining more of an ecology
of packages.  In an ecology, the packages would group together because
they're inter-related.  The same module or package could reappear in
any package where it plays a functional role, or it could remain outside
any package.  These relationships would be created at the convenience of
the author.

Which explains the confusion.  The expanding search makes perfect sense
in a package ecology, no sense at all in a package taxonomy.

--- the rest of this post is Ken Manheimer's example hierarchy ---
| Finally, packages offer a means for organizing modules in a rational,
| findable way.  One example that already exists is in the file-path-
| oriented routines pseudo-"packaged" in 'os.path'.  I envision
| organizing the standard library into some major categories.  Here's a
| beginning of an organization i put together one day, which helped
| provoke the proposal for newimp:
|
| sys package:
|   - current contents of sys - sys.path, sys.modules, sys.ps1, ...
|   - .compile subpackage:
|     - compile, dis, compileall, importall (ie, eg: 'sys.compile.dis')
|   - .debug subpackage:
|     - bdb/pdb/wdb, profile, pstats, traceback ('from sys.debug import pdb')
|   - .data subpackage:
|     - pickle, shelve, dump, copy, array, struct, marshal, matrix
| gui:
|   - ...
| os:
|   - regular contents of os
|   - path, stat, mutex, pipes, subproc subpackages
| inet:
|  - .web
|    - httplib, urllib, gopher, cgilib
|  - .mail
|    - mimetools, rfc822, mailbox, mhlib, packmail
|  - ftplib, rpc, telnet, pty, server, dnslib
| string
|  - string, regex, regsub
|  - .cryptography
|    - md5, des, rsa4, rsa5, .........
| db
|  - dbm, sql, msql, oracle,
| apps
|   - .dancer (web browser)
|   - rcs
|   - cvs
|   - etc, etc, etc
-------------------
        Donn Cave, University Computing Services, University of Washington



Thu, 25 Dec 1997 03:00:00 GMT  
 Module Packages - package-search-strategy controversy

Quote:

[...]
> I think your example package hierarchy is the answer to my question.
> Every last module fits into this hierarchy somewhere - it's a taxonomy.
> In a taxonomy, packages grouped together may be functionally related,
> but they're likely not functionally inter-related.  Their relationship
> is the concern of the taxonomist, not the author.

> I hadn't thought about that, I guess I was imagining more of an ecology
> of packages.  In an ecology, the packages would group together because
> they're inter-related.  The same module or package could reappear in
> any package where it plays a functional role, or it could remain outside
> any package.  These relationships would be created at the convenience of
> the author.

> Which explains the confusion.  The expanding search makes perfect sense
> in a package ecology, no sense at all in a package taxonomy.

That's an excellent clarifying thought.  It seems to explain why Ken
favors the explicit-relative while I prefer expanding search.  In our
private discussions about this it has become clear that Ken is
concerned about the reader ("module Foo?  where's that defined?") --
while I'm more concerned about the writer, who knows perfectly well
that there's a module Foo in his package, and wants to import it with
the least fuss (i.e. "import Foo").


URL: <http://www.python.org/~guido/>



Thu, 25 Dec 1997 03:00:00 GMT  
 Module Packages - package-search-strategy controversy

Quote:


> | The purpose of packages, and how they might be used, are good
> | questions, and ones i should have expressed at the outset.
> ...

> I think your example package hierarchy is the answer to my question.
> Every last module fits into this hierarchy somewhere - it's a taxonomy.
> In a taxonomy, packages grouped together may be functionally related,
> but they're likely not functionally inter-related.  Their relationship
> is the concern of the taxonomist, not the author.

This perspective is *real* intriguing to me.

I'm kind of fascinated by taxonomies - one of my favorite reference
books is Roget's "classic standard" thesaurus, both because i'm bowled
over by the concept of taxonomically cataloguing a language, and
because i find it incredibly useful for tracking down words that i
only marginally have placed, or am able to locate, in my vocabulary.
I suppose the benefits i'm expecting from a taxonomic organization of
python libraries are related.

Quote:
> I hadn't thought about that, I guess I was imagining more of an ecology
> of packages.  In an ecology, the packages would group together because
> they're inter-related.  The same module or package could reappear in
> any package where it plays a functional role, or it could remain outside
> any package.  These relationships would be created at the convenience of
> the author.

I'm not sure i can completely envision what you're describing, but i
think it would be difficult, using what you're calling an "ecological"
organization, to tie things together in a coherent, extensible, and
*static* way.

Such an organization principle may make a lot of sense in the context
of a dynamic tool, however!  I presume it would have to allow a more
elaborate organization than one based on strictly hierarchical
relationships, and it would adjust the organization as relationships
shift with the inclusion of new components.  Hey, this begins to sound
like, eg, a class browser!

However, i'm really interested in providing a basis by which we can
organize the python libraries, and by which people can organize their
own packages.  It needs to be in a fairly compartmented, self-
contained manner, so that the libraries and application packages then
fit together in simple ways.  (Rather like the ideal modularity of
program components, where each routine is either fairly self-contained
- a "black box" - or it interacts with other specific components in
its package according to a well-defined, explicit interface - eg, the
explicit-relative references.)  That's where i see the explicit-
relative scheme.

Quote:
> Which explains the confusion.  The expanding search makes perfect sense
> in a package ecology, no sense at all in a package taxonomy.

Thanks for the illuminating perspective.  Great stuff to chew on.

ken



Thu, 25 Dec 1997 03:00:00 GMT  
 Module Packages - package-search-strategy controversy

Quote:

>> Which explains the confusion.  The expanding search makes perfect sense
>> in a package ecology, no sense at all in a package taxonomy.
>Thanks for the illuminating perspective.  Great stuff to chew on.

Hmm, you were starting to win me over, but now that it appears decided
that the package hierarchy is a taxonomy it suddenly doesn't make
sense anymore to import everything in x.y upon a 'from x.y import z':
the stuff in x.y isn't related anyway, so why would you want to import
it all?
--
Jack Jansen        | If I can't dance I don't want to be part of

uunet!cwi.nl!jack    G=Jack;S=Jansen;O=cwi;PRMD=surf;ADMD=400net;C=nl


Fri, 26 Dec 1997 03:00:00 GMT  
 Module Packages - package-search-strategy controversy

Quote:


> >> Which explains the confusion.  The expanding search makes perfect sense
> >> in a package ecology, no sense at all in a package taxonomy.

> >Thanks for the illuminating perspective.  Great stuff to chew on.

> Hmm, you were starting to win me over, but now that it appears decided
> that the package hierarchy is a taxonomy it suddenly doesn't make
> sense anymore to import everything in x.y upon a 'from x.y import z':
> the stuff in x.y isn't related anyway, so why would you want to import
> it all?

Because not all packages in the hierarchy are taxonomic categories:
some packages are applications in their own right, with their own
internal structure.  Consider the leaves of the taxonomy hierarchy as
being applications, which are not necessarily modules - some will be
packages.  (This ability to package large applications into integral
units is the basic purpose of newimp.  The opportunity to organize the
distribution libraries in categories - a taxonomy - is an additional
benefit.)

You would typically import the applications as integral units, eg,
'import symbolic' (for a mythical symbolic-math package).  But there
sometimes will be pieces that will be useful to other packages, as
standalones.  Eg, 'from symbolic import reduce'.  More importantly,
the internal modules of a package *will* be getting at the other
internal pieces, individually: 'from __ import reduce'.

I hope this helps convince you that there will be the need to import
package components of the hierarchy both as integral units and
piecemeal.  For that matter, i hope that was the question being
raised!  Lemme know if i'm on the wrong issue here...

Pardon me if i'm all over the place, here, but i have a sort of
interesting side note.  Some packages in the hierarchy can be both
taxonomic categories *and* applications in their own right.  There's an
example in the prototype library hierarchy that i began to sketch out in
my referenced posting:

Quote:
> sys package:
>   - current contents of sys - sys.path, sys.modules, sys.ps1, ...
>   - .compile subpackage:
>     - compile, dis, compileall, importall (ie, eg: 'sys.compile.dis')
>   - .debug subpackage:
>     - bdb/pdb/wdb, profile, pstats, traceback
>   - .data subpackage:
>     - pickle, shelve, dump, copy, array, struct, marshal, matrix

The 'sys' package includes components that would be loaded on
initialization of the package - basically, the current sys variables,
eg sys.path, sys.ps1, etc - and *also* subpackages, which would not be
loaded until needed - eg sys.compile, sys.debug, etc.

(In the case of the sys module, the actual loadup would probably be
done as part of the python build, but the ability to achieve this mix
of initial and deferred loads is part of newimp.)

The way this sort of thing is done is by having a master
__init__ module for the package, eg 'sys/__init__.py'.  This module
would take care of populating the __ package with the desired initial
components, importing component modules and/or packages, and/or doing
specific assignments - path, ps1, etc.  The __init__ would ignore
component modules and packages which are not meant to be part of the
standard package initialization.  Later, when one of those excluded
parts is desired, you would do an 'import sys.compile.dis' (or, more
likely, 'from sys.compile import dis', if you want a more convenient
handle for it).
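
A rough sketch of such a master __init__, in the proposed notation
(the component names are invented; the exact conventions may differ):

        # hypothetical sys/__init__.py
        # populate the package with its always-loaded components...
        from __ import path, modules, ps1
        # ...and deliberately skip __.compile, __.debug and __.data;
        # those subpackages stay unloaded until a client asks, eg with
        #     from sys.compile import dis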

(Note that musing this over led me to realize that i had to revise
newimp slightly to implement that last 'from' mechanism - it doesn't
work right in the version i recently shipped.  Works nicely in the new
version, though.  If anyone is interested in playing with this
particular aspect, let me know, and i'll send you the patch.)

ken



Fri, 26 Dec 1997 03:00:00 GMT  
 Module Packages - package-search-strategy controversy

|> Hmm, you were starting to win me over, but now that it appears decided
|> that the package hierarchy is a taxonomy it suddenly doesn't make
|> sense anymore to import everything in x.y upon a 'from x.y import z':
|> the stuff in x.y isn't related anyway, so why would you want to import
|> it all?
|
| Because not all packages in the hierarchy are taxonomic categories:
| some packages are applications in their own right, with their own
| internal structure.  Consider the leaves of the taxonomy hierarchy as
| being applications, which are not necessarily modules - some will be
| packages.  (This ability to package large applications into integral
| units is the basic purpose of newimp.  The opportunity to organize the
| distribution libraries in categories - a taxonomy - is an additional
| benefit.)

How does this fit in with the sys.path part of either search strategy?
I assume all the existing code that uses ``import regex'' will continue
to work, whether or not someone has officially assigned regex to the
``string package'', and ``import struct'' will still work even if struct
is in sys.  If that's true, the distribution will have a mostly flat
namespace, for practical purposes.

For the sake of discussion let me propose that the existing flat name space
is a good thing.  It doesn't allow name collisions, and that's also a good
thing (within the distribution, at any rate).  It's easy to use - I don't
have to remember where anything is, I don't have to track its movements as
it gets reassigned to another family when every third version of Python
introduces a new, rationalized organization.

In C, most common include files are at the top level, and the ones that
aren't are often annoying - viz. <file.h> (or is it <sys/file.h>?);
<time.h> (now, again, how is that different from <sys/time.h>?)
Maybe it's unfair to drag out the historical mistakes of C's include file
structure, but then it's probably not fair to expect the person who
reorganizes Python's module structure to be omniscient, either (unless
it's Guido himself, I guess).

Now, packages are a vital amendment to the flat space, because independent
parties working with Python do need their own name space.  If some of the
distribution modules need to be re-written, and it's convenient to use the
package system in the process, that's fine too.  I'm proposing simply that
the existing distribution modules won't be improved much by imposing a whole
hierarchy of name spaces.

If that all makes sense, then it seems to me we're back to the point where
each package is designed to meet some specific need, probably designed by
the author of the package contents.  Now, the stuff in x.y IS related, and
the expanding search starts to make sense again.  (Not that the issue is
resolved so easily, but at least one can start to see why it might have
been proposed in the first place.)

        Donn Cave, University Computing Services, University of Washington



Sat, 27 Dec 1997 03:00:00 GMT  
 Module Packages - package-search-strategy controversy

Quote:

> For the sake of discussion let me propose that the existing flat name space
> is a good thing.  It doesn't allow name collisions, and that's also a good
> thing (within the distribution, at any rate).  It's easy to use - I don't
> have to remember where anything is, I don't have to track its movements as
> it gets reassigned to another family when every third version of Python
> introduces a new, rationalized organization.

Uh oh.  It looks like we're nearly back to square one, then, in our
shared view of what the new import mechanism is to do.

The flat namespace may be better than a senselessly elaborate one (eg,
the C system header files), but that doesn't mean the flat name space
is good!  Your objections are against badly executed, overly elaborate
organizations, not against all organizations.

Quote:
> thing (within the distribution, at any rate).  It's easy to use - I don't
> have to remember where anything is, I don't have to track its movements as

I have to point out that you already *do* have to remember where
things are, even within the flat module namespace, because module
variables and classes, and such, add other levels.  Eg, you know that
you will find the module load path *within* the sys module, as
sys.path.  This is obscure to the uninitiated, but becomes clearer
once you realize that the 'sys' module contains the *python* system
stuff.

I'm suggesting, eg, building on that model, making sys a package that
has those things, and also "runtime" oriented modules like the
debugger, compiler, structure handling stuff, etc.  Then, knowing
about 'sys' means knowing where to find all that kind of stuff, and
knowing where to look when you're seeking it.

Quote:
> Now, packages is a vital amendment to the flat space, because independent
> parties working with Python do need their own name space.  If some of the
> distribution modules need to be re-written, and it's convenient to use the
> package system in the process, that's fine too.  I'm proposing simply that
> the existing distribution modules won't be improved much by imposing a whole
> hierarchy of name spaces.

I don't quite understand what you mean by "imposing a whole hierarchy
of name spaces" here, but i do agree that we don't want to get overly
elaborate.  However, you either have something Really rudimentary,
like just a public and a private namespace (a la the python global and
local namespaces for variables), or something like the package
mechanism, with arbitrarily nestable modules.

I say the two-level name space is untenable, because python is,
perhaps foremost, an extension language.  We want to provide for
people mixing and matching their own and other people's extensions,
and progressive incorporation of extensions to the distribution where
warranted.  At what point do we want to arbitrarily make it harder for
people to integrate their wonderful applications with the
distribution, itself?

And, granting nestable module packages, i don't know *where* you can
draw the line on the elaboration of packages people may need to build.

I believe we want to provide for unlimited growth of the application
libraries, and even of the core facilities.  If we do not provide a
sufficient framework, now, as the number of independent applications
increases, and the scale of some applications, themselves, increases,
we will face the same problem all over again, of insufficiently
compartmentable namespaces.  I think it would be foolhardy not to take
care of that now, in a thorough, but moderate, way!

I do agree that we want to avoid using such an elaborate central
organization that we keep needing to adjust it out from under
everyone.  I do not think it would be hard to come up with something
that adds some sense with just a little structure.

I thought that's what you had seen in the example hierarchy that i had
posted, but i suppose that you were looking at it more in abstract,
rather than specific, terms.  It may even be flawed, but it reflects
a first draft.  I think with a bit of attention we can come up with a
structure sufficiently simple that it encompasses everything without
getting in the way.

Quote:
> How does this fit in with the sys.path part of either search strategy?
> I assume all the existing code that uses ``import regex'' will continue
> to work, whether or not someone has officially assigned regex to the
> ``string package'', and ``import struct'' will still work even if struct
> is in sys.  If that's true, the distribution will have a mostly flat
> namespace, for practical purposes.

I don't agree with it, so i don't agree that the distribution (module)
namespace will remain mostly flat.  We can provide some easy
backwards compatibility measures, to enable a smooth migration.  But,
as with the "grand renaming", i think it would be in our interest to
migrate to a moderately more organized arrangement, for the sake of
scaling and sensibility.

Quote:
> If that all makes sense, then it seems to me we're back to the point where
> each package is designed to meet some specific need, probably designed by
> the author of the package contents.  Now, the stuff in x.y IS related, and
> the expanding search starts to make sense again.  (Not that the issue is
> resolved so easily, but at least one can start to see why it might have
> been proposed in the first place.)

I don't think that does make sense, obviously, but at any rate i don't
see how expanding search follows from it!?  I thought jack was asking
whether there was any point, given the purely taxonomic scheme we
seemed to be discussing, to enabling imports of a package as a whole.
I was saying that some packages will be integral applications, with
internal module structure, that often would be imported as a unit, but
sometimes would have individual constituents imported.  

Anyway, i say that there is merit to enabling structure within
packages, and the *components* of such packages *will* need to be able
to import individual other components.  Packages are warranted, and it
will be worthwhile to modestly structure the python libraries.

ken



Sat, 27 Dec 1997 03:00:00 GMT  
 Module Packages - package-search-strategy controversy

|> For the sake of discussion let me propose that the existing flat name space
|> is a good thing.  It doesn't allow name collisions, and that's also a good
|> thing (within the distribution, at any rate).  It's easy to use - I don't
|> have to remember where anything is, I don't have to track its movements as
|> it gets reassigned to another family when every third version of Python
|> introduces a new, rationalized organization.
|
| Uh oh.  It looks like we're nearly back to square one, then, in our
| shared view of what the new import mechanism is to do.

Well, it seems like it's worth discussing.  Some of this seems new to
me - not the design of import, but the notion of re-organizing the
existing distribution.  If it's actually been hashed out at tedious
length, I hope someone will clue me and I will lay off.  It sounds
like this eventually may mean having to rewrite most existing programs
written in Python, so there are naturally some questions.

| The flat namespace may be better than a senselessly elaborate one (eg,
| the C system header files), but that doesn't mean the flat name space
| is good!  Your objections are against badly executed, overly elaborate
| organizations, not to all organizations.

I proposed (for the sake of discussion) that the flat name space is good,
and I do mean compared to any hierarchical organization.  I don't find
C's /usr/include elaborate, in fact I think it's fairly good.  The main
point is, the parts everyone uses are right at the top, so you could go
for a while thinking it's flat.  That's why it's good, it's almost flat,
and that's a better way to describe what I mean - "almost flat" - since
I think everyone agrees that perfectly flat won't work.

If Python's first release were being written today, I'd still be ready
to write about the benefit of an almost flat name space, but it would
be just one alternative.  At present, it also happens to be the alternative
for which all Python programs are written.  I think it contributes to
the phenomenal ease of use that we enjoy with Python without detracting
a bit from its rigor.

...
| I believe we want to provide for unlimited growth of the application
| libraries, and even of the core facilities.  If we do not provide a
| sufficient framework, now, as the number of independent applications
| increases, and the scale of some applications, themselves, increase,
| we will face the same problem all over again, of insufficiently
| compartmentable namespaces.  I think it would be foolhardy not to take
| care of that now, in a thorough, but moderate, way!

I agree.  Let the package mechanisms support every conceivable feature,
and let there be extensive use of these features in the development of
packages.  That isn't the issue, unless the existing distribution needs
to be re-organized to accommodate this.  I'm thinking it doesn't, that
it should be possible to continue to support "import regex" while adding
structurally complex packages right and left.

What's more, the comprehensive organization may actually get in the way
of the package support.  We're talking about all this because there was
some divergence of opinion on how packages ought to be implemented, and
I think we're seeing this has a lot to do with what role packages play
in the system.

If we say a package is a special purpose tool that's going to be used
as part of a design that probably also includes the modules within it,
then it makes sense to *support* a strongly inter-reliant package structure.

If we say that packages are going to be applied universally, then we
begin to doubt that packages necessarily will have this characteristic
inter-reliance.  Some will, but others will be receptacles of convenience,
housing modules that never heard of each other but are conceptually
related.  The same package features that might serve a strongly
inter-reliant package well may be unwholesome when applied to packages
constructed in this latter way.

That's why without the comprehensive organization, I think you could
start to see why someone would propose the "expanding search".  It might
be a powerful feature, but only if packages are ordinarily inter-reliant,
designed entities; if that assumption won't be generally true it becomes
a liability.

        Donn Cave, University Computing Services, University of Washington



Sun, 28 Dec 1997 03:00:00 GMT  
 Module Packages - package-search-strategy controversy
I am concerned that the motivation for having a nested namespace for
python modules is not clear.  I think there are important reasons for
having such a thing, and those reasons also underlie the motivation
for the explicit-relative, as opposed to expanding, module search.

One question was, why not just partition the module name space into two
parts, one for the python distribution libraries, and one for the users?

I think this would neglect a fundamental reality of python - it is an
extension language, where much of the power is in using other people's
extensions, *together* with your own, *together* with the distribution
libraries.  The partitioning cannot be just twofold; it must be manyfold.

And what happens when we want to combine the partitioned entities, eg
incorporate useful extensions to the python distribution with the
distribution itself?  Do we want the added burden of sanitizing the names
in the extensions, since they come from an "other" partition?  Or are the
distribution libraries supposed to stay the way they are now, forever??
Not likely!  Ideally, they would grow without requiring reorganization of
the existing pieces.

The hierarchical model i have in mind for the python module
framework is simple and straightforward, thorough, and, most importantly,
scalable into the future, with less need, rather than more, for
reorganization as the libraries grow.

It involves elaborating the namespace in a hierarchy, not for the sake
of implementing intricate and baroque relationships, but to leave room
for growth that easily avoids the pitfalls of name collisions.  A very
simple hierarchy - probably just two or three levels, at least
initially - would provide that.

Such a hierarchy makes it simple to recognize which module is which,
according to its name wrt the package containment.  Anyone really
familiar with python already depends on this principle to distinguish,
eg, the sys.path list versus the os.path module.

(I realize these are different types of objects, but the principle, of
identifying them according to their containing module, is exactly what
i'm referring to.  This principle will work well for making the python
libraries, and individual user packages, more understandable, and
making their relationships more manageable.)
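
The distinction is already live in everyday code:

        import sys, os
        sys.path               # a list found *on* the sys module
        os.path.join('a', 'b') # a module found *inside* the os module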

As with retrofitting any scaling provisions, this will mean migrating
existing import statements to a new naming scheme.  However, we should
be able to make this an easy transition, because we can very easily
implement some transient backwards compatibility measures, during
which we could gradually phase out the old names.

One more point about this model.  I think the expanding search is the
wrong way to implement the module search mechanism because it tries to
hide the hierarchy - it presents an impression of a flat name space
that is "tilted" to prefer the contents of the containing packages.
Such a scheme might be suitable for interactive shell command paths,
where you're not *supposed* to know whence the commands are coming.
The job of programming, however, entails identifying specifically
which module you mean to be importing.

Some seem to like expanding search because, they figure, it will
prevent the variable-name clutter (the '__' qualifier prefix for
package-relative references) inherent in explicit-relative imports.
I have repeatedly tried to make it clear that the explicit-relative
mechanism gives the programmer complete discretion, using eg 'from
... import', about whether or not their variable names have the '__'
qualifier.  In explicit-relative "search", the variable names can
include or omit that information, but the import statements
themselves always convey the hierarchical model, clearly identifying
the modules being imported.
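
Concretely, in the proposed notation, either spelling leaves the
import statement itself unambiguous about the source package; only
the style of the later local references differs:

        import __.utils        # later references: __.utils.spam()
        from __ import utils   # later references: utils.spam()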

Just in case anyone is still interested.

ken



Sun, 28 Dec 1997 03:00:00 GMT  
 