Reinventing Python

In late 1998, a discussion on the types-sig was looking at, among other things, getting some facilities (that could fit naturally into python) which would make for `customisable' class behaviour. This involved modified class objects which should, naturally, be instances of a so-called `meta-class' - that is, a class whose instances are, themselves, classes.

Rather than adding more variant special-case hard-wired stuff in the python engine to provide meta-class facilities, let us bring more of what is below the bonnet of the engine out into the domain of python: found the existing model on a Spartan sufficiency; implement the existing class and instance `types' by building implementations (as if in python on this small foundation) of their behaviour; build (actually in python on this small foundation) kindred tools which do the kinds of things the types-sig is thinking about; then use the flexibility of python as a prototyping tool to let us play with these until we understand what to keep. We can then, of course, replace mission-critical parts of the result with C coded modules; build something bigger than the minimalist toy used in studying the matter, if this gives us some efficiency wins, possibly in conjunction with a deliberate sacrifice of unnecessary spare rope with which to hang oneself.

Towards a smaller foundation

So what must this Spartan toy support ? It must, inevitably, support some kind of function which it `simply calls' - however far we may chase down a chain of __call__ attributes, we must have some, somewhere, that we can `simply call'. We'll need to be able to package some of the existing python engine's (other) types - e.g. int, long, float, string, tuple, list, dictionary - in some form, while replacing others (initially at least classes and instances) with toy implementations. The more types we can faithfully migrate into the latter form, the more flexibility we'll be able to play with. (I'd aim to put the existing pythonic tuples, dictionaries and functions among the former. They are perfect already. ;^)

Namespaces

The types I want the toy to implement are namespaces, and I have at least half an eye on how to unify them with the two other namespace types (modules and packages). Although the python engine has different ways of performing namespace lookup, depending on the type of namespace involved, they can all be modelled in terms of functions which I'll describe as `lookups' - a lookup expects one input (a name), is apt to raise AttributeError and describes the value it returns as the `attribute' it associates to that name. With this view, a namespace is simply a packaging of a lookup: getattr(obj, key) unpacks the lookup packed up as obj, passes this lookup the given key and returns the result. Python provides ample elbow-room for implementing lookups which do everything we need (hidden arguments are very useful).
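A minimal sketch of this view, written in present-day python rather than taken from bonnet.py: the names Namespace and dict_lookup are invented for illustration, and the default argument _table plays the part of a `hidden argument'.

```python
class Namespace:
    """Hypothetical packaging of a lookup as an object (illustration only)."""

    def __init__(self, lookup):
        self._lookup = lookup

    def __getattr__(self, key):
        # Only consulted when ordinary attribute lookup fails, so
        # getattr(obj, key) ends up passing key to the packed lookup.
        return self._lookup(key)


def dict_lookup(table):
    """Build a lookup from a dictionary, carried as a hidden argument."""
    def lookup(key, _table=table):
        try:
            return _table[key]
        except KeyError:
            raise AttributeError(key)
    return lookup


# A hierarchical namespace: units.SI is itself a namespace.
SI = Namespace(dict_lookup({'length': 'metre', 'mass': 'kilogram'}))
units = Namespace(dict_lookup({'SI': SI}))
print(units.SI.length)
```

Asking for a name the lookup does not know, such as units.SI.time, raises AttributeError, as a lookup is apt to do.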

If the toy is to give elbow-room for experimentation, we also need a syntax by which python code can bring new namespace objects into being. I want to allow a module (for instance) to have a hierarchical namespace (so a module units may have within it a namespace units.SI whose attributes are (say) the S.I. units for quantities of each `dimension'). This gives rise to the most primitive of object notions, the sub-namespace of a namespace (originally a module). Simply allow it to partake in the standard protocols of the language (e.g., and most importantly, attribute lookup): this primitive notion would suffice to construct sophisticated callables which could be used as if they were, for example, classes, complete with more-or-less arbitrary protocols for their behaviours.

However, by this stage I notice I have two supposedly primitive but very similar constructions: the new sub-namespace (which I'll call object) and the old and familiar class, which has always been available directly. Each can be characterised by the way its syntax implies the generation of a suitably-initialised object, its insertion into some namespace, and the subsequent execution of a suite of code, typically in the namespace just created, or at least with access (if only via globals) to it.

So I end up deciding that object creation should be a packaging of - surprise ! - a function. This one (which I'll call a `builder', at least for now; until 2002 I called it a `generator', but I gather some other traditions are using that name for something else ...) takes an argument list (the tuple that at least a class thinks of as __bases__) and returns a pair - that is, a `twople' or 2-tuple - of objects: a `built object' to be bound up in a namespace, as dictated by the syntax of the `build statement' causing the function to be invoked; and a `suite wrapper', in the namespace of which the suite of the build statement is executed (after the built object has been stored in the ambient namespace).

The built object and suite wrapper may coincide, as in a plain object; even when they differ, attribute access in the namespace of the suite wrapper will typically correspond to corresponding access to the built object, though the wrapper may support attribute modification on the built object even when the latter does not, or via some very different process. The existing behaviour of classes and their instances, along with the new `objects', can then be supported by suitable builders (which could, of course, be implemented under the bonnet, so long as this is done `as if' in python) given the natural raw object-creator implied by the packaging of lookups as namespaces.

At this point, class ceases being a reserved word and becomes, instead, a built-in builder. This does provide elbow-room for us to vary the recipe used to build new objects with subtly altered behaviour: at the same time, it provides for implementation of what we're used to (or, for a more polished approach, see the Commentary following).

In the interests of allowing the python engine to handle magic attributes such as __file__ and __name__, a builder should also accept name=value arguments to be understood as stuff that the python engine wants to put in the namespace of the newly built object. The builder is not obliged to pay any attention to these, but it should accept them - the same goes for its `positional' arguments read as the new object's bases: every builder should be callable in any way allowed to a function defined using

def builder(*args, **what): return None, None
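Since the build statement itself is not available in today's python, the order of events can only be imitated. The sketch below assumes a helper build (hypothetical, as are plain_builder and the function standing in for the suite) which calls the builder, binds the built object in the ambient namespace, and only then runs the suite against the suite wrapper.

```python
def plain_builder(*bases, **what):
    """Hypothetical builder whose built object and suite wrapper coincide."""
    class Plain:  # today's class statement, used only as an attribute bag
        pass
    obj = Plain()
    obj.__dict__.update(what)  # accept the engine's name=value offerings
    obj.bases = bases          # positional arguments read as the bases
    return obj, obj


def build(builder, name, bases, suite, ambient, **what):
    """Imitate `builder name (bases): suite', with a function as the suite."""
    built, wrapper = builder(*bases, **what)
    ambient[name] = built  # bind the built object first ...
    suite(wrapper)         # ... then run the suite against the wrapper
    return built


def suite(ns):
    # Stands in for the indented body of a build statement.
    ns.greeting = 'hello'


ambient = {}
build(plain_builder, 'thing', (), suite, ambient, __name__='thing')
print(ambient['thing'].greeting, ambient['thing'].__name__)
```

Passing the wrapper to a function is only an approximation: in the scheme described here the suite would be executed in the wrapper's namespace, so plain assignments in the suite would do the job of ns.greeting = 'hello'.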

Commentary

A builder returns two objects: one gets bound into some ambient namespace, the other is used as a namespace in which to execute some code. In bonnet.py and python.py, I build up a succession of builders (or see just the code from both).

mason
archetype for builders, in bonnet.py: serves merely to carry documentation.
within
introducing the __lookups__ protocol, in bonnet.py

This serves as the foundation for what follows. It comes with associated aslookup which returns an attribute-lookup function for its (one) input. This will be the internal function to which getattr would be delegating, where available: otherwise a lambda calling getattr is built on the fly to do the job. These tools, within and aslookup, are defined in bonnet.py; they would sensibly be implemented `under the bonnet' of the interpreter, rather than actually in python.

The lookups of within support a __dir__ attribute listing all the public names in the namespace: within and instance (below) regard names as public if they do not start with an underscore (bonnet.py provides a function, public(name), which returns a false value unless the given name is public). Other lookups supporting this attribute should make clear what they consider `public' if this differs. Note that __dir__ will be computed each time it is sought, which may be expensive - so __dir__ should be used sparingly !
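As an illustration of the convention (not the code in bonnet.py), public can be sketched in one line; listing is a hypothetical helper showing why the list is recomputed, and hence costly, each time it is sought. Sorting the result is a choice made here, not something the text specifies.

```python
def public(name):
    """True unless the name starts with an underscore (a sketch only)."""
    return name[:1] != '_'


def listing(table):
    """Recompute the public names of a dictionary-backed namespace."""
    # Walks the whole table on every call, which is why such a
    # listing should be requested sparingly.
    return sorted(filter(public, table))


print(listing({'metre': 1, '_cache': 2, '__doc__': 3, 'second': 4}))
```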

During a build statement using within, the __lookups__ list of the new object is accessible: most obviously, one can add to it to give the new object richer semantics. However, this access is provided only by the object in whose namespace the builder's suite is executed: once that object has been discarded, the lookup chain is effectively frozen. The builders below, defined in python.py, do not provide access to the __lookups__ attribute.
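A rough sketch of how a chain of lookups might behave, assuming that the lookups are tried in order and the first to succeed wins. Only the fall-back branch of aslookup is shown, and the helper chain is a name invented here.

```python
def aslookup(obj):
    """Fall-back branch only: a lambda calling getattr on a hidden obj."""
    return lambda key, _obj=obj: getattr(_obj, key)


def chain(lookups):
    """Return a lookup which tries each lookup in the given list in turn."""
    def lookup(key, _lookups=lookups):
        for each in _lookups:
            try:
                return each(key)
            except AttributeError:
                pass  # this lookup does not know the name; try the next
        raise AttributeError(key)
    return lookup


class Base:
    colour = 'red'


def extra(key):
    """A further lookup, knowing just one name."""
    if key == 'size':
        return 3
    raise AttributeError(key)


lookups = [aslookup(Base)]
find = chain(lookups)
print(find('colour'))
lookups.append(extra)  # adding to the list enriches the semantics
print(find('size'))
```

Because chain holds the list itself rather than a copy, whoever can still reach the list can extend the chain; once the last reference outside the lookup is gone, the chain is effectively frozen.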

The suite-wrapper of a within (likewise: that of an instance) also supports a (no-op) __del__ attribute: this is needed to prevent it borrowing the __del__ attribute from a base, which would cause mayhem when the garbage collector tidies away the wrapper. The same concern applies to all suite wrappers.

instance
builds `instances' supporting basic facilities plus any magic defined by a `class' provided to the builder. The basic facilities include:

No class is provided when instance is used as the builder in a build statement: the class, if any, is passed via a keyword-only parameter of the builder (and passing class=None is synonymous with not providing a class). If no class is provided, or the object provided as class lacks a __wrap__ attribute, the suite wrapper is equipped with fall-back attribute modification functionality acting on its __dict__ attribute: this functionality is accessed last among the suite wrapper's lookups, so can over-ride itself if it is used to assign to __delattr__ or __setattr__; doing so should be approached with caution !

If the class has a __wrap__ attribute, the suite wrapper uses attributes thereof as methods (i.e. functions to be curried with the wrapper as first argument, in the same manner as the self parameter of methods of a python class); these are accessible to the suite wrapper but not the built object; and the wrapper accesses them before (i.e. in preference to) any attribute modification functionality the built object may support (so using attribute modification functionality of __wrap__ to modify __setattr__ or __delattr__ will only affect attribute modification on the built object, not on the wrapper).

Note that, when the class does have a __wrap__ attribute, fall-back attribute modification (as above) is not provided: if the class has a __wrap__ attribute which does not provide the relevant magic methods, the suite wrapper will not support attribute modification. Furthermore, when building a __wrap__ attribute, care must be taken over defining the methods which effect attribute modification.
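The currying described above can be sketched with functools.partial. Everything named here (curried_lookup, Wrap, the store attribute) is hypothetical scaffolding, with Wrap standing in for a __wrap__ object which supplies __setattr__ but not __delattr__.

```python
from functools import partial
from types import SimpleNamespace


def curried_lookup(wrap, wrapper):
    """Lookup yielding functions held by wrap, curried with wrapper."""
    def lookup(key):
        try:
            function = vars(wrap)[key]  # only what wrap itself defines
        except KeyError:
            raise AttributeError(key)
        return partial(function, wrapper)  # wrapper plays the part of self
    return lookup


class Wrap:
    """Stand-in for a __wrap__ object providing one magic method."""
    def __setattr__(wrapper, key, value):
        wrapper.store[key] = value


wrapper = SimpleNamespace(store={})
find = curried_lookup(Wrap, wrapper)
find('__setattr__')('x', 1)  # modification goes through the curried method
print(wrapper.store)
```

Since Wrap defines no __delattr__, find('__delattr__') raises AttributeError: with no fall-back, this wrapper simply does not support attribute deletion.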

architect
archetype - i.e., like mason, this just carries documentation - describing what a `builder class' is.

A builder class is a builder which supplies itself as the class argument to instance when building its instances. Thus it may equip its instances with interesting functionality via the __wrap__ and __magic__ protocols outlined above.

builder
a builder class which builds builder classes

Instances of builder can serve as builder classes. They support the magic and wrap protocols specified by instance. They also support, via the former, magic lookup schemes data (by which an instance borrows attributes off the __data__ attribute of its class) and method (by which an instance inherits curried methods from the __method__ attribute of its class).

Builder classes built by builder also contribute their magic schemes (including wrap) to their instances; this is only relevant in so far as the instance is itself a builder class; indeed, it is by arranging to look enough like an instance of itself that builder contributes the magic schemes just detailed to its instances (which are builder classes).
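The data and method schemes might be pictured as two lookups, one returning values as they stand and one currying functions with the instance. This is a sketch under assumptions: data_lookup, method_lookup and held are invented names, and SimpleNamespace objects stand in for the class and its __data__ and __method__ attributes.

```python
from functools import partial
from types import SimpleNamespace


def held(holder, key):
    """Fetch key from what holder itself stores, else AttributeError."""
    try:
        return vars(holder)[key]
    except KeyError:
        raise AttributeError(key)


def data_lookup(cls):
    """Scheme `data': borrow attributes off the class's __data__."""
    return lambda key: held(cls.__data__, key)


def method_lookup(cls, instance):
    """Scheme `method': inherit curried methods from __method__."""
    return lambda key: partial(held(cls.__method__, key), instance)


square = SimpleNamespace(
    __data__=SimpleNamespace(sides=4),
    __method__=SimpleNamespace(area=lambda self: self.width ** 2))
tile = SimpleNamespace(width=3)

print(data_lookup(square)('sides'))
print(method_lookup(square, tile)('area')())
```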

class
a builder class built by builder; builds classes.

Provides (via somewhat scary wrap magic) each suite wrapper with attribute modification functionality which stores: magic attributes of lookup schemes on the class itself (i.e. the built object), functions on its __method__ attribute (forcing a copy-on-write operation in doing so) and all other data on its __data__ attribute (likewise).

Provides a __call__ method, for its instances, which instantiates the class in the manner of python: it creates the instance, invokes its __init__ with the arguments passed to the instantiator and returns the suite wrapper (not the built instance, though this is accessible as the wrapper's __self__ attribute; by contrast, python instances expose their __dict__ attribute and are synonymous with the self their methods see).

Although class is implemented as an instance of builder, and builder as an instance of instance, the primitive builders within and instance are implemented as functions. The body of a build statement using instance, builder or (naturally) class can `look like' that of a python class statement: it can begin with a doc-string, implicitly assigned to name __doc__, it can def functions and assign names in its namespace without having to do anything fancy. Indeed, builder is implemented as an instance in just this way; likewise class is implemented as a builder.

Note, however, that an instance of builder or instance which wants to store methods on a __method__ sub-object, or similar, has to do so by saying (something like):


builder novel (some, base):
    instance __method__ (__method__,):
        def demo(self): return 'demo'

thereby creating a new classless object, borrowing from the __method__ object originally built when builder constructed novel, but extending it with the extra method demo. Since python's magic for `private' attribute names (which start in double underscore but don't end in it) is mediated lexically on the reach of the most closely enclosing build (presently just class) statement, a private attribute mentioned in such a __method__ definition of a builder or instance will be entirely disconnected from any seemingly identical private attribute provided by a corresponding __data__ entry or mentioned in any associated __wrap__. For contrast, class lets a single build statement conflate at least the __data__ and __method__ private namespaces.

Initialisation Wrappers

The new object created by a build statement using instance as builder has no class. The built object supports attribute lookup but not modification: however, its suite gets executed in the namespace of a suite wrapper which borrows from the built object and provides attribute modification methods which modify the namespace of the object being created. So we are able to populate the new namespace during its creation, without supporting later modification. It is for this that the builder returns a pair of objects: the heavy-weight object needed to initialise a new object (which might, without it, be light-weight) need not be a persistent part of the new object. The suite wrapper exists only for the duration of the build statement's execution, after which it will fall to the garbage-collector. Some behaviour of the new object may be hidden by attributes of its suite wrapper: these come into play after the suite wrapper has been discarded.
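One way to picture the division of labour, as a sketch rather than the scheme in python.py: the classes Built and SuiteWrapper and the function frozen_builder are invented for illustration, and share a hidden dictionary so that the wrapper can populate what the built object only reads.

```python
class Built:
    """Light-weight object: attribute lookup, but no modification."""

    def __init__(self, table):
        # Bypass our own __setattr__ to install the shared table.
        object.__setattr__(self, '_table', table)

    def __getattr__(self, key):
        try:
            return self._table[key]
        except KeyError:
            raise AttributeError(key)

    def __setattr__(self, key, value):
        raise AttributeError('read-only namespace: ' + key)


class SuiteWrapper(Built):
    """Heavier object: borrows lookup, adds modification."""

    def __setattr__(self, key, value):
        self._table[key] = value


def frozen_builder(*bases, **what):
    """Hypothetical builder returning a (built object, suite wrapper) pair."""
    table = dict(what)
    return Built(table), SuiteWrapper(table)


built, wrapper = frozen_builder()
wrapper.answer = 42  # what the suite of a build statement would do
del wrapper          # the wrapper is discarded once the suite has run
print(built.answer)
```

After the wrapper has gone, built.answer is still readable but an assignment such as built.answer = 0 raises AttributeError.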

In general, a suite wrapper can be as simple or as sophisticated as one wishes.

This allows suite wrappers to do various jobs:

Thus various semi-magical things we either already do or might want to do become implementable in suite wrappers, saving us the need to hard-wire them into the python engine: indeed, different builders could use totally different magic for many of these things. This will make it much easier to play with different options for how to do these `behind the scenes' magics and thereby discover what works well for subsequent use as `standard' idioms which we might choose to hard-wire into a descendant or cousin of python.

Dictionaries, which live in the python world with limited namespaces, behave as if built with an initwrap which sets up their visible namespace (which can't be modified subsequently) on top of a hidden primitive which supports item lookup (you can't ask a dictionary for its __setitem__ attribute, but it behaves like it has one).

Conflicts

I've presumed a different model of how namespaces get their globals. To show the difference, here's a piece of code presuming what I have, run by python 1.5:

>>> def out(key, val):
...     class dumb:
...             safe = val
...             def meth(self, _k=key): return _k, self.safe
...     return dumb
... 
>>> out(3, 7)
Traceback (innermost last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 2, in out
  File "<stdin>", line 3, in dumb
NameError: val
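For comparison, interpreters with statically nested scopes (PEP 227, the default from python 2.2 onwards) resolve val in the enclosing function, so the same definition runs there. The sketch below is the same code as a script for such an interpreter; the print call at the end is the only addition.

```python
def out(key, val):
    class dumb:
        safe = val  # found in the enclosing function's scope
        def meth(self, _k=key):
            return _k, self.safe
    return dumb


print(out(3, 7)().meth())  # prints (3, 7)
```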

The order of events presumed in building a class here conflicts with the Don Beaudry hook, which wants a dictionary made out of the class's body as an input to the function which builds the class object itself. I'm presuming the class object to be created before its suite gets executed: while it may be possible to pass the hook the desired dictionary, while creating the class, it won't yet contain the results of executing the suite, which the hook presently expects. For similar reasons, the hook can't serve as a means for me to slyly slide my schemes into the existing engine.

The relevance of the toy

Aim: python's interface to entities is `transparent' - it `merely makes available' in python the internal behaviour of things whose behaviour is faithfully described by the python engine's model of them - and the internal machinery really operates in the way described; to do which the entities must know enough of how to `cheat' that they can get things to happen for real; this `cheating' happens within a careful set of rules (to do with `privacy' of data fields and kindred protocols) so that it doesn't trip itself up and the python engine never notices the cheating.

Note, for pythoneers: I imagine the C I'm playing with probably isn't the best way to implement a python engine in practice, but the way I'm approaching it will lead me to something I know will tell me which things matter. In some sense, I'm implementing a `canonical virtual machine' so as to identify the structures that must be done right in a practical implementation. In particular, hard-wiring dictionaries into the foundation would make everything much easier - it's just that, in the design, I'd get distracted by the incidental properties of dictionaries and wouldn't be able to see the structure underneath. Once I've seen that, I expect to understand what to do with dictionaries to make it look as if I've done it the canonical way, but at much less cost: which will also teach some lessons in how to use dictionaries generally.

Written by Eddy.
$Id: python.html,v 1.4 2002/01/30 17:52:21 eddy Exp $