python package for scientific study

One of my on-going projects is the development of a body of python code for studying
mathematical and scientific questions. A separation is beginning to evolve
– in the functionality of the package – between, on the one hand,
pythonic utilities and some raw mathematical infrastructure (such as the tuple
of primes) and, on the other, scientific data and the machinery for representing
it faithfully. This may lead to me splitting it into two packages at some later
date; but, for the moment, I'm too busy adding stuff to actually find time to
re-organize what's there.
On the scientific side, my main emphasis has been the faithful representation of scientific quantities: while software for scientific computation generally concerns itself with attaining the best precision available (ignoring the imprecision of the input data), I have concerned myself more with having the numeric types involved keep track of the uncertainties in data and the units in which things are measured.
Units are generally ignored in scientific computation; the system of units in use implies the units associated with each quantity so, the reasoning goes, one only needs to know each quantity's value. There are two flaws with this, from a software maintenance (i.e. we know there exist bugs) point of view. First, omitting conversion factors (e.g. when one converts from electron-Volts to Joules, or gallons per minute to cubic feet per second) is all too easy, and hard to notice when trying to work out why the code is producing wrong answers. Second, quantities with different units should not be added together, yet bare numbers give a program no way to notice when they are. Having quantities know their own units makes it possible for a program to raise an error in the latter case, while eliminating the former issue, since each unit carries its conversion factor with it. It also provides a handy check, when one does a quick computation in an interactive session, that one has computed what one expected – if the units aren't right, something went wrong !
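As a rough illustration of the idea (a minimal sketch, not the package's actual code), a quantity can carry a map of unit exponents alongside its value. The class name Quantity, the exponent-dictionary representation and the unit objects below are all assumptions made for this example.

```python
class Quantity:
    """Toy quantity: a number in SI base units plus a map of unit exponents."""

    def __init__(self, value, units):
        self.value = float(value)
        # Drop zero exponents so that equal dimensions compare equal.
        self.units = {name: power for name, power in units.items() if power}

    def __add__(self, other):
        if not isinstance(other, Quantity):
            return NotImplemented
        if self.units != other.units:
            raise TypeError("cannot add %r to %r" % (other.units, self.units))
        return Quantity(self.value + other.value, self.units)

    def __mul__(self, other):
        if isinstance(other, Quantity):
            # Multiplying quantities adds the exponents of their units.
            units = dict(self.units)
            for name, power in other.units.items():
                units[name] = units.get(name, 0) + power
            return Quantity(self.value * other.value, units)
        return Quantity(self.value * other, self.units)

    __rmul__ = __mul__  # so that 3 * joule works as well as joule * 3


second = Quantity(1, {"s": 1})
joule = Quantity(1, {"kg": 1, "m": 2, "s": -2})
eV = 1.602176634e-19 * joule  # the conversion factor lives in the unit itself

total = 2 * joule + 3 * eV  # same dimensions, so addition is allowed
```

With these definitions, joule + second raises TypeError, which is the error in "the latter case" above; and because eV is defined in terms of joule, there is no separate conversion factor to forget.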
Since python supports emulation of numeric types via magic
methods, it's entirely practical to ensure that quantities behave as required,
even to the extent of getting nice display. When displayed (i.e. when
repr or str is called on it), a number only displays
one more decimal place than is justified by its precision. Thus, when this
package displays Hubble's constant, study.chemy.physics.Cosmos.Hubble, as
2.3 * atto / second, it means three things. First, the data at my disposal
seem to indicate that there's at least a fifty percent likelihood that the value
lies between 1.5 aHz and 2.5 aHz (values in this
interval would all round to 2 aHz). Second, the next digit of my best
estimate of the value is a 3. Third, there's less than a fifty percent
likelihood of the value falling between 2.25 aHz and 2.35 aHz.
Here a is the abbreviation for the quantifier atto = 1e-18 and Hz is the
short form for Hertz, an alternate name for 1/second.
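The display rule can be sketched roughly as follows. This is a simplification, not the package's implementation: it assumes the uncertainty is summarised by a single number, spread, the half-width of an interval believed to hold the value with at least fifty percent likelihood. The function names are invented for the example.

```python
import math


def justified_exponent(spread):
    """Smallest k for which half a unit of 10**k is at least the spread."""
    # 0.5 * 10**k >= spread  is the same as  k >= log10(2 * spread)
    return math.ceil(math.log10(2 * spread))


def display(best, spread):
    """Format best with one more digit than its spread justifies."""
    last_place = justified_exponent(spread) - 1  # one extra digit
    decimals = max(0, -last_place)
    return "%.*f" % (decimals, round(best, -last_place))


hubble = display(2.3456, 0.4)   # value in aHz, spread of roughly half a unit
speed = display(299792.458, 30)
```

Here a spread of 0.4 justifies the units digit, so one decimal place is shown and hubble is "2.3"; a spread of 30 justifies the hundreds digit, so speed is shown to the nearest ten.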
I'm also a big fan of lazy evaluation; most of the types used in the package
support attributes which are only computed when they are first referenced. When
specifying one (or a few) of an object's attributes can suffice to determine
various others, this makes it possible for objects to know many
attributes as soon as enough are specified. This carries the added advantage of
simplifying some cases where specifying enough of a set of attributes suffices
to determine the rest of the set, without it mattering (much) which attributes
in the set were specified. For example, specifying momentum, energy, frequency
or wavelength of a photon is sufficient to determine all of the others.
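The photon case might be sketched like this. The naming convention _lazy_get_name_ is the package's own (see the notes further down), but the Lazy base class, the Photon class and the getter signatures here are assumptions made for illustration, and plain floats in SI units stand in for the package's quantity types.

```python
class Lazy:
    """Compute attributes on first access from _lazy_get_<name>_ methods."""

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. nothing is cached yet.
        getter = getattr(type(self), "_lazy_get_%s_" % name, None)
        if name.startswith("_") or getter is None:
            raise AttributeError(name)
        value = getter(self)
        setattr(self, name, value)  # remember it; later lookups skip this code
        return value


class Photon(Lazy):
    h = 6.62607015e-34  # Planck's constant, J s
    c = 299792458.0     # speed of light, m / s

    def __init__(self, **given):
        allowed = {"energy", "frequency", "wavelength", "momentum"}
        if len(given) != 1 or not set(given) <= allowed:
            raise ValueError("specify exactly one of %s" % sorted(allowed))
        self.__dict__.update(given)

    def _lazy_get_frequency_(self):
        # Derive frequency from whichever attribute was specified.
        known = self.__dict__
        if "energy" in known:
            return known["energy"] / self.h
        if "wavelength" in known:
            return self.c / known["wavelength"]
        return known["momentum"] * self.c / self.h

    def _lazy_get_energy_(self):
        return self.h * self.frequency

    def _lazy_get_wavelength_(self):
        return self.c / self.frequency

    def _lazy_get_momentum_(self):
        return self.h * self.frequency / self.c


green = Photon(wavelength=500e-9)
```

Nothing beyond the wavelength is stored until, say, green.energy is first read; at that point frequency and energy are both computed and remembered. Routing every derivation through frequency is just one way to avoid getters calling each other in circles.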
In on-going development (not yet released), I've been expanding on the basic python infrastructure I can use to handle lazy evaluation. I've also been arranging for weak lazy attributes; that is, dealing with attributes that can be computed on demand and remembered, but which can safely be forgotten if the garbage collector wishes to do so. This is particularly relevant in my on-going re-implementation of the tuple of prime numbers: the crude version available in the presently released code makes the hideous mistake of trying to hold all of its data in memory at once, even when most of that data has been saved to disk. The objects – deployed in the (as yet unreleased) new version – load data from disk but allow the garbage collector to discard the data once no longer actively in use. The class hierarchy of the infrastructure for the new cache is currently rather complex: I should probably analyse it some more and simplify it !
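One way such a weak lazy attribute could work is sketched below, using the standard weakref module. This is not the unreleased code described above: the WeakLazy descriptor, the Block and PrimeChunk classes and the loads counter are hypothetical, and a trial-division computation stands in for loading a block of primes from disk.

```python
import gc
import weakref


class WeakLazy:
    """Descriptor: compute on demand, remember weakly, recompute if collected."""

    def __init__(self, compute):
        self.compute = compute
        self.slot = "_weak_" + compute.__name__

    def __get__(self, obj, owner=None):
        if obj is None:
            return self
        ref = obj.__dict__.get(self.slot)
        value = ref() if ref is not None else None
        if value is None:  # never computed, or since garbage-collected
            value = self.compute(obj)
            obj.__dict__[self.slot] = weakref.ref(value)
        return value


class Block(list):
    """Plain lists can't be weakly referenced; a subclass can."""


class PrimeChunk:
    def __init__(self, start, stop):
        self.start, self.stop = start, stop
        self.loads = 0  # counts (re)computations, for demonstration only

    @WeakLazy
    def primes(self):
        self.loads += 1  # stands in for reading the block from disk
        return Block(n for n in range(max(self.start, 2), self.stop)
                     if all(n % d for d in range(2, int(n ** 0.5) + 1)))


chunk = PrimeChunk(10, 30)
held = chunk.primes          # computed on first access
same = chunk.primes is held  # still cached while something uses it
del held                     # drop the last strong reference
gc.collect()
reloaded = chunk.primes      # the cache was discarded, so it is rebuilt
```

The object itself never keeps the data alive: as long as some caller is using the block it is shared, and once nobody is, the memory can be reclaimed.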
The mathematical infrastructure includes various standard functions, probability distributions and approximations. It provides classes for polynomials and permutations, an implementation of the find-unite algorithm for partitioning graphs, (an extended form of) Pascal's triangle and diverse tools used in assorted parts of this web-site.
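For readers unfamiliar with it, the find-unite algorithm (also known as union-find) can be sketched as below. This is a generic textbook version with path compression, not the package's own class; the names FindUnite and partition are chosen for the example.

```python
class FindUnite:
    """Partition nodes into disjoint sets, merging sets as edges arrive."""

    def __init__(self):
        self.parent = {}

    def find(self, node):
        """Return the representative of the set containing node."""
        self.parent.setdefault(node, node)
        root = node
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point everything on the way directly at the root.
        while self.parent[node] != root:
            self.parent[node], node = root, self.parent[node]
        return root

    def unite(self, a, b):
        """Merge the sets containing a and b."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_a] = root_b


def partition(edges):
    """Return the connected components of a graph given as (a, b) edges."""
    sets = FindUnite()
    for a, b in edges:
        sets.unite(a, b)
    groups = {}
    for node in list(sets.parent):
        groups.setdefault(sets.find(node), set()).add(node)
    return sorted(sorted(group) for group in groups.values())


components = partition([(1, 2), (2, 3), (4, 5), (6, 6)])
```

Each edge unites the sets holding its two ends, so nodes end up in the same set exactly when some chain of edges connects them.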
The current version of this software is what evolved out of my messing
around with wanting to represent error bars and units. I sporadically dream of
doing a major re-write which resolves assorted issues I'm lumbered with by how I
got to where I am; however, it may be some time before I even complete thinking
through the design of that (I can fairly be accused of a bad case of the
second-system effect), let alone have anything better to show for it than what I
have now. In the meantime, what I've got works and I sporadically improve it;
if anyone else wants to play with it, I have a bzip2-ed
tar-ball available; I (as copyright holder) grant permission for
anyone to download it and use it for their own
amusement. If you want to do anything else with it, I'd be most interested, but
please get in touch with me. The principal reason why I haven't yet released it
under the GPL is the thought that I really should make it redundant with my
planned next edition first, so don't be afraid to ask.
For documentation, print the __doc__ attributes of relevant
objects (or just read doc-strings in the source). Unpack the
tar-ball (everything it contains is in a directory called
study) as a sub-directory of some directory in your
$PYTHONPATH (or simply in sys.path) and, in a
python session, import study or assorted sub-objects
of it. The package gives an over-view of the sub-packages it contains; each
sub-package explains what it contains. I make extensive use of hierarchical
name-spaces, each step of which indicates what it makes available.
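The steps above might look like this in a session. The helper overview and its where parameter are invented for the example, and the path shown in the usage note is a placeholder for wherever the tar-ball was unpacked.

```python
import importlib
import sys


def overview(name, where=None):
    """Return the doc-string of module name, or None if it can't be imported.

    where, if given, is a directory to add to sys.path first (for example
    the directory that contains the unpacked study directory).
    """
    if where is not None and where not in sys.path:
        sys.path.insert(0, where)
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return module.__doc__
```

For instance, print(overview("study", "/path/to/parent")) would show the package's over-view, and overview("study.maths") that of one sub-package, assuming the package imports cleanly under the python version in use.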
Note that:

python's dir built-in doesn't always list all the
attributes actually available (and .__dict__.keys() is even less
complete), but each method with a name of form
_lazy_get_name_ contributes an attribute with the given
name. (Yes, I'm aware that recent versions of python
support property, which lets me define a nicer way to do lazy
attributes. I use this increasingly and intend to ultimately replace all the
old lazy-infrastructure, but it's not a high priority …)

import of any major component of
study.value is apt to take quite a long time; don't worry, it's
just got a lot to set up !

study.maths.primes may lead to a cache directory being
created; otherwise, I can't think of anything this package attempts to modify on
your hard disk – however, 
Written by Eddy.