This page uses an antiquated notation and is in need of maintenance.

There's a standard framework for physics that dates back a good two centuries and is still a vital part of modern theories. It consists of Lagrange's equations of motion for a system and Hamilton's principle. Dirac, in discussing quantum mechanics, quietly points out the parallels between these and quantum mechanics. The differences all spring from the former's assumption, discarded in the latter, that multiplication is always commutative: that is, a×b = b×a.

In both formalisms, a dynamical system is described in terms of two families of parameters, stereotypically position and momentum. The formalism describes the system in terms of an action function, H, which depends on these parameters: from the way it depends on each parameter, we read the dynamics of the system.

Crucially, there are as many parameters of each kind as of the other: indeed, there is a definite one-to-one association between the families, which we can formalize by using a label-set, which I'll call dim (for dimension), and introducing functions (dim| q :) and (dim| p :) for which one family is {q(i): i in dim}, the other is {p(i): i in dim} and, for each label, i, in dim, q(i) and p(i) are associated with one another. In particular, there is some multiplicative way of combining q(i) with p(i) to get a scalar (or, at least, some quantity of a kind which doesn't depend on i) and it makes sense to examine the sum over all i in dim of such products. It is usual to take dim to be a natural number.

In principle, each i in dim has its own space in which q(i) varies: the matching p(i) varies in a corresponding space, but for any other j in dim, q(j) could vary in some other space entirely. In principle, then, we have mappings (dim| Q :{parameter spaces}) and (dim| P :{parameter spaces}) with the parameter q(i) varying over space Q(i) and p(i) varying over P(i). What matters is that, for each i, we have a linear contraction which combines a tangent to Q(i) with a tangent to P(i) to produce a quantity of some kind which doesn't depend on i. (Note that a tangent to a linear space is functionally a member of that linear space: the difference is only relevant for smooth spaces.) In practice, however, it suffices to treat Q as if it were some constant space, whose tangent bundle I'll call T, and P likewise as constant with tangent bundle G, with the linear contraction expressed, for t in T and g in G, as t*g. This is classically taken to be a scalar, but I'll hang fire on that: it suffices to know that we can add such results and scale them.

Although T and G are the tangent bundles of the parameter spaces, I'll quietly elide the machinery of that while establishing the basics, and pretend they are the parameter spaces. Forget the last paragraph's use of the names Q and P: I'll re-use those names, below, for the collections {(dim| :T)} = Q and {(dim| :G)} = P of all possible values for q and p, respectively. We can then deal with q and p as members of these two linear spaces, with a contraction, constructed from that between T and G, which may sensibly be written p*q = sum(dim| i-> p(i)*q(i) :B) – I'll call the space of outputs of * B, so p*q and the earlier t*g are in B, which is a linear space, usually presumed one-dimensional.
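As a deliberately simple illustration of p*q = sum(dim| i-> p(i)*q(i) :B), here is a minimal Python sketch. It assumes each T and G is the real line, so that t*g is ordinary multiplication and B is {scalars}; the label-set dim is any finite set of labels, and the function name `contract` is hypothetical.

```python
from typing import Dict, Hashable, Iterable

def contract(dim: Iterable[Hashable],
             p: Dict[Hashable, float],
             q: Dict[Hashable, float]) -> float:
    """Return p*q = sum over i in dim of p(i)*q(i).

    Assumes (for illustration) that each per-label contraction is plain
    multiplication of reals, so the result lies in B = {scalars}.
    """
    return sum(p[i] * q[i] for i in dim)

# dim need not be a natural number; any label-set will do.
dim = {"x", "y", "theta"}
q = {"x": 1.0, "y": 2.0, "theta": 0.5}
p = {"x": 3.0, "y": -1.0, "theta": 4.0}
print(contract(dim, p, q))  # 3.0 - 2.0 + 2.0 = 3.0
```

Only the pairing of q(i) with p(i) under a shared label matters here; the order in which dim is walked does not.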

The Hamiltonian is a function which takes a (dim| q :T) and a (dim| p :G) and returns a scalar quantity: H = (Q| q-> (P| p-> H(q,p) :S) :), with S = {scalars} or possibly some suitable (algebraic, and I suspect it'll have to be ordered or even positive) replacement. B might be S, or its dual. We have differential operators, I'll call them D0 and D1, with D0(H, q, p) being a linear map from Q to S which digests a perturbation in q and returns the change in H which would result (so this would be ∂H/∂q with ∂ the curved d denoting partial differentiation) and D1(H, q, p) is linear (P|:S) and describes how H(q, p) depends on p (locally). I'll use · to denote the natural linear contraction induced by trace, which corresponds directly to the general action and composition of mappings: this is what takes a small change in q and D0(H,q,p) and combines them to yield a small change in H(q,p), or equally gets similar from p and D1.
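The role of D0 and D1 can be checked numerically. This is a minimal sketch that assumes Q = P = ℝ² and S = {scalars}, so each of D0(H,q,p) and D1(H,q,p) can be represented by a gradient vector and · by the dot product. The sample Hamiltonian `H` and the helper `partials` are hypothetical, chosen only to show that the first-order change in H(q,p) is D0·δq + D1·δp.

```python
import numpy as np

def H(q: np.ndarray, p: np.ndarray) -> float:
    # Hypothetical sample Hamiltonian: kinetic term plus a coupled potential.
    return 0.5 * p @ p + 0.5 * q @ q + 0.1 * q[0] * q[1]

def partials(H, q, p, h=1e-6):
    """Central-difference estimates of D0(H,q,p) and D1(H,q,p) as gradient vectors."""
    n = len(q)
    d0 = np.array([(H(q + h * e, p) - H(q - h * e, p)) / (2 * h) for e in np.eye(n)])
    d1 = np.array([(H(q, p + h * e) - H(q, p - h * e)) / (2 * h) for e in np.eye(n)])
    return d0, d1

q = np.array([1.0, -0.5])
p = np.array([0.3, 2.0])
dq = np.array([1e-4, 2e-4])    # small perturbation in q
dp = np.array([-3e-4, 1e-4])   # small perturbation in p

d0, d1 = partials(H, q, p)
predicted = d0 @ dq + d1 @ dp          # D0·δq + D1·δp
actual = H(q + dq, p + dp) - H(q, p)   # the true change in H
print(predicted, actual)               # agree up to second-order terms
```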

The system at any given moment is notionally described by a q in Q and a p in P, and these vary with time: denote their time-derivatives q' and p' (which will be linear maps from time-tangents to tangents in Q and P respectively). Let TT = {time-tangents}, which is one-dimensional and has a characteristic forward direction. Hamilton's rearrangement of Lagrange's formalism then gives q' = D1(H,q,p) and p' = −D0(H,q,p). These are linear (TT| :Q) and (TT| :P), respectively, so {linear (TT| :Q)} is {linear (P|:S)} and {linear (TT| :P)} is {linear (Q|:S)}, give or take some natural isomorphisms. We then have

- D1(H,q,p)·p' which is linear (TT| :S)
- q'·D0(H,q,p) which is the transpose of a linear (TT| :S)

and these are both q'·p', subject to some natural isomorphisms, which might plausibly include the transposition indicated. But it doesn't:
D1(H)·p' binds D to p and leaves H to the left of the ';
q'·D0(H) binds D to q and leaves ' to the left of H;
so S⊗dual(TT) = dual(TT)⊗S.
For example, S = dual(TT). But equally, S = ⊗[U,…,U] and dual(TT) = ⊗[U,…,U] will do, with the list lengths not necessarily equal.
Here dual(TT) = {time-gradients},
but both S and TT are 1-dimensional!
So S is functionally dual(TT), and *vice versa*;
dual(TT)⊗S is then S⊗dual(TT).
That reads the transpose of a linear (TT| :S) as a linear (S| :TT).
We get equality between an (S| q'·D0(H) :TT) and a (TT| D1(H)·p' :S).
Can we infer S = TT, self-dual?
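Hamilton's equations q' = D1(H,q,p) and p' = −D0(H,q,p), stated above, can be exercised numerically. This is a minimal sketch that assumes a single degree of freedom, TT identified with the real line, and a separable H (so the explicit leapfrog, or Störmer–Verlet, scheme used here is appropriate); `H`, `d0`, `d1` and `leapfrog` are illustrative names. For H = (p² + q²)/2 with q = 0, p = 1 at time 0, the exact motion is q = sin(t), p = cos(t).

```python
import math

def H(q: float, p: float) -> float:
    # Harmonic oscillator; separable: H = kinetic(p) + potential(q).
    return 0.5 * p * p + 0.5 * q * q

def d0(H, q, p, h=1e-6):
    """Central-difference estimate of D0(H,q,p), the q-derivative of H."""
    return (H(q + h, p) - H(q - h, p)) / (2 * h)

def d1(H, q, p, h=1e-6):
    """Central-difference estimate of D1(H,q,p), the p-derivative of H."""
    return (H(q, p + h) - H(q, p - h)) / (2 * h)

def leapfrog(H, q, p, dt, steps):
    """Integrate q' = D1(H,q,p), p' = -D0(H,q,p) with the Stormer-Verlet scheme."""
    for _ in range(steps):
        p -= 0.5 * dt * d0(H, q, p)   # half kick
        q += dt * d1(H, q, p)         # drift
        p -= 0.5 * dt * d0(H, q, p)   # half kick
    return q, p

steps = 10_000
q, p = leapfrog(H, 0.0, 1.0, (math.pi / 2) / steps, steps)
print(q, p, H(q, p))  # close to sin(pi/2) = 1, cos(pi/2) = 0, and H = 0.5
```

Leapfrog is chosen because it is symplectic for separable H, so the computed H stays close to its initial value over long runs.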

Given a linear space U, for any S = ⊗[U,…,U] and R = ⊗[U,…,U], regardless of the lengths of the lists, we can deal with {linear (dual(S)|:R)} as {linear (dual(R)| :S)} via a natural isomorphism induced by their respective descriptions as ⊗[U,…,U] with a list whose length is the sum of the lengths of the lists for S and R. This involves no re-ordering, only re-partitioning of a list of length n+m as a list of length m+n, as [a,…,g,h,…,z] = [a,…,m,n,…,z] with length([a,…,g]) = length([n,…,z]) and length([h,…,z]) = length([a,…,m]).
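The re-partitioning can be sketched concretely, assuming U is represented as ℝ^d and a member of ⊗[U,…,U] with n+m factors as a NumPy array with n+m axes; the helper `as_matrix` is hypothetical. Reading the same array as a d^n × d^m matrix or as a d^m × d^n matrix only moves the split point in the list of factors: the sequence of components is untouched, and no transpose is involved.

```python
import numpy as np

def as_matrix(t: np.ndarray, first: int) -> np.ndarray:
    """View a tensor with t.ndim factors (each of size d) as a matrix whose rows
    are indexed by the first `first` factors and whose columns by the rest.
    This re-partitions the factor list; it never re-orders it."""
    d = t.shape[0]
    return t.reshape(d ** first, d ** (t.ndim - first))

d, n, m = 2, 3, 1
rng = np.random.default_rng(0)
t = rng.standard_normal((d,) * (n + m))   # a member of ⊗[U,…,U] with n+m factors

a = as_matrix(t, n)   # split the list as n + m
b = as_matrix(t, m)   # split the same list as m + n
print(a.shape, b.shape)  # (8, 2) (2, 8)
# b is generally not a.T: obtaining a.T would require re-ordering the factors.
```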

When U is 1-dimensional, however, U⊗U or, indeed, ⊗[U,…,U] for an arbitrarily long list, is also 1-dimensional.
For more general V⊗W, we only have that a *typical* member has the form v×w = (dual(W)| a-> v.a(w) :V), while the general member of V⊗W is a sum of typical members (a in dual(W) means a(w) is a scalar, for w in W).

However, for 1-dimensional V⊗W, we immediately know V and W are each individually also 1-dimensional. Pick any non-zero v in V and w in W; then v×w is a typical member of V⊗W and is non-zero; but V⊗W is 1-dimensional, so any general member is simply a scalar multiple of it; take that scaling and apply it to v (say), and you've got a typical member equal to the given general member: so all members are typical.

This makes tensor products with 1-dimensional spaces more straightforward (indeed, only one of V, W needs to be 1-dimensional for this to work; though all but at most one of a list must be 1-dimensional for the list's ⊗ to be so easy).
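A minimal numerical check of the claim that, when one factor is 1-dimensional, every member of V⊗W is typical. It assumes V = ℝ¹ and W = ℝ^k, with members of V⊗W represented as 1×k arrays and v×w as the outer product; `as_typical` is an illustrative helper, and it absorbs the scaling into w rather than v, which is equivalent.

```python
import numpy as np

def as_typical(M: np.ndarray):
    """Given M in V⊗W with dim V = 1 (a 1×k array), return (v, w) with M = v×w."""
    assert M.shape[0] == 1, "V must be 1-dimensional"
    v = np.array([1.0])   # any non-zero v spans V
    w = M[0] / v[0]       # absorb the scaling into w
    return v, w

M = np.array([[2.0, -1.0, 0.5]])   # a general member of V⊗W
v, w = as_typical(M)
print(np.allclose(np.outer(v, w), M))  # True

# By contrast, np.eye(2) in R²⊗R² has rank 2, so it is not a single v×w.
print(np.linalg.matrix_rank(np.eye(2)))  # 2
```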
Choose any non-zero u in U (choose it positive if U has a natural sense of sign).
This gives a basis of U, [u] = (1| 0-> u :U).
For any v in U, we now have a scalar, x, for which v = x.u, which induces a linear map (U|:{scalars}) given by v-> x. This depends on u, of course; it is the inverse of u. Call it n. It's in dual(U). It's non-zero (it maps u to 1, for a start), so [n] is a basis of dual(U).
Equipped with u, we have non-zero members of S and R, namely ×[u,…,u] of
the appropriate lengths: S and R are likewise 1-dimensional, so these provide
bases of S and R. Let s in S and r in R be these basis members. We have
s×r = r×s from the associativity of ×, without any shuffling
of order. We have, equally, members of dual(S) and dual(R) which, respectively,
map s and r to 1.
{linear (dual(S)| :R)} is R⊗S
{linear (dual(R)| :S)} is S⊗R
these are the same ⊗[U,…,U] space, via associativity (which only
rearranges bracketing, not the order of terms: (a*b)*c to a*(b*c) and similar).
A linear (S| f :dual(R)) is characterized by f(s).r, which is a scalar: but it
does depend on our choice of u, i.e. our basis of U. The associated linear (R|
g :dual(S)) is characterized by having g(r).s = f(s).r, because f and g are this
scalar times s×r = r×s.
Since r and s are members of our 1-dimensional space

- q'*p' = (TT| t-> (TT| e-> q'(t)*p'(e) :S) :{linear (TT| :S)})
- D1(H,q,p) and D0(H,q,p)

If S and TT are both {scalars}, this just says that Q and P are mutually dual: in any case it defines a linear contraction between {linear (TT| :Q)} and {linear (TT| :P)}, yielding answers in S, just as * yields answers in B.

So consider the contraction of q' = (TT| t-> q'(t) :Q) with (TT| p' :P), yielding q'*p' = (TT| t-> q'(t)*p'(t) :B) in linear (TT| :B). Our other contraction, yielding … In any case, I expect S to be one-dimensional: it may even be (or involve) the dual of {time-tangents}.

Problem: there's a presumption that we have some function (Q:time:E) for which we only care about (D|q:Q) if time∘q is the identity on D.

Presume:

- a scalar domain, R,
- R-linear spaces V and W (both of which, with E and C below, are R for Lagrange)
- a compact subset, D (typically a time-interval), of an R-linear space E with dual C, with boundary ∂D,
- a measure, ({(D::R)}: m :W), on D, extending to ({(D::U)}: :W⊗U), for any linear space U, via tensor action: m also induces a measure ({(∂D::C)}: M :W) via the machinery of Stokes' theorem.
- a manifold Q, with tangent and gradient bundles (Q|q->T(q):) and (Q|q->G(q):), respectively, and
- a function L= (Q: q-> (C⊗T(q)| r-> (D| t-> L(q,r,t) :V) :) :) with (|L) presumed big enough to let us have plenty of free choice of (D|:L) – that is, (D|q:Q) with L accepting, as its first input, any output q produces in Q.

From these we construct:

- {(D| :L)}
the collection of mappings from our compact measure domain to our manifold for which every output is acceptable as L's first argument. For any given boundary condition (∂D| b :Q), we can restrict to {(D|q:L): (∂D:q:) = b}: where I use {(D| :L)} below I may well presume such a restriction.

- for any (D|q:Q), we have q' = (D| t-> q'(t) :C⊗T(q(t)))
this is D-tangent formation: for given (D|q:Q) and trajectory (R:s:D) which passes through t in D with tangent v in E, v·q'(t) is then in T(q(t)): it is the Q-tangent of (R: q∘s :Q) as it passes through q(t).

- S = ({(D|:L)}| q-> m(D| t-> L(q(t), q'(t), t) :V) :W⊗V)
this is known as the (Lagrange) **action** induced by L on the domain D.

We can also define derivatives of L

- ∂0(L) = (Q: q-> (C⊗T(q)| r-> (D| t-> ∂0(L,q,r,t) :G(q)⊗V) :) :)
so it accepts the same three inputs as L but produces ∂0(L,q,r,t) in G(q)⊗V rather than L(q,r,t) in V. If some trajectory (R:x:Q) has x(y) = q, then x'(y) is a tangent to Q at q, so x'(y) is in T(q). Then (R: z-> L(x(z),r,t) :V) has derivative x'(y)·∂0(L,q,r,t) at y: ∂0(L,q,r,t) is the appropriate linear operator accepting small changes in q and returning the consequent small changes in L, with r and t fixed.
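To make this reading of ∂0(L,q,r,t) concrete, here is a minimal sketch assuming the Lagrange case (Q, C, E and V all the real line) and a hypothetical Lagrangian L(q,r,t) = r²/2 − cos(t)·q²/2. Along a hypothetical trajectory x with x(y) = q, the derivative of z-> L(x(z),r,t) at y should equal x'(y)·∂0(L,q,r,t), with r and t held fixed.

```python
import math

def L(q: float, r: float, t: float) -> float:
    # Hypothetical time-dependent Lagrangian (Lagrange case: everything real).
    return 0.5 * r * r - 0.5 * math.cos(t) * q * q

def d0L(q: float, r: float, t: float) -> float:
    # Analytic ∂0(L,q,r,t) for the L above.
    return -math.cos(t) * q

def x(z: float) -> float:
    # A hypothetical trajectory (R:x:Q).
    return math.sin(z) + z * z

def dx(z: float) -> float:
    # Its derivative x'(z).
    return math.cos(z) + 2 * z

y, r, t, h = 0.7, 1.3, 0.4, 1e-6
lhs = (L(x(y + h), r, t) - L(x(y - h), r, t)) / (2 * h)   # d/dz L(x(z), r, t) at y
rhs = dx(y) * d0L(x(y), r, t)                              # x'(y)·∂0(L, x(y), r, t)
print(lhs, rhs)  # equal up to finite-difference error
```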

The other derivatives of L likewise accept the same three inputs as L, with

- ∂1(L,q,r,t)
in dual(C⊗T(q))⊗V, a typical change in r being a member of C⊗T(q), and

- ∂2(L,q,r,t)
in C⊗V, since C is dual(E) and E subsumes D.

We now look at how small changes in (D|q:L) alter the value of S(q), with particular attention to when the value of (∂D:q:) is kept fixed. To this end, consider a function (R: f :{(D|:L)}) whose transpose (D| :{(R::L)}) produces only smooth trajectories {(R::Q)} as outputs, with f(0) = q and (|f) containing some open neighbourhood of 0. Let v = (D| t-> (R: r-> f(r,t) :L)'(0) :T(q(t))), so v(t) = ∂0(f,0,t) is the tangent of the trajectory (R: transpose(f,t) :L) as it passes through q(t) = f(0,t) = transpose(f,t,0). [So v(t).delta, for some small delta, would be the q-displacement if Q were a linear space.] If each (∂D: f(r) :) is equal to (∂D: q :), we obtain (∂D: v |{0}).

I now suppose we have a derivative of v,
(D| t-> w(t) :C⊗T(q(t))) = (D| t-> v(t) :T(q(t)))' = (D| t-> ∂0(f,0,t) :)' = ∂1(∂0(f),0) = ∂0(∂1(f),0) = (R| r-> ∂1(f,r) :)'(0) = (R| r-> (D| t-> f(r,t) :L)' :)'(0) = (R| r-> f(r)' :)'(0).

Write p = (D| t-> ∂1(L,q(t),q'(t),t) :dual(C⊗T(q(t)))⊗V) and likewise suppose we have p' at our disposal, permuting its tensor factors to make p'(t) be in C⊗G(q(t))⊗E⊗V. Note that p(t) is a linear map from D-tangents at q(t) in Q to V, so q'(t)·p(t) is in V. Apply Stokes' theorem as

m(D| t-> w(t)·∂1(L,q(t),q'(t),t) :V) = M(∂D| t-> v(t)·∂1(L,q(t),q'(t),t) :) −m(D| t-> v(t)·p'(t) :).

With v zero on ∂D, the M() term vanishes and we obtain

(S∘f)'(0) = (R: r-> S(f(r)) :W⊗V)'(0)
= (R: r-> m(D| t-> L(f(r,t), f(r)'(t), t) :V) :)'(0)
= m(D| t-> v(t)·∂0(L,q(t),q'(t),t) +w(t)·∂1(L,q(t),q'(t),t) :W⊗V)
= m(D| t-> v(t)·(∂0(L,q(t),q'(t),t) −p'(t)) :W⊗V).

Since v may be any tangent field vanishing on ∂D, (S∘f)'(0) = 0 for every such variation f precisely if p'(t) = ∂0(L,q(t),q'(t),t) throughout D.
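The conclusion can be checked numerically in the simplest Lagrange setting. This sketch assumes D = [0, π/2], E = C = V = W = ℝ, m is ordinary integration (approximated by the trapezoidal rule), and takes the hypothetical L(q,r,t) = r²/2 − q²/2, for which p = q' and Lagrange's equation p' = ∂0(L) reads q'' = −q. It perturbs a trajectory by ε·v with v(t) = sin(2t), which vanishes on ∂D, and estimates (S∘f)'(0) by a central difference: it should be (numerically) zero for the solution q = sin(t), but not for the straight line with the same end-points.

```python
import numpy as np

def action(q: np.ndarray, t: np.ndarray) -> float:
    """Trapezoidal estimate of S(q), the integral of L = q'^2/2 - q^2/2 over D."""
    dq = np.gradient(q, t, edge_order=2)   # q'(t) by finite differences
    lag = 0.5 * dq**2 - 0.5 * q**2
    return float(np.sum(0.5 * (lag[1:] + lag[:-1]) * np.diff(t)))

def variation(q: np.ndarray, v: np.ndarray, t: np.ndarray, eps: float = 1e-4) -> float:
    """Central-difference estimate of (S∘f)'(0) for f(r) = q + r·v."""
    return (action(q + eps * v, t) - action(q - eps * v, t)) / (2 * eps)

t = np.linspace(0.0, np.pi / 2, 2001)
v = np.sin(2 * t)                 # perturbation vanishing on ∂D = {0, π/2}

q_solution = np.sin(t)            # satisfies q'' = -q, endpoints 0 and 1
q_line = t / (np.pi / 2)          # same endpoints, but q'' = 0, not -q

print(variation(q_solution, v, t))  # approximately 0
print(variation(q_line, v, t))      # approximately -0.5
```

For the straight line the exact first variation is −∫ q·v dt = −(2/π)·(π/4) = −1/2, since the q'·v' term integrates to zero when v vanishes at both ends.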

That gives us Lagrange's equations. The importance of the Lagrangian approach rests on the fact that there are plenty of real systems which are well-modelled by a description in terms of an action on a trajectory – or, equivalently, by differential equations having the form of Lagrange's (or, as we'll see, Hamilton's) equations.

It is of note that a model in terms of an integral along a trajectory being stationary is about as simple a general specification of a curve as one can construct on a smooth manifold: and the above construction actually works just fine with D more interesting than just a trajectory. The integral could be of a simpler function (L could depend only on q, for instance), but the result would tell us static truths about the sub-manifold (D:q|) of Q rather than dynamic truths about the trajectory q itself. The minimal extra complexities one can add to L are then dependence on when, in your trajectory, you pass through q and on the velocity at which the trajectory does so: these are just the complexities introduced for the Lagrangian.

For any t in D, q(t) is a position on the manifold Q: p(t) = ∂1(L,q(t),q'(t),t) is in G(q(t))⊗E⊗V or, strictly, dual(C⊗T(q(t)))⊗V. For any point m in Q, the (D|:Q) trajectories, q, through m can have any member of C⊗T(m) as their q' as they pass through m. If V is 1-dimensional (and possibly under some other circumstances) we may find (C⊗T(m)| r-> (D| t-> ∂1(L,m,r,t) :G(m)⊗E⊗V) :) invertible in the sense of providing a mapping, which I'll call tod(m), (G(m)⊗E⊗V| p-> (D| ∂1(L,m,r,t) = p, t->r :C⊗T(m)) :) at each point, m, of Q. This defines a mapping (Q|tod:), of course, which lets us define H(m,p) = tod(m,p)·p −L(m,tod(m,p)) or, more strictly, H(m,p) = (D| t-> tod(m,p,t)·p −L(m,tod(m,p,t),t) :V). This makes H = (Q: m-> (G(m)⊗E⊗V| p-> (D| t-> H(m,p,t) :V) :) :).
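The construction of tod and H is a Legendre transform. A minimal sketch, assuming the Lagrange case (everything real, no explicit t-dependence) and that r-> ∂1(L,m,r) is strictly increasing, so it can be inverted by bisection; `d1L`, `tod` and `hamiltonian` are illustrative names and the sample L = r² − q²/2 is hypothetical. For that L, ∂1(L) = 2r, so tod(m,p) = p/2 and H(m,p) = p²/4 + m²/2.

```python
def L(q: float, r: float) -> float:
    # Hypothetical Lagrangian: "mass" 2, unit spring constant.
    return r * r - 0.5 * q * q

def d1L(q: float, r: float, h: float = 1e-6) -> float:
    """Central-difference estimate of ∂1(L,q,r)."""
    return (L(q, r + h) - L(q, r - h)) / (2 * h)

def tod(q: float, p: float, lo: float = -1e3, hi: float = 1e3) -> float:
    """Solve ∂1(L,q,r) = p for r by bisection (assumes ∂1(L,q,·) is increasing)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if d1L(q, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def hamiltonian(q: float, p: float) -> float:
    r = tod(q, p)
    return r * p - L(q, r)   # H(m,p) = tod(m,p)·p − L(m, tod(m,p))

print(hamiltonian(1.5, 3.0))  # about 3.0**2 / 4 + 1.5**2 / 2 = 3.375
```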

So Hamilton's formalism introduces a function H which wants inputs m in Q followed by p in G(m)⊗E⊗V, which is just the kind of thing that arises as the gradient, at m, of a function (Q| :E⊗V), with E being the tangent bundle of D. With H(q,p) = q'·p −L(q,q') and p = ∂1(L,q,q'), we obtain ∂0(H,q,p) = −∂0(L,q,q') = −p' and ∂1(H,q,p) = q'. S(q) = m(D| t-> q'(t)·∂1(L,q(t),q'(t),t) −L(q(t),q'(t),t) :V) is stationary under variation in q precisely if

(Q| q-> (dual(C⊗T(q))⊗V| p-> u(q,p) :U(q))

Define, for each q in Q, P(q) = dual(C⊗T(q))⊗V, so that p is in P(q). Given functions u = (Q| q-> (P(q)| p-> u(q,p) :U(q)) :) and x = (Q| q-> (P(q)| p-> x(q,p) :X(q)) :), we have
∂0(u) = (Q| q-> (P(q)| p-> ∂0(u,q,p) :{linear (T(q)|:U(q))}) :),
∂1(u) = (Q| q-> (P(q)| p-> ∂1(u,q,p) :{linear (P(q)|:U(q))}) :)
and similar for x. We can express {linear (P(q)|:U(q))} as U(q)⊗dual(P(q)) = U(q)⊗dual(V)⊗C⊗T(q) and contract any linear (T(q)| h :U(q)) with any linear (P(q)| f :X(q)) to get τ([3,*,2,1,0,*], [U,G,X,dual(V),C,T], h×f), a member of U⊗X⊗dual(V)⊗C, i.e. of {linear (E⊗V| :U⊗X)}. Doing this with h = ∂0(u,q,p) and f = ∂1(x,q,p) gives τ[3,*,2,1,0,*](∂0(u)⊗∂1(x)) as a mapping (Q| q-> (P(q)| :{linear (E⊗V| :U(q)⊗X(q))}) :), which may be read (by shuffling the order in which it accepts its arguments, exploiting the fact that E⊗V doesn't depend on q) as a (Q| q-> :{linear (E⊗V| :{(P(q)| :U(q)⊗X(q))})}), hence a section of {linear (E⊗V| :{(P| :U⊗X)})}. Note that the mappings (P| :U⊗X) aren't constrained to be linear. Likewise, τ([3,1,0,*,2,*], [U,dual(V),C,T,X,G]) contracts any linear (P|:U)×(T|:X) to give a U⊗X⊗dual(V)⊗C; hence we obtain τ[3,1,0,*,2,*](∂1(u)×∂0(x)) as a section of {linear (E⊗V| :{(P| :U⊗X)})}. Likewise, we can contract (T(q)| h :U(q)) with linear (G(q)| x×t :X(q)) to get x×h(t) in U(q)⊗X(q). Consequently, we have well-defined operators
[u,x]-> ∂0(u)·∂1(x), i.e. τ([1,*,0,*], [U,G,X,T], ∂0(u)×∂1(x)), and
[u,x]-> ∂1(u)·∂0(x), i.e. τ([1,*,0,*], [U,T,X,G], ∂1(u)×∂0(x)),
both in U⊗X, so we can take their difference and obtain ∂0(u)·∂1(x) −∂1(u)·∂0(x), which I'll write pb(u,x).
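In the simplest case (Q = ℝ, E = V = ℝ, scalar-valued u and x) the operator above is the familiar Poisson bracket, ∂0(u)·∂1(x) −∂1(u)·∂0(x). A minimal sketch, assuming one degree of freedom and central-difference derivatives; `pb` here is an illustrative helper that takes two functions of (q,p) and returns a function of (q,p). Bracketing q and p with a Hamiltonian recovers Hamilton's equations.

```python
from typing import Callable

Fn = Callable[[float, float], float]
H_STEP = 1e-5   # finite-difference step (an arbitrary choice)

def d0(u: Fn, q: float, p: float) -> float:
    return (u(q + H_STEP, p) - u(q - H_STEP, p)) / (2 * H_STEP)

def d1(u: Fn, q: float, p: float) -> float:
    return (u(q, p + H_STEP) - u(q, p - H_STEP)) / (2 * H_STEP)

def pb(u: Fn, x: Fn) -> Fn:
    """pb(u,x) = ∂0(u)·∂1(x) − ∂1(u)·∂0(x), as a function of (q,p)."""
    return lambda q, p: d0(u, q, p) * d1(x, q, p) - d1(u, q, p) * d0(x, q, p)

position: Fn = lambda q, p: q
momentum: Fn = lambda q, p: p
energy: Fn = lambda q, p: 0.5 * (p * p + q * q)

print(pb(position, momentum)(0.3, -1.2))   # about 1: the canonical bracket
print(pb(position, energy)(0.3, -1.2))     # about -1.2 = p, i.e. q' = ∂1(H)
print(pb(momentum, energy)(0.3, -1.2))     # about -0.3 = -q, i.e. p' = -∂0(H)
```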

Now consider k = (Q| q-> (P(q)| p-> k(q,p) :{scalars}) :) and examine what we get by applying the above operator to [u, k.x] and [u.k, x]. We have ∂0(k.x) = ∂0(k)×x +k.∂0(x), and similarly for ∂1, so
[u, k.x]-> ∂0(u)·∂1(k)×x +∂0(u).k·∂1(x) −∂1(u)·∂0(k)×x −∂1(u).k·∂0(x),
[u.k, x]-> u×∂0(k)·∂1(x) +∂0(u).k·∂1(x) −u×∂1(k)·∂0(x) −∂1(u).k·∂0(x),
differing by
∂0(u)·∂1(k)×x +u×∂1(k)·∂0(x) −∂1(u)·∂0(k)×x −u×∂0(k)·∂1(x),
which is
(∂0(u)·∂1(k) −∂1(u)·∂0(k))×x +u×(∂1(k)·∂0(x) −∂0(k)·∂1(x)),
or pb(u,k)×x −u×pb(k,x).
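The identity just derived can be spot-checked numerically in the scalar case, where every τ permutation is trivial. This sketch repeats a central-difference bracket so that it stands alone; the sample functions u, k, x, the helper `times` and the evaluation point are arbitrary, hypothetical choices.

```python
import math
from typing import Callable

Fn = Callable[[float, float], float]
H_STEP = 1e-5

def d0(f: Fn, q: float, p: float) -> float:
    return (f(q + H_STEP, p) - f(q - H_STEP, p)) / (2 * H_STEP)

def d1(f: Fn, q: float, p: float) -> float:
    return (f(q, p + H_STEP) - f(q, p - H_STEP)) / (2 * H_STEP)

def pb(f: Fn, g: Fn) -> Fn:
    return lambda q, p: d0(f, q, p) * d1(g, q, p) - d1(f, q, p) * d0(g, q, p)

def times(f: Fn, g: Fn) -> Fn:
    # Pointwise product of two functions of (q,p).
    return lambda q, p: f(q, p) * g(q, p)

u: Fn = lambda q, p: math.sin(q) + p * p
k: Fn = lambda q, p: math.exp(0.3 * q) * p
x: Fn = lambda q, p: q * p + math.cos(p)

q0, p0 = 0.4, -0.7
lhs = pb(u, times(k, x))(q0, p0) - pb(times(u, k), x)(q0, p0)
rhs = pb(u, k)(q0, p0) * x(q0, p0) - u(q0, p0) * pb(k, x)(q0, p0)
print(lhs, rhs)  # agree up to finite-difference error
```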

Indeed,

- pb(u, x×w)
  - = ∂0(u)·∂1(x×w) −∂1(u)·∂0(x×w)
  - = ∂0(u)·∂1(x)×w +τ([*,2,1,*,0], [G,U,X,T,W], ∂0(u)×x×∂1(w)) −∂1(u)·∂0(x)×w −τ([*,2,1,*,0], [T,U,X,G,W], ∂1(u)×x×∂0(w))
  - = pb(u,x)×w +τ([2,0,1], [U,W,X], pb(u,w)×x)
- pb(u×x, w)
  - = ∂0(u×x)·∂1(w) −∂1(u×x)·∂0(w)
  - = τ([*,2,1,*,0], [G,U,X,T,W], ∂0(u)×x×∂1(w)) +u×(∂0(x)·∂1(w)) −τ([*,2,1,*,0], [T,U,X,G,W], ∂1(u)×x×∂0(w)) −u×(∂1(x)·∂0(w))
  - = τ([2,0,1], [U,W,X], pb(u,w)×x) +u×pb(x,w)
- pb(u, x×w) −pb(u×x, w)
  - = pb(u,x)×w −u×pb(x,w)
- pb(t×u, x×w)
  - = pb(t×u, x)×w +τ([3,2,0,1], [T,U,W,X], pb(t×u,w)×x)
  - = t×pb(u,x)×w +τ([3,1,2,0], [T,X,U,W], pb(t,x)×u×w) +τ([3,2,0,1], [T,U,W,X], t×pb(u,w)×x +τ([3,1,2,0], [T,W,U,X], pb(t,w)×u×x))
  - = t×pb(u,x)×w +τ[2,3,1,0](u×pb(t,x)×w) +τ[3,2,0,1](t×pb(u,w)×x) +τ[2,3,0,1](u×pb(t,w)×x)
  - = t×pb(u, x×w) +τ([2,3,1,0], [U,T,X,W], u×pb(t, x×w))
  - = t×(pb(u,x)×w +τ([2,0,1], [U,W,X], pb(u,w)×x)) +τ([2,3,1,0], [U,T,X,W], u×(pb(t,x)×w +τ([2,0,1], [T,W,X], pb(t,w)×x)))
  - = t×pb(u,x)×w +τ[3,2,0,1](t×pb(u,w)×x) +τ[2,3,1,0](u×pb(t,x)×w) +τ[2,3,0,1](u×pb(t,w)×x)
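In the scalar case the four terms of the final line can be compared directly with pb(t×u, x×w). A minimal numerical spot-check, assuming one degree of freedom and the same central-difference bracket as a stand-in for the exact derivatives; the sample functions t, u, x, w, the helper `times` and the evaluation point are arbitrary, hypothetical choices.

```python
import math
from typing import Callable

Fn = Callable[[float, float], float]
H_STEP = 1e-5

def d0(f: Fn, q: float, p: float) -> float:
    return (f(q + H_STEP, p) - f(q - H_STEP, p)) / (2 * H_STEP)

def d1(f: Fn, q: float, p: float) -> float:
    return (f(q, p + H_STEP) - f(q, p - H_STEP)) / (2 * H_STEP)

def pb(f: Fn, g: Fn) -> Fn:
    return lambda q, p: d0(f, q, p) * d1(g, q, p) - d1(f, q, p) * d0(g, q, p)

def times(f: Fn, g: Fn) -> Fn:
    return lambda q, p: f(q, p) * g(q, p)

t: Fn = lambda q, p: 1.0 + q * q
u: Fn = lambda q, p: math.sin(p)
x: Fn = lambda q, p: q - p
w: Fn = lambda q, p: math.exp(0.2 * q * p)

q0, p0 = 0.5, 1.1
lhs = pb(times(t, u), times(x, w))(q0, p0)
# Scalar form of t×pb(u,x)×w + t×pb(u,w)×x + u×pb(t,x)×w + u×pb(t,w)×x
rhs = (t(q0, p0) * pb(u, x)(q0, p0) * w(q0, p0)
       + t(q0, p0) * pb(u, w)(q0, p0) * x(q0, p0)
       + u(q0, p0) * pb(t, x)(q0, p0) * w(q0, p0)
       + u(q0, p0) * pb(t, w)(q0, p0) * x(q0, p0))
print(lhs, rhs)  # agree up to finite-difference error
```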