# The Lagrangian and Hamiltonian Formalisms

This page uses an antiquated notation and is in need of maintenance.

There's a standard framework for physics that dates back a good two centuries and is still a vital part of modern theories. It consists of Lagrange's equations of motion for a system and Hamilton's principle. Dirac, in discussing quantum mechanics, quietly points out the parallels between these and quantum mechanics. The differences all spring from the former's assumption, discarded in the latter, that multiplication is always commutative: that is, a×b = b×a.

In both formalisms, a dynamical system is described in terms of two families of parameters, stereotypically position and momentum. The formalism describes the system in terms of a function, H, which depends on these parameters: from the way it depends on each parameter, we read the dynamics of the system.

Crucially, there are as many parameters of each kind as of the other: indeed, there is a definite one-to-one association between the families, which we can formalize using a label-set, which I'll call dim for dimension, and introducing functions (dim| q :) and (dim| p :) for which one family is {q(i): i in dim}, the other is {p(i): i in dim} and, for each label, i, in dim, q(i) and p(i) are associated with one another. In particular, there is some multiplicative way of combining q(i) with p(i) to get a scalar (or, at least, some quantity of a kind which doesn't depend on i) and it makes sense to examine the sum over all i in dim of such products. It is usual to take dim to be a natural number.

In principle, each i in dim has its own space in which q(i) varies: the matching p(i) varies in a corresponding space, but for any other j in dim, q(j) could vary in some other space entirely. In principle, thus, we have mappings (dim| Q :{parameter spaces}) and (dim| P :{parameter spaces}) with the parameter q(i) varying over space Q(i) and p(i) varying over P(i). What matters is that, for each i, we have a linear contraction which combines a tangent to Q(i) with a tangent to P(i) to produce a quantity of some kind which doesn't depend on i. (Note that a tangent to a linear space is functionally a member of that linear space: the difference is only relevant for smooth spaces.) In practice, however, it suffices to treat Q as if it were some constant space, whose tangent bundle I'll call T, and P likewise as constant with tangent bundle G, with the linear contraction expressed, for t in T and g in G, as t*g. This is classically taken to be a scalar but I'll hang fire on that: it suffices to know that we can add such results and scale them.

Although T and G are the tangent bundles of the parameter spaces, I'll quietly elide the machinery of that while establishing the basics, and pretend they are the parameter spaces. Forget the last paragraph's use of the names Q and P: I'll re-use those names, below, for the collections, {(dim| :T)} = Q and {(dim| :G)} = P, of all possible values for q and p respectively. We can then deal with q and p as members of these two linear spaces, with a contraction, constructed on that between T and G, that may sensibly be written p*q = sum(dim| i-> p(i)*q(i) :B). I'll call the space of outputs of * B, so p*q and the earlier t*g are in B, which is a linear space, usually presumed one-dimensional.

The Hamiltonian is a function which takes a (dim| q :T) and a (dim| p :G) and returns a scalar quantity: H = (Q| q-> (P| p-> H(q,p) :S) :), with S = {scalars} or possibly some suitable (algebraic, and I suspect it'll have to be ordered or even positive) replacement. B might be S, or its dual. We have differential operators, I'll call them D0 and D1, with D0(H, q, p) being a linear map from Q to S which digests a perturbation in q and returns the change in H which would result (so this would be ∂H/∂q with ∂ the curved d denoting partial differentiation) and D1(H, q, p) is linear (P|:S) and describes how H(q, p) depends on p (locally). I'll use · to denote the natural linear contraction induced by trace, which corresponds directly to the general action and composition of mappings: this is what takes a small change in q and D0(H,q,p) and combines them to yield a small change in H(q,p), or equally gets similar from p and D1.
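To make D0 and D1 concrete, here is a minimal Python sketch that estimates them by central differences. It assumes dim is a natural number, T and G are both the reals and S = {scalars}; the sample `hamiltonian` (one harmonic oscillator per label) and the names `d0`, `d1`, `contract` and `step` are illustrative choices, not part of the formalism above.

```python
def hamiltonian(q, p, mass=1.0, k=1.0):
    """Sample H (an assumption): one harmonic oscillator per label i in dim."""
    return sum(pi * pi / (2 * mass) + k * qi * qi / 2 for qi, pi in zip(q, p))


def d0(h_func, q, p, step=1e-6):
    """Central-difference estimate of D0(H, q, p), as a list of coefficients."""
    out = []
    for i in range(len(q)):
        hi, lo = list(q), list(q)
        hi[i] += step
        lo[i] -= step
        out.append((h_func(hi, p) - h_func(lo, p)) / (2 * step))
    return out


def d1(h_func, q, p, step=1e-6):
    """Central-difference estimate of D1(H, q, p), as a list of coefficients."""
    out = []
    for i in range(len(p)):
        hi, lo = list(p), list(p)
        hi[i] += step
        lo[i] -= step
        out.append((h_func(q, hi) - h_func(q, lo)) / (2 * step))
    return out


def contract(a, b):
    """The contraction written as a dot in the text: sum of products over dim."""
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2]
p = [0.5, 2.0]
dq = [1e-4, -2e-4]  # a small perturbation in q

# D0 digests the perturbation and returns the (first-order) change in H.
predicted = contract(d0(hamiltonian, q, p), dq)
actual = hamiltonian([qi + di for qi, di in zip(q, dq)], p) - hamiltonian(q, p)
print(predicted, actual)
```

The two printed numbers should agree up to a term of second order in dq, which is what it means for D0(H, q, p) to be the linear map describing how H depends on q locally.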

The system at any given moment is notionally described by a q in Q and a p in P, and these vary with time: denote their time-derivatives q' and p' (which will be linear maps from time-tangents to tangents in Q and P respectively). Let TT = {time-tangents}, which is one-dimensional and has a characteristic forward direction. Hamilton's rearrangement of Lagrange's formalism then gives q' = D1(H,q,p) and p' = −D0(H,q,p). These are linear (TT| :Q) and (TT| :P), respectively, so {linear (TT| :Q)} is {linear (P|:S)} and {linear (TT| :P)} is {linear (Q|:S)}, give or take some natural isomorphisms. We then have

• D1(H,q,p)·p' which is linear (TT| :S)
• q'·D0(H,q,p) which is the transpose of a linear (TT| :S)

and these are both q'·p', subject to some natural isomorphisms, which might plausibly include the transposition indicated. But it doesn't: D1(H)·p' binds D to p and leaves H to the left of the ', while q'·D0(H) binds D to q and leaves the ' to the left of H. So S⊗dual(TT) = dual(TT)⊗S; this holds, e.g., if S = dual(TT). But equally, S = ⊗[U,…,U] and dual(TT) = ⊗[U,…,U] will do, with the list lengths not necessarily equal. Now dual(TT) = {time-gradients}, but both S and TT are 1-dimensional, so S is functionally dual(TT), and vice versa; dual(TT)⊗S is then S⊗dual(TT). That reads the transpose of a linear (TT| :S) as a linear (S| :TT). We get equality between an (S| q'·D0(H) :TT) and a (TT| D1(H)·p' :S). Can we infer that S = TT is self-dual?

Given a linear space U, for any S = ⊗[U,…,U] and R = ⊗[U,…,U], regardless of the lengths of the lists, we can deal with {linear (dual(S)|:R)} as {linear (dual(R)| :S)} via a natural isomorphism induced by their respective descriptions as ⊗[U,…,U] with a list whose length is the sum of the lengths of the lists for S and R. This involves no re-ordering, only re-partitioning of a list of length n+m as a list of length m+n, as [a,…,g,h,…,z] = [a,…,m,n,…,z] with length([a,…,g]) = length([n,…,z]) and length([h,…,z]) = length([a,…,m]).

When U is 1-dimensional, however, U⊗U or, indeed, ⊗[U,…,U] for an arbitrarily long list, is also 1-dimensional. For more general V⊗W, we only have that a typical member has the form v×w = (dual(W)| a-> v.a(w) :V) while the general member of V⊗W is a sum of typical members; a in dual(W) means a(w) is a scalar, for w in W. However, for 1-dimensional V⊗W, we immediately know V and W are each individually also 1-dimensional. Pick any non-zero v in V, w in W; then v×w is a typical member of V⊗W and is non-zero. But V⊗W is 1-dimensional, so any general member is simply a scalar multiple of it; take that scaling and apply it to v (say) and you've now got a typical member equal to the given general member. So all members are typical. This makes tensor products with 1-dimensional spaces more straightforward (indeed, only one of V, W needs to be 1-dimensional for this to work; though all but at most one of a list must be 1-dimensional for the list's ⊗ to be so easy).

Choose any non-zero u in U (choose it positive if U has a natural sense of sign). This gives a basis of U, [u] = (1| 0-> u :U). For any v in U, we now have a scalar V for which v = V.u, which induces a linear map (U|:{scalars}) given by v-> V. This depends on u, of course; it is the inverse of u. Call it n. It's in dual(U) and it's non-zero (it maps u to 1, for a start), so [n] is a basis of dual(U). Equipped with u, we have non-zero members of S and R, namely ×[u,…,u] of the appropriate lengths: S and R are likewise 1-dimensional, so these provide bases of S and R. Let s in S and r in R be these basis members. We have s×r = r×s from the associativity of ×, without any shuffling of order. We have, equally, members of dual(S) and dual(R) which, respectively, map s and r to 1.

{linear (dual(S)| :R)} is R⊗S and {linear (dual(R)| :S)} is S⊗R; these are the same ⊗[U,…,U] space, via associativity (which only rearranges bracketing, not the order of terms, (a*b)*c to a*(b*c) and similar). A linear (S| f :dual(R)) is characterized by f(s).r, which is a scalar: but it does depend on our choice of u, i.e. our basis of U. The associated linear (R| g :dual(S)) is characterized by having g(r).s = f(s).r, because f and g are this scalar times s×r = r×s. Since r and s are members of our 1-dimensional space

• q'*p' = (TT| t-> (TT| e-> q'(t)*p'(e) :S) :{linear (TT| :S)})
• D1(H,q,p) and D0(H,q,p)

If S and TT are both {scalars}, this just says that Q and P are mutually dual: in any case it defines a linear contraction between {linear (TT| :Q)} and {linear (TT| :P)} yielding answers in S, just as * yields answers in B.

So consider the contraction of q' = (TT| t-> q'(t) :Q) with (TT| p' :P), yielding q'*p' = (TT| t-> q'(t)*p'(t) :B) in linear (TT| :B). Our other contraction, yielding … In any case, I expect S to be one-dimensional: it may even be (or involve) the dual of {time-tangents}.
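Returning to Hamilton's equations from earlier in this section, q' = D1(H,q,p) and p' = −D0(H,q,p), the following Python sketch integrates them for a single harmonic oscillator. Everything specific here is an assumption made for illustration: dim = 1, scalars for S and TT, the sample H = p.p/2 + q.q/2 (unit mass and spring constant), and the leapfrog stepping scheme, which is just one convenient way to discretise the equations.

```python
import math


def hamiltonian(q, p):
    """Sample H (an assumption): unit-mass, unit-stiffness oscillator."""
    return p * p / 2 + q * q / 2


def d0_h(q, p):
    """D0(H, q, p) for the sample H, worked out by hand."""
    return q


def d1_h(q, p):
    """D1(H, q, p) for the sample H, worked out by hand."""
    return p


def leapfrog(q, p, dt, steps):
    """Step q' = D1(H,q,p), p' = -D0(H,q,p) forward in time."""
    for _ in range(steps):
        p -= 0.5 * dt * d0_h(q, p)  # half step in p
        q += dt * d1_h(q, p)        # full step in q
        p -= 0.5 * dt * d0_h(q, p)  # second half step in p
    return q, p


q_start, p_start = 1.0, 0.0
dt = 0.001
steps = round(2 * math.pi / dt)  # roughly one period of the oscillator

q_end, p_end = leapfrog(q_start, p_start, dt, steps)
energy_drift = hamiltonian(q_end, p_end) - hamiltonian(q_start, p_start)
print(q_end, p_end, energy_drift)
```

For this H the exact solution from the chosen start is q = cos(t), p = −sin(t), so the printed state can be compared against those values at time steps.dt; the value of H should stay very nearly constant along the computed trajectory.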

## Older ramblings … notational mess.

Problem: there's a presumption that we have some function (Q:time:E) for which we only care about (D|q:Q) if time∘q is the identity on D.

Presume:

• a scalar domain, R,
• R-linear spaces V and W (both of which, with E and C below, are R for Lagrange)
• a compact subset, D (typically a time-interval), of an R-linear space E with dual C, with boundary ∂D,
• a measure, ({(D::R)}: m :W), on D, extending to ({(D::U)}: :W⊗U), for any linear space U, via tensor action: m also induces a measure ({(∂D::C)}: M :W) via the machinery of Stokes' theorem.
• a manifold Q, with tangent and gradient bundles (Q|q->T(q):) and (Q|q->G(q):), respectively, and
• a function L= (Q: q-> (C⊗T(q)| r-> (D| t-> L(q,r,t) :V) :) :) with (|L) presumed big enough to let us have plenty of free choice of (D|:L) – that is, (D|q:Q) with L accepting, as its first input, any output q produces in Q.

From these we construct:

{(D| :L)}

the collection of mappings from our compact measure domain to our manifold for which every output is acceptable as L's first argument. For any given boundary condition (∂D| b :Q), we can restrict to {(D|q:L): (∂D:q:) = b}: where I use {(D| :L)} below I may well presume such a restriction.

for any (D|q:Q), we have q' = (D| t-> q'(t) :C⊗T(q(t)))

this is D-tangent formation: for given (D|q:Q) and trajectory (R:s:D) which passes through t in D with tangent v in E, v·q'(t) is then in T(q(t)): it is the Q-tangent of (R: q∘s :Q) as it passes through q(t).

S = ((D|:L)| q-> m(D| t-> L(q(t), q'(t), t) :V) :W⊗V)

this is known as the (Lagrange) action induced by L on the domain D.
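As a numerical illustration of the action, here is a minimal Python sketch for the Lagrange case: D is a time interval, m is ordinary integration over it and E, C, V, W are all the reals. The sample `lagrangian` (a unit harmonic oscillator), the midpoint-rule discretisation and the names `action` and `perturbed` are assumptions made for this sketch.

```python
import math


def lagrangian(q, r, t):
    """Sample L(q, r, t) (an assumption): kinetic minus potential energy."""
    return 0.5 * r * r - 0.5 * q * q


def action(path, t0, t1, n=2000):
    """Approximate S(q) = m(D| t-> L(q(t), q'(t), t)) on D = [t0, t1].

    Uses the midpoint rule, with q' estimated from neighbouring samples.
    """
    dt = (t1 - t0) / n
    total = 0.0
    for i in range(n):
        ta = t0 + i * dt
        tb = ta + dt
        qa, qb = path(ta), path(tb)
        total += lagrangian(0.5 * (qa + qb), (qb - qa) / dt, 0.5 * (ta + tb)) * dt
    return total


def perturbed(eps):
    """sin(t) solves q'' = -q; the added bump vanishes at both ends of [0, 1]."""
    return lambda t: math.sin(t) + eps * math.sin(math.pi * t)


s_zero = action(perturbed(0.0), 0.0, 1.0)
s_plus = action(perturbed(1e-3), 0.0, 1.0)
s_minus = action(perturbed(-1e-3), 0.0, 1.0)

# First-order change in S under a variation that keeps the boundary fixed.
first_order = (s_plus - s_minus) / 2e-3
print(s_zero, first_order)
```

Because the unperturbed path satisfies the equations of motion for this L and the bump is zero on the boundary of D, the first-order change in the action should come out close to zero, while both perturbed actions sit slightly above the unperturbed one.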

We can also define derivatives of L

∂0(L)
(Q: q-> (C⊗T(q)| r-> (D| t-> ∂0(L,q,r,t) :G(q)⊗V) :) :)

so it accepts the same three inputs as L but produces ∂0(L,q,r,t) in G(q)⊗V rather than L(q,r,t) in V. If some trajectory (R:x:Q) has x(y) = q, then x'(y) is a tangent to Q at q, so x'(y) is in T(q). Then (R: z-> L(x(z),r,t) :V) has derivative x'(y)·∂0(L,q,r,t) at y: ∂0(L,q,r,t) is the appropriate linear operator accepting small changes in q and returning the consequent small changes in L, with r and t fixed.

The other derivatives of L likewise accept the same three inputs as L, with

∂1(L,q,r,t)

in dual(C⊗T(q))⊗V, a typical change in r being a member of C⊗T(q), and

∂2(L,q,r,t)

in C⊗V, since C is dual(E) and E subsumes D.

We now look at how small changes in (D|q:L) alter the value of S(q), with particular attention to when the value of (∂D:q:) is kept fixed. To this end, consider a function (R: f :{(D|:L)}) whose transpose (D| :{(R::L)}) produces only smooth trajectories {(R::Q)} as outputs, with f(0) = q and (|f) containing some open neighbourhood of 0. Let v = (D| t-> (R: r-> f(r,t) :L)'(0) :T(q(t))), so v(t) = ∂0(f,0,t) is the tangent of the trajectory (R: transpose(f,t) :L) as it passes through q(t) = f(0,t) = transpose(f,t,0). [So v(t).delta, for some small delta, would be the q-displacement if Q were a linear space.] If each (∂D: f(r) :) is equal to (∂D: q :), we obtain (∂D: v |{0}).

I now suppose we have a derivative of v,

(D| t-> w(t) :C⊗T(q(t)))
= (D| t-> v(t) :T(q(t)))'
= (D| t-> ∂0(f,0,t) :)'
= ∂1(∂0(f),0)
= ∂0(∂1(f),0)
= (R| r-> ∂1(f,r) :)'(0)
= (R| r-> (D| t-> f(r,t) :L)' :)'(0)
= (R| r-> f(r)' :)'(0)

Write p = (D| t-> ∂1(L,q(t),q'(t),t) :dual(C⊗T(q(t)))⊗V) and likewise suppose we have p' at our disposal, permuting its tensor factors to make p'(t) be in C⊗G(q(t))⊗E⊗V. Note that p(t) is a linear map from D-tangents at q(t) in Q to V: so q'(t)·p(t) is in V. Apply Stokes' theorem as

m(D| t-> w(t)·∂1(L,q(t),q'(t),t) :V)
= M(∂D| t-> v(t)·∂1(L,q(t),q'(t),t) :) −m(D| t-> v(t)·p'(t) :)

With v zero on ∂D, the M() term vanishes and we obtain

(S∘f)'(0)
= (R: r-> S(f(r)) :W⊗V)'(0)
= (R: r-> m(D| t-> L(f(r,t), f(r)'(t), t) :V) :)'(0)
= m(D| t-> v(t)·∂0(L,q(t),q'(t),t) +w(t)·∂1(L,q(t),q'(t),t) :W⊗V)
= m(D| t-> v(t)·(∂0(L,q(t),q'(t),t) −p'(t)) :W⊗V)

Since v(t) is arbitrary, we obtain: (S∘f)'(0) = 0 precisely if p'(t) = ∂0(L,q(t),q'(t),t).
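For comparison, here is the same variation in conventional notation. It is a sketch under simplifying assumptions: the Lagrange case where D is a time interval and E, C, V, W are all the reals, with Q treated as linear so that the varied path can be written q + εv. The symbols t_0, t_1, ε and the dot for time-derivative are introduced only for this sketch.

```latex
\begin{align*}
\left.\frac{d}{d\epsilon} S(q + \epsilon v)\right|_{\epsilon = 0}
  &= \int_{t_0}^{t_1} \left( \frac{\partial L}{\partial q}\, v
     + \frac{\partial L}{\partial \dot q}\, \dot v \right) dt \\
  &= \left[ \frac{\partial L}{\partial \dot q}\, v \right]_{t_0}^{t_1}
     + \int_{t_0}^{t_1} \left( \frac{\partial L}{\partial q}
     - \frac{d}{dt} \frac{\partial L}{\partial \dot q} \right) v \, dt
\end{align*}
```

Here the bracketed boundary term plays the role of the M() term, which vanishes when v is zero at both ends of the interval; the time-derivative of v is w and ∂L/∂q̇ is p, so the remaining integrand vanishes for arbitrary v exactly when p' = ∂0(L,q,q',t).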

That gives us Lagrange's equations. The importance of the Lagrangian approach rests on the fact that there are plenty of real systems which are well-modelled by a description in terms of an action on a trajectory – or, equivalently, by differential equations having the form of Lagrange's (or, as we'll see, Hamilton's) equations.

It is of note that a model in terms of an integral along a trajectory being stationary is about as simple a general specification of a curve as one can construct on a smooth manifold: and the above construction actually works just fine with D more interesting than just a trajectory. The integral could be of a simpler function – L could depend only on q, for instance: but the result would tell us static truths about the sub-manifold (D:q|) of Q rather than dynamic truths about the trajectory q itself. The minimal extra complexities one can add to L are then dependence on when, in your trajectory, you pass through q and on the velocity at which the trajectory does so: these are just the complexities introduced for the Lagrangian.

For any t in D, q(t) is a position on the manifold Q: p(t) = ∂1(L,q(t),q'(t),t) is in G(q(t))⊗E⊗V or, strictly, dual(C⊗T(q(t)))⊗V. For any point m in Q, the (D|:Q) trajectories, q, through m can have any member of C⊗T(m) as their q' as they pass through m. If V is 1-dimensional (and possibly under some other circumstances) we may find (C⊗T(m)| r-> (D| t-> ∂1(L,m,r,t) :G(m)⊗E⊗V) :) invertible in the sense of providing a mapping, which I'll call tod(m), (G(m)⊗E⊗V| p-> (D| ∂1(L,m,r,t) = p, t->r :C⊗T(m)) :) at each point, m, of Q. This defines a mapping (Q|tod:), of course, which lets us define H(m,p) = tod(m,p)·p −L(m,tod(m,p)) or, more strictly, H(m,p) = (D| t-> tod(m,p,t)·p −L(m,tod(m,p,t),t) :V). This makes H = (Q: m-> (G(m)⊗E⊗V| p-> (D| t-> H(m,p,t) :V) :) :).
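A small Python sketch may help fix ideas about tod and the construction of H from L. It assumes the simplest case, with Q, E and V all the reals and L independent of t; the sample `lagrangian`, with its `mass` and `k` parameters, is an illustrative choice, for which the map r-> ∂1(L,m,r) can be inverted by hand.

```python
def lagrangian(q, r, mass=2.0, k=3.0):
    """Sample L(q, r) (an assumption): kinetic minus potential energy."""
    return 0.5 * mass * r * r - 0.5 * k * q * q


def momentum(q, r, mass=2.0):
    """p = ∂1(L, q, r): how L responds to changes in the velocity r."""
    return mass * r


def tod(q, p, mass=2.0):
    """Inverse of r-> ∂1(L, q, r) at fixed q: the velocity whose momentum is p."""
    return p / mass


def hamiltonian(q, p):
    """H(q, p) = tod(q, p)·p − L(q, tod(q, p))."""
    r = tod(q, p)
    return r * p - lagrangian(q, r)


q, p = 0.7, 1.5
h_value = hamiltonian(q, p)

# ∂1(H, q, p) should recover the velocity, i.e. tod(q, p).
step = 1e-6
d1_h = (hamiltonian(q, p + step) - hamiltonian(q, p - step)) / (2 * step)
print(h_value, d1_h, tod(q, p))
```

For this sample L the result is the familiar kinetic plus potential energy, p.p/(2.mass) + k.q.q/2, and differentiating H with respect to p gives back the velocity, as Hamilton's equations require.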

So Hamilton's formalism introduces a function H which wants inputs m in Q followed by p in G(m)⊗E⊗V: which is just the kind of thing that arises as the gradient, at m, of a function (Q| :E⊗V), with E being the tangent bundle of D. Write H(q,p) = q'·p −L(q,q'), with p = ∂1(L,q,q'), and obtain ∂0(H,q,p) = −∂0(L,q,q') = −p' and ∂1(H,q,p) = q'. S(q) = m(D| t-> q'(t)·∂1(L,q(t),q'(t),t) −L(q(t),q'(t),t) :V) stationary under variation in q precisely if

## The Poisson Bracket

(Q| q-> (dual(C⊗T(q))⊗V| p-> u(q,p) :U(q)) :)

Define, for each q in Q, P(q) = dual(C⊗T(q))⊗V: we have p in P. Given functions u = (Q| q-> (P(q)| p-> u(q,p) :U(q)) :) and x = (Q| q-> (P(q)| p-> x(q,p) :X(q)) :) we have

• ∂0(u) = (Q| q-> (P(q)| p-> ∂0(u,q,p) :{linear (T(q)|:U(q))}) :)
• ∂1(u) = (Q| q-> (P(q)| p-> ∂1(u,q,p) :{linear (P(q)|:U(q))}) :)

and similar for x. We can express {linear (P(q)|:U(q))} as U(q)⊗dual(P(q)) = U(q)⊗dual(V)⊗C⊗T(q) and contract any linear (T(q)| h :U(q)) with any linear (P(q)| f :X(q)) to get τ([3,*,2,1,0,*], [U,G,X,dual(V),C,T], h×f), a member of U⊗X⊗dual(V)⊗C or {linear (E⊗V| :U⊗X)}. Doing this with h = ∂0(u,q,p) and f = ∂1(x,q,p) gives τ[3,*,2,1,0,*](∂0(u)⊗∂1(x)) as a mapping (Q| q-> (P(q)| :{linear (E⊗V| :U(q)⊗X(q))}) :), which may be read (by shuffling the order in which it accepts its arguments, exploiting that E⊗V doesn't depend on q) as a (Q| q-> :{linear (E⊗V| :{(P(q)| :U(q)⊗X(q))})}), hence a section of {linear (E⊗V| :{(P| :U⊗X)})}. Note that the mappings (P| :U⊗X) aren't constrained to be linear.

Likewise, τ([3,1,0,*,2,*], [U,dual(V),C,T,X,G]) contracts any linear (P|:U)×(T|:X) to give a U⊗X⊗dual(V)⊗C. Hence we obtain τ[3,1,0,*,2,*](∂1(u)×∂0(x)) as a section of {linear (E⊗V| :{(P| :U⊗X)})}.

Likewise, we can contract (T(q)| h :U(q)) with linear (G(q)| x×t :X(q)) to get x×h(t) in U(q)⊗X(q). Consequently, we have well-defined operators [u,x]->

• ∂0(u)·∂1(x), i.e. τ([1,*,0,*], [U,G,X,T], ∂0(u)×∂1(x))
• ∂1(u)·∂0(x), i.e. τ([1,*,0,*], [U,T,X,G], ∂1(u)×∂0(x))

both in U⊗X, so we can take their difference and obtain ∂0(u)·∂1(x) −∂1(u)·∂0(x), which I'll write pb(u,x): the Poisson bracket of u and x.

Now consider k = (Q| q-> (P(q)| p-> k(q,p) :{scalars}) :) and examine what we get by applying the above operator to [u, k.x] and [u.k, x]. We have ∂0(k.x) = ∂0(k)×x +k.∂0(x) and similar, so

• [u, k.x]-> ∂0(u)·∂1(k)×x +∂0(u).k·∂1(x) −∂1(u)·∂0(k)×x −∂1(u).k·∂0(x)
• [u.k, x]-> u×∂0(k)·∂1(x) +∂0(u).k·∂1(x) −u×∂1(k)·∂0(x) −∂1(u).k·∂0(x)

differing by ∂0(u)·∂1(k)×x +u×∂1(k)·∂0(x) −∂1(u)·∂0(k)×x −u×∂0(k)·∂1(x), which is (∂0(u)·∂1(k) −∂1(u)·∂0(k))×x +u×(∂1(k)·∂0(x) −∂0(k)·∂1(x)) or pb(u,k)×x −u×pb(k,x)

Indeed,

pb(u, x×w)
= ∂0(u)·∂1(x×w) −∂1(u)·∂0(x×w)
= ∂0(u)·∂1(x)×w +τ([*,2,1,*,0], [G,U,X,T,W], ∂0(u)×x×∂1(w)) −∂1(u)·∂0(x)×w −τ([*,2,1,*,0], [T,U,X,G,W], ∂1(u)×x×∂0(w))
= pb(u,x)×w +τ([2,0,1], [U,W,X], pb(u,w)×x)
pb(u×x, w)
= ∂0(u×x)·∂1(w) −∂1(u×x)·∂0(w)
= τ([*,2,1,*,0], [G,U,X,T,W], ∂0(u)×x×∂1(w)) +u×(∂0(x)·∂1(w)) −τ([*,2,1,*,0], [T,U,X,G,W], ∂1(u)×x×∂0(w)) −u×(∂1(x)·∂0(w))
= τ([2,0,1], [U,W,X], pb(u,w)×x) +u×pb(x,w)
pb(u, x×w) −pb(u×x, w)
= pb(u,x)×w −u×pb(x,w)
pb(t×u, x×w)
= pb(t×u, x)×w +τ([3,2,0,1], [T,U,W,X], pb(t×u,w)×x)
= t×pb(u,x)×w +τ([3,1,2,0], [T,X,U,W], pb(t,x)×u×w) +τ([3,2,0,1], [T,U,W,X], t×pb(u,w)×x +τ([3,1,2,0], [T,W,U,X], pb(t,w)×u×x))
= t×pb(u,x)×w +τ[2,3,1,0](u×pb(t,x)×w) +τ[3,2,0,1](t×pb(u,w)×x) +τ[2,3,0,1](u×pb(t,w)×x)
= t×pb(u, x×w) +τ([2,3,1,0], [U,T,X,W], u×pb(t, x×w))
= t×(pb(u,x)×w +τ([2,0,1], [U,W,X], pb(u,w)×x)) +τ([2,3,1,0], [U,T,X,W], u×(pb(t,x)×w +τ([2,0,1], [T,W,X], pb(t,w)×x)) )
= t×pb(u,x)×w +τ[3,2,0,1](t×pb(u,w)×x) +τ[2,3,1,0](u×pb(t,x)×w) +τ[2,3,0,1](u×pb(t,w)×x)
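The product rules derived above can be checked numerically in the simplest case. The Python sketch below assumes a single position q and momentum p, both real, and scalar-valued u, x, w, so that every τ permutation is trivial and × is ordinary multiplication. The sample functions and the helper names `partial` and `times` are assumptions for illustration; derivatives are estimated by central differences.

```python
import math


def partial(f, q, p, index, step=1e-5):
    """Central-difference estimate of ∂0(f) (index 0) or ∂1(f) (index 1)."""
    if index == 0:
        return (f(q + step, p) - f(q - step, p)) / (2 * step)
    return (f(q, p + step) - f(q, p - step)) / (2 * step)


def pb(u, x):
    """Return the function (q, p)-> ∂0(u)·∂1(x) − ∂1(u)·∂0(x)."""
    def bracket(q, p):
        return (partial(u, q, p, 0) * partial(x, q, p, 1)
                - partial(u, q, p, 1) * partial(x, q, p, 0))
    return bracket


def times(a, b):
    """Pointwise product of two scalar functions of (q, p)."""
    return lambda q, p: a(q, p) * b(q, p)


# Sample scalar functions (assumptions, chosen only to be non-trivial).
u = lambda q, p: q * q * p
x = lambda q, p: math.sin(q) + p * p
w = lambda q, p: q * p + 1.0

q0, p0 = 0.4, -0.9

# pb(u, x×w) = pb(u,x)×w + pb(u,w)×x
lhs = pb(u, times(x, w))(q0, p0)
rhs = pb(u, x)(q0, p0) * w(q0, p0) + pb(u, w)(q0, p0) * x(q0, p0)

# pb(u, x×w) − pb(u×x, w) = pb(u,x)×w − u×pb(x,w)
lhs_diff = lhs - pb(times(u, x), w)(q0, p0)
rhs_diff = pb(u, x)(q0, p0) * w(q0, p0) - u(q0, p0) * pb(x, w)(q0, p0)

# The bracket of position with momentum.
unit = pb(lambda q, p: q, lambda q, p: p)(q0, p0)
print(lhs, rhs, lhs_diff, rhs_diff, unit)
```

Each printed pair should agree to within the accuracy of the finite differences, and the bracket of q with p should come out as 1.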

Written by Eddy.