Other related fragments

- tensors, transposition and trace
- linear structure
- scalars
- foundations, denotations and primitives.
- `up'

This page is due to be retired in favour of a newer treatment of the same material … but that may take some time.

For any linear spaces U, V, there is a natural mapping (: ({linear (U:|V)}: u×w = (U: u.w(v) ←v |V) ←w |dual(V)) ←u |U) which may be construed as a multiplication between members of U and members of V's dual. The span of the products × produces is denoted U⊗dual(V), which I'll intermittently write as U&tensor;dual(V) until I kick the habit or discover how to make it as likely to display sensibly as is ⊗. Unless both U and V are infinite-dimensional, U&tensor;dual(V) is the whole of {linear (U:|V)}.
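To make this concrete, here is a small numerical sketch (my own illustration, not part of the formal development): take U = R^2 and V = R^3, represent a member w of dual(V) by its list of coefficients, and realise u×w both as the mapping (V| v-> u.w(v) :U) and as the matrix it corresponds to.

```python
def apply_dual(w, v):
    # w(v): a member of dual(V), held as a coefficient list, applied to v in V
    return sum(wi * vi for wi, vi in zip(w, v))

def tensor(u, w):
    # u x w as the linear map (V| v-> u.w(v) :U)
    return lambda v: [ui * apply_dual(w, v) for ui in u]

def as_matrix(u, w):
    # the same map written out as a matrix: entry [i][j] = u[i]*w[j]
    return [[ui * wj for wj in w] for ui in u]

u, w, v = [2.0, 3.0], [1.0, 0.0, -1.0], [5.0, 7.0, 2.0]
print(tensor(u, w)(v))                                  # u scaled by w(v) = 3.0
print([apply_dual(row, v) for row in as_matrix(u, w)])  # same member of U
```

Both lines print [6.0, 9.0]: the rank-one matrix and the mapping form agree.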

Define multi = (: (: multi(s,W) ←W :{linear spaces}) ← ({linear spaces}:s:) :{non-empty lists}) by: multi([U],W) = {linear (W: |U)}, multi([U]@s,W) = {linear (multi(s,W): |U)}. Then multi([U,...,V],W) is the space of multilinear maps from U,... and V to W and unite(|multi([U,...,V]):) is the collection of multilinear maps from U,... and V.

Consider a linear space U and some u in U. For any linear space W this induces a linear map (W: f(u) ←(W:f|U) :{linear}). So `supply as input to' is a linear map (: W is a linear space; (W: f(u)←(W:f|U) :{linear}) ←u |U) for each linear space U. Likewise, for any list, [U,...,V], of linear spaces and any linear space W, any [u,...,v] with u in U, ..., v in V induces a linear map (W: w(u,...,v) ←w |multi([U,...,V],W)).

The foregoing shows that each list [u,...,v] induces a linear map (W: w(u,...,v) ←w :multi([U,...,V],W)) for each linear space W. For given linear spaces U,... and V we have multilinear maps from U,... and V to any linear space, W; thus any list [u,...,v] in Renee([U,...,V]) induces a polymorphic `supply as input' action, applicable to multilinear maps regardless of their output space.

Define supply = (: (: x(u,...,v) ←x :) ←[u,...,v] :{lists}) i.e. supply([a]) = (: x(a) ←x :) for any value a and supply(s@t) = (: supply(t, supply(s, x)) ←x :) for all lists s and t, with @ defined by [a,...,m]@[n,...,z] = [a,...,m,n,...,z]. Define (: M :{non-empty lists of linear spaces}) by: M([U]) = {linear (: :U)}, M([U]@s) = {linear (M(s): :U)} for any linear space U and non-empty list, s, of linear spaces: M([U,...,V]) is the union, over linear spaces W, of the space of multilinear maps from U, ... and V to W. We can then examine, for given linear spaces U, ..., V,

- (: (: s :M([U,...,V])) ←s :)&on;(: supply :Renee([U,...,V]))

which restricts supply([u,...,v]) down to its action on multilinear
maps from U, ... and V; on this space, supply([u,...,v]) acts as a linear map,
mapping any linear (...{linear (W: :V)}...: w :U) to w(u,...,v) which depends
linearly on w (for fixed u, ..., v), for fixed choice of W (we can't add a
multilinear with outputs in W to a multilinear with outputs in some other linear
space).
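The action of supply can likewise be sketched in miniature (a hypothetical rendering of my own, with Python closures standing in for curried multilinear maps; dot and f below are illustrative examples, not names from the text):

```python
def supply(args):
    # supply([u,...,v]) feeds the listed vectors, one at a time, to a
    # curried multilinear map, mirroring the recursion supply(s@t)
    def feed(x):
        for a in args:
            x = x(a)
        return x
    return feed

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# a bilinear map from R^2 then R^2 to scalars, taking one argument at a time
f = lambda u: (lambda v: dot(u, v))

print(supply([[1.0, 2.0], [3.0, 4.0]])(f))   # 1*3 + 2*4 = 11.0
```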
This brings us back to the pairing with which we began: (U×dual(V)| [u,w]-> u**×**w :{linear (V|:U)}) defined by u**×**w = (V| v-> u.w(v) :U). Evaluating w(v) gives a scalar, which we multiply by u to get a member of U. The span of the embedding is denoted U⊗dual(V) - in which ⊗ wants to be a × in a circle - and known as the **tensor product** of [U,dual(V)]: I'll also write ⊗([U,dual(V)]) for the same thing, leading on towards ⊗(W) for at least some mappings (: W :{linear spaces}). Every member of U⊗dual(V) may be written as sum(n| i-> u(i)**×**w(i) :{linear (V|:U)}) for some (n|u:U) and (n|w:dual(V)).

Notational aside: I'll sometimes combine linear *spaces* using
×, the Cartesian product (but, much
more frequently, I'll combine them with ⊗): I'll often combine their
*members* with this new operator **×**. In most contexts, it'll
be clear which entities are being dealt with as linear spaces and which as their
*members*, making it unnecessary to distinguish between × and
**×**: in such contexts (primarily when only one of them would have
been in use), I may use × for both and leave the reader to work out which
meaning it has in each context. Both ⊗ and **×** are known as
`the tensor product', the first on linear spaces, the second on their members.
There is, again, seldom any opportunity for confusion: indeed, it is more
conventional for them to share the symbol on which ⊗ is based, ideally
using a bigger circle for the operator between linear spaces (but the symbol
× is familiar to more browsers than is ⊗, and I haven't found
⨷ yet). On some pages I use &tensor; for ⊗ - I pronounce ⊗
as `tensor', so that's the entity name I remember.

**Theorem**: if u×v is zero, one of u, v is zero;
otherwise, if u×v = x×y, there is some scalar k for which u=k.x and
k.v=y. In this, u, x are taken to be members of some S-linear space, U, and v,
y to be in V, also S-linear.

Proof: if u×v is zero then, for any w in dual(V), 0 = 0(w) = (u×v)(w) = u.v(w), so either u=0 or, for every w in dual(V), v(w)=0, which makes v=0.

Otherwise, u×v = x×y isn't zero, so there's some w in dual(V) for which u.v(w) = x.y(w) isn't zero. For any such w, v(w) is a non-zero scalar so y(w) = k.v(w) for some scalar k. This makes u.v(w) = x.k.v(w) with v(w) non-zero so cancellable, leaving u=x.k. Now, x.y(w) is non-zero, hence so is x: whence, for every t in dual(V), x.y(t) = u.v(t) = x.k.v(t) and x is cancellable, leaving y(t) = k.v(t); so y=k.v.
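A numerical spot-check of the theorem (an illustration only: the values of u, v and k below are arbitrary choices of mine):

```python
def outer(u, v):
    # u x v as the matrix of coordinate products
    return [[a * b for b in v] for a in u]

u, v = [2.0, 4.0], [1.0, 3.0, 5.0]
k = 0.5
x, y = [a / k for a in u], [k * b for b in v]   # x = u/k, y = k.v
assert outer(u, v) == outer(x, y)               # the same member of U⊗V

# recover k as in the proof: pick w in dual(V) with v(w) non-zero; taking
# w to extract the first coordinate gives v(w) = v[0] and y(w) = y[0]
k_found = y[0] / v[0]
print(k_found)   # 0.5; and indeed u = k_found.x and k_found.v = y
```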

If I have linear spaces U and W but no linear space V with W=dual(V), I can
still define U⊗W: formally, this is done by defining U⊗**W**,
where **W** is the subspace of dual(dual(W)) spanned by W's natural embedding
in dual(dual(W)) - typically all of dual(dual(W)). In such a case I'll
generally write U⊗**W** as U⊗W, unless I'm being fussy:
**W** and W are isomorphic.

Most of the power of the tensor product lies in the fact that we can nearly
always get away with discussing `typical' members of U⊗V, of form
u×v, and inferring what happens to `general' members, obtained by adding
together several typical members, from linearity. Note: the identity linear map
(V|:V) is a member of V⊗dual(V) but is not typical (unless V is
one-dimensional). On the other hand, scaling a typical member always gives a
typical member (because k.(u×v) = (k.u)×v, which is typical): the
span of a collection of values, U⊗V = span({u×v: u in U, v in V}),
is defined in terms of *scaling* some of those values and summing the
scaled values, but we can discuss U⊗V purely in terms of typical values
without having to scale these.
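The identity example is easy to exhibit numerically (my own two-dimensional illustration):

```python
def outer(u, v):
    return [[a * b for b in v] for a in u]

def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

# the identity on R^2 is e0×e0 + e1×e1, a sum of two typical members
e0, e1 = [1.0, 0.0], [0.0, 1.0]
ident = add(outer(e0, e0), outer(e1, e1))
print(ident)   # [[1.0, 0.0], [0.0, 1.0]]

# yet no single u×v equals it: every u×v has determinant
# u[0]v[0]*u[1]v[1] - u[0]v[1]*u[1]v[0] = 0, while the identity's is 1
```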

Consider any (V| u**×**t :U) and linear (W|a:V), so u**×**t is
composable after a. (V|t:scalars) linear gives t&on;a linear (W|:scalars),
so t&on;a is in dual(W). We obtain

- (u**×**t)&on;a
- = (W| w-> (u**×**t)(a(w)) = u.(t(a(w))) :U)
- = u**×**(t&on;a).

Likewise, for linear (U|b:X) and (V| u**×**t :U), we have

- b&on;(u**×**t)
- = b&on;(V| v-> u.t(v) :U)
- = (V| v-> b(u.t(v)) :X); each t(v) is a scalar and b is linear, so this
- = (V| v-> b(u).t(v) :X)
- = b(u)**×**t,

so the tensor product interacts straightforwardly with composition of linear maps. In particular, a's interaction with the tensor product is entirely described by its interaction with t: u is ignored and left in peace. In this sense, when we compose a linear (V|:U) after a linear (W|:V), they need know little more about each other than what dual(V) and V have to know about each other.
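Both composition rules can be checked on matrices (a sketch with my own choices of u, t, a and b; dimensions as in the text, with U = W = R^2 and V = X = R^3):

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def outer(u, t):
    return [[a * b for b in t] for a in u]

u = [1.0, 2.0]                              # in U = R^2
t = [3.0, -1.0, 2.0]                        # in dual(V), V = R^3
a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # linear (W|a:V), W = R^2
b = [[2.0, 0.0], [1.0, 1.0], [0.0, 3.0]]    # linear (U|b:X), X = R^3

# (u×t)&on;a = u×(t&on;a): compose t with a by contracting over V
t_on_a = matvec([list(col) for col in zip(*a)], t)
print(matmul(outer(u, t), a) == outer(u, t_on_a))         # True

# b&on;(u×t) = b(u)×t
print(matmul(b, outer(u, t)) == outer(matvec(b, u), t))   # True
```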

In the special case where W is just scalars, u**×**(t&on;a) is just
u.(t&on;a), as t&on;a, in dual(W), is just a scalar. Indeed, in general, any
linear map (scalars|f:X) is synonymous with its value at 1 since f(r) = f(r.1) =
r.f(1) then gives its value for any other scalar input, r. This gives a natural
isomorphism between {linear (scalars|:X)} and X. This follows readily from the
definition: (W|a:V) is now (scalars|a:V) which means a = (: r-> r.a(1) :V) is
entirely determined by linearity and a(1), which can be any member of V: so,
FAPP, a is a(1) in V and t&on;a is likewise (t&on;a)(1) = t(a(1)); as above,
(u**×**t)&on;a = u.(t&on;a) as mappings (scalars|:U); and their values at 1
are both u.t(a(1)), by which they are wholly determined.

In general, for a composable list of linear maps, *eg*
b&on;(u**×**t)&on;a, with a tensor product as one member of the list, the
composition separates around it, to give b(u)**×**(t&on;a). A second tensor
product in a composable list, *eg*
c&on;(u**×**t)&on;b&on;(v**×**w)&on;a, has t&on;b&on;v scalar, giving
(c&on;u)**×**(t&on;b&on;v).(w&on;a), so that b's part in proceedings is
entirely described by t&on;b&on;v, as a simple linear scaling on
c&on;(u**×**w)&on;a.
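The separation of this doubly-tensored composite can also be spot-checked (a sketch; all the matrices and vectors below are arbitrary choices of mine, every space taken to be R^2 with integer entries so that equality of values is exact):

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def outer(u, w):
    return [[a * b for b in w] for a in u]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

u, t = [1, 2], [3, 1]
v, w = [2, -1], [1, 4]
a = [[1, 2], [0, 1]]
b = [[2, 1], [1, 3]]
c = [[1, 0], [1, 1]]

# c&on;(u×t)&on;b&on;(v×w)&on;a, composed left to right as matrices
lhs = matmul(matmul(matmul(matmul(c, outer(u, t)), b), outer(v, w)), a)
s = dot(t, matvec(b, v))          # the scalar t&on;b&on;v
rhs = [[s * x for x in row] for row in matmul(matmul(c, outer(u, w)), a)]
print(lhs == rhs)   # True: b's part is just the scaling s
```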

The important property of the tensor product is that, for all practical
purposes, it implies that we can separate discussion of {linear (U|:V)} into
discussion of dual(U) and V: the basic premise of tensor algebra is that
U⊗dual(V) is, at least for all practical purposes, all of {linear
(V|:U)}; likewise, that **W** is all of dual(dual(W)). When V and U are
finite-dimensional (making dual(dual(V)) = **V**), U⊗dual(V) = {linear
(V|:U)}; and when they aren't, U⊗dual(V) is, at least, `dense' in {linear
(V|:U)}, and we generally keep to contexts in which linear maps are continuous
and, hence, wholly determined by their values on any dense subspace.

For fixed w in dual(V), (U| u-> u**×**w :U⊗dual(V)) is a
linear map, since (V| v-> u.w(v) :U) is linear and, for w, x in dual(V) and k
in S:

- u**×**(w+x) = (V| v->
- u.((w+x)(v)), use the definition of + on function spaces: (w+x)(v) = w(v)+x(v)
- = u.(w(v)+x(v)), and . distributes over +: u.(a+b) = u.a + u.b
- = u.w(v) + u.x(v), use the definition of the tensor product ...
- = (u×w)(v) + (u×x)(v), read + on function spaces again
- = (u×w + u×x)(v)
- :U) = u**×**w + u**×**x;
- u**×**(k.w) = (V| v-> u.(k.w(v)) = (u.k).w(v) :U) = (u.k)**×**w, so, assuming u.k = k.u, these are equal to
- (k.u)**×**w = (V| v-> k.(u.w(v)) :U) = k.(u**×**w).

This justifies referring to this quantity as k.u**×**w.
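These identities are also easy to spot-check numerically (my own choices of u, w, x and k, picked so that the floating-point products involved are exact):

```python
def outer(u, w):
    return [[a * b for b in w] for a in u]

def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(k, A):
    return [[k * x for x in row] for row in A]

u, w, x, k = [1.0, 2.0], [3.0, 4.0], [5.0, -1.0], 2.5

# u×(w+x) = u×w + u×x
assert outer(u, [wi + xi for wi, xi in zip(w, x)]) == add(outer(u, w), outer(u, x))
# u×(k.w) = k.(u×w)
assert outer(u, [k * wi for wi in w]) == scale(k, outer(u, w))
# (k.u)×w = k.(u×w)
assert outer([k * ui for ui in u], w) == scale(k, outer(u, w))
print("all three identities hold")
```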

Consequently, (U| u-> (dual(V)| w-> u×w :U⊗dual(V))
:{linear (dual(V)| :U⊗dual(V))}) is `linear in both arguments'. For any
S-linear U, V, W and mapping (U| f :{(V| :W)}), we can ask whether f is linear
(using the linear structure on U and on {(V|:W)}) and we can ask whether f's
outputs are linear: these are separate questions, but when both f and its
outputs *are* linear, then f is described as a **bilinear**
map from U and V to W. Clearly the tensor product is, thus, a bilinear map from
the two input spaces to their tensor product space, (U| :{linear (V|
:U⊗V)}). Indeed, as I'll now show, one can factorise any bilinear map
from U and V as a composite of the tensor product before some (U⊗V| :W).

I now want to show how the tensor product serves as a natural description of bilinearity. This amounts to showing that {bilinear from U then V to W} is, functionally, {linear (U⊗V| :W)}, by exhibiting a natural isomorphism between these two linear spaces. First, observe that (U⊗V| F :W) induces (U| u-> (V| v-> F(u×v) :W) :) which is bilinear from U then V to W. It turns out to be a little trickier to construct the inverse of this.

Given f bilinear from U then V to W, I need to construct `that (U⊗V| F :W) for which f(u,v) = F(u×v)'. This requires F to be a mapping; which turns out to be the hard thing to prove. At the outset, we can define F on `typical' members of U⊗V using our goal, f(u,v) = F(u×v); but we need to define F on all of U⊗V, which includes sums of typical members; and we need to show that when two sums are equal as members of U⊗V, they give equal answers in W. Thus the proof of the naturality of the tensor product amounts to a proof that we can define a mapping on `typical' members of U⊗V and, provided it is bilinear, know that we can extend it linearly to get consistent answers on all of U⊗V.

To define F on a sum(n| u(i)×v(i) :) of typical members, we need its value to be sum(n| i-> f(u(i),v(i)) :). Now, if we have sum(n| u×v :) = sum(m| x×y :), as may arise, we need to be sure that sum(f(u,v)) = sum(f(x,y)). This is easy for n, m in {1, 0}, aka 2, but requires proof at each step to expand that to {2, 1, 0} = 3, then to {3, 2, 1, 0} = 4 and so on. The proof at each step, expanding it to 1+N = {N,...,0}, has, however, a common form: it exploits the knowledge from previous steps - that it works for N, ..., and 0, - to enable us to prove that it works for 1+N also.

So, for some natural N, suppose we're far enough through the proof already
that we know that equality of sums *shorter* than N always works: if this
enables us to show that sums of length *at most* N all work, we can move
on to N+1 knowing that all sums shorter than N+1 will work.

So our generic sum(n| u×v :) = sum(m| x×y :) is given to yield sum(f(u,v)) = sum(f(x,y)) whenever n and m are both members of some natural N, and we need to show that the same still holds when n and m are at most N. In this, the only relevant cases are: n = N = m; and one of n, m equal to N with the other in N. To prove these cases from the given cases, with n and m both in N, we need to break up the sums, infer a shorter pair of sums and exploit the fact that we can already cope with all shorter sums.

So consider:

- a bilinear f from U then V to W and
- a natural number N for which
- for any n, m in N, (n| u :U), (m| x :U), (n| v :V) and (m| y :V)
- "sum(u×v) = sum(x×y)" implies "sum(f(u,v)) = sum(f(x,y))".

and see what happens if we allow n and m to be either members of N, as given, or equal to N: say m = N and n is either in N or equal to N.

Given f bilinear from U and V to W, we can define a relation (U⊗V: F :W) by

- for any u in U and v in V, F relates u**×**v to f(u,v), and
- for any mapping G which is a sub-relation of F (and for which we know how to sum G's inputs, which may well mean that G has finitely many inputs, hence outputs), F relates the sum of G's inputs to the sum of G's outputs.

For any u, x in U, v, y in V and scalar k, we have some mapping (U⊗V: G :W) which is a sub-relation of F and for which

- G((u+x)×v) = f(u+x,v) = f(u,v) + f(x,v) = G(u×v) + G(x×v)
- G(u×(v+y)) = G(u×v) + G(u×y), likewise;
- G(u×k.v) = f(u,k.v) = k.f(u,v) = k.G(u×v)
- G(k.u×v) = f(k.u,v) = k.f(u,v) = k.G(u×v)

Now, suppose x×y = u×v. If either u or v is zero, this makes one of x, y zero; hence f(u,v) = 0 = f(x,y). Otherwise, we have some scalar k with u=k.x, k.v=y, giving f(u,v) = f(k.x,v) = k.f(x,v) = f(x,k.v) = f(x,y). Consequently, whichever case we see, u×v = x×y implies f(x,y) = f(u,v). In particular, F relates 0 to 0.

Now, U⊗V is the span of `for u in U, v in V, map [u,v] to u×v', so every member of it is a finite sum of terms of form u×v. We need to show that if two such sums are equal, the values to which each sum tells us F relates the sum are also equal and, ultimately, that F only relates a given value in U⊗V to one member of W.

For any natural number N, introduce a relation (U⊗V: E(N) :W) which, for any (n|u:U) and (n|v:V) with n in N: E(N) relates sum(n| i-> u(i)×v(i) :U⊗V) to sum(n| i-> f(u(i),v(i)) :W). Observe that F is unite(natural: n-> E(n) |) and each n<N has E(n) a sub-relation of E(N): furthermore, E(0) and E(1) are mappings, from the discussion above (which explicitly deals with E(1): and E(0) is just F's action on the empty sum, whose value is zero, which E(0) maps to zero by definition; which fortunately agrees with E(1) mapping 0×v and u×0 to zero).

For any natural number N for which we are given that E(n) is a mapping for each n in N (*eg* N = 0, 1 or 2, so far: but suppose N is at least 1), consider any (N|u:U), (N|v:V), (n|x:U) and (n|y:V) with n either equal to N or in N and sum(N| i-> u(i)×v(i) :) = sum(n| i-> x(i)×y(i) :). This equality, applied to any w in dual(V), gives us a linear dependence between the outputs of x and those of u, which we can re-arrange to express, for some j in N, u(j) as a weighted sum of N+n-1 terms, u(j) = sum(N: i-> k(i).u(i) :U) - sum(n| i-> h(i).x(i) :U) for some (n| h :scalars) and (N: k :scalars) for which (|k) is all but j of N. Since j isn't in (|k), but everything else in N is, there is some monic (N-1| m :N), cleanly composable before k, using which we can define (N-1| r :V) to map i-> v(m(i))+k(m(i)).v(j). Defining s = (n| i-> y(i)+h(i).v(j) :V) we now find that

- sum(N-1| i-> u(m(i))×r(i) :)
- = sum(N-1| i-> u(m(i))×(v(m(i))+k(m(i)).v(j)) :)
- = sum(N| i-> u(i)×v(i) :) - u(j)×v(j) + sum(N: i-> u(i).k(i) :)×v(j)
- = sum(n| i-> x(i)×y(i) :) + sum(n| i-> h(i).x(i) :)×v(j)
- = sum(n| i-> x(i)×(y(i)+h(i).v(j)) :)
- = sum(n| i-> x(i)×s(i) :).

We thus have a `shorter' pair of equal tensors: with xnew = u&on;m, ynew = r, unew = x and vnew = s, we find

- sum(N| i-> f(u(i),v(i)) :) - sum(n| i-> f(x(i),y(i)) :)
- = sum(N-1| i-> f(u(m(i)),v(m(i))) :) + f(u(j),v(j)) - sum(n| i-> f(x(i),y(i)) :)
- = sum(N-1| i-> f(u(m(i)),v(m(i))) :) + sum(N-1| i-> f(k(m(i)).u(m(i)),v(j)) :) - sum(n| i-> f(h(i).x(i),v(j)) :) - sum(n| i-> f(x(i),y(i)) :)
- = sum(N-1| i-> f(u(m(i)), v(m(i))+k(m(i)).v(j)) :) - sum(n| i-> f(x(i), y(i)+h(i).v(j)) :)
- = sum(N-1| i-> f(xnew(i),ynew(i)) :W) - sum(n| i-> f(unew(i), vnew(i)) :W)
While n might be N, N-1 is definitely in N: if n is also in N, E(N-1) accepts xnew×ynew and unew×vnew, which have equal sums, and our inductive hypothesis tells us that E(N-1) is a mapping, making the sum of f(xnew,ynew) equal to the sum of f(unew,vnew): the above difference is zero, so the sum of f(u,v) must equal the sum of f(x,y); so the two answers E(N) could have given, for the sums of u×v and of x×y, agree. Our inductive hypothesis was that if two tensor sums shorter than N are equal, the resulting sums of f's outputs are equal: we now have that if either of the sums is shorter than N and the other no longer, the f-sums are again equal.

The remaining case n=N gives us equal sums for xnew×ynew and unew×vnew, one of which is shorter than N, the other still being only as long as N, but we now know that that makes the two sums of f's outputs equal; so, in turn, the same holds for the sums of f(u,v) and f(x,y). So, finally, we find that E(N) is a mapping, just like each E(n) with n in N. Consequently 1+N is a natural number for which every n in 1+N has E(n) a mapping: the inductive hypothesis stated for N is thence deduced to hold for 1+N. Since we knew it was true for N = 0, 1, 2, we now know it for all natural N and, since each E(N) subsumes each E(n) with n in N, the union({naturals}| n-> E(n) |) = F must also be a mapping. That it is linear follows promptly from its definition.

This gives us a mapping, ({linear (U| :{linear (V|:W)})}: f-> F :{linear (U⊗V| :W)}), and we find that ({linear (U⊗V| :W)}| F-> (U| u-> (V| v-> F(u×v) :W) :) :) is inverse to it, producing outputs exclusively in {linear (U| :{linear (V|:W)})}: so {linear (U| :{linear (V|:W)})} is isomorphic to {linear (U⊗V| :W)}, and a bilinear map from U and V to W can always be factorised *via* the standard bilinear map from U and V to U⊗V.
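In finite dimension the factorisation can be exhibited directly (a sketch of my own: f below is an arbitrary bilinear example, and F is built from f's values on a basis, so that F(u×v) = f(u,v)):

```python
def outer(u, v):
    return [[a * b for b in v] for a in u]

def f(u, v):
    # an arbitrary bilinear map from R^2 and R^2 to scalars
    return u[0] * v[1] - 2 * u[1] * v[0]

# F's coefficients on the basis e(i)×e(j) are just f(e(i), e(j))
basis = [[1.0, 0.0], [0.0, 1.0]]
coeff = [[f(ei, ej) for ej in basis] for ei in basis]

def F(T):
    # linear on U⊗V (here, 2x2 matrices): contract T against the coefficients
    return sum(coeff[i][j] * T[i][j] for i in range(2) for j in range(2))

u, v = [3.0, 1.0], [2.0, 5.0]
print(F(outer(u, v)), f(u, v))   # both print 11.0
```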

Examine dual(U⊗V) = {linear (U⊗V| :S)}, which the above shows
equivalent to {linear (U| :{linear (V| :S)})} = {linear (U| :dual(V))} which is
dual(V)⊗dual(U). More concretely, from dual(V)⊗dual(U), we have
the mapping w**×**x-> (U⊗V| u**×**v-> w(v).x(u)
:S) whose outputs are manifestly in dual(U⊗V): and any F in
dual(U⊗V) gives us (U| u-> (V| v-> F(u**×**v) :S)
:dual(V)).
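On typical members this correspondence is a one-line check (a numerical sketch with my own choices of w, x, u and v, all spaces taken to be R^2):

```python
def outer(u, v):
    return [[a * b for b in v] for a in u]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

w = [1.0, -2.0]                 # in dual(V)
x = [3.0, 0.5]                  # in dual(U)
u, v = [2.0, 4.0], [1.0, 3.0]   # in U and V

# w×x, viewed in dual(U⊗V), contracts the matrix u×v against x(i)*w(j)
lhs = sum(x[i] * w[j] * outer(u, v)[i][j] for i in range(2) for j in range(2))
print(lhs == dot(w, v) * dot(x, u))   # True: both sides are w(v).x(u)
```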

Now, let's look at (U⊗dual(V))⊗dual(W) = {linear (W| :{linear
(V| :U)})} and U⊗(dual(V)⊗dual(W)), which we've now seen
equivalent to U⊗dual(W⊗V) = {linear (W⊗V| :U)}, which is,
in turn, equivalent to {linear (W| :{linear (V| :U)})}, so that
U⊗dual(V)⊗dual(W) is a reasonable name for these isomorphic
spaces. Any given u**×**x**×**y in it appears as

- (W| w-> (V| v-> u.x(v).y(w) :U) :{linear (V|:U)}),
- (W| w-> u**×**x.y(w) :U⊗dual(V)),
- (W⊗V| w**×**v-> u.x(v).y(w) :U), or
- ({linear (dual(V)| :W)}| f-> u.y(f(x)) :U),

so discussions of tensor products do need to indicate which meaning of each term to use: however the ambiguities are easy to avoid and to resolve when unavoidable.

Written by Eddy.