The Lorentz Transformation

It used to be supposed that, if two observers were in a state of steady motion relative to one another, applying a translation to what one sees at any given moment would convert it to what the other sees, with the translation growing linearly with time. Thus the space-time co-ordinate system each uses is obtained from the other's by a linear transformation called a shear, across the time axis.

This naïvely assumed relationship became untenable when physics revealed that the speed of light, c, is the same for all observers. The transformation of co-ordinates which replaces it lies at the heart of special relativity.

When we say that light has the same speed for all observers, we tacitly presume that all observers have an agreed-upon standard of how to measure lengths and times, for use in measuring the speed at which light propagates. The nature of this common standard is such that we can presume that, however each observer actually parameterises space and time, there is a one-to-one well-behaved transformation between their description and a cartesian one in which space and time are described independently, spatial positions are described by three co-ordinates varying in mutually orthogonal directions and the spatial distance between two positions is the square root of the sum of the squares of the differences, between the two positions, in the three co-ordinates' values. Having the speed of light equal according to both systems amounts to parameterising the spatial co-ordinates consistently with each other and the time co-ordinate. The physical import of the claim is then that, when we use such a system of co-ordinates, the laws of physics take a straightforward form.

The new transformation

Consider two observers, the first of whom sees the second moving with velocity V.c. Choose an event – in each frame it happens at some point at some given moment of time – and have each frame measure time from that moment and use the given point as origin of spatial co-ordinates. In the first observer's frame, at time t, the second observer's spatial origin is at position V.c.t. We may naïvely expect that, conversely, in the second observer's frame, at time T, the first observer's spatial origin is at position −V.c.T; but we shall not assume this. None the less, a rod aligned parallel to V in the first observer's frame and (stationary or) moving with a velocity parallel to V (i.e. parallel to the rod's own length), with that frame's spatial origin at some point on the rod at the initial moment, necessarily has the other frame's origin as a point on the rod at that initial time and moving parallel to the rod's alignment and (possibly zero, which is parallel to all directions) velocity. The second frame thus also sees any such rod as moving (if at all) parallel to its own alignment and parallel to the first frame's origin's movement, albeit possibly in the opposite direction.

By considering such rods we may infer, in the second frame, a spatial direction it makes sense to describe as V-ward; and the first frame's spatial origin moves in the opposite direction to this, as seen by the second frame. Whether they agree about the magnitudes, or indeed whether these directions are strictly-speaking parallel, can remain an open question. Furthermore, by considering such rods, we can infer that the two frames must agree about which events happen on the V-ward line through the spatial origin at some time, for all that they may disagree about the time or distance from the origin along the line at which the event happens. In short, the time-and-V-ward displacements in one frame are time-and-V-ward also in the other – albeit some that have no time-component in one might have a time-component in the other; and some that have no V-ward component in one do have a V-ward component in the other. Crucially, when we complement our time and V-ward co-ordinates in each frame with spatial co-ordinates that vary orthogonally to the V-ward co-ordinate and take value zero at the chosen reference event used as origin, the sets of events at which these co-ordinates are zero coincide for the two frames and are fully parameterised, in each, by time and V-ward component.

Entangled co-ordinates

Let the variables used by the two frames for time and V-wards displacement be [t, x] and [T, X] for the two observers. Let v be the magnitude of V, so that X is 0 where x = v.c.t. Start by considering only events with the other spatial co-ordinates zero and suppose an arbitrary linear relation between [t, x] and [T, X]:

T = a.t +b.x
X = e.t +f.x

The shear previously assumed would have T = t, X = x −v.c.t; instead of assuming this, we'll now work out the actual transformation from the fact that the speed of light is the same for both. That implies that the set of points described by c.t = x must also satisfy c.T = X; thus e+f.c = c.(a+b.c). Likewise, c.t = −x must correspond to c.T = −X, so e−f.c = −c.(a−b.c). Halving the sum and difference of these equations we get e = c.b.c and f.c = c.a. At the spatial origin of the [T, X] frame, where x = v.c.t, we have 0 = X = (b.c +a.v).c.t for all t, whence b.c +a.v = 0, so b = −a.v/c, e = −a.v.c and we have T = a.(t −v.x/c), X = a.(x −v.c.t).

Conversely, with u as the magnitude of the first frame's velocity (in the direction opposite to V-ward) as seen by the second, we can infer t = A.(T +u.X/c), x = A.(X +u.c.T) for some constant A. Combining these two, we obtain:

t = A.(T +u.X/c)
  = A.a.(t −v.x/c +u.(x −v.c.t)/c)
  = A.a.(t.(1 −u.v) +(u −v).x/c)
x = A.(X +u.c.T)
  = A.a.(x −v.c.t +u.c.(t −v.x/c))
  = A.a.(x.(1 −u.v) +(u −v).c.t)

from either of which we may infer (since t and x are independent) that u −v is zero, so u = v, and A.a = 1/(1 −v.v). We thus infer (rather than assuming) that the velocity of the first frame, as seen by the second, is equal and opposite (in the V-wards sense) to the velocity of the second as seen by the first. This gives us a symmetry between the two frames that begs to have A and a of equal magnitude, leading to a = 1/√(1 −v.v) = A. We can thus state the transformation of co-ordinates as

c.T = (c.t −v.x)/√(1−v.v)
X = (x −v.c.t)/√(1−v.v)

This is known as the Lorentz transformation, after the late Victorian scientist who pointed out that it would suffice to explain the non-result of the Michelson-Morley experiment – an attempt to detect our motion relative to the æther, or frame of reference in which Maxwell's equations take their simple form.
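The transformation can be exercised numerically. The following is a minimal Python sketch, not part of the original derivation: the function names `lorentz` and `inverse_lorentz` and the sample values are illustrative assumptions, and v is taken as a fraction of c, as in the text. It checks that a light flash through the origin travels at c in both frames and that the second frame's spatial origin, x = v.c.t, lands on X = 0.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def lorentz(t, x, v, c=C):
    """Map [t, x] in the first frame to [T, X] in a frame moving at speed v.c along x."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    T = g * (t - v * x / c)
    X = g * (x - v * c * t)
    return T, X

def inverse_lorentz(T, X, v, c=C):
    """Undo lorentz(): the same map with v negated (since u = v in the text)."""
    return lorentz(T, X, -v, c)

t, v = 2.0, 0.6                      # illustrative values only
T, X = lorentz(t, C * t, v)          # an event on the light path x = c.t
print(X / (C * T))                   # ratio is 1 up to rounding: light still moves at c

print(lorentz(t, v * C * t, v)[1])   # second frame's origin maps to X = 0
```

Negating v to invert relies on the result, inferred above, that each frame sees the other moving at the same speed in the opposite direction.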

Comparison

If we leave out the factors of √(1−v.v) and the v.x/c term in the first of these, we get the shear previously presumed. Since c.v is the speed of relative movement, the factor 1−v.v is almost indistinguishable from 1 for speeds tiny compared to c. Satellites in geostationary orbit have speeds of around 3 km/s or c/100000; the Earth's orbital speed about the Sun is about a factor of ten bigger and the Sun's motion relative to the cosmic microwave background is a further factor of about ten bigger (so roughly c/1000); our local group of galaxies, moving about twice as fast relative to that background, still only has v ≈ 1/480, yielding 1−v.v = 0.9999956, with square root 0.9999978, whose inverse is 1.0000022. Even such huge speeds (over 600 km/s) only produce a tiny perturbation: detecting the effects of more modest ones is a good deal harder.
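The sizes of these factors are easy to tabulate. This short Python sketch uses the round fractions of c quoted above; the helper name `gamma` and the dictionary labels are assumptions made for illustration.

```python
import math

def gamma(v):
    """Return 1/sqrt(1 - v.v) for a speed v given as a fraction of c."""
    return 1.0 / math.sqrt(1.0 - v * v)

# Round figures from the text, as fractions of c.
speeds = {
    "geostationary satellite": 1 / 100_000,
    "Earth's orbit round the Sun": 1 / 10_000,
    "Sun vs. microwave background": 1 / 1_000,
    "local group of galaxies": 1 / 480,
}
for name, v in speeds.items():
    # gamma - 1 is the fractional deviation from the old shear's factor of 1.
    print(f"{name:30s} v = {v:.2e}   gamma - 1 = {gamma(v) - 1:.2e}")
```

For small v the deviation is close to v.v/2, which is why each tenfold increase in speed raises it a hundredfold.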

To observe the v.x/c term's effect we would need accurately synchronized clocks far enough apart that, when viewed from a moving frame, the v.x/c term is not hidden by imprecision in the clocks. The Earth's diameter is about 1/24 light seconds and we can reasonably put sensitive equipment that far apart (we've managed a few cases of instruments further apart than that, but it's not yet easy or cheap); observed from low Earth orbit, with v around 1/40000, we would need to be able to discern microsecond discrepancies. Even this would have been difficult for the Victorians, and they couldn't put things into Earth orbit. Using the fastest-moving platforms that Victorian scientists could have hoped to use – trains, with c.v of order 100 km per hour, or around 30 m/s, so v is about one part in ten million – it would be necessary to discern timing discrepancies of a few nanoseconds (of order one part in a few hundred million of a second). Thus the Lorentz transform's deviations from the prior theory were smaller than the sensitivity of existing measurements, allowing all the data, that had previously been understood as supporting the old theory, to stand equally in support of the new.
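The arithmetic behind these estimates can be sketched in a few lines of Python. The function name `clock_offset` is an assumption for illustration; separations are given in light-seconds so that the v.x/c term is simply v times the separation.

```python
def clock_offset(v, separation_light_seconds):
    """The v.x/c term, in seconds, for clocks x/c light-seconds apart."""
    return v * separation_light_seconds

earth_diameter = 1 / 24   # light-seconds, the round figure used in the text

print(clock_offset(1 / 40_000, earth_diameter))   # low Earth orbit: about a microsecond
print(clock_offset(1e-7, earth_diameter))         # train: a few nanoseconds
```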

Shortening and slowing

One consequence of the transformation is that the lengths of objects depend on velocities: if an object, of which some linear dimension is L when measured in its frame of rest, is observed moving at speed v.c parallel to the given dimension, then measurement of that linear dimension will yield L.√(1−v.v). [Proof: the two ends have positions X = 0 and X = L in its own frame of reference, for all T; these give values for x−v.t.c of 0 and L.√(1−v.v); for any given t, the two values of x thus differ by L.√(1−v.v).] This is known as the Lorentz-Fitzgerald contraction.

Equally, moving clocks must run slow by the same factor √(1−v.v). [Proof: using the clock's position to define X = 0, successive ticks happen with T = n.k for integer n and some fixed tick-interval k; the resulting values of t = (T +v.X/c)/√(1−v.v) are n.k/√(1−v.v) with tick-interval k/√(1−v.v), which exceeds k; so the clock appears to run slow, by the given factor.] This is known as time dilation. It implies that an unstable particle, decaying with some known half-life T in its rest frame, will decay with a slowed half-life T/√(1−v.v) when moving at speed v.c.
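Both bracketed proofs can be replayed numerically. The sketch below is illustrative only: it works in units where c = 1 (so times are given as c.t), and the name `lorentz` and the sample values of v, L and k are assumptions.

```python
import math

def lorentz(ct, x, v):
    """[c.t, x] -> [c.T, X] for a frame moving at speed v.c (units with c = 1)."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return g * (ct - v * x), g * (x - v * ct)

v, L, k = 0.6, 10.0, 1.0
root = math.sqrt(1.0 - v * v)          # close to 0.8 for v = 0.6

# Rod at rest in the moving frame with ends at X = 0 and X = L: at first-frame
# time t = 0 the far end is observed at x = L.root, which maps back to X = L.
_, X_far = lorentz(0.0, L * root, v)
print(X_far, L)

# Clock at X = 0 ticking at c.T = k: mapping back with -v gives the first
# frame's time for that tick, the dilated interval k/root.
ct_tick, _ = lorentz(k, 0.0, -v)
print(ct_tick, k / root)
```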

It is a remarkable tribute to the progress of measuring technology in the twentieth century that, at its start, none of the above discrepancies had the remotest chance of being measured (although the not totally unrelated precession of the orbit of Mercury had been measured, and was unexplained): while, by its end, all had been measured – and found to be in accord with the predictions of relativistic theory.

Sideways co-ordinates

Given that the effects of relative movement mess up our intuitions about independence between time and the spatial co-ordinate parallel to the relative motion, we have no immediate guarantee that our co-ordinates perpendicular to the motion aren't also entangled in the relationship between these two, nor that they are themselves unperturbed. So let us not even assume we know that distances from the x-and-X axis are unchanged, or that the two systems agree on which displacements are perpendicular to this axis (i.e. to V). However, our system is entirely symmetric under rotations about this axis, so it should suffice to consider one plane through the axis; we can probe the question by considering light trajectories in some such plane and the above tells us how the two frames are related to one another on the axis.

In the [t,x] frame, at t = 0, choose a plane in which the x-axis lies and, within that plane, use distance from the axis to define a co-ordinate y, positive on one side of the axis and negative on the other. For any value L for this y co-ordinate, consider the family of trajectories for flashes of light that pass through [c.t,x,y] = [0,0,L] either on their way to some point on the x-axis (at some positive t) or having been emitted from some point on the x-axis (at some negative t); these are parameterised by the angle, a, they make with the direction of V in the [x,y] plane and each has [x,y] = [0,L] +c.t.[Cos(a),Sin(a)], meeting the x-axis when 0 = y = L +c.t.Sin(a), so at [c.t,x] = −[1,Cos(a)].L/Sin(a) hence, applying c.T = (c.t −v.x)/√(1−v.v) and X = (x −v.c.t)/√(1−v.v), at [c.T,X] = −[1 −v.Cos(a), Cos(a) −v].L/Sin(a)/√(1−v.v). Since v is smaller than 1 in magnitude, we have some angle b with Cos(b) = −v and Sin(b) = √(1−v.v) so that we can re-write this as

−[c.T,X] = [1 +Cos(b).Cos(a), Cos(a) +Cos(b)].L/Sin(a)/Sin(b).

The sign of T at this start-point is the opposite of the sign of L/Sin(a), just as was the case for t. In the second frame, consider the event that is the meeting-point of all our flash trajectories, corresponding to [c.t,x,y] = [0,0,L]; we don't even know (although we may reasonably expect) that this has [c.T,X] = [0,0]. However, we can at least construct a plane of constant T through this event and the X axis. Define a co-ordinate Y in this plane that measures distance from the X-axis, positively on one side and negatively on the other, making this choice in such a way as to give the meeting-point's Y co-ordinate the same sign as L. (We are only interested in the case L ≠ 0, since we already dealt with the L = 0 case above; and this puts the meeting-point off the x-axis, which is the X axis; hence the meeting-point does also have a non-zero Y co-ordinate.) Let the meeting-point have [c.T,X,Y] = [K,J,H]; the family of light trajectories through it and the X-axis is then given by [X,Y] = [J,H] +(c.T−K).[Cos(e),Sin(e)], parameterised by angle e that this frame deems the light's velocity to make with V. Each such meets the X-axis when 0 = Y = H +(c.T−K).Sin(e), giving c.T = K −H/Sin(e) and X = J +(c.T−K).Cos(e) = J −H.Cos(e)/Sin(e); and the set of such events must be equal to the set of [c.T,X] events given above, parameterised by a.

Applying X = J +(c.T−K).Cos(e) to the earlier −[c.T,X] values, and scaling by Sin(a).Sin(b)/L, we obtain

−(Cos(a) +Cos(b))
  = X.Sin(a).Sin(b)/L
  = (J −K.Cos(e)).Sin(a).Sin(b)/L −(1 +Cos(b).Cos(a)).Cos(e)
  = Sin(a).Sin(b).J/L −(1 +Cos(b).Cos(a) +Sin(a).Sin(b).K/L).Cos(e), whence
Sin(a).Sin(b).J/L +Cos(a) +Cos(b)
  = (1 +Cos(b).Cos(a) +Sin(a).Sin(b).K/L).Cos(e), so
Cos(e)
  = (Sin(a).Sin(b).J/L +Cos(a) +Cos(b))/(Sin(a).Sin(b).K/L +Cos(a).Cos(b) +1)

in which we can vary a freely yet must obtain a value between −1 and 1; this constrains J/L and K/L. Next, obtain:

H = (K−c.T).Sin(e)
  = (K +(1 +Cos(a).Cos(b)).L/Sin(a)/Sin(b)).√((Sin(a).Sin(b).K/L +Cos(a).Cos(b) +1)² −(Sin(a).Sin(b).J/L +Cos(a) +Cos(b))²)/(Sin(a).Sin(b).K/L +Cos(a).Cos(b) +1)
  = L.√((Sin(a).Sin(b).K/L +Cos(a).Cos(b) +1)² −(Sin(a).Sin(b).J/L +Cos(a) +Cos(b))²)/Sin(a)/Sin(b)
  = √((Sin(a).Sin(b).K +L.Cos(a).Cos(b) +L)² −(Sin(a).Sin(b).J +L.Cos(a) +L.Cos(b))²)/Sin(a)/Sin(b)
  = √((Sin(a).Sin(b).K +L.Cos(a).Cos(b) +L −Sin(a).Sin(b).J −L.Cos(a) −L.Cos(b)).(Sin(a).Sin(b).K +L.Cos(a).Cos(b) +L +Sin(a).Sin(b).J +L.Cos(a) +L.Cos(b)))/Sin(a)/Sin(b)
  = √((Sin(a).Sin(b).(K −J) +L −L.Cos(a) −L.Cos(b) +L.Cos(a).Cos(b)).(Sin(a).Sin(b).(K +J) +L +L.Cos(a) +L.Cos(b) +L.Cos(a).Cos(b)))/Sin(a)/Sin(b)
  = √((Sin(a).Sin(b).(K −J) +L.(1 −Cos(a)).(1 −Cos(b))).(Sin(a).Sin(b).(K +J) +L.(1 +Cos(a)).(1 +Cos(b))))/Sin(a)/Sin(b)
  = √( Sin(a).Sin(a).Sin(b).Sin(b).(K −J).(K +J) +L.(K −J).Sin(a).(1 +Cos(a)).Sin(b).(1 +Cos(b)) +L.(K +J).Sin(a).(1 −Cos(a)).Sin(b).(1 −Cos(b)) +L.L.(1 +Cos(a)).(1 −Cos(a)).(1 +Cos(b)).(1 −Cos(b)) )/Sin(a)/Sin(b)
in the last term of which we can exploit (1 +Cos(u)).(1 −Cos(u)) = 1 −Cos(u).Cos(u) = Sin(u).Sin(u) for u in {a, b}:
  = √( Sin(a).Sin(a).Sin(b).Sin(b).(K −J).(K +J) +L.(K −J).Sin(a).(1 +Cos(a)).Sin(b).(1 +Cos(b)) +L.(K +J).Sin(a).(1 −Cos(a)).Sin(b).(1 −Cos(b)) +L.L.Sin(a).Sin(a).Sin(b).Sin(b) )/Sin(a)/Sin(b)
  = √(K.K −J.J +L.L +L.(K −J).(1 +Cos(a)).(1 +Cos(b))/Sin(a)/Sin(b) +L.(K +J).(1 −Cos(a)).(1 −Cos(b))/Sin(a)/Sin(b))

Now, this holds true for all e, hence for all a, so its value must be independent of a. As Cos(a) approaches 1, (1 −Cos(a))/Sin(a) is well approximated by −Cos'(a)/Sin'(a) = Sin(a)/Cos(a), which reaches 0 when Cos(a) is 1; likewise, near Cos(a) = −1, (1 +Cos(a))/Sin(a) ≈ Cos'(a)/Sin'(a) = −Sin(a)/Cos(a) also tends to zero. In each of these cases, one of the last two terms in the √(…) vanishes as Sin(a) approaches 0, but the other must grow without bound unless (given that v is smaller than 1, so Cos(b) isn't ±1) the relevant one of K −J and K +J is zero. For H to be independent of a, it cannot grow without bound at either end of a's range, so we must in fact have K +J = 0 = K −J, whence K = 0 = J. We then obtain H = L,

Cos(e)
  = (Cos(a) +Cos(b))/(Cos(a).Cos(b) +1)
Sin(e)
  = √(Cos(a).Cos(a).Cos(b).Cos(b) +2.Cos(a).Cos(b) +1 −Cos(a).Cos(a) −2.Cos(a).Cos(b) −Cos(b).Cos(b))/(Cos(a).Cos(b) +1)
  = √(1 −Cos(a).Cos(a) −Cos(b).Cos(b) +Cos(a).Cos(a).Cos(b).Cos(b))/(Cos(a).Cos(b) +1)
  = √((1 −Cos(a).Cos(a)).(1 −Cos(b).Cos(b)))/(Cos(a).Cos(b) +1)
  = Sin(a).Sin(b)/(Cos(a).Cos(b) +1)

(Technically, the sign of Sin(e) is ambiguous – but a cursory inspection of the geometry of the situation requires it to have the same sign as Sin(a), which is what this formula gives, taking the positive sign for √(1−v.v) in Sin(b)'s specification.)

Now, H = L with J = 0 = K, independent of our choice of initial L, tells us that the y-axis and the Y-axis coincide and y = Y on this axis. It then follows (by simple linear geometry in the two frames of reference, assuming lines deemed straight by either are by the other also) that y and Y coincide everywhere and that, likewise, all other spatial directions perpendicular to V are the same for both frames of reference.
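The formulae for Cos(e) and Sin(e) can be sanity-checked numerically. The Python sketch below is an illustration under stated assumptions: the function names and the sample values of a, b and L are arbitrary, and b is treated simply as an angle, so the check does not depend on how b was tied to v. It confirms that the squares of Cos(e) and Sin(e) sum to 1 and that, with K = J = 0 and H = L, the second frame's crossing events c.T = −H/Sin(e), X = −H.Cos(e)/Sin(e) agree with the transformed first-frame events.

```python
import math

def second_frame_angle(a, b):
    """Cos(e) and Sin(e) from the first-frame angle a and the speed-encoding angle b."""
    denom = math.cos(a) * math.cos(b) + 1.0
    cos_e = (math.cos(a) + math.cos(b)) / denom
    sin_e = math.sin(a) * math.sin(b) / denom
    return cos_e, sin_e

def axis_event(a, b, L):
    """[c.T, X] at which the ray at angle a through y = L meets the x-axis."""
    scale = -L / (math.sin(a) * math.sin(b))
    return (scale * (1.0 + math.cos(b) * math.cos(a)),
            scale * (math.cos(a) + math.cos(b)))

b, L = 1.1, 2.5                      # arbitrary illustrative values
for a in (0.3, 1.0, 2.0, 2.9):
    cos_e, sin_e = second_frame_angle(a, b)
    cT, X = axis_event(a, b, L)
    # Expect 1, then two values that vanish up to rounding.
    print(cos_e ** 2 + sin_e ** 2, cT + L / sin_e, X + L * cos_e / sin_e)
```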

The Doppler shift

If we observe a clock from a position with fixed value of x, and its ticks are evidenced by light flashes, the interval between moments at which we see these flashes will change as the clock passes us. This is known as the Doppler shift. Choose [X,T] origin so that the clock is at X = 0 and ticks at moments T = n.k for integer n and some tick-interval k. Choose [x,t] origin so that our observer is at x = 0 and the clock passes the observer at t = 0. [In principle, the clock could pass the observer at a moment when it does not tick; this will make no practical difference, but for now presume that we adjust the clock by whatever fraction of a tick it takes to ensure it does tick at the moment of passing the observer.]

For t < 0, which is equally T < 0, the light flashes we see are from light travelling forwards from an [X,T] = [0,n.k] with n < 0; forward light from such events follows paths [x,t] = u.[c,1] + [v.c,1].n.k/√(1−v.v) for u varying and thus reaches x = 0 when u = −n.k.v/√(1−v.v) yielding t = n.k.(1−v)/√(1−v.v) = n.k.√((1−v)/(1+v)). For t > 0, the light flashes we see are from light travelling backwards from [X,T] = [0,n.k] with n > 0 along trajectories [x,t] = u.[−c,1] +[v.c,1].n.k/√(1−v.v) which hit x = 0 when u = n.v.k/√(1−v.v) yielding t = n.k.(1+v)/√(1−v.v) = n.k.√((1+v)/(1−v)). This is exactly the factor we would get for the t < 0 case if v were negated, so we can just say that the time interval between observations of ticks is k.√((1−v)/(1+v)) when a clock with tick-interval k is observed approaching at speed v, with recession expressed as approaching at negative speed.

Thus, while the clock (once we infer the time interval between its ticks, by correcting for the time taken by light pulses in reaching us) runs slow by the factor √(1−v.v), we observe ticks from it at a rate which, as it approaches, is fast by the factor √((1+v)/(1−v)) but, once it has passed, is slow by the inverse of this factor. As for clocks, so for any other periodic process in time: if some constituent of a distant star's photosphere perturbs its spectrum (from its natural black-body form) at some given frequency f (corresponding to 1/k) then the spectrum of the star, as we observe it, will be likewise perturbed at a frequency f.√((1+v)/(1−v)) if the star is moving towards us at speed v.c. For v > 0, a star actually moving towards us, this frequency is > f; the increase in frequency is called a blue-shift because blue light is at the high-frequency end of the spectrum of visible light. For v < 0, meaning the star is really moving away from us at (positive) speed −v.c, the frequency is < f; the decrease is called a red-shift because red is at the low-frequency end of the visible spectrum.
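The tick-arrival argument can be replayed directly. In this Python sketch (illustrative names, with lengths in light-seconds so that c = 1, which is an assumption of the sketch) each tick is placed in the observer's frame and the light travel time to x = 0 is added; the spacing of arrivals is then compared with the Doppler factor.

```python
import math

def doppler_factor(v):
    """Observed/emitted frequency for a source approaching at speed v.c."""
    return math.sqrt((1.0 + v) / (1.0 - v))

def arrival_time(n, k, v):
    """Time t at which the observer at x = 0 sees tick n, emitted at X = 0, T = n.k."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    t_emit = g * n * k            # the tick event's time in the observer's frame
    x_emit = g * v * n * k        # and its position, in light-seconds
    return t_emit + abs(x_emit)   # light needs |x| seconds to reach x = 0

v, k = 0.6, 1.0                   # illustrative values; doppler_factor(0.6) is close to 2
print(arrival_time(-1, k, v) - arrival_time(-2, k, v), k / doppler_factor(v))  # approaching
print(arrival_time(2, k, v) - arrival_time(1, k, v), k * doppler_factor(v))    # receding
```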

Classifying displacements

Notice that, in the [X,T] or [x,t] plane, X = 0 iff x = v.c.t and T = 0 iff c.t = x.v. Thus, for any position whose spatial co-ordinates, collectively, have smaller magnitude than c times its time co-ordinate, there is some velocity v.c = x/t in whose frame the given position is at the origin; likewise, for any position whose spatial magnitude is greater than c times time co-ordinate, there is a choice of velocity v.c = c.c.t/x in whose frame the given position has zero time-coordinate. In the first case, there is only one such frame of reference; but in the second case, relative to a frame which sees the position have zero time co-ordinate, any velocity perpendicular to the spatial position vector yields another frame which also sees the position have zero time co-ordinate. Thus, insisting that the lines x = ±c.t coincide with their analogues in any other frame of reference (i.e. that light goes at the same speed in all frames) leads to the conclusion that:

if the spatial separation between two events is less than c times their time-separation, there is a unique velocity of movement whose frame of reference deems the two events to happen at the same place; and
if the spatial separation between two events is greater than c times their time-separation, there is a two-dimensional continuum of velocities of movement whose frames of reference deem the two events simultaneous.

We may, thus, divide displacements in space-time into three classes:

light-like: the ratio of spatial to temporal components has magnitude c, regardless of frame of reference;
space-like: there are frames of reference which deem events separated by such a displacement simultaneous;
time-like: there is a frame of reference which deems events separated by such a displacement to happen in the same place – and we can sub-divide this type into forward and backward variants according as the given frame of reference deems the start of the displacement to happen before or after its end.
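The classification lends itself to a small helper. This is a hedged Python sketch: the names `classify` and `special_velocity` are assumptions, displacements are given as c times the time separation plus spatial components, and the exact comparison with zero is only reliable for inputs that are exactly representable.

```python
def classify(c_dt, dx, dy=0.0, dz=0.0):
    """Classify a space-time displacement; c_dt is c times the time separation."""
    interval = c_dt * c_dt - (dx * dx + dy * dy + dz * dz)
    if interval == 0:
        return "light-like"
    if interval < 0:
        return "space-like"
    return "forward time-like" if c_dt > 0 else "backward time-like"

def special_velocity(c_dt, dx):
    """For a displacement along x, return v (as a fraction of c) for the frame that
    sees the two events at one place (time-like) or simultaneous (space-like)."""
    kind = classify(c_dt, dx)
    if kind == "light-like":
        return None
    return dx / c_dt if "time-like" in kind else c_dt / dx

print(classify(5.0, 3.0), special_velocity(5.0, 3.0))   # time-like: v.c = x/t
print(classify(3.0, 5.0), special_velocity(3.0, 5.0))   # space-like: v.c = c.c.t/x
print(classify(-4.0, 4.0))                              # on the light cone
```

In the space-like case the returned velocity is only one of the continuum described above; any further velocity perpendicular to the displacement keeps the events simultaneous.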

Hyperbolic Rotation

Notice that

c.c.T.T −X.X
  = ( (c.t −v.x).(c.t −v.x) −(x −v.c.t).(x −v.c.t) )/(1−v.v)
  = (c.c.t.t −2.v.c.t.x +v.v.x.x −x.x +2.x.v.c.t −v.v.c.c.t.t) / (1−v.v)
  = c.c.t.t −x.x

so this combination of our co-ordinates is an invariant, just as rotating a (cartesian orthonormal) spatial co-ordinate system preserves the sum of squares of co-ordinates, i.e. the squared radius. In the spatial case, the change of co-ordinates is a rotation and preservation of the radial parameter is a characteristic property: so we may ask what analogue of rotation we can find for our invariant under the Lorentz transformation. For a rotation, we would have seen [Y,Z] = [y.Cos(A) −z.Sin(A), y.Sin(A) +z.Cos(A)] and described the rotation by the angle A. The sum of squares of Cos and Sin is 1 for all angles. Their hyperbolic analogues cosh and sinh (which can be obtained from looking at Cos and Sin of imaginary angles) are defined by

cosh(x) = (exp(x) +exp(−x))/2
sinh(x) = (exp(x) −exp(−x))/2

and the difference of their squares is always one: for any x,

cosh(x).cosh(x) −sinh(x).sinh(x)
  = (exp(2.x) +2 +exp(−2.x) −exp(2.x) +2 −exp(−2.x)) / 4
  = 1

making cosh and sinh natural substitutes for Cos and Sin. Indeed, a transformation of form

Y = cosh(b).y +sinh(b).z
Z = sinh(b).y +cosh(b).z

has Y.Y −Z.Z = y.y −z.z, as is easily verified. Using Y = c.T, y = c.t, Z = −X, z = −x puts the Lorentz transformation in this form with cosh(b) = 1/√(1−v.v) and sinh(b) = v/√(1−v.v). The ratio tanh(b) = sinh(b)/cosh(b), analogous to the usual Tan of angles, gives us our speed as c.tanh(b), since v = sinh(b)/cosh(b) = tanh(b). Thus the Lorentz transformation is a hyperbolic rotation rather than the cross-time shear tacitly assumed by Galileo and Newton:

c.T = c.t.cosh(b) −x.sinh(b)
X = x.cosh(b) −c.t.sinh(b)
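The hyperbolic form is straightforward to try out. This Python sketch (the name `boost` and the sample numbers are illustrative assumptions) obtains b from v via the standard library's `math.atanh`, compares cosh(b) and sinh(b) with the square-root expressions, and checks that c.c.t.t −x.x survives the transformation.

```python
import math

def boost(ct, x, b):
    """Hyperbolic rotation through angle b: [c.t, x] -> [c.T, X]."""
    return (ct * math.cosh(b) - x * math.sinh(b),
            x * math.cosh(b) - ct * math.sinh(b))

v = 0.6
b = math.atanh(v)                                # the angle with tanh(b) = v
print(math.cosh(b), 1 / math.sqrt(1 - v * v))    # the two agree
print(math.sinh(b), v / math.sqrt(1 - v * v))    # likewise

ct, x = 3.0, 1.0
cT, X = boost(ct, x, b)
print(cT * cT - X * X, ct * ct - x * x)          # the invariant, before and after
```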

In these terms, the Doppler shift factor √((1+v)/(1−v)) for a source approaching the observer at speed v.c with v = tanh(b) becomes

√((1+tanh(b))/(1−tanh(b)))
  = √((cosh(b) +sinh(b))/(cosh(b) −sinh(b)))
  = √(exp(b) / exp(−b))
  = exp(b)

so if we divide the observed frequency of a spectral line by the frequency it was at when produced, and take the log of the result, we get the hyperbolic angle whose tanh, when multiplied by c, gives the speed, towards us, of the source of the observed radiation. Equally, this can be computed as the difference in log between the two frequencies (each divided by some common unit). Furthermore, exp(b) = cosh(b) +sinh(b) = (1+v)/√(1−v.v) is just our Doppler shift ratio, so we can compute b directly from v as: b = log((1+v)/(1−v))/2; as long as v is smaller than 1, this is well-defined (and has the same sign as v).
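As a small illustration of this recipe, the following Python sketch turns a pair of frequencies into a hyperbolic angle and a speed. The function names and the sample frequency ratios are assumptions chosen for the example, not measured data.

```python
import math

def hyperbolic_angle_from_frequencies(f_observed, f_emitted):
    """b = log(f_observed / f_emitted); positive for an approaching source."""
    return math.log(f_observed / f_emitted)

def approach_speed(f_observed, f_emitted):
    """Speed towards us as a fraction of c, i.e. tanh(b)."""
    return math.tanh(hyperbolic_angle_from_frequencies(f_observed, f_emitted))

def hyperbolic_angle_from_speed(v):
    """b = log((1+v)/(1-v))/2, defined for -1 < v < 1."""
    return math.log((1.0 + v) / (1.0 - v)) / 2.0

print(approach_speed(2.0, 1.0))            # frequency doubled: approaching at about 0.6 c
print(approach_speed(1.0, 2.0))            # frequency halved: receding at about 0.6 c
print(hyperbolic_angle_from_speed(0.6), math.log(2.0))
```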

Adding velocities

Any body has its own rest-frame, in which it sits at the spatial origin and its trajectory is just the time axis, parameterised by that body's proper time. As seen by a frame with respect to which it's moving at velocity c.V, with speed c.v for v = tanh(b), this is just this other frame's [c.t, x] = s.c.[cosh(b), sinh(b)], where s is the body's proper time and x is measured parallel to the velocity.

We can now look at this trajectory as seen by another frame. Of course, this frame's velocity c.U relative to our first needn't be parallel to V; so we'll be using the co-ordinates derived from U; V shall have some component parallel to U and we can use V's component perpendicular to U to determine one of our spatial co-ordinates perpendicular to U; let that be y and let the magnitude of U be u = tanh(d). We thus get our body's trajectory now as [c.t, x, y] = s.c.[cosh(b), sinh(b).cos(e), sinh(b).sin(e)] for some angle e, which transforms to

[c.T, X, Y] = s.c.[cosh(b).cosh(d) −sinh(b).cos(e).sinh(d), sinh(b).cos(e).cosh(d) −cosh(b).sinh(d), sinh(b).sin(e)]

That's not particularly simple. So let's now consider the cases where it's relatively simple: when U and V (in the first frame of reference) are either parallel or perpendicular, i.e. the angle e is a multiple of a quarter turn.

When sin(e) is 0, we can push the sign of cos(e) = ±1 onto b, with sinh(b) < 0 representing cos(e) = −1 by making v negative; v = tanh(b) is now the component of V parallel to U, rather than its magnitude. Our body's trajectory in the second frame becomes

[c.T, X]
  = s.c.[cosh(b).cosh(d) −sinh(b).sinh(d), sinh(b).cosh(d) −cosh(b).sinh(d)]
  = s.c.[(exp(b) +exp(−b)).(exp(d) +exp(−d)) −(exp(b) −exp(−b)).(exp(d) −exp(−d)), (exp(b) −exp(−b)).(exp(d) +exp(−d)) −(exp(b) +exp(−b)).(exp(d) −exp(−d))]/4
  = s.c.[exp(b+d) +exp(b−d) +exp(d−b) +exp(−b−d) −exp(b+d) +exp(b−d) +exp(d−b) −exp(−b−d), exp(b+d) +exp(b−d) −exp(d−b) −exp(−b−d) −exp(b+d) +exp(b−d) −exp(d−b) +exp(−b−d)]/4
  = s.c.[exp(b−d) +exp(d−b), exp(b−d) −exp(d−b)]/2
  = s.c.[cosh(b−d), sinh(b−d)]

in the course of which I've incidentally derived the formulae for sinh and cosh of a difference of hyperbolic angles; the addition formulae can be obtained by using −d in place of d, with cosh(−d) = cosh(d) and sinh(−d) = −sinh(d). This trajectory's dX/d(c.T) is then tanh(b−d).

Of course, our body and the second frame were moving relative to the first frame, so we expect a difference of velocities here; the difference between velocities c.tanh(b) and c.tanh(d) is c.tanh(b−d); we can readily enough infer from this that a body moving at c.tanh(b) relative to the second frame shall move at c.tanh(b+d) relative to the first. Thus, for parallel velocities, the addition law is just to add the hyperbolic angles for the speeds in some common direction. Note that

tanh(b+d)
  = sinh(b+d) / cosh(b+d)
  = (sinh(b).cosh(d) +cosh(b).sinh(d)) / (cosh(b).cosh(d) +sinh(b).sinh(d))
  = (tanh(b) +tanh(d)) / (1 +tanh(b).tanh(d))

so the sum of speeds c.u and c.v in parallel directions is c.(u +v) / (1 +u.v); when they are in opposite directions, dividing by the 1 +u.v factor scales the sum (which is, of course, a difference) up, when in the same direction the factor scales the sum down. For perpendicular velocities, sin(e) = 1 and cos(e) = 0, we have

[c.T, X, Y] = s.c.[cosh(b).cosh(d), −cosh(b).sinh(d), sinh(b)]

giving a spatial component of velocity/c, in [X, Y] co-ordinates, as [−tanh(d), tanh(b)/cosh(d)] = V/cosh(d) −U. This effectively applies U's time-dilation to the first frame's observed velocity V of the body; it's a distance divided by time, the distance is the same in both frames, but they measure time at different rates.
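The general-angle trajectory and its two simple cases can be compared numerically. In this Python sketch the function name and the sample angles b, d are illustrative assumptions; it evaluates the transformed trajectory for any angle e and divides the spatial components by c.T to get velocity/c.

```python
import math

def body_velocity_in_second_frame(b, d, e):
    """Velocity/c, as [X, Y] components, of a body with hyperbolic angle b moving
    at angle e to U, as seen from a frame moving at tanh(d) along U."""
    cT = math.cosh(b) * math.cosh(d) - math.sinh(b) * math.cos(e) * math.sinh(d)
    X = math.sinh(b) * math.cos(e) * math.cosh(d) - math.cosh(b) * math.sinh(d)
    Y = math.sinh(b) * math.sin(e)
    return X / cT, Y / cT

b, d = 0.9, 0.4                                   # arbitrary illustrative angles

# Parallel case (e = 0): the X component should be tanh(b - d).
print(body_velocity_in_second_frame(b, d, 0.0), math.tanh(b - d))

# Perpendicular case (e a quarter turn): expect V/cosh(d) - U.
print(body_velocity_in_second_frame(b, d, math.pi / 2),
      (-math.tanh(d), math.tanh(b) / math.cosh(d)))
```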

So now let's consider the mixed case and, this time, actually do addition of velocities. We have a second frame moving at c.U with respect to a first, with the magnitude of U being u = tanh(d). We have a body moving, now with respect to the second frame, at a velocity with component c.v parallel to U and c.w perpendicular to it; its trajectory in this second frame is [c.T, X, Y] = c.T.[1, v, w], which transforms in our first frame to [c.t, x, y] = c.T.[cosh(d) +v.sinh(d), v.cosh(d) +sinh(d), w], so the [x, y] components of its velocity (as fractions of c) are

[v.cosh(d) +sinh(d), w]/(cosh(d) +v.sinh(d))
  = [v +tanh(d), w/cosh(d)]/(1 +v.tanh(d))
  = [v +u, w.√(1 −u.u)]/(1 +u.v)

So the perpendicular component is scaled down by the usual factor between the two frames, relative to the parallel component; and, after they've been added to the relative velocity between frames, the result is rescaled just as in the parallel addition rule.
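The resulting rule is compact enough to state as a function. The following Python sketch is illustrative (the name `add_velocity` and the sample speeds are assumptions); all speeds are fractions of c, with u the second frame's speed and [v, w] the body's components parallel and perpendicular to U in that frame.

```python
import math

def add_velocity(u, v, w=0.0):
    """Velocity/c in the first frame, as [parallel, perpendicular] to U."""
    scale = 1.0 + u * v
    return (v + u) / scale, w * math.sqrt(1.0 - u * u) / scale

print(add_velocity(0.5, 0.5))         # parallel: 0.8 c rather than c
print(add_velocity(0.6, 0.0, 0.5))    # perpendicular part shrinks to about 0.4

# For parallel motion the hyperbolic angles simply add:
print(math.atanh(add_velocity(0.5, 0.5)[0]), 2 * math.atanh(0.5))
```

However close to 1 the inputs are, the combined speed stays below 1, since tanh of any finite sum of angles is smaller than 1 in magnitude.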

Simultaneity

As noted above, the mixing of space and time means that the notion of simultaneity depends on frame of reference: events at the same time co-ordinate value in one frame shall typically have distinct time co-ordinate values in other frames moving at constant velocity with respect to the first – unless that velocity is (spatially) perpendicular to the displacement between events.

This is like the notion of "further forward" for observers facing in different directions – albeit with observers unable to face in directions as much as a right angle apart from each other. Suppose we can only face in directions between north-east and north-west (the two diagonal directions serving in place of light-like) but you're facing a little to the west of north while I face a little to the east of it. To be sure, we both think everything to the north of us is in front of us, but there are things only a little north of west that are in front of you but behind me; and vice-versa for some things a little north of east.

With our (by now hopefully) familiar [c.T, X] = [c.t.cosh(b) −x.sinh(b), x.cosh(b) −c.t.sinh(b)], with positive b, the negative-x half of t = 0 is in the other frame's future (T > 0) while the positive half is in its past (T < 0). For any offset k into t's future, events with t = k but x > c.k/tanh(b) are still in the other frame's past (with T < 0); likewise when k is negative, making it an offset into t's past, events with t = k but x < c.k/tanh(b) (which is now negative) are in the other frame's future (T > 0).
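These sign statements are easy to check by evaluating c.T directly. In this Python sketch the function name, the value of b and the sample events are illustrative assumptions; lengths are in units where c = 1, so k stands for c.k.

```python
import math

def other_frame_time(ct, x, b):
    """c.T for the event [c.t, x] under hyperbolic angle b."""
    return ct * math.cosh(b) - x * math.sinh(b)

b = 0.5                                    # any positive angle will do

print(other_frame_time(0.0, -1.0, b) > 0)  # negative-x half of t = 0: other frame's future
print(other_frame_time(0.0, +1.0, b) < 0)  # positive-x half of t = 0: other frame's past

k = 1.0                                    # an offset into t's future
print(other_frame_time(k, k / math.tanh(b) + 0.1, b) < 0)    # far enough out: still T < 0

k = -1.0                                   # an offset into t's past
print(other_frame_time(k, k / math.tanh(b) - 0.1, b) > 0)    # far enough back: T > 0
```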

Written by Eddy.