It used to be supposed that, if two observers were in a state of motion relative to one another, applying a translation to what one sees at any given moment would convert it to what the other sees, with the translation growing linearly with time. Thus the space-time co-ordinate system each uses is obtained from the other's by a linear transformation called a shear, across the time axis.
This naïve assumed relationship became untenable when physics revealed that the speed of light, c, is the same for all observers. The transformation of co-ordinates which replaces it lies at the heart of special relativity.
Consider two observers, one of whom sees the other moving with velocity V.c; choose some point at some given moment of time; have each measure time from that moment and use the given point as origin of spatial co-ordinates – one's origin will be the position the other describes as V.c times time. Let each observer measure one spatial co-ordinate parallel to V and their others perpendicular to it. Let the resulting variables used by the two frames for time, V-wards displacement and cross-V displacement be [t,x,w] and [T,X,W] for the two observers, with w and W being spatial vector quantities in the plane perpendicular to V while t, T, x and X are simple scalars. We get w = W as expected by our prior transformation, but the relation of [t,x] to [T,X] is more complex. Start by supposing an arbitrary linear relation between them:
    T = a.t +b.x and X = e.t +f.x for some constants a, b, e and f.
The speed of light is the same for both, so the sets of points described by c.t = x must also satisfy c.T = X; thus e+f.c = c.(a+b.c). Likewise, c.t = −x must correspond to c.T = −X, so e−f.c = −c.(a−b.c). Halving the sum and difference of these equations we get e = c.b.c and f.c = c.a. For V to have its claimed meaning, the set of positions at which [X,W] is zero must, with v as the magnitude of V, satisfy x = v.c.t, yielding 0 = X = (b.c +a.v).c.t for all t, whence b.c +a.v = 0, so b = −a.v/c, e = −a.v.c and we have T = a.(t −v.x/c), X = a.(x −v.c.t). Since the transformation must be symmetric between the two observers, aside from changing the sign of v, we must equally have t = a.(T +v.X/c) and x = a.(X +v.c.T). Substituting the former into the latter we find t = a.a.(t −v.x/c +v.x/c −v.v.t) = a.a.(1−v.v).t and x = a.a.(x −v.c.t +v.c.t −v.v.x) = a.a.(1−v.v).x and infer that a = 1/√(1−v.v). We thus end up with the transformation of co-ordinates
    T = (t −v.x/c)/√(1−v.v), X = (x −v.c.t)/√(1−v.v) and W = w.
This is known as the Lorentz transformation, after the late Victorian scientist who pointed out that it would suffice to explain the non-result of the Michelson-Morley experiment – an attempt to detect our motion relative to the æther, or frame of reference in which Maxwell's equations take their simple form.
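As a quick numerical check of this transformation, here is a minimal Python sketch. The function name lorentz and the sample events are illustrative choices, not part of the derivation; c is taken in metres per second and v is the speed as a fraction of c.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def lorentz(t, x, v):
    """Map [t, x] to [T, X] for a frame moving at speed v.c along x (|v| < 1)."""
    a = 1.0 / math.sqrt(1.0 - v * v)
    return a * (t - v * x / C), a * (x - v * C * t)

# An event on the light ray x = c.t should land on X = c.T.
T, X = lorentz(2.0, 2.0 * C, 0.6)
on_light_cone = math.isclose(X, C * T)

# Transforming again with the sign of v changed should undo the first step.
t_back, x_back = lorentz(*lorentz(1.0, 1.0e8, 0.6), -0.6)
print(on_light_cone, t_back, x_back)
```

The second check is the symmetry between the two observers that was used to fix the value of a.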
If we leave out the factors of √(1−v.v) and the v.x/c term in the first of these, we get the relationship previously presumed. Since c.v is the speed of relative movement, factor 1−v.v is almost indistinguishable from 1 for speeds tiny compared to c. Satellites in geostationary orbit have speeds of around 3 km/s or c/100000; the Earth's orbital speed about the Sun is about a factor of ten bigger and the Sun's motion relative to the cosmic microwave background is a further factor of about 10 bigger; our local group of galaxies, moving about twice as fast relative to that background, still only has v ≅ 1/480, yielding 1−v.v = 0.9999956, with square root 0.9999978, whose inverse is 1.0000022. Even such huge speeds (over 600 km/s) only produce a tiny perturbation: detecting the effects of more modest ones is a good deal harder.
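The sizes of these corrections are easy to tabulate. The sketch below assumes round figures in the spirit of the text (3, 30 and 300 km/s, and v = 1/480 for the local group); the names are illustrative.

```python
import math

C_KM_S = 299_792.458  # speed of light, km/s

def stretch(v):
    """1/sqrt(1 - v.v) for a speed given as a fraction v of c."""
    return 1.0 / math.sqrt(1.0 - v * v)

# Rounded speeds, as fractions of c (assumed values, not measurements).
speeds = {
    "geostationary satellite": 3.0 / C_KM_S,
    "Earth about the Sun": 30.0 / C_KM_S,
    "Sun against the microwave background": 300.0 / C_KM_S,
    "local group": 1.0 / 480,
}
excess = {name: stretch(v) - 1.0 for name, v in speeds.items()}

for name, value in excess.items():
    print(f"{name}: factor exceeds 1 by {value:.2e}")
```

Each tenfold increase in speed raises the excess over 1 about a hundredfold, since the leading correction is v.v/2.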
To observe the v.x/c term's effect we would need accurately synchronized clocks far enough apart that, when viewed from a moving frame, the v.x/c term is not hidden by imprecision in the clocks. The Earth's diameter is about 1/24 light seconds and we can reasonably put sensitive equipment that far apart; observed from low Earth orbit, with v around 1/40000, we would need to be able to discern microsecond discrepancies. Even this would have been difficult for the Victorians, and they couldn't put things into Earth orbit. Using the fastest-moving platforms that Victorian scientists could have hoped to use – trains, with c.v of order 100 km per hour, or around 30 m/s, so v is about one part in ten million – it would be necessary to discern discrepancies of order one part in a few hundred million. Thus the Lorentz transform's deviations from the prior theory were smaller than the sensitivity of existing measurements, allowing all the data, that had previously been understood as supporting the old theory, to stand equally in support of the new.
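The arithmetic behind these estimates can be sketched as follows. The Earth's diameter used here (12742 km) is an assumed round value and clock_offset is a hypothetical helper name.

```python
C = 299_792_458.0            # speed of light, m/s
EARTH_DIAMETER_M = 1.2742e7  # assumed mean diameter of the Earth, metres

def clock_offset(v, x):
    """The v.x/c term, in seconds, for clocks x metres apart seen at speed v.c."""
    return v * x / C

low_orbit = clock_offset(1 / 40000, EARTH_DIAMETER_M)  # low Earth orbit
train = clock_offset(1e-7, EARTH_DIAMETER_M)           # a Victorian train

print(f"low Earth orbit: {low_orbit:.2e} s; train: {train:.2e} s")
```

The first figure comes out at about a microsecond and the second at a few nanoseconds, in line with the estimates above.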
One consequence of the transformation is that objects' lengths depend on their velocities: if an object, of which some linear dimension is L when measured in its frame of rest, is observed moving at speed v.c parallel to the given dimension, then measurement of that linear dimension will yield L.√(1−v.v). [Proof: the two ends have positions X = 0 and X = L in its own frame of reference, for all T; these give values for x−v.t.c of 0 and L.√(1−v.v); for any given t, the two values of x thus differ by L.√(1−v.v).] This is known as the Lorentz-Fitzgerald contraction.
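The proof can be replayed numerically. This is only a sketch: to_observer and measured_length are illustrative names, and the far end's event is picked so that the observer deems it simultaneous with the near end passing the origin.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def to_observer(T, X, v):
    """Map the object's rest-frame [T, X] to the observer's [t, x]."""
    a = 1.0 / math.sqrt(1.0 - v * v)
    return a * (T + v * X / C), a * (X + v * C * T)

def measured_length(rest_length, v):
    """Return (t, x) of the far end at the observer's moment t = 0.

    The near end has X = 0 and passes x = 0 at t = 0; the far end has
    X = rest_length, and T = -v.L/c is the moment on its world-line
    which the observer sees as t = 0.
    """
    return to_observer(-v * rest_length / C, rest_length, v)

t_far, x_far = measured_length(1.0, 0.6)
print(t_far, x_far)  # x_far is the length the observer measures
```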
Equally, moving clocks must run slow by the same factor
√(1−v.v). [Proof: using the clock's position to define X = 0,
successive ticks happen with T = n.k for integer n and some fixed tick-interval
k; the resulting values of t = (T +v.X/c)/√(1−v.v) are
n.k/√(1−v.v) with tick-interval k/√(1−v.v), which
exceeds k; so the clock appears to run slow, by the given factor.] This is
known as time dilation. It implies that an unstable particle, decaying
with some known half-life T in its rest frame, will decay with a slowed
half-life T/√(1−v.v) when moving at speed v.c.
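A minimal sketch of the slowed decay, with illustrative function names and no particular particle assumed:

```python
import math

def dilated_half_life(rest_half_life, v):
    """Half-life seen by an observer past whom the particle moves at v.c."""
    return rest_half_life / math.sqrt(1.0 - v * v)

def surviving_fraction(elapsed, rest_half_life, v):
    """Fraction of particles left after the observer's clock shows elapsed."""
    return 0.5 ** (elapsed / dilated_half_life(rest_half_life, v))

# At v = 0.6 a unit half-life stretches by a quarter.
print(dilated_half_life(1.0, 0.6), surviving_fraction(1.25, 1.0, 0.6))
```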
It is a remarkable tribute to the progress of measuring technology in the twentieth century that, at its start, none of the above discrepancies had the remotest chance of being measured (although the not totally unrelated precession of the orbit of Mercury had been measured, and was unexplained): while, by its end, all had been measured – and found to be in accord with the predictions of relativistic theory.
If we observe a clock from a position with fixed value of x, and its ticks
are evidenced by light flashes, the interval between moments at which we see
these flashes will change as the clock passes us. This is known as the
Doppler shift. Choose [X,T] origin so that the clock is at X = 0 and
ticks at moments T = n.k for integer n and some tick-interval k. Choose [x,t]
origin so that our observer is at x = 0 and the clock passes the observer at t =
0. [In principle, the clock could pass the observer at a moment when it does
not tick; this will make no practical difference, but for now presume that we
adjust the clock by whatever fraction of a tick it takes to ensure it does tick
at the moment of passing the observer.]
For t < 0, which is equally T < 0, the light flashes we see are from
light travelling forwards from an [X,T] = [0,n.k] with n < 0; forward light
from such events follows paths [x,t] = u.[c,1] +
[v.c,1].n.k/√(1−v.v) for u varying and thus reaches x = 0 when u =
−n.k.v/√(1−v.v) yielding t =
n.k.(1−v)/√(1−v.v) = n.k.√((1−v)/(1+v)). For t
> 0, the light flashes we see are from light traveling backwards from [X,T] =
[0,n.k] with n > 0 along trajectories [x,t] = u.[−c,1]
+[v.c,1].n.k/√(1−v.v) which hit x = 0 when u =
n.v.k/√(1−v.v) yielding t = n.k.(1+v)/√(1−v.v) =
n.k.√((1+v)/(1−v)). This is exactly the factor we would get for the
t < 0 case if v were negated, so we can just say that the time interval
between observations of ticks is k.√((1−v)/(1+v)) when a clock with
tick-interval k is observed approaching at speed v.c, with recession expressed as
"approaching" at negative speed.
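The same result can be reached by tracking each flash directly. The sketch below works in units where c = 1; arrival_time is an illustrative name.

```python
import math

def arrival_time(n, k, v):
    """Observer's time at which the flash from tick n reaches x = 0 (c = 1)."""
    a = 1.0 / math.sqrt(1.0 - v * v)
    t_emit = a * n * k    # when the tick happens, in the observer's frame
    x_emit = v * t_emit   # where the clock is at that moment
    return t_emit + abs(x_emit)  # light needs |x| to cover the gap

k, v = 1.0, 0.6
approaching = arrival_time(-1, k, v) - arrival_time(-2, k, v)
receding = arrival_time(2, k, v) - arrival_time(1, k, v)
print(approaching, math.sqrt((1 - v) / (1 + v)))
print(receding, math.sqrt((1 + v) / (1 - v)))
```

Before the clock passes, successive flashes arrive closer together than k; afterwards they arrive further apart by the inverse factor.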
Thus, while the clock (once we infer the time interval between its ticks, by
correcting for the time taken by light pulses in reaching us) runs slow by the
factor √(1−v.v), we observe ticks from it at a rate which, as it
approaches, is fast by the factor √((1+v)/(1−v)) but, once it has
passed, is slow by the inverse of this factor. As for clocks, so for any other
periodic process in time: if some constituent of a distant star's photosphere
perturbs its spectrum (from its natural black-body form) at some given frequency
f (corresponding to 1/k) then the spectrum of the star, as we observe it, will
be likewise perturbed at a frequency f.√((1+v)/(1−v)) if the star is
moving towards us at speed v.c. For v > 0, a star actually moving towards
us, this frequency is > f; the increase in frequency is called a blue-shift
because blue light is at the high-frequency end of the spectrum of visible
light. For v < 0, meaning the star is really moving away from us at (positive)
speed −v.c, the frequency is < f; the decrease is called a red-shift because
red is at the low-frequency end of the visible spectrum.
Notice that, in the [X,T] or [x,t] plane, X = 0 iff x = v.c.t and T = 0 iff c.t = x.v. Thus, for any position whose spatial co-ordinates, collectively, have smaller magnitude than c times its time co-ordinate, there is some velocity v.c = x/t in whose frame the given position is at the origin; likewise, for any position whose spatial magnitude is greater than c times time co-ordinate, there is a choice of velocity v.c = c.c.t/x in whose frame the given position has zero time-coordinate. In the first case, there is only one such frame of reference; but in the second case, relative to a frame which sees the position have zero time co-ordinate, any velocity perpendicular to the spatial position vector yields another frame which also sees the position have zero time co-ordinate. Thus, insisting that the lines x = ±c.t coincide with their analogues in any other frame of reference (i.e. that light goes at the same speed in all frames) leads to the conclusion that:
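any two events closer together in space than c times their separation in time are seen, by some frame of reference, as happening at the same place; and
any two events further apart in space than c times their separation in time are seen, by some frame of reference, as simultaneous.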
We may, thus, divide displacements in space-time into three classes:
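time-like: those whose two ends some frame of reference sees as happening in the same place – and we can sub-divide this type into "forward" and "backward" variants according as the given frame of reference deems the "start" of the displacement to happen before or after its "end";
light-like: those along which light can travel, whose spatial size is exactly c times their duration; and
space-like: those whose two ends some frame of reference sees as simultaneous.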
Notice that
    c.c.T.T −X.X = (c.c.(t −v.x/c).(t −v.x/c) −(x −v.c.t).(x −v.c.t))/(1−v.v) = c.c.t.t −x.x
so this combination is an invariant, just as rotating a (cartesian
orthonormal) spatial co-ordinate system preserves the sum of squares of
co-ordinates, i.e. the squared radius parameter. In the spatial case, the
change of co-ordinates is a rotation and preservation of the radial parameter is
a characteristic property: so we may ask what analogue of rotation we can find
for our invariant under the Lorentz transformation. For a rotation, we would
have seen [Y,Z] = [y.Cos(A) −z.Sin(A), y.Sin(A) +z.Cos(A)] and described
the rotation by the angle A. The sum of squares of Cos and Sin is 1 for all
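angles. Their hyperbolic analogues cosh and sinh (which can be obtained
from looking at Cos and Sin of imaginary angles) are defined by
    cosh(x) = (exp(x) +exp(−x))/2 and sinh(x) = (exp(x) −exp(−x))/2
and the difference of their squares is always one: for any x,
    cosh(x).cosh(x) −sinh(x).sinh(x) = 1
making cosh and sinh natural substitutes for Cos and Sin. Indeed, a transformation of form
    [Y,Z] = [y.cosh(b) +z.sinh(b), y.sinh(b) +z.cosh(b)]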
has Y.Y −Z.Z = y.y −z.z, as is easily verified. Using Y =
c.T, y = c.t, Z = X, z = x puts the Lorentz transformation in this form with
cosh(b) = 1/√(1−v.v) and sinh(b) = −v/√(1−v.v).
Thus the Lorentz transformation is a hyperbolic rotation rather than a
cross-time shear.
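A small numerical illustration of the hyperbolic form, using v = 0.6 as an arbitrary sample speed; hyperbolic_rotate is an illustrative name and the pair [y, z] stands for [c.t, x].

```python
import math

def hyperbolic_rotate(y, z, b):
    """Apply the cosh/sinh analogue of a rotation through hyperbolic angle b."""
    return (y * math.cosh(b) + z * math.sinh(b),
            y * math.sinh(b) + z * math.cosh(b))

v = 0.6
b = -math.atanh(v)  # sign chosen so that sinh(b) = -v/sqrt(1 - v.v)

y, z = 3.0, 1.0
Y, Z = hyperbolic_rotate(y, z, b)
print(math.cosh(b), math.sinh(b))  # compare with 1/sqrt(1-v.v) and -v/sqrt(1-v.v)
print(Y * Y - Z * Z, y * y - z * z)  # the invariant combination
```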
Rather than adding velocities, to determine the velocity of a third observer
relative to a first when we know their velocities relative to a second, we must
now combine hyperbolic rotations: for parallel velocities this will amount to
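adding the hyperbolic angles, since
    cosh(a+b) = cosh(a).cosh(b) +sinh(a).sinh(b) and sinh(a+b) = sinh(a).cosh(b) +cosh(a).sinh(b)
so that a hyperbolic rotation through b followed by one through a is a hyperbolic rotation through a+b.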
Thus it makes more sense to discuss hyperbolic angles than velocities. The ratio (: sinh(b)/cosh(b) ←b :) = tanh, analogous to the usual Tan of angles, gives us our speed as c.tanh(b); at variance with the above, I'll hereafter measure the hyperbolic angle in the sense that gives it the same sign as velocity.
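In these terms, the Doppler shift factor √((1+v)/(1−v)) for a source approaching the observer at speed v.c with v = tanh(b) becomes
    √((1 +tanh(b))/(1 −tanh(b))) = √((cosh(b) +sinh(b))/(cosh(b) −sinh(b))) = √(exp(b)/exp(−b)) = exp(b)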
so if we divide the observed frequency of a spectral line by the frequency it was at when produced, and take the log of the result, we get the hyperbolic angle whose tanh, when multiplied by c, gives the speed, towards us, of the source of the observed radiation.
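In code, this recipe for reading a speed off a spectral line might look like the following sketch; the function names are illustrative and the frequencies may be in any common unit.

```python
import math

def hyperbolic_angle(observed_frequency, emitted_frequency):
    """Log of the Doppler factor: positive for an approaching source."""
    return math.log(observed_frequency / emitted_frequency)

def approach_speed_over_c(observed_frequency, emitted_frequency):
    """Speed towards us, as a fraction of c; negative means receding."""
    return math.tanh(hyperbolic_angle(observed_frequency, emitted_frequency))

# A line seen at twice its emitted frequency, and one seen at half of it.
print(approach_speed_over_c(2.0, 1.0), approach_speed_over_c(1.0, 2.0))
```

Doubling the frequency corresponds to v = 0.6, for which √((1+v)/(1−v)) is indeed 2.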
Given a first frame of reference, a second with velocity c.v relative to it and a third with velocity c.u relative to the second, solve tanh(b) = v, tanh(a) = u and the velocity of the third with respect to the first will be
    c.tanh(a+b) = c.(tanh(a) +tanh(b))/(1 +tanh(a).tanh(b)) = c.(u +v)/(1 +u.v)
This, then, is the addition law for velocities: at speeds tiny compared to the speed of light (light travels about a foot in a nanosecond, so this means the distance travelled in a nanosecond must be tiny compared to a foot) the factor 1+u.v will be so close to 1 that its difference from 1 will be hard to detect, so the addition law will be indistinguishable from the simple addition law that was expected by a shear-based transformation of co-ordinates. Thus (again) the new rule, though different from the old, is not contradicted by the large body of evidence previously collected in support of the old.
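A short sketch of the addition law for parallel velocities, done by adding hyperbolic angles; add_velocities is an illustrative name and speeds are fractions of c.

```python
import math

def add_velocities(u, v):
    """Combine parallel speeds u.c and v.c by adding their hyperbolic angles."""
    return math.tanh(math.atanh(u) + math.atanh(v))

fast = add_velocities(0.5, 0.5)    # well short of 1
slow = add_velocities(1e-4, 2e-4)  # barely differs from plain addition
print(fast, (0.5 + 0.5) / (1 + 0.5 * 0.5))
print(slow, 1e-4 + 2e-4)
```

Two speeds of half the speed of light combine to four fifths of it, while at one part in ten thousand of c the result differs from the plain sum only around the eighth significant figure.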
In particular, replacing velocities with hyperbolic angles implies that we should describe the expansion of the universe by the rate of change, with distance, of hyperbolic angle or, equivalently, of the log of the Doppler shift factor. This will be the usually quoted Hubble constant, 2.3 ± .2 atto Hz, divided by the speed of light, which yields 73 pico / light year; the hyperbolic angle changes by 1 in 13.8 giga light years, give or take 10%.
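The unit conversion can be checked in a few lines. This sketch assumes a year of 365.25 days and uses the central value quoted above.

```python
HUBBLE_PER_SECOND = 2.3e-18               # 2.3 atto Hz, as quoted above
SECONDS_PER_YEAR = 365.25 * 24 * 3600.0   # assumed Julian year

# Dividing by c turns "per second" into "per light-second", so the rate of
# change of hyperbolic angle per light year is just the rate per second
# multiplied by the number of seconds in a year.
angle_per_light_year = HUBBLE_PER_SECOND * SECONDS_PER_YEAR
light_years_per_unit_angle = 1.0 / angle_per_light_year

print(f"{angle_per_light_year:.2e} per light year")
print(f"{light_years_per_unit_angle:.2e} light years per unit angle")
```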
See also: constant acceleration.
We have already seen that, when x and X are the spatial co-ordinates
parallel to the relative velocity of two frames sharing a common origin, with t
and T as the time co-ordinates, the equation c.c.t.t −x.x = c.c.T.T
−X.X holds. Since the spatial co-ordinates perpendicular to the relative
motion are untransformed, we can as readily add the sum of their squares to the
square of x or X to get the usual square of distance from the origin, r.r and
its equivalent R.R, and infer that c.c.t.t −r.r = c.c.T.T −R.R,
regardless of our frame of reference; it no longer matters whether the velocity
is parallel to some co-ordinate direction. We may thus interpret this quantity
as the natural metric of space-time, just as r.r = x.x +y.y +z.z served
as the metric of space; note, however, that the metric of a space-like
displacement is negative while that of a time-like displacement is
positive.
For any two displacements u and v, we can compute our metric for u+v and subtract from it the sum of its values for u and v separately; let g(u,v) be defined to be half of the remainder. It is clear enough that g must be symmetric, since vector and scalar addition are. When u = v, the metric of u+v = 2.u is 4 times that of u, since the metric is quadratic in the components of u; subtracting the sum of u and v's several metrics just subtracts 2 times the metric of u, which we halve, finding thus that g(u,u) is simply u's metric.
The analogous construction for the familiar spatial metric gives a g(u,v)
which we can construe as the product of lengths of u and v times Cos of the
angle between them; however, if we take two light-like vectors with equal time
component but opposite space component, we get a positive g(u,v) despite each of
g(u,u) and g(v,v) being zero; so we cannot generally interpret the inner product
g gives us quite so simply. However, if we look at two time-like
displacements, of sizes t and T along the time axes of two frames of reference,
we see that g does combine them to yield c.t.c.T.cosh(b) where b is the
hyperbolic angle – i.e. c.tanh(b) is the speed of relative motion –
between the two frames.
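The construction of g from the metric, and the two examples just given, can be sketched as follows. The names metric and g follow the text; the sample displacements are arbitrary choices.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def metric(u):
    """c.c.t.t - r.r for a displacement u = [t, x, y, z]."""
    t, x, y, z = u
    return C * C * t * t - (x * x + y * y + z * z)

def g(u, v):
    """Half of what remains of metric(u+v) after removing metric(u) and metric(v)."""
    total = [p + q for p, q in zip(u, v)]
    return (metric(total) - metric(u) - metric(v)) / 2

# Two light-like displacements with equal time and opposite space components.
ray_out = [1.0, C, 0.0, 0.0]
ray_back = [1.0, -C, 0.0, 0.0]

# Time-like displacements along the time axes of frames at hyperbolic angle b.
b, t, T = 0.5, 2.0, 3.0
along_first = [t, 0.0, 0.0, 0.0]
along_second = [T * math.cosh(b), T * C * math.sinh(b), 0.0, 0.0]

print(metric(ray_out), metric(ray_back), g(ray_out, ray_back))
print(g(along_first, along_second), C * t * C * T * math.cosh(b))
```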
We've thus far worked with co-ordinates which abide by an implicit rule: we
have one time-wards co-ordinate vector and three mutually perpendicular spatial
co-ordinate vectors; and all of them are unit vectors in appropriate senses. In
a frame with co-ordinates [t,x,y,z] we have implicit vectors in the four
directions described; a displacement whose co-ordinates in this frame are
[t,x,y,z] is really a vector which we express as a sum of four terms, t times a
time-wards unit vector with co-ordinates [1,0,0,0] in this frame, plus
each of x, y and z times its own spatial unit vector. These four vectors
form a basis of our vector space of displacements in space-time.
When we transform to the co-ordinates [T,X,Y,Z] of a frame of reference
moving at speed c.tanh(b) relative to this first one, we have y = Y, z = Z, x =
X.cosh(b) +c.T.sinh(b) and t = T.cosh(b) +sinh(b).X/c, from which we may infer
that the second frame's basis vectors, expressed in the co-ordinates of the
first, are [0,0,1,0] for Y, [0,0,0,1] for Z, [sinh(b)/c,cosh(b),0,0] for X and
[cosh(b),c.sinh(b),0,0] for T. The values of c.c.t.t −r.r for these are
−1 for the spatial ones and c.c for the time-wards one, as for the
original frame's basis. If we work with a time co-ordinate c.t, measured in
units of length, we replace our time basis vector with 1/c times the original
and get a c.c.t.t −r.r value for it of 1. When we compute g(u,v) with any
two distinct basis vectors, we get zero; so, aside from the spatial basis vectors
having metric −1, we see that the bases we have been working with are
orthonormal in much the same way as spatial bases made of mutually
orthogonal unit vectors.
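These claims about the moving frame's basis can be checked numerically. In this sketch g is written out directly as the bilinear form that the halved-remainder construction yields for the metric c.c.t.t −r.r; the hyperbolic angle 0.7 is an arbitrary sample and moving_basis is an illustrative name.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def g(u, v):
    """Bilinear form with g(u,u) = c.c.t.t - r.r, for u = [t, x, y, z]."""
    return C * C * u[0] * v[0] - u[1] * v[1] - u[2] * v[2] - u[3] * v[3]

def moving_basis(b):
    """The moving frame's T, X, Y, Z basis vectors in first-frame co-ordinates."""
    return {
        "T": [math.cosh(b), C * math.sinh(b), 0.0, 0.0],
        "X": [math.sinh(b) / C, math.cosh(b), 0.0, 0.0],
        "Y": [0.0, 0.0, 1.0, 0.0],
        "Z": [0.0, 0.0, 0.0, 1.0],
    }

basis = moving_basis(0.7)
table = {(i, j): g(basis[i], basis[j]) for i in basis for j in basis}

for pair, value in table.items():
    print(pair, value)
```

The diagonal entries come out as c.c for T and −1 for each spatial vector, with the off-diagonal entries vanishing up to rounding error.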
Let D be the vector space of displacements in special relativity's flat
space-time. As for any linear space, it has a dual, the space of linear
maps from D to {reals}, dual(D) = {linear map ({reals}:|D)}. Although g, as
defined, takes two vectors and produces a number, we can construe it as taking a
first vector, u, and yielding a member of dual(D), g(u), which is specified by
the fact that, when fed a second vector, v, as input it yields g(u,v) as output.
We thereby construe g as a linear map from D to its dual, (dual(D):g|D).
In general, we can replace our orthonormal bases with an arbitrary basis of space-time: this will give us co-ordinates just as an orthonormal basis does, but it will make the metric a somewhat messier quadratic form, in the new co-ordinates, than the simple combination of squares of co-ordinates that we had for an orthonormal basis. Alongside a basis (D:b|n), where n is the dimension of D, we get a dual basis (dual(D):q|n), for which q(i)·b(j) is 1 when i = j and 0 otherwise. We can write any displacement r in D as r = sum(: u(i).b(i) ←i :) for some ({reals}:u:n); contracting q(j) with this for any j in n yields q(j)·r = u(j), so we can infer
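    r = sum(: b(i).(q(i)·r) ←i :) = sum(: b(i)×q(i) ←i :)·r, making sum(: b(i)×q(i) ←i :) the identity on D.
Contracting g with the identity on D then yields
    g = sum(: g(b(i))×q(i) ←i :)
in which each g(b(i)) is in dual(D). Exploiting sum(q×b) as an identity on dual(D) analogous to sum(b×q) on D, we can thus express g as
    g = sum(: sum(: g(b(j),b(i)).q(j)×q(i) ←j :) ←i :)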
in which each g(b(j),b(i)) is a scalar. When we use an orthonormal basis, each of these scalars is ±1 when i = j, else 0; for other bases, this need not be the case.
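To see the scalars g(b(j),b(i)) at work for a basis that is not orthonormal, here is a minimal sketch in one time and one space dimension, using c.t as the time co-ordinate. The basis chosen (one time-wards vector and one light-like vector) is a hypothetical example, as are the helper names.

```python
def g(u, v):
    """Bilinear form of c.c.t.t - x.x for pairs [c.t, x]."""
    return u[0] * v[0] - u[1] * v[1]

# Hypothetical basis: b(0) is time-wards, b(1) is light-like.
basis = [(1.0, 0.0), (1.0, 1.0)]
components = [[g(bj, bi) for bi in basis] for bj in basis]

def from_coords(u):
    """The displacement r = sum of u(i).b(i)."""
    return tuple(sum(u[i] * basis[i][k] for i in range(2)) for k in range(2))

def metric_from_components(u):
    """Metric of r computed from its co-ordinates and the scalars g(b(j),b(i))."""
    return sum(components[j][i] * u[j] * u[i]
               for i in range(2) for j in range(2))

u = (2.0, 3.0)
r = from_coords(u)
print(components)
print(g(r, r), metric_from_components(u))
```

The off-diagonal scalars are non-zero and one diagonal scalar vanishes, yet the quadratic form built from them reproduces the metric of r.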
Now, any vector in D is a thing in its own right, regardless of its components with respect to any particular basis. When we change basis between two orthonormal co-ordinate systems, the two sets of components for the vector are related to one another in a manner determined by the appropriate Lorentz transformation: but this just describes the relationship between the bases used by the two co-ordinate systems and the duals of these two bases. The vector is still itself: as is its image, in dual(D), under g. As for members of D, so also for members of spaces derived from it by mere multiplication by dimensioned quantities, notably including ones with units of mass/time.
Consider the trajectory a body follows in space-time. The description of that trajectory in terms of co-ordinates will depend on frame of reference: but the trajectory itself does not. If the body is not accelerating, then its velocity (as seen by any frame of reference) does not change; and its trajectory is a straight line. Now, all displacements between points on a straight line are multiples of any non-zero displacement along the line; and we know that the trajectories of bodies are time-like, so the metric of any such non-zero displacement is positive. So take a forward time-like displacement along the body's trajectory and divide it by the (positive) square root of its metric. Since the metric is quadratic in the components of the vector, its square root is proportional to the vector, among parallel vectors; so we will get the same result no matter what forward displacement along the line we choose; and this result will be a unit vector.
If we look at this unit vector in the rest frame of the body whose trajectory it describes, it is manifestly a time-wards vector of size 1 (or size c if we want to measure its components in units of time, instead of units of length). As seen by a second observer with relative speed c.tanh(b), for some b, this vector is a vector with time-ward component cosh(b) and spatial component sinh(b). The space/time slope is just the speed/c, tanh(b). If the second observer constructs a vector parallel to it with time-wards component 1, the spatial component of this will be tanh(b), the velocity/c of the moving observer.
Thus the unit forward time-like vector parallel to the trajectory of a body emerges as a natural entity to take the rôle of the velocity/c of a body. Its inner product with the corresponding unit vector of any given frame's origin is just the cosh of the hyperbolic angle between it and that frame's time axis. It is independent of frame of reference and describes the body's state of movement. We should thus expect that multiplying it by the body's mass should give us a natural replacement for the (spatial) momentum/c, which should be independent of our frame of reference.
Starting with the rest-frame of the body, let us look at the product of its (rest) mass, m, and the given unit vector; call this vector p. Seen in another frame of reference with relative speed c.tanh(b), it will have spatial component m.sinh(b) and time component m.cosh(b). Notice that the latter is bigger than m, and the former is bigger than the body's m.speed/c by the same factor, cosh(b). Thus, as in the rest frame, the momentum is a multiple of a vector whose time-wards component is 1 and whose spatial part is the body's (apparent) velocity/c. The multiplier, m.cosh(b), thus presents itself as the body's apparent mass, as seen by this frame: it is the rest-frame's mass scaled by the same factor, cosh(b), as the time intervals between clock ticks. We are thus led to the conclusion that bodies become more massive when moving.
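A brief sketch of these components, with an illustrative function name and a sample body of rest mass 2 moving at v = 0.6:

```python
import math

def momentum(m, b):
    """Time and space components of p for rest mass m at hyperbolic angle b."""
    return m * math.cosh(b), m * math.sinh(b)

m, v = 2.0, 0.6
time_part, space_part = momentum(m, math.atanh(v))

print(time_part, space_part)                            # apparent mass and momentum/c
print(space_part / time_part)                           # recovers v
print(time_part * time_part - space_part * space_part)  # m.m, whatever v is
```

The last line shows that the metric of p is the square of the rest mass in every frame, even though both components grow with speed.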
Now, taking v = tanh(b), we have cosh(b) = 1/√(1−v.v) which is, for small v, well approximated by 1+v.v/2. The apparent mass of our body is thus approximately m +m.v.v/2; the surplus is simply the (classical) kinetic energy of the body, m.v.c.v.c/2, divided by the square of the speed of light. This leads to the conclusion that energy is able to manifest itself as mass and raises the question of whether there is any difference between mass and energy, aside from a factor of the square of the speed of light.
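How good the small-v approximation is can be seen from a few sample speeds; the values of v below are arbitrary and the function names are illustrative.

```python
import math

def stretch(v):
    """The exact factor 1/sqrt(1 - v.v)."""
    return 1.0 / math.sqrt(1.0 - v * v)

def approx(v):
    """The small-v approximation 1 + v.v/2."""
    return 1.0 + v * v / 2

for v in (0.001, 0.1, 0.6):
    print(v, stretch(v), approx(v), stretch(v) - approx(v))
```

The shortfall of the approximation grows roughly as the fourth power of v, so it is negligible at everyday speeds but substantial by v = 0.6.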
We should also consider Maxwell's equations and whether they are invariant under the Lorentz transformation. In order to do so, we need to examine how the electric and magnetic fields transform.
In the presence of a uniform parallel magnetic field B, a charged particle's velocity parallel to B is unaffected, while its movement projected onto the plane perpendicular to B follows a circular path. [Proof: let its velocity, at some moment, comprise a component u parallel to B and a component v perpendicular to B; its charge being q, the force on it is B^(v+u).q = B^v.q, since B^u = 0; this force is perpendicular to B, hence to u, so does not disturb u; it is also perpendicular to B and v, yielding m.dv/dt = B^v.q in the two-dimensional plane perpendicular to B, which is just a simple harmonic oscillator m.m.ddv/dt/dt = m.B^dv/dt.q = B^(B^v).q.q which, with v perpendicular to B, is just g(q.B,q.B).v, so that ddv/dt/dt is just v scaled by the (negative) constant g(q.B/m,q.B/m). Now I need to sort out the effects of acceleration-transformation …]
To this end, we must consider an experimenter performing electromagnetic experiments, as observed from a relatively moving frame of reference. Let the experimenter's velocity, as seen by this frame of reference, be c.v with v = tanh(b) for some b.
First, let the experimenter hold two bodies of mass m, with charges +q and −q, a distance h apart. From their rest frame, we know they experience a mutually attractive force square(q/h)/(4.π.ε0). The observer who sees the experimenter moving will view each charge as a current

Written by Eddy.