Maxwell's equations have an elegant simplicity; but it was rapidly noticed that, subject to assumptions usual at the time, if the electric and magnetic fields observed and measured by one experimenter were to follow his equations, then another experimenter, moving at a constant non-zero velocity relative to the first while observing and measuring the same phenomena, would infer (from the motions of charges and currents) fields that did not (quite) obey Maxwell's equations. This caused some consternation. Resolving the problems this raised was to lead to a great break-through in our understanding of the universe.

The simplest problem came with the fact that Maxwell's equations predict electromagnetic radiation that should propagate at a quite specific speed – a speed inferred from the magnetic permeability and electric permittivity of space, without reference to anyone's state of motion, nor even to that of the electromagnetic radiation. It so happens the speed in question was a good match for the best available measured values, at the time, for the speed of light; and in diverse respects, light behaves exactly as Maxwell predicted this electromagnetic radiation would behave. Whether the radiation predicted by the theory was actually light, however, didn't make any difference to the problem; whether or not it was light, the theory predicted its speed, apparently regardless of who was observing it. This flew in the face of how everyone supposed one experimenter's observations would relate to those of another moving at constant velocity with respect to the first. If two friends are sitting on different trains observing a bird fly along between them, all going in the same direction but at different speeds, and the drivers of the trains report their respective speeds, it was expected that the two would get the same answer if each added their measured speed of the bird (relative to their own train) to the reported speed of their train; and, as for the bird, so for a flash of light or a fluctuation in the electromagnetic field. But then, if the trains' speeds are different, adding the same constant speed (of light) to each cannot yield the same total for the speed of the light relative to the ground; nor can that total be the same as the constant each added to their own speed.

Now, the speed of light is very big – a million feet per millisecond, more or less – which had always presented problems for measuring it accurately, let alone accurately enough to spot (as something other than experimental error) differences in the measured value on the scale of relative speeds of different observers. The fastest any two laboratories could have been travelling, with respect to one another, while measuring anything, at the time, would have been if they'd both been on Earth's equator, diametrically opposite one another; that gives a relative speed of almost one kilometre per second (a little over two thousand miles per hour) and the difference could just about have been made up by putting the two laboratories onto trains going East at top speed. Even at 1 km/s, however, noticing any discrepancy in the speed of light, as measured by each, would have required precision better than one part in three hundred thousand. A laboratory (optionally on a train, travelling East at full speed, on the equator) could have done a bit better by measuring the speed of light from some consistent distant source at two points in the year, half a year apart, once while moving towards the source, once away; that bumps up the relative speed to nearly one five thousandth of the speed of light. Even so, they'd have needed to be sure they could distinguish a discrepancy of one fiftieth of a percent from experimental error, which would have proven challenging at the time.
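The figures quoted above can be checked with a few lines of arithmetic. This is only a rough sketch: the radius, rotation period, orbital radius and year length used here are standard approximate values that I have assumed, not numbers taken from the text.

```python
import math

# Assumed reference values (standard approximations, not from the text).
C = 299_792_458.0          # speed of light, m/s
EARTH_RADIUS = 6.378e6     # equatorial radius, m
SIDEREAL_DAY = 86_164.0    # one rotation of the Earth, s
ORBIT_RADIUS = 1.496e11    # mean Earth-Sun distance, m
YEAR = 365.25 * 86_400.0   # one orbit, s

# Speed of light in feet per millisecond (one foot is 0.3048 m).
feet_per_ms = C / 0.3048 / 1000.0

# Relative speed of two laboratories at opposite points on the equator.
antipodal = 2 * (2 * math.pi * EARTH_RADIUS / SIDEREAL_DAY)

# Relative speed of one laboratory at two moments half a year apart.
half_year = 2 * (2 * math.pi * ORBIT_RADIUS / YEAR)

print(f"c is about {feet_per_ms:,.0f} feet per millisecond")
print(f"antipodal labs: {antipodal:,.0f} m/s, one part in {C / antipodal:,.0f}")
print(f"half a year apart: {half_year:,.0f} m/s, one part in {C / half_year:,.0f}")
```

The train speeds are left out; they are small next to the rotational and orbital speeds.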

So it seemed entirely possible that light was moving at a constant
speed *relative to* some (as yet unnoticed) frame of reference,
dubbed the æther, in which Maxwell's equations were exactly
correct; and our Earthly observations of electromagnetic effects merely almost
exactly followed Maxwell's equations. All experiments did confirm Maxwell's
predictions, without taking into account the experimenter's motion relative to
the æther, but that might just be because the corrections needed to
account for their motion would all have been too tiny for anyone to
measure. So people thought about ways to measure our state of motion relative
to The One True Frame – but, when they came up with ways of doing so,
they always got a null result; apparently, light goes at the same speed in all
directions, which should only be true for someone at rest relative to the
æther. The experiment Michelson and Morley performed would have been
sure to spot any discrepancy on the order of the Earth's spinning and orbit
around the sun; yet it saw none, and it didn't seem likely that we were in
fact at rest with respect to anything sensible, much less the æther,
while doing all that spinning and orbiting.

Lorentz, Poincaré, FitzGerald and others worked out how to explain
the *apparent* fit to Maxwell's equations, in all experiments, by
having the universe conspire to make us measure lengths and times slightly
wrong, to a degree dependent on our speed relative to the æther, so that
all our experiments will produce results consistent with Maxwell's theory
despite the fact that we're moving relative to the one frame of reference in
which Maxwell's equations *actually* hold. Eventually, someone had to
notice that if there's no way to tell the illusion apart from reality, it's
real (paraphrasing a quote I dimly remember as attributed to
Poincaré, but can't find any confirming reference); Einstein took it as
a physical fact that the speed of light (in vacuum) is the same for all
observers and in all directions.

That rendered the æther redundant. Since it also conflicted with the expected rules for relative velocities, those had to go too; which meant the relationship between different observers' descriptions of events had to be revisited. Previously it was supposed that – if one observer were moving at speed v.c relative to another, in a fixed direction, using this direction for the x co-ordinate of an orthonormal system of spatial co-ordinates, combined with a time co-ordinate t – then transforming everything this observer measures to use a co-ordinate X = x −v.c.t in place of x, with the same time co-ordinate and other spatial co-ordinates, would yield a faithful description of what the other observer sees. This proves incompatible with the speed of light being the same to all observers, however; so it is necessary to consider how the first observer's spatial co-ordinates and time, [x, y, z, t], are related to those of the second, [X, Y, Z, T]. The transformation previously assumed had [X, Y, Z, T] = [x−v.c.t, y, z, t], which is formally a shear.

Einstein showed that, by assuming the constancy of the speed of light, and that the relation between the two systems is linear, one can infer the form of the transformation; in place of the shear, one obtains:

- c.T = (c.t −v.x)/√(1 −v.v)
- X = (x −v.c.t)/√(1 −v.v)

with [Y, Z] = [y, z]. This is known as the Lorentz transformation, as it's the rule Lorentz and others had found it necessary to impose to keep their illusions intact. For the inverse transformation, expressing [x, t] in terms of [X, T], it suffices to replace v with −v (which doesn't change 1−v.v) and swap the rôles of [x, t] and [X, T] in the above.
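As a minimal sketch of the transformation just stated, the following applies it to the path of a light pulse and then inverts it by negating v. The function names are my own, chosen for illustration; v is the dimensionless speed (the actual speed divided by c), as in the formulae above.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def lorentz(t, x, v):
    """Map [t, x] to [T, X] for a frame moving at speed v.c along x."""
    scale = 1.0 / math.sqrt(1.0 - v * v)
    T = (C * t - v * x) * scale / C   # from c.T = (c.t - v.x)/sqrt(1 - v.v)
    X = (x - v * C * t) * scale       # from X = (x - v.c.t)/sqrt(1 - v.v)
    return T, X

def inverse_lorentz(T, X, v):
    """Recover [t, x] from [T, X]: the same rule with -v in place of v."""
    return lorentz(T, X, -v)

# A light pulse that left the origin at t = 0 has x = c.t.
t, x = 2.0, 2.0 * C
T, X = lorentz(t, x, 0.6)
print("speed of the pulse in the moving frame:", X / T)
print("round trip:", inverse_lorentz(T, X, 0.6))
```

The pulse's speed comes out as c in the moving frame as well, which is the property the transformation was derived to have.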

The transformation can be restated, by
exploiting the hyperbolic analogues of sine and cosine, sinh = (:
exp(x) −exp(−x) ←x :)/2 and cosh = (: exp(x) +exp(−x)
←x :)/2, as:

- c.T = c.t.cosh(b) −x.sinh(b)
- X = −c.t.sinh(b) +x.cosh(b)

where b = log((1+v)/(1−v))/2, so that

- cosh(b) = (√((1+v)/(1−v)) +√((1−v)/(1+v)))/2 = (1 +v +1 −v)/2/√(1 −v.v) = 1/√(1 −v.v)
- sinh(b) = (√((1+v)/(1−v)) −√((1−v)/(1+v)))/2 = (1 +v −1 +v)/2/√(1 −v.v) = v/√(1 −v.v)

The ratio, tanh(b) = sinh(b)/cosh(b) = v, then gives the velocity as
c.tanh(b). In this form, the transformation is analogous to a rotation, with
b taking the place of the angle of rotation; if a third system of co-ordinates
is obtained, for an observer moving at speed c.tanh(a) relative to the second,
then this is related to the first exactly as above but with a+b in place of b;
so this hyperbolic angle parameter takes over being additive where
velocity fails.
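To illustrate the additivity of the hyperbolic angle, here is a small sketch that composes two velocities by adding their angles. The helper names are hypothetical; speeds are again given as fractions of c.

```python
import math

def hyperbolic_angle(v):
    """b = log((1+v)/(1-v))/2, so that tanh(b) = v."""
    return math.log((1.0 + v) / (1.0 - v)) / 2.0

def compose(u, v):
    """Speed (as a fraction of c) obtained by adding the two hyperbolic angles."""
    return math.tanh(hyperbolic_angle(u) + hyperbolic_angle(v))

combined = compose(0.6, 0.6)
print("0.6 c composed with 0.6 c gives", combined, "c")
```

Simply adding the two speeds would give 1.2 times the speed of light; adding the angles instead always yields a result below c.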

The Lorentz transformation leads to the Doppler effect; light emitted with frequency f by a source moving towards an observer at speed c.tanh(b) has frequency f.exp(b) when received. To put it another way, the logarithm of the frequency (divided by some specified unit) shifts by b; the logarithm of the wavelength thus likewise shifts by −b. The transformation also leads to a classification of displacements into time-like, light-like and space-like according as the displacement's time component, times the speed of light, is bigger than, equal to or smaller than the size of its spatial component.
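Both rules in this paragraph are short enough to sketch directly. The function names are illustrative only; the classification takes the time component already multiplied by the speed of light, and compares it with the size of the spatial part.

```python
import math

def doppler(f, v):
    """Received frequency for a source approaching at speed v.c (v < 0: receding)."""
    b = math.atanh(v)          # hyperbolic angle, tanh(b) = v
    return f * math.exp(b)

def classify(ct, r):
    """Classify a displacement from c times its time part and its spatial size r."""
    if abs(ct) > r:
        return "time-like"
    if abs(ct) == r:
        return "light-like"
    return "space-like"

print(doppler(1.0, 0.6), doppler(1.0, -0.6))
print(classify(5.0, 3.0), classify(3.0, 3.0), classify(1.0, 3.0))
```

The exact equality test in `classify` suits hand-picked inputs like these; with measured or computed values one would want a tolerance.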

One can show that, when x and X are
the spatial co-ordinates parallel to the relative velocity of two frames
sharing a common origin, with t and T as the time co-ordinates, the equation
c.c.t.t −x.x = c.c.T.T −X.X holds. Since the spatial co-ordinates
perpendicular to the relative motion are
untransformed, we can as readily add the sum of their squares to the
square of x or X to get the usual square of distance from the origin, r.r and
its equivalent R.R, and infer that c.c.t.t −r.r = c.c.T.T −R.R,
regardless of our frame of reference; it no longer matters whether the
velocity is parallel to some co-ordinate direction. We may thus interpret
this quantity as the natural metric of space-time, just as r.r = x.x
+y.y +z.z served as the metric of space; note, however, that the metric of a
space-like displacement is negative while that of a time-like displacement is
positive.
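The invariance claimed here is easy to check numerically. The sketch below uses c.t as the time co-ordinate (so everything is in units of length) and the hyperbolic form of the transformation; the names and the sample event are my own choices.

```python
import math

def boost_x(ct, x, y, z, b):
    """Transform [c.t, x, y, z] to a frame moving at c.tanh(b) along x."""
    ch, sh = math.cosh(b), math.sinh(b)
    return (ct * ch - x * sh, x * ch - ct * sh, y, z)

def interval(ct, x, y, z):
    """The quantity c.c.t.t - r.r for a displacement from the origin."""
    return ct * ct - (x * x + y * y + z * z)

event = (5.0, 1.0, 2.0, 3.0)
moved = boost_x(*event, b=0.7)
print(interval(*event), interval(*moved))
```

The two printed values agree up to rounding, and are positive because this particular displacement is time-like.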

For any two displacements u and v, we can compute our metric for u+v and subtract from it the sum of its values for u and v separately; let g(u,v) be defined to be half of the remainder. It is clear enough that g must be symmetric, since vector and scalar addition are. When u = v, the metric of u+v = 2.u is 4 times that of u, since the metric is quadratic in the components of u; subtracting the sum of u and v's several metrics just subtracts 2 times the metric of u, which we halve, finding thus that g(u,u) is simply u's metric.

The analogous construction for the familiar spatial metric gives a g(u,v)
which we can construe as the product of lengths of u and v times Cos of the
angle between them; however, if we take two light-like vectors with equal time
component but opposite space component, we get a positive g(u,v) despite each
of g(u,u) and g(v,v) being zero; so we cannot generally interpret the inner
product g gives us quite so simply. However, if we look at two time-like
displacements, of sizes t and T along the time axes of two frames of
reference, we see that g does combine them to yield c.t.c.T.cosh(b) where b is
the hyperbolic angle – i.e. c.tanh(b) is the speed of relative motion
– between the two frames.
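The construction of g from the metric, and the two examples just described, can be sketched as follows. Displacements are written as [c.t, x, y, z], so the time component is in units of length; the function names are illustrative.

```python
import math

def metric(u):
    """c.c.t.t - r.r for a displacement u = [c.t, x, y, z]."""
    ct, x, y, z = u
    return ct * ct - x * x - y * y - z * z

def g(u, v):
    """Half of: the metric of u+v, minus the metrics of u and v separately."""
    total = tuple(a + b for a, b in zip(u, v))
    return (metric(total) - metric(u) - metric(v)) / 2.0

# Two light-like vectors: equal time component, opposite space component.
u = (1.0, 1.0, 0.0, 0.0)
v = (1.0, -1.0, 0.0, 0.0)
print(g(u, u), g(v, v), g(u, v))

# Displacements along the time axes of two frames with hyperbolic angle b.
b, ct, cT = 0.5, 3.0, 2.0
first = (ct, 0.0, 0.0, 0.0)
second = (cT * math.cosh(b), cT * math.sinh(b), 0.0, 0.0)
print(g(first, second), ct * cT * math.cosh(b))
```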

We've thus far worked with co-ordinates which abide by an implicit rule:
we have one time-wards co-ordinate vector and three mutually perpendicular
spatial co-ordinate vectors; and all of them are unit vectors in appropriate
senses. In a frame with co-ordinates [t,x,y,z] we have implicit vectors in
the four directions described; a displacement whose co-ordinates in this frame
are [t,x,y,z] is really a vector which we express as a sum of four terms, t
times a time-wards unit vector with co-ordinates [1,0,0,0] in this
frame, plus each of x, y and z times its own spatial unit vector. These
four vectors form a basis of our vector space of displacements
in space-time.

When we transform to the co-ordinates [T,X,Y,Z] of a frame of reference
moving at speed c.tanh(b) relative to this first one, we have y = Y, z = Z, x
= X.cosh(b) +c.T.sinh(b) and t = T.cosh(b) +sinh(b).X/c, from which we may
infer that the second frame's basis vectors, expressed in the co-ordinates of
the first, are [0,0,1,0] for Y, [0,0,0,1] for Z, [sinh(b)/c,cosh(b),0,0] for X
and [cosh(b),c.sinh(b),0,0] for T. The values of c.c.t.t −r.r for these
are −1 for the spatial ones and c.c for the time-wards one, as for the
original frame's basis. If we work with a time co-ordinate c.t, measured in
units of length, we replace our time basis vector with 1/c times the original
and get a c.c.t.t −r.r value for it of 1. When we compute g(u,v) with
any two distinct basis vectors, we get zero; so, aside from the spatial basis
vectors having metric −1, we see that the bases we have been working
with are orthonormal in much the same way as spatial bases made of
mutually orthogonal unit vectors.
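A quick numerical check of these claims, as a sketch only. Here g is written out in its expanded bilinear form, c.c times the product of time components minus the dot product of the spatial parts, which is what the half-difference construction reduces to; the value of C is a round illustrative figure, and the names are mine.

```python
import math

C = 3.0e8   # illustrative round value for the speed of light, m/s

def g(u, v):
    """Inner product of displacements given as [t, x, y, z], t in units of time."""
    return C * C * u[0] * v[0] - (u[1] * v[1] + u[2] * v[2] + u[3] * v[3])

def moving_basis(b):
    """Basis vectors of the moving frame, in the first frame's co-ordinates."""
    ch, sh = math.cosh(b), math.sinh(b)
    return {
        "T": (ch, C * sh, 0.0, 0.0),
        "X": (sh / C, ch, 0.0, 0.0),
        "Y": (0.0, 0.0, 1.0, 0.0),
        "Z": (0.0, 0.0, 0.0, 1.0),
    }

basis = moving_basis(0.8)
for name, vec in basis.items():
    print(name, g(vec, vec))             # c.c for T, -1 for the others
print("T with X:", g(basis["T"], basis["X"]))   # zero, up to rounding
```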

Let D be the vector space of displacements in special relativity's flat
space-time. As for any linear space, it has a dual, the space of
linear maps from D to {reals}, dual(D) = {linear map ({reals}:|D)}. Although
g, as defined, takes two vectors and produces a number, we can construe it as
taking a first vector, u, and yielding a member of dual(D), g(u), which is
specified by the fact that, when fed a second vector, v, as input it yields
g(u,v) as output. We thereby construe g as a linear map from D to its dual,
(dual(D):g|D).

In general, we can replace our orthonormal bases with an arbitrary basis of space-time: this will give us co-ordinates just as an orthonormal basis does, but it will make the metric a somewhat messier quadratic form, in the new co-ordinates, than the simple combination of squares of co-ordinates that we had for an orthonormal basis. Alongside a basis (D:b|n), where n is the dimension of D, we get a dual basis (dual(D):q|n), for which q(i)·b(j) is 1 when i = j and 0 otherwise. We can write any displacement r in D as r = sum(: u(i).b(i) ←i :) for some ({reals}:u:n); contracting q(j) with this for any j in n yields q(j)·r = u(j), so we can infer

- r = sum(: b(i).(q(i)·r) ←i |n), and
- sum(: b(i)×q(i) ←i |n) is the identity on D.

Contracting g with the identity on D then yields

- g = sum(: g(b(i))×q(i) ←i |n)

in which each g(b(i)) is in dual(D). Exploiting sum(q×b) as an identity on dual(D) analogous to sum(b×q) on D, we can thus express g as

- g = sum(: g(b(j),b(i)).q(j)×q(i) ← [i,j] :)

in which each g(b(j),b(i)) is a scalar. When we use an orthonormal basis, each of these scalars is ±1 when i = j, else 0; for other bases, this need not be the case.
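A small worked instance may help here. The sketch below restricts to two dimensions, [c.t, x], and picks a deliberately non-orthonormal basis whose second member is light-like; the basis, the sample displacement and all function names are assumptions made for illustration.

```python
def g(u, v):
    """Inner product in orthonormal [c.t, x] co-ordinates."""
    return u[0] * v[0] - u[1] * v[1]

def dual_basis(b0, b1):
    """Covectors q0, q1 with q(i).b(j) = 1 when i = j, else 0 (2x2 inverse)."""
    det = b0[0] * b1[1] - b1[0] * b0[1]
    q0 = (b1[1] / det, -b1[0] / det)
    q1 = (-b0[1] / det, b0[0] / det)
    return q0, q1

def pair(q, r):
    """Contract a covector q with a vector r."""
    return q[0] * r[0] + q[1] * r[1]

b = [(1.0, 0.0), (1.0, 1.0)]       # not orthonormal: b[1] is light-like
q = dual_basis(*b)

r = (3.0, 2.0)
u = [pair(q[i], r) for i in range(2)]             # co-ordinates of r in basis b
G = [[g(b[j], b[i]) for i in range(2)] for j in range(2)]

# The metric of r, directly and as a quadratic form in the new co-ordinates.
direct = g(r, r)
via_components = sum(G[j][i] * u[j] * u[i] for i in range(2) for j in range(2))
print(u, G, direct, via_components)
```

The diagonal entry of G for the light-like basis vector is zero rather than ±1, which is the sort of messier quadratic form referred to above.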

Now, any vector in D is a thing in its own right, regardless of its components with respect to any particular basis. When we change basis between two orthonormal co-ordinate systems, the two sets of components for the vector are related to one another in a manner determined by the appropriate Lorentz transformation: but this just describes the relationship between the bases used by the two co-ordinate systems and the duals of these two bases. The vector is still itself: as is its image, in dual(D), under g. As for members of D, so also for members of spaces derived from it by mere multiplication by dimensioned quantities, notably including ones with units of mass/time.

Consider the trajectory a body follows in space-time. The description of that trajectory in terms of co-ordinates will depend on frame of reference: but the trajectory itself does not. If the body is not accelerating, then its velocity (as seen by any frame of reference) does not change; and its trajectory is a straight line. Now, all displacements between points on a straight line are multiples of any non-zero displacement along the line; and we know that the trajectories of bodies are time-like, so the metric of any such non-zero displacement is positive. So take a forward time-like displacement along the body's trajectory and divide it by the (positive) square root of its metric. Since the metric is quadratic in the components of the vector, its square root is proportional to the vector, among parallel vectors; so we will get the same result no matter what forward displacement along the line we chose; and this result will be a unit vector.

If we look at this unit vector in the rest frame of the body whose trajectory it describes, it is manifestly a time-wards vector of size 1 (or size c if we want to measure its components in units of time, instead of units of length). As seen by a second observer with relative speed c.tanh(b), for some b, this vector is a vector with time-ward component cosh(b) and spatial component sinh(b). The space/time slope is just the speed/c, tanh(b). If the second observer constructs a vector parallel to it with time-wards component 1, the spatial component of this will be tanh(b), the velocity/c of the moving observer.

Thus the unit forward time-like vector parallel to the trajectory of a body emerges as a natural entity to take the rôle of the velocity/c of a body. Its inner product with any given frame's time-wards unit vector is just cosh of the hyperbolic angle between it and that frame's time axis. The unit vector itself is independent of frame of reference and describes the body's state of movement. We should thus expect that multiplying it by the body's mass should give us a natural replacement for the (spatial) momentum/c, which should be independent of our frame of reference.

Starting with the rest-frame of the body, let us look at the product of its (rest) mass, m, and the given unit vector; call this vector p. Seen in another frame of reference with relative speed c.tanh(b), it will have spatial component m.sinh(b) and time component m.cosh(b). Notice that the latter is bigger than m, and the former is bigger than the body's m.speed/c by the same factor, cosh(b). Thus, as in the rest frame, the momentum is a multiple of a vector whose time-wards component is 1 and whose spatial part is the body's (apparent) velocity/c. The multiplier, m.cosh(b), thus presents itself as the body's apparent mass, as seen by this frame: it is the rest-frame's mass scaled by the same factor, cosh(b), as the time intervals between clock ticks. We are thus led to the conclusion that bodies become more massive when moving, by the same factor as the time dilation resulting from their movement.
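The components of p described above can be tabulated with a short sketch. The names are hypothetical, and the mass and hyperbolic angle are arbitrary sample values.

```python
import math

def momentum(m, b):
    """[time, space] components of p for rest mass m at hyperbolic angle b."""
    return (m * math.cosh(b), m * math.sinh(b))

m, b = 2.0, 0.9
time_part, space_part = momentum(m, b)
v = math.tanh(b)                       # the body's speed divided by c

print("apparent mass:", time_part)
print("space part / time part:", space_part / time_part, "versus v =", v)
print("metric of p:", time_part ** 2 - space_part ** 2, "versus m.m =", m * m)
```

The metric of p comes out as m.m whatever b is, in keeping with p being independent of the frame in which its components are computed.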

Now, taking the body's speed to be c.v with v = tanh(b), we have cosh(b) = 1/√(1−v.v) which is, for small v, well approximated by 1+v.v/2. The apparent mass of our body is thus approximately m +m.v.v/2; the surplus, m.v.v/2, is simply the kinetic energy of the body, m.(c.v).(c.v)/2, divided by the square of the speed of light. This leads to the conclusion that energy is able to manifest itself as mass and encourages one to doubt whether there is any difference between mass and energy, aside from a factor of the square of the speed of light.
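How good the small-speed approximation is can be seen numerically. In this sketch v is the speed as a fraction of c, and the function names and the sample speed are my own.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def dilation(v):
    """cosh(b) = 1/sqrt(1 - v.v), for a speed of v.c."""
    return 1.0 / math.sqrt(1.0 - v * v)

def surplus_energy(m, v):
    """(apparent mass - rest mass) times the square of the speed of light."""
    return m * (dilation(v) - 1.0) * C * C

def kinetic_energy(m, v):
    """The familiar m.speed.speed/2, with speed = c.v."""
    return m * (C * v) ** 2 / 2.0

v = 0.01   # one percent of the speed of light
print(dilation(v), 1.0 + v * v / 2.0)
print(surplus_energy(1.0, v) / kinetic_energy(1.0, v))
```

At this speed the two energies agree to better than one part in ten thousand; the gap grows as v approaches 1.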

We should also consider Maxwell's equations and whether they are invariant under the Lorentz transformation. In order to do so, we need to examine how the electric and magnetic fields transform.

In the presence of a uniform parallel magnetic field B, a charged particle's velocity parallel to B is unaffected, while its movement projected onto the plane perpendicular to B follows a circular path. [Proof: let its velocity, at some moment, comprise a component u parallel to B and a component v perpendicular to B; its charge being q, the force on it is B^(v+u).q = B^v.q, since B^u = 0; this force is perpendicular to B, hence to u, so does not disturb u; it is also perpendicular to B and v, yielding m.dv/dt = B^v.q in the two-dimensional plane perpendicular to B, which is just a simple harmonic oscillator m.m.ddv/dt/dt = m.B^dv/dt.q = B^(B^v).q.q which, with v perpendicular to B, is just g(q.B,q.B).v, so that ddv/dt/dt is just v scaled by the (negative) constant g(q.B/m,q.B/m). Now I need to sort out the effects of acceleration-transformation … ]

To this end, we must consider an experimenter performing electromagnetic experiments, as observed from a relatively moving frame of reference. Let the experimenter's velocity, as seen by this frame of reference, be c.v with v = tanh(b) for some b.

First, let the experimenter hold two bodies of mass m, with charges +q and
−q, a distance h apart. From their rest frame, we know they experience
a mutually attractive force square(q/h)/(4.π.ε_{0}). The
observer who sees the experimenter moving will view each charge as a current