Compression programs take files and make them smaller, generally producing a
binary file that uses the full range of values for the output bytes. Such files
are not always suitable for transmission over a network and one can't usefully
print them out. There are other situations where using the full range of values
open to bytes is undesirable. Generally, one handles this (e.g. in e-mail
attachments) by using some encoding which reduces the file to purely printable
ASCII; most commonly, base 64 encoding. This is simple – every three bytes of
the input yield four characters of output, in a way that's computationally
cheap to perform – but somewhat inefficient, since it uses only 64 of the 100
printable ASCII characters, six of which are spacing characters.
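The three-bytes-to-four-characters ratio can be checked with Python's standard `base64` module. This is only an illustrative sketch; the sample `data` is arbitrary and not taken from any particular compression program.

```python
import base64

# Arbitrary sample input: 768 bytes covering every possible byte value.
data = bytes(range(256)) * 3

encoded = base64.b64encode(data)

# Every 3 input bytes become 4 output characters.
assert len(encoded) == len(data) // 3 * 4

# The output draws on at most 64 distinct characters
# (no '=' padding here, since 768 is a multiple of 3).
assert len(set(encoded)) <= 64

# Each output character carries 6 of a byte's 8 bits.
efficiency = len(data) / len(encoded)
print(efficiency)  # 0.75
```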
In practice, it makes sense to leave out all six spacing characters. Various applications treat them specially and may alter them; and some applications don't like lines longer than some limit, so it's desirable to be able to insert newlines at arbitrary points in a compressed file and ignore them when uncompressing. It's also desirable to be able to print out a compressed file and recover it thereafter: the form-feed, carriage return and vertical tab have special meaning to printers, which would interfere with this; the horizontal tab is apt to do the same and, when it doesn't, to be mistaken for a plain space. We could keep the space character, but some applications strip trailing spaces from lines; and leaving it out ensures that simply inserting newlines at regular intervals produces something that, in a fixed-width font, prints with a neat right margin, making it easy to recognise variation in line length as corruption. We are thus left with 94 characters.
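The count of 94 can be reproduced from Python's `string` module, whose `printable` constant has exactly the 100 characters discussed here and whose `whitespace` constant has the six spacing ones. The name `ARMOUR_ALPHABET` is invented for this sketch; the calculation at the end shows roughly how much each character of such an alphabet could carry at best.

```python
import math
import string

# The six spacing characters: space, tab, newline, carriage return,
# vertical tab and form-feed.
SPACING = set(string.whitespace)

# ARMOUR_ALPHABET is a hypothetical name for the reduced character set.
ARMOUR_ALPHABET = "".join(c for c in string.printable if c not in SPACING)

assert len(string.printable) == 100
assert len(SPACING) == 6
assert len(ARMOUR_ALPHABET) == 94

# Upper bound on information per output character, in bits.
bits_per_char = math.log2(len(ARMOUR_ALPHABET))

# Fraction of each output byte that can carry data, versus 6/8 for base 64.
print(round(bits_per_char / 8, 3))  # 0.819
print(6 / 8)                        # 0.75
```

So an encoder that used all 94 characters perfectly would carry about 6.55 bits per character, against base 64's 6.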
One could, of course, construct a base 94 encoding; however, since 94 isn't a
power of two, this would be somewhat clumsy to implement. It would be far
better to have compression programs know to target the reduced character set
and use it as efficiently as possible. At output, we can add newlines at
regular intervals and, optionally, form-feed characters (which tell printers to
start a new page) at regular intervals after newlines, so as to make printing
work nicely. The implied file format would be related to the usual form of the
compression program, but different enough to warrant a distinct file extension
and MIME type. It would thus be interesting to see how readily various existing
compression programs could be adapted to support a common command-line option:
Use alternate ASCII-armoured output format. When compressing, if =cols is
specified, insert a newline after each cols characters of output unless cols is
zero (in which case no newlines are inserted); the default is to act as if cols
were given as the value of the COLUMNS environment variable, if set, else 72.
If ,lines is also given, add a form-feed after every lines lines of output; by
default, no form-feeds are added. When uncompressing, [=cols[,lines]] is
ignored if given; all spacing characters in the input are ignored.
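The layout rules described for this option can be sketched in a few lines of Python. This is a minimal illustration, not taken from any existing compression program: the function names `resolve_cols`, `armour_layout` and `strip_layout` are invented here, and ending a short final line with a newline is an assumption the description does not settle.

```python
import os
import string

def resolve_cols(cols=None):
    """Pick the line width: explicit value, else COLUMNS, else 72."""
    if cols is not None:
        return cols
    env = os.environ.get("COLUMNS", "")
    return int(env) if env.isdigit() else 72

def armour_layout(text, cols=None, lines=None):
    """Insert a newline after each `cols` characters and, if `lines` is
    given, a form-feed after every `lines` lines. cols == 0 disables both."""
    cols = resolve_cols(cols)
    if cols == 0:
        return text
    out = []
    for count, start in enumerate(range(0, len(text), cols), start=1):
        out.append(text[start:start + cols])
        out.append("\n")  # assumption: a short final line also ends in newline
        if lines and count % lines == 0:
            out.append("\f")
    return "".join(out)

def strip_layout(armoured):
    """When uncompressing, ignore every spacing character in the input."""
    return "".join(c for c in armoured if c not in string.whitespace)

payload = "A" * 10
laid_out = armour_layout(payload, cols=4, lines=2)
print(repr(laid_out))  # 'AAAA\nAAAA\n\x0cAA\n'
assert strip_layout(laid_out) == payload
```

Because the reader discards all spacing, the values of cols and lines never need to be recorded in the file itself, which is why they can safely be ignored when uncompressing.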