Buffer over-runs and culpable negligence

Many bugs in programs are buffer over-runs: many security exploits are based on these, as they enable an attacker to deliver code of their own choosing to be executed by the program. There really is no excuse for writing code which makes the relevant mistake - I intend to dissect the mistake and lay bare its workings here.

In general, a buffer is a portion of memory reserved by the computer program in which to hold some data, usually temporarily. The problem comes when the data to be stored in that piece of memory is bigger than the piece of memory - in which case, if the program doesn't notice this, the data fills up the reserved piece of memory and over-writes whatever was written in the next portions of memory along. This is called a buffer over-run. It may damage other data being used by the program, which may lead to the program crashing; but, in the worst case, it over-writes the program itself, so that the surplus data gets executed in place of the programmer's instructions. This last case is the usual one for security exploits, as the attacker supplies some data which are, in fact, instructions for the machine, in machine code.

This problem is widely understood among programmers, as is its natural solution - allocate a big enough buffer once you know how big your data will be, and check that you aren't over-running your space when there's any doubt. If a buffer has enough space to hold 80 characters of text, don't try to write more than 80 characters into it; and, wherever possible, find out how much data you'll be processing and allocate a big enough buffer for it.

Static arrays considered harmful

All the same, one can make mistakes: so it's a good idea to ensure that, in the event of an over-run, the memory you'll be trampling on isn't the program itself but something less critical - corrupting data and crashing is bad, but surrendering control of the machine to an attacker is worse. It happens, in fact, that the discipline needed to achieve this brings with it some further benefits. To illustrate, let me show you some buggy code written in C:

#include <ctype.h>  /* isdigit, islower, isupper */
#include <string.h> /* memcpy, size_t */

static int tohex(char ch) {
  if (isdigit(ch)) return ch - '0';
  if (islower(ch)) return ch - 'a' + 10;
  if (isupper(ch)) return ch - 'A' + 10;
  return -1;
}

#define BUFSIZ 100
extern size_t urldecode(char *text) {
  /* returns the length of text it managed to decode */
  static char buffer[BUFSIZ];
  size_t r = 0, w = 0; /* for Read (from text) and Write (to buffer) */
  while (text[r] && w < BUFSIZ) {
    char ch = text[r++];

    switch (ch) {
    case '%':
      /* over-simplified buggy %-decoding */
      ch = tohex(text[r++]); /* sequence point */
      ch = 16 * ch + tohex(text[r++]);
      break;

    case '+':
      ch = ' ';
      break;
    }

    buffer[w++] = ch;
  }
  memcpy(text, buffer, w);
  if (w < r) text[w] = '\0';
  return r;
}
#undef BUFSIZ

This is the classic buffer over-run scenario: a function which processes an array of data, needs some temporary workspace for it and assumes that the data will always be less than some fixed size (here, 100 bytes - the size of array buffer). If the check against BUFSIZ in the loop condition were omitted (a not uncommon mistake), the code could over-run its buffer: as it is, it may lose some data. There are several incidental bugs in its caricature of %-decoding, so don't go using it! [Furthermore, replacing each use of buffer with text in the above, while removing the size check and the call to memcpy, would be perfectly sensible - w never exceeds r and each read from the latter happens before the write to the former - the buffer isn't really needed. I would say that makes the above a contrived example: but I've seen plenty of code as clumsy as the above, and the Code Red bug seems to depend on such a bug in a URL-decoding routine.] There are, however, situations where one genuinely needs a transitory buffer in a function, via which to perform some transformation on the memory containing the supplied data; when there's some strong enough reason to be confident we know how big that needs to be, a fixed-size array is more convenient (and efficient) than allocating and freeing buffers of the right size every time.

The author has assumed that the text, once urldecoded, will never be more than 100 characters long. This may well be usually true (albeit, these days, many URLs are so long the assumption is unwise: an URL can be arbitrarily long) but if an attacker can arrange for the decoded text to exceed 100 characters, the data would (without the check in the loop condition) over-run our buffer array. As it is, the defence against unexpected size obliges the function to return the length of the decoded portion of the original, while over-writing some of the original with its decoded form; which will complicate life for its caller, but mainly because my chosen example is a caricature, in which a mistake has been patched up clumsily.

Now, the two functions are declared to be static and extern, respectively. This means that urldecode is accessible to code in other source files of the program (it has external linkage), while tohex is only accessible to the code in urldecode's source file (it has internal linkage). Both functions will appear, when the program is running, in the run-time loaded image of the program. When a function is called, the program allocates some space, on a so-called stack, for the local variables of the function - text, r, w and ch - which are known as automatic variables. The stack is a separate portion of memory from the one in which the program itself is stored: and, if two threads of execution are using these functions in parallel with one another (in a single running process) each has its own stack, hence its own local variables, though they share the memory which contains the program itself.

However, declaring buffer to be static (a choice of which I disapprove, for reasons I'll come to) tells the compiler to arrange for this array to be in part of the same memory that holds the program itself - the executable instructions into which tohex and urldecode get compiled. This has several consequences. The reason many programmers use static arrays in this way is that there is a small performance advantage, especially if one wants a larger buffer than just 100 bytes. There are, however, some costs.

Forcing the array to be part of the code memory means that there will only be one array, so different threads of execution may trample on one another's data if they try to execute urldecode at the same time as one another. However, most programmers assume their code is never going to be run in a multi-threaded context - with the result that they write things which can't be re-used in such a context. Even then, using static means that the program takes up the 100 bytes of space reserved for the array even when it's not using the function that requires the array: 100 bytes may not seem like much, but the widespread use of static arrays in a program will bloat up the memory it needs to load. With an automatic array, in contrast, the array only takes up memory when the program is executing the code which uses the array.

A modern computer uses a cache of memory into which it copies memory that it accesses a lot, such as the memory holding the instructions which make up our functions. Our array is part of that memory, and our code writes data into the array. When that happens, the computer has to synchronise the cached memory with the memory of which it's a copy. This wouldn't happen if the buffer were, like the other local variables, automatic - which is what it'd be if we left out the use of static. So, even without threading, the code above forces gratuitous caching activity when it writes to its static array: if the array were automatic, the code memory would only ever be read, never written to; only memory on the stack would get modified.

Furthermore, because the array is static, the memory adjacent to it, on which any over-run will trample, is the memory which holds the code which is being executed. If this gets over-written, the changed version may be executed - the excess data may get used as instructions to the computer, over-riding the program. This is the classic approach to buffer over-run attacks. For comparison, an automatic array is held in memory which holds the stack of local data for functions which have been invoked but are waiting for another to complete before they can resume doing their computation. Sufficient memory is set aside for the automatic array as the program calls urldecode, and relinquished when the function returns. The memory adjacent to this is local data of other functions: messing with it may well cause the program to crash. However, over-running the array won't change the memory which contains code - at least, not without exploiting parts of the stack (if any) which are used to control what code gets executed. Thus, bloating code with static arrays not only hampers threading and caching, it also raises the stakes if a memory over-run happens.

Written by Eddy.