Making C Programs Memory Safe with MemInstrument

The MemInstrument code, an extension to the LLVM Compiler Infrastructure, is available on GitHub.

TL;DR

MemInstrument is a framework for compiling memory-safe executables from C code. It automatically inserts memory safety checks at compile time. These checks are executed at runtime and ensure that the program reports a memory safety violation instead of accessing invalid memory. MemInstrument provides several previously published memory safety instrumentations in a common framework. For designers of memory safety mechanisms, it provides common instrumentation functionality and thereby enables an easy and fair comparison with previous work.

What is memory safety?

In languages such as C, using a pointer to access memory outside of its allocation bounds or accessing it after the allocation was freed is undefined behavior. In many cases, this will go unnoticed by the programmer. However, it is a threat to security, because such an unintended access can leak or alter information. A program is considered memory safe if no such memory access error can occur for any input.

You might wonder whether memory safety is still an issue. Unfortunately, it is. The CWE Top 25 lists several memory safety errors (such as "Out-of-bounds Write/Read" and "Use After Free") in its ranking of the most dangerous software vulnerabilities. Taken together as a single category, they outrank all other errors by far.

How to ensure memory safety in C?

A variety of tools have been proposed to ensure memory safety in C. They typically work as an extension to your C compiler, and so does MemInstrument. During compilation, the program is instrumented, i.e., code is inserted into it. This inserted code checks the safety of the execution at runtime. In the following example, we focus on spatial safety, i.e., ensuring that every access stays within the bounds of its allocation. Consider the init function below:

void init(int *ar, int size) {
  for (int i = 0; i < size; i += 2) {
    if (i+1 < size) {
      ar[i+1] = 1;
    }
    ar[i] = 0;
  }
}

The function receives a pointer ar and a size, and fills the memory from ar up to ar + size - 1 with alternating zeros and ones. A memory safety instrumentation ensures that these memory accesses are safe. For this purpose, it places a call to a check function right before every access. In the instrumented init function below, this is indicated by the calls to checkIB. The arguments to this function are the base and bound of the allocation, the memory location accessed, and the width of the access. The base and bound values are obtained through the loadBase and loadBound functions that the instrumentation provides.

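#include <stdint.h>  /* intptr_t */

/* loadBase, loadBound, and checkIB are provided by the instrumentation. */
void init(int *ar, int size) {
  /* Look up the bounds of the allocation that ar points into. */
  intptr_t baseAr = loadBase(ar);
  intptr_t boundAr = loadBound(ar);

  for (int i = 0; i < size; i += 2) {
    if (i+1 < size) {
      /* Each access is checked right before it happens. */
      checkIB(baseAr, boundAr, ar+i+1, sizeof(int));
      ar[i+1] = 1;
    }
    checkIB(baseAr, boundAr, ar+i, sizeof(int));
    ar[i] = 0;
  }
}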

During the execution of the program, the checkIB calls fail if they detect an out-of-bounds access, thereby preventing a memory access to an invalid location.
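To make the check concrete, here is a minimal, self-contained sketch of what such runtime functions could look like. It is not MemInstrument's implementation. The names loadBase, loadBound, and checkIB are taken from the example above, while registerBounds, inBounds, and the single-entry metadata store are hypothetical simplifications introduced only for illustration. Real mechanisms track the bounds of every allocation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical metadata store: remembers the bounds of ONE allocation.
   This is an assumption for illustration, not how MemInstrument works. */
static intptr_t g_base;
static intptr_t g_bound;

/* Hypothetical helper: record that [p, p + bytes) is a valid allocation. */
void registerBounds(const void *p, size_t bytes) {
  g_base = (intptr_t)p;
  g_bound = (intptr_t)p + (intptr_t)bytes;
}

/* The pointer argument is ignored because only one allocation is tracked. */
intptr_t loadBase(const void *p) { (void)p; return g_base; }
intptr_t loadBound(const void *p) { (void)p; return g_bound; }

/* Returns 1 if the access [ptr, ptr + width) lies inside [base, bound). */
int inBounds(intptr_t base, intptr_t bound, const void *ptr, size_t width) {
  intptr_t p = (intptr_t)ptr;
  return p >= base && p <= bound && (size_t)(bound - p) >= width;
}

/* Report a violation and stop instead of letting the access happen. */
void checkIB(intptr_t base, intptr_t bound, const void *ptr, size_t width) {
  if (!inBounds(base, bound, ptr, width)) {
    fprintf(stderr, "memory safety violation\n");
    abort();
  }
}

/* The instrumented init function from above, using the sketch runtime. */
void init(int *ar, int size) {
  intptr_t baseAr = loadBase(ar);
  intptr_t boundAr = loadBound(ar);

  for (int i = 0; i < size; i += 2) {
    if (i + 1 < size) {
      checkIB(baseAr, boundAr, ar + i + 1, sizeof(int));
      ar[i + 1] = 1;
    }
    checkIB(baseAr, boundAr, ar + i, sizeof(int));
    ar[i] = 0;
  }
}
```

With an array of four ints registered through registerBounds, calling init with size 4 passes every check. Calling it with a larger size would reach checkIB with a location at or beyond the bound, and the sketch would abort before the write takes place.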

What are the benefits of MemInstrument?

MemInstrument abstracts common tasks of memory safety instrumentations, such as finding the locations at which to place check calls and bounds loads and stores. This allows programmers to implement their safety mechanism without redoing all the tedious work. At the same time, placing the checks at the same code locations enables a fair comparison of the implemented approaches. Additionally, because the framework abstracts from the concrete mechanism, it can provide shared optimizations. For example, MemInstrument has an optimization that removes a check at compile time whenever the same memory location has already been checked at a dominating code location.
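The effect of removing a dominated check can be illustrated at the source level. The sketch below is a hand-written illustration, not output of MemInstrument. All names in it (countedCheck, bumpUnoptimized, bumpOptimized, checksExecuted, resetChecks) are hypothetical, and the check merely counts how often it runs instead of testing bounds. In bumpUnoptimized, the second check guards the same location as the first, and the first check is executed on every path that reaches the second, so the second one is redundant.

```c
#include <stddef.h>

/* Counts executed checks. For illustration only. */
static int g_checks;

/* Hypothetical stand-in for a bounds check: it only counts invocations. */
void countedCheck(const int *p) { (void)p; g_checks++; }

int checksExecuted(void) { return g_checks; }
void resetChecks(void) { g_checks = 0; }

/* Before the optimization: the load and the store are both checked. */
void bumpUnoptimized(int *p) {
  countedCheck(p);
  int v = *p;
  countedCheck(p); /* redundant: same location, dominated by the first check */
  *p = v + 1;
}

/* After the optimization: the dominated check has been removed. */
void bumpOptimized(int *p) {
  countedCheck(p);
  int v = *p;
  *p = v + 1;
}
```

Both functions perform the same load and store, but the optimized variant executes one check instead of two.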

If you implement a new mechanism, you can easily compare it to the ones already implemented in the framework on equal footing, with a common LLVM version and the same check placement, without having to reimplement the related work.

Which memory safety mechanisms does MemInstrument provide?

MemInstrument currently provides three spatial safety mechanisms:

Code

People