SELinux Memory Protection Tests

Ulrich Drepper
2006-4-3

SELinux restricts certain memory protection operations if the appropriate boolean values enable these checks. The current status of the booleans can be seen and set using four pseudo files:

/selinux/booleans/allow_execheap
/selinux/booleans/allow_execmem
/selinux/booleans/allow_execmod
/selinux/booleans/allow_execstack

There are four additional error classes. If triggered, the audit logs will show something like this (this is on x86):

type=AVC msg=audit(1134060846.670:49): avc:  denied  { execmod } for  pid=11659 comm="testprog" name="inputfile" dev=hda5 ino=292941 scontext=user_u:system_r:unconfined_t:s0-s0:c0.c255 tcontext=user_u:object_r:user_home_t:s0 tclass=file
type=SYSCALL msg=audit(1134060846.670:49): arch=40000003 syscall=125 success=no exit=-13 a0=b7f70000 a1=32 a2=5 a3=11acc0 items=0 pid=11659 auid=4294967295 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 comm="testprog" exe="/home/drepper/testprog"
type=AVC_PATH msg=audit(1134060846.670:49):  path="/home/drepper/inputfile"

The amount of output might be intimidating, but most of the data is not really important. The above data tells us that:

This is the specific information for calls which have execmod problems. It looks slightly different for other types of errors. The error types related to memory protection are:

execmod

The program maps a file with mmap using the MAP_PRIVATE flag and write permission. It then writes to the memory region, resulting in copy-on-write (COW) of the affected page(s). Finally the program tries to make this memory region executable, with code like this:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>

int
main (void)
{
  int fd = open ("inputfile", O_RDWR);
  char *p = mmap (NULL, 42, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
  p[0] = 'a';
  int r = mprotect (p, 42, PROT_READ | PROT_EXEC);
  printf ("mprotect = %d\n", r);
  return 0;
}

The mprotect call will fail with EACCES in this case. Note that this happens even though the mprotect call removes the write permission. One could avoid the problem by mapping the file with execution permission right from the beginning. But this is very dangerous: programs should never use memory regions which are writable and executable at the same time. If it is really necessary to generate executable code while the program runs, the method employed should be reconsidered.

One frequent cause of execmod problems, at least on x86, is the existence of text relocations. Text relocations are a sign that the binary or DSO is not built correctly. More on this can be found on the page dedicated to this problem. In short, bug whoever is responsible for building the binary.

execmem

There are two situations when this error can appear:

If the program really needs this behavior there is no really easy way out. One possibility is to create an anonymous file (just unlink it after creation), size the file using ftruncate, and then map the file in two places. In one place map it with MAP_SHARED and write permission but without execution permission. For the second mapping use execution permission but no write permission. This might be a bit confusing at first but can be handled. The program must be adjusted to write to one location and expect to execute code in another one. This is reasonably safe if the two mappings are allowed to be randomized. The example code in the next section illustrates how this should work.

In the case of an anonymous mapping the best thing to do is to explicitly copy the file into an anonymous file and then proceed as described in the previous paragraph.

execstack

As the name suggests, this error is raised if a program tries to make its stack (or parts thereof) executable with an mprotect call. This should never, ever be necessary. Stack memory is not executable on most OSes these days and this won't change. Executable stack memory is one of the biggest security problems. An execstack error might in fact be most likely raised by malicious code.

See my overview of security features in Fedora and RHEL for more information, specifically appendix A. It explains how to avoid executable stacks.

execheap

The POSIX specification does not permit it, but the Linux implementation of mprotect allows changing the access protection of memory on the heap (e.g., allocated using malloc). This error indicates that heap memory was supposed to be made executable. Doing this is really a bad idea. If anonymous, executable memory is needed it should be allocated using mmap which is the only portable mechanism.

Example code to avoid execmem violations

The following is a little program which illustrates how to avoid the execmem violations. As described briefly in the execmem section, we map the same memory twice, write into one mapping and execute the second one. The generated code simply does a getppid system call and then passes the value to print_int.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <linux/memfd.h>

static void
__attribute__ ((noinline))
print_int (int v)
{
  printf ("%d\n", v);
}

int
main (void)
{
#ifndef __NR_memfd_create
  char tmpfname[] = "/home/drepper/.execmemXXXXXX";
  int fd = mkstemp (tmpfname);
  unlink (tmpfname);
#else
  int fd = syscall(__NR_memfd_create, "test", MFD_CLOEXEC);
#endif
  ftruncate (fd, 1000);
  char *writep = mmap (NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
  char *execp = mmap (NULL, 4096, PROT_READ|PROT_EXEC, MAP_SHARED, fd, 0);

#if __x86_64__
  writep[0] = '\x48';
  writep[1] = '\x83';
  writep[2] = '\xec';
  writep[3] = '\x08';
  writep[4] = '\xb8';
  *(int *) (writep + 5) = __NR_getppid;
  writep[9] = '\x0f';
  writep[10] = '\x05';
  writep[11] = '\x89';
  writep[12] = '\xc7';
  writep[13] = '\x48';
  writep[14] = '\xb8';
  *(char **) (writep + 15) = (char *) &print_int;
  writep[23] = '\xff';
  writep[24] = '\xd0';
  writep[25] = '\x48';
  writep[26] = '\x83';
  writep[27] = '\xc4';
  writep[28] = '\x08';
  writep[29] = '\xc3';
#elif __i386__
  writep[0] = '\xb8';
  *(int *) (writep + 1) = __NR_getppid;
  writep[5] = '\xcd';
  writep[6] = '\x80';
  writep[7] = '\x50';
  writep[8] = '\xe8';
  *(int *) (writep + 9) = (char *) &print_int - (execp + 13);
  writep[13] = '\x58';
  writep[14] = '\xc3';
#else
# error "architecture not supported"
#endif

  fputs ("asm code: ", stdout);
  asm volatile ("call *%0" : : "r" (execp) : "ax", "cx", "dx");

  fputs ("getppid: ", stdout);
  print_int (getppid ());

  munmap (writep, 4096);
  munmap (execp, 4096);
  close (fd);
  return 0;
}

The code contains two different code paths to create the mapping, or more correctly, to get the file descriptor to perform the mapping. Before the 3.17 kernel the only way to do this was to create an actual file. A requirement for this code to work is that the filesystem on which the file is created must allow execution. Some sites might (wisely) decide to mount filesystems like /tmp and /var/tmp with the noexec option. One can either detect this (the second mmap call fails with EPERM) or actively look for a directory with appropriate permissions. The example code opens the file in my home directory which is executable. The file is immediately unlinked and therefore will not leave any clutter behind, even if the program crashes.

The modern way is to use the memfd_create system call. It creates a file descriptor that can be (mostly) used like that for a file but no file is created. Everything takes place in memory. As can be seen in the code the changes needed compared to the old code are minimal. But of course this approach is better since no filesystem needs to allow execution, one less possible source of problems. Reliable in-memory code generation is now possible.

One noteworthy part of the code is the generation of the call to print_int in the i386 code path. The opcode of the instruction itself is in writep[8]. The following word is the relative address. To compute the relative address it is necessary to use the address of the following instruction in the executable mapping pointed to by execp, not the writable mapping pointed to by writep.

Using this approach instead of one mapping which is writable and executable at the same time is safer because the attacker has to know two independently randomized addresses (this assumes mmap is allowed to perform the randomizations).