Xen hypervisor calls from OCaml

Downloads

The problem

The Xen hypervisor is a small kernel which sits below the Linux kernel(s) when Xen is running. You can make hypercalls into the Xen hypervisor; they are rather like system calls. Hypercalls let you do things like pulling out a list of running domains, allocating a new domain, assigning domains to physical CPUs and so on.

In libvirt we make heavy use of hypercalls to talk directly to the Xen hypervisor.

The problems are twofold. Firstly, the Xen developers change the binary format of hypercalls frequently, whenever they feel like it. Secondly, C is a clumsy language in which either to describe the size, alignment and endianness of binary structures, or to deal with structures that have multiple versions (which the frequent changes force on us).

A better approach is to use a Domain-Specific Language (DSL) for describing binary structures, and have this language compiled into highly efficient code to pack and unpack such structures.

pa_bitfields

pa_bitfields is a camlp4 macro which adds bitfield/binary structures to the OCaml language, and automatically writes functions for packing and unpacking them.

For example, the code to make the hypercall that detects which version of Xen is running reduces to a definition of the hypercall operation structure, plus code to fill in that structure and make the hypercall:

type struct hypercall = {
  op : int64;    (* Operation: 64 bit int with native endianness *)
  arg[5] : int64 (* Up to 5 parameters. *)
}

(* Open the special file for making hypercalls. *)
let fd = Unix.openfile "/proc/xen/privcmd" [Unix.O_RDWR] 0

(* Construct the hypercall operation structure and pack it. *)
let hc = bits_of_hypercall {
  zero_hypercall with
    op = 17L;    (* 17 is hypervisor_xen_version. *)
    arg0 = 0L;
}

(* Do the hypercall, return the major:minor. *)
let r = ioctl fd ioctl_privcmd_hypercall hc

type struct xen_packed_version = {
  major : int:16;
  minor : int:16
}

let { major = major; minor = minor } = xen_packed_version_of_int32 r ;;

let () = Printf.printf "hypervisor version: major = %d  minor = %d\n" major minor

In this example, bits_of_hypercall is the automatically generated function that packs the hypercall operation structure, and xen_packed_version_of_int32 is another automatically generated function that unpacks the 32-bit value (major << 16) | minor returned by Xen.
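Since these functions are generated for you, it may help to see roughly what they have to do. The following is a hand-written sketch, not pa_bitfields output: for brevity it stores the five arguments in an array rather than in separate fields arg0 to arg4, and it assumes OCaml 4.08 or later for the native-endian Bytes accessors.

```ocaml
(* Hand-written approximation of the generated packers; NOT the output
   of pa_bitfields.  Requires OCaml >= 4.08 for Bytes.{get,set}_int64_ne. *)

(* OCaml-side view of the hypercall structure: op plus five arguments,
   each stored as a native-endian 64-bit integer. *)
type hypercall = { op : int64; args : int64 array }

(* op plus five arguments, 8 bytes each. *)
let sizeof_hypercall = 8 * 6

let bits_of_hypercall hc =
  if Array.length hc.args <> 5 then
    invalid_arg "bits_of_hypercall: expected 5 arguments";
  let b = Bytes.make sizeof_hypercall '\000' in
  Bytes.set_int64_ne b 0 hc.op;
  Array.iteri (fun i a -> Bytes.set_int64_ne b (8 + (8 * i)) a) hc.args;
  Bytes.to_string b

let hypercall_of_bits s =
  if String.length s <> sizeof_hypercall then
    invalid_arg "hypercall_of_bits: wrong length";
  let b = Bytes.of_string s in
  { op = Bytes.get_int64_ne b 0;
    args = Array.init 5 (fun i -> Bytes.get_int64_ne b (8 + (8 * i))) }

(* Xen's version hypercall returns (major << 16) | minor. *)
type xen_packed_version = { major : int; minor : int }

let xen_packed_version_of_int32 r =
  { major = Int32.to_int (Int32.shift_right_logical r 16);
    minor = Int32.to_int (Int32.logand r 0xffffl) }
```

The logical (unsigned) shift matters for the unpacker: an arithmetic shift would sign-extend a major number with its top bit set.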

(Note: full documentation can be found in the download at the top of this page)

mlock

For large structures passed to and returned from the Xen hypervisor, Xen requires that the structures be locked into memory using mlock(2).

The basic reason is as follows: the hypervisor sits below the kernel, but can use the page tables currently in effect to convert the virtual addresses used by the process into physical addresses. However, because the hypervisor isn't part of the kernel, it cannot swap memory back in (that requires kernel functions). So all memory that the hypervisor might need to touch must already be swapped in, and locking it guarantees that it stays resident.

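OCaml's standard library has no bindings for mlock/munlock, but adding them was trivial. A further problem is that the garbage collector can kick in at any time and move memory around, which is obviously a big problem if we are at the same time trying to construct a series of interlinked structures to pass to a hypercall. OCaml lets you turn off compaction, so that the major heap is never moved (although over the long term this is less efficient). This does not cover the minor heap, however: small values are allocated there and copied to the major heap when they survive a minor collection, so a value's address only becomes stable once it has been promoted.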

So I wrote two functions which improve on memory locking as done in C: they wrap code that performs hypervisor operations so that memory is locked and compaction is temporarily disabled for its duration. Even if the wrapped code throws an exception, the locks are released and compaction is restored automatically:

open Mlock

(* fd is the open /proc/xen/privcmd file; first and maxids select
   which range of domains to list. *)
let do_a_hypercall fd first maxids =
  with_no_compaction (
    fun () ->
      (* Allocate memory structures. *)
      let r = String.create (maxids * sizeof_xen_getdomaininfo) in

      let operation = bits_of_system_operation {
        zero_system_operation with
          cmd = xen_v2_op_getdomaininfolist;
          interface_version = sys_interface_version;
          first_domain = first;
          max_domains = maxids;
          domain_info = address_of r;
          num_domains = maxids
      } in

      let hc = bits_of_hypercall {
        zero_hypercall with
          op = hypervisor_sysctl;
          arg0 = address_of operation
      } in

      with_mlock [r; operation; hc] (
        fun () ->
          (* Make the hypercall. *)
          ioctl fd ioctl_privcmd_hypercall hc
      )
  )

The functions with_no_compaction and with_mlock respectively protect the wrapped code from compaction and lock the given memory for its duration, and both are exception-safe.
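The two wrappers might look roughly like the following. This is a sketch, not the code from the download. It assumes OCaml 4.08 or later for Fun.protect, relies on the OCaml 4 Gc documentation's statement that a max_overhead of 1000000 or more means compaction is never triggered, and, because the standard library has no mlock binding, takes the lock and unlock primitives as parameters (in real use these would be small C stubs around mlock(2) and munlock(2)).

```ocaml
(* Sketches of the two wrappers; not the code from the download.
   Requires OCaml >= 4.08 for Fun.protect. *)

(* Run [f] with heap compaction disabled (max_overhead >= 1_000_000).
   Only max_overhead is restored afterwards, so nested calls and any
   unrelated GC tuning done inside [f] are left intact. *)
let with_no_compaction f =
  let saved = (Gc.get ()).Gc.max_overhead in
  Gc.set { (Gc.get ()) with Gc.max_overhead = 1_000_000 };
  Fun.protect
    ~finally:(fun () -> Gc.set { (Gc.get ()) with Gc.max_overhead = saved })
    f

(* Lock every buffer in [bufs], run [f], then unlock them, even if [f]
   raises.  [lock]/[unlock] stand in for hypothetical C stubs wrapping
   mlock(2)/munlock(2). *)
let with_mlock ~lock ~unlock bufs f =
  (* Track which buffers were actually locked, so that a lock failure
     part-way through only unlocks those. *)
  let locked = ref [] in
  let unlock_all () =
    List.iter unlock !locked;   (* reverse order of locking *)
    locked := []
  in
  (try List.iter (fun b -> lock b; locked := b :: !locked) bufs
   with e -> unlock_all (); raise e);
  Fun.protect ~finally:unlock_all f
```

With hypothetical stubs mlock_string and munlock_string, the call in the example above would become with_mlock ~lock:mlock_string ~unlock:munlock_string [r; operation; hc] (fun () -> ...). Unlocking everything together at the end also sidesteps a subtlety of Linux memory locks: they do not stack, so a single munlock of a page unlocks it even if several buffers share that page.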

ioctl

The code also contains simple bindings for ioctl(2), and uses pa_bitfields to emulate the Linux _IOC macro for constructing ioctl command words.
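For reference, here is roughly what _IOC computes, written as plain OCaml bit arithmetic rather than with pa_bitfields. The sketch assumes the asm-generic layout used on x86 (PowerPC, MIPS and SPARC use different field widths) and a 64-bit OCaml, where a native int can hold all 32 bits of the command word. The privcmd values (type 'P', number 0, and a 48-byte argument holding op plus five 64-bit arguments) follow the Linux privcmd header.

```ocaml
(* Linux _IOC encoding, asm-generic layout (x86, ARM, ...):
     bits 0-7   nr    command number
     bits 8-15  type  "magic" character, e.g. 'P' for privcmd
     bits 16-29 size  size of the argument structure in bytes
     bits 30-31 dir   none / write / read
   Assumes 64-bit OCaml so that bit 31 fits in a native int. *)
let ioc_none = 0
let ioc_write = 1
let ioc_read = 2

let ioc ~dir ~typ ~nr ~size =
  if dir land lnot 0x3 <> 0 then invalid_arg "ioc: dir out of range";
  if nr land lnot 0xff <> 0 then invalid_arg "ioc: nr out of range";
  if size land lnot 0x3fff <> 0 then invalid_arg "ioc: size out of range";
  (dir lsl 30) lor (size lsl 16) lor (Char.code typ lsl 8) lor nr

(* privcmd hypercall: op + arg[5], all 64-bit, i.e. 6 * 8 = 48 bytes. *)
let ioctl_privcmd_hypercall = ioc ~dir:ioc_none ~typ:'P' ~nr:0 ~size:48
```

This gives 0x305000, the familiar value of IOCTL_PRIVCMD_HYPERCALL on x86.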

To-do

Parts of pa_bitfields are only really sketched out at the moment, in particular: packing bitfields into strings, writing integers of all the various sizes in each endianness, and unpacking bitfields.

Support inheritance, so that structures can share common parts, like this:

type common = {
  operation : int32;
  length : int32;
}
and t1 = {
  header : common;
  t1data : int32;
}
and t2 = {
  header : common;
  t2data : char;
}

Optional fields. Because C alignment depends on the architecture, you can get a situation where a structure like this:

struct {
  int32_t a;
  int64_t b;
};

is packed together on a 32 bit platform, but has a 4 byte space in the middle on a 64 bit platform. We can either treat this simply as two different versions of the struct, or else hack it with an optional padding field which only appears on the 64 bit platform. I'm not sure which is better.

Lots more optimisations.
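The platform-dependent padding in the optional-fields item above can be checked with a small layout calculation. This is a sketch of the usual C rules (each field at the next multiple of its alignment, total size rounded up to the largest alignment), assuming the System V ABIs: int64 is 4-byte aligned inside structs on i386 but 8-byte aligned on x86_64.

```ocaml
(* Each field's size and alignment in bytes. *)
type field = { name : string; size : int; align : int }

(* Round [n] up to the next multiple of [a]. *)
let round_up n a = (n + a - 1) / a * a

(* Returns the offset of each field and the padded struct size. *)
let layout fields =
  let offsets, end_, max_align =
    List.fold_left
      (fun (offs, off, max_a) f ->
        let off = round_up off f.align in
        ((f.name, off) :: offs, off + f.size, max max_a f.align))
      ([], 0, 1) fields
  in
  (List.rev offsets, round_up end_ max_align)

(* struct { int32_t a; int64_t b; } under the two ABIs. *)
let i386 =
  [ { name = "a"; size = 4; align = 4 }; { name = "b"; size = 8; align = 4 } ]

let x86_64 =
  [ { name = "a"; size = 4; align = 4 }; { name = "b"; size = 8; align = 8 } ]
```

On i386 b lands at offset 4 and the struct is 12 bytes; on x86_64 b lands at offset 8 and the struct is 16 bytes. The four bytes between offsets 4 and 8 are exactly what an optional padding field would have to model.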

Versioning

Currently, using multiple versions of a structure is painful, particularly when the version is only known at runtime. Consider a simple case where we can detect one of two different versions of Xen at runtime (in reality there would be many more). A structure may differ between the two versions, as in this made-up example:

type struct v1_call = {
  v1_domid : int32;
  v1_flags : int32;
  v1_cpu_time : int32;
}
type struct v2_call = {
  v2_domid : int32;
  v2_flags : int32;
  v2_cpu_time : int64;
}

There is only a small change between structures here, but now our code has to do this:

if xen_version = 1 then (
  let hc = bits_of_v1_call {
    v1_domid = Int32.of_int domid;
    v1_flags = Int32.of_int flags;
    v1_cpu_time = Int32.of_int cpu_time;
  } in
  do_v1_hypercall hc
) else (
  let hc = bits_of_v2_call {
    v2_domid = Int32.of_int domid;
    v2_flags = Int32.of_int flags;
    v2_cpu_time = Int64.of_int cpu_time;
  } in
  do_v2_hypercall hc
)

Similarly, the do_*_hypercall functions must be duplicated, which may involve a lot of extra code.

What we'd really like to do is have a standard structure, perhaps a "maximal" one (i.e. all fields, extended to their maximum size), and just change the packing functions to understand which fields are relevant, although this weakens the type guarantees, which is itself a problem.
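As a rough illustration of the maximal-structure idea, the sketch below uses one record with every field at its widest size and a single hand-written packer that picks the layout from the Xen version. The v1/v2 layouts are the made-up ones above (no padding is needed for these fields), and OCaml 4.08 or later is assumed for the Bytes accessors.

```ocaml
(* One "maximal" record: every field at its widest size. *)
type call = { domid : int32; flags : int32; cpu_time : int64 }

let fits_int32 x =
  Int64.compare x (Int64.of_int32 Int32.min_int) >= 0
  && Int64.compare x (Int64.of_int32 Int32.max_int) <= 0

(* A single packer that knows each (made-up) version's layout. *)
let bits_of_call ~xen_version c =
  match xen_version with
  | 1 ->
      (* v1 only has a 32-bit cpu_time: the type checker can no longer
         catch an out-of-range value, so check it at runtime. *)
      if not (fits_int32 c.cpu_time) then
        invalid_arg "bits_of_call: cpu_time too large for v1";
      let b = Bytes.create 12 in
      Bytes.set_int32_ne b 0 c.domid;
      Bytes.set_int32_ne b 4 c.flags;
      Bytes.set_int32_ne b 8 (Int64.to_int32 c.cpu_time);
      Bytes.to_string b
  | 2 ->
      let b = Bytes.create 16 in
      Bytes.set_int32_ne b 0 c.domid;
      Bytes.set_int32_ne b 4 c.flags;
      Bytes.set_int64_ne b 8 c.cpu_time;
      Bytes.to_string b
  | v -> invalid_arg (Printf.sprintf "bits_of_call: unknown version %d" v)
```

The caller's if/else then reduces to one call, bits_of_call ~xen_version { domid = Int32.of_int domid; flags = Int32.of_int flags; cpu_time = Int64.of_int cpu_time }, and the hypercall code is written once. The price is the weakened typing: a v1 caller can supply a cpu_time that only v2 can represent, so the check moves to runtime.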

Another thought was to solve this with functors, something like this:

module type Input =
sig
  type t (* The structure type *)
  val bits_of_t : t -> string
end

module type S =
sig
  type t (* The structure type *)
  val do_hypercall : t -> unit
end

module Make (Inp : Input) : S with type t = Inp.t

The caller does:

module V1 = Make (
  struct
    type t = v1_call
    let bits_of_t = bits_of_v1_call
  end
)
module V2 = Make (
  struct
    type t = v2_call
    let bits_of_t = bits_of_v2_call
  end
)

(* ... *)
if xen_version = 1 then (
  let hc = {
    v1_domid = Int32.of_int domid;
    v1_flags = Int32.of_int flags;
    v1_cpu_time = Int32.of_int cpu_time;
  } in
  V1.do_hypercall hc
) else (
  let hc = {
    v2_domid = Int32.of_int domid;
    v2_flags = Int32.of_int flags;
    v2_cpu_time = Int64.of_int cpu_time;
  } in
  V2.do_hypercall hc
)

This eliminates the code duplication in do_hypercall, but the packing (and unpacking) functions and the type conversions are still explicit.
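For comparison, here is a self-contained sketch of the functor approach with Make actually implemented. The packers are hand-written stand-ins for pa_bitfields output, and raw_hypercall is a hypothetical placeholder for the real ioctl that simply records the bytes it is given, so the sketch runs without Xen. It assumes OCaml 4.08 or later.

```ocaml
(* Hypothetical stand-in for the ioctl-based hypercall. *)
let sent : string list ref = ref []
let raw_hypercall bits = sent := bits :: !sent

module type Input = sig
  type t
  val bits_of_t : t -> string
end

module type S = sig
  type t
  val do_hypercall : t -> unit
end

(* Version-independent code is written once, here. *)
module Make (Inp : Input) : S with type t = Inp.t = struct
  type t = Inp.t
  let do_hypercall x = raw_hypercall (Inp.bits_of_t x)
end

type v1_call = { v1_domid : int32; v1_flags : int32; v1_cpu_time : int32 }
type v2_call = { v2_domid : int32; v2_flags : int32; v2_cpu_time : int64 }

(* Hand-written packers standing in for the generated ones. *)
let bits_of_v1_call c =
  let b = Bytes.create 12 in
  Bytes.set_int32_ne b 0 c.v1_domid;
  Bytes.set_int32_ne b 4 c.v1_flags;
  Bytes.set_int32_ne b 8 c.v1_cpu_time;
  Bytes.to_string b

let bits_of_v2_call c =
  let b = Bytes.create 16 in
  Bytes.set_int32_ne b 0 c.v2_domid;
  Bytes.set_int32_ne b 4 c.v2_flags;
  Bytes.set_int64_ne b 8 c.v2_cpu_time;
  Bytes.to_string b

module V1 = Make (struct type t = v1_call let bits_of_t = bits_of_v1_call end)
module V2 = Make (struct type t = v2_call let bits_of_t = bits_of_v2_call end)
```

Only the module (V1 or V2) and the record type differ between versions at the call site; the per-version packers still have to be named explicitly when the modules are built.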

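A further thought along these lines uses a kind of "textual trick": if every version names its structure call, pa_bitfields generates a function called bits_of_call in each module, so each structure satisfies the Input signature without any extra code: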

module type Input =
sig
  type call
  val bits_of_call : call -> string
end

module type S =
sig
  type call
  val do_hypercall : call -> unit
end

module Make (Inp : Input) : S with type call = Inp.call

module V1 = Make (
  struct
    type struct call = {
      domid : int32;
      flags : int32;
      cpu_time : int32;
    }
  end
)

module V2 = Make (
  struct
    type struct call = {
      domid : int32;
      flags : int32;
      cpu_time : int64;
    }
  end
)

(* etc. *)

rjones AT redhat DOT com

$Id: index.html,v 1.5 2007/07/07 16:24:52 rjones Exp $