Libunbin - library / toolkit for reading binary things



Libunbin (library for "unbinarying") is a library / toolkit for safely reading binary structures. Its primary use so far is reading partitions and filesystems in Xen disk images, but it could be applied in other areas, for instance as a safe way to snoop and decode network traffic.

The primary goals are security and ease of use. For security we want to ensure that malicious data structures cannot compromise the reading program (a frequent cause of vulnerabilities in Ethereal / Wireshark, for example). Coupled with the security goal, we want to keep descriptions of binary structures easy to audit and separate from other code which just uses them. For ease of use we want to be able to import descriptions of binary structures easily: we can import C structures directly out of C code and C header files, including common annotations for bitfields, signedness and endianness. In general libunbin ought to be as easy to use as direct C structures, or easier.

This web page is all the documentation which exists so far, but we hope to have much more comprehensive documentation in future.

Overview and filetypes

Libunbin is controlled through a master program called libunbin which has various subcommands to do things like importing C, printing the auditable structures, generating stubs, and so on.

There are also various files, either ones which you must write, or ones which are generated:

File types used and generated by libunbin

| Filename | Made? | Description |
|---|---|---|
| file.ubd | Supplied by user | Libunbin description file for importing C structures. This file is basically C code with some additional macros. You can place #include statements here to include ordinary C header files, or you can write C structures directly. The format is described in detail in the next sections. |
| file.ubb | Generated | Libunbin file containing the description of the binary structures (often called the "UBB file"). This is the intermediate file format between the various import formats and the various outputs like code stubs. Although this file is a binary file, you can dump out its contents using libunbin -print-ubb file.ubb. |
| file.ubm | Supplied by user, optional | This file contains metadata which can be applied to UBB files to add or change the binary structure. The primary use of this is to annotate imported C structures. For example if the C struct doesn't contain information about endianness, then you can annotate fields from here. |
| file_stubs.c file_stubs.h | Generated | The generated C code for accessing binary structures. |
|  | Generated | The generated OCaml code for accessing binary structures. |
| file.xml | Supplied by user, optional | An alternate way to describe binary structures is to write an XML description. Useful if describing a binary structure where you have no suitable C header file, and it also allows you much more control. |
|  | Supplied by user or generated | An alternate way to describe binary structures is to dump out the UBB file and edit it by hand, then reimport it. This gives you ultimate flexibility, but is not very portable (ie. will break between libunbin releases). |

Note about UBB files

UBB files are not compatible between libunbin releases. For each new release you should regenerate them from source.


The diagram below shows how the various files are related. In this case we have written code to decode the EXT2/3 superblock. You can also find this example in the libunbin source distribution.

[Figure: workflow diagram for ext3]
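
For a sense of what decoding even one superblock field involves when written by hand, here is a minimal C sketch that checks the ext2/3 magic number in a disk image. It is not libunbin output and is not taken from the libunbin distribution. The function name has_ext_magic and the macro names are made up for illustration. The layout constants (superblock at byte 1024, s_magic at offset 56, value 0xEF53 stored little-endian) are the well-known ext2/3 on-disk values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Well-known ext2/3 on-disk layout facts (macro names are local to this sketch). */
#define SUPERBLOCK_OFFSET 1024u   /* superblock starts 1024 bytes into the device */
#define S_MAGIC_OFFSET    56u     /* offset of s_magic within the superblock */
#define EXT_SUPER_MAGIC   0xEF53u /* value of s_magic, stored little-endian */

/* Hypothetical helper: returns 1 if the image holds the ext2/3 magic
 * number, 0 if it does not, and -1 if the image is too short to
 * contain the field at all. */
static int has_ext_magic(const unsigned char *image, size_t len)
{
    assert(image != NULL);          /* programmer error, not a data error */
    const size_t pos = SUPERBLOCK_OFFSET + S_MAGIC_OFFSET;

    /* Bounds check before touching the data: a truncated or hostile
     * image must not cause an out-of-range read. */
    if (len < pos + 2)
        return -1;

    /* Assemble the 16-bit value byte by byte so the result does not
     * depend on the host's endianness or alignment rules. */
    uint16_t magic = (uint16_t)(image[pos] | (image[pos + 1] << 8));
    return magic == EXT_SUPER_MAGIC;
}
```

Every field decoded this way needs its own offset, width, byte order and bounds check, which is the kind of repetitive, error-prone code that a separate, auditable description of the structure is meant to replace.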

Importing C structs, UBD files

To import a C struct you must write a .ubd file which contains C code, #include directives and some special macros. Here is ext3.ubd for importing the ext3 superblock and a constant:

#include <linux/magic.h>
#include <linux/ext3_fs.h>

typedef struct ext3_super_block LIBUNBIN_EXPORT(ext3_super_block);

Any type (struct or union) which is to be exported must be written as a typedef using the LIBUNBIN_EXPORT(name) macro, as in the example above, where name is the type name as it should be exported.

Any constant should use one of the LIBUNBIN_CONSTANT_* macros. These are defined in libunbin-ubd-prefix.h which is included automatically in any UBD file.

For Linux kernel header files, it is often useful to define the following symbols in your UBD file:

#define __CHECKER__      1
#define __CHECK_ENDIAN__ 1

The effect of defining these is to enable the special __be32, __le32, etc types to be recognised by libunbin as indicating endianness.

To compile a UBD file to UBB, do:

libunbin -ubd file.ubd

Annotating C structs with metadata, UBM files

You can write a .ubm file (or several) to transform or annotate C structures. This is useful because C structures typically don't contain enough information to generate stubs properly (e.g. they are missing endianness information, or they often lack any information about dependencies between parts of the struct).

For example in the ext3 superblock there is a large s_reserved field at the end of the structure which serves no purpose other than to pad the C structure out to the right size. Unless told otherwise libunbin will generate stubs for reading this field, but that is not really useful and just makes reading slower. One solution to this would be to edit the Linux system header file, but that is quite intrusive. A better solution is to write a metadata file instructing libunbin to remove the unwanted field:

delete ext3_super_block(s_reserved);

To transform a UBB file using metadata, do:

libunbin -transform file.ubb file.ubm

(This updates file.ubb).

Further documentation for metadata files to follow ...

Writing XML descriptions

You can describe structures directly in XML (no C required). This gives you a portable way to describe structures, accessing the full power of libunbin.

Further documentation for XML files to follow ...

Generating stub code

To generate C stubs from a UBB file, do:

libunbin -stubs file.ubb

Pay careful attention to any warnings, since they often indicate security issues. If you see warnings then read the section on auditing below.

The above command generates file_stubs.c and file_stubs.h. You can examine the header file to find out what functions are available.

To generate OCaml stubs, do:

libunbin -ml-stubs file.ubb

This generates

Other topics

Hand-modifying UBB files

The UBB file format is not portable, meaning that it can change between releases of libunbin. Nevertheless it is possible to modify UBB files by hand. Firstly dump out the UBB file as text:

libunbin -print-ubb file.ubb >

Now you can make the required edits to (refer to in the libunbin source to find out how it works). Then reimport the edited file to UBB:

libunbin -ml

(This generates or overwrites file.ubb).

Security auditing

Libunbin is designed to generate secure stub code. However, this does not remove the need to audit structures. For example, if a binary file contains an integer field indicating that an array of a billion structures follows, libunbin will try to read it (a warning is issued when the stubs are generated, but the programmer may have ignored it).

It is important therefore to audit the binary structures and if necessary to add assertions about the valid range of numbers in the file, to avoid situations like the above.
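
As an illustration of the kind of range assertion meant here, the following C sketch validates a count field before any records are read. It is hand-written and does not show libunbin's assertion syntax or its generated code. The layout (a 32-bit little-endian count followed by fixed-size records), the limits MAX_RECORDS and RECORD_SIZE, and the function name checked_record_count are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RECORDS 1024u  /* assumed application-specific upper bound */
#define RECORD_SIZE 16u    /* assumed fixed size of each record in bytes */

/* Hypothetical check: the buffer starts with a 32-bit little-endian
 * record count followed by that many fixed-size records.
 * Returns 0 and stores the count if it is plausible, -1 otherwise. */
static int checked_record_count(const unsigned char *buf, size_t len,
                                uint32_t *count_out)
{
    assert(buf != NULL && count_out != NULL);

    if (len < 4)
        return -1;                      /* no room for the count field */

    uint32_t count = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
                     ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);

    if (count > MAX_RECORDS)
        return -1;                      /* range assertion on the field */

    /* count <= MAX_RECORDS, so the multiplication cannot overflow size_t. */
    if ((size_t)count * RECORD_SIZE > len - 4)
        return -1;                      /* records would run past the data */

    *count_out = count;
    return 0;
}
```

With these limits, a file claiming a billion records is rejected at the range check, and a file claiming more records than the remaining data can hold is rejected before any record is read.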

You should audit the UBB file after it has been imported from source and any metadata transforms applied. Once you have such a UBB file, do:

libunbin -print-ubb file.ubb | less

More about auditing to follow ...

rjones AT redhat DOT com

$Id: index.html,v 1.5 2007/10/30 17:11:05 rjones Exp $