From: Mandeep Singh Baines The verity target provides transparent integrity checking of block devices using a cryptographic digest. dm-verity is meant to be setup as part of a verified boot path. This may be anything ranging from a boot using tboot or trustedgrub to just booting from a known-good device (like a USB drive or CD). dm-verity is part of ChromeOS's verified boot path. It is used to verify the integrity of the root filesystem on boot. The root filesystem is mounted on a dm-verity partition which transparently verifies each block with a bootloader verified hash passed into the kernel at boot. Changes in V3: * Discussion over irc (Alasdair G Kergon) * Implement ioctl hook Changes in V2: * https://lkml.org/lkml/2011/11/10/85 (Steffen Klassert) * Use shash API instead of older hash API Signed-off-by: Will Drewry Signed-off-by: Elly Jones Signed-off-by: Mandeep Singh Baines Cc: Alasdair G Kergon Cc: Milan Broz Cc: Olof Johansson Cc: Steffen Klassert Cc: dm-devel@redhat.com --- Documentation/device-mapper/dm-bht.txt | 59 + Documentation/device-mapper/dm-verity.txt | 76 ++ drivers/md/Kconfig | 30 drivers/md/Makefile | 2 drivers/md/dm-bht.c | 559 +++++++++++++++ drivers/md/dm-verity.c | 1051 ++++++++++++++++++++++++++++++ drivers/md/dm-verity.h | 45 + include/linux/dm-bht.h | 166 ++++ 8 files changed, 1988 insertions(+) create mode 100644 Documentation/device-mapper/dm-bht.txt create mode 100644 Documentation/device-mapper/dm-verity.txt create mode 100644 drivers/md/dm-bht.c create mode 100644 drivers/md/dm-verity.c create mode 100644 drivers/md/dm-verity.h create mode 100644 include/linux/dm-bht.h Index: linux-3.3-rc3/Documentation/device-mapper/dm-bht.txt =================================================================== --- /dev/null +++ linux-3.3-rc3/Documentation/device-mapper/dm-bht.txt @@ -0,0 +1,59 @@ +dm-bht +====== + +dm-bht provides a block hash tree implementation. The use of dm-bht allows +for integrity checking of a given block device without reading the entire +set of blocks into memory before use. + +In particular, dm-bht supplies an interface for creating and verifying a tree +of cryptographic digests with any algorithm supported by the kernel crypto API. + +The `verity' target is the motivating example. + + +Theory of operation +=================== + +dm-bht is logically comprised of multiple nodes organized in a tree-like +structure. Each node in the tree is a cryptographic hash. If it is a leaf +node, the hash is of some block data on disk. If it is an intermediary node, +then the hash is of a number of child nodes. + +dm-bht has a given depth starting at 1 (ignoring the root node). Each level in +the tree is concretely made up of dm_bht_entry structs. Each entry in the tree +is a collection of neighboring nodes that fit in one page-sized block. The +number is determined based on PAGE_SIZE and the size of the selected +cryptographic digest algorithm. The hashes are linearly ordered in this entry +and any unaligned trailing space is ignored but included when calculating the +parent node. + +The tree looks something like: + +alg= sha256, num_blocks = 32767 + [ root ] + / . . . \ + [entry_0] [entry_1] + / . . . \ . . . \ + [entry_0_0] . . . [entry_0_127] . . . . [entry_1_127] + / ... \ / . . . \ / \ + blk_0 ... blk_127 blk_16256 blk_16383 blk_32640 . . . blk_32767 + +root is treated independently from the depth and the blocks are expected to +be hashed and supplied to the dm-bht. hash blocks that make up the entry +contents are expected to be read from disk. + +dm-bht does not handle I/O directly but instead expects the consumer to +supply callbacks. The read callback will always receive a page-align value +to pass to the block device layer to read in a hash value. + +Usage +===== + +The API provides mechanisms for reading and verifying a tree. When reading, all +required data for the hash tree should be populated for a block before +attempting a verify. This can be done by calling dm_bht_populate(). When all +data is ready, a call to dm_bht_verify_block() with the expected hash value will +perform both the direct block hash check and the hashes of the parent and +neighboring nodes where needed to ensure validity up to the root hash. Note, +dm_bht_set_root_hexdigest() should be called before any verification attempts +occur. Index: linux-3.3-rc3/Documentation/device-mapper/dm-verity.txt =================================================================== --- /dev/null +++ linux-3.3-rc3/Documentation/device-mapper/dm-verity.txt @@ -0,0 +1,76 @@ +dm-verity +========== + +Device-Mapper's "verity" target provides transparent integrity checking of +block devices using a cryptographic digest provided by the kernel crypto API. +This target is read-only. + +Parameters: payload= hashtree= alg= \ + salt= root_hexagiest= \ + [ hashstart= error_behavior= ] + + + This is the device that is going to be integrity checked. It may be + a subset of the full device as specified to dmsetup (start sector and count) + It may be specified as a path, like /dev/sdaX, or a device number, + :. + + + This is the device that that supplies the dm-bht hash data. It may be + specified similarly to the device path and may be the same device. If the + same device is used, the hash offset should be outside of the dm-verity + configured device size. + + + The cryptographic hash algorithm used for this device. This should + be the name of the algorithm, like "sha1". + + + Salt value (in hex). + + + The hexadecimal encoding of the cryptographic hash of all of the + neighboring nodes at the first level of the tree. This hash should be + trusted as there is no other authenticity beyond this point. + + + Start address of hashes (default 0). + + + 0 = return -EIO. 1 = panic. 2 = none. 3 = call notifier. + +Theory of operation +=================== + +dm-verity is meant to be setup as part of a verified boot path. This +may be anything ranging from a boot using tboot or trustedgrub to just +booting from a known-good device (like a USB drive or CD). + +When a dm-verity device is configured, it is expected that the caller +has been authenticated in some way (cryptographic signatures, etc). +After instantiation, all hashes will be verified on-demand during +disk access. If they cannot be verified up to the root node of the +tree, the root hash, then the I/O will fail. This should identify +tampering with any data on the device and the hash data. + +Cryptographic hashes are used to assert the integrity of the device on a +per-block basis. This allows for a lightweight hash computation on first read +into the page cache. Block hashes are stored linearly aligned to the nearest +block the size of a page. + +For more information on the hashing process, see dm-bht.txt. + + +Example +======= + +Setup a device; +[[ + dmsetup create vroot --table \ + "0 204800 verity payload=/dev/sda1 hashtree=/dev/sda2 alg=sha1 "\ + "root_hexdigest=9f74809a2ee7607b16fcc70d9399a4de9725a727" +]] + +A command line tool is available to compute the hash tree and return the +root hash value. + http://git.chromium.org/cgi-bin/gitweb.cgi?p=dm-verity.git;a=tree Index: linux-3.3-rc3/drivers/md/Kconfig =================================================================== --- linux-3.3-rc3.orig/drivers/md/Kconfig +++ linux-3.3-rc3/drivers/md/Kconfig @@ -370,4 +370,34 @@ config DM_FLAKEY ---help--- A target that intermittently fails I/O for debugging purposes. +config DM_BHT + tristate "Block hash tree support" + select CRYPTO + select CRYPTO_HASH + ---help--- + Include support for device-mapper devices to use a block hash + tree for managing data integrity checks in a scalable way. + + Targets that use this functionality should include it + automatically. + + If unsure, say N. + +config DM_VERITY + tristate "Verity target support" + depends on BLK_DEV_DM + select DM_BHT + select CRYPTO + select CRYPTO_HASH + ---help--- + This device-mapper target allows you to create a device that + transparently integrity checks the data on it. You'll need to + activate the digests you're going to use in the cryptoapi + configuration. + + To compile this code as a module, choose M here: the module will + be called dm-verity. + + If unsure, say N. + endif # MD Index: linux-3.3-rc3/drivers/md/Makefile =================================================================== --- linux-3.3-rc3.orig/drivers/md/Makefile +++ linux-3.3-rc3/drivers/md/Makefile @@ -39,6 +39,8 @@ obj-$(CONFIG_DM_SNAPSHOT) += dm-snapshot obj-$(CONFIG_DM_PERSISTENT_DATA) += persistent-data/ obj-$(CONFIG_DM_MIRROR) += dm-mirror.o dm-log.o dm-region-hash.o obj-$(CONFIG_DM_LOG_USERSPACE) += dm-log-userspace.o +obj-$(CONFIG_DM_BHT) += dm-bht.o +obj-$(CONFIG_DM_VERITY) += dm-verity.o obj-$(CONFIG_DM_ZERO) += dm-zero.o obj-$(CONFIG_DM_RAID) += dm-raid.o obj-$(CONFIG_DM_THIN_PROVISIONING) += dm-thin-pool.o Index: linux-3.3-rc3/drivers/md/dm-bht.c =================================================================== --- /dev/null +++ linux-3.3-rc3/drivers/md/dm-bht.c @@ -0,0 +1,559 @@ + /* + * Copyright (C) 2011 The Chromium OS Authors + * + * Device-Mapper block hash tree interface. + * See Documentation/device-mapper/dm-bht.txt for details. + * + * This file is released under the GPLv2. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define DM_MSG_PREFIX "dm bht" + + +/* + * Utilities + */ + +static u8 from_hex(u8 ch) +{ + if ((ch >= '0') && (ch <= '9')) + return ch - '0'; + if ((ch >= 'a') && (ch <= 'f')) + return ch - 'a' + 10; + if ((ch >= 'A') && (ch <= 'F')) + return ch - 'A' + 10; + return -1; +} + +/** + * dm_bht_bin_to_hex - converts a binary stream to human-readable hex + * @binary: a byte array of length @binary_len + * @hex: a byte array of length @binary_len * 2 + 1 + */ +static void dm_bht_bin_to_hex(u8 *binary, u8 *hex, unsigned int binary_len) +{ + while (binary_len-- > 0) { + sprintf((char *)hex, "%02hhx", (int)*binary); + hex += 2; + binary++; + } +} + +/** + * dm_bht_hex_to_bin - converts a hex stream to binary + * @binary: a byte array of length @binary_len + * @hex: a byte array of length @binary_len * 2 + 1 + */ +static void dm_bht_hex_to_bin(u8 *binary, const u8 *hex, + unsigned int binary_len) +{ + while (binary_len-- > 0) { + *binary = from_hex(*(hex++)); + *binary *= 16; + *binary += from_hex(*(hex++)); + binary++; + } +} + +static void dm_bht_log_mismatch(struct dm_bht *bht, u8 *given, u8 *computed) +{ + u8 given_hex[DM_BHT_MAX_DIGEST_SIZE * 2 + 1]; + u8 computed_hex[DM_BHT_MAX_DIGEST_SIZE * 2 + 1]; + + dm_bht_bin_to_hex(given, given_hex, bht->digest_size); + dm_bht_bin_to_hex(computed, computed_hex, bht->digest_size); + DMERR_LIMIT("%s != %s", given_hex, computed_hex); +} + +/** + * dm_bht_compute_hash: hashes a page of data + */ +static int dm_bht_compute_hash(struct dm_bht *bht, struct page *pg, + unsigned int offset, u8 *digest) +{ + struct shash_desc *hash_desc = bht->hash_desc[smp_processor_id()]; + void *data; + int err; + + /* Note, this is synchronous. */ + if (crypto_shash_init(hash_desc)) { + DMCRIT("failed to reinitialize crypto hash (proc:%d)", + smp_processor_id()); + return -EINVAL; + } + data = kmap_atomic(pg); + err = crypto_shash_update(hash_desc, data + offset, PAGE_SIZE); + kunmap_atomic(data); + if (err) { + DMCRIT("crypto_hash_update failed"); + return -EINVAL; + } + if (crypto_shash_update(hash_desc, bht->salt, sizeof(bht->salt))) { + DMCRIT("crypto_hash_update failed"); + return -EINVAL; + } + if (crypto_shash_final(hash_desc, digest)) { + DMCRIT("crypto_hash_final failed"); + return -EINVAL; + } + + return 0; +} + +/* + * Implementation functions + */ + +static int dm_bht_initialize_entries(struct dm_bht *bht) +{ + /* last represents the index of the last digest store in the tree. + * By walking the tree with that index, it is possible to compute the + * total number of entries at each level. + * + * Since each entry will contain up to |node_count| nodes of the tree, + * it is possible that the last index may not be at the end of a given + * entry->nodes. In that case, it is assumed the value is padded. + * + * Note, we treat both the tree root (1 hash) and the tree leaves + * independently from the bht data structures. Logically, the root is + * depth=-1 and the block layer level is depth=bht->depth + */ + unsigned int last = bht->block_count; + int depth; + + /* check that the largest level->count can't result in an int overflow + * on allocation or sector calculation. + */ + if (((last >> bht->node_count_shift) + 1) > + UINT_MAX / max((unsigned int)sizeof(struct dm_bht_entry), + (unsigned int)to_sector(bht->block_size))) { + DMCRIT("required entries %u is too large", last + 1); + return -EINVAL; + } + + /* Track the current sector location for each level so we don't have to + * compute it during traversals. + */ + bht->sectors = 0; + for (depth = 0; depth < bht->depth; ++depth) { + struct dm_bht_level *level = &bht->levels[depth]; + + level->count = dm_bht_index_at_level(bht, depth, last) + 1; + level->entries = (struct dm_bht_entry *) + kcalloc(level->count, + sizeof(struct dm_bht_entry), + GFP_KERNEL); + if (!level->entries) { + DMERR("failed to allocate entries for depth %d", depth); + return -ENOMEM; + } + level->sector = bht->sectors; + bht->sectors += level->count * to_sector(bht->block_size); + } + + return 0; +} + +/** + * dm_bht_create - prepares @bht for us + * @bht: pointer to a dm_bht_create()d bht + * @depth: tree depth without the root; including block hashes + * @block_count:the number of block hashes / tree leaves + * @alg_name: crypto hash algorithm name + * + * Returns 0 on success. + * + * Callers can offset into devices by storing the data in the io callbacks. + */ +int dm_bht_create(struct dm_bht *bht, unsigned int block_count, + unsigned int block_size, const char *alg_name) +{ + struct crypto_shash *tfm; + int size, cpu, status = 0; + + bht->block_size = block_size; + /* Verify that PAGE_SIZE >= block_size >= SECTOR_SIZE. */ + if ((block_size > PAGE_SIZE) || + (PAGE_SIZE % block_size) || + (to_sector(block_size) == 0)) + return -EINVAL; + + tfm = crypto_alloc_shash(alg_name, 0, 0); + if (IS_ERR(tfm)) { + DMERR("failed to allocate crypto hash '%s'", alg_name); + return -ENOMEM; + } + size = sizeof(struct shash_desc) + crypto_shash_descsize(tfm); + + /* Pre-allocate per-cpu crypto contexts to avoid having to + * kmalloc/kfree a context for every hash operation. + */ + for (cpu = 0; cpu < nr_cpu_ids; ++cpu) { + struct shash_desc *hash_desc = kmalloc(size, GFP_KERNEL); + + bht->hash_desc[cpu] = hash_desc; + if (!hash_desc) { + DMERR("failed to allocate crypto hash contexts"); + status = -ENOMEM; + goto bad_hash_alloc; + } + hash_desc->tfm = tfm; + hash_desc->flags = 0x0; + } + bht->digest_size = crypto_shash_digestsize(tfm); + /* We expect to be able to pack >=2 hashes into a block */ + if (block_size / bht->digest_size < 2) { + DMERR("too few hashes fit in a block"); + status = -EINVAL; + goto bad_arg; + } + + if (bht->digest_size > DM_BHT_MAX_DIGEST_SIZE) { + DMERR("DM_BHT_MAX_DIGEST_SIZE too small for chosen digest"); + status = -EINVAL; + goto bad_arg; + } + + /* Configure the tree */ + bht->block_count = block_count; + if (block_count == 0) { + DMERR("block_count must be non-zero"); + status = -EINVAL; + goto bad_arg; + } + + /* Each dm_bht_entry->nodes is one block. The node code tracks + * how many nodes fit into one entry where a node is a single + * hash (message digest). + */ + bht->node_count_shift = fls(block_size / bht->digest_size) - 1; + /* Round down to the nearest power of two. This makes indexing + * into the tree much less painful. + */ + bht->node_count = 1 << bht->node_count_shift; + + /* This is unlikely to happen, but with 64k pages, who knows. */ + if (bht->node_count > UINT_MAX / bht->digest_size) { + DMERR("node_count * hash_len exceeds UINT_MAX!"); + status = -EINVAL; + goto bad_arg; + } + + bht->depth = DIV_ROUND_UP(fls(block_count - 1), bht->node_count_shift); + + /* Ensure that we can safely shift by this value. */ + if (bht->depth * bht->node_count_shift >= sizeof(unsigned int) * 8) { + DMERR("specified depth and node_count_shift is too large"); + status = -EINVAL; + goto bad_arg; + } + + /* Allocate levels. Each level of the tree may have an arbitrary number + * of dm_bht_entry structs. Each entry contains node_count nodes. + * Each node in the tree is a cryptographic digest of either node_count + * nodes on the subsequent level or of a specific block on disk. + */ + bht->levels = (struct dm_bht_level *) + kcalloc(bht->depth, + sizeof(struct dm_bht_level), GFP_KERNEL); + if (!bht->levels) { + DMERR("failed to allocate tree levels"); + status = -ENOMEM; + goto bad_level_alloc; + } + + bht->read_cb = NULL; + + status = dm_bht_initialize_entries(bht); + if (status) + goto bad_entries_alloc; + + /* We compute depth such that there is only be 1 block at level 0. */ + BUG_ON(bht->levels[0].count != 1); + + return 0; + +bad_entries_alloc: + while (bht->depth-- > 0) + kfree(bht->levels[bht->depth].entries); + kfree(bht->levels); +bad_level_alloc: +bad_arg: +bad_hash_alloc: + for (cpu = 0; cpu < nr_cpu_ids && bht->hash_desc[cpu]; ++cpu) + kfree(bht->hash_desc[cpu]); + crypto_free_shash(tfm); + return status; +} +EXPORT_SYMBOL(dm_bht_create); + +/** + * dm_bht_read_completed + * @entry: pointer to the entry that's been loaded + * @status: I/O status. Non-zero is failure. + * MUST always be called after a read_cb completes. + */ +void dm_bht_read_completed(struct dm_bht_entry *entry, int status) +{ + if (status) { + /* TODO(wad) add retry support */ + DMCRIT("an I/O error occurred while reading entry"); + atomic_set(&entry->state, DM_BHT_ENTRY_ERROR_IO); + /* entry->nodes will be freed later */ + return; + } + BUG_ON(atomic_read(&entry->state) != DM_BHT_ENTRY_PENDING); + atomic_set(&entry->state, DM_BHT_ENTRY_READY); +} +EXPORT_SYMBOL(dm_bht_read_completed); + +/** + * dm_bht_verify_block - checks that all nodes in the path for @block are valid + * @bht: pointer to a dm_bht_create()d bht + * @block: specific block data is expected from + * @pg: page holding the block data + * @offset: offset into the page + * + * Returns 0 on success, DM_BHT_ENTRY_ERROR_MISMATCH on error. + */ +int dm_bht_verify_block(struct dm_bht *bht, unsigned int block, + struct page *pg, unsigned int offset) +{ + int state, depth = bht->depth; + u8 digest[DM_BHT_MAX_DIGEST_SIZE]; + struct dm_bht_entry *entry; + void *node; + + do { + /* Need to check that the hash of the current block is accurate + * in its parent. + */ + entry = dm_bht_get_entry(bht, depth - 1, block); + state = atomic_read(&entry->state); + /* This call is only safe if all nodes along the path + * are already populated (i.e. READY) via dm_bht_populate. + */ + BUG_ON(state < DM_BHT_ENTRY_READY); + node = dm_bht_get_node(bht, entry, depth, block); + + if (dm_bht_compute_hash(bht, pg, offset, digest) || + memcmp(digest, node, bht->digest_size)) + goto mismatch; + + /* Keep the containing block of hashes to be verified in the + * next pass. + */ + pg = virt_to_page(entry->nodes); + offset = offset_in_page(entry->nodes); + } while (--depth > 0 && state != DM_BHT_ENTRY_VERIFIED); + + if (depth == 0 && state != DM_BHT_ENTRY_VERIFIED) { + if (dm_bht_compute_hash(bht, pg, offset, digest) || + memcmp(digest, bht->root_digest, bht->digest_size)) + goto mismatch; + atomic_set(&entry->state, DM_BHT_ENTRY_VERIFIED); + } + + /* Mark path to leaf as verified. */ + for (depth++; depth < bht->depth; depth++) { + entry = dm_bht_get_entry(bht, depth, block); + /* At this point, entry can only be in VERIFIED or READY state. + * So it is safe to use atomic_set instead of atomic_cmpxchg. + */ + atomic_set(&entry->state, DM_BHT_ENTRY_VERIFIED); + } + + return 0; + +mismatch: + DMERR_LIMIT("verify_path: failed to verify hash (d=%d,bi=%u)", + depth, block); + dm_bht_log_mismatch(bht, node, digest); + return DM_BHT_ENTRY_ERROR_MISMATCH; +} +EXPORT_SYMBOL(dm_bht_verify_block); + +/** + * dm_bht_is_populated - check that entries from disk needed to verify a given + * block are all ready + * @bht: pointer to a dm_bht_create()d bht + * @block: specific block data is expected from + * + * Callers may wish to call dm_bht_is_populated() when checking an io + * for which entries were already pending. + */ +bool dm_bht_is_populated(struct dm_bht *bht, unsigned int block) +{ + int depth; + + for (depth = bht->depth - 1; depth >= 0; depth--) { + struct dm_bht_entry *entry = dm_bht_get_entry(bht, depth, + block); + if (atomic_read(&entry->state) < DM_BHT_ENTRY_READY) + return false; + } + + return true; +} +EXPORT_SYMBOL(dm_bht_is_populated); + +/** + * dm_bht_populate - reads entries from disk needed to verify a given block + * @bht: pointer to a dm_bht_create()d bht + * @ctx: context used for all read_cb calls on this request + * @block: specific block data is expected from + * + * Returns negative value on error. Returns 0 on success. + */ +int dm_bht_populate(struct dm_bht *bht, void *ctx, unsigned int block) +{ + int depth, state; + + BUG_ON(block >= bht->block_count); + + for (depth = bht->depth - 1; depth >= 0; --depth) { + unsigned int index = dm_bht_index_at_level(bht, depth, block); + struct dm_bht_level *level = &bht->levels[depth]; + struct dm_bht_entry *entry = dm_bht_get_entry(bht, depth, + block); + state = atomic_cmpxchg(&entry->state, + DM_BHT_ENTRY_UNALLOCATED, + DM_BHT_ENTRY_PENDING); + if (state == DM_BHT_ENTRY_VERIFIED) + break; + if (state <= DM_BHT_ENTRY_ERROR) + goto error_state; + if (state != DM_BHT_ENTRY_UNALLOCATED) + continue; + + /* Current entry is claimed for allocation and loading */ + entry->nodes = kmalloc(bht->block_size, GFP_NOIO); + if (!entry->nodes) + goto nomem; + + bht->read_cb(ctx, + level->sector + to_sector(index * bht->block_size), + entry->nodes, to_sector(bht->block_size), entry); + } + + return 0; + +error_state: + DMCRIT("block %u at depth %d is in an error state", block, depth); + return -EPERM; + +nomem: + DMCRIT("failed to allocate memory for entry->nodes"); + return -ENOMEM; +} +EXPORT_SYMBOL(dm_bht_populate); + +/** + * dm_bht_destroy - cleans up all memory used by @bht + * @bht: pointer to a dm_bht_create()d bht + */ +void dm_bht_destroy(struct dm_bht *bht) +{ + int depth, cpu; + + for (depth = 0; depth < bht->depth; depth++) { + struct dm_bht_entry *entry = bht->levels[depth].entries; + struct dm_bht_entry *entry_end = entry + + bht->levels[depth].count; + for (; entry < entry_end; ++entry) + kfree(entry->nodes); + kfree(bht->levels[depth].entries); + } + kfree(bht->levels); + crypto_free_shash((bht->hash_desc[0])->tfm); + for (cpu = 0; cpu < nr_cpu_ids; ++cpu) + kfree(bht->hash_desc[cpu]); +} +EXPORT_SYMBOL(dm_bht_destroy); + +/* + * Accessors + */ + +/** + * dm_bht_set_root_hexdigest - sets an unverified root digest hash from hex + * @bht: pointer to a dm_bht_create()d bht + * @hexdigest: array of u8s containing the new digest in binary + * Returns non-zero on error. hexdigest should be NUL terminated. + */ +int dm_bht_set_root_hexdigest(struct dm_bht *bht, const u8 *hexdigest) +{ + /* Make sure we have at least the bytes expected */ + if (strnlen((char *)hexdigest, bht->digest_size * 2) != + bht->digest_size * 2) { + DMERR("root digest length does not match hash algorithm"); + return -1; + } + dm_bht_hex_to_bin(bht->root_digest, hexdigest, bht->digest_size); + return 0; +} +EXPORT_SYMBOL(dm_bht_set_root_hexdigest); + +/** + * dm_bht_root_hexdigest - returns root digest in hex + * @bht: pointer to a dm_bht_create()d bht + * @hexdigest: u8 array of size @available + * @available: must be bht->digest_size * 2 + 1 + */ +int dm_bht_root_hexdigest(struct dm_bht *bht, u8 *hexdigest, int available) +{ + if (available < 0 || + ((unsigned int) available) < bht->digest_size * 2 + 1) { + DMERR("hexdigest has too few bytes available"); + return -EINVAL; + } + dm_bht_bin_to_hex(bht->root_digest, hexdigest, bht->digest_size); + return 0; +} +EXPORT_SYMBOL(dm_bht_root_hexdigest); + +/** + * dm_bht_set_salt - sets the salt used, in hex + * @bht: pointer to a dm_bht_create()d bht + * @hexsalt: salt string, as hex; will be zero-padded or truncated to + * DM_BHT_SALT_SIZE * 2 hex digits. + */ +void dm_bht_set_salt(struct dm_bht *bht, const char *hexsalt) +{ + size_t saltlen = min(strlen(hexsalt) / 2, sizeof(bht->salt)); + + memset(bht->salt, 0, sizeof(bht->salt)); + dm_bht_hex_to_bin(bht->salt, (const u8 *)hexsalt, saltlen); +} +EXPORT_SYMBOL(dm_bht_set_salt); + +/** + * dm_bht_salt - returns the salt used, in hex + * @bht: pointer to a dm_bht_create()d bht + * @hexsalt: buffer to put salt into, of length DM_BHT_SALT_SIZE * 2 + 1. + */ +int dm_bht_salt(struct dm_bht *bht, char *hexsalt) +{ + dm_bht_bin_to_hex(bht->salt, (u8 *)hexsalt, sizeof(bht->salt)); + return 0; +} +EXPORT_SYMBOL(dm_bht_salt); + Index: linux-3.3-rc3/drivers/md/dm-verity.c =================================================================== --- /dev/null +++ linux-3.3-rc3/drivers/md/dm-verity.c @@ -0,0 +1,1051 @@ +/* + * Originally based on dm-crypt.c, + * Copyright (C) 2003 Christophe Saout + * Copyright (C) 2004 Clemens Fruhwirth + * Copyright (C) 2006-2008 Red Hat, Inc. All rights reserved. + * Copyright (C) 2011 The Chromium OS Authors + * All Rights Reserved. + * + * This file is released under the GPLv2. + * + * Implements a verifying transparent block device. + * See Documentation/device-mapper/dm-verity.txt + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "dm-verity.h" + +#define DM_MSG_PREFIX "verity" + +/* Supports up to 512-bit digests */ +#define VERITY_MAX_DIGEST_SIZE 64 + +/* TODO(wad) make both of these report the error line/file to a + * verity_bug function. + */ +#define VERITY_BUG(msg...) BUG() +#define VERITY_BUG_ON(cond, msg...) BUG_ON(cond) + +/* Helper for printing sector_t */ +#define ULL(x) ((unsigned long long)(x)) + +#define MIN_IOS 32 +#define MIN_BIOS (MIN_IOS * 2) +#define VERITY_DEFAULT_BLOCK_SIZE 4096 + +/* Provide a lightweight means of specifying the global default for + * error behavior: eio, reboot, or none + * Legacy support for 0 = eio, 1 = reboot/panic, 2 = none, 3 = notify. + * This is matched to the enum in dm-verity.h. + */ +static const char * const allowed_error_behaviors[] = { "eio", "panic", "none", + "notify", NULL }; +static char *error_behavior = "eio"; +module_param(error_behavior, charp, 0644); +MODULE_PARM_DESC(error_behavior, "Behavior on error " + "(eio, panic, none, notify)"); + +/* Controls whether verity_get_device will wait forever for a device. */ +static int dev_wait; +module_param(dev_wait, bool, 0444); +MODULE_PARM_DESC(dev_wait, "Wait forever for a backing device"); + +/* per-requested-bio private data */ +enum verity_io_flags { + VERITY_IOFLAGS_CLONED = 0x1, /* original bio has been cloned */ +}; + +struct dm_verity_io { + struct dm_target *target; + struct bio *bio; + struct delayed_work work; + unsigned int flags; + + int error; + atomic_t pending; + + u64 block; /* aligned block index */ + u64 count; /* aligned count in blocks */ +}; + +struct verity_config { + struct dm_dev *dev; + sector_t start; + sector_t size; + + struct dm_dev *hash_dev; + sector_t hash_start; + + struct dm_bht bht; + + /* Pool required for io contexts */ + mempool_t *io_pool; + /* Pool and bios required for making sure that backing device reads are + * in PAGE_SIZE increments. + */ + struct bio_set *bs; + + char hash_alg[CRYPTO_MAX_ALG_NAME]; + + int error_behavior; +}; + +static struct kmem_cache *_verity_io_pool; +static struct workqueue_struct *kveritydq, *kverityd_ioq; + +static void kverityd_verify(struct work_struct *work); +static void kverityd_io(struct work_struct *work); +static void kverityd_io_bht_populate(struct dm_verity_io *io); +static void kverityd_io_bht_populate_end(struct bio *, int error); + +static BLOCKING_NOTIFIER_HEAD(verity_error_notifier); + +/* + * Exported interfaces + */ + +int dm_verity_register_error_notifier(struct notifier_block *nb) +{ + return blocking_notifier_chain_register(&verity_error_notifier, nb); +} +EXPORT_SYMBOL_GPL(dm_verity_register_error_notifier); + +int dm_verity_unregister_error_notifier(struct notifier_block *nb) +{ + return blocking_notifier_chain_unregister(&verity_error_notifier, nb); +} +EXPORT_SYMBOL_GPL(dm_verity_unregister_error_notifier); + +/* + * Allocation and utility functions + */ + +static void kverityd_src_io_read_end(struct bio *clone, int error); + +/* Shared destructor for all internal bios */ +static void dm_verity_bio_destructor(struct bio *bio) +{ + struct dm_verity_io *io = bio->bi_private; + struct verity_config *vc = io->target->private; + bio_free(bio, vc->bs); +} + +static struct bio *verity_alloc_bioset(struct verity_config *vc, gfp_t gfp_mask, + int nr_iovecs) +{ + return bio_alloc_bioset(gfp_mask, nr_iovecs, vc->bs); +} + +static struct dm_verity_io *verity_io_alloc(struct dm_target *ti, + struct bio *bio) +{ + struct verity_config *vc = ti->private; + sector_t sector = bio->bi_sector - ti->begin; + struct dm_verity_io *io; + + io = mempool_alloc(vc->io_pool, GFP_NOIO); + if (unlikely(!io)) + return NULL; + io->flags = 0; + io->target = ti; + io->bio = bio; + io->error = 0; + + /* Adjust the sector by the virtual starting sector */ + io->block = to_bytes(sector) / vc->bht.block_size; + io->count = bio->bi_size / vc->bht.block_size; + + atomic_set(&io->pending, 0); + + return io; +} + +static struct bio *verity_bio_clone(struct dm_verity_io *io) +{ + struct verity_config *vc = io->target->private; + struct bio *bio = io->bio; + struct bio *clone = verity_alloc_bioset(vc, GFP_NOIO, bio->bi_max_vecs); + + if (!clone) + return NULL; + + __bio_clone(clone, bio); + clone->bi_private = io; + clone->bi_end_io = kverityd_src_io_read_end; + clone->bi_bdev = vc->dev->bdev; + clone->bi_sector += vc->start - io->target->begin; + clone->bi_destructor = dm_verity_bio_destructor; + + return clone; +} + +/* If the request is not successful, this handler takes action. + * TODO make this call a registered handler. + */ +static void verity_error(struct verity_config *vc, struct dm_verity_io *io, + int error) +{ + const char *message; + int error_mode = DM_VERITY_ERROR_BEHAVIOR_PANIC; + dev_t devt = 0; + u64 block = ~0; + int transient = 1; + struct dm_verity_error_state error_state; + + if (vc) { + devt = vc->dev->bdev->bd_dev; + error_mode = vc->error_behavior; + } + + if (io) { + io->error = -EIO; + block = io->block; + } + + switch (error) { + case -ENOMEM: + message = "out of memory"; + break; + case -EBUSY: + message = "pending data seen during verify"; + break; + case -EFAULT: + message = "crypto operation failure"; + break; + case -EACCES: + message = "integrity failure"; + /* Image is bad. */ + transient = 0; + break; + case -EPERM: + message = "hash tree population failure"; + /* Should be dm-bht specific errors */ + transient = 0; + break; + case -EINVAL: + message = "unexpected missing/invalid data"; + /* The device was configured incorrectly - fallback. */ + transient = 0; + break; + default: + /* Other errors can be passed through as IO errors */ + message = "unknown or I/O error"; + return; + } + + DMERR_LIMIT("verification failure occurred: %s", message); + + if (error_mode == DM_VERITY_ERROR_BEHAVIOR_NOTIFY) { + error_state.code = error; + error_state.transient = transient; + error_state.block = block; + error_state.message = message; + error_state.dev_start = vc->start; + error_state.dev_len = vc->size; + error_state.dev = vc->dev->bdev; + error_state.hash_dev_start = vc->hash_start; + error_state.hash_dev_len = vc->bht.sectors; + error_state.hash_dev = vc->hash_dev->bdev; + + /* Set default fallthrough behavior. */ + error_state.behavior = DM_VERITY_ERROR_BEHAVIOR_PANIC; + error_mode = DM_VERITY_ERROR_BEHAVIOR_PANIC; + + if (!blocking_notifier_call_chain( + &verity_error_notifier, transient, &error_state)) { + error_mode = error_state.behavior; + } + } + + switch (error_mode) { + case DM_VERITY_ERROR_BEHAVIOR_EIO: + break; + case DM_VERITY_ERROR_BEHAVIOR_NONE: + if (error != -EIO && io) + io->error = 0; + break; + default: + goto do_panic; + } + return; + +do_panic: + panic("dm-verity failure: " + "device:%u:%u error:%d block:%llu message:%s", + MAJOR(devt), MINOR(devt), error, ULL(block), message); +} + +/** + * verity_parse_error_behavior - parse a behavior charp to the enum + * @behavior: NUL-terminated char array + * + * Checks if the behavior is valid either as text or as an index digit + * and returns the proper enum value or -1 on error. + */ +static int verity_parse_error_behavior(const char *behavior) +{ + const char * const *allowed = allowed_error_behaviors; + char index = '0'; + + for (; *allowed; allowed++, index++) + if (!strcmp(*allowed, behavior) || behavior[0] == index) + break; + + if (!*allowed) + return -1; + + /* Convert to the integer index matching the enum. */ + return allowed - allowed_error_behaviors; +} + +/* + * Reverse flow of requests into the device. + * + * (Start at the bottom with verity_map and work your way upward). + */ + +static void verity_inc_pending(struct dm_verity_io *io); + +static void verity_return_bio_to_caller(struct dm_verity_io *io) +{ + struct verity_config *vc = io->target->private; + + if (io->error) + verity_error(vc, io, io->error); + + bio_endio(io->bio, io->error); + mempool_free(io, vc->io_pool); +} + +/* Check for any missing bht hashes. */ +static bool verity_is_bht_populated(struct dm_verity_io *io) +{ + struct verity_config *vc = io->target->private; + u64 block; + + for (block = io->block; block < io->block + io->count; ++block) + if (!dm_bht_is_populated(&vc->bht, block)) + return false; + + return true; +} + +/* verity_dec_pending manages the lifetime of all dm_verity_io structs. + * Non-bug error handling is centralized through this interface and + * all passage from workqueue to workqueue. + */ +static void verity_dec_pending(struct dm_verity_io *io) +{ + if (!atomic_dec_and_test(&io->pending)) + goto done; + + if (unlikely(io->error)) + goto io_error; + + /* I/Os that were pending may now be ready */ + if (verity_is_bht_populated(io)) { + INIT_DELAYED_WORK(&io->work, kverityd_verify); + queue_delayed_work(kveritydq, &io->work, 0); + } else { + INIT_DELAYED_WORK(&io->work, kverityd_io); + queue_delayed_work(kverityd_ioq, &io->work, HZ/10); + } + +done: + return; + +io_error: + verity_return_bio_to_caller(io); +} + +/* Walks the data set and computes the hash of the data read from the + * untrusted source device. The computed hash is then passed to dm-bht + * for verification. + */ +static int verity_verify(struct verity_config *vc, + struct dm_verity_io *io) +{ + unsigned int block_size = vc->bht.block_size; + struct bio *bio = io->bio; + u64 block = io->block; + unsigned int idx; + int r; + + for (idx = bio->bi_idx; idx < bio->bi_vcnt; idx++) { + struct bio_vec *bv = bio_iovec_idx(bio, idx); + unsigned int offset = bv->bv_offset; + unsigned int len = bv->bv_len; + + VERITY_BUG_ON(offset % block_size); + VERITY_BUG_ON(len % block_size); + + while (len) { + r = dm_bht_verify_block(&vc->bht, block, + bv->bv_page, offset); + if (r) + goto bad_return; + + offset += block_size; + len -= block_size; + block++; + cond_resched(); + } + } + + return 0; + +bad_return: + /* dm_bht functions aren't expected to return errno friendly + * values. They are converted here for uniformity. + */ + if (r > 0) { + DMERR("Pending data for block %llu seen at verify", ULL(block)); + r = -EBUSY; + } else { + DMERR_LIMIT("Block hash does not match!"); + r = -EACCES; + } + return r; +} + +/* Services the verify workqueue */ +static void kverityd_verify(struct work_struct *work) +{ + struct delayed_work *dwork = container_of(work, struct delayed_work, + work); + struct dm_verity_io *io = container_of(dwork, struct dm_verity_io, + work); + struct verity_config *vc = io->target->private; + + io->error = verity_verify(vc, io); + + /* Free up the bio and tag with the return value */ + verity_return_bio_to_caller(io); +} + +/* Asynchronously called upon the completion of dm-bht I/O. The status + * of the operation is passed back to dm-bht and the next steps are + * decided by verity_dec_pending. + */ +static void kverityd_io_bht_populate_end(struct bio *bio, int error) +{ + struct dm_bht_entry *entry = (struct dm_bht_entry *) bio->bi_private; + struct dm_verity_io *io = (struct dm_verity_io *) entry->io_context; + + /* Tell the tree to atomically update now that we've populated + * the given entry. + */ + dm_bht_read_completed(entry, error); + + /* Clean up for reuse when reading data to be checked */ + bio->bi_vcnt = 0; + bio->bi_io_vec->bv_offset = 0; + bio->bi_io_vec->bv_len = 0; + bio->bi_io_vec->bv_page = NULL; + /* Restore the private data to I/O so the destructor can be shared. */ + bio->bi_private = (void *) io; + bio_put(bio); + + /* We bail but assume the tree has been marked bad. */ + if (unlikely(error)) { + DMERR("Failed to read for sector %llu (%u)", + ULL(io->bio->bi_sector), io->bio->bi_size); + io->error = error; + /* Pass through the error to verity_dec_pending below */ + } + /* When pending = 0, it will transition to reading real data */ + verity_dec_pending(io); +} + +/* Called by dm-bht (via dm_bht_populate), this function provides + * the message digests to dm-bht that are stored on disk. + */ +static int kverityd_bht_read_callback(void *ctx, sector_t start, u8 *dst, + sector_t count, + struct dm_bht_entry *entry) +{ + struct dm_verity_io *io = ctx; /* I/O for this batch */ + struct verity_config *vc; + struct bio *bio; + + vc = io->target->private; + + /* The I/O context is nested inside the entry so that we don't need one + * io context per page read. + */ + entry->io_context = ctx; + + /* We should only get page size requests at present. */ + verity_inc_pending(io); + bio = verity_alloc_bioset(vc, GFP_NOIO, 1); + if (unlikely(!bio)) { + DMCRIT("Out of memory at bio_alloc_bioset"); + dm_bht_read_completed(entry, -ENOMEM); + return -ENOMEM; + } + bio->bi_private = (void *) entry; + bio->bi_idx = 0; + bio->bi_size = vc->bht.block_size; + bio->bi_sector = vc->hash_start + start; + bio->bi_bdev = vc->hash_dev->bdev; + bio->bi_end_io = kverityd_io_bht_populate_end; + bio->bi_rw = REQ_META; + /* Only need to free the bio since the page is managed by bht */ + bio->bi_destructor = dm_verity_bio_destructor; + bio->bi_vcnt = 1; + bio->bi_io_vec->bv_offset = offset_in_page(dst); + bio->bi_io_vec->bv_len = to_bytes(count); + /* dst is guaranteed to be a page_pool allocation */ + bio->bi_io_vec->bv_page = virt_to_page(dst); + /* Track that this I/O is in use. There should be no risk of the io + * being removed prior since this is called synchronously. + */ + generic_make_request(bio); + return 0; +} + +/* Submits an io request for each missing block of block hashes. + * The last one to return will then enqueue this on the io workqueue. + */ +static void kverityd_io_bht_populate(struct dm_verity_io *io) +{ + struct verity_config *vc = io->target->private; + u64 block; + + for (block = io->block; block < io->block + io->count; ++block) { + int ret = dm_bht_populate(&vc->bht, io, block); + + if (ret < 0) { + /* verity_dec_pending will handle the error case. */ + io->error = ret; + break; + } + } +} + +/* Asynchronously called upon the completion of I/O issued + * from kverityd_src_io_read. verity_dec_pending() acts as + * the scheduler/flow manager. + */ +static void kverityd_src_io_read_end(struct bio *clone, int error) +{ + struct dm_verity_io *io = clone->bi_private; + + if (unlikely(!bio_flagged(clone, BIO_UPTODATE) && !error)) + error = -EIO; + + if (unlikely(error)) { + DMERR("Error occurred: %d (%llu, %u)", + error, ULL(clone->bi_sector), clone->bi_size); + io->error = error; + } + + /* Release the clone which just avoids the block layer from + * leaving offsets, etc in unexpected states. + */ + bio_put(clone); + + verity_dec_pending(io); +} + +/* If not yet underway, an I/O request will be issued to the vc->dev + * device for the data needed. It is cloned to avoid unexpected changes + * to the original bio struct. + */ +static void kverityd_src_io_read(struct dm_verity_io *io) +{ + struct bio *clone; + + /* Check if the read is already issued. */ + if (io->flags & VERITY_IOFLAGS_CLONED) + return; + + io->flags |= VERITY_IOFLAGS_CLONED; + + /* Clone the bio. The block layer may modify the bvec array. */ + clone = verity_bio_clone(io); + if (unlikely(!clone)) { + io->error = -ENOMEM; + return; + } + + verity_inc_pending(io); + + generic_make_request(clone); +} + +/* kverityd_io services the I/O workqueue. For each pass through + * the I/O workqueue, a call to populate both the origin drive + * data and the hash tree data is made. + */ +static void kverityd_io(struct work_struct *work) +{ + struct delayed_work *dwork = container_of(work, struct delayed_work, + work); + struct dm_verity_io *io = container_of(dwork, struct dm_verity_io, + work); + + /* Issue requests asynchronously. */ + verity_inc_pending(io); + kverityd_src_io_read(io); + kverityd_io_bht_populate(io); + verity_dec_pending(io); +} + +/* Paired with verity_dec_pending, the pending value in the io dictate the + * lifetime of a request and when it is ready to be processed on the + * workqueues. + */ +static void verity_inc_pending(struct dm_verity_io *io) +{ + atomic_inc(&io->pending); +} + +/* Block-level requests start here. */ +static int verity_map(struct dm_target *ti, struct bio *bio, + union map_info *map_context) +{ + struct dm_verity_io *io; + struct verity_config *vc; + struct request_queue *r_queue; + + if (unlikely(!ti)) { + DMERR("dm_target was NULL"); + return -EIO; + } + + vc = ti->private; + r_queue = bdev_get_queue(vc->dev->bdev); + + if (bio_data_dir(bio) == WRITE) { + /* If we silently drop writes, then the VFS layer will cache + * the write and persist it in memory. While it doesn't change + * the underlying storage, it still may be contrary to the + * behavior expected by a verified, read-only device. + */ + DMWARN_LIMIT("write request received. rejecting with -EIO."); + verity_error(vc, NULL, -EIO); + return -EIO; + } else { + /* Queue up the request to be verified */ + io = verity_io_alloc(ti, bio); + if (!io) { + DMERR_LIMIT("Failed to allocate and init IO data"); + return DM_MAPIO_REQUEUE; + } + INIT_DELAYED_WORK(&io->work, kverityd_io); + queue_delayed_work(kverityd_ioq, &io->work, 0); + } + + return DM_MAPIO_SUBMITTED; +} + +static void splitarg(char *arg, char **key, char **val) +{ + *key = strsep(&arg, "="); + *val = strsep(&arg, ""); +} + +/* + * Non-block interfaces and device-mapper specific code + */ + +/** + * verity_ctr - Construct a verified mapping + * @ti: Target being created + * @argc: Number of elements in argv + * @argv: Vector of key-value pairs (see below). + * + * Accepts the following keys: + * @payload: hashed device + * @hashtree: device hashtree is stored on + * @hashstart: start address of hashes (default 0) + * @block_size: size of a hash block + * @alg: hash algorithm + * @root_hexdigest: toplevel hash of the tree + * @error_behavior: what to do when verification fails [optional] + * @salt: salt, in hex [optional] + * + * E.g., + * payload=/dev/sda2 hashtree=/dev/sda3 alg=sha256 + * root_hexdigest=f08aa4a3695290c569eb1b0ac032ae1040150afb527abbeb0a3da33d82fb2c6e + * + * TODO(wad): + * - Boot time addition + * - Track block verification to free block_hashes if memory use is a concern + * Testing needed: + * - Regular slub_debug tracing (on checkins) + * - Improper block hash padding + * - Improper bundle padding + * - Improper hash layout + * - Missing padding at end of device + * - Improperly sized underlying devices + * - Out of memory conditions (make sure this isn't too flaky under high load!) + * - Incorrect superhash + * - Incorrect block hashes + * - Incorrect bundle hashes + * - Boot-up read speed; sustained read speeds + */ +static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv) +{ + struct verity_config *vc = NULL; + int ret = 0; + sector_t blocks; + unsigned int block_size = VERITY_DEFAULT_BLOCK_SIZE; + const char *payload = NULL; + const char *hashtree = NULL; + unsigned long hashstart = 0; + const char *alg = NULL; + const char *root_hexdigest = NULL; + const char *dev_error_behavior = error_behavior; + const char *hexsalt = ""; + int i; + + for (i = 0; i < argc; ++i) { + char *key, *val; + DMWARN("Argument %d: '%s'", i, argv[i]); + splitarg(argv[i], &key, &val); + if (!key) { + DMWARN("Bad argument %d: missing key?", i); + break; + } + if (!val) { + DMWARN("Bad argument %d='%s': missing value", i, key); + break; + } + + if (!strcmp(key, "alg")) { + alg = val; + } else if (!strcmp(key, "payload")) { + payload = val; + } else if (!strcmp(key, "hashtree")) { + hashtree = val; + } else if (!strcmp(key, "root_hexdigest")) { + root_hexdigest = val; + } else if (!strcmp(key, "hashstart")) { + if (strict_strtoul(val, 10, &hashstart)) { + ti->error = "Invalid hashstart"; + return -EINVAL; + } + } else if (!strcmp(key, "block_size")) { + unsigned long tmp; + if (strict_strtoul(val, 10, &tmp) || + (tmp > UINT_MAX)) { + ti->error = "Invalid block_size"; + return -EINVAL; + } + block_size = (unsigned int)tmp; + } else if (!strcmp(key, "error_behavior")) { + dev_error_behavior = val; + } else if (!strcmp(key, "salt")) { + hexsalt = val; + } else if (!strcmp(key, "error_behavior")) { + dev_error_behavior = val; + } + } + +#define NEEDARG(n) \ + if (!(n)) { \ + ti->error = "Missing argument: " #n; \ + return -EINVAL; \ + } + + NEEDARG(alg); + NEEDARG(payload); + NEEDARG(hashtree); + NEEDARG(root_hexdigest); + +#undef NEEDARG + + /* The device mapper device should be setup read-only */ + if ((dm_table_get_mode(ti->table) & ~FMODE_READ) != 0) { + ti->error = "Must be created readonly."; + return -EINVAL; + } + + vc = kzalloc(sizeof(*vc), GFP_KERNEL); + if (!vc) { + /* TODO(wad) if this is called from the setup helper, then we + * catch these errors and do a CrOS specific thing. if not, we + * need to have this call the error handler. + */ + return -EINVAL; + } + + /* Calculate the blocks from the given device size */ + vc->size = ti->len; + blocks = to_bytes(vc->size) / block_size; + if (dm_bht_create(&vc->bht, blocks, block_size, alg)) { + DMERR("failed to create required bht"); + goto bad_bht; + } + if (dm_bht_set_root_hexdigest(&vc->bht, root_hexdigest)) { + DMERR("root hexdigest error"); + goto bad_root_hexdigest; + } + dm_bht_set_salt(&vc->bht, hexsalt); + vc->bht.read_cb = kverityd_bht_read_callback; + + /* payload: device to verify */ + vc->start = 0; /* TODO: should this support a starting offset? */ + /* We only ever grab the device in read-only mode. */ + ret = dm_get_device(ti, payload, + dm_table_get_mode(ti->table), &vc->dev); + if (ret) { + DMERR("Failed to acquire device '%s': %d", payload, ret); + ti->error = "Device lookup failed"; + goto bad_verity_dev; + } + + if ((to_bytes(vc->start) % block_size) || + (to_bytes(vc->size) % block_size)) { + ti->error = "Device must be block_size divisble/aligned"; + goto bad_hash_start; + } + + vc->hash_start = (sector_t)hashstart; + + /* hashtree: device with hashes. + * Note, payload == hashtree is okay as long as the size of + * ti->len passed to device mapper does not include + * the hashes. + */ + if (dm_get_device(ti, hashtree, + dm_table_get_mode(ti->table), &vc->hash_dev)) { + ti->error = "Hash device lookup failed"; + goto bad_hash_dev; + } + + /* arg4: cryptographic digest algorithm */ + if (snprintf(vc->hash_alg, CRYPTO_MAX_ALG_NAME, "%s", alg) >= + CRYPTO_MAX_ALG_NAME) { + ti->error = "Hash algorithm name is too long"; + goto bad_hash; + } + + /* override with optional device-specific error behavior */ + vc->error_behavior = verity_parse_error_behavior(dev_error_behavior); + if (vc->error_behavior == -1) { + ti->error = "Bad error_behavior supplied"; + goto bad_err_behavior; + } + + /* TODO: Maybe issues a request on the io queue for block 0? */ + + /* Argument processing is done, setup operational data */ + /* Pool for dm_verity_io objects */ + vc->io_pool = mempool_create_slab_pool(MIN_IOS, _verity_io_pool); + if (!vc->io_pool) { + ti->error = "Cannot allocate verity io mempool"; + goto bad_slab_pool; + } + + /* Allocate the bioset used for request padding */ + /* TODO(wad) allocate a separate bioset for the first verify maybe */ + vc->bs = bioset_create(MIN_BIOS, 0); + if (!vc->bs) { + ti->error = "Cannot allocate verity bioset"; + goto bad_bs; + } + + ti->num_flush_requests = 1; + ti->private = vc; + + /* TODO(wad) add device and hash device names */ + { + char hashdev[BDEVNAME_SIZE], vdev[BDEVNAME_SIZE]; + bdevname(vc->hash_dev->bdev, hashdev); + bdevname(vc->dev->bdev, vdev); + DMINFO("dev:%s hash:%s [sectors:%llu blocks:%llu]", vdev, + hashdev, ULL(vc->bht.sectors), ULL(blocks)); + } + return 0; + +bad_bs: + mempool_destroy(vc->io_pool); +bad_slab_pool: +bad_err_behavior: +bad_hash: + dm_put_device(ti, vc->hash_dev); +bad_hash_dev: +bad_hash_start: + dm_put_device(ti, vc->dev); +bad_bht: +bad_root_hexdigest: +bad_verity_dev: + kfree(vc); /* hash is not secret so no need to zero */ + return -EINVAL; +} + +static void verity_dtr(struct dm_target *ti) +{ + struct verity_config *vc = (struct verity_config *) ti->private; + + bioset_free(vc->bs); + mempool_destroy(vc->io_pool); + dm_bht_destroy(&vc->bht); + dm_put_device(ti, vc->hash_dev); + dm_put_device(ti, vc->dev); + kfree(vc); +} + +static int verity_ioctl(struct dm_target *ti, unsigned int cmd, + unsigned long arg) +{ + struct verity_config *vc = (struct verity_config *) ti->private; + return __blkdev_driver_ioctl(vc->dev->bdev, vc->dev->mode, cmd, arg); +} + +static int verity_status(struct dm_target *ti, status_type_t type, + char *result, unsigned int maxlen) +{ + struct verity_config *vc = (struct verity_config *) ti->private; + unsigned int sz = 0; + char hashdev[BDEVNAME_SIZE], vdev[BDEVNAME_SIZE]; + u8 hexdigest[VERITY_MAX_DIGEST_SIZE * 2 + 1] = { 0 }; + + dm_bht_root_hexdigest(&vc->bht, hexdigest, sizeof(hexdigest)); + + switch (type) { + case STATUSTYPE_INFO: + break; + case STATUSTYPE_TABLE: + bdevname(vc->hash_dev->bdev, hashdev); + bdevname(vc->dev->bdev, vdev); + DMEMIT("/dev/%s /dev/%s %llu %u %s %s", + vdev, + hashdev, + ULL(vc->hash_start), + vc->bht.depth, + vc->hash_alg, + hexdigest); + break; + } + return 0; +} + +static int verity_merge(struct dm_target *ti, struct bvec_merge_data *bvm, + struct bio_vec *biovec, int max_size) +{ + struct verity_config *vc = ti->private; + struct request_queue *q = bdev_get_queue(vc->dev->bdev); + + if (!q->merge_bvec_fn) + return max_size; + + bvm->bi_bdev = vc->dev->bdev; + bvm->bi_sector = vc->start + bvm->bi_sector - ti->begin; + + /* Optionally, this could just return 0 to stick to single pages. */ + return min(max_size, q->merge_bvec_fn(q, bvm, biovec)); +} + +static int verity_iterate_devices(struct dm_target *ti, + iterate_devices_callout_fn fn, void *data) +{ + struct verity_config *vc = ti->private; + + return fn(ti, vc->dev, vc->start, ti->len, data); +} + +static void verity_io_hints(struct dm_target *ti, + struct queue_limits *limits) +{ + struct verity_config *vc = ti->private; + unsigned int block_size = vc->bht.block_size; + + limits->logical_block_size = block_size; + limits->physical_block_size = block_size; + blk_limits_io_min(limits, block_size); +} + +static struct target_type verity_target = { + .name = "verity", + .version = {0, 1, 0}, + .module = THIS_MODULE, + .ctr = verity_ctr, + .dtr = verity_dtr, + .ioctl = verity_ioctl, + .map = verity_map, + .merge = verity_merge, + .status = verity_status, + .iterate_devices = verity_iterate_devices, + .io_hints = verity_io_hints, +}; + +#define VERITY_WQ_FLAGS (WQ_CPU_INTENSIVE|WQ_HIGHPRI) + +static int __init dm_verity_init(void) +{ + int r = -ENOMEM; + + _verity_io_pool = KMEM_CACHE(dm_verity_io, 0); + if (!_verity_io_pool) { + DMERR("failed to allocate pool dm_verity_io"); + goto bad_io_pool; + } + + kverityd_ioq = alloc_workqueue("kverityd_io", VERITY_WQ_FLAGS, 1); + if (!kverityd_ioq) { + DMERR("failed to create workqueue kverityd_ioq"); + goto bad_io_queue; + } + + kveritydq = alloc_workqueue("kverityd", VERITY_WQ_FLAGS, 1); + if (!kveritydq) { + DMERR("failed to create workqueue kveritydq"); + goto bad_verify_queue; + } + + r = dm_register_target(&verity_target); + if (r < 0) { + DMERR("register failed %d", r); + goto register_failed; + } + + DMINFO("version %u.%u.%u loaded", verity_target.version[0], + verity_target.version[1], verity_target.version[2]); + + return r; + +register_failed: + destroy_workqueue(kveritydq); +bad_verify_queue: + destroy_workqueue(kverityd_ioq); +bad_io_queue: + kmem_cache_destroy(_verity_io_pool); +bad_io_pool: + return r; +} + +static void __exit dm_verity_exit(void) +{ + destroy_workqueue(kveritydq); + destroy_workqueue(kverityd_ioq); + + dm_unregister_target(&verity_target); + kmem_cache_destroy(_verity_io_pool); +} + +module_init(dm_verity_init); +module_exit(dm_verity_exit); + +MODULE_AUTHOR("The Chromium OS Authors "); +MODULE_DESCRIPTION(DM_NAME " target for transparent disk integrity checking"); +MODULE_LICENSE("GPL"); Index: linux-3.3-rc3/drivers/md/dm-verity.h =================================================================== --- /dev/null +++ linux-3.3-rc3/drivers/md/dm-verity.h @@ -0,0 +1,45 @@ +/* + * Copyright (C) 2011 The Chromium OS Authors + * All Rights Reserved. + * + * This file is released under the GPLv2. + * + * Provide error types for use when creating a custom error handler. + * See Documentation/device-mapper/dm-verity.txt + */ +#ifndef DM_VERITY_H +#define DM_VERITY_H + +#include + +struct dm_verity_error_state { + int code; + int transient; /* Likely to not happen after a reboot */ + u64 block; + const char *message; + + sector_t dev_start; + sector_t dev_len; + struct block_device *dev; + + sector_t hash_dev_start; + sector_t hash_dev_len; + struct block_device *hash_dev; + + /* Final behavior after all notifications are completed. */ + int behavior; +}; + +/* This enum must be matched to allowed_error_behaviors in dm-verity.c */ +enum dm_verity_error_behavior { + DM_VERITY_ERROR_BEHAVIOR_EIO = 0, + DM_VERITY_ERROR_BEHAVIOR_PANIC, + DM_VERITY_ERROR_BEHAVIOR_NONE, + DM_VERITY_ERROR_BEHAVIOR_NOTIFY +}; + + +int dm_verity_register_error_notifier(struct notifier_block *nb); +int dm_verity_unregister_error_notifier(struct notifier_block *nb); + +#endif /* DM_VERITY_H */ Index: linux-3.3-rc3/include/linux/dm-bht.h =================================================================== --- /dev/null +++ linux-3.3-rc3/include/linux/dm-bht.h @@ -0,0 +1,166 @@ +/* + * Copyright (C) 2011 The Chromium OS Authors + * + * Device-Mapper block hash tree interface. + * See Documentation/device-mapper/dm-bht.txt for details. + * + * This file is released under the GPLv2. + */ +#ifndef __LINUX_DM_BHT_H +#define __LINUX_DM_BHT_H + +#include +#include +#include + +/* To avoid allocating memory for digest tests, we just setup a + * max to use for now. + */ +#define DM_BHT_MAX_DIGEST_SIZE 128 /* 1k hashes are unlikely for now */ +#define DM_BHT_SALT_SIZE 32 /* 256 bits of salt is a lot */ + +/* UNALLOCATED, PENDING, READY, and VERIFIED are valid states. All other + * values are entry-related return codes. + */ +#define DM_BHT_ENTRY_VERIFIED 8 /* 'nodes' has been checked against parent */ +#define DM_BHT_ENTRY_READY 4 /* 'nodes' is loaded and available */ +#define DM_BHT_ENTRY_PENDING 2 /* 'nodes' is being loaded */ +#define DM_BHT_ENTRY_UNALLOCATED 0 /* untouched */ +#define DM_BHT_ENTRY_ERROR -1 /* entry is unsuitable for use */ +#define DM_BHT_ENTRY_ERROR_IO -2 /* I/O error on load */ + +/* Additional possible return codes */ +#define DM_BHT_ENTRY_ERROR_MISMATCH -3 /* Digest mismatch */ + +/* dm_bht_entry + * Contains dm_bht->node_count tree nodes at a given tree depth. + * state is used to transactionally assure that data is paged in + * from disk. Unless dm_bht kept running crypto contexts for each + * level, we need to load in the data for on-demand verification. + */ +struct dm_bht_entry { + atomic_t state; /* see defines */ + /* Keeping an extra pointer per entry wastes up to ~33k of + * memory if a 1m blocks are used (or 66 on 64-bit arch) + */ + void *io_context; /* Reserve a pointer for use during io */ + /* data should only be non-NULL if fully populated. */ + void *nodes; /* The hash data used to verify the children. + * Guaranteed to be page-aligned. + */ +}; + +/* dm_bht_level + * Contains an array of entries which represent a page of hashes where + * each hash is a node in the tree at the given tree depth/level. + */ +struct dm_bht_level { + struct dm_bht_entry *entries; /* array of entries of tree nodes */ + unsigned int count; /* number of entries at this level */ + sector_t sector; /* starting sector for this level */ +}; + +/* opaque context, start, databuf, sector_count */ +typedef int(*dm_bht_callback)(void *, /* external context */ + sector_t, /* start sector */ + u8 *, /* destination page */ + sector_t, /* num sectors */ + struct dm_bht_entry *); +/* dm_bht - Device mapper block hash tree + * dm_bht provides a fixed interface for comparing data blocks + * against a cryptographic hashes stored in a hash tree. It + * optimizes the tree structure for storage on disk. + * + * The tree is built from the bottom up. A collection of data, + * external to the tree, is hashed and these hashes are stored + * as the blocks in the tree. For some number of these hashes, + * a parent node is created by hashing them. These steps are + * repeated. + * + * TODO(wad): All hash storage memory is pre-allocated and freed once an + * entire branch has been verified. + */ +struct dm_bht { + /* Configured values */ + int depth; /* Depth of the tree including the root */ + unsigned int block_count; /* Number of blocks hashed */ + unsigned int block_size; /* Size of a hash block */ + char hash_alg[CRYPTO_MAX_ALG_NAME]; + unsigned char salt[DM_BHT_SALT_SIZE]; + + /* Computed values */ + unsigned int node_count; /* Data size (in hashes) for each entry */ + unsigned int node_count_shift; /* first bit set - 1 */ + /* There is one per CPU so that verified can be simultaneous. */ + struct shash_desc *hash_desc[NR_CPUS]; /* Container for the hash alg */ + unsigned int digest_size; + sector_t sectors; /* Number of disk sectors used */ + + /* bool verified; Full tree is verified */ + u8 root_digest[DM_BHT_MAX_DIGEST_SIZE]; + struct dm_bht_level *levels; /* in reverse order */ + /* Callback for reading from the hash device */ + dm_bht_callback read_cb; +}; + +/* Constructor for struct dm_bht instances. */ +int dm_bht_create(struct dm_bht *bht, + unsigned int block_count, + unsigned int block_size, + const char *alg_name); +/* Destructor for struct dm_bht instances. Does not free @bht */ +void dm_bht_destroy(struct dm_bht *bht); + +/* Basic accessors for struct dm_bht */ +int dm_bht_set_root_hexdigest(struct dm_bht *bht, const u8 *hexdigest); +int dm_bht_root_hexdigest(struct dm_bht *bht, u8 *hexdigest, int available); +void dm_bht_set_salt(struct dm_bht *bht, const char *hexsalt); +int dm_bht_salt(struct dm_bht *bht, char *hexsalt); + +/* Functions for loading in data from disk for verification */ +bool dm_bht_is_populated(struct dm_bht *bht, unsigned int block); +int dm_bht_populate(struct dm_bht *bht, void *read_cb_ctx, + unsigned int block); +int dm_bht_verify_block(struct dm_bht *bht, unsigned int block, + struct page *pg, unsigned int offset); +void dm_bht_read_completed(struct dm_bht_entry *entry, int status); + +/* Functions for converting indices to nodes. */ + +static inline unsigned int dm_bht_get_level_shift(struct dm_bht *bht, + int depth) +{ + return (bht->depth - depth) * bht->node_count_shift; +} + +/* For the given depth, this is the entry index. At depth+1 it is the node + * index for depth. + */ +static inline unsigned int dm_bht_index_at_level(struct dm_bht *bht, + int depth, + unsigned int leaf) +{ + return leaf >> dm_bht_get_level_shift(bht, depth); +} + +static inline struct dm_bht_entry *dm_bht_get_entry(struct dm_bht *bht, + int depth, + unsigned int block) +{ + unsigned int index = dm_bht_index_at_level(bht, depth, block); + struct dm_bht_level *level = &bht->levels[depth]; + + return &level->entries[index]; +} + +static inline void *dm_bht_get_node(struct dm_bht *bht, + struct dm_bht_entry *entry, + int depth, + unsigned int block) +{ + unsigned int index = dm_bht_index_at_level(bht, depth, block); + unsigned int node_index = index % bht->node_count; + + return entry->nodes + (node_index * bht->digest_size); +} +#endif /* __LINUX_DM_BHT_H */