You have a new block device and you want to test its performance. Probably the first thing you'll try is streaming reads and writes across it. The second thing you'll do is write lots of small IO blocks to random locations on the device. This latter test is a killer for the best of devices, but for anything using a copy-on-write scheme it will always produce disappointing performance.
Random IO hits every region of the disk very quickly, causing a flurry of copy-on-write IO, which in turn makes throughput plummet. I don't think people have a good instinctive feel for just how quickly this happens.
Let's say we have a block device divided into $ d $ blocks. We then issue small, random write IOs to this device, one at a time. IOs are aligned such that they never span blocks. If an IO is the first to hit a block then that block must be copied.
Let $ C_n $ be the expected number of block copies after the $ n $th IO.
$$
\begin{align}
C_0 & = 0 \\
C_n & = \mathrm{Prob}_{\mathrm{copied}} \, C_{n-1} + \mathrm{Prob}_{\mathrm{uncopied}} \, (C_{n-1} + 1) \\
    & = \frac{C_{n-1}}{d} C_{n-1} + \frac{d - C_{n-1}}{d} (C_{n-1} + 1) \\
    & = \frac{d-1}{d} C_{n-1} + 1
\end{align}
$$
Unrolling the recurrence, we find: $$ C_n = \sum_{0 \leq k \leq n - 1} \alpha ^ k $$
where $ \alpha = \frac{d - 1}{d} $
Summing this geometric series gives us:
$$ C_n = \frac{1 - \alpha ^ n}{1 - \alpha} = d \, (1 - \alpha ^ n) $$
since $ 1 - \alpha = \frac{1}{d} $.
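As a quick sanity check, here's a Monte Carlo sketch in Python comparing the closed form against simulated random IO (the block and IO counts are arbitrary choices, not from the analysis above):

```python
import random

def simulate(d, n):
    """Fire n random single-block IOs at a d-block device,
    returning how many distinct blocks got copied."""
    copied = set()
    for _ in range(n):
        copied.add(random.randrange(d))
    return len(copied)

def expected(d, n):
    """Closed form: C_n = (1 - alpha^n) / (1 - alpha), alpha = (d-1)/d."""
    alpha = (d - 1) / d
    return (1 - alpha ** n) / (1 - alpha)

d, n = 100_000, 50_000
print(simulate(d, n))   # varies per run, clusters around 39,350
print(expected(d, n))   # ~39,347
```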
Let's plug some numbers into this equation and get an idea of the throughput we can expect.
For the following thought experiment the device is assumed to be 1 terabyte in size and the IOs 64k in size. We vary the COW block size, then calculate how many IOs it would take to cause 50% of the device to be copied.
Since copying a block requires a read followed by a write, each copy incurs an overhead of 2 × block size, and throughput drops correspondingly.
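To make that concrete: if we issue $ n $ IOs of size $ S_{io} $ and expect $ C_n $ copies with COW block size $ S_{block} $ (symbols introduced here just for illustration), the useful fraction of the bandwidth is roughly:

$$ \text{throughput fraction} \approx \frac{n \, S_{io}}{n \, S_{io} + 2 \, C_n \, S_{block}} $$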
In the table below we give, for various block sizes, the expected volume of random IO required to reach the 50% copied point, and the throughput as a fraction of that of a volume with no copy-on-write scheme.
| Block size | Expected IO volume | Throughput fraction |
| --- | --- | --- |
| 1m | 45g | 4% |
| 16m | 2.8g | 0.27% |
| 64m | 709m | 0.068% |
| 256m | 177m | 0.017% |
If your use case really is lots of small, random IO, use a small block size (e.g., 64k).
If this isn't your use case, don't test this way.
If you want to experiment with some numbers of your own you may find this program useful.
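For reference, here's a minimal Python sketch of the same calculation (not the original program; the 1 terabyte device, 64k IOs, and 50% copied target are taken from the thought experiment above):

```python
import math

KiB = 2 ** 10
MiB = 2 ** 20
TiB = 2 ** 40

def cow_stats(block_size, device_size=1 * TiB, io_size=64 * KiB,
              copied_fraction=0.5):
    """Return (random IO volume in bytes, throughput fraction) at the
    point where copied_fraction of the device has been copied."""
    d = device_size // block_size
    alpha = (d - 1) / d
    # Solve C_n = d * (1 - alpha^n) = copied_fraction * d for n.
    n = math.log(1 - copied_fraction) / math.log(alpha)
    useful = n * io_size
    overhead = 2 * copied_fraction * d * block_size  # each copy = read + write
    return useful, useful / (useful + overhead)

# Reproduces the table above, up to rounding.
for bs in (1 * MiB, 16 * MiB, 64 * MiB, 256 * MiB):
    volume, fraction = cow_stats(bs)
    print(f"{bs // MiB:>4}m  {volume / 2**30:7.1f}g  {fraction:.3%}")
```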