QEMU machine types and compatibility (part 2)

In the first part of this article, I talked about how you can use versioned machine types to ensure compatibility. But the more interesting part is how this actually works under the covers.

Device properties, and making them compatible

QEMU devices often come with a list of properties that influence how the device is created and how it operates. Typically, authors try to come up with reasonable default values, which may be overridden if desired. However, the idea of what is considered reasonable may change over time, and a newer QEMU may provide a different default value for a property.

If you want to migrate a guest from an older QEMU machine to a more recent QEMU, you obviously need to use the default values from that older QEMU machine as well. For that, QEMU uses arrays of GlobalProperty structures.

If you take a look at hw/core/machine.c, you will notice several arrays named hw_compat_<major>_<minor>. These contain triplets specifying (from right to left) the default value for a certain property for a certain device. The arrays are designed to be included by the compat machine for <major>.<minor>, thus specifying a default value for that machine version and older. (More on this later in this article.)

For example, QEMU 5.2 changed the default number of virtio queues defined for virtio-blk and virtio-scsi devices: prior to 5.2, one queue would be present if no other value had been specified; with 5.2, the default number of queues would align with the number of vcpus for virtio-pci. Therefore, hw_compat_5_1 contains the following lines:

{ "virtio-blk-device", "num-queues", "1"},
{ "virtio-scsi-device", "num_queues", "1"},

(There are also some corresponding lines for vhost.) This makes sure that any virtio-blk or virtio-scsi device on a -5.1 or older machine type will have one virtio queue by default. Note that this holds true for all virtio-blk and virtio-scsi devices, regardless of which transport they are using; for transports like ccw where nothing changed with 5.2, this simply does not make any difference.

Generally, statements for all devices can go into the hw_compat_ arrays; if a device is not present or even not available at all for the machine that is started, the statement will simply not take any effect.
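To make the effect of such an array more concrete, here is a minimal, self-contained sketch of the lookup idea. It is not QEMU's actual implementation: the CompatProp struct and the effective_default() helper are names made up for this illustration, and the "auto" fallback in the usage note is only a placeholder for whatever the newer built-in default is. Only the two table entries are taken from the lines quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative stand-in for a (device, property, value) triplet. */
typedef struct {
    const char *driver;
    const char *property;
    const char *value;
} CompatProp;

/* The two entries quoted above from hw_compat_5_1. */
static const CompatProp compat_5_1[] = {
    { "virtio-blk-device", "num-queues", "1" },
    { "virtio-scsi-device", "num_queues", "1" },
};

/*
 * Hypothetical helper: return the compat value for driver/property if the
 * table has a matching entry, otherwise the built-in default passed in as
 * fallback. Entries for devices that are never created simply never match.
 */
const char *effective_default(const CompatProp *props, size_t n,
                              const char *driver, const char *property,
                              const char *fallback)
{
    assert(driver != NULL && property != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(props[i].driver, driver) == 0 &&
            strcmp(props[i].property, property) == 0) {
            return props[i].value;
        }
    }
    return fallback;
}
```

With this model, effective_default(compat_5_1, 2, "virtio-blk-device", "num-queues", "auto") yields "1", while the same query against an empty table (standing in for the latest machine version) yields the fallback.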

x86 considerations

For the x86 machine types (pc-i440fx and pc-q35), pc_compat_<major>_<minor> arrays are defined in hw/i386/pc.c, mostly covering properties for x86 cpus, but also some other x86-specific devices.

Per-machine changes

Some incompatible changes are not happening at the device property level, so the compat properties approach cannot be used. Instead, the individual machines need to take care of those changes.

For example, in QEMU 6.2 the smp parsing code started to prefer cores over sockets instead of preferring sockets. Therefore, all 6.1 compat machines have code like

m->smp_props.prefer_sockets = true;

to set prefer_sockets to true in the MachineClass. (Note that the m68k virt machine does not support smp, and therefore does not need that statement.)
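To illustrate why this flag is guest-observable, here is a deliberately simplified model of how omitted topology values might be filled in. It is a sketch and not QEMU's smp parser: the Topology struct and complete_topology() are invented for this example, only sockets, cores and threads are modelled, threads simply default to 1 when omitted, and no validation is done.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative topology description; 0 means "not specified". */
typedef struct {
    unsigned sockets;
    unsigned cores;
    unsigned threads;
} Topology;

/*
 * Hypothetical helper: fill in omitted sockets/cores values for `cpus`
 * vCPUs, either preferring sockets (older behaviour) or cores (newer).
 */
Topology complete_topology(unsigned cpus, Topology t, bool prefer_sockets)
{
    assert(cpus > 0);
    if (t.threads == 0) {
        t.threads = 1;
    }
    if (prefer_sockets) {
        /* Unspecified CPUs become additional sockets first. */
        if (t.sockets == 0) {
            if (t.cores == 0) {
                t.cores = 1;
            }
            t.sockets = cpus / (t.cores * t.threads);
        } else if (t.cores == 0) {
            t.cores = cpus / (t.sockets * t.threads);
        }
    } else {
        /* Unspecified CPUs become additional cores first. */
        if (t.cores == 0) {
            if (t.sockets == 0) {
                t.sockets = 1;
            }
            t.cores = cpus / (t.sockets * t.threads);
        } else if (t.sockets == 0) {
            t.sockets = cpus / (t.cores * t.threads);
        }
    }
    return t;
}
```

In this model, eight vCPUs with nothing else specified end up as eight single-core sockets when prefer_sockets is true, and as one eight-core socket when it is false. A guest migrated between the two would see a different topology, which is why the older machine versions need to pin the old preference.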

Machines also sometimes need to configure associated capabilities in a compatible way. For example, the s390x cpu models may gain new feature flags in newer QEMU releases; when using a compat machine, those new flags need to be off in the cpu models that are used by default.

Inheritance

Compat machines for older machine types need the compatibility changes for newer machine types as well as some changes on top. Typically, this is done by having the MachineState or MachineClass initializing function for version n-1 call the corresponding initializing function for version n. As all new compatibility changes are added for the latest versioned machine type, changes are propagated down the whole stack of versions.

All machine types for version n include the hw_compat_<n> array (and the pc_compat_<n> array for x86), unless they are the latest version (which does not need any compat handling yet). The older compat property arrays are included via the inheritance mechanism.
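The chaining described above can be sketched roughly as follows. This is a toy model, not code from QEMU: MachineClassModel, add_compat() and the machine_*_options() functions are names chosen for this illustration, the "example-device" entries are hypothetical, and 6.2 is assumed to be the latest version purely for the sake of the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_COMPAT 16

typedef struct {
    const char *driver;
    const char *property;
    const char *value;
} CompatProp;

/* Toy stand-in for the per-machine class data. */
typedef struct {
    bool prefer_sockets;
    const CompatProp *compat[MAX_COMPAT];
    size_t n_compat;
} MachineClassModel;

/* Hypothetical compat arrays; device and property names are made up. */
static const CompatProp compat_6_1[] = {
    { "example-device", "example-prop-a", "off" },
};
static const CompatProp compat_6_0[] = {
    { "example-device", "example-prop-b", "1" },
};

void add_compat(MachineClassModel *mc, const CompatProp *props, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(mc->n_compat < MAX_COMPAT);
        mc->compat[mc->n_compat++] = &props[i];
    }
}

/* Latest version (assumed): current defaults, no compat handling. */
void machine_6_2_options(MachineClassModel *mc)
{
    mc->prefer_sockets = false;
    mc->n_compat = 0;
}

/* Version n-1 calls version n, then adds its own changes on top. */
void machine_6_1_options(MachineClassModel *mc)
{
    machine_6_2_options(mc);
    mc->prefer_sockets = true;
    add_compat(mc, compat_6_1, sizeof(compat_6_1) / sizeof(compat_6_1[0]));
}

void machine_6_0_options(MachineClassModel *mc)
{
    machine_6_1_options(mc);
    add_compat(mc, compat_6_0, sizeof(compat_6_0) / sizeof(compat_6_0[0]));
}
```

Calling machine_6_0_options() on a MachineClassModel leaves it with prefer_sockets set and with the entries of both compat_6_1 and compat_6_0, even though the 6.0 function itself only mentions its own array.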

Putting it all together

QEMU currently supports versioned machine types for x86 (pc-i440fx, pc-q35), arm (virt), aarch64 (virt), s390x (s390-ccw-virtio), ppc64 (pseries), and m68k (virt). At the beginning of each development cycle, new (empty) arrays of compat properties for the last version are added and wired up in the machine types for that last version, new versions of each of these machines are added to the code, and the defaults are switched to them (well, that’s the goal). After that, the framework for adding incompatible changes is in place.

If you find that these changes have not yet been made when you plan to make an incompatible change, it is important that you add the new machine types first.

New and incompatible device properties

If you plan to change the default value of a device property, or add a new property with a default value that will cause guest-observable changes, you need to add an entry that preserves the old value (or sets a value that does not change the behaviour) to the compat property array for the last version. In general (non-x86 specific change), that means adding it to the hw_compat_ array, and all machine types will use it automatically.

Take care to use the right device for specifying the property; for example, there is often some confusion when dealing with virtio devices. If you e.g. modify a virtio-blk property (as in the example above), you need to add a statement for virtio-blk-device and not for virtio-blk-pci, or virtio-blk instances using the ccw or mmio transports would be left out. If, on the other hand, you modify a property only for virtio-blk devices using the pci transport, you need to add a statement for virtio-blk-pci. Similar considerations apply to other devices inheriting from base types.
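The difference between the two kinds of entries can be pictured with a small model. This is only a sketch under simplifying assumptions: a virtio-blk instance is modelled as a transport-specific type name paired with the common virtio-blk-device type, which glosses over how QEMU actually relates these types. VirtioInstance, compat_entry_applies() and the "example-prop" property are hypothetical names used for illustration.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    const char *driver;
    const char *property;
    const char *value;
} CompatProp;

/* Simplified picture of one virtio-blk instance. */
typedef struct {
    const char *transport_type; /* e.g. "virtio-blk-pci", "virtio-blk-ccw" */
    const char *device_type;    /* e.g. "virtio-blk-device" */
} VirtioInstance;

/* Hypothetical check: does a compat entry reach this instance at all? */
int compat_entry_applies(const CompatProp *entry, const VirtioInstance *inst)
{
    assert(entry != NULL && inst != NULL);
    return strcmp(entry->driver, inst->transport_type) == 0 ||
           strcmp(entry->driver, inst->device_type) == 0;
}
```

In this model, an entry for virtio-blk-device matches instances on both the pci and the ccw transport, while an entry for virtio-blk-pci matches only the pci one and silently skips the rest.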

Per-machine changes

If you change a non-device default characteristic, you need to add a compatibility statement for the machine types for the last version in their instance (or class) init functions. The hardest part here is making sure that all relevant machine types get the update.

For example, if you add a change in the s390x cpu models, it is easy to see that you only need to modify the code for the s390-ccw-virtio machine. For other changes, every versioned machine needs the change. And there are cases like the prefer_sockets change mentioned above, which apply to any machine type that supports smp.

I hope that these explanations help a bit with understanding how machine type compatibility works, and where to add your own changes.