WIP WARNING

This post is a work in progress.


Here’s a quick rundown on setting up pNFS SCSI virtual machines on KVM. The steps below are specific to two virtual machines – one client and one server on the same physical host. We’ll do almost all of our configuration work on the physical host on which the VMs run.

This setup also assumes that we don’t have any SPC-4-compliant SCSI devices. Because SCSI layouts depend on being able to register keys and perform reservations, this setup is mostly about building a shared SCSI device that supports registrations and reservations. You can read section 2.3 of the NFSv4 SCSI layout draft for a bit more information on these requirements for pNFS SCSI layouts.

We are going to emulate a compatible device using a Linux-IO (LIO) backstore and then present it to the VMs using the vhost-scsi target. This gives us a very convenient software-defined fabric, but to use it, your physical host’s kernel must be built with CONFIG_VHOST_SCSI so that the vhost_scsi.ko module is available. Currently, RHEL7 does not ship the vhost_scsi module, so I recommend running a recent Fedora or upstream kernel on your physical host.
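A quick way to check whether your physical host is ready – this sketch assumes a distro that installs its kernel config under /boot (as Fedora does):

```shell
# Sanity check on the physical host: is vhost-scsi available?
grep CONFIG_VHOST_SCSI "/boot/config-$(uname -r)"

# Load the module and confirm it registered.
modprobe vhost-scsi
lsmod | grep vhost_scsi
```

If the grep shows `is not set`, or modprobe fails, you’ll need a rebuilt kernel before continuing.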

Finally, there’s a bug in the vhost/scsi transport that will corrupt your data! So the first step is to patch your physical host’s kernel with “vhost/scsi: fix reuse of &vq->iov[out] in response”.

Without further ado, here are the steps for a basic client-server pNFS SCSI VM setup:

Step 1

Patch/rebuild your physical host’s kernel with “vhost/scsi: fix reuse of &vq->iov[out] in response”.
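One way to do this, assuming you build from an upstream git tree and have saved the patch as a local mbox file (the filename here is hypothetical):

```shell
# In your kernel source tree on the physical host:
cd linux

# Apply the saved fix, then rebuild and install.
git am vhost-scsi-fix-reuse-of-iov.mbox
make olddefconfig
make -j"$(nproc)"
make modules_install install
```

Any workflow that gets the patched kernel booted on the physical host is fine; this is just one sketch.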

Step 2

On the physical host, create a backing store for your SCSI lun by entering the interactive targetcli shell:

[root@godel ~]# targetcli
targetcli shell version 2.1.fb42
Copyright 2011-2013 by Datera, Inc and others.
For help on commands, type 'help'.

/> cd backstores/fileio
/backstores/fileio> create pnfs_lio_dev_1 /pnfs_lio_dev_1 100M
Created fileio pnfs_lio_dev_1 with size 104857600
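If you prefer not to use the interactive shell, targetcli also accepts a path and command as arguments, so the same backstore can be created as a one-shot command:

```shell
# Non-interactive equivalent of the session above.
targetcli /backstores/fileio create pnfs_lio_dev_1 /pnfs_lio_dev_1 100M

# Persist the configuration explicitly (the interactive shell
# does this automatically on exit).
targetcli saveconfig
```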

Step 3

While still in targetcli on the physical host, we’ll create two vhost targets (one for each VM) and add that backstore to each of them as lun0:

/backstores/fileio> cd /vhost/
/vhost> create
Created target naa.500140583457c98b.
Created TPG 1.
/vhost> cd naa.500140583457c98b/tpg1/luns
/vhost/naa.50...98b/tpg1/luns> create /backstores/fileio/pnfs_lio_dev_1
Created LUN 0.
/vhost/naa.50...98b/tpg1/luns> cd /vhost/
/vhost> create
Created target naa.500140558a42bd60.
/vhost> cd naa.500140558a42bd60/tpg1/luns
/vhost/naa.50...d60/tpg1/luns> create /backstores/fileio/pnfs_lio_dev_1
Created LUN 0.
/vhost/naa.50...d60/tpg1/luns>

OK, your LIO config should look something like this:

/vhost/naa.50...d60/tpg1/luns> cd /
/> ls
o- / ..................................................................................... [...]
  o- backstores .......................................................................... [...]
  | o- block .............................................................. [Storage Objects: 0]
  | o- fileio ............................................................. [Storage Objects: 1]
  | | o- pnfs_lio_dev_1 ...................... [/pnfs_lio_dev_1 (100.0MiB) write-back activated]
  | o- pscsi .............................................................. [Storage Objects: 0]
  | o- ramdisk ............................................................ [Storage Objects: 0]
  o- iscsi ........................................................................ [Targets: 0]
  o- loopback ..................................................................... [Targets: 1]
  o- vhost ........................................................................ [Targets: 2]
    o- naa.500140558a42bd60 .......................................................... [TPGs: 1]
    | o- tpg1 .............................................. [naa.50014052b3c03dbe, no-gen-acls]
    |   o- acls ...................................................................... [ACLs: 0]
    |   o- luns ...................................................................... [LUNs: 1]
    |     o- lun0 .................................... [fileio/pnfs_lio_dev_1 (/pnfs_lio_dev_1)]
    o- naa.500140583457c98b .......................................................... [TPGs: 1]
      o- tpg1 .............................................. [naa.50014053b65ac5f4, no-gen-acls]
        o- acls ...................................................................... [ACLs: 0]
        o- luns ...................................................................... [LUNs: 1]
          o- lun0 .................................... [fileio/pnfs_lio_dev_1 (/pnfs_lio_dev_1)]
/> exit
Global pref auto_save_on_exit=true
Last 10 configs saved in /etc/target/backup.
Configuration saved to /etc/target/saveconfig.json

Just a note: as you might deduce from this LIO output, there are many different ways to configure the LIO fabric. You have options for which backing store you use, and if you did have iSCSI or SPC4 compliant devices on this host, you could present them to your VMs with this fabric. You also have the option of presenting the virtual device you just created back to the host using the loopback module.
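For example, presenting the same backstore back to the physical host through the loopback fabric is just two more targetcli commands – a sketch, with the generated WWN left as a placeholder since targetcli picks it for you:

```shell
# Create a loopback target; targetcli prints the generated naa. WWN.
targetcli /loopback create

# Attach the same backstore to it (substitute the WWN printed above).
targetcli /loopback/naa.<generated-wwn>/luns create /backstores/fileio/pnfs_lio_dev_1
```

The device then shows up on the physical host itself as an ordinary SCSI disk, which can be handy for debugging.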

Step 4

Now we’ll present the vhost-scsi devices to your two VMs. Since libvirt and virt-manager have no interface for vhost-scsi devices, we resort to modifying each VM’s qemu command line. We’ll keep the modification within the libvirt domain config, however. On the physical host, use virsh edit <VM name> to add the following XML:

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <qemu:commandline>
    <qemu:arg value='-device'/>
    <qemu:arg
value='vhost-scsi-pci,id=vhost1,wwpn=naa.500140558a42bd60,event_idx=off'/>
  </qemu:commandline>

Note that the first line above is not an addition, but a modification: you add xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0' to the existing root <domain> element. The value of the wwpn option must match one of the vhost targets you created in step 3, and the id option can likely be set to any value that is unique per VM.

Repeat the above step for your other VM, this time specifying the other vhost target name you created in step 3.
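It’s worth confirming that the edit stuck before rebooting – the VM name below is hypothetical:

```shell
# Confirm libvirt kept the qemu:commandline block and that the
# wwpn matches the vhost target intended for this VM.
virsh dumpxml rhel7-client | grep -A 3 'qemu:commandline'
</imports></imports>
```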

Step 5

In order for qemu to send ioctl messages to the vhost-scsi driver in the kernel, you’ll need to ensure that the cgroup_device_acl array in the host’s /etc/libvirt/qemu.conf contains “/dev/vhost-scsi”. Also, unless you’d like to configure specific SELinux exceptions, disable security_default_confined and security_require_confined in the same file:

security_default_confined = 0
security_require_confined = 0
cgroup_device_acl = [ "/dev/null", "/dev/full", "/dev/zero", "/dev/random",
                      "/dev/urandom", "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
                      "/dev/rtc", "/dev/hpet",
                      "/dev/vfio/vfio", "/dev/vhost-scsi" ]

Once you’ve made those changes, restart libvirtd (systemctl restart libvirtd).

Step 6

[Re]boot your VMs! You should now have a scsi_generic device on each VM that is backed by your LUN:

[root@rhel7 ~]# readlink /sys/bus/pci/drivers/virtio-pci/0000*/virtio*/host*/target*/*:*:*/generic
scsi_generic/sg4
[root@rhel7 ~]# sg_map
/dev/sg4  /dev/sdd

At this point, you might want to verify that your setup really does support identifiers and registrations by querying page 0x83 of the VPD and registering a test key. Then format the device with XFS, and proceed to export it with the pnfs option.
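With sg3_utils installed in the VMs, that verification might look like this – /dev/sdd comes from the sg_map output above (yours may differ), the key value is arbitrary, and /export is an assumed mount point:

```shell
# 1. Device Identification VPD page (0x83) -- pNFS SCSI layouts need a
#    usable designator here.
sg_vpd --page=di /dev/sdd

# 2. Persistent reservations: register a test key, list keys, unregister.
sg_persist --out --register --param-sark=0xabc123 /dev/sdd
sg_persist --in --read-keys /dev/sdd
sg_persist --out --register --param-rk=0xabc123 --param-sark=0 /dev/sdd

# 3. On the server VM: make the filesystem and export with the pnfs option.
mkfs.xfs /dev/sdd
mount /dev/sdd /export
echo '/export *(rw,pnfs)' >> /etc/exports
exportfs -ra
```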

Note:

Don't partition the device. Since reservations exclude SCSI commands for the entire device, a partitioned device will cause the server to return NFS4ERR_INVAL in reply to GETDEVICEINFO.

This is the end, for now.

Feedback and corrections are welcome. Email me!