Hyperthreading disabled, striping over 4 NVMe devices with a 64k stripe size, using the AIO engine (libaio) with a 32-request depth.

AES-NI saturates the device while consuming about 10% CPU time. QAT performs badly: it can't exceed 2GiB/s.

fio --ioengine=libaio --iodepth=32 --rw=randread --direct=1 --end_fsync=1 --bs=64k --numjobs=56 --time_based --runtime=10 --group_reporting --name=job --filename=/dev/mapper/rhel_cidic1-striped

fio --ioengine=libaio --iodepth=32 --rw=randwrite --direct=1 --end_fsync=1 --bs=64k --numjobs=56 --time_based --runtime=10 --group_reporting --name=job --filename=/dev/mapper/rhel_cidic1-striped

raw device:
   READ: bw=12.2GiB/s
         busy: 2.042612%  idle: 97.006933%  irq: 0.658023%
  WRITE: bw=4323MiB/s
         busy: 1.025853%  idle: 98.566944%  irq: 0.239876%

dm-crypt, aes-ni, 4096:
   READ: bw=12.2GiB/s
         busy: 9.540310%  idle: 88.679644%  irq: 0.805306%
  WRITE: bw=4274MiB/s
         busy: 9.661991%  idle: 87.697487%  irq: 0.673262%

dm-crypt, qat, 4096:
   READ: bw=1934MiB/s
         busy: 3.894440%  idle: 92.760971%  irq: 1.846177%
  WRITE: bw=1773MiB/s
         busy: 3.953193%  idle: 92.596919%  irq: 2.015018%
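
To make the comparison easier to read, the bandwidth figures above can be normalized to MiB/s and expressed relative to the raw device. The following is only a small sketch of that arithmetic: the helper names (to_mib_per_s, relative_to_raw) are made up for illustration, and the numbers are copied by hand from the fio output above rather than parsed from it.

```python
# Bandwidth figures copied by hand from the fio results above ("bw=" values).
# The helper names below are illustrative, not part of fio or any library.
UNITS_IN_MIB = {"MiB/s": 1.0, "GiB/s": 1024.0}

RESULTS = {
    "raw device": {"READ": "12.2GiB/s", "WRITE": "4323MiB/s"},
    "dm-crypt, aes-ni, 4096": {"READ": "12.2GiB/s", "WRITE": "4274MiB/s"},
    "dm-crypt, qat, 4096": {"READ": "1934MiB/s", "WRITE": "1773MiB/s"},
}


def to_mib_per_s(bw: str) -> float:
    """Convert a fio bandwidth string such as '12.2GiB/s' to MiB/s."""
    for unit, factor in UNITS_IN_MIB.items():
        if bw.endswith(unit):
            return float(bw[: -len(unit)]) * factor
    raise ValueError(f"unrecognized bandwidth unit in {bw!r}")


def relative_to_raw(results: dict) -> dict:
    """Return each configuration's throughput as a fraction of the raw device."""
    raw = results["raw device"]
    return {
        config: {
            op: to_mib_per_s(bw) / to_mib_per_s(raw[op])
            for op, bw in ops.items()
        }
        for config, ops in results.items()
    }


if __name__ == "__main__":
    for config, ratios in relative_to_raw(RESULTS).items():
        line = "  ".join(f"{op}: {ratio:6.1%}" for op, ratio in ratios.items())
        print(f"{config:<24} {line}")
```

With these inputs, AES-NI matches the raw device on reads and reaches about 99% of it on writes, while QAT reaches roughly 15% of the raw read bandwidth and about 41% of the raw write bandwidth.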