With a 512-byte encryption sector size, QAT is effectively unusable: it reaches only 200-400 MiB/s, because the per-sector overhead is too high.

With a 4kB encryption sector size, QAT has worse throughput than AES-NI in all of the tests, except where the throughput is limited by the NVMe device itself. In random I/O workloads, QAT can't exceed 2 GiB/s; in sequential I/O it can reach up to 3 GiB/s. AES-NI, on the other hand, can saturate a RAID-0 of four NVMe devices at a 12 GiB/s read rate and a 4 GiB/s write rate while consuming only 10% of total CPU time. In one benchmark (random read on XFS), QAT has lower throughput than AES-NI but also lower CPU consumption; this is the only case where QAT shows any advantage.

QAT deadlocked in one of the tests because of a bug in the driver, which I have fixed.

I also wrote a kernel module that stress-tests the crypto API. In single-threaded mode, AES-NI is faster than QAT for requests < 64kB and QAT is faster for requests >= 64kB. In multi-threaded mode (112 threads submitting encryption work concurrently), AES-NI has 10 times better throughput than QAT, because the machine has 56 AES-NI cores and only 2 QAT cores.
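The per-sector overhead and the single-threaded crossover at 64kB described above can be illustrated with a simple fixed-cost model: each offloaded request pays a constant submission/completion cost plus a transfer time proportional to its size, while inline CPU encryption pays only the proportional part. This is only a sketch and not the stress-test module mentioned above. The overhead and bandwidth figures are hypothetical values picked so that the crossover lands at 64 KiB; they were not measured on the test machine.

```python
GiB = 2 ** 30

# Hypothetical parameters, chosen for illustration only (not measurements).
OVERHEAD_S = 2.0 ** -16      # assumed fixed cost per offloaded request, ~15 us
OFFLOAD_BW = 4 * GiB         # assumed accelerator streaming rate, bytes/s
CPU_BW = 2 * GiB             # assumed single-core inline rate, bytes/s


def offload_throughput(request_bytes, overhead_s=OVERHEAD_S, bandwidth=OFFLOAD_BW):
    """Bytes/s when requests are processed one at a time, each paying a
    fixed overhead plus size-proportional processing time."""
    time_per_request = overhead_s + request_bytes / bandwidth
    return request_bytes / time_per_request


def crossover_size(overhead_s=OVERHEAD_S, offload_bw=OFFLOAD_BW, cpu_bw=CPU_BW):
    """Request size at which offload and inline encryption take equal time.

    Solves  size / cpu_bw == overhead_s + size / offload_bw  for size.
    Returns None if the offload engine never catches up.
    """
    if offload_bw <= cpu_bw:
        return None
    return overhead_s / (1.0 / cpu_bw - 1.0 / offload_bw)


if __name__ == "__main__":
    for size in (512, 4096, 65536, 1 << 20):
        mib_s = offload_throughput(size) / 2 ** 20
        print(f"{size:>8} B requests: {mib_s:10.1f} MiB/s (model, offload)")
    print(f"crossover at {crossover_size():.0f} bytes")
```

In this model the fixed cost dominates for small requests, so throughput grows almost linearly with request size until the proportional term takes over. It does not account for multiple requests in flight or for the number of engines, which is what governs the multi-threaded result.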