At the ALPSS 2024 conference last October we discussed the possibility of deprecating and eventually removing support for the Request interface as a kernel API. Such a change could impact DMMP, so I was asked if Red Hat would be willing to support the effort by measuring the performance of DMMP's BIO interface[3] and comparing it to its Request-based performance. A comparative performance analysis like this would be very helpful in determining what further changes are needed to move DMMP away from the Request interface, and it supports the overall effort to improve BIO interface performance and eventually remove Request-based IO as a kernel API.
In this presentation I will share the preliminary results of Red Hat's DMMP BIO vs Request performance tests[4] and discuss possible next steps for moving forward.
The tests and performance graphs in this presentation were developed and run by Samuel Petrovic (spetrovi@redhat.com). Credit goes to Samuel for creating these performance tests and many thanks to Benjamin Marzinski (bmarzins@redhat.com), Mikulas Patocka (mpatocka@redhat.com) and others on the Red Hat DMMP and Performance teams who contributed to this work.
[1] https://lwn.net/Articles/736534/
[2] https://lwn.net/Articles/738449/
[3] https://lore.kernel.org/linux-scsi/643e61a8-b0cb-4c9d-831a-879aa86d888e@redhat.com
[4] https://people.redhat.com/jmeneghi/LSFMM_2025/DMMP_BIOvsRequest/
root@rhel-storage-105:~# lsmem
RANGE                                  SIZE  STATE REMOVABLE BLOCK
0x0000000000000000-0x000000007fffffff    2G online       yes     0
0x0000000100000000-0x000000107fffffff   62G online       yes  2-32

Memory block size:         2G
Total online memory:      64G
Total offline memory:      0B

root@rhel-storage-105:~# lscpu
Architecture:             x86_64
CPU op-mode(s):           32-bit, 64-bit
Address sizes:            46 bits physical, 57 bits virtual
Byte Order:               Little Endian
CPU(s):                   80
On-line CPU(s) list:      0-79
Vendor ID:                GenuineIntel
BIOS Vendor ID:           Intel
Model name:               Intel(R) Xeon(R) Silver 4316 CPU @ 2.30GHz
BIOS Model name:          Intel(R) Xeon(R) Silver 4316 CPU @ 2.30GHz  CPU @ 2.3GHz
BIOS CPU family:          179
CPU family:               6
Model:                    106
Thread(s) per core:       2
Core(s) per socket:       20
Socket(s):                2
Stepping:                 6
CPU(s) scaling MHz:       93%
CPU max MHz:              3400.0000
CPU min MHz:              800.0000
BogoMIPS:                 4600.00
Flags:                    fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect wbnoinvd dtherm ida arat pln pts vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid fsrm md_clear pconfig flush_l1d arch_capabilities
Virtualization features:
  Virtualization:         VT-x
Caches (sum of all):
  L1d:                    1.9 MiB (40 instances)
  L1i:                    1.3 MiB (40 instances)
  L2:                     50 MiB (40 instances)
  L3:                     60 MiB (2 instances)
NUMA:
  NUMA node(s):           2
  NUMA node0 CPU(s):      0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78
  NUMA node1 CPU(s):      1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79
Vulnerabilities:
  Gather data sampling:   Mitigation; Microcode
  Itlb multihit:          Not affected
  L1tf:                   Not affected
  Mds:                    Not affected
  Meltdown:               Not affected
  Mmio stale data:        Mitigation; Clear CPU buffers; SMT vulnerable
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Not affected
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI SW loop, KVM SW loop
  Srbds:                  Not affected
  Tsx async abort:        Not affected
If you have user_friendly_names set in /etc/multipath.conf (you probably do; it's the default), then when you run "multipath -l" the top line for the device will look like:

mpathX ("WWID") dm-Y "vendor","product"

If you don't have user_friendly_names set (and you didn't explicitly set up an alias in /etc/multipath.conf), the device name is the same as the device WWID, so the top line for the device will look like:

"WWID" dm-Y "vendor","product"
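For example, with user_friendly_names the top line might look something like this (the WWID, vendor, and product strings below are made-up placeholders, not output from the test system):

mpatha (3600d0230000000000e13955cc3757800) dm-3 EXAMPLE,VDISK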
If you want to change all of the multipath devices on your machine to use the bio queue_mode, add features "2 queue_mode bio" to the defaults section of /etc/multipath.conf. If you have multipath devices that you don't want changed, you can instead set this per multipath device vendor and product by adding a devices section that looks like:

devices {
        device {
                vendor "vendor"
                product "product"
                features "2 queue_mode bio"
        }
}

Or you can set this for specific multipath devices by adding a multipaths section with an entry for each device. For example:

multipaths {
        multipath {
                wwid "WWID"
                features "2 queue_mode bio"
        }
        multipath {
                wwid "ANOTHER_WWID"
                features "2 queue_mode bio"
        }
}
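For the "change everything" case, a minimal defaults section would look like the sketch below (only the features line matters here; any other defaults settings you already have stay as they are):

defaults {
        # switch all multipath devices to the bio queue_mode
        features "2 queue_mode bio"
}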
To remove one multipath device:

# multipath -f "device_name"

To remove all multipath devices:

# multipath -F
# systemctl reload multipathd.service

This will also recreate the multipath device. If you run "multipath -l", you should now see "queue_mode bio" in the features line. When you want to switch back to request mode, you can just comment out that "features" line, like so:

# features "2 queue_mode bio"

and then do steps 3 & 4 again.
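Putting the steps together, switching one test device over to bio mode might look roughly like the following sketch (the device name "mpatha" is a placeholder user_friendly_name; substitute your own device):

# multipath -l mpatha                    <- note the current features line (request mode)
# vi /etc/multipath.conf                 <- add the features "2 queue_mode bio" setting as described above
# multipath -f mpatha                    <- remove the existing multipath device
# systemctl reload multipathd.service    <- multipathd rereads the config and recreates the device
# multipath -l mpatha                    <- verify that "queue_mode bio" now appears in the features line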
In preliminary raw device testing we can see that bio is slightly to significantly worse than request-based IO. The workload used sequential reads and sequential writes. Small (4k) blocks show roughly a 30% performance drop, while larger blocks show a 3-9% drop, which is only barely outside the statistical error of this test.
In our preliminary file system tests, the worst cases for bio are the single-file tests with small (4k) blocks, which show a 50-70% loss. Single-file tests with larger blocks see only about an 11-35% loss, and tests with many files show no statistically significant difference.

Storage Preparation:
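The exact storage preparation used for the published runs is documented with the results in [4]. As a rough sketch of the kind of setup involved (the dm device name is a placeholder, and depending on queue_mode the scheduler may need to be set on the underlying path devices such as /dev/sdX rather than on the dm device itself, since bio-based devices do not use an I/O scheduler of their own):

# multipath -ll                                         <- list the multipath devices under test and check their features lines
# cat /sys/block/dm-0/queue/scheduler                   <- see which I/O scheduler (none or mq-deadline) is currently selected
# echo mq-deadline > /sys/block/dm-0/queue/scheduler    <- select the scheduler used for the mq-deadline comparisons below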
FIO Test:
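The fio job definitions used for the published numbers are available alongside the graphs in [4]. The sketch below only illustrates the shape of the workload described above (sequential reads and writes against the multipath device, with the block size varied between 4k and larger sizes); the target device, queue depth, and runtime here are illustrative assumptions, not the exact test parameters:

# sequential read; repeat with --rw=write and with larger --bs values (e.g. 64k, 1M)
# /dev/mapper/mpatha is a placeholder for the multipath device under test
fio --name=seqread --filename=/dev/mapper/mpatha --rw=read --bs=4k \
    --ioengine=libaio --iodepth=32 --direct=1 --runtime=60 --time_based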
After making these changes, the following tests have been run:
raw_io_scheduler_mq-deadline_tests
As you can see, with mq-deadline the performance difference is reduced.

file_system_request_vs_bio_mq-deadline_tests

Request vs bio with the none scheduler - better for request:

file_system_request_vs_bio_none_tests

Bio none vs bio mq-deadline - provides a huge performance gain:

file_system_bio_vs_bio_mq-deadline_tests

Request none vs request mq-deadline - provides no difference:

file_system_request_vs_request_mq-deadline_tests