These benchmarks show some advantages of the NVFS filesystem.

Benchmarks on simulated persistent memory
=========================================

The benchmarks were done in a virtual machine on a system with two 6-core
Opterons. One Opteron and its memory were disabled, so that NUMA effects
would not influence the results. The emulated persistent memory device had
32GiB. This was an old machine, so for good performance you need to set
CONFIG_USE_NT_STORES (it avoids cache flushing and uses non-temporal
stores instead).

Copy a linux tree (3GB) to a newly created persistent memory filesystem.
The source is in the page cache when the test starts (i.e. run something
like "grep -r blabla /usr/src/git/linux-2.6" first).

$ time cp -a /usr/src/git/linux-2.6 /mnt/test
nvfs             4.5s
ext2 dax         7.2s
ext2 nodax       7.8s
ext4 dax         7s
ext4 nodax       8s
xfs dax          6.8s
xfs nodax        7s
nova            15s

Grep through the directory with a cold cache; "umount /mnt/test; mount
/mnt/test" is run first to flush the cache.

$ time grep -r blabla /mnt/test
nvfs            4.0s
ext2 dax        4.7s
ext2 nodax      7s
ext4 dax        4.4s
ext4 nodax      6.3s
xfs dax         4.7s
xfs nodax       7.1s
nova            8.8s

Directory benchmark - it performs one million operations on a directory:
http://people.redhat.com/~mpatocka/benchmarks/dir-test.c

$ time dir-test /mnt/test/ 65536 1048576
nvfs            11s
ext2 dax        17s
ext2 nodax      18s
ext4 dax        27s
ext4 nodax      26s
xfs dax         34s
xfs nodax       34s
nova            84s

Benchmarks on real persistent memory
====================================

The benchmarks were done on a system with four Skylake Xeon 8260L CPUs.
This machine has a reasonably fast implementation of the "clwb"
instruction, so CONFIG_USE_NT_STORES was not defined. The numactl tool
was used to bind the benchmark to the node to which the persistent
memory is connected.

Copy a linux tree (4.3GB) to a newly created persistent memory
filesystem. The source is in the page cache when the test starts (i.e.
run something like "grep -r blabla /usr/src/git/linux-2.6" first). The
number after the "+" sign is the additional time needed to flush the
cache.

$ time cp -a /usr/src/git/linux-2.6 /mnt/test
nvfs             4.4s
nova             4.9s
ext2 dax        12s
ext2 nodax       6.8s + 5s
ext4 dax         6.3s
ext4 nodax       4.9s + 7.5s
xfs dax          7s
xfs nodax        4.7s + 7.7s

The time to copy just the linux .git directory - 1632 MiB in 41 files
(i.e. throughput on large files):
nvfs            0.9s
nova            0.9s
ext2 dax        3.2s
ext2 nodax      1.3s + 2.5s
ext4 dax        1.4s
ext4 nodax      1.1s + 2.3s
xfs dax         1.4s
xfs nodax       0.8s + 2.3s

The time to copy everything except the .git directory - 2722 MiB in
125631 files (i.e. throughput on small files):
nvfs             3.4s
nova             3.9s
ext2 dax        14.3s
ext2 nodax      12.1s + 4.5s
ext4 dax         4.9s
ext4 nodax       3.8s + 4.9s
xfs dax          5.5s
xfs nodax        3.8s + 5.2s

Grep through the directory; "umount /mnt/test; mount /mnt/test" is run
first to flush the cache.

$ time grep -r blabla /mnt/test
nvfs            3.8s
nova            3.9s
ext2 dax        4.5s
ext2 nodax      5.5s
ext2 from cache 3.3s
ext4 dax        4.5s
ext4 nodax      5.5s
ext4 from cache 3.4s
xfs dax         4.4s
xfs nodax       5.4s
xfs from cache  3.3s

Note that reading from the page cache is faster than reading from
persistent memory.
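The dir-test benchmark appears again below, in three variants. For
orientation, its inner loop works approximately as follows - this is a
simplified sketch written for this document, not the actual source
(which is at the URL above); the argument semantics (number of file-name
slots, number of operations) are inferred from the invocations shown
here.

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	char path[4096];
	unsigned long n_files, n_ops, i;

	if (argc < 4) {
		fprintf(stderr, "usage: %s <dir> <files> <ops>\n", argv[0]);
		return 1;
	}
	n_files = strtoul(argv[2], NULL, 10);	/* e.g. 63000 */
	n_ops = strtoul(argv[3], NULL, 10);	/* e.g. 1048576 */

	for (i = 0; i < n_ops; i++) {
		/* Pick a random name slot; delete the file if it already
		   exists, otherwise create it.  The "link" and "dir"
		   variants would use link()/mkdir() instead of open(). */
		snprintf(path, sizeof path, "%s/%lu", argv[1],
			 (unsigned long)(rand() % n_files));
		if (unlink(path) != 0) {
			int fd = open(path, O_CREAT | O_WRONLY, 0644);
			if (fd >= 0)
				close(fd);
		}
	}
	return 0;
}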
The same dir-test benchmark
(http://people.redhat.com/~mpatocka/benchmarks/dir-test.c) was run on
real persistent memory:

dir-test /mnt/test/linux-2.6 63000 1048576
(create and delete empty files)
nvfs             6.6s
nova            10.9s
ext2 dax        12.4s
ext4 dax         8.4s
xfs dax         12.2s

dir-test /mnt/test/linux-2.6 63000 1048576 link
(create and delete hardlinks to one file)
nvfs             4.7s
nova            10.5s
ext2 dax         4.9s
ext4 dax         5.6s
xfs dax          7.8s

dir-test /mnt/test/linux-2.6 63000 1048576 dir
(create and delete directories)
nvfs             8.2s
nova            12.1s
ext2 dax       106.2s
ext4 dax        15.1s
xfs dax         11.8s

dd if=/dev/zero of=/mnt/test/test bs=4k oflag=dsync count=100000
This benchmark performs a large number of small synchronous writes; it
can simulate a database log workload.
nvfs            0.4s
nova            0.5s
ext2 dax        1.4s
ext2 nodax      1.4s
ext4 dax        7.2s
ext4 nodax      7.3s
xfs dax         2.2s
xfs nodax       1.8s

fio with 48 jobs, each modifying its own file:

[global]
bs=4k
iodepth=1
direct=1
ioengine=psync
group_reporting
runtime=60
time_based
filesize=128M
rw=randrw
[job1]
filename=/mnt/test/file1
...
[job48]
filename=/mnt/test/file48

                read      / write
nvfs            1245MiB/s / 1246MiB/s
nova             938MiB/s /  938MiB/s
ext2 dax        1269MiB/s / 1269MiB/s
ext2 nodax      1281MiB/s / 1281MiB/s
ext4 dax        1283MiB/s / 1284MiB/s
ext4 nodax      1284MiB/s / 1284MiB/s
xfs dax         1302MiB/s / 1302MiB/s
xfs nodax       1291MiB/s / 1292MiB/s

nvfs performance may be improved by using "__copy_from_user +
arch_wb_cache_pmem" instead of "__copy_from_user_inatomic_nocache";
however, that worsens the performance of the "cp -a
/usr/src/git/linux-2.6 /mnt/test" test (see the sketch of the two
variants at the end of this section).

1: __copy_from_user_inatomic_nocache
2: __copy_from_user + arch_wb_cache_pmem

block size          1               2
4k              1259MiB/s       1313MiB/s
8k              1274MiB/s       1265MiB/s
16k             1150MiB/s       1222MiB/s
32k             1139MiB/s       1217MiB/s
64k             1198MiB/s       1257MiB/s
128k            1200MiB/s       1262MiB/s

The same fio benchmark, but with all 48 threads accessing the same file:

                read      / write
nvfs            1302MiB/s / 1302MiB/s
nova             517MiB/s /  517MiB/s
ext2 dax         866MiB/s /  866MiB/s
ext2 nodax      1314MiB/s / 1315MiB/s
ext4 dax         792MiB/s /  793MiB/s
ext4 nodax      1022MiB/s / 1022MiB/s
xfs dax          860MiB/s /  861MiB/s
xfs nodax       1590MiB/s / 1590MiB/s

fs_mark (https://github.com/josefbacik/fs_mark):
fs_mark -d /mnt/test/fsmark -s 10240 -n 10000 -t 48

                Files/sec
nvfs            130918.6
nova            130047.4
ext2 dax          7840.1
ext2 nodax        7845.7
ext4 dax         35496.7
ext4 nodax       37892.8
xfs dax          43482.1
xfs nodax        39898.1

nova was hitting this error randomly during the test, suggesting that
there is some reliability problem:
Error in unlink of /mnt/nova/fsmark//5f4a5e89~~~~~~~~FDMQG98TFJZ1WWHYUV9ZI23A : No such file or directory
fopen failed to open: fs_log.txt.5415

fsck performance, checking 100 copies of the linux kernel tree on a 1TB
device (449G used space, 13M used inodes). For nvfs, the command was
"OMP_NUM_THREADS=48 numactl --cpunodebind=0 ./nvfsck -fn /dev/pmem0".

nvfs uncached   20.8s
nvfs cached      3.8s
ext2 uncached   50.7s
ext2 cached     38.5s
ext4 uncached   23.9s
ext4 cached     21.3s
xfs uncached    20.3s
xfs cached      20.3s

Note that nvfsck mmaps the device, which causes the kernel to copy data
from persistent memory into the page cache - this slows nvfsck down. I
hope this can be fixed so that mmap maps the persistent memory directly.
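As referenced in the fio section above, here is a sketch of the two
write-path variants that were compared. The helpers
(__copy_from_user_inatomic_nocache, __copy_from_user,
arch_wb_cache_pmem) are the real kernel primitives named above; the
wrapper functions and the simplified error handling are illustrative
only and are not the actual nvfs code.

#include <linux/uaccess.h>
#include <linux/libnvdimm.h>

/* Variant 1: non-temporal copy that bypasses the CPU cache entirely;
 * no explicit flush is needed afterwards. */
static unsigned long pmem_write_nt(void *pmem_dst,
				   const void __user *src, size_t len)
{
	return __copy_from_user_inatomic_nocache(pmem_dst, src, len);
}

/* Variant 2: ordinary cached copy, followed by an explicit cache
 * writeback (clwb) of the copied range to make it durable in
 * persistent memory. */
static unsigned long pmem_write_wb(void *pmem_dst,
				   const void __user *src, size_t len)
{
	unsigned long uncopied = __copy_from_user(pmem_dst, src, len);

	arch_wb_cache_pmem(pmem_dst, len - uncopied);
	return uncopied;	/* number of bytes not copied */
}

The non-temporal variant avoids polluting the CPU cache, which appears
to help streaming workloads such as "cp -a"; the cached-copy-plus-
writeback variant leaves the data in the cache, which can help when it
is read back shortly after being written.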