small files - linux kernel tree - 13381456448 bytes / 105601 files
big files   - 10x linux .git pack - 13062560965 bytes

test1 - novafs
test2 - novafs - with clwb instead of movnti
test3 - ext4 on pmem with dax
test4 - ext4 on pmem
test5 - ext4 on pmem with btt
test6 - tmpfs
test7 - ext4 on ramdisk

write throughput: (cp -a)
        small files                 big files
test1   605 MB/s                    722 MB/s
test2   1006-1266 MB/s              1327-1724 MB/s
test3   568 MB/s                    654 MB/s
test4   1356 MB/s (466 MB/s)        1647 MB/s (514 MB/s)
test5   1471 MB/s (375 MB/s)        1851 MB/s (406 MB/s)
test6   1899 MB/s                   2236 MB/s
test7   1277 MB/s (806 MB/s)        1446 MB/s (798 MB/s)
(the number in parentheses is the throughput including the 'sync' command)

read throughput: (grep -Fr blabla /mnt/test)
        small files     big files
test1   566 MB/s        1486 MB/s
test2   563 MB/s        1471 MB/s
test3   548 MB/s        1393 MB/s
test4   512 MB/s        1293 MB/s
test5   516 MB/s        1275 MB/s
test6   664 MB/s        2386 MB/s
test7   561 MB/s        1495 MB/s

read from cache:
        small files     big files
        669 MB/s        2423 MB/s

linear writes: (dd if=/dev/zero of=/mnt/test/file bs=64k count=10000)
        async           oflag=dsync
test1   744 MB/s        737 MB/s
test2   1.4 GB/s        1.4 GB/s
test3   456 MB/s        288 MB/s
test4   2.0 GB/s        260 MB/s
test5   2.0 GB/s        151 MB/s
test6   2.9 GB/s        2.9 GB/s
test7   2.0 GB/s        501 MB/s

compile time benchmark:
- a test of how well the compiler performs when the executable "cc1" runs
  directly from persistent memory (make -j1 in the lvm source tree)
without dax:            33.6s
with dax on pmem:       35.0s (4% worse)

dm-writecache throughput: (dd if=/dev/zero of=/dev/mapper/wc bs=64k oflag=direct)
writecache block size   512             1024            2048            4096
movnti                  496 MB/s        642 MB/s        725 MB/s        744 MB/s
clflushopt              373 MB/s        688 MB/s        1.1 GB/s        1.2 GB/s
with patch              497 MB/s        688 MB/s        1.1 GB/s        1.2 GB/s
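
The arithmetic behind figures such as "4% worse" and the MB/s columns can be sketched as below. This is only an illustration, not the script used to produce the numbers: the helper names are hypothetical, and it assumes 1 MB = 10**6 bytes, which the report does not state.

```python
# Sketch only: helper names are hypothetical and the MB definition is an
# assumption (1 MB = 10**6 bytes); the report does not say which unit it uses.

MB = 10**6

# Data set size taken from the report (linux kernel tree, small files).
SMALL_FILES_BYTES = 13381456448


def throughput_mb_s(total_bytes, seconds):
    """Throughput in MB/s for total_bytes transferred in the given time."""
    return total_bytes / seconds / MB


def implied_seconds(total_bytes, mb_per_s):
    """Elapsed time implied by a reported throughput figure."""
    return total_bytes / (mb_per_s * MB)


def percent_slower(baseline_s, measured_s):
    """Relative slowdown of measured_s versus baseline_s, in percent."""
    return (measured_s - baseline_s) / baseline_s * 100.0


if __name__ == "__main__":
    # Compile time benchmark: 33.6s without dax, 35.0s with dax on pmem.
    print("dax slowdown: %.1f%%" % percent_slower(33.6, 35.0))

    # test4 (ext4 on pmem), small files: 1356 MB/s for cp -a alone,
    # 466 MB/s when the 'sync' command is included in the timing.
    cp_only = implied_seconds(SMALL_FILES_BYTES, 1356)
    cp_sync = implied_seconds(SMALL_FILES_BYTES, 466)
    print("cp -a alone:   %.1f s" % cp_only)
    print("cp -a + sync:  %.1f s" % cp_sync)
```

Under that unit assumption, the difference between the two implied times is the time spent in 'sync', which is why the figures in parentheses are so much lower for the configurations that go through the page cache.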