small files - linux kernel tree - 13381456448 bytes / 105601 files
big files   - 10x linux .git pack - 13062560965 bytes

test1  - novafs
test2  - ext4 on pmem with dax
test3  - ext4 on pmem
test4  - ext4 on pmem with btt
test5  - ext4 on ramdisk
test6  - xfs on pmem with dax
test7  - xfs on pmem
test8  - xfs on pmem with btt
test9  - xfs on ramdisk
test10 - tmpfs
asterisk* - the kernel is hacked to use clwb instead of movnti

write throughput:
(cp -a, the source tree is in cache)
            small files             big files
test1       621 MB/s                662 MB/s
test1*      1013 MB/s               1323 MB/s
test2       579 MB/s                625 MB/s
test2*      657 MB/s                767 MB/s
test3       1441 MB/s (762 MB/s)    1681 MB/s (651 MB/s)
test3*      1431 MB/s (664 MB/s)    1844 MB/s (764 MB/s)
test4       1455 MB/s (251 MB/s)    1672 MB/s (262 MB/s)
test4*      1457 MB/s (339 MB/s)    1843 MB/s (364 MB/s)
test5       1330 MB/s (865 MB/s)    1677 MB/s (659 MB/s)
test5*      1457 MB/s (778 MB/s)    1858 MB/s (676 MB/s)
test6       473 MB/s                638 MB/s
test6*      683 MB/s                739 MB/s
test7       1349 MB/s (592 MB/s)    2534 MB/s (802 MB/s)
test7*      1569 MB/s (688 MB/s)    1477 MB/s (702 MB/s)
test8       1584 MB/s (278 MB/s)    2546 MB/s (282 MB/s)
test8*      1513 MB/s (517 MB/s)    2173 MB/s (603 MB/s)
test9       1576 MB/s (917 MB/s)    2498 MB/s (974 MB/s)
test9*      1529 MB/s (913 MB/s)    1883 MB/s (897 MB/s)
test10      1920 MB/s               1866 MB/s
test10*     2102 MB/s               2613 MB/s
(the number in parentheses is the throughput including the 'sync' command)

read throughput:
(grep -Fr blabla /mnt/test, the searched tree is not in cache)
            small files     big files
test1       551 MB/s        1634 MB/s
test1*      604 MB/s        1771 MB/s
test2       585 MB/s        1630 MB/s
test2*      548 MB/s        1395 MB/s
test3       520 MB/s        1292 MB/s
test3*      499 MB/s        1196 MB/s
test4       498 MB/s        1220 MB/s
test4*      495 MB/s        1194 MB/s
test5       552 MB/s        1511 MB/s
test5*      564 MB/s        1515 MB/s
test6       584 MB/s        1626 MB/s
test6*      547 MB/s        1395 MB/s
test7       525 MB/s        1267 MB/s
test7*      493 MB/s        1190 MB/s
test8       503 MB/s        1214 MB/s
test8*      517 MB/s        1293 MB/s
test9       557 MB/s        1459 MB/s
test9*      551 MB/s        1484 MB/s
test10      670 MB/s        2403 MB/s
test10*     669 MB/s        2410 MB/s

read from cache:
            small files     big files
read        675 MB/s        2445 MB/s
read*       672 MB/s        2428 MB/s

linear writes:
(dd if=/dev/zero of=/mnt/test/file bs=64k count=10000)
            unsynced        oflag=dsync
test1       745 MB/s        745 MB/s
test1*      1.4 GB/s        1.4 GB/s
test2       661 MB/s        315 MB/s
test2*      691 MB/s        350 MB/s
test3       2.0 MB/s        239 MB/s
test3*      2.0 GB/s        351 MB/s
test4       2.0 GB/s        164 MB/s
test4*      2.0 GB/s        250 MB/s
test5       1.9 GB/s        391 MB/s
test5*      2.0 GB/s        496 MB/s
test6       650 MB/s        284 MB/s
test6*      662 MB/s        285 MB/s
test7       2.8 GB/s        366 MB/s
test7*      2.8 GB/s        385 MB/s
test8       2.9 GB/s        203 MB/s
test8*      2.8 GB/s        313 MB/s
test9       2.8 GB/s        754 MB/s
test9*      2.8 GB/s        824 MB/s
test10      2.9 GB/s        2.9 GB/s
test10*     2.9 GB/s        2.9 GB/s

compile time benchmark:
- tests how well the compiler performs when the executable "cc1" runs
  directly from persistent memory
(make -j1 in the lvm source tree)
without dax:        33.6s
with dax on pmem:   35.0s (4% worse)
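
For anyone who wants to re-derive the figures, here is a minimal sketch of the arithmetic behind them. The function names are hypothetical, and the sketch assumes 1 MB = 10^6 bytes, which the report does not state. It shows how the "4% worse" compile-time figure follows from the two timings, how a throughput figure maps back to elapsed time, and how a sync-inclusive throughput (the numbers in parentheses) could be computed from separate copy and sync timings.

```python
# Assumption: 1 MB = 10**6 bytes. The report does not say which unit was used.
BYTES_PER_MB = 1_000_000


def slowdown_percent(baseline_s: float, measured_s: float) -> float:
    """Relative slowdown of measured_s against baseline_s, in percent."""
    return (measured_s - baseline_s) / baseline_s * 100.0


def elapsed_seconds(total_bytes: int, mb_per_s: float) -> float:
    """Elapsed time implied by a data size and a throughput figure."""
    return total_bytes / (mb_per_s * BYTES_PER_MB)


def throughput_with_sync(total_bytes: int, copy_s: float, sync_s: float) -> float:
    """Throughput in MB/s when the time of the trailing sync is included."""
    return total_bytes / (copy_s + sync_s) / BYTES_PER_MB


if __name__ == "__main__":
    # Compile time: 33.6s without dax, 35.0s with dax on pmem.
    print(f"{slowdown_percent(33.6, 35.0):.1f}% slower with dax")  # 4.2% slower with dax

    # Time implied by the test1 small-files write result (621 MB/s).
    print(f"{elapsed_seconds(13381456448, 621):.1f} s")
```

The sync-inclusive figure divides by the sum of both times, so a fast copy into the page cache followed by a slow sync yields a much lower number than the copy alone, which is the pattern visible in the non-dax rows.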