Comment by ryao
Comment by ryao 5 days ago
Percona and many others who benchmarked this properly would disagree with you. Percona found that ext4 and ZFS performed similarly when given identical hardware (with proper tuning of ZFS):
https://www.percona.com/blog/mysql-zfs-performance-update/
In this older comparison where they did not initially tune ZFS properly for the database, they found XFS to perform better, only for ZFS to outperform it when tuning was done and a L2ARC was added:
https://www.percona.com/blog/about-zfs-performance/
This is roughly what others find when they take the time to do proper tuning and benchmarks. ZFS outscales both ext4 and XFS, since it is a multiple block device filesystem that supports tiered storage while ext4 and XFS are single block device filesystems (with the exception of supporting journals on external drives). They need other things to provide them with scaling to multiple block devices and there is no block device level substitute for supporting tiered storage at the filesystem level.
That said, ZFS has a killer feature that ext4 and XFS do not have, which is low cost replication. You can snapshot and send/recv without affecting system performance very much, so even in situations where ZFS is not at the top in every benchmark such as being on equal hardware, it still wins, since the performance penalty of database backups on ext4 and XFS is huge.
There is no way that a CoW filesystem with parity calculations or striping is gonna beat XFS on multiple disks, specially on high speed NVMe.
The article provides great insight into optimizing ZFS, but using an EBS volume as store (with pretty poor IOPS) and then giving the NVMe as metadata cache only for ZFS feels like cheating. At the very least, metadata for XFS could have been offloaded to the NVMe too. I bet if we store set XFS with metadata and log to a RAMFS it will beat ZFS :)