Comment by JZerf 4 days ago

One reason it might be a good idea to use higher quality drives with ZFS is that in some scenarios ZFS seems to result in more writes being done to the drive than other file systems do. This can be a problem for some QLC and TLC drives that have low endurance.

I'm in the process of setting up a server at home and was testing a few different file systems. I was doing a test where I had a program continuously and synchronously writing just a single byte every second (like might happen for some programs that write logs fairly continuously). For most of my tests I was just using the default settings for each file system. With ext4 this resulted in 28 KB/s of actual writes being done to the drive, which seems reasonable given 4 KB blocks needing to be written, journaling, metadata, etc. BTRFS generated 68 KB/s of actual writes, which still isn't too bad. With ZFS, the best I could get after trying various volblocksize, ashift, logbias, atime, and compression settings was still 312 KB/s of actual writes being done to the drive, which I was not pleased with. At the rate ZFS was writing data, over a 10 year span that same program running continuously would result in about 100 TB of writes being done to the drive, which is about a quarter of what my SSD is rated for.
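For reference, here's a minimal sketch of the kind of test I was running (the pool path and file name are just placeholders, and my actual test wasn't necessarily written in Python):

```python
import os
import time

# Open with O_SYNC so each write must reach stable storage before the
# call returns, like a logger that flushes every record it writes.
fd = os.open("/pool/test/log.bin", os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o644)
try:
    while True:
        os.write(fd, b"x")  # write a single byte...
        time.sleep(1)       # ...once per second
finally:
    os.close(fd)
```

The 10 year figure is just that measured rate extended over a decade: 312,000 bytes/s × 60 × 60 × 24 × 365 × 10 ≈ 98 TB, i.e. roughly 100 TB.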

craftkiller 4 days ago

One knob you could change that should radically alter that is zfs_txg_timeout which is how many seconds ZFS will accumulate writes before flushing them out to disk. The default is 5 seconds, but I usually increase mine to 20. When writing a lot of data, it'll get flushed to disk more often, so this timer is only for when you're writing small amounts of data like the test you just described.
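If you want to poke at it on Linux, it's exposed as an OpenZFS module parameter; here's a sketch, assuming the usual OpenZFS-on-Linux sysfs path (on FreeBSD the same knob is the vfs.zfs.txg.timeout sysctl):

```python
# Sketch: raise zfs_txg_timeout from the default 5 seconds to 20.
# Assumes the standard OpenZFS module parameter path; needs root.
with open("/sys/module/zfs/parameters/zfs_txg_timeout", "w") as f:
    f.write("20\n")
```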

> like might happen for some programs that are writing logs fairly continuously

On Linux, I think journald would be aggregating your logs from multiple services so at least you wouldn't be incurring that cost on a per-program basis. On FreeBSD with syslog we're doomed to separate log files.

> over a 10 year span that same program running continuously would result in about 100 TB of writes being done to the drive which is about a quarter of what my SSD is rated for

I sure hope I've upgraded SSDs by the year 2065.

  • JZerf 3 days ago

    > One knob you could change that should radically alter that is zfs_txg_timeout which is how many seconds ZFS will accumulate writes before flushing them out to disk.

    I don't believe that zfs_txg_timeout setting would make much of a difference for the test I described, since the writes were synchronous: each write still has to be committed to the ZIL right away, regardless of how long ZFS waits before flushing the transaction group.

    > On Linux, I think journald would be aggregating your logs from multiple services so at least you wouldn't be incurring that cost on a per-program basis.

    The server I'm setting up will be hosting several VMs running a mix of OSes and distros and running many types of services and apps. Some of the logging could be aggregated, but there will be multiple types of I/O (various types of databases, app updates, file serving, etc.) and I wanted to get an idea of how much file system overhead there might be in a worst case kind of scenario.

    > I sure hope I've upgraded SSDs by the year 2065.

    Since I'll be running a lot of stuff on the server, I'll probably have quite a bit more writing going on than in the test I described, so if I used ZFS I believe the SSD could reach its rated endurance in just several years.
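    To put rough numbers on it (the workload multiplier here is purely hypothetical, not a measurement): the test alone was writing close to 10 TB/year under ZFS, and the drive rating implied by the numbers above is around 400 TB.

    ```python
    # Hypothetical back-of-the-envelope: years until the SSD's rated endurance
    # is reached, assuming the real workload writes some multiple of the test rate.
    test_rate_bytes_per_s = 312_000      # measured ZFS write rate from the test
    rated_endurance_bytes = 400e12       # ~400 TBW, implied by "about a quarter"
    workload_multiplier = 5              # assumed, purely illustrative

    bytes_per_year = test_rate_bytes_per_s * 60 * 60 * 24 * 365
    years = rated_endurance_bytes / (bytes_per_year * workload_multiplier)
    print(f"{years:.1f} years")          # roughly 8 years at 5x the test rate
    ```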

  • dizhn 4 days ago

    >I sure hope I've upgraded SSDs by the year 2065.

    My mind jumped to that too when I first read the parent's comment. But presumably he's writing other files to disk too, not just that one file. :)

    • JZerf 3 days ago

      > But presumably he's writing other files to disk too. Not just that one file.

      Yes, there will be much more going on than the simple test I was doing. The server will be hosting several VMs running a mix of OSes and distros and running many types of services and apps.