Comment by lousken 5 days ago

As far as stability goes, btrfs is used by Meta, Synology and many others, so I wouldn't say it's not stable, but some features are lacking

azalemeth 5 days ago

My understanding is that single-disk btrfs is good, but RAID is decidedly dodgy; https://btrfs.readthedocs.io/en/latest/btrfs-man5.html#raid5... states that:

> The RAID56 feature provides striping and parity over several devices, same as the traditional RAID5/6.

> There are some implementation and design deficiencies that make it unreliable for some corner cases and *the feature should not be used in production, only for evaluation or testing*.

> The power failure safety for metadata with RAID56 is not 100%.

I have personally been bitten once (about 10 years ago) by btrfs just failing horribly on a single desktop drive. I've used either mdadm + ext4 (for /) or ZFS (for large /data mounts) ever since. ZFS is fantastic and I genuinely don't understand why it's not used more widely.

  • crest 5 days ago

    One problem with your setup is that ZFS by design can't use a traditional *nix filesystem buffer cache. Instead it has to use its own ARC (adaptive replacement cache) with end-to-end checksumming, transparent compression, and copy-on-write semantics. This can lead to annoying performance problems when the two types of filesystem cache contend for available memory. There is a back pressure mechanism, but it effectively pauses other writes while evicting dirty cache entries to release memory.
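
    (If you want to watch that contention, here is a rough sketch that reads the ARC counters OpenZFS on Linux exposes in /proc/spl/kstat/zfs/arcstats next to the page-cache figures in /proc/meminfo; the paths and field names assume OpenZFS on Linux.)

      # Sketch: compare the ZFS ARC against the kernel page cache.
      # Assumes OpenZFS on Linux, which exposes /proc/spl/kstat/zfs/arcstats.

      def read_arcstats(path="/proc/spl/kstat/zfs/arcstats"):
          stats = {}
          with open(path) as f:
              for line in f.readlines()[2:]:        # skip the two header lines
                  name, _kind, value = line.split()
                  stats[name] = int(value)
          return stats

      def read_meminfo(path="/proc/meminfo"):
          info = {}
          with open(path) as f:
              for line in f:
                  key, rest = line.split(":", 1)
                  info[key] = int(rest.split()[0]) * 1024   # values are in kB
          return info

      arc, mem, gib = read_arcstats(), read_meminfo(), 1024 ** 3
      print(f"ARC size:   {arc['size'] / gib:.2f} GiB "
            f"(target {arc['c'] / gib:.2f}, max {arc['c_max'] / gib:.2f})")
      print(f"Page cache: {mem['Cached'] / gib:.2f} GiB")
      print(f"Available:  {mem['MemAvailable'] / gib:.2f} GiB")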

    • ryao 5 days ago

      Traditionally, you have the page cache on top of the FS and the buffer cache below the FS, with the two being unified such that double caching is avoided in traditional UNIX filesystems.

      ZFS goes out of its way to avoid the buffer cache, although Linux does not give it the option to fully opt out of it since the block layer will buffer reads done by userland to disks underneath ZFS. That is why ZFS began to purge the buffer cache on every flush 11 years ago:

      https://github.com/openzfs/zfs/commit/cecb7487fc8eea3508c3b6...

      That is how it still works today:

      https://github.com/openzfs/zfs/blob/fe44c5ae27993a8ff53f4cef...

      If I recall correctly, the page cache is also still above ZFS when mmap() is used. There was talk about fixing it by having mmap() work out of ARC instead, but I don’t believe it was ever done, so there is technically double caching done there.

      • taskforcegemini 5 days ago

        What's the best way to deal with this, then? Disable Linux's file cache? I've tried disabling/minimizing the ARC in the past to avoid the OOM reaper, but the ARC was stubborn and its RAM usage remained as is.
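
        (Not an answer from the thread, just a sketch of the knob people usually reach for: OpenZFS on Linux lets you cap the ARC with the zfs_arc_max module parameter, either at runtime through sysfs or persistently in /etc/modprobe.d. The 4 GiB figure below is made up.)

          # Sketch: cap the ZFS ARC at 4 GiB on a running system (run as root).
          # Persistent equivalent: "options zfs zfs_arc_max=4294967296" in
          # /etc/modprobe.d/zfs.conf. Note the ARC tends to shrink lazily, so
          # the drop in RAM usage may not be immediate.
          ARC_MAX_BYTES = 4 * 1024 ** 3   # hypothetical 4 GiB cap

          with open("/sys/module/zfs/parameters/zfs_arc_max", "w") as f:
              f.write(str(ARC_MAX_BYTES))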

  • lousken 5 days ago

    I was assuming OP wants to highlight filesystem use on a workstation/desktop, not for a file server/NAS. I had a similar experience a decade ago, but these days single drives just work, same with mirroring. For such setups btrfs should be stable. I've never seen a workstation with a RAID5/6 setup. Secondly, filesystems and volume managers are different things, even if e.g. btrfs and ZFS are essentially both.

    For a NAS setup I would still prefer ZFS with TrueNAS SCALE (or Proxmox if virtualization is needed), just because all these scenarios are supported as well. And as far as ZFS goes, encryption is still something I am not sure about, especially since I want to use snapshots and send those as backups to a remote machine.
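
    (For the snapshot-to-remote part: OpenZFS raw sends, zfs send -w, ship the blocks still encrypted, so the backup box never needs the key. A rough sketch with made-up pool, dataset, and host names:)

      # Sketch: replicate a snapshot of an encrypted dataset to a remote host.
      # "zfs send -w" (raw send) keeps the data encrypted in transit and at the
      # destination. Pool, dataset, and host names below are made up.
      import subprocess

      snapshot = "tank/data@2024-01-01"
      remote = "backup-host"

      send = subprocess.Popen(["zfs", "send", "-w", snapshot],
                              stdout=subprocess.PIPE)
      recv = subprocess.run(["ssh", remote, "zfs", "receive", "backup/data"],
                            stdin=send.stdout)
      send.stdout.close()
      if send.wait() != 0 or recv.returncode != 0:
          raise SystemExit("replication failed")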

  • hooli_gan 5 days ago

    RAID5/6 is not needed with btrfs. One should use RAID1, which keeps redundant copies of the same data across multiple drives.
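
    (For illustration, a sketch of setting that up; the device and mount paths are hypothetical, the commands are stock btrfs-progs:)

      # Sketch: create a two-device btrfs filesystem with the raid1 profile for
      # both data (-d) and metadata (-m). Device names are hypothetical; run as root.
      import subprocess

      subprocess.run(["mkfs.btrfs", "-d", "raid1", "-m", "raid1",
                      "/dev/sdb", "/dev/sdc"], check=True)

      # Converting an existing, mounted filesystem after adding a second device:
      # subprocess.run(["btrfs", "device", "add", "/dev/sdc", "/mnt"], check=True)
      # subprocess.run(["btrfs", "balance", "start",
      #                 "-dconvert=raid1", "-mconvert=raid1", "/mnt"], check=True)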

    • johnmaguire 5 days ago

      How can you achieve 2-disk fault tolerance using btrfs and RAID 1?

      • Dalewyn 5 days ago

        By using three drives.

        RAID1 is just making literal copies, so each additional drive in a RAID1 is a self-sufficient copy. You want two drives of fault tolerance? Use three drives, so if you lose two copies you still have one left.

        This is of course hideously inefficient as you scale larger, but that is not the question posed.
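
        (The arithmetic behind "hideously inefficient", for a plain n-way mirror; this is generic mirroring math, not anything btrfs-specific:)

          # n-way mirror: every drive holds a full copy, so n-1 drives can fail,
          # but usable space is always one drive's worth.
          def mirror_stats(n_drives, drive_tb):
              usable_tb = drive_tb                      # one copy's worth
              tolerated_failures = n_drives - 1
              efficiency = usable_tb / (n_drives * drive_tb)
              return usable_tb, tolerated_failures, efficiency

          for n in (2, 3, 4):
              usable, failures, eff = mirror_stats(n, drive_tb=10)
              print(f"{n} x 10 TB: {usable} TB usable, survives {failures} "
                    f"drive failure(s), {eff:.0%} space efficiency")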

  • brian_cunnie 5 days ago

    > I have personally been bitten once (about 10 years ago) by btrfs just failing horribly on a single desktop drive.

    Me, too. The drive was unrecoverable. I had to reinstall from scratch.

jeltz 4 days ago

It is possible to corrupt the filesystem from user space as a normal user with btrfs. The PostgreSQL devs found that while working on async IO. And as far as I know that issue has not been fixed.

https://www.postgresql.org/message-id/CA%2BhUKGL-sZrfwcdme8j...

_joel 5 days ago

I'm similar to some other people here; I guess once you've been bitten by data loss due to btrfs, it's difficult to advocate for it.

  • lousken 5 days ago

    I am assuming almost everybody has at some point experienced data loss because they pulled out a flash drive too early. Is it safe to assume that we stopped using flash drives because of it?

    • _joel 5 days ago

      I'm not sure we have stopped using flash, judging by the pile of USB sticks on my desk :) In relation to the fs analogy: if you used a flash drive that you knew corrupted your data, you'd throw it away for one you know works.

      • ryao 5 days ago

        I once purchased a bunch of flash drives from Google’s online swag store, and just unplugging them was often enough to put them in a state where they claimed to be 8MB devices and nothing I wrote to them could ever be read back in my limited tests. I stopped using those fast.

fourfour3 5 days ago

Do Synology actually use the multi-device options of btrfs, or are they using Linux softraid + LVM underneath?

I know Synology Hybrid RAID is a clever use of LVM + MD raid, for example.

  • phs2501 4 days ago

    I believe Synology runs btrfs on top of regular mdraid + lvm, possibly with patches to let btrfs checksum failures reach into the underlying layers to find the right data to recover.

    Related blog post: https://daltondur.st/syno_btrfs_1/
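
    (If you have shell access to the box, the layering is easy to check; a minimal sketch that just prints the block-device stack so you can see whether btrfs sits on md/LVM or directly on the disks:)

      # Sketch: show the block-device stack (disks -> md raid -> LVM -> filesystem).
      import subprocess

      subprocess.run(["lsblk", "-o", "NAME,TYPE,FSTYPE,SIZE,MOUNTPOINT"], check=True)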