scrp 5 days ago

After years in the making, ZFS raidz expansion is finally here.

Major features added in this release:

  - RAIDZ Expansion: Add new devices to an existing RAIDZ pool, increasing storage capacity without downtime.

  - Fast Dedup: A major performance upgrade to the original OpenZFS deduplication functionality.

  - Direct IO: Allows bypassing the ARC for reads/writes, improving performance in scenarios like NVMe devices where caching may hinder efficiency.

  - JSON: Optional JSON output for the most used commands.

  - Long names: Support for file and directory names up to 1023 characters.
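
A quick way to kick the tires on a few of these (a sketch, not gospel: the property and flag names below are taken from the 2.3 release notes, and the pool/dataset names are made up, so check your man pages):

  # JSON output for the most-used commands
  zpool status -j | jq .
  zfs list -j

  # Direct IO is controlled per dataset via the new 'direct' property
  zfs set direct=always tank/nvme-data    # standard | always | disabled

  # long names have to be enabled per dataset before 1023-byte names are accepted
  zfs set longname=on tank/docs
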
eatbitseveryday 5 days ago

> RAIDZ Expansion: Add new devices to an existing RAIDZ pool, increasing storage capacity without downtime.

More specifically:

> A new device (disk) can be attached to an existing RAIDZ vdev
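
In command terms that looks something like this (sketch; pool, vdev and device names are made up):

  # grow an existing 4-disk raidz1 vdev to 5 disks, online
  zpool attach tank raidz1-0 /dev/sdf

  # expansion progress shows up in the pool status
  zpool status tank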

cromka 4 days ago

So if I'm running Proxmox on ZFS and NVMe drives, will I be better off enabling Direct IO when 2.3 gets rolled out? What are the use cases for it?

  • 0x457 2 days ago

    Direct IO is useful for databases and other applications that use their own disk caching layer. Without knowing what you run in Proxmox, no one can tell you whether it's beneficial.
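
    If you do end up trying it, Direct IO is opt-in per dataset (sketch; property values as documented for 2.3, dataset names hypothetical):

      # honor O_DIRECT requests from applications (the default)
      zfs set direct=standard tank/vm-disks

      # force all I/O on this dataset to bypass the ARC
      zfs set direct=always tank/pgdata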

  • Saris 3 days ago

    I would guess for very high-performance NVMe drives.

jdboyd 5 days ago

The first 4 seem like really big deals.

  • snvzz 5 days ago

    The fifth is as well, once you consider non-ASCII names.

    • GeorgeTirebiter 4 days ago

      Could someone show a legit reason to use 1000-character filenames? Seems to me, when filenames are long like that, they are actually capturing several KEYS that can be easily searched via ls & regexes, e.g.

      2025-Jan-14-1258.93743_Experiment-2345_Gas-Flow-375.3_etc_etc.dat

      But to me this stuff should be in metadata. It's just that we don't have great tools for grepping the metadata.

      Heck, the original Macintosh File System (MFS) had no true hierarchical subdirectories; they were faked by embedding folder-like names into the filenames themselves, on top of a flat filesystem.

      This was done by using colons (:) as separators in filenames. A file named Folder:Subfolder:File would appear to belong to a subfolder within a folder. This was entirely a user interface convention managed by the Finder. Internally, MFS stored all files in a flat namespace, with no actual directory hierarchy in the filesystem structure.

      So, there is 'utility' in "overloading the filename space". But...
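
      For what it's worth, extended attributes already give that kind of key/value metadata a home; the tooling is just clunky (sketch using the standard attr utilities; file and attribute names are made up):

        # stash the keys as xattrs instead of encoding them in the name
        setfattr -n user.experiment -v 2345 run-0093.dat
        setfattr -n user.gas_flow -v 375.3 run-0093.dat

        # "grep the metadata": find files whose experiment attribute is 2345
        getfattr -R -n user.experiment . 2>/dev/null | grep -B1 '"2345"'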

      • p_l 4 days ago

        > Could someone show a legit reason to use 1000-character filenames?

        1023-byte names can mean fewer than 250 characters due to Unicode and UTF-8. Add to that Unicode normalization, which might "expand" some characters into two or more combining characters, plus deliberate use of combining characters, emoji, and rare characters, and you might end up with many "characters" taking more than 4 bytes. A single "country flag" character will usually be 8 bytes, most emoji will be at least 4 bytes, skin tone modifiers add another 4 bytes, etc.

        this ' ' takes 27 bytes in my terminal, '󠁧󠁢󠁳󠁣󠁴󠁿' takes 28, another combo I found is 35 bytes.

        And that's on top of just getting a long title in, say, CJK or another less common script - an early manuscript of a somewhat successful Japanese novel has a non-normalized filename of 119 bytes, and that's nowhere close to actually long titles, something someone might reasonably have on disk. A random find on the internet easily points to a book title that takes over 300 bytes in non-normalized UTF-8.

        P.S. The proper title of "Robinson Crusoe", if used as a filename, takes at least 395 bytes...
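
        Easy to check the byte counts yourself (sketch; assumes a UTF-8 locale and bash's printf):

          printf '%s' "filename" | wc -c          # 8 bytes, 8 characters
          printf '%s' "吾輩は猫である" | wc -c    # 21 bytes for 7 characters
          printf '\U0001F1EF\U0001F1F5' | wc -c   # one "flag", two code points: 8 bytes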

        • p_l 4 days ago

          hah. Apparently HN eradicated the carefully pasted complex unicode emojis.

          The first was "man+woman kissing" with a skin tone modifier, then there were a few flags.

cm2187 5 days ago

But I presume it is still not possible to remove a vdev.

  • ryao 5 days ago

    That was added a while ago:

    https://openzfs.github.io/openzfs-docs/man/master/8/zpool-re...

    It works by making a read-only copy of the vdev being removed inside the remaining space. The old vdev is then removed. Data can still be accessed from the copy, but new writes go to the remaining vdevs, and the space used by the copy is gradually reclaimed as the old data is no longer needed.
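
    Roughly (sketch; pool and vdev names are made up):

      # evacuate and remove a top-level mirror vdev
      zpool remove tank mirror-1

      # evacuation progress, and the resulting 'indirect' vdev, show up here
      zpool status tank

      # an in-progress removal can be cancelled
      zpool remove -s tank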

    • lutorm 5 days ago

      Although "Top-level vdevs can only be removed if the primary pool storage does not contain a top-level raidz vdev, all top-level vdevs have the same sector size, and the keys for all encrypted datasets are loaded."

      • ryao 5 days ago

        I forgot we still did not have that last bit implemented. However, it is less important now that we have expansion.

        • justinclift 4 days ago

          > However, it is less important now that we have expansion.

          Not really sure if that's true. They seem like two different/distinct use cases, though there's probably some small overlap.

      • cm2187 5 days ago

        And in my case all the vdevs are raidz

  • mustache_kimono 5 days ago

    Is this possible elsewhere (re: other filesystems)?

    • cm2187 5 days ago

      It is possible with windows storage space (remove drive from a pool) and mdadm/lvm (remove disk from a RAID array, remove volume from lvm), which to me are the two major alternatives. Don't know about unraid.
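
      For reference, the rough equivalents there look like this (sketch; device and VG names are made up, and an mdadm reshape to fewer members has prerequisites covered in mdadm(8)):

        # LVM: migrate extents off a PV, then drop it from the VG
        pvmove /dev/sdd1
        vgreduce vg0 /dev/sdd1

        # mdadm: reshape to fewer members (shrink the array size first for RAID5/6)
        mdadm --grow /dev/md0 --raid-devices=3 --backup-file=/root/md0.backup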

      • lloeki 5 days ago

        IIUC the ask (I have a hard time wrapping my head around zfs vernacular), btrfs allows this at least in some cases.

        If you can convince btrfs balance not to use the device you want to remove, it will simply rebalance data to the other devices, and then you can btrfs device remove it.
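
        i.e. something like (sketch; device and mount point are made up):

          # data on the device is relocated to the remaining devices,
          # then the device is released from the filesystem
          btrfs device remove /dev/sdd /mnt/pool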

      • mustache_kimono 5 days ago

        > It is possible with windows storage space (remove drive from a pool) and mdadm/lvm (remove disk from a RAID array, remove volume from lvm), which to me are the two major alternatives. Don't know about unraid.

        Perhaps I am misunderstanding you, but you can offline and remove drives from a ZFS pool.

        Do you mean WSS and mdadm/lvm will allow an automatic live rebalance and then reconfigure the drive topology?

    • c45y 5 days ago

      Bcachefs allows it

      • eptcyka 5 days ago

        Cool, just have to wait until it is stable enough for daily use with mission-critical data. I am personally optimistic about bcachefs, but incredibly pessimistic about changing filesystems.

        • ryao 5 days ago

          It seems easier to copy data to a new ZFS pool if you need to remove RAID-Z top level vdevs. Another possibility is to just wait for someone to implement it in ZFS. ZFS already has top level vdev removal for other types of vdevs. Support for top level raid-z vdev removal just needs to be implemented on top of that.
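
          The copy route is at least well-trodden (sketch; pool and snapshot names are made up):

            zfs snapshot -r tank@migrate
            zfs send -R tank@migrate | zfs recv -F newtank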

    • pantalaimon 4 days ago

      btrfs has supported online addition and removal of devices in the pool from the start.

    • unixhero 5 days ago

      Btrfs

      • tw04 5 days ago

        Except you shouldn't use btrfs for any parity-based RAID if you value your data at all. In fact, I'm not aware of any vendor that has implemented btrfs with parity-based RAID; they all resort to btrfs on md.

BodyCulture 4 days ago

How well tested is this in combination with encryption?

Is the ZFS team handling encryption as a first-class priority at all?

ZFS on Linux inherited a lot of fame from ZFS on Solaris, but everyone using it in production should study the issue tracker very well for a realistic impression of the situation.

  • p_l 4 days ago

    The main issue with encryption is the occasional attempts by a certain (specific) Linux kernel developer to lock ZFS out of access to advanced instruction set extensions (far from the only weird idea of that specific developer).

    The way ZFS encryption is layered, the features should be pretty much orthogonal to each other, but I'll admit that ZFS native encryption is a bit lacking (though in my experience mainly in upper-layer tooling rather than the actual on-disk encryption parts).

    • ryao 4 days ago

      The kernel interfaces in question are actually wrappers around CPU instructions, so what ZFS does is implement its own equivalents. This does not affect encryption (beyond the inconvenience that we did not have SIMD acceleration for a while on certain architectures).

    • snvzz 4 days ago

      >occasional attempts by certain (specific) Linux kernel developer

      Can we please refer to them by the actual name?

  • ryao 4 days ago

    The new features should interact fine with encryption. They are implemented at different parts of ZFS' internal stack.

    There have been many man-hours put into investigating bug reports involving encryption, and some fixes were made. Unfortunately, something appears to be going wrong when non-raw sends of encrypted datasets are received by another system:

    https://github.com/openzfs/zfs/issues/12014

    I do not believe anyone has figured out what is going wrong there. It has not been for lack of trying. Raw sends from encrypted datasets appear to be fine.
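
    For anyone unsure of the distinction (sketch; names made up):

      # raw send: ciphertext goes over the wire as-is, keys never needed on the target
      zfs send -w tank/secret@snap | ssh backup zfs recv pool/secret

      # non-raw send of an encrypted dataset: data is decrypted on the way out and
      # re-encrypted (or not) on the receiving side; this is the problematic path
      zfs send tank/secret@snap | ssh backup zfs recv pool/secret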
