Comment by rekoil
Yeah, it's a pretty huge caveat to be honest.
Da1 Db1 Dc1 Pa1 Pb1
Da2 Db2 Dc2 Pa2 Pb2
Da3 Db3 Dc3 Pa3 Pb3
___ ___ ___ Pa4 Pb4
___ represents free space. After expansion by one disk you would logically expect something like:

Da1 Db1 Dc1 Da2 Pa1 Pb1
Db2 Dc2 Da3 Db3 Pa2 Pb2
Dc3 ___ ___ ___ Pa3 Pb3
___ ___ ___ ___ Pa4 Pb4
But as I understand it, it would actually expand to:

Da1 Db1 Dc1 Dd1 Pa1 Pb1
Da2 Db2 Dc2 Dd2 Pa2 Pb2
Da3 Db3 Dc3 Dd3 Pa3 Pb3
___ ___ ___ ___ Pa4 Pb4
Where the Dd1-3 blocks are just wasted, meaning that by adding a new disk to the array you're only expanding free storage by 25%... So say you have 8TB disks for a total of 24TB of storage originally, and you have 4TB free before expansion; you would have 5TB free after expansion. Please tell me I've misunderstood this, because to me it is a pretty useless implementation if I haven't.
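Spelling out the arithmetic in that example (a sketch of the comment's own figures, not a claim about how ZFS actually behaves):

```python
# The numbers from the example above: three 8 TB data disks,
# 4 TB free before expansion, and the claimed 25% gain in free space.
disk_tb = 8
data_disks = 3
total_tb = disk_tb * data_disks        # 24 TB of data capacity
free_before_tb = 4

# Under the reading above, the new disk only grows *free* space by 25%,
# because the Dd1-3 blocks alongside existing data are considered wasted.
free_after_tb = free_before_tb * 1.25
print(free_after_tb)  # 5.0
```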
ZFS RAID-Z does not have dedicated parity disks. Parity and data are interleaved so that reads can be served from all disks rather than just the data disks.
The slides here explain how it works:
https://openzfs.org/w/images/5/5e/RAIDZ_Expansion_2023.pdf
Anyway, you are not entirely wrong. Old data keeps the old parity:data ratio, while new data is written with the new parity:data ratio. As old data is freed from the vdev, new writes will use the new ratio. You can speed this up by doing send/receive, or by deleting all snapshots and then rewriting the files in place. The caveat is that reflinks will not survive the operation: if you used reflinks to deduplicate storage, you will find the deduplication effect is gone afterward.
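To make the mixed-ratio point concrete, here is a rough sketch (my own illustration with hypothetical numbers, not from the slides) of usable capacity when old-ratio and new-ratio data coexist after expanding a RAIDZ2 vdev from 5 to 6 disks:

```python
# Hypothetical illustration: a RAIDZ2 vdev expanded from 5 to 6 disks.
# Data written before expansion keeps the old parity:data ratio;
# data written afterward uses the new one.

def data_fraction(disks: int, parity: int = 2) -> float:
    """Fraction of each stripe that holds data rather than parity."""
    return (disks - parity) / disks

old = data_fraction(5)   # 3 data : 2 parity = 0.600
new = data_fraction(6)   # 4 data : 2 parity ~= 0.667

# Raw space consumed on the vdev, in TB (made-up example values):
old_raw = 20.0  # written before expansion, still at the old ratio
new_raw = 10.0  # written after expansion, at the new ratio

usable = old_raw * old + new_raw * new
print(f"old-ratio data: {old_raw * old:.1f} TB usable")
print(f"new-ratio data: {new_raw * new:.1f} TB usable")
print(f"total: {usable:.1f} TB usable out of {old_raw + new_raw:.0f} TB raw")
```

The old 20 TB yields 12.0 TB usable, while the same raw space at the new ratio would yield about 13.3 TB, which is why rewriting old data (send/receive, or rewriting files in place) reclaims space.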