Imaging mounted disk volumes under duress (2021)
(blog.benjojo.co.uk) | 76 points by yamrzou 6 days ago
Another marvel is Tom Ehlert's Drive Snapshot[0], which supports "disk image backups of live or offline Windows 2000-2022 systems all in a portable (and bootable!) ~1MB EXE".[1]
I keep my gaming machines for a long time and usually only upgrade the GPU in that time, so the old machine's main "fast SSD" is much smaller than the "storage disk" of my new machine. But yes, I get rid of the Steam directory entirely and any large media files.
If I wanted to be more careful I could probably just do a full registry export and keep C:\Users\[username]\AppData. But rather than dig around trying to recall and export MORE stuff (when I want to be playing on the new machine...) I'll just keep a copy of the whole thing for reference.
And it'll get deleted down the track when I'm happily bedded into the new machine.
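For anyone doing the same, the "registry export plus AppData" step might look roughly like this; the paths and drive letters are placeholders, run from an elevated prompt on the old install:

    :: Placeholder paths; run from an elevated prompt on the old install.
    reg export HKCU D:\old-machine\hkcu.reg /y
    reg export HKLM\SOFTWARE D:\old-machine\hklm-software.reg /y
    :: /E copies subfolders (including empty ones), /XJ skips junction points
    robocopy "C:\Users\%USERNAME%\AppData" "D:\old-machine\AppData" /E /XJ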
Other tips: if you moved your license for Windows to the new machine, run the VM without networking...
If you are wondering how to get stuff off it with no networking - because you are using Hyper-V (built into Windows Pro) instead of VMware - you can mount the VHD disks directly on your new machine while the VM is off.
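If the Hyper-V PowerShell module is installed, that mount step is roughly this (the path is a placeholder):

    # Placeholder path; requires the Hyper-V PowerShell module and the VM powered off.
    Mount-VHD -Path 'D:\VMs\old-gaming-rig\os.vhdx' -ReadOnly
    # ...browse the drive letter Windows assigns and copy what you need, then:
    Dismount-VHD -Path 'D:\VMs\old-gaming-rig\os.vhdx'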
Since 2007, my working assumption has been that if data is not on ZFS on physically redundant media, the data has not been successfully saved. And any machines that don't have ZFS (some Red Hat-based boxes) should be configured only through Ansible, and configured with the intention that all data (including syslogs) is either forwarded somewhere that does have ZFS or accessed via NFS (backed by ZFS).
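As a rough sketch of that policy (pool, dataset, and host names are made up): mirrored vdevs for the physical redundancy, snapshots shipped elsewhere, and scrubs to exercise the checksums.

    # Names are made up; this is the shape of the policy, not a recipe.
    zpool create tank mirror /dev/sda /dev/sdb      # physically redundant vdev
    zfs create tank/data
    zfs snapshot tank/data@nightly
    zfs send tank/data@nightly | ssh backuphost zfs receive -F backup/data
    zpool scrub tank                                # verify checksums end to end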
Or Ceph BlueStore, which does checksums on physically redundant media. We do N+3 replication because we're lazy.
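One reading of "N+3" in Ceph terms is a replicated pool holding four copies of every object; the pool name and PG count below are made up:

    # Pool name and PG count are made up.
    ceph osd pool create backups 128 128 replicated
    ceph osd pool set backups size 4        # one primary plus three replicas
    ceph osd pool set backups min_size 2    # keep serving I/O with two copies left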
FYI, while these block-level methods do have a use case, parallel rsync and other file-level tools are far safer and often faster, with less additional load on the disk.
Duplicating the OS/FS behavior runs into an undecidability problem; with block-level copies you just hope for the best, and often you won't even notice corruption.
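A common way to parallelize rsync at the file level is to fan out one process per top-level entry and finish with a single cleanup pass (paths and the concurrency level are placeholders):

    # Placeholders throughout; -P8 means eight rsyncs in flight at once.
    find /srcvol -mindepth 1 -maxdepth 1 -print0 |
      xargs -0 -P8 -I{} rsync -a "{}" /mnt/backup/
    # final serial pass: catch stragglers and propagate deletions
    rsync -a --delete /srcvol/ /mnt/backup/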
1) The article's use case is explicitly bootable images.
2) No, most of us don't "hope for the best" with imaging, but would like to actually achieve a reasonable level of confidence. If your approach to data integrity is "you probably won't notice corruption", you don't have an approach to data integrity.
Block-level copies of boot volumes are high risk, because boot volumes are almost exclusively mounted read-write by label or GUID.
It's a pretty common problem for someone to image their boot drive, reboot, and find the system has mounted the backup copy instead.
If you are using iSCSI or anything with multipathing, this can happen even without a reboot.
I know that block-level copies seem like a good, easy solution. But several decades in storage admin and architect roles during the height of the SAN era showed me it is more problematic than you'd expect.
To be honest, a full block-level backup of a boot volume is something you do when the data isn't that critical anyway.
But if your use case requires this and the data is critical, I highly encourage you to dig into how even device files are emitted.
If you are like most people who don't have the pager scars that forced you to dig into udev etc., you probably won't realize that what appears to be a concrete implementation detail is really just a facade pattern.
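To make the label/GUID hazard concrete, here is the shape of the problem and the usual mitigation, with placeholder device names (the exact command depends on the filesystem):

    # Placeholder devices. After a block-level clone, two disks answer to one UUID:
    blkid /dev/sdb2 /dev/sdc2          # both report the same UUID, so UUID-based mounts are ambiguous
    # Give the clone a fresh identity before it is ever present at boot:
    tune2fs -U random /dev/sdc2        # ext4 clone
    xfs_admin -U generate /dev/sdc2    # XFS clone (must be cleanly unmounted)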
initrd with native OverlayFS kernel support is very versatile. ;)
Yet btrfs, CephFS, and ZFS all have snapshot-syncing tricks that make state mirrors far more practical and safe to pull off. =3
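For example, the btrfs flavor of that trick (subvolume paths are made up; ZFS incremental send/receive works along the same lines):

    # Subvolume paths are made up; read-only snapshots are required for send.
    btrfs subvolume snapshot -r /data /data/.snaps/day1
    btrfs send /data/.snaps/day1 | btrfs receive /mnt/backup
    # later, ship only the delta relative to the previous snapshot:
    btrfs subvolume snapshot -r /data /data/.snaps/day2
    btrfs send -p /data/.snaps/day1 /data/.snaps/day2 | btrfs receive /mnt/backup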
1) The article stated that using bootable images for backup was a preference. That doesn't invalidate asking whether it's an ideal preference.
2) Arguing that it might be better to avoid such methods because of possible problems with data integrity isn’t a lack of an approach to data integrity.
Rice–Shapiro theorem.
The number of writers on a typical OS means you can't count on avoiding the pathological cases.
I suppose you could reduce it to Rice's theorem and the undecidability of Turing machine equivalence, but remember it generalizes even to total functions.
It just goes back to the fact that establishing the equivalence of two static programs requires running them, and there is too much entropy in file operations to practically cover much of the behavior.
When forced to, it can save you, but a block-level copy of a live filesystem is opportunistic.
Crash consistency is obviously the best you can hope for here, so that, plus the holes in classic NFS write semantics, may be a more accessible lens on the non-happy path than my preferred computability one.
The problem I mentioned of the GUID being copied and no longer unique is where I have seen people lose data the most.
The undecidability point is really a caution that it doesn't matter how smart the programmer is: there is simply no general algorithm that can remove the cost of losing the semantic meaning of those writes.
So it is not a good default strategy, only one to use when context forces it.
TL;DR: it's horses for courses.
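If context does force a block-level copy of a live volume, one standard way to at least pin it to a single point in time is a snapshot at the volume-manager layer, accepting that crash consistency is all you get (VG/LV names and sizes are placeholders):

    # Placeholder VG/LV names; this still only buys crash consistency.
    lvcreate --snapshot --size 10G --name root_snap /dev/vg0/root
    dd if=/dev/vg0/root_snap of=/mnt/backup/root.img bs=4M status=progress
    lvremove -y /dev/vg0/root_snap    # drop it before the COW space fills up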
I know TheFineArticle is in Linux land, but for Windows people with this issue you might look at Sysinternals Disk2vhd.[0]
It can be run from the online OS itself, and it can store the resulting VHD on the same disk it is imaging (with space and disk-performance constraints).
I find it handy for turning my freshly superseded gaming machine into a VM on my new machine for easy access to files, before doing whatever with my old hardware.
[0] https://learn.microsoft.com/en-us/sysinternals/downloads/dis...
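From memory, Disk2vhd also takes the drives and output file on the command line, roughly like the line below; check the Sysinternals page for the exact switches.

    :: Rough sketch from memory; drive letters and paths are placeholders.
    disk2vhd.exe C: D:\images\old-gaming-rig.vhdx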