08 - Disaster Recovery
What to do when something goes wrong, ordered by severity.
Scenario 1: Accidentally deleted a file
Window of recovery: 7 days for daily snapshots, 4 weeks for weekly.
If you noticed quickly and the next 3 AM backup hasn't run yet:
- Browse to \\z2mini\backup\current\files\ (read-only share; note the files\ subdir, because current\ now also holds immich\, music\, db-dumps\, service-config\)
- Find the file
- Copy it back to \\z2mini\files\
If the next backup has already run, the deletion has propagated to current/files/ but the file still exists in any snapshot from before the deletion:
- Browse to \\z2mini\backup\daily\<recent-date>\files\ for the most recent snapshot from before the deletion
- Or \\z2mini\backup\weekly\<recent-date>\files\ for older deletions
- Copy the file back to \\z2mini\files\
You can also do this from the Z2's terminal:
cp /mnt/backup/daily/2026-05-11/files/path/to/file /data/files/path/to/file
sudo chown gabriel:gabriel /data/files/path/to/file
(For a music file it's …/daily/<date>/music/…; for a raw photo file …/daily/<date>/immich/…, though recovering Immich is usually better done through Immich's own restore flow — see 11-immich.md.)
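When it isn't obvious which snapshot still holds the file, the manual browsing above can be approximated with a small helper. This is a minimal sketch, not part of the existing scripts: find_in_snapshots is a hypothetical name, and it assumes snapshot directories are named by ISO date (as in daily/2026-05-11), so a reverse lexical sort puts the newest first.

```shell
# find_in_snapshots ROOT RELPATH
# Prints the newest snapshot copy of RELPATH under ROOT/daily, then ROOT/weekly.
# Assumption: snapshot directory names are ISO dates without spaces.
find_in_snapshots() {
    root=$1
    rel=$2
    for kind in daily weekly; do
        # ISO dates sort lexically, so sort -r yields newest first
        for snap in $(ls -1 "$root/$kind" 2>/dev/null | sort -r); do
            if [ -e "$root/$kind/$snap/$rel" ]; then
                printf '%s\n' "$root/$kind/$snap/$rel"
                return 0
            fi
        done
    done
    return 1   # not present in any snapshot
}
```

For example, find_in_snapshots /mnt/backup files/path/to/file prints the path to hand to cp, or returns non-zero if no snapshot has the file.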
Scenario 2: Backup script hasn't run / silent failure
Symptoms: SMART alerts arriving fine, but tail /mnt/backup/backup.log shows old timestamps.
# Check cron is running
sudo systemctl status cron
# Check recent cron activity
sudo journalctl -u cron --since "2 days ago"
# Verify your crontab is intact
crontab -l
If the crontab entry is missing, restore it with crontab -e.
Add: 0 3 * * * /home/gabriel/scripts/backup-files.sh
Run the backup manually to verify it still works, then check the log:
~/scripts/backup-files.sh
tail -30 /mnt/backup/backup.log
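The "old timestamps" symptom can also be checked mechanically. The sketch below is an illustration, not an existing script: log_is_fresh is a hypothetical name, and the 26-hour default (1560 minutes, one nightly run plus slack) is an assumed threshold. It relies on the log file's modification time rather than parsing the log's contents.

```shell
# log_is_fresh LOG [MAX_AGE_MIN]
# Succeeds if LOG was modified within MAX_AGE_MIN minutes.
# Returns 2 if LOG does not exist, 1 if it is stale.
log_is_fresh() {
    log_path=$1
    max_age_min=${2:-1560}   # assumed default: 26 hours
    [ -f "$log_path" ] || return 2
    # find prints the path only when mtime is newer than the cutoff
    [ -n "$(find "$log_path" -mmin "-$max_age_min")" ]
}
```

A call such as log_is_fresh /mnt/backup/backup.log || echo "backup log is stale" would flag the silent-failure case without reading the log by eye.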
Scenario 3: Backup drive failed or disconnected
Symptoms: df -h doesn't show /mnt/backup, or the backup script logs "ERROR: /mnt/backup is not mounted."
If the drive shows in lsblk but won't mount, check the cable, try a different USB port, or try the drive in another computer to see if it's the drive itself or the connection.
If the drive is dead, replace it. Steps to set up a new backup drive (this is the same procedure used for the May 2026 T5 → 990 PRO swap — see 05-backups.md):
- Connect the new drive
- Identify it: lsblk
- Wipe and partition (replace sdX with the actual device); see 05-backups.md for the commands used in the earlier swap
- Get the new UUID: sudo blkid /dev/sdX1
- Update /etc/fstab with the new UUID: replace the existing /mnt/backup line; the format is UUID=<new> /mnt/backup ext4 defaults,nofail 0 2
- If it's an NVMe-in-USB enclosure, update /etc/smartd.conf: the backup-drive line is /dev/disk/by-id/usb-...-0:0 -d sntasmedia -a -m gabrielgabrie99@gmail.com (find the by-id path with ls -l /dev/disk/by-id/ | grep -i usb; pick the -d snt* variant that matches the bridge chip, sntasmedia for ASMedia). See 06-drive-monitoring.md.
- Mount: sudo mount -a
- Take ownership: sudo chown -R gabriel:gabriel /mnt/backup
- Run the backup to populate it: ~/scripts/backup-files.sh (the first full run copies ~520 GB and takes ~9 min over USB)
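The fstab step in the list above is the easiest one to get wrong by hand. As a hedged sketch (set_backup_uuid is a hypothetical helper, not something the existing setup ships), the line can be rewritten with awk while keeping a .bak copy. It assumes the fstab line format quoted above and matches on the mount-point field.

```shell
# set_backup_uuid FSTAB NEW_UUID
# Rewrites the /mnt/backup line of FSTAB to use NEW_UUID.
# A copy of the original is kept at FSTAB.bak.
set_backup_uuid() {
    fstab_path=$1
    new_uuid=$2
    new_line="UUID=$new_uuid /mnt/backup ext4 defaults,nofail 0 2"
    cp "$fstab_path" "$fstab_path.bak" || return 1
    # Replace the uncommented line whose mount point is /mnt/backup;
    # every other line passes through unchanged
    awk -v line="$new_line" '
        $2 == "/mnt/backup" && $1 !~ /^#/ { print line; next }
        { print }
    ' "$fstab_path.bak" > "$fstab_path"
}
```

Running it against a scratch copy first (cp /etc/fstab /tmp/fstab.new, edit, diff, then sudo cp back) keeps a typo from leaving the box unbootable.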
You'll lose your previous snapshot history but new backups start fresh.
Scenario 4: Data drive (/data) failed
Symptoms: /data won't mount, dmesg shows I/O errors, files in /data/files are inaccessible.
This is what your backups are for.
- Replace the failed drive (1TB NVMe, slot 2 in the Z2)
- Boot the Z2 (it should boot fine since OS is on a different drive)
- Format the new drive:
- Mount it:
- Take ownership: sudo chown -R gabriel:gabriel /data
- Restore the Samba files from backup: copy /mnt/backup/current/files/ back to /data/files/
- Restore the music library from /mnt/backup/current/music/
- Restore the Docker services: see Scenario 5, Step 6b below for the per-service restore (Immich library + DB dump, the SQLite DBs from db-dumps/, the service-config/<svc>/ dirs). The containers themselves come back when you docker compose up -d each stack.
- Verify Samba can serve it: connect from your laptop and check files appear
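Before restoring onto a fresh /data, it may be worth confirming that the backup tree actually contains everything the steps above expect. The following is a sketch only: check_backup_tree is a hypothetical name, and the subdirectory list is taken from the current/ layout described in this document.

```shell
# check_backup_tree ROOT
# Reports, for each expected subdirectory of ROOT/current, whether it exists
# and is non-empty. Returns non-zero if any is missing or empty.
check_backup_tree() {
    backup_root=$1
    missing=0
    for sub in files immich music db-dumps service-config; do
        dir="$backup_root/current/$sub"
        if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
            echo "ok       $sub"
        else
            echo "MISSING  $sub"
            missing=1
        fi
    done
    return "$missing"
}
```

check_backup_tree /mnt/backup prints one line per subdirectory, so a missing or empty source shows up before any data is copied.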
Scenario 5: OS drive failed
Symptoms: Z2 won't boot, or boots into emergency mode, or kernel panics on startup.
This is the big one. Full system rebuild required.
Step 1: Replace the OS drive
Physically swap the failed Samsung drive with a working one (any 1TB+ NVMe).
Step 2: Fresh Ubuntu install
Follow 02-os-install.md, but do not reformat the data drive during the storage step. Only the new OS drive needs formatting. The SK hynix drive at /data should still have all your files.
When the storage configuration step comes up:
- Custom storage layout
- Use the new OS drive as boot device, format and mount at /
- For the SK hynix drive (your data drive): don't reformat — just set the mount point to /data (the existing partition is fine)
Step 3: Restore configuration from backup
After Ubuntu install completes and you can SSH in:
Verify backup drive is accessible: ls /mnt/backup/
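# Refresh the package index and load it into dpkg's "available" database;
# on a fresh install, --set-selections otherwise ignores packages it has never seen
sudo apt-get update
apt-cache dumpavail | sudo dpkg --merge-avail
# Restore the package list
sudo dpkg --set-selections < /mnt/backup/system-state/installed-packages.txt
sudo apt-get -y dselect-upgrade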
This reinstalls every package you had, including Samba, smartmontools, msmtp, sqlite3 (the backup script's SQLite online-backup step depends on it), Docker, etc. If the package-list restore misses anything, sudo apt install sqlite3 smartmontools msmtp msmtp-mta covers the load-bearing ones explicitly. The Docker stacks are restored from the backup in Step 6b below — docker.io / docker-compose-plugin (or the upstream docker-ce packages, however the box was set up) must be present first.
Step 4: Restore configuration files
# Find the most recent system config archive
LATEST=$(ls -t /mnt/backup/system-state/system-config-*.tar.gz | head -1)
echo "Restoring from: $LATEST"
# Extract over /
sudo tar -xzf "$LATEST" -C /
This restores:
- /etc/samba/smb.conf
- /etc/smartd.conf
- /etc/msmtprc
- /etc/apparmor.d/local/usr.bin.msmtp
- /etc/fstab
- /etc/hostname
- /etc/hosts
- /etc/logrotate.d/backup-files
- /etc/logrotate.d/msmtp
- /home/gabriel/scripts/
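Because the archive is extracted over /, it can help to confirm which archive gets picked, and that one exists at all, before running tar. This is a sketch under the same naming pattern used above (system-config-*.tar.gz); latest_config_archive is a hypothetical helper name.

```shell
# latest_config_archive DIR
# Prints the most recently modified system-config-*.tar.gz in DIR,
# or fails with a message if there is none.
latest_config_archive() {
    state_dir=$1
    # ls -t sorts by modification time, newest first
    newest=$(ls -t "$state_dir"/system-config-*.tar.gz 2>/dev/null | head -n 1)
    if [ -z "$newest" ]; then
        echo "no system-config archive in $state_dir" >&2
        return 1
    fi
    printf '%s\n' "$newest"
}
```

Listing the contents first with tar -tzf "$(latest_config_archive /mnt/backup/system-state)" shows exactly which paths under / the extract would overwrite.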
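Step 5: Restore user crontab
Run crontab -e and re-add the nightly backup entry from Scenario 2: 0 3 * * * /home/gabriel/scripts/backup-files.sh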
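One hedged way to put the nightly entry back without creating a duplicate is to filter the existing crontab text. The sketch below works purely on stdin and stdout, so it can be inspected before anything is installed; ensure_cron_line is a hypothetical name, and the default entry is the one given in Scenario 2.

```shell
# ensure_cron_line [LINE]
# Reads crontab text on stdin and writes it to stdout, appending LINE
# only if an identical line is not already present.
ensure_cron_line() {
    default_line='0 3 * * * /home/gabriel/scripts/backup-files.sh'
    cron_line=${1:-$default_line}
    current=$(cat)
    if printf '%s\n' "$current" | grep -Fxq -- "$cron_line"; then
        printf '%s\n' "$current"                      # already there
    elif [ -n "$current" ]; then
        printf '%s\n%s\n' "$current" "$cron_line"     # append to existing entries
    else
        printf '%s\n' "$cron_line"                    # empty crontab
    fi
}
```

In use, crontab -l 2>/dev/null | ensure_cron_line | crontab - installs the result; running it twice leaves the crontab unchanged.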
Step 6: Re-create the backup script wrapper and sudoers
These aren't in the standard config backup (since the wrapper lives in /usr/local/sbin):
Create /usr/local/sbin/backup-system-state.sh per 05-backups.md, and /etc/sudoers.d/gabriel-backup. Refer to that doc for exact contents.
sudo chown root:root /usr/local/sbin/backup-system-state.sh
sudo chmod 755 /usr/local/sbin/backup-system-state.sh
Step 6b: Restore the self-hosted services
The Docker stacks are covered by the nightly backup (see 05-backups.md). Restore them onto a freshly-formatted /data (or onto the surviving /data if it's intact):
1. Each service's config / compose:
# restore every service's compose + .env + config
for svc in immich navidrome radicale vaultwarden beszel homepage caddy; do
sudo mkdir -p /data/docker/$svc
sudo rsync -a /mnt/backup/current/service-config/$svc/ /data/docker/$svc/
done
sudo chown -R gabriel:gabriel /data/docker
Ownership notes after this: Radicale's data/collections/ should be 2999:2999 (sudo chown -R 2999:2999 /data/docker/radicale/data /data/docker/radicale/config); Vaultwarden's data/ should be 1000:1000; the rest are fine as gabriel.
2. The SQLite databases — from db-dumps/ (these are the sqlite3 ".backup" dumps, ready to drop in as the live DB):
# Vaultwarden — into data/db.sqlite3 (make sure no stale -wal/-shm files are present)
sudo mkdir -p /data/docker/vaultwarden/data
sudo cp /mnt/backup/current/db-dumps/vaultwarden-db.sqlite3 /data/docker/vaultwarden/data/db.sqlite3
sudo rm -f /data/docker/vaultwarden/data/db.sqlite3-wal /data/docker/vaultwarden/data/db.sqlite3-shm
sudo chown -R 1000:1000 /data/docker/vaultwarden/data
# Navidrome — into data/navidrome.db (artwork/cache regenerate on next scan)
sudo mkdir -p /data/docker/navidrome/data
sudo cp /mnt/backup/current/db-dumps/navidrome.db /data/docker/navidrome/data/navidrome.db
sudo chown -R 1000:1000 /data/docker/navidrome/data
# Beszel — hub data.db + auxiliary.db (regenerate the hub's id_ed25519 / re-pair the agent on first boot)
sudo mkdir -p /data/docker/beszel/data
sudo cp /mnt/backup/current/db-dumps/beszel-data.db /data/docker/beszel/data/data.db
sudo cp /mnt/backup/current/db-dumps/beszel-auxiliary.db /data/docker/beszel/data/auxiliary.db
(Vaultwarden's rsa_key.pem, attachments/, sends/, config.json came back with the service-config/vaultwarden/ rsync in step 1.)
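After copying the dumps into place, a quick sanity check can catch a truncated copy or a leftover -wal/-shm file before the containers start. This is a minimal sketch with a hypothetical helper name (check_restored_db); it only inspects the file header and the sidecar files, and does not open the database.

```shell
# check_restored_db FILE
# Succeeds if FILE looks like a SQLite 3 database and has no stale
# -wal / -shm sidecar files next to it.
check_restored_db() {
    db_path=$1
    if [ ! -f "$db_path" ]; then
        echo "missing: $db_path" >&2
        return 1
    fi
    # Every SQLite 3 database file begins with the ASCII text "SQLite format 3"
    if [ "$(head -c 15 "$db_path")" != "SQLite format 3" ]; then
        echo "not a SQLite database: $db_path" >&2
        return 1
    fi
    # Sidecar files left from an older live database do not belong to this copy
    for suffix in -wal -shm; do
        if [ -e "$db_path$suffix" ]; then
            echo "stale sidecar present: $db_path$suffix" >&2
            return 1
        fi
    done
    return 0
}
```

For a deeper check, sqlite3 /data/docker/vaultwarden/data/db.sqlite3 'PRAGMA integrity_check;' prints ok on a healthy database.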
3. Immich — the photo files plus the database, following Immich's official backup/restore procedure:
# the photo library (UPLOAD_LOCATION) — comes back owned by gabriel since the
# backup rsync ran as gabriel; Immich fixes ownership on first boot, or chown if needed
sudo mkdir -p /data/docker/immich/library
sudo rsync -a /mnt/backup/current/immich/ /data/docker/immich/library/
# the .sql.gz database dumps are inside library/backups/ — that came along with the rsync
ls /data/docker/immich/library/backups/ # immich-db-backup-*.sql.gz
Then bring up the Immich stack, but recreate the Postgres database from the most recent .sql.gz dump per Immich's restore docs (recreate the DB, create the required extensions, load the dump) before starting immich-server. The postgres/ data dir is not in the backup by design — the dump is the canonical DB copy.
A note on ownership: the Immich library/ files come back owned by gabriel because the nightly backup rsync runs as gabriel (it uses --no-owner --no-group). On a fresh restore that's harmless: Immich's container runs as its own uid and either fixes ownership itself on first boot or just reads/writes fine; a sudo chown -R <immich-uid>:<immich-uid> /data/docker/immich/library is the belt-and-suspenders move if Immich complains. Same caveat for Radicale's data/ (re-chown to 2999:2999).
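The "most recent .sql.gz dump" mentioned above is only useful if it decompresses cleanly; a dump written during a failing backup may be truncated. As a hedged sketch (newest_valid_dump is a hypothetical name, and it assumes the immich-db-backup-*.sql.gz naming shown above and paths without spaces), the newest dump that passes gzip's integrity test can be selected like this:

```shell
# newest_valid_dump DIR
# Prints the newest immich-db-backup-*.sql.gz in DIR that passes gzip -t.
# Assumption: DIR and the dump names contain no whitespace.
newest_valid_dump() {
    dump_dir=$1
    # ls -t lists newest first; fall through to older dumps if one is corrupt
    for dump in $(ls -t "$dump_dir"/immich-db-backup-*.sql.gz 2>/dev/null); do
        if gzip -t "$dump" 2>/dev/null; then
            printf '%s\n' "$dump"
            return 0
        fi
    done
    return 1
}
```

newest_valid_dump /data/docker/immich/library/backups prints the file to feed into Immich's documented restore procedure; gzip -t checks only the compression layer, not the SQL inside.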
4. Bring each stack up (Caddy first so the HTTPS front-ends come back, then the rest):
cd /data/docker/caddy && docker compose build && docker compose up -d
for svc in immich navidrome radicale vaultwarden beszel homepage; do
( cd /data/docker/$svc && docker compose up -d )
done
docker ps # everything healthy?
Caddy re-issues every Let's Encrypt cert via the Cloudflare DNS-01 challenge on first boot — needs the CF_API_TOKEN in /data/docker/caddy/.env (which came back with the service-config/caddy/ rsync). See 17-caddy.md.
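The "everything healthy?" check after docker ps can be made less eyeball-dependent by filtering the status column. The sketch below is an illustration with a hypothetical helper name (flag_unhealthy); it reads "name<TAB>status" lines on stdin, so it makes no Docker calls itself, and it treats anything that is not "Up", or that reports unhealthy or health: starting, as needing attention.

```shell
# flag_unhealthy
# stdin: one "name<TAB>status" line per container.
# Prints the names of containers that are not up and healthy;
# exits non-zero if any were printed.
flag_unhealthy() {
    awk -F '\t' '
        $2 !~ /^Up / || $2 ~ /unhealthy|health: starting/ { print $1; bad = 1 }
        END { exit bad }
    '
}
```

It can be fed from docker ps -a --format '{{.Names}}\t{{.Status}}' | flag_unhealthy; containers without a healthcheck simply show "Up ..." and pass.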
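Step 7: Re-set the Samba password
Samba passwords aren't in the backup for security reasons, so set one again with smbpasswd (typically sudo smbpasswd -a gabriel, assuming the Samba user is gabriel).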
Step 8: Re-authenticate Tailscale
Run sudo tailscale up, then visit the URL it prints and authenticate. Your tailnet recognizes the same hostname (z2mini).
Step 9: Restart everything
Step 10: Verify
- Connect from your laptop: ssh gabriel@z2mini
- Try the SMB share: \\z2mini\files
- Run a backup: ~/scripts/backup-files.sh
- Confirm log: tail -30 /mnt/backup/backup.log
If all of those work, the system is fully recovered.
Estimated total time: ~1 hour (Ubuntu install + package restore + config extract + service restart).
Scenario 6: Both NVMe drives died (Z2 hardware failure)
Symptoms: Z2 won't power on, or won't see any drives.
You can't recover to the Z2, but your data is safe on the backup drive. Two options:
Option A: Replace the Z2 with another machine
Any computer with at least one drive bay can become the new Z2. Follow the disaster recovery procedure above on the replacement hardware. The hostname stays z2mini so all your client devices keep working.
Option B: Read your files from any Linux machine
Plug the backup drive (the 990 PRO in its ASM2462 USB enclosure) into any Linux laptop, mount it, and your data is right there at /mnt/backup/current/ — files/, immich/, music/, db-dumps/, service-config/. You can copy files back to your laptop until you have a permanent solution.
Scenario 7: Both NVMe drives AND the on-site backup drive died
The catastrophic case. There's now a partial off-site mitigation — the Samsung T5 (backup-offsite) at family's house, refreshed during visits via ~/scripts/backup-offsite.sh (see 05-backups.md).
The off-site drive holds (as of the last refresh): the Samba files, the music library, the SQLite DB dumps (Vaultwarden vault, Beszel ×2, Navidrome), every service's config (service-config/), and the system-state archives. So in this scenario you recover the documents, music, all service config + databases, and the system config from the off-site T5 — plug it into a Linux machine, mount /dev/disk/by-label/backup-offsite, copy what you need.
What's lost in this scenario: the Immich photo library. It's ~520 GB and doesn't fit on the 500 GB off-site T5 — so the photos only ever had copies at the apartment (/data live + /mnt/backup backup). If both of those are gone, the photos are gone (apart from whatever individual images still exist on the iPhone's camera roll, since iOS auto-backup keeps captures local for a while). Closing this gap needs a >1 TB off-site drive — see the future-improvements note in 05-backups.md.
Recovery README on the backup drive
A copy of recovery procedures lives at /mnt/backup/RECOVERY-README.md so it's accessible even when the Z2 itself is unavailable — it covers the package-reinstall list (now including sqlite3), the /etc config extract, the service-config/ + db-dumps/ + Immich restore, and the Caddy docker compose build && up -d step. Update it whenever recovery procedures change here.