08 — Disaster Recovery

What to do when something goes wrong, ordered by severity.


Scenario 1: Accidentally deleted a file

Window of recovery: 7 days for daily snapshots, 4 weeks for weekly.

If you noticed quickly and the next 3 AM backup hasn't run yet:

  1. Browse to \\z2mini\backup\current\files\ (read-only share — note the files\ subdir; current\ now also holds immich\, music\, db-dumps\, service-config\)
  2. Find the file
  3. Copy it back to \\z2mini\files\

If the next backup has already run, the deletion has propagated to current/files/ but the file still exists in any snapshot from before the deletion:

  1. Browse to \\z2mini\backup\daily\<recent-date>\files\ and pick the most recent snapshot taken before the deletion
  2. For a deletion older than the 7-day daily window, use \\z2mini\backup\weekly\<recent-date>\files\ instead
  3. Copy the file back to \\z2mini\files\

You can also do this from the Z2's terminal:

cp /mnt/backup/daily/2026-05-11/files/path/to/file /data/files/path/to/file
sudo chown gabriel:gabriel /data/files/path/to/file

(For a music file it's …/daily/<date>/music/…; for a raw photo file …/daily/<date>/immich/…, though recovering Immich is usually better done through Immich's own restore flow — see 11-immich.md.)
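
If you're not sure which snapshot still holds the file, a small helper can search for it. This is only a sketch and not part of the existing scripts: the function name find_in_snapshots is made up here, and it assumes snapshot directories are named by ISO date (as in daily/2026-05-11 above), so a reverse sort lists the newest first.

```shell
# Print the newest snapshot copy of a file, checking daily snapshots before weekly ones.
# Usage: find_in_snapshots <path-relative-to-snapshot> [backup-root]
# (hypothetical helper; assumes snapshot directories are named YYYY-MM-DD)
find_in_snapshots() {
  fis_rel=$1
  fis_root=${2:-/mnt/backup}
  for fis_tier in daily weekly; do
    # ISO dates sort lexically, so a reverse sort yields newest first
    for fis_snap in $(ls -1 "$fis_root/$fis_tier" 2>/dev/null | sort -r); do
      if [ -e "$fis_root/$fis_tier/$fis_snap/$fis_rel" ]; then
        printf '%s\n' "$fis_root/$fis_tier/$fis_snap/$fis_rel"
        return 0
      fi
    done
  done
  return 1   # not found in any snapshot
}
```

For example, find_in_snapshots files/path/to/file prints the full path to copy from, or returns non-zero if no snapshot has it.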


Scenario 2: Backup script hasn't run / silent failure

Symptoms: SMART alerts arriving fine, but tail /mnt/backup/backup.log shows old timestamps.

# Check cron is running
sudo systemctl status cron

# Check recent cron activity
sudo journalctl -u cron --since "2 days ago"

# Verify your crontab is intact
crontab -l

If the crontab entry is missing, restore it:

crontab -e

Add: 0 3 * * * /home/gabriel/scripts/backup-files.sh

Run the backup manually to verify it still works:

~/scripts/backup-files.sh
tail -30 /mnt/backup/backup.log
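
To catch this kind of silent failure earlier, the log's modification time can be checked mechanically. The sketch below assumes the backup script appends to /mnt/backup/backup.log on every run, so the file's mtime approximates the last run; the function name backup_log_is_stale and the 26-hour default are illustrative choices, not something the existing setup defines.

```shell
# Succeed (exit 0) when the backup log is missing or older than the allowed age.
# Usage: backup_log_is_stale [log-path] [max-age-hours]
# (hypothetical helper; assumes the log is written on every backup run)
backup_log_is_stale() {
  bls_log=${1:-/mnt/backup/backup.log}
  bls_hours=${2:-26}
  # a missing log counts as stale
  [ -e "$bls_log" ] || return 0
  # find prints the path only if it was last modified more than N minutes ago
  [ -n "$(find "$bls_log" -mmin "+$((bls_hours * 60))" 2>/dev/null)" ]
}
```

A line such as backup_log_is_stale && echo "backup log is stale" could run by hand or from a separate cron entry; 26 hours leaves some slack around the 3 AM schedule.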

Scenario 3: Backup drive failed or disconnected

Symptoms: df -h doesn't show /mnt/backup, or the backup script logs "ERROR: /mnt/backup is not mounted."

# See what drives are visible
lsblk

# Try to mount manually
sudo mount -a

If the drive shows in lsblk but won't mount, check the cable, try a different USB port, or try the drive in another computer to see if it's the drive itself or the connection.

If the drive is dead, replace it. Steps to set up a new backup drive (this is the same procedure used for the May 2026 T5 → 990 PRO swap — see 05-backups.md):

  1. Connect the new drive
  2. Identify it: lsblk
  3. Wipe and partition (replace sdX with actual device):
    sudo wipefs -a /dev/sdX
    sudo parted /dev/sdX mklabel gpt
    sudo parted /dev/sdX mkpart primary ext4 1MiB 100%
    sudo mkfs.ext4 -L backup /dev/sdX1
    sudo tune2fs -m 0 /dev/sdX1          # reclaim the ext4 reserved blocks — this drive only holds backups
    
  4. Get the new UUID: sudo blkid /dev/sdX1
  5. Update /etc/fstab with the new UUID — replace the existing /mnt/backup line; the format is UUID=<new> /mnt/backup ext4 defaults,nofail 0 2
  6. If it's an NVMe-in-USB enclosure, update /etc/smartd.conf — the backup-drive line is /dev/disk/by-id/usb-...-0:0 -d sntasmedia -a -m gabrielgabrie99@gmail.com (find the by-id path with ls -l /dev/disk/by-id/ | grep -i usb; pick the bridge-chip -d snt* variant — sntasmedia for ASMedia). See 06-drive-monitoring.md.
  7. Mount: sudo mount -a
  8. Take ownership: sudo chown -R gabriel:gabriel /mnt/backup
  9. Run the backup to populate it: ~/scripts/backup-files.sh (the first full run copies ~520 GB and takes ~9 min over USB)

You'll lose your previous snapshot history but new backups start fresh.
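
Step 5 (swapping the UUID in /etc/fstab) is easy to get wrong by hand, so here is one possible way to generate the edited file for review instead of editing in place. It is a sketch under assumptions: set_backup_uuid is a made-up name, and the entry format is the one quoted in step 5. The function only prints the new file contents; nothing is written to /etc/fstab.

```shell
# Print an fstab with the /mnt/backup entry rewritten to use a new UUID.
# Usage: set_backup_uuid <new-uuid> [fstab-path]
# (hypothetical helper; output goes to stdout so it can be reviewed first)
set_backup_uuid() {
  sbu_uuid=$1
  sbu_fstab=${2:-/etc/fstab}
  awk -v uuid="$sbu_uuid" '
    # field 2 is the mount point; commented-out entries are left untouched
    $2 == "/mnt/backup" && $1 !~ /^#/ {
      print "UUID=" uuid " /mnt/backup ext4 defaults,nofail 0 2"
      replaced = 1
      next
    }
    { print }
    # if no entry existed, append one
    END { if (!replaced) print "UUID=" uuid " /mnt/backup ext4 defaults,nofail 0 2" }
  ' "$sbu_fstab"
}
```

Typical use would be set_backup_uuid "$(sudo blkid -s UUID -o value /dev/sdX1)" > /tmp/fstab.new, then diff /etc/fstab /tmp/fstab.new, and only then sudo cp /tmp/fstab.new /etc/fstab.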


Scenario 4: Data drive (/data) failed

Symptoms: /data won't mount, dmesg shows I/O errors, files in /data/files are inaccessible.

This is what your backups are for.

  1. Replace the failed drive (1TB NVMe, slot 2 in the Z2)
  2. Boot the Z2 (it should boot fine since the OS is on a different drive)
  3. Format the new drive:
    lsblk  # identify the new drive
    sudo wipefs -a /dev/nvmeXn1
    sudo parted /dev/nvmeXn1 mklabel gpt
    sudo parted /dev/nvmeXn1 mkpart primary ext4 1MiB 100%
    sudo mkfs.ext4 /dev/nvmeXn1p1
    
  4. Mount it:
    sudo blkid /dev/nvmeXn1p1  # get the new UUID
    sudo nano /etc/fstab  # update the /data line with new UUID
    sudo mount -a
    
  5. Take ownership: sudo chown -R gabriel:gabriel /data
  6. Restore the Samba files from backup:
    sudo mkdir -p /data/files
    sudo rsync -a /mnt/backup/current/files/ /data/files/
    sudo chown -R gabriel:gabriel /data/files
    
  7. Restore the music library:
    sudo mkdir -p /data/music
    sudo rsync -a /mnt/backup/current/music/ /data/music/
    sudo chown -R gabriel:gabriel /data/music
    
  8. Restore the Docker services — see Scenario 5, Step 6b below for the per-service restore (Immich library + DB dump, the SQLite DBs from db-dumps/, the service-config/<svc>/ dirs). The containers themselves come back when you docker compose up -d each stack.
  9. Verify Samba can serve it: connect from your laptop and check files appear
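
Beyond the Samba spot check in step 9, the restored trees can be compared against the backup directly. A minimal sketch, assuming plain diff is acceptable for this (it reads every file on both sides, so it takes a while on large trees); verify_restore is a hypothetical name.

```shell
# Compare a restored tree against its backup source.
# Usage: verify_restore <backup-dir> <restored-dir>
# (hypothetical helper; exit status 0 means the two trees are identical)
verify_restore() {
  vr_src=$1
  vr_dst=$2
  # -r recurses into subdirectories, -q reports only which files differ
  diff -rq "$vr_src" "$vr_dst"
}
```

For the restore above that would be verify_restore /mnt/backup/current/files /data/files and verify_restore /mnt/backup/current/music /data/music; no output and a zero exit status means nothing is missing or different.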

Scenario 5: OS drive failed

Symptoms: Z2 won't boot, or boots into emergency mode, or kernel panics on startup.

This is the big one. Full system rebuild required.

Step 1: Replace the OS drive

Physically swap the failed Samsung drive with a working one (any 1TB+ NVMe).

Step 2: Fresh Ubuntu install

Follow 02-os-install.md, but do not reformat the data drive during the storage step. Only the new OS drive needs formatting. The SK hynix drive at /data should still have all your files.

When the storage configuration step comes up:

  • Custom storage layout
  • Use the new OS drive as boot device, format and mount at /
  • For the SK hynix drive (your data drive): don't reformat, just set the mount point to /data (the existing partition is fine)

Step 3: Restore configuration from backup

After Ubuntu install completes and you can SSH in:

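# Mount the backup drive by label: a fresh install has no /mnt/backup line in
# /etc/fstab yet (fstab is restored in Step 4), so mount -a alone won't find it
sudo mkdir -p /mnt/backup
sudo mount /dev/disk/by-label/backup /mnt/backup

Verify the backup drive is accessible: ls /mnt/backup/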

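# Restore the package list. Refresh apt and sync dpkg's "available" database
# first, otherwise --set-selections skips packages it doesn't know about yet
sudo apt-get update
apt-cache dumpavail | sudo dpkg --merge-avail
sudo dpkg --set-selections < /mnt/backup/system-state/installed-packages.txt
sudo apt-get -y dselect-upgrade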

This reinstalls every package you had, including Samba, smartmontools, msmtp, sqlite3 (the backup script's SQLite online-backup step depends on it), Docker, etc. If the package-list restore misses anything, sudo apt install sqlite3 smartmontools msmtp msmtp-mta covers the load-bearing ones explicitly. The Docker stacks are restored from the backup in Step 6b below — docker.io / docker-compose-plugin (or the upstream docker-ce packages, however the box was set up) must be present first.
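
Before relying on the package-list restore, it can help to confirm the load-bearing packages are actually in the saved list. The sketch below assumes installed-packages.txt is dpkg --get-selections output (package name, whitespace, state), which the use of dpkg --set-selections suggests but the doc doesn't state outright; missing_from_selections is a made-up helper name.

```shell
# Print each named package that is NOT marked "install" in a selections file.
# Usage: missing_from_selections <selections-file> <package>...
# (hypothetical helper; assumes dpkg --get-selections format)
missing_from_selections() {
  mfs_list=$1
  shift
  for mfs_pkg in "$@"; do
    awk -v p="$mfs_pkg" '
      # strip an architecture suffix such as ":amd64" before comparing
      { name = $1; sub(/:.*/, "", name) }
      name == p && $2 == "install" { found = 1 }
      END { exit !found }
    ' "$mfs_list" || printf '%s\n' "$mfs_pkg"
  done
}
```

For example, missing_from_selections /mnt/backup/system-state/installed-packages.txt sqlite3 smartmontools msmtp msmtp-mta prints nothing when all four are covered; anything it does print is a candidate for the explicit apt install.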

Step 4: Restore configuration files

# Find the most recent system config archive
LATEST=$(ls -t /mnt/backup/system-state/system-config-*.tar.gz | head -1)
echo "Restoring from: $LATEST"

# Extract over /
sudo tar -xzf "$LATEST" -C /

This restores:

  • /etc/samba/smb.conf
  • /etc/smartd.conf
  • /etc/msmtprc
  • /etc/apparmor.d/local/usr.bin.msmtp
  • /etc/fstab
  • /etc/hostname
  • /etc/hosts
  • /etc/logrotate.d/backup-files
  • /etc/logrotate.d/msmtp
  • /home/gabriel/scripts/
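
One thing worth checking before extracting over /: the archived /etc/fstab was written for the old OS drive. If it refers to the old root filesystem by UUID (an assumption, though it is how the Ubuntu installer normally writes fstab), overwriting the fresh install's fstab could leave the machine unable to find its root on the next boot. A more cautious variant, sketched below with a made-up function name and staging path, unpacks a full copy to a staging directory for inspection and leaves the live fstab alone.

```shell
# Extract a system-config archive but keep the destination's existing etc/fstab.
# Usage: extract_config_keep_fstab <archive.tar.gz> [dest-root] [staging-dir]
# (hypothetical helper; needs a root shell when dest-root is /)
extract_config_keep_fstab() {
  eck_archive=$1
  eck_dest=${2:-/}
  eck_stage=${3:-/tmp/system-config-stage}
  # 1. unpack a full copy into the staging dir so the archived fstab can be inspected
  mkdir -p "$eck_stage" && tar -xzf "$eck_archive" -C "$eck_stage" || return 1
  # 2. unpack into the real root, skipping fstab so the installer's version survives
  tar -xzf "$eck_archive" -C "$eck_dest" --exclude='etc/fstab'
}
```

Afterwards, diff /tmp/system-config-stage/etc/fstab /etc/fstab shows what differs, and the /data and /mnt/backup lines can be merged into the live file by hand.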

Step 5: Restore user crontab

crontab /mnt/backup/system-state/gabriel-crontab.txt
crontab -l  # verify

Step 6: Re-create the backup script wrapper and sudoers

Neither file is in the standard config backup (the wrapper lives in /usr/local/sbin, outside the archived paths). Create /usr/local/sbin/backup-system-state.sh and /etc/sudoers.d/gabriel-backup per 05-backups.md; refer to that doc for the exact contents. Then set the wrapper's ownership and mode:

sudo chown root:root /usr/local/sbin/backup-system-state.sh
sudo chmod 755 /usr/local/sbin/backup-system-state.sh

Step 6b: Restore the self-hosted services

The Docker stacks are covered by the nightly backup (see 05-backups.md). Restore them onto a freshly-formatted /data (or onto the surviving /data if it's intact):

sudo mkdir -p /data/docker
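
Before working through the restore steps in this section, it may be worth confirming the backup actually contains everything they read from. The helper below is a sketch: missing_from_backup is a hypothetical name, and the list of paths is simply collected from the commands in this section (immich/backups being where the .sql.gz dumps are expected to sit inside the backed-up library).

```shell
# Print every expected backup item that is absent under the backup's current/ tree.
# Usage: missing_from_backup [current-dir]
# (hypothetical helper; paths are the ones the restore steps in this section read)
missing_from_backup() {
  mfb_root=${1:-/mnt/backup/current}
  for mfb_item in \
    service-config/immich service-config/navidrome service-config/radicale \
    service-config/vaultwarden service-config/beszel service-config/homepage \
    service-config/caddy \
    db-dumps/vaultwarden-db.sqlite3 db-dumps/navidrome.db \
    db-dumps/beszel-data.db db-dumps/beszel-auxiliary.db \
    immich/backups
  do
    [ -e "$mfb_root/$mfb_item" ] || printf '%s\n' "$mfb_item"
  done
}
```

Empty output means every source path is present; anything listed will make the matching rsync or cp below fail, so it is better to know up front.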

1. Each service's config / compose:

# restore every service's compose + .env + config
for svc in immich navidrome radicale vaultwarden beszel homepage caddy; do
  sudo mkdir -p /data/docker/$svc
  sudo rsync -a /mnt/backup/current/service-config/$svc/ /data/docker/$svc/
done
sudo chown -R gabriel:gabriel /data/docker

Ownership notes after this: Radicale's data/collections/ should be 2999:2999 (sudo chown -R 2999:2999 /data/docker/radicale/data /data/docker/radicale/config); Vaultwarden's data/ should be 1000:1000; the rest are fine as gabriel.

2. The SQLite databases — from db-dumps/ (these are the sqlite3 ".backup" dumps, ready to drop in as the live DB):

# Vaultwarden — into data/db.sqlite3 (make sure no stale -wal/-shm files are present)
sudo mkdir -p /data/docker/vaultwarden/data
sudo cp /mnt/backup/current/db-dumps/vaultwarden-db.sqlite3 /data/docker/vaultwarden/data/db.sqlite3
sudo rm -f /data/docker/vaultwarden/data/db.sqlite3-wal /data/docker/vaultwarden/data/db.sqlite3-shm
sudo chown -R 1000:1000 /data/docker/vaultwarden/data

# Navidrome — into data/navidrome.db (artwork/cache regenerate on next scan)
sudo mkdir -p /data/docker/navidrome/data
sudo cp /mnt/backup/current/db-dumps/navidrome.db /data/docker/navidrome/data/navidrome.db
sudo chown -R 1000:1000 /data/docker/navidrome/data

# Beszel — hub data.db + auxiliary.db (regenerate the hub's id_ed25519 / re-pair the agent on first boot)
sudo mkdir -p /data/docker/beszel/data
sudo cp /mnt/backup/current/db-dumps/beszel-data.db /data/docker/beszel/data/data.db
sudo cp /mnt/backup/current/db-dumps/beszel-auxiliary.db /data/docker/beszel/data/auxiliary.db

(Vaultwarden's rsa_key.pem, attachments/, sends/, config.json came back with the service-config/vaultwarden/ rsync in step 1.)

3. Immich — the photo files plus the database, following Immich's official backup/restore procedure:

# the photo library (UPLOAD_LOCATION) — comes back owned by gabriel since the
# backup rsync ran as gabriel; Immich fixes ownership on first boot, or chown if needed
sudo mkdir -p /data/docker/immich/library
sudo rsync -a /mnt/backup/current/immich/ /data/docker/immich/library/
# the .sql.gz database dumps are inside library/backups/ — that came along with the rsync
ls /data/docker/immich/library/backups/   # immich-db-backup-*.sql.gz

Then bring up the Immich stack, but recreate the Postgres database from the most recent .sql.gz dump per Immich's restore docs (recreate the DB, create the required extensions, load the dump) before starting immich-server. The postgres/ data dir is not in the backup by design — the dump is the canonical DB copy.
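
Picking "the most recent .sql.gz dump" can be done mechanically, with a quick integrity check thrown in. This is a sketch only: latest_immich_dump is a made-up name, it relies on rsync -a having preserved the dumps' modification times, and it covers just the selection of the file. The actual database load should still follow Immich's restore docs.

```shell
# Print the newest Immich database dump, but only if it passes a gzip integrity test.
# Usage: latest_immich_dump [backups-dir]
# (hypothetical helper; "newest" is judged by modification time)
latest_immich_dump() {
  lid_dir=${1:-/data/docker/immich/library/backups}
  # ls -t sorts newest first by mtime
  lid_newest=$(ls -t "$lid_dir"/immich-db-backup-*.sql.gz 2>/dev/null | head -n 1)
  [ -n "$lid_newest" ] || return 1
  # gzip -t fails on a truncated or corrupt archive
  gzip -t "$lid_newest" || return 1
  printf '%s\n' "$lid_newest"
}
```

A non-zero exit means either no dump was found or the newest one is damaged, in which case an older dump from the same directory (or from a daily snapshot) is the fallback.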

A note on ownership: the Immich library/ files come back owned by gabriel because the nightly backup rsync runs as gabriel (it uses --no-owner --no-group). On a fresh restore that's harmless — Immich's container runs as its own uid and either fixes ownership itself on first boot or just reads/writes fine; a sudo chown -R <immich-uid>:<immich-uid> /data/docker/immich/library is the belt-and-suspenders move if Immich complains. Same caveat for Radicale's data/ (re-chown to 2999:2999).

4. Bring each stack up (Caddy first so the HTTPS front-ends come back, then the rest):

cd /data/docker/caddy && docker compose build && docker compose up -d
for svc in immich navidrome radicale vaultwarden beszel homepage; do
  ( cd /data/docker/$svc && docker compose up -d )
done
docker ps   # everything healthy?

Caddy re-issues every Let's Encrypt cert via the Cloudflare DNS-01 challenge on first boot — needs the CF_API_TOKEN in /data/docker/caddy/.env (which came back with the service-config/caddy/ rsync). See 17-caddy.md.

Step 7: Re-set the Samba password

Samba passwords aren't in the backup for security reasons:

sudo smbpasswd -a gabriel
sudo smbpasswd -e gabriel

Step 8: Re-authenticate Tailscale

sudo tailscale up --ssh

Visit the URL printed and authenticate. Your tailnet recognizes the same hostname (z2mini).

Step 9: Restart everything

sudo systemctl restart smbd smartd
sudo systemctl status smbd smartd tailscaled

Step 10: Verify

  • Connect from your laptop: ssh gabriel@z2mini
  • Try the SMB share: \\z2mini\files
  • Run a backup: ~/scripts/backup-files.sh
  • Confirm log: tail -30 /mnt/backup/backup.log

If all of those work, the system is fully recovered.

Estimated total time: ~1 hour (Ubuntu install + package restore + config extract + service restart).


Scenario 6: Both NVMe drives died (Z2 hardware failure)

Symptoms: Z2 won't power on, or won't see any drives.

You can't recover to the Z2, but your data is safe on the backup drive. Two options:

Option A: Replace the Z2 with another machine

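Any computer with at least one drive bay can become the new Z2. Follow the Scenario 5 rebuild (fresh Ubuntu install, package and config restore) on the replacement hardware, then restore /data from the backup drive as in Scenario 4, steps 6 to 8. The hostname stays z2mini so all your client devices keep working.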

Option B: Read your files from any Linux machine

Plug the backup drive (the 990 PRO in its ASM2462 USB enclosure) into any Linux laptop, mount it, and your data is right there at /mnt/backup/current/files/, immich/, music/, db-dumps/, service-config/. You can copy files back to your laptop until you have a permanent solution.


Scenario 7: Both NVMe drives AND the on-site backup drive died

The catastrophic case. There's now a partial off-site mitigation — the Samsung T5 (backup-offsite) at family's house, refreshed during visits via ~/scripts/backup-offsite.sh (see 05-backups.md).

The off-site drive holds (as of the last refresh): the Samba files, the music library, the SQLite DB dumps (Vaultwarden vault, Beszel ×2, Navidrome), every service's config (service-config/), and the system-state archives. So in this scenario you recover the documents, music, all service config + databases, and the system config from the off-site T5 — plug it into a Linux machine, mount /dev/disk/by-label/backup-offsite, copy what you need.

What's lost in this scenario: the Immich photo library. It's ~520 GB and doesn't fit on the 500 GB off-site T5 — so the photos only ever had copies at the apartment (/data live + /mnt/backup backup). If both of those are gone, the photos are gone (apart from whatever individual images still exist on the iPhone's camera roll, since iOS auto-backup keeps captures local for a while). Closing this gap needs a >1 TB off-site drive — see the future-improvements note in 05-backups.md.


Recovery README on the backup drive

A copy of recovery procedures lives at /mnt/backup/RECOVERY-README.md so it's accessible even when the Z2 itself is unavailable — it covers the package-reinstall list (now including sqlite3), the /etc config extract, the service-config/ + db-dumps/ + Immich restore, and the Caddy docker compose build && up -d step. Update it whenever recovery procedures change here.