17 - Caddy (HTTPS Reverse Proxy)
The single HTTPS ingress for the Z2. One Caddy container fronts every web service on a clean <service>.z2mini.gabrielgabrie.com hostname with an auto-renewing Let's Encrypt certificate. It replaced the ad-hoc per-service tailscale serve listeners that Radicale (:443) and Vaultwarden (:8443) used.
Companion to 03-tailscale.md (transport — Caddy still listens on the tailnet interface only) and the per-service pages: 11-immich.md, 12-navidrome.md, 13-homepage.md, 14-beszel.md, 15-radicale.md, 16-vaultwarden.md — they all sit behind it now.
Overview
Single-container Docker stack, locally built (stock Caddy + the caddy-dns/cloudflare plugin):
| Container | Image | Purpose |
|---|---|---|
| caddy | caddy-cloudflare:2.11.2 (built locally from caddy:2.11.2-builder + xcaddy build --with github.com/caddy-dns/cloudflare) | TLS termination + reverse proxy for all web services; ACME client (DNS-01 via Cloudflare) |
Runs network_mode: host, with listeners pinned to 100.67.235.68 (the Z2's Tailscale IP) — port 443 for HTTPS, port 80 for the HTTP→HTTPS redirect. Not exposed to the public internet (the box has no public inbound — it's behind student-housing NAT, reachable only over the tailnet).
Routes
| Hostname | Proxies to | Service |
|---|---|---|
| https://immich.z2mini.gabrielgabrie.com | 127.0.0.1:2283 | Immich |
| https://navidrome.z2mini.gabrielgabrie.com | 127.0.0.1:4533 | Navidrome |
| https://home.z2mini.gabrielgabrie.com | 127.0.0.1:3000 | Homepage |
| https://beszel.z2mini.gabrielgabrie.com | 127.0.0.1:8090 | Beszel |
| https://radicale.z2mini.gabrielgabrie.com | 127.0.0.1:5232 | Radicale (CalDAV/CardDAV) |
| https://vault.z2mini.gabrielgabrie.com | 127.0.0.1:8080 | Vaultwarden |
Every backend app listens on 127.0.0.1:<port> only — they are not reachable on the tailnet directly. Caddy (host networking, so it can see the host's loopback) is the only door in. Tailscale already encrypts the wire, so plain HTTP from Caddy → 127.0.0.1 is fine; the TLS is for the clients (browsers, iOS Calendar, Bitwarden apps, Immich app, Subsonic clients).
*.z2mini.gabrielgabrie.com resolves (via Cloudflare) to 100.67.235.68, which is only routable from devices on the tailnet — so the friendly URLs work from tailnet devices and resolve-but-don't-connect from anywhere else. Exactly the same security posture as before; the names just got nicer.
Design decisions
| Decision | Reasoning |
|---|---|
| Caddy over Traefik / nginx / NPM | Smallest config (a Caddyfile block per service is ~3 lines), automatic HTTPS with renewal built in, single static binary, matches the lightweight pattern of everything else here. Traefik's label-based auto-discovery is powerful but overkill for ~6 services and adds YAML cruft to every stack; nginx/NPM need manual cert plumbing. |
| Single reverse proxy replacing per-service tailscale serve | The tailscale serve approach worked but didn't scale: each TLS-requiring service needed its own port (:443 Radicale, :8443 Vaultwarden, :8444 would be next…). One Caddy on :443 with subdomains is uniform and adds no per-port juggling. |
| Custom image with the caddy-dns/cloudflare plugin | The box has no public inbound, so the ACME HTTP-01 and TLS-ALPN-01 challenges (which need port 80/443 reachable from Let's Encrypt) are not usable. DNS-01 is usable: it proves domain control by creating a TXT record, which Caddy does via the Cloudflare API. That needs the provider plugin compiled in (Caddy is statically linked; modules can't be loaded at runtime). Standard pattern; see the upstream Docker docs. |
| DNS moved to Cloudflare (whole gabrielgabrie.com zone) | Needed an API-controllable public DNS zone for the DNS-01 challenge; Cloudflare's API + the caddy-dns/cloudflare plugin is the most battle-tested combo, and it ended the prior finickiness with Hostinger's subdomain handling. Hostinger remains the registrar; only DNS resolution moved (nameservers point at Cloudflare). |
| network_mode: host for Caddy | Lets Caddy reach the apps on the host's 127.0.0.1:<port>, which is what makes the "apps bind localhost only, Caddy is the one door" posture possible. The trade-offs: Caddy binds ports directly on the host (mitigated by default_bind 100.67.235.68 in the Caddyfile, so it doesn't also listen on the LAN), and its admin endpoint (127.0.0.1:2019, unauthenticated by default) ends up on the host's loopback, reachable only from the box itself and used for the healthcheck and caddy reload. |
| All app containers re-bound to 127.0.0.1:<port> only | Before Caddy, apps bound 100.67.235.68:<port> so tailnet devices could hit them directly. With Caddy as the single ingress, that direct path is just extra surface, and re-binding to 127.0.0.1 closes it. Side benefits: on-server scripts can use localhost:<port> again, and the 100.67.235.68 literal is gone from the app compose files. |
| Listeners pinned to 100.67.235.68 (not 0.0.0.0) via default_bind | Same "tailnet interface only, never the LAN, never public" posture as everything else. Without it, host-networked Caddy would bind 0.0.0.0. |
| tailscale set --operator=gabriel | One-time setup so tailscale serve / tailscale set / etc. don't need sudo (the gabriel user is already in the sudo group; this just removes the prompt for Tailscale ops). Done as part of the cutover when tailscale serve reset was needed. |
| Tailscale global resolvers set to 1.1.1.1 + 8.8.8.8, "Override local DNS" on | Tailnet devices were forwarding non-MagicDNS queries to whatever local network they were on, and the student-housing router's DNS chokes on *.z2mini.gabrielgabrie.com (stale cache and/or it strips 100.64.0.0/10 CGNAT answers as "DNS rebinding"). Pointing all tailnet devices at public resolvers fixes resolution everywhere and is a better config regardless. |
| Image pinning via the CADDY_VERSION build arg (Dockerfile + compose), never :latest | Updates are deliberate (docker compose build && docker compose up -d), same posture as every other stack. |
| restart: unless-stopped (not always) | Survives reboots, but docker compose stop actually stops. |
| log per site, no-new-privileges, resource limits, healthcheck | Access logs to stdout for debugging; the usual hardening minus a non-root user (the official Caddy image runs as root by default, and switching would mean chowning the certs dir: marginal benefit for a well-maintained official image, same call as Immich's containers). |
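When watching an issuance, it can help to know where the DNS-01 TXT record from the table above appears. RFC 8555 fixes the name as _acme-challenge. prepended to the hostname being validated. The sketch below only derives that name; the function name acme_challenge_record is hypothetical and not part of Caddy or the Cloudflare plugin.

```shell
# Print the TXT record name that an ACME DNS-01 challenge uses for a hostname (RFC 8555).
acme_challenge_record() {
    host=$1
    # A wildcard name is validated at its base name, so drop a leading "*.".
    host=${host#\*.}
    printf '_acme-challenge.%s\n' "$host"
}
```

During an issuance the record could be inspected with something like dig +short TXT "$(acme_challenge_record immich.z2mini.gabrielgabrie.com)" @1.1.1.1. It is short-lived, so an empty answer outside an issuance window is expected.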
What was considered and rejected
- Traefik: Docker-native, auto-discovers containers via labels. Powerful, but it'd put traefik.* labels on every stack's compose file (which are kept close to upstream-verbatim) and it's a steeper config surface than a Caddyfile for 6 static routes.
- nginx / Nginx Proxy Manager: the classic; NPM adds a GUI. Heavier, and cert acquisition/renewal is manual plumbing (certbot, cron) vs. Caddy doing it natively.
- Keeping tailscale serve and adding more ports (:443, :8443, :8444, …): works but doesn't scale and there's no scheme for "which port is which service."
- tailscale serve with path prefixes on :443 (…ts.net/immich, /vault): would avoid a custom domain, but Vaultwarden's subpath support is finicky and Radicale's CalDAV principal discovery wants /. Subdomains are robust.
- Renaming the tailnet to something prettier than elk-kanyu.ts.net: low payoff (still *.ts.net), breaks every issued cert and MagicDNS name; not worth it. The custom domain via Caddy is the better answer.
- Caddy bridge-networked with apps on 100.67.235.68:<port>: was the state immediately after the first cutover; works but leaves the apps directly reachable on the tailnet (extra surface) and keeps the 100.67.235.68 literal in the app compose files. The network_mode: host + 127.0.0.1 re-bind is the cleaner end state.
- Caddy fetching the *.ts.net cert from tailscaled (tls { get_certificate tailscale }, via the caddy-tailscale plugin): only works for the tailnet hostname, not a custom domain; the custom domain needs DNS-01 anyway.
- Cloudflare proxy/CDN ("orange cloud") on the records: left as DNS-only (grey cloud) on everything. Cloudflare's value here is robust DNS + an API for DNS-01; the proxy/CDN/WAF layer is a separate opt-in (and can't proxy the 100.64.0.0/10 Tailscale IP anyway). The docs. / prices. / etc. records also stay DNS-only so they behave exactly as they did on Hostinger.
Install
Directory layout
/data/docker/caddy/
├── Dockerfile ← stock caddy + caddy-dns/cloudflare plugin (multi-stage xcaddy build)
├── docker-compose.yml ← stack definition; network_mode: host; build: + image: caddy-cloudflare:2.11.2
├── .env ← mode 600 — CF_API_TOKEN (Cloudflare API token, Zone.DNS:Edit on the one zone)
├── Caddyfile ← global options (email, acme_dns cloudflare, default_bind) + one block per site
├── data/ ← Caddy's persistent data — issued certs + the ACME account key. Auto-renewing,
│ so even total loss here is recoverable (Caddy just re-issues).
└── config/ ← Caddy's autosave.json (the last-loaded config; written automatically)
Dockerfile
# Caddy + the caddy-dns/cloudflare plugin (needed for ACME DNS-01 — no public inbound here).
# Build: docker compose build (or: docker compose up -d --build)
ARG CADDY_VERSION=2.11.2
FROM caddy:${CADDY_VERSION}-builder AS builder
RUN xcaddy build --with github.com/caddy-dns/cloudflare
FROM caddy:${CADDY_VERSION}
COPY --from=builder /usr/bin/caddy /usr/bin/caddy
docker-compose.yml
name: caddy
services:
  caddy:
    container_name: caddy
    build:
      context: .
      args:
        CADDY_VERSION: "2.11.2"
    image: caddy-cloudflare:2.11.2     # locally-built; pinned
    restart: unless-stopped
    network_mode: host                 # so Caddy can reach the apps on 127.0.0.1:<port>
    env_file: .env                     # supplies CF_API_TOKEN -> {env.CF_API_TOKEN}
    environment:
      TZ: America/Toronto
    security_opt:
      - no-new-privileges:true
    deploy:
      resources:
        limits:
          memory: 256M
          pids: 100
    healthcheck:
      test: ["CMD-SHELL", "wget -qO /dev/null http://127.0.0.1:2019/config/ || exit 1"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 15s
    volumes:
      - "/data/docker/caddy/Caddyfile:/etc/caddy/Caddyfile:ro"
      - "/data/docker/caddy/data:/data"
      - "/data/docker/caddy/config:/config"
.env
# Cloudflare API token — "Edit zone DNS" template, scoped to the gabrielgabrie.com zone only
# (Zone.DNS:Edit + Zone.Zone:Read). Used by Caddy for the ACME DNS-01 challenge. Mode 600.
# Verify it's valid:
# curl -s https://api.cloudflare.com/client/v4/user/tokens/verify -H "Authorization: Bearer $CF_API_TOKEN"
# Rotate it: Cloudflare -> My Profile -> API Tokens -> Roll, then update this file + docker compose up -d.
CF_API_TOKEN=<the token>
Caddyfile
{
    email gabrielgabrie99@gmail.com
    acme_dns cloudflare {env.CF_API_TOKEN}
    default_bind 100.67.235.68   # host networking -> pin every listener to the Tailscale IP
}

immich.z2mini.gabrielgabrie.com {
    log
    reverse_proxy 127.0.0.1:2283
}

navidrome.z2mini.gabrielgabrie.com {
    log
    reverse_proxy 127.0.0.1:4533
}

home.z2mini.gabrielgabrie.com {
    log
    reverse_proxy 127.0.0.1:3000
}

beszel.z2mini.gabrielgabrie.com {
    log
    reverse_proxy 127.0.0.1:8090
}

radicale.z2mini.gabrielgabrie.com {
    log
    reverse_proxy 127.0.0.1:5232
}

vault.z2mini.gabrielgabrie.com {
    log
    reverse_proxy 127.0.0.1:8080
}
Caddy's reverse_proxy preserves the original Host: header by default and upgrades WebSocket connections automatically — so Immich's live updates, Vaultwarden's WebSocket sync, and Beszel's live metrics all work without extra config. It also sets X-Forwarded-For / X-Forwarded-Proto / X-Forwarded-Host.
Cloudflare / DNS setup (one-time)
- Move the gabrielgabrie.com zone to Cloudflare. Add the site in Cloudflare (free plan), let it scan the existing records, and verify every record came over, especially the docs / demo / minimal / prices / circuits A + per-subdomain AAAA records. Hostinger's "subdomains" panel records don't show in its zone-file export, so Cloudflare's scan and a direct dig @ns1.dns-parking.com <name> are the real source of truth (see 10-system-reference.md for the full record list). Set everything to DNS only (grey cloud), then point the domain's nameservers (at Hostinger, the registrar) to Cloudflare's two.
- Add *.z2mini A → 100.67.235.68, DNS only. Covers every <svc>.z2mini.gabrielgabrie.com, so no per-service record is needed.
- Create the API token ("Edit zone DNS" template, scoped to the gabrielgabrie.com zone) and put it in /data/docker/caddy/.env as CF_API_TOKEN (mode 600).
- Tailscale → DNS → Global nameservers → add 1.1.1.1 + 8.8.8.8, "Override local DNS" ON. This makes every tailnet device resolve *.z2mini.gabrielgabrie.com via a reliable resolver (the student-housing router's DNS chokes on these names; see the Troubleshooting section).
First boot
mkdir -p /data/docker/caddy/{data,config}
cd /data/docker/caddy
# write Dockerfile, docker-compose.yml, .env (chmod 600), Caddyfile
docker compose build # ~1-2 min: pulls caddy:*-builder, xcaddy compiles with the plugin
docker compose config --quiet
docker compose up -d
docker compose ps # expect: Up X seconds (healthy)
docker compose logs --tail 30 caddy # expect: "certificate obtained successfully" for each host, issuer = Let's Encrypt
# confirm the cloudflare module is in:
docker compose exec caddy caddy list-modules | grep cloudflare # -> dns.providers.cloudflare
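Checking the first-boot log by eye for six hostnames is easy to get wrong, so the check can be scripted. The sketch below reads log lines on stdin and prints every expected host that has no "certificate obtained successfully" entry. It assumes Caddy's JSON log lines carry the hostname in an "identifier" field with no spaces around the colon. The function name hosts_missing_cert is hypothetical.

```shell
# Read Caddy log lines on stdin; print each expected host (given as arguments)
# that has no "certificate obtained successfully" line. Returns 1 if any are missing.
hosts_missing_cert() {
    log=$(cat)
    missing=0
    for host in "$@"; do
        # Assumption: the success line contains "identifier":"<hostname>".
        if ! printf '%s\n' "$log" \
            | grep -F 'certificate obtained successfully' \
            | grep -qF "\"identifier\":\"$host\""; then
            echo "$host"
            missing=1
        fi
    done
    return "$missing"
}
```

It could be fed with docker compose logs --no-color caddy | hosts_missing_cert immich.z2mini.gabrielgabrie.com vault.z2mini.gabrielgabrie.com. This is only meaningful on a first boot with an empty data/ dir, since Caddy does not log a new issuance when it reuses stored certs.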
Adding a new service behind Caddy
- Bind the new container to 127.0.0.1:<port> in its compose (ports: ["127.0.0.1:<port>:<port>"]), not 0.0.0.0 and not the tailnet IP.
- Add a Caddyfile block for <name>.z2mini.gabrielgabrie.com in the same shape as the existing ones: log plus reverse_proxy 127.0.0.1:<port>.
- Reload Caddy: docker compose -f /data/docker/caddy/docker-compose.yml exec caddy caddy reload --config /etc/caddy/Caddyfile (graceful, no dropped connections) or docker compose restart caddy. A plain docker compose up -d caddy does not recreate the container after a Caddyfile-only edit, so it will not pick up the change. Caddy obtains the cert via DNS-01 within ~30-60 s.
- The *.z2mini wildcard A record already resolves <name>.z2mini.gabrielgabrie.com, so no Cloudflare change is needed.
- Add a Homepage tile (13-homepage.md) with href: https://<name>.z2mini.gabrielgabrie.com; if it has a Homepage widget, point widget.url at the Caddy hostname too (the Homepage container can't reach 127.0.0.1:<port> of the host, so it goes through Caddy like everything else).
This is now step 1 of the "bring up a new service" checklist — see z2mini-context-for-ai.md.
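Every site block in the Caddyfile follows the same pattern, so the block for a new service can be generated rather than typed. A minimal sketch; the function name caddy_site_block and the example service name are hypothetical.

```shell
# Print a Caddyfile site block for one service, in the same shape as the existing blocks.
caddy_site_block() {
    name=$1   # subdomain label, e.g. "immich"
    port=$2   # loopback port the app is bound to
    # Refuse anything that is not a plain number in the valid port range.
    case "$port" in
        ''|*[!0-9]*) echo "invalid port: $port" >&2; return 1 ;;
    esac
    if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
        echo "invalid port: $port" >&2
        return 1
    fi
    printf '%s.z2mini.gabrielgabrie.com {\n    log\n    reverse_proxy 127.0.0.1:%s\n}\n' "$name" "$port"
}
```

For example, caddy_site_block example 8000 >> /data/docker/caddy/Caddyfile appends a block for a hypothetical service named example, after which Caddy still needs the reload described above.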
Operations
Start / stop / reload / pull updates
cd /data/docker/caddy
docker compose up -d # start (or recreate after a compose / .env change)
docker compose exec caddy caddy reload --config /etc/caddy/Caddyfile # reload after a Caddyfile-only edit (graceful)
docker compose stop # stop
docker compose down # stop + remove (data preserved)
# Caddy version bump: edit CADDY_VERSION in BOTH Dockerfile and docker-compose.yml, then:
docker compose build
docker compose up -d
docker compose logs --tail 20 caddy
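The version is pinned in more than one place: the ARG default in the Dockerfile, the build arg in docker-compose.yml, and the image tag in docker-compose.yml. A quick consistency check before building can catch a missed edit. A minimal sketch, assuming the file layout and line shapes shown on this page; the function name check_caddy_version_pins is hypothetical.

```shell
# Compare the Caddy version pinned in the Dockerfile with the build arg and
# image tag in docker-compose.yml. Prints "ok: <version>" or returns 1 on mismatch.
check_caddy_version_pins() {
    dir=$1
    # ARG default in the Dockerfile, e.g. ARG CADDY_VERSION=2.11.2
    dockerfile_ver=$(sed -n 's/^ARG CADDY_VERSION=//p' "$dir/Dockerfile" | head -n 1)
    # Build arg in the compose file, e.g. CADDY_VERSION: "2.11.2"
    compose_ver=$(sed -n 's/^[[:space:]]*CADDY_VERSION:[[:space:]]*"\{0,1\}\([^"[:space:]]*\)"\{0,1\}.*$/\1/p' "$dir/docker-compose.yml" | head -n 1)
    # Image tag in the compose file, e.g. image: caddy-cloudflare:2.11.2
    image_ver=$(sed -n 's/^[[:space:]]*image:[[:space:]]*caddy-cloudflare:\([^[:space:]]*\).*$/\1/p' "$dir/docker-compose.yml" | head -n 1)
    if [ -n "$dockerfile_ver" ] && [ "$dockerfile_ver" = "$compose_ver" ] && [ "$compose_ver" = "$image_ver" ]; then
        echo "ok: $dockerfile_ver"
    else
        echo "mismatch: Dockerfile=$dockerfile_ver compose-arg=$compose_ver image-tag=$image_ver" >&2
        return 1
    fi
}
```

Running check_caddy_version_pins /data/docker/caddy && docker compose build would stop a build where the pins disagree.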
Logs
Access logs (one JSON line per request, from the log directive) and ACME events both go to the container's stdout, so docker compose logs caddy shows them together. Useful for diagnosing 502s (which backend, what error) and cert issuance.
Cert status
Certs auto-renew ~⅔ through each 90-day cycle via DNS-01 — zero maintenance. To check what's stored:
docker exec caddy ls -R /data/caddy/certificates/ # one dir per hostname, each with a .crt and .key
curl -fsS https://api.cloudflare.com/client/v4/user/tokens/verify -H "Authorization: Bearer $(grep '^CF_API_TOKEN=' /data/docker/caddy/.env | cut -d= -f2-)"   # token still active? (the ^ anchor skips the comment lines in .env that mention CF_API_TOKEN)
If renewal ever fails it's almost always the Cloudflare token (rolled, expired, or wrong scope) — see Troubleshooting.
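The .env file shown earlier mentions $CF_API_TOKEN inside a comment, so an unanchored grep for the key matches more than the assignment line. A small helper that anchors on the start of the line avoids that. A minimal sketch; the function name read_env_value is hypothetical.

```shell
# Print the value of KEY from a dotenv-style FILE, ignoring comments and other keys.
# Returns 1 if the key is not assigned in the file.
read_env_value() {
    file=$1
    key=$2
    # Match only lines that start with "KEY=", so commented mentions are skipped.
    line=$(grep "^${key}=" "$file" | tail -n 1)
    [ -n "$line" ] || return 1
    # Strip everything up to the first "=", keeping any "=" inside the value.
    printf '%s\n' "${line#*=}"
}
```

The token check could then use -H "Authorization: Bearer $(read_env_value /data/docker/caddy/.env CF_API_TOKEN)".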
Connecting from the server itself
Because every app now binds 127.0.0.1:<port>, on-server scripts can use http://127.0.0.1:<port> (or http://localhost:<port>) directly — the old "must use the tailnet IP" gotcha is gone. https://<svc>.z2mini.gabrielgabrie.com also works from the box (resolves to 100.67.235.68, reaches Caddy). What no longer works: http://z2mini:<port> and http://100.67.235.68:<port> — those bindings were removed.
Backup considerations
Caddy is in the nightly backup (since May 2026 — see 05-backups.md). ~/scripts/backup-files.sh rsyncs the config into /mnt/backup/current/service-config/caddy/:
- Dockerfile, docker-compose.yml, Caddyfile: tiny, plain text, rsync-safe. The whole point of the stack.
- .env: the load-bearing secret (the Cloudflare API token). Mode 600. Without it Caddy can't renew certs. Backed up with the rest.
What's not backed up:
- config/caddy/: Caddy's autosave.json (the last-loaded config). It's container-owned, mode 700 (the backup user can't read it), and rebuildable from the Caddyfile anyway. Skipped.
- data/caddy/: the issued certs + the ACME account key. Same story: container-owned, mode 700, unreadable by the backup user, and not load-bearing. Certs auto-renew, and Caddy simply re-issues everything on a fresh data/ dir given a valid .env token (it re-runs the Cloudflare DNS-01 challenge for each host). So losing it just means a one-time re-issuance on restore (mind Let's Encrypt's 50-certs/week rate limit if you're recreating in a loop, but for a normal rebuild it's a non-issue).
Improvement over the old tailscale serve setup: that config lived in tailscaled's state (/var/lib/tailscale/), not under /data/docker/, so it had to be manually re-run after a rebuild. Caddy's actual config (Dockerfile + compose + Caddyfile + .env) is under /data/docker/caddy/ and is in the nightly backup like every other stack.
The off-site T5 also gets service-config/caddy/ (the whole service-config/ tree goes off-site via backup-offsite.sh).
Restore (per RECOVERY-README.md / 08-recovery.md → Step 6b): restore /data/docker/caddy/ from service-config/caddy/ (or recreate the files + the .env token), docker compose build && docker compose up -d — Caddy re-issues every cert on first boot. Also re-run tailscale set --operator=gabriel after Tailscale re-auth. The Cloudflare zone + the *.z2mini record + the Tailscale global resolvers are all server-independent (they live in Cloudflare / the Tailscale admin) — nothing to redo there.
Troubleshooting
A tailnet device can't reach https://<svc>.z2mini.gabrielgabrie.com — "site can't be reached" / DNS doesn't resolve:
- It's almost always DNS, specifically the local network's resolver choking on *.z2mini.gabrielgabrie.com (stale negative cache from before the Cloudflare migration, or, common on consumer/landlord routers, "DNS rebinding protection" that drops answers in the 100.64.0.0/10 CGNAT range Tailscale uses). Confirm: nslookup <svc>.z2mini.gabrielgabrie.com 1.1.1.1 → should return 100.67.235.68; nslookup <svc>.z2mini.gabrielgabrie.com (default resolver) → fails.
- Fix: make sure Tailscale → DNS → Global nameservers has 1.1.1.1 + 8.8.8.8 with "Override local DNS" ON, then on the device re-sync Tailscale's DNS (tailscale set --accept-dns=false; tailscale set --accept-dns=true, or reconnect Tailscale) and flush the OS DNS cache (ipconfig /flushdns on Windows).
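A rebinding filter drops an answer because the address falls inside 100.64.0.0/10, which spans 100.64.0.0 through 100.127.255.255. The sketch below tests whether an IPv4 address is in that range, which is a quick way to tell whether a given answer is one such a filter would drop. The function name in_cgnat_range is hypothetical.

```shell
# Return 0 if the IPv4 address is inside 100.64.0.0/10, 1 if it is outside,
# and 2 if the argument is not a valid dotted-quad address.
in_cgnat_range() {
    # Split the dotted quad into its four octets.
    IFS=. read -r o1 o2 o3 o4 <<EOF
$1
EOF
    for octet in "$o1" "$o2" "$o3" "$o4"; do
        case "$octet" in
            ''|*[!0-9]*) return 2 ;;
        esac
        [ "$octet" -le 255 ] || return 2
    done
    # A /10 on 100.64.0.0 fixes the first octet and the top two bits of the second.
    [ "$o1" -eq 100 ] && [ "$o2" -ge 64 ] && [ "$o2" -le 127 ]
}
```

in_cgnat_range 100.67.235.68 succeeds, so the Z2's address is exactly the kind of answer such a filter would strip.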
A site returns 502 Bad Gateway:
- The backend isn't reachable. Check the app is up (docker compose -f /data/docker/<app>/docker-compose.yml ps) and that it's bound to 127.0.0.1:<port> (ss -tln | grep <port> should show 127.0.0.1:<port>, not 0.0.0.0 or the tailnet IP, and should not be missing). Caddy's log will name the upstream and the dial error.
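The ss check above can be turned into a one-word verdict per port. The sketch below reads ss -tln output on stdin and reports whether a port is bound to loopback, to some other address, or not at all. It assumes the usual ss -tln column layout, with the local address and port in the fourth column. The function name classify_listener is hypothetical.

```shell
# Read `ss -tln` output on stdin and report how PORT is bound.
# Prints one of: loopback, non-loopback, missing.
classify_listener() {
    port=$1
    awk -v port="$port" '
        # Assumption: column 4 is "Local Address:Port"; the port follows the last colon.
        $1 == "LISTEN" {
            n = split($4, parts, ":")
            if (parts[n] != port) next
            addr = substr($4, 1, length($4) - length(parts[n]) - 1)
            if (addr == "127.0.0.1" || addr == "[::1]") loopback = 1
            else other = 1
        }
        END {
            if (other) print "non-loopback"
            else if (loopback) print "loopback"
            else print "missing"
        }
    '
}
```

For a backend, ss -tln | classify_listener 2283 should print loopback. For Caddy's own :443, non-loopback is expected, since it listens on the tailnet IP.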
Cert won't issue / renew (could not get certificate, ACME errors in the log):
- The Cloudflare token: rolled without updating .env, expired (if you set an expiry on it; the "Edit zone DNS" template default is no expiry), or wrong scope. Verify: curl -fsS https://api.cloudflare.com/client/v4/user/tokens/verify -H "Authorization: Bearer <token>" → "status":"active". After fixing .env: docker compose up -d caddy.
- Less commonly: Let's Encrypt rate-limited (50 certs/week per registered domain). Only an issue if you've been recreating certs in a loop.
docker compose ps shows (unhealthy):
- The healthcheck wgets http://127.0.0.1:2019/config/ (Caddy's admin endpoint, on the host's loopback because of host networking). If Caddy launched but the admin endpoint is off (admin off in the Caddyfile) or moved, the check fails while the service may still be fine. Check docker compose logs caddy for serving initial configuration.
Caddy is also answering on the LAN, not just the tailnet:
- Confirm default_bind 100.67.235.68 is in the Caddyfile's global block. With network_mode: host and no default_bind, Caddy binds 0.0.0.0. ss -tln | grep ':443' should show 100.67.235.68:443, not 0.0.0.0:443.
Want to run tailscale serve / tailscale set and getting "Access denied":
- The operator should be set: sudo tailscale set --operator=gabriel (one-time). After that those commands work as gabriel without sudo. (For this setup tailscale serve is retired, since Caddy owns :443, but the operator setting is still convenient.)