17 — Caddy (HTTPS Reverse Proxy)

The single HTTPS ingress for the Z2. One Caddy container fronts every web service on a clean <service>.z2mini.gabrielgabrie.com hostname with an auto-renewing Let's Encrypt certificate. It replaced the ad-hoc per-service tailscale serve listeners that Radicale (:443) and Vaultwarden (:8443) used.

Companion to 03-tailscale.md (transport — Caddy still listens on the tailnet interface only) and the per-service pages: 11-immich.md, 12-navidrome.md, 13-homepage.md, 14-beszel.md, 15-radicale.md, 16-vaultwarden.md — they all sit behind it now.


Overview

Single-container Docker stack, locally built (stock Caddy + the caddy-dns/cloudflare plugin):

| Container | Image | Purpose |
| --- | --- | --- |
| caddy | caddy-cloudflare:2.11.2 (built locally from caddy:2.11.2-builder + xcaddy build --with github.com/caddy-dns/cloudflare) | TLS termination + reverse proxy for all web services; ACME client (DNS-01 via Cloudflare) |

Runs with network_mode: host, with listeners pinned to 100.67.235.68 (the Z2's Tailscale IP): port 443 for HTTPS, port 80 for the HTTP→HTTPS redirect. Not exposed to the public internet: the box has no public inbound because it sits behind student-housing NAT and is reachable only over the tailnet.

Routes

| Hostname | Proxies to | Service |
| --- | --- | --- |
| https://immich.z2mini.gabrielgabrie.com | 127.0.0.1:2283 | Immich |
| https://navidrome.z2mini.gabrielgabrie.com | 127.0.0.1:4533 | Navidrome |
| https://home.z2mini.gabrielgabrie.com | 127.0.0.1:3000 | Homepage |
| https://beszel.z2mini.gabrielgabrie.com | 127.0.0.1:8090 | Beszel |
| https://radicale.z2mini.gabrielgabrie.com | 127.0.0.1:5232 | Radicale (CalDAV/CardDAV) |
| https://vault.z2mini.gabrielgabrie.com | 127.0.0.1:8080 | Vaultwarden |

Every backend app listens on 127.0.0.1:<port> only — they are not reachable on the tailnet directly. Caddy (host networking, so it can see the host's loopback) is the only door in. Tailscale already encrypts the wire, so plain HTTP from Caddy → 127.0.0.1 is fine; the TLS is for the clients (browsers, iOS Calendar, Bitwarden apps, Immich app, Subsonic clients).

*.z2mini.gabrielgabrie.com resolves (via Cloudflare) to 100.67.235.68, which is only routable from devices on the tailnet — so the friendly URLs work from tailnet devices and resolve-but-don't-connect from anywhere else. Exactly the same security posture as before; the names just got nicer.


Design decisions

| Decision | Reasoning |
| --- | --- |
| Caddy over Traefik / nginx / NPM | Smallest config (a Caddyfile block per service is ~3 lines), automatic HTTPS with renewal built in, single static binary, matches the lightweight pattern of everything else here. Traefik's label-based auto-discovery is powerful but overkill for ~6 services and adds YAML cruft to every stack; nginx/NPM need manual cert plumbing. |
| Single reverse proxy replacing per-service tailscale serve | The tailscale serve approach worked but didn't scale: each TLS-requiring service needed its own port (:443 Radicale, :8443 Vaultwarden, :8444 would be next…). One Caddy on :443 with subdomains is uniform and adds no per-port juggling. |
| Custom image with the caddy-dns/cloudflare plugin | The box has no public inbound, so the ACME HTTP-01 and TLS-ALPN-01 challenges (which need port 80/443 reachable from Let's Encrypt) are not usable. DNS-01 is: it proves domain control by creating a TXT record, which Caddy does via the Cloudflare API. That needs the provider plugin compiled in (Caddy is statically linked; modules can't be loaded at runtime). Standard pattern; see the upstream Docker docs. |
| DNS moved to Cloudflare (whole gabrielgabrie.com zone) | Needed an API-controllable public DNS zone for the DNS-01 challenge; Cloudflare's API + the caddy-dns/cloudflare plugin is the most battle-tested combo, and it ended the prior finickiness with Hostinger's subdomain handling. Hostinger remains the registrar; only DNS resolution moved (nameservers point at Cloudflare). |
| network_mode: host for Caddy | Lets Caddy reach the apps on the host's 127.0.0.1:<port>, which is what makes the "apps bind localhost only, Caddy is the one door" posture possible. The trade-offs: Caddy binds ports directly on the host (mitigated by default_bind 100.67.235.68 in the Caddyfile, so it doesn't also listen on the LAN), and its admin endpoint (127.0.0.1:2019, unauthenticated by default) ends up on the host's loopback, reachable only from the box itself and used for the healthcheck and caddy reload. |
| All app containers re-bound to 127.0.0.1:<port> only | Before Caddy, apps bound 100.67.235.68:<port> so tailnet devices could hit them directly. With Caddy as the single ingress, that direct path is just extra surface; re-binding to 127.0.0.1 closes it. Side benefits: on-server scripts can use localhost:<port> again, and the 100.67.235.68 literal is gone from the app compose files. |
| Listeners pinned to 100.67.235.68 (not 0.0.0.0) via default_bind | Same "tailnet interface only, never the LAN, never public" posture as everything else. Without it, host-networked Caddy would bind 0.0.0.0. |
| tailscale set --operator=gabriel | One-time setup so tailscale serve / tailscale set / etc. don't need sudo (the gabriel user is already in the sudo group; this just removes the prompt for Tailscale ops). Done as part of the cutover when tailscale serve reset was needed. |
| Tailscale global resolvers set to 1.1.1.1 + 8.8.8.8, "Override local DNS" on | Tailnet devices were forwarding non-MagicDNS queries to whatever local network they were on, and the student-housing router's DNS chokes on *.z2mini.gabrielgabrie.com (stale cache and/or it strips 100.64.0.0/10 CGNAT answers as "DNS rebinding"). Pointing all tailnet devices at public resolvers fixes resolution everywhere and is a better config regardless. |
| Image pinning via the CADDY_VERSION build arg (Dockerfile + compose), never :latest | Updates are deliberate (docker compose build && docker compose up -d), same posture as every other stack. |
| restart: unless-stopped (not always) | Survives reboots, but docker compose stop actually stops. |
| log per site, no-new-privileges, resource limits, healthcheck | Access logs to stdout for debugging; the usual hardening minus a non-root user (the official Caddy image runs as root by default, and switching would mean chowning the certs dir: marginal benefit for a well-maintained official image, same call as Immich's containers). |

What was considered and rejected

  • Traefik — Docker-native, auto-discovers containers via labels. Powerful, but it'd put traefik.* labels on every stack's compose file (which are kept close to upstream-verbatim) and it's a steeper config surface than a Caddyfile for 6 static routes.
  • nginx / Nginx Proxy Manager — the classic; NPM adds a GUI. Heavier, and cert acquisition/renewal is manual plumbing (certbot, cron) vs. Caddy doing it natively.
  • Keeping tailscale serve and adding more ports (:443, :8443, :8444, …): works but doesn't scale, and there's no scheme for "which port is which service."
  • tailscale serve with path prefixes on :443 (…ts.net/immich, /vault) — would avoid a custom domain, but Vaultwarden's subpath support is finicky and Radicale's CalDAV principal discovery wants /. Subdomains are robust.
  • Renaming the tailnet to something prettier than elk-kanyu.ts.net — low payoff (still *.ts.net), breaks every issued cert and MagicDNS name; not worth it. The custom domain via Caddy is the better answer.
  • Caddy bridge-networked with apps on 100.67.235.68:<port> — was the state immediately after the first cutover; works but leaves the apps directly reachable on the tailnet (extra surface) and keeps the 100.67.235.68 literal in the app compose files. The network_mode: host + 127.0.0.1 re-bind is the cleaner end state.
  • Caddy fetching the *.ts.net cert from tailscaled (tls { get_certificate tailscale }, a module that ships with stock Caddy, so no extra plugin is needed for it): only works for the tailnet hostname, not a custom domain; the custom domain needs DNS-01 anyway.
  • Cloudflare proxy/CDN ("orange cloud") on the records — left as DNS-only (grey cloud) on everything. Cloudflare's value here is robust DNS + an API for DNS-01; the proxy/CDN/WAF layer is a separate opt-in (and can't proxy the 100.64.0.0/10 Tailscale IP anyway). The docs./prices./etc. records also stay DNS-only so they behave exactly as they did on Hostinger.

Install

Directory layout

/data/docker/caddy/
├── Dockerfile            ← stock caddy + caddy-dns/cloudflare plugin (multi-stage xcaddy build)
├── docker-compose.yml    ← stack definition; network_mode: host; build: + image: caddy-cloudflare:2.11.2
├── .env                  ← mode 600 — CF_API_TOKEN (Cloudflare API token, Zone.DNS:Edit on the one zone)
├── Caddyfile             ← global options (email, acme_dns cloudflare, default_bind) + one block per site
├── data/                 ← Caddy's persistent data — issued certs + the ACME account key. Auto-renewing,
│                            so even total loss here is recoverable (Caddy just re-issues).
└── config/               ← Caddy's autosave.json (the last-loaded config; written automatically)

Dockerfile

# Caddy + the caddy-dns/cloudflare plugin (needed for ACME DNS-01 — no public inbound here).
# Build:  docker compose build      (or: docker compose up -d --build)
ARG CADDY_VERSION=2.11.2

FROM caddy:${CADDY_VERSION}-builder AS builder
RUN xcaddy build --with github.com/caddy-dns/cloudflare

FROM caddy:${CADDY_VERSION}
COPY --from=builder /usr/bin/caddy /usr/bin/caddy

docker-compose.yml

name: caddy

services:
  caddy:
    container_name: caddy
    build:
      context: .
      args:
        CADDY_VERSION: "2.11.2"
    image: caddy-cloudflare:2.11.2      # locally-built; pinned
    restart: unless-stopped
    network_mode: host                   # so Caddy can reach the apps on 127.0.0.1:<port>
    env_file: .env                       # supplies CF_API_TOKEN -> {env.CF_API_TOKEN}
    environment:
      TZ: America/Toronto
    security_opt:
      - no-new-privileges:true
    deploy:
      resources:
        limits:
          memory: 256M
          pids: 100
    healthcheck:
      test: ["CMD-SHELL", "wget -qO /dev/null http://127.0.0.1:2019/config/ || exit 1"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 15s
    volumes:
      - "/data/docker/caddy/Caddyfile:/etc/caddy/Caddyfile:ro"
      - "/data/docker/caddy/data:/data"
      - "/data/docker/caddy/config:/config"

.env

# Cloudflare API token — "Edit zone DNS" template, scoped to the gabrielgabrie.com zone only
# (Zone.DNS:Edit + Zone.Zone:Read). Used by Caddy for the ACME DNS-01 challenge. Mode 600.
# Verify it's valid:
#   curl -s https://api.cloudflare.com/client/v4/user/tokens/verify -H "Authorization: Bearer $CF_API_TOKEN"
# Rotate it:  Cloudflare -> My Profile -> API Tokens -> Roll, then update this file + docker compose up -d.
CF_API_TOKEN=<the token>

Caddyfile

{
    email gabrielgabrie99@gmail.com
    acme_dns cloudflare {env.CF_API_TOKEN}
    default_bind 100.67.235.68     # host networking -> pin every listener to the Tailscale IP
}

immich.z2mini.gabrielgabrie.com {
    log
    reverse_proxy 127.0.0.1:2283
}
navidrome.z2mini.gabrielgabrie.com {
    log
    reverse_proxy 127.0.0.1:4533
}
home.z2mini.gabrielgabrie.com {
    log
    reverse_proxy 127.0.0.1:3000
}
beszel.z2mini.gabrielgabrie.com {
    log
    reverse_proxy 127.0.0.1:8090
}
radicale.z2mini.gabrielgabrie.com {
    log
    reverse_proxy 127.0.0.1:5232
}
vault.z2mini.gabrielgabrie.com {
    log
    reverse_proxy 127.0.0.1:8080
}

Caddy's reverse_proxy preserves the original Host: header by default and upgrades WebSocket connections automatically — so Immich's live updates, Vaultwarden's WebSocket sync, and Beszel's live metrics all work without extra config. It also sets X-Forwarded-For / X-Forwarded-Proto / X-Forwarded-Host.

Cloudflare / DNS setup (one-time)

  1. Move the gabrielgabrie.com zone to Cloudflare. Add the site in Cloudflare (free plan), let it scan the existing records, verify every record came over (especially the docs/demo/minimal/prices/circuits A + per-subdomain AAAA records — Hostinger's "subdomains" panel records don't show in its zone-file export, so Cloudflare's scan and a direct dig @ns1.dns-parking.com <name> are the real source of truth — see 10-system-reference.md for the full record list), set everything to DNS only (grey cloud), then point the domain's nameservers (at Hostinger, the registrar) to Cloudflare's two.
  2. Add *.z2mini A → 100.67.235.68, DNS only. Covers every <svc>.z2mini.gabrielgabrie.com — no per-service record needed.
  3. Create the API token — "Edit zone DNS" template, scoped to the gabrielgabrie.com zone — and put it in /data/docker/caddy/.env as CF_API_TOKEN (mode 600).
  4. Tailscale → DNS → Global nameservers → add 1.1.1.1 + 8.8.8.8, "Override local DNS" ON. So every tailnet device resolves *.z2mini.gabrielgabrie.com via a reliable resolver (the student-housing router's DNS chokes on these names — see the Troubleshooting section).
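
Step 1's "verify every record came over" check can be scripted. The sketch below compares one record's answers from the old and new nameservers. It is a minimal illustration, not part of the original setup: the function name compare_record and the DIG override variable are hypothetical, and the Cloudflare nameserver has to be filled in from the Cloudflare dashboard since it is assigned per account.

```shell
# Compare one record's answers from two nameservers.
# $1 = record name, $2 = record type, $3 = old nameserver, $4 = new nameserver
# DIG may be overridden (hypothetical knob, handy for dry runs); defaults to dig.
compare_record() {
    # +short prints only the answer data; sort makes multi-value answers comparable.
    old=$("${DIG:-dig}" +short "$1" "$2" "@$3" | sort)
    new=$("${DIG:-dig}" +short "$1" "$2" "@$4" | sort)
    if [ "$old" = "$new" ]; then
        echo "same $1 $2"
    else
        echo "DIFF $1 $2: old=[$old] new=[$new]"
        return 1
    fi
}
```

For example, compare_record docs.gabrielgabrie.com A ns1.dns-parking.com <cloudflare-ns> would flag a record that did not survive the import, with <cloudflare-ns> standing in for one of the two nameservers Cloudflare assigned. Run it before switching the nameservers at the registrar, while the old servers still answer.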

First boot

mkdir -p /data/docker/caddy/{data,config}
cd /data/docker/caddy
# write Dockerfile, docker-compose.yml, .env (chmod 600), Caddyfile
docker compose build              # ~1-2 min: pulls caddy:*-builder, xcaddy compiles with the plugin
docker compose config --quiet
docker compose up -d
docker compose ps                 # expect: Up X seconds (healthy)
docker compose logs --tail 30 caddy   # expect: "certificate obtained successfully" for each host, issuer = Let's Encrypt
# confirm the cloudflare module is in:
docker compose exec caddy caddy list-modules | grep cloudflare    # -> dns.providers.cloudflare
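
Once the certificates are issued, a quick smoke test over all six routes confirms that TLS and the proxy path work end to end. This is a sketch under assumptions: the function name smoke_test and the CURL override are hypothetical, and any non-000 status (including a redirect or a login page) is treated as proof that Caddy answered.

```shell
# Service prefixes from the Routes table.
SITES="immich navidrome home beszel radicale vault"

# Print "<host> <http-status>" for each site.
# CURL may be overridden (hypothetical knob for dry runs); defaults to curl.
smoke_test() {
    for svc in $SITES; do
        host="$svc.z2mini.gabrielgabrie.com"
        # -s silent, -o discard the body, -w print only the status code,
        # --max-time bounds the wait; curl reports 000 when it cannot connect.
        code=$("${CURL:-curl}" -s -o /dev/null --max-time 10 -w '%{http_code}' "https://$host/") || true
        echo "$host $code"
    done
}
```

Run smoke_test from any tailnet device (or the box itself). A 000 points at DNS or the listener, while a 502 points at the backend behind that hostname.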

Adding a new service behind Caddy

  1. Bind the new container to 127.0.0.1:<port> in its compose (ports: ["127.0.0.1:<port>:<port>"]) — not 0.0.0.0, not the tailnet IP.
  2. Add a Caddyfile block:
    <name>.z2mini.gabrielgabrie.com {
     log
     reverse_proxy 127.0.0.1:<port>
    }
    
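  3. Reload Caddy: docker compose -f /data/docker/caddy/docker-compose.yml exec caddy caddy reload --config /etc/caddy/Caddyfile (graceful, no dropped connections). As a heavier fallback, docker compose restart caddy also picks up the change; docker compose up -d alone does not recreate the container when only the bind-mounted Caddyfile changed. Caddy obtains the cert via DNS-01 within ~30-60 s.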
  4. The *.z2mini wildcard A record already resolves <name>.z2mini.gabrielgabrie.com — no Cloudflare change needed.
  5. Add a Homepage tile (13-homepage.md) with href: https://<name>.z2mini.gabrielgabrie.com; if it has a Homepage widget, point widget.url at the Caddy hostname too (the Homepage container can't reach 127.0.0.1:<port> of the host — it goes through Caddy like everything else).

This is now step 1 of the "bring up a new service" checklist — see z2mini-context-for-ai.md.
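
Since every site block has the same three-line shape, step 2 can be generated rather than typed. The helper below is a minimal sketch; the function name caddy_site_block is hypothetical, and the service name used in the usage note (paperless on port 8000) is purely illustrative.

```shell
# Print a Caddyfile site block for service name $1 proxied to loopback port $2.
caddy_site_block() {
    name=$1
    port=$2
    # Refuse anything that is not a plain port number, to avoid writing a broken block.
    case $port in
        ''|*[!0-9]*) echo "port must be numeric: $port" >&2; return 1 ;;
    esac
    printf '%s.z2mini.gabrielgabrie.com {\n    log\n    reverse_proxy 127.0.0.1:%s\n}\n' "$name" "$port"
}
```

For example, caddy_site_block paperless 8000 >> /data/docker/caddy/Caddyfile appends the block. Running docker compose exec caddy caddy validate --config /etc/caddy/Caddyfile before the reload in step 3 catches syntax errors.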


Operations

Start / stop / reload / pull updates

cd /data/docker/caddy
docker compose up -d                        # start (or recreate after a compose / .env change)
docker compose exec caddy caddy reload --config /etc/caddy/Caddyfile   # reload after a Caddyfile-only edit (graceful)
docker compose stop                         # stop
docker compose down                         # stop + remove (data preserved)

# Caddy version bump: edit CADDY_VERSION in BOTH Dockerfile and docker-compose.yml (plus the image: tag in the compose file), then:
docker compose build
docker compose up -d
docker compose logs --tail 20 caddy
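
Because the version lives in three places (the Dockerfile ARG, the compose build arg, and the compose image tag), a pre-build check can catch a half-finished bump. This is a sketch that assumes the file layout shown in the Install section; the function name caddy_versions_match is hypothetical.

```shell
# Succeed only if the Dockerfile ARG, the compose build arg, and the compose
# image tag all carry the same version. $1 = Dockerfile, $2 = docker-compose.yml
caddy_versions_match() {
    dockerfile=$1
    compose=$2
    # "ARG CADDY_VERSION=2.11.2" -> 2.11.2
    v_docker=$(sed -n 's/^ARG CADDY_VERSION=\(.*\)$/\1/p' "$dockerfile")
    # 'CADDY_VERSION: "2.11.2"' (quotes optional) -> 2.11.2
    v_arg=$(sed -n 's/^[[:space:]]*CADDY_VERSION:[[:space:]]*"\{0,1\}\([^"[:space:]]*\)"\{0,1\}.*$/\1/p' "$compose")
    # "image: caddy-cloudflare:2.11.2   # comment" -> 2.11.2
    v_image=$(sed -n 's/^[[:space:]]*image:[[:space:]]*caddy-cloudflare:\([^[:space:]]*\).*$/\1/p' "$compose")
    echo "Dockerfile=$v_docker compose-arg=$v_arg image-tag=$v_image"
    [ -n "$v_docker" ] && [ "$v_docker" = "$v_arg" ] && [ "$v_docker" = "$v_image" ]
}
```

Usage: caddy_versions_match Dockerfile docker-compose.yml && docker compose build, run from /data/docker/caddy.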

Logs

docker compose logs -f --tail 50 caddy

Access logs (one JSON line per request, from the log directive) and ACME events both land here. Useful for diagnosing 502s (which backend, what error) and cert issuance.

Cert status

Certs auto-renew ~⅔ through each 90-day cycle via DNS-01 — zero maintenance. To check what's stored:

docker exec caddy ls -R /data/caddy/certificates/   # one dir per hostname, each with a .crt and .key
curl -fsS https://api.cloudflare.com/client/v4/user/tokens/verify -H "Authorization: Bearer $(grep '^CF_API_TOKEN=' /data/docker/caddy/.env | cut -d= -f2-)"   # token still active? (anchored grep skips the comment lines in .env that also mention CF_API_TOKEN)

If renewal ever fails it's almost always the Cloudflare token (rolled, expired, or wrong scope) — see Troubleshooting.
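
To turn the token check into something a script or cron job can act on, the verify response can be reduced to an exit code. The sketch below assumes only what the Troubleshooting section states, namely that a healthy response contains "status":"active"; the function name token_is_active is hypothetical.

```shell
# Read a Cloudflare token-verify JSON response on stdin.
# Exit 0 only if it reports an active token.
token_is_active() {
    # Match "status":"active", tolerating whitespace around the colon.
    grep -Eq '"status"[[:space:]]*:[[:space:]]*"active"'
}

# Usage (not run here):
#   token=$(grep '^CF_API_TOKEN=' /data/docker/caddy/.env | cut -d= -f2-)
#   curl -sS https://api.cloudflare.com/client/v4/user/tokens/verify \
#        -H "Authorization: Bearer $token" | token_is_active || echo "token problem"
```

The usage drops curl's -f flag on purpose, so an error response still reaches the function and is reported as a token problem rather than an empty pipe.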

Connecting from on the server itself

Because every app now binds 127.0.0.1:<port>, on-server scripts can use http://127.0.0.1:<port> (or http://localhost:<port>) directly — the old "must use the tailnet IP" gotcha is gone. https://<svc>.z2mini.gabrielgabrie.com also works from the box (resolves to 100.67.235.68, reaches Caddy). What no longer works: http://z2mini:<port> and http://100.67.235.68:<port> — those bindings were removed.


Backup considerations

Caddy is in the nightly backup (since May 2026 — see 05-backups.md). ~/scripts/backup-files.sh rsyncs the config into /mnt/backup/current/service-config/caddy/:

  • Dockerfile, docker-compose.yml, Caddyfile — tiny, plain text, rsync-safe. The whole point of the stack.
  • .env: the load-bearing secret (the Cloudflare API token). Mode 600. Without it Caddy can't renew certs. Backed up with the rest.

What's not backed up:

  • config/caddy/ — Caddy's autosave.json (the last-loaded config). It's container-owned, mode 700 (the backup user can't read it), and rebuildable from the Caddyfile anyway. Skipped.
  • data/caddy/ — the issued certs + the ACME account key. Same story: container-owned, mode 700, unreadable by the backup user — and not load-bearing: certs auto-renew, and Caddy simply re-issues everything on a fresh data/ dir given a valid .env token (it re-runs the Cloudflare DNS-01 challenge for each host). So losing it just means a one-time re-issuance on restore (mind Let's Encrypt's 50-certs/week rate limit if you're recreating in a loop, but for a normal rebuild it's a non-issue).

Improvement over the old tailscale serve setup: that config lived in tailscaled's state (/var/lib/tailscale/), not under /data/docker/, so it had to be manually re-run after a rebuild. Caddy's actual config (Dockerfile + compose + Caddyfile + .env) is under /data/docker/caddy/ and is in the nightly backup like every other stack.

The off-site T5 also gets service-config/caddy/ (the whole service-config/ tree goes off-site via backup-offsite.sh).

Restore (per RECOVERY-README.md / 08-recovery.md → Step 6b): restore /data/docker/caddy/ from service-config/caddy/ (or recreate the files + the .env token), docker compose build && docker compose up -d — Caddy re-issues every cert on first boot. Also re-run tailscale set --operator=gabriel after Tailscale re-auth. The Cloudflare zone + the *.z2mini record + the Tailscale global resolvers are all server-independent (they live in Cloudflare / the Tailscale admin) — nothing to redo there.
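
The Caddy part of the restore can be written down as a short script. This is a sketch, not the contents of RECOVERY-README.md: the function names run and restore_caddy and the DRY_RUN switch are hypothetical, and only the paths come from this page. With DRY_RUN set it prints the commands instead of executing them.

```shell
BACKUP_DIR=/mnt/backup/current/service-config/caddy
STACK_DIR=/data/docker/caddy

# Run a command, or only print it when DRY_RUN is set (hypothetical safety switch).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Restore the stack files, re-tighten the secret's mode, rebuild, and start.
# data/ and config/ start empty; Caddy re-issues every cert on first boot.
restore_caddy() {
    run mkdir -p "$STACK_DIR/data" "$STACK_DIR/config" &&
    run rsync -a "$BACKUP_DIR/" "$STACK_DIR/" &&
    run chmod 600 "$STACK_DIR/.env" &&
    run docker compose -f "$STACK_DIR/docker-compose.yml" build &&
    run docker compose -f "$STACK_DIR/docker-compose.yml" up -d
}
```

Running DRY_RUN=1 restore_caddy first shows the five commands without touching anything; unset DRY_RUN to perform the restore.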


Troubleshooting

A tailnet device can't reach https://<svc>.z2mini.gabrielgabrie.com — "site can't be reached" / DNS doesn't resolve:

  • It's almost always DNS, specifically the local network's resolver choking on *.z2mini.gabrielgabrie.com (stale negative cache from before the Cloudflare migration, or — common on consumer/landlord routers — "DNS rebinding protection" that drops answers in the 100.64.0.0/10 CGNAT range Tailscale uses). Confirm: nslookup <svc>.z2mini.gabrielgabrie.com 1.1.1.1 → should return 100.67.235.68; nslookup <svc>.z2mini.gabrielgabrie.com (default resolver) → fails.
  • Fix: make sure Tailscale → DNS → Global nameservers has 1.1.1.1 + 8.8.8.8 with "Override local DNS" ON, then on the device re-sync Tailscale's DNS (tailscale set --accept-dns=false; tailscale set --accept-dns=true, or reconnect Tailscale) and flush the OS DNS cache (ipconfig /flushdns on Windows).
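
The rebinding-protection failure mode hinges on the answer falling inside 100.64.0.0/10, which spans 100.64.0.0 through 100.127.255.255. The helper below makes that range check explicit. It is a minimal sketch, and the function name in_cgnat_range is hypothetical.

```shell
# Return 0 if the IPv4 address in $1 lies inside 100.64.0.0/10, 1 otherwise.
# The body runs in a subshell so the IFS and globbing changes stay local.
in_cgnat_range() (
    IFS=.
    set -f            # no filename globbing while splitting
    set -- $1         # split the dotted quad into four positional parameters
    [ "$#" -eq 4 ] || exit 1
    # A /10 on 100.64.0.0 fixes the first octet and the top two bits of the
    # second, so the second octet must be between 64 and 127.
    [ "$1" -eq 100 ] 2>/dev/null &&
    [ "$2" -ge 64 ] 2>/dev/null &&
    [ "$2" -le 127 ] 2>/dev/null
)
```

For instance, in_cgnat_range 100.67.235.68 succeeds, which is why a router that filters this range silently drops the answer while 1.1.1.1 returns it.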

A site returns 502 Bad Gateway:

  • The backend isn't reachable. Check the app is up (docker compose -f /data/docker/<app>/docker-compose.yml ps) and that it's bound to 127.0.0.1:<port> (ss -tln | grep <port> — should show 127.0.0.1:<port>, not 0.0.0.0 or the tailnet IP and not missing). Caddy's log will name the upstream and the dial error.
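
The three outcomes of the ss check (bound to loopback, bound somewhere else, or not listening at all) can be told apart mechanically. The sketch below reads ss -tln output on stdin. The function name bind_status is hypothetical, and it assumes the usual column layout in which the fourth field is the local address and port.

```shell
# Classify how a TCP port is bound, reading `ss -tln` output on stdin.
# $1 = port. Prints one of: loopback | exposed | missing
bind_status() {
    awk -v port="$1" '
        # Field 4 of `ss -tln` is the local address:port.
        $1 == "LISTEN" && $4 ~ (":" port "$") {
            found = 1
            # Anything other than IPv4/IPv6 loopback counts as exposed.
            if ($4 !~ /^(127\.0\.0\.1|\[::1\]):/) exposed = 1
        }
        END {
            if (!found)       print "missing"
            else if (exposed) print "exposed"
            else              print "loopback"
        }
    '
}
```

Usage: ss -tln | bind_status 2283. For a backend behind Caddy the expected answer is loopback; missing means the app is down, and exposed means its compose file still binds 0.0.0.0 or the tailnet IP.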

Cert won't issue / renew (could not get certificate, ACME errors in the log):

  • The Cloudflare token: rolled without updating .env, expired (only if you set an expiry on it; the "Edit zone DNS" template defaults to no expiry), or wrong scope. Verify: curl -fsS https://api.cloudflare.com/client/v4/user/tokens/verify -H "Authorization: Bearer <token>" → should return "status":"active". After fixing .env: docker compose up -d caddy.
  • Less commonly: Let's Encrypt rate-limited (50 certs/week per registered domain) — only an issue if you've been recreating certs in a loop.

docker compose ps shows (unhealthy):

  • The healthcheck wgets http://127.0.0.1:2019/config/ (Caddy's admin endpoint, on the host's loopback because of host networking). If Caddy launched but the admin endpoint is off (admin off in the Caddyfile) or moved, the check fails while the service may still be fine. Check docker compose logs caddy for serving initial configuration.

Caddy is also answering on the LAN, not just the tailnet:

  • Confirm default_bind 100.67.235.68 is in the Caddyfile's global block. With network_mode: host and no default_bind, Caddy binds 0.0.0.0. ss -tln | grep ':443' should show 100.67.235.68:443, not 0.0.0.0:443.

Want to run tailscale serve / tailscale set and getting "Access denied":

  • The operator should be set: sudo tailscale set --operator=gabriel (one-time). After that those commands work as gabriel without sudo. (For this setup tailscale serve is retired — Caddy owns :443 — but the operator setting is still convenient.)