# Runbook — publishing a service on mf01
mf01 publishes HTTP services as `https://<svc>.mf01.makerfloss.eu`. TLS
terminates on the **makerfloss VPS**; the VPS proxies `*.mf01` over `wg1` to
mf01's internal Traefik (`10.13.0.8:80`), which routes by Host label to the
container.
- Design: `AnsibleBaobabV4/docs/superpowers/specs/2026-06-09-mf01-service-publishing-design.md`
- Plan: `AnsibleBaobabV4/docs/superpowers/plans/2026-06-09-mf01-service-publishing.md`
- Built and verified live 2026-06-09 (first service: `whoami.mf01.makerfloss.eu`).
## How a request flows
```
browser ─https─▶ <svc>.mf01.makerfloss.eu (DNS *.mf01 A → 88.99.32.236)
─▶ VPS Traefik :443 (wildcard cert *.mf01.makerfloss.eu, TLS ends here)
─▶ plain HTTP over wg1 ─▶ mf01 Traefik 10.13.0.8:80
─▶ container (routed by Host label)
```
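
The flow above can be spot-checked hop by hop from any workstation. The following is a minimal sketch, assuming `dig` and `curl` are installed; the helper names `svc_fqdn` and `check_hops` are illustrative and not part of the repo.

```bash
# Hypothetical helpers; the names are illustrative, not from AnsibleBaobabV4.
ZONE="mf01.makerfloss.eu"

# Build the public hostname for a service, e.g. whoami -> whoami.mf01.makerfloss.eu
svc_fqdn() {
  printf '%s.%s\n' "$1" "$ZONE"
}

# Print what DNS and the TLS/HTTP layers report for one service.
check_hops() {
  host="$(svc_fqdn "$1")"
  # The wildcard A record should point at the VPS.
  echo "DNS: $(dig +short "$host" A)"
  # http_code is the response status; ssl_verify_result is 0 when the cert verified.
  curl -s -o /dev/null \
    -w 'HTTP: %{http_code}  TLS verify: %{ssl_verify_result}\n' \
    "https://$host"
}
```

Calling `check_hops whoami` prints one line for DNS and one for HTTP/TLS. On a healthy setup the DNS line should show the VPS address from the diagram and the TLS verify value should be `0`.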
The VPS side is configured **once** (in `host_vars/makerfloss.yml`):
`traefik_wildcard_sets` (anchors the `*.mf01` DNS-01 cert), a
`traefik_extra_dynamic_files` catch-all router → `http://10.13.0.8:80`, and the
`mf01` / `*.mf01` A records. **You never touch the VPS to add a service.**

mf01 is configured once (in `host_vars/mf01.yml`): internal Traefik
(`traefik_acme_enabled: false`, `traefik_bind_address: 10.13.0.8`),
`traefik_zone: mf01.makerfloss.eu`, and host-wide
`container_traefik_overrides: {entrypoints: [web], tls: false, certresolver: ""}`.
## Add a new service
1. Enable a container role on mf01 in `host_vars/mf01.yml`, e.g.:
   ```yaml
   container_<svc>_enabled: true
   ```
   Its router auto-publishes as `<container_name>.mf01.makerfloss.eu` because
   `traefik_zone` is `mf01.makerfloss.eu` and the host-wide
   `container_traefik_overrides` force the plain `web` entrypoint with no TLS.
2. Deploy (note: the tag is the role's short name, e.g. `whoami`, not
   `container-whoami`):
   ```bash
   cd ~/Projects/AnsibleBaobabV4
   .venv/bin/ansible-playbook play_containers.yml --limit mf01 -t <svc>
   ```
3. The service is now live at `https://<container_name>.mf01.makerfloss.eu`.
   No DNS, cert, or VPS change is needed (the wildcard cert and the catch-all
   route cover it). Verify:
   ```bash
   curl -s https://<container_name>.mf01.makerfloss.eu
   ```
To use a hostname other than the container's default name, set that service's
`hostnames` via its own `container_traefik_overrides` (do **not** put hostnames
in the host-wide dict — that would force every service to the same name).
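
Right after a deploy, the container may need a moment before its router answers, so a single `curl` can fail spuriously. The following is a minimal retry sketch, assuming a POSIX-style shell; `wait_until` is a hypothetical helper, not something the playbook provides.

```bash
# Hypothetical helper: run a command up to <tries> times, sleeping <delay>
# seconds between attempts. Returns 0 on the first success, 1 if all fail.
wait_until() {
  tries="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$tries" ]; do
    if "$@"; then
      return 0
    fi
    # Only sleep when another attempt will follow.
    if [ "$i" -lt "$tries" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, `wait_until 10 3 curl -fsS -o /dev/null https://whoami.mf01.makerfloss.eu` retries the verify step for roughly half a minute. The `-f` flag makes `curl` exit non-zero on HTTP errors such as 404, so the loop keeps retrying until the router exists.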
## Reach / management
- Ansible reaches mf01 at its stable wg IP `10.13.0.8` via ProxyJump through the
VPS (`ansible_host: 10.13.0.8`, `ProxyJump="sjat@makerfloss.eu:7576"`).
- Shell: `ssh -J sjat@makerfloss.eu:7576 -p 7576 sjat@10.13.0.8`.
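
For one-off commands on mf01 (such as the `docker ps` check in the troubleshooting table), the jump-host invocation can be wrapped in a small function. This is a sketch only; `mf01_run` and the `DRY_RUN` switch are hypothetical conveniences, and the connection values are copied from the Shell line above.

```bash
# Hypothetical wrapper around the documented ssh -J command.
MF01_JUMP="sjat@makerfloss.eu:7576"
MF01_PORT="7576"
MF01_TARGET="sjat@10.13.0.8"

# Run a command on mf01 through the VPS jump host.
# With DRY_RUN set, only print the ssh command that would be executed.
mf01_run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "ssh -J $MF01_JUMP -p $MF01_PORT $MF01_TARGET $*"
  else
    ssh -J "$MF01_JUMP" -p "$MF01_PORT" "$MF01_TARGET" "$@"
  fi
}
```

Usage: `mf01_run docker ps` lists containers on mf01, and `DRY_RUN=1 mf01_run docker ps` shows the command without connecting.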
## Troubleshooting
| Symptom | Check |
|---|---|
| `https://<svc>.mf01...` → 404, valid cert | Request reached mf01 Traefik but no router matches. Confirm the container is up (`docker ps` on mf01) and has the `traefik.http.routers.<svc>.rule=Host(...)` label. |
| `tls_verify != 0` / cert error | VPS Traefik isn't serving the `*.mf01` cert. Check `docker logs traefik` on the VPS for ACME (`acme: ... mf01`), and that `/srv/traefik/config/dynamic/mf01-delegate.yml` **parses** (a YAML error there breaks the whole file provider — keep regex rules single-quoted). |
| 502/504 from the VPS | wg1 tunnel down or mf01 Traefik down. Check `wg show wg1` on the VPS (mf01 = `10.13.0.8`) and the traefik container on mf01. |
| Restricted access wanted for a service | The `internal-only@file` middleware lives on the **VPS** Traefik; a restricted service must be routed VPS-side or use an mf01-side allowlist middleware (future work). |
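
The first three rows of the table can be turned into a quick triage helper. This is a rough sketch under one assumption: that `tls_verify` in the table corresponds to curl's `%{ssl_verify_result}` write-out variable (0 means the certificate verified). The function name `diagnose` and its messages are illustrative.

```bash
# Hypothetical helper: map an HTTP status and a TLS verify result to the
# matching troubleshooting row. Arguments: <http_code> <ssl_verify_result>.
diagnose() {
  code="$1"
  verify="$2"
  if [ "$verify" != "0" ]; then
    # Cert problem: the request never got a trusted TLS session on the VPS.
    echo "cert: check ACME logs and mf01-delegate.yml on the VPS"
  elif [ "$code" = "404" ]; then
    # Valid cert but no router matched on mf01's Traefik.
    echo "router: check the container and its Host(...) label on mf01"
  elif [ "$code" = "502" ] || [ "$code" = "504" ]; then
    # The VPS could not reach the upstream over the tunnel.
    echo "upstream: check wg1 on the VPS and Traefik on mf01"
  else
    echo "no match: HTTP $code"
  fi
}
```

Live values can be supplied with `diagnose $(curl -s -o /dev/null -w '%{http_code} %{ssl_verify_result}' https://whoami.mf01.makerfloss.eu)`.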
## Known limitation
`container_traefik_overrides` is host-wide on mf01 (correct for entrypoint/tls,
which are constant). Per-service **hostnames** must come from each service's own
role override, not the host-wide dict.