From 2796616d0534b85d16aac28ef4e106847f31370c Mon Sep 17 00:00:00 2001
From: sjat
Date: Tue, 9 Jun 2026 13:04:35 +0200
Subject: [PATCH] docs: capture topology + operational learnings in CLAUDE.md/README

Bring the everyday guides up to the live state (flat data VLAN 30 + isolated
mgmt VLAN 99 on ether8, DHCP + web UI experiment) and record the gotchas that
cost time: the bench tunnel (paramiko ignores ProxyJump), mamba NM-profile
stickiness on cable flap, the RouterOS find-by-address quirk, and the
commit-confirmed detached-flip pattern for lockout-prone changes.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 CLAUDE.md | 42 +++++++++++++++++++++++++++++++++---------
 README.md | 14 +++++++++-----
 2 files changed, 42 insertions(+), 14 deletions(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index 7b53225..20115f3 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -30,12 +30,35 @@ ansible-playbook play_switch.yml --tags vlans # one domain
 ansible-vault view group_vars/mikrotik.vault.yml # read a secret
 ```

+## Access (on-site / bench)
+
+The switch is reachable only via the makerspace laptop `mamba`. Ansible's `network_cli`
+uses paramiko, which **ignores ProxyJump**, so port-forward instead of double-hopping:
+
+```bash
+ssh -J kuku -p 7576 sjat@10.8.0.4 -L 2222:192.168.88.1:22 -N # tunnel to the switch
+ansible-playbook play_switch.yml -e ansible_host=127.0.0.1 -e ansible_port=2222 -e ansible_user=sjat
+ssh-keygen -R '[127.0.0.1]:2222' # if the tunnel host key changed
+```
+
+- `mamba` is the mgmt station on **switch port 8** (MGMT VLAN); it must be on port 8 to
+  reach `192.168.88.1`. From a data port it gets `10.2.30.x` and **cannot** reach mgmt.
+- NM profiles on `mamba` `enp0s31f6`: `crs310-bench` (static `.2`) and `Wired connection 1`
+  (DHCP). Moving the cable flaps the link and NM re-selects a profile — pin the intended
+  one sticky (`autoconnect yes` + higher priority) and the other off, or it reverts.
+
 ## Rules

 - **Idempotency:** RouterOS tasks use `community.routeros.command` with `:if [find]`
   guards. Run every device-touching play **twice**; the second run must report no changes.
 - **Lockout safety:** keep an independent recovery channel (serial/WinBox-MAC) when
-  touching mgmt/services/VLANs; enable `vlan-filtering` **last**.
+  touching mgmt/services/VLANs; enable `vlan-filtering` **last**. For lockout-prone
+  changes over the network (vlan-filtering, moving the mgmt IP), run them as a detached
+  self-reverting job — `:execute { …; :delay 240s; :if ($mgmtok=false) do={ revert } }`,
+  then `:global mgmtok true` once verified. (Auto-healed a hard lockout during the cutover.)
+- **RouterOS `find ... address=` never matches** an ip/address or dhcp-network
+  value (returns 0 even on an exact string) — match by `[find interface=X]` or
+  `:foreach`+`/ip/address/get $a address`. Bit the mgmt-IP move (duplicated the IP).
 - **All real values go in `host_vars`;** the role holds only mechanism + placeholders.
 - **Secrets** go to the `makerfloss` vault, never plaintext. Encrypt with
   `ansible-vault encrypt --encrypt-vault-id makerfloss `.
@@ -43,12 +66,13 @@ ansible-vault view group_vars/mikrotik.vault.yml # read a secret

 ## Status / next

-Bootstrap is done (user `sjat` + key + identity `crs310-maker`, RouterOS 7.19.6 pinned;
-default `admin` now disabled). All per-domain task files are **implemented**:
-`identity`, `users`, `backup`, `firmware` (opt-in) and `play_bootstrap` / `play_backup`
-are idempotency-verified against the device. `vlans` is implemented and Jinja-validated
-but its **device run is deferred** — the `host_vars` topology is still a placeholder.
+Live on the device (2026-06-09): flat L2 switch on `10.2.30.0/24` — **DATA VLAN 30**
+(`ether1` copper uplink + `ether2-7` + SFP+), **isolated MGMT VLAN 99 on `ether8`**
+(mgmt `192.168.88.1/24`, no gateway/NTP/DNS), `vlan-filtering` on. The mgmt port also
+serves DHCP (`192.168.88.10-.254`) + the web UI as a makerspace experiment (flags
+`switch_web_enabled`, `switch_mgmt_dhcp_enabled`). Default `admin` disabled; login as
+`sjat` (key, or vaulted password). All task files + `play_bootstrap`/`play_backup` are
+idempotency-verified. Design + cutover runbook:
+`docs/superpowers/specs/2026-06-09-crs310-flat-mgmtvlan-design.md`.

-Next, on-site with a recovery channel: drop the real VLAN/port map into `host_vars`,
-reconcile the legacy defconf IP (`192.168.88.1/24` lives directly on `bridge`), then run
-`--tags vlans` and confirm mgmt reachability before/after `vlan-filtering=yes`.
+Next: SFP+ 10G uplink and real VLAN segmentation once connectors + a VLAN plan are ready.
diff --git a/README.md b/README.md
index 88ecd53..e24383a 100644
--- a/README.md
+++ b/README.md
@@ -13,12 +13,16 @@ rebuilt from this repo instead of by hand in WinBox.
 | Repo scaffolding, role skeleton, vault | ✅ done |
 | On-site device prep + **bootstrap** (named user + SSH key + identity) | ✅ done (2026-06-08) |
 | `identity` / `users` / `backup` / `firmware` + `play_bootstrap` / `play_backup` | ✅ implemented; idempotency-verified against the device (firmware is opt-in, lint/syntax only) |
-| `vlans` (VLAN-aware bridge, ports, mgmt iface) | ✅ implemented + Jinja-validated; **device run deferred** — needs the real VLAN/port plan and an on-site recovery channel before `vlan-filtering` is enabled |
+| `vlans` (VLAN-aware bridge, ports, mgmt iface) | ✅ **applied & live** — flat data VLAN + isolated mgmt VLAN, `vlan-filtering` on |

-The switch is reachable today by key auth as user `sjat`. All task files now carry their
-real RouterOS logic. The `vlans` topology in `host_vars` is still a **placeholder**:
-replace it with the real makerspace VLAN ids + per-port map before running `--tags vlans`
-on the live device, and do so on-site with a serial/WinBox-MAC recovery channel open.
+**Live topology (2026-06-09):** a flat L2 switch on the makerspace `10.2.30.0/24` —
+**DATA VLAN 30** (`ether1` copper uplink + `ether2-7` + SFP+) bridged through, and an
+**isolated MGMT VLAN 99 on `ether8`** (switch admin at `192.168.88.1`, no gateway/NTP/DNS).
+The mgmt port also serves DHCP + the web UI as an experiment (plug into `ether8`, get a
+lease, admin at `http://192.168.88.1`; login still required, default `admin` disabled).
+SFP+ 10G uplink and real VLAN segmentation are future work. See
+`docs/superpowers/specs/2026-06-09-crs310-flat-mgmtvlan-design.md` for the design + the
+lockout-safe cutover runbook.

 ## Layout