docs(spec): flat data path + isolated mgmt VLAN topology

ether1 copper uplink (SFP+ deferred), flat 10.2.30.0/24 data VLAN 30, isolated
mgmt VLAN 99 on ether8 with switch mgmt 192.168.88.1/24, no gateway/NTP/DNS.
Includes the lockout-safe on-site cutover runbook.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
sjat 2026-06-09 12:12:22 +02:00
parent 67554c0b38
commit 8a42f5482f


@ -0,0 +1,97 @@
# CRS310 — flat data path + isolated management VLAN — Design
**Date:** 2026-06-09
**Status:** Approved (brainstorming complete)
**Author:** sjat + Claude
**Supersedes** the placeholder topology in `host_vars/crs310-maker.yml` (the
`10.0.99.x` / SFP+-trunk example). Builds on
`2026-06-07-mikrotik-crs310-ansible-design.md`.
## Purpose
Bring the makerspace CRS310 into service as a **flat L2 switch** on the existing
`10.2.30.0/24` network, with its **management plane isolated on a dedicated VLAN**
reached through one physical port. No SFP+ yet — the 10G uplink is deferred until the
connectors arrive; **`ether1` is the (copper) uplink** for now.
## Context (as found on 2026-06-09)
- Switch on factory **defconf**: one flat `bridge` with all ports, mgmt IP
`192.168.88.1/24` sitting directly on `bridge`, `vlan-filtering=no`.
- Upstream LAN is **flat**: DHCP/gateway at `10.2.30.1`, untagged. Verified by leasing
`10.2.30.227` to mamba *through* the switch's flat bridge.
- mamba is the management station (patched into the switch, reached from fisi over a
`kuku` jump + port-forward tunnel to `192.168.88.1`).
## Topology
VLAN-aware bridge (`bridge`), `vlan-filtering=yes` enabled **last**. All ports are
untagged access ports — **no trunks**.
| Port | Mode | PVID | VLAN | Notes |
|---|---|---|---|---|
| `ether1` | access | 30 | DATA | copper uplink to `10.2.30.0/24` |
| `ether2`–`ether7` | access | 30 | DATA | device access ports |
| `sfp-sfpplus1/2` | access | 30 | DATA | unused until connectors arrive |
| `ether8` | access | 99 | MGMT | dedicated management port (mamba lives here) |
- **DATA VLAN 30** — internal-only id; ingress/egress on `ether1` is untagged, so the
upstream router sees a plain flat network. The switch CPU (`bridge`) is **not** a
member of VLAN 30 → no switch L3 presence on the user network.
- **MGMT VLAN 99**: `vlan-mgmt` interface on the bridge, IP **`192.168.88.1/24`**, the
bridge/CPU is the only tagged member, `ether8` the only untagged member.
**No default gateway** — management is intentionally isolated.
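As a rough sketch, the table and bullets above correspond to RouterOS v7 configuration along the following lines. It assumes the defconf bridge keeps the name `bridge` and that every listed port is already a bridge port. It shows the target L2 state only, not the cutover order, and deliberately leaves `vlan-filtering` and the management IP untouched.

```routeros
# Sketch of the target L2 state; vlan-filtering is NOT enabled here.
# Assumption: defconf bridge "bridge" with all ports already attached.

# Data ports: untagged access ports in VLAN 30
:foreach p in={"ether1";"ether2";"ether3";"ether4";"ether5";"ether6";"ether7";"sfp-sfpplus1";"sfp-sfpplus2"} do={
    /interface/bridge/port/set [find interface=$p] pvid=30
}

# Management port: untagged access port in VLAN 99
/interface/bridge/port/set [find interface=ether8] pvid=99

# VLAN 30 has no tagged members, so the CPU (bridge) stays out of the data VLAN
/interface/bridge/vlan/add bridge=bridge vlan-ids=30 untagged=ether1,ether2,ether3,ether4,ether5,ether6,ether7,sfp-sfpplus1,sfp-sfpplus2

# VLAN 99: CPU is the only tagged member, ether8 the only untagged member
/interface/bridge/vlan/add bridge=bridge vlan-ids=99 tagged=bridge untagged=ether8

# L3 interface that will carry the management address
/interface/vlan/add name=vlan-mgmt interface=bridge vlan-id=99
```

In the actual deployment these settings are meant to come from the Ansible role rather than being typed by hand; the commands are only here to make the intended end state concrete.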
## Management & internet
- Reachable only from `ether8` (plug the management laptop / mamba there, addressed
`192.168.88.2/24`). The switch does **no routing or DHCP**; `10.2.30.1` keeps both.
- The control plane has **no internet** by design → **NTP/DNS disabled**: on an isolated
segment both would only log errors. The clock therefore won't sync, and updates are
done manually while the switch is temporarily patched to the data network.
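A minimal sketch of what "NTP/DNS disabled" and "no routing" could look like on the switch itself, assuming the RouterOS v7 NTP client menu and that nothing else on this switch needs name resolution:

```routeros
# Stop the NTP client; no time server is reachable from the isolated segment
/system/ntp/client/set enabled=no

# Clear any static DNS servers and keep the resolver closed to clients
/ip/dns/set servers="" allow-remote-requests=no

# Should list nothing: the management plane has no default route
/ip/route/print where dst-address=0.0.0.0/0
```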
## Required changes to the IaC
1. `host_vars/crs310-maker.yml`: replace the placeholder topology with the table above;
`switch_mgmt_address: 192.168.88.1/24`, `switch_mgmt_vlan_id: 99`, **no gateway**;
drop the `10.0.99.x` DNS/NTP/gateway placeholders.
2. Role `vlans.yml`: make the **default-route** task conditional on a gateway being set
(skip when isolated); **remove the legacy defconf IP** from the bare `bridge` so it
doesn't collide with the `vlan-mgmt` IP (`192.168.88.1` must live only on
`vlan-mgmt`).
3. Role `identity.yml`: gate NTP (and DNS) behind a flag / empty-server check so an
isolated mgmt plane doesn't configure unreachable servers. Add
`switch_ntp_enabled: false` for this host.
The existing `vlans.yml` membership Jinja already produces the correct sets for an
all-access topology (DATA untagged = data ports, CPU tagged only on MGMT).
## Cutover runbook (lockout-safe; operator on-site at `ether8`)
1. **Restore mgmt path** (done): mamba `enp0s31f6` set to `192.168.88.2/24` (profile
`crs310-bench`); fisi→mamba→switch tunnel up; Ansible reaches `192.168.88.1`.
2. **Move the cable: switch port 5 → port 8.** (Bridge is still flat, so mamba stays
reachable on either port.) Re-confirm reachability.
3. Apply config in order: bridge VLAN table → port PVIDs → create `vlan-mgmt` iface.
Verify the VLAN/PVID state with `vlan-filtering` still **off**. Then the **flip**, as
one ordered sequence (the address can't be on both interfaces at once): remove
`192.168.88.1` from `bridge`, add it to `vlan-mgmt`, set `vlan-filtering=yes`. mamba
(`ether8`, untagged VLAN 99, `.2`) ↔ switch (`.1`) is the canary; the SSH/tunnel may
blip during the flip but must come back. Pre-verifying PVID/membership before the
flip is what prevents a hard lockout.
4. Verify: `/interface/bridge/vlan/print` membership correct, mgmt still reachable, a
device on `ether1`-fed ports still gets `10.2.30.x`.
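Steps 3 and 4 above can be sketched as RouterOS v7 commands. This is an illustration under the same assumptions as the rest of the design (bridge named `bridge`, `vlan-mgmt` already created, operator attached to `ether8`), not the role's actual task list. The flip is written on a single line so that the whole sequence reaches the switch before the address disappears from `bridge`; whether RouterOS completes the line if the session drops mid-way is not something this sketch guarantees, which is why the on-site operator remains the fallback.

```routeros
# Step 3, pre-flip checks: vlan-filtering must still be off at this point
/interface/bridge/print detail where name=bridge
/interface/bridge/port/print where interface=ether8
/interface/bridge/vlan/print

# Step 3, the flip: move the address, then enable filtering, sent as one line
/ip/address/remove [find interface=bridge address="192.168.88.1/24"]; /ip/address/add address=192.168.88.1/24 interface=vlan-mgmt; /interface/bridge/set bridge vlan-filtering=yes

# Step 4, verify once the session is back
/interface/bridge/vlan/print
/ip/address/print where interface=vlan-mgmt
/ping 192.168.88.2 count=3
```

Between adding the address to `vlan-mgmt` and enabling filtering, untagged frames from `ether8` are not yet classified into VLAN 99, so a short loss of reachability at that point is expected rather than a sign of failure.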
## Risks
- **Lockout** on enabling `vlan-filtering` if `ether8`/VLAN 99/mgmt-IP aren't aligned.
Mitigated by ordering (filtering last), the live canary connection, and the operator
being on-site to re-cable. WinBox-MAC recovery is unavailable (broken under Wine);
worst case is a no-defaults reset, which we avoid.
- **Removing the legacy bridge IP** is the delicate step. Because `vlan-mgmt` takes over
the same address, the removal is done immediately before the re-add and before
filtering is enabled, with the connection watched.
## Out of scope
Real inter-VLAN segmentation, the SFP+ 10G uplink/trunk, and any upstream router VLAN
work — revisited when the connectors and a real VLAN plan are ready.