Compare commits

3 commits

Author SHA1 Message Date
sjat
199edf85ad fix(vlans): robust bridge-IP removal; record cutover + gotchas
RouterOS 'find ... address=<prefix>' never matches an ip/address value, so the
legacy-bridge-IP removal is now a :foreach get-and-compare. Refresh the committed
export.rsc to the post-cutover config (flat VLAN 30 + isolated mgmt VLAN 99 on
ether8, vlan-filtering on). Spec updated with execution notes (NM autoconnect flap,
the find-address quirk, and the commit-confirmed detached-flip technique used).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 12:38:04 +02:00
sjat
ebd21623ef feat: real flat+mgmt-VLAN topology in host_vars; role tweaks
host_vars: DATA VLAN 30 (ether1 uplink + ether2-7 + sfp1/2), isolated MGMT VLAN 99
on ether8, mgmt 192.168.88.1/24, no gateway, NTP disabled. Role: switch_ntp_enabled
flag (enable/disable NTP), conditional default route (skip when no gateway), and a
guarded removal of the legacy defconf bridge IP so the mgmt IP lives only on vlan-mgmt.
Membership Jinja re-validated; lint+syntax clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 12:15:23 +02:00
sjat
8a42f5482f docs(spec): flat data path + isolated mgmt VLAN topology
ether1 copper uplink (SFP+ deferred), flat 10.2.30.0/24 data VLAN 30, isolated
mgmt VLAN 99 on ether8 with switch mgmt 192.168.88.1/24, no gateway/NTP/DNS.
Includes the lockout-safe on-site cutover runbook.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 12:12:22 +02:00
6 changed files with 192 additions and 42 deletions

@@ -1,26 +1,30 @@
-# 2025-09-11 09:49:07 by RouterOS 7.19.6
+# 2025-09-11 10:03:39 by RouterOS 7.19.6
 # software id = 73S3-5F2W
 #
 # model = CRS310-8G+2S+
 # serial number = HM40B8TDNDD
 /interface bridge
-add admin-mac=D0:EA:11:24:F4:AA auto-mac=no comment=defconf name=bridge
+add admin-mac=D0:EA:11:24:F4:AA auto-mac=no comment=defconf name=bridge \
+    vlan-filtering=yes
+/interface vlan
+add interface=bridge name=vlan-mgmt vlan-id=99
 /interface bridge port
-add bridge=bridge comment=defconf interface=ether1
-add bridge=bridge comment=defconf interface=ether2
-add bridge=bridge comment=defconf interface=ether3
-add bridge=bridge comment=defconf interface=ether4
-add bridge=bridge comment=defconf interface=ether5
-add bridge=bridge comment=defconf interface=ether6
-add bridge=bridge comment=defconf interface=ether7
-add bridge=bridge comment=defconf interface=ether8
-add bridge=bridge comment=defconf interface=sfp-sfpplus1
-add bridge=bridge comment=defconf interface=sfp-sfpplus2
+add bridge=bridge comment=defconf interface=ether1 pvid=30
+add bridge=bridge comment=defconf interface=ether2 pvid=30
+add bridge=bridge comment=defconf interface=ether3 pvid=30
+add bridge=bridge comment=defconf interface=ether4 pvid=30
+add bridge=bridge comment=defconf interface=ether5 pvid=30
+add bridge=bridge comment=defconf interface=ether6 pvid=30
+add bridge=bridge comment=defconf interface=ether7 pvid=30
+add bridge=bridge comment=defconf interface=ether8 pvid=99
+add bridge=bridge comment=defconf interface=sfp-sfpplus1 pvid=30
+add bridge=bridge comment=defconf interface=sfp-sfpplus2 pvid=30
+/interface bridge vlan
+add bridge=bridge untagged="ether1,ether2,ether3,ether4,ether5,ether6,ether7,s\
+    fp-sfpplus1,sfp-sfpplus2" vlan-ids=30
+add bridge=bridge tagged=bridge untagged=ether8 vlan-ids=99
 /ip address
-add address=192.168.88.1/24 comment=defconf interface=bridge network=\
-    192.168.88.0
-/ip dns
-set servers=10.0.99.1
+add address=192.168.88.1/24 interface=vlan-mgmt network=192.168.88.0
 /ip service
 set ftp disabled=yes
 set telnet disabled=yes
@@ -29,7 +33,5 @@ set api disabled=yes
 set api-ssl disabled=yes
 /system identity
 set name=crs310-maker
-/system ntp client
-set enabled=yes
 /system ntp client servers
 add address=10.0.99.1

@@ -0,0 +1,116 @@
# CRS310 — flat data path + isolated management VLAN — Design
**Date:** 2026-06-09
**Status:** Approved (brainstorming complete)
**Author:** sjat + Claude
**Supersedes** the placeholder topology in `host_vars/crs310-maker.yml` (the
`10.0.99.x` / SFP+-trunk example). Builds on
`2026-06-07-mikrotik-crs310-ansible-design.md`.
## Purpose
Bring the makerspace CRS310 into service as a **flat L2 switch** on the existing
`10.2.30.0/24` network, with its **management plane isolated on a dedicated VLAN**
reached through one physical port. No SFP+ yet — the 10G uplink is deferred until the
connectors arrive; **`ether1` is the (copper) uplink** for now.
## Context (as found on 2026-06-09)
- Switch on factory **defconf**: one flat `bridge` with all ports, mgmt IP
`192.168.88.1/24` sitting directly on `bridge`, `vlan-filtering=no`.
- Upstream LAN is **flat**: DHCP/gateway at `10.2.30.1`, untagged. Verified by leasing
`10.2.30.227` to mamba *through* the switch's flat bridge.
- mamba is the management station (patched into the switch, reached from fisi over a
`kuku` jump + port-forward tunnel to `192.168.88.1`).
## Topology
VLAN-aware bridge (`bridge`), `vlan-filtering=yes` enabled **last**. All ports are
untagged access ports — **no trunks**.
| Port | Mode | PVID | VLAN | Notes |
|---|---|---|---|---|
| `ether1` | access | 30 | DATA | copper uplink to `10.2.30.0/24` |
| `ether2`–`ether7` | access | 30 | DATA | device access ports |
| `sfp-sfpplus1/2` | access | 30 | DATA | unused until connectors arrive |
| `ether8` | access | 99 | MGMT | dedicated management port (mamba lives here) |
- **DATA VLAN 30** — internal-only id; ingress/egress on `ether1` is untagged, so the
upstream router sees a plain flat network. The switch CPU (`bridge`) is **not** a
member of VLAN 30 → no switch L3 presence on the user network.
- **MGMT VLAN 99**: `vlan-mgmt` interface on the bridge, IP **`192.168.88.1/24`**, the
bridge/CPU is the only tagged member, `ether8` the only untagged member.
**No default gateway** — management is intentionally isolated.
## Management & internet
- Reachable only from `ether8` (plug the management laptop / mamba there, addressed
`192.168.88.2/24`). The switch does **no routing or DHCP**; `10.2.30.1` keeps both.
- The control plane has **no internet** by design → **NTP/DNS disabled** (they would
only error on an isolated segment; clock won't sync, updates are done manually when
the switch is temporarily patched to the data network).
## Required changes to the IaC
1. `host_vars/crs310-maker.yml`: replace the placeholder topology with the table above;
`switch_mgmt_address: 192.168.88.1/24`, `switch_mgmt_vlan_id: 99`, **no gateway**;
drop the `10.0.99.x` DNS/NTP/gateway placeholders.
2. Role `vlans.yml`: make the **default-route** task conditional on a gateway being set
(skip when isolated); **remove the legacy defconf IP** off the bare `bridge` so it
doesn't collide with the `vlan-mgmt` IP (`192.168.88.1` must live only on
`vlan-mgmt`).
3. Role `identity.yml`: gate NTP (and DNS) behind a flag / empty-server check so an
isolated mgmt plane doesn't configure unreachable servers. Add
`switch_ntp_enabled: false` for this host.
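
The gating in items 2 and 3 can be pictured as a small decision function. This is a hedged Python sketch, not the role itself: `mgmt_plane_commands` is a hypothetical name, and the sketch leaves out the "route already exists" guard that the real default-route task wraps around its command.

```python
def mgmt_plane_commands(ntp_enabled, ntp_servers="", gateway=""):
    """Hypothetical helper: list the RouterOS commands the role would send."""
    commands = []
    if ntp_enabled:
        # Normal case: point the NTP client at the configured servers.
        commands.append(
            f'/system/ntp/client/set enabled=yes servers="{ntp_servers}"'
        )
    else:
        # Isolated mgmt plane: switch the client off instead of letting it fail.
        commands.append("/system/ntp/client/set enabled=no")
    if len(gateway) > 0:
        # A default route is only planned when a gateway is actually set.
        commands.append(
            f'/ip/route/add dst-address=0.0.0.0/0 gateway="{gateway}"'
        )
    return commands


if __name__ == "__main__":
    # Values for this host: NTP off, empty gateway.
    for command in mgmt_plane_commands(ntp_enabled=False, gateway=""):
        print(command)
```

With this host's values the plan contains only the NTP-disable command, which is the intended result for an isolated management VLAN.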
The existing `vlans.yml` membership Jinja already produces the correct sets for an
all-access topology (DATA untagged = data ports, CPU tagged only on MGMT).
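
To make "the correct sets" concrete, the following Python sketch derives the bridge VLAN table from the port map. It stands in for the role's Jinja rather than reproducing it: the helper name `bridge_vlan_table` and the dict layout are assumptions made for this example.

```python
from collections import defaultdict

# Port map from the Topology table; the dict keys mirror the host_vars entries.
DATA_PORTS = [f"ether{n}" for n in range(1, 8)] + ["sfp-sfpplus1", "sfp-sfpplus2"]
PORTS = [{"interface": name, "pvid": 30, "mode": "access"} for name in DATA_PORTS]
PORTS.append({"interface": "ether8", "pvid": 99, "mode": "access"})


def bridge_vlan_table(ports, mgmt_vlan_id, bridge="bridge"):
    """Hypothetical helper: map each VLAN id to its tagged and untagged members."""
    table = defaultdict(lambda: {"tagged": [], "untagged": []})
    for port in ports:
        # Every port is an untagged member of its own PVID.
        table[port["pvid"]]["untagged"].append(port["interface"])
        # Trunk ports (none in this topology) would also carry tagged VLANs.
        for vlan_id in port.get("tagged_vlans", []):
            table[vlan_id]["tagged"].append(port["interface"])
    # The CPU port joins the management VLAN only, and only as a tagged member.
    table[mgmt_vlan_id]["tagged"].insert(0, bridge)
    return dict(table)


if __name__ == "__main__":
    for vlan_id, members in sorted(bridge_vlan_table(PORTS, mgmt_vlan_id=99).items()):
        print(vlan_id, members)
```

For this topology the sketch yields two entries: VLAN 30 with the nine data ports untagged and no tagged members, and VLAN 99 with `bridge` tagged and `ether8` untagged.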
## Cutover runbook (lockout-safe; operator on-site at `ether8`)
1. **Restore mgmt path** (done): mamba `enp0s31f6` → `192.168.88.2/24` (profile
`crs310-bench`); fisi→mamba→switch tunnel up; Ansible reaches `192.168.88.1`.
2. **Move the cable: switch port 5 → port 8.** (Bridge is still flat, so mamba stays
reachable on either port.) Re-confirm reachability.
3. Apply config in order: bridge VLAN table → port PVIDs → create `vlan-mgmt` iface.
Verify the VLAN/PVID state with `vlan-filtering` still **off**. Then the **flip**, as
one ordered sequence (the address can't be on both interfaces at once): remove
`192.168.88.1` from `bridge`, add it to `vlan-mgmt`, set `vlan-filtering=yes`. mamba
(`ether8`, untagged VLAN 99, `.2`) ↔ switch (`.1`) is the canary; the SSH/tunnel may
blip during the flip but must come back. Pre-verifying PVID/membership before the
flip is what prevents a hard lockout.
4. Verify: `/interface/bridge/vlan/print` membership correct, mgmt still reachable, a
device on `ether1`-fed ports still gets `10.2.30.x`.
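
The pre-flip verification in step 3 amounts to three alignment checks. The Python sketch below shows one way to express them, under stated assumptions: `preflight_errors` is a hypothetical helper, and its inputs are plain dicts that an operator would fill in from the switch's bridge VLAN and port listings.

```python
def preflight_errors(vlan_table, port_pvids, mgmt_port, mgmt_vlan_id, bridge="bridge"):
    """Hypothetical check: list the reasons enabling filtering would cut off mgmt."""
    entry = vlan_table.get(mgmt_vlan_id)
    if entry is None:
        return [f"VLAN {mgmt_vlan_id} is missing from the bridge VLAN table"]
    errors = []
    if bridge not in entry["tagged"]:
        # Without this the vlan-mgmt interface never sees management frames.
        errors.append(f"{bridge} is not a tagged member of VLAN {mgmt_vlan_id}")
    if mgmt_port not in entry["untagged"]:
        errors.append(f"{mgmt_port} is not an untagged member of VLAN {mgmt_vlan_id}")
    if port_pvids.get(mgmt_port) != mgmt_vlan_id:
        # Untagged ingress frames are classified by PVID, so it must match.
        errors.append(f"{mgmt_port} PVID is not {mgmt_vlan_id}")
    return errors


if __name__ == "__main__":
    vlan_table = {99: {"tagged": ["bridge"], "untagged": ["ether8"]}}
    port_pvids = {"ether8": 99}
    problems = preflight_errors(vlan_table, port_pvids, "ether8", 99)
    print("safe to flip" if not problems else problems)
```

An empty list means the management port, VLAN membership, and PVID agree, so the flip can proceed; any entry is a reason to stop before setting `vlan-filtering=yes`.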
## Risks
- **Lockout** on enabling `vlan-filtering` if `ether8`/VLAN 99/mgmt-IP aren't aligned.
Mitigated by ordering (filtering last), the live canary connection, and the operator
being on-site to re-cable. WinBox-MAC recovery is unavailable (broken under Wine);
worst case is a no-defaults reset, which we avoid.
- **Removing the legacy bridge IP** is the delicate step — done while the new
`vlan-mgmt` IP is the same address, before filtering, with the connection watched.
## Execution notes (applied 2026-06-09)
Cutover completed; switch is VLAN-filtered with isolated mgmt reachable on `ether8`.
`play_switch.yml` runs idempotently over the new mgmt path. Two gotchas surfaced, plus
one technique worth reusing:
- **NetworkManager autoconnect flap:** moving mamba's cable bounced the link; NM
re-selected the DHCP profile and dropped the static mgmt IP. Fixed by making
`crs310-bench` (192.168.88.2) sticky (`autoconnect yes`, priority 10) and turning
`autoconnect off` on `Wired connection 1`.
- **RouterOS `find ... address=<prefix>` never matches** an `/ip/address` value (returns
0 even on an exact string). The first flip therefore failed to remove the defconf IP
off `bridge`, duplicating `192.168.88.1` onto `vlan-mgmt` and breaking ARP. Fix: remove
by `[find interface=bridge]`, or match via `:foreach`+`/ip/address/get $a address`.
- **The flip was run as a detached, self-reverting on-device job** (`:execute { … :delay
240s; :if ($mgmtok=false) do={ revert } }`) — a commit-confirmed pattern. The first
(failed) attempt auto-healed at the timer; the corrected attempt was confirmed by
setting `:global mgmtok true` within the window. Recommended for any future
`vlan-filtering`/mgmt-IP change made over the network.
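
The control flow of that commit-confirmed pattern is easy to state outside RouterOS. The Python sketch below models it under stated assumptions: `commit_confirmed` and its callbacks are hypothetical names, the 240-second window is taken from the note above, and `sleep` is injectable only so the example can run without waiting.

```python
import time


def commit_confirmed(apply, revert, is_confirmed, window_s=240, sleep=time.sleep):
    """Apply a change, wait out the window, and revert unless it was confirmed.

    Hypothetical model of the detached on-device job: the operator confirms by
    making is_confirmed() return True before the window ends.
    """
    apply()
    sleep(window_s)
    if is_confirmed():
        return True
    # No confirmation arrived, so assume the operator is locked out and heal.
    revert()
    return False


if __name__ == "__main__":
    state = {"vlan_filtering": False, "mgmtok": False}

    def flip():
        state["vlan_filtering"] = True

    def unflip():
        state["vlan_filtering"] = False

    # Nobody sets mgmtok, so the change is rolled back when the timer expires.
    kept = commit_confirmed(flip, unflip, lambda: state["mgmtok"], sleep=lambda s: None)
    print("kept" if kept else "reverted", state)
```

The key design choice is that the revert runs on the device with no dependency on the management session, so a lockout heals itself once the timer expires.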
## Out of scope
Real inter-VLAN segmentation, the SFP+ 10G uplink/trunk, and any upstream router VLAN
work — revisited when the connectors and a real VLAN plan are ready.

@@ -5,40 +5,46 @@
 # base MAC (ether1): D0:EA:11:24:F4:AA
 # RouterOS: 7.19.6 stable (bootloader already current) -> pinned target below
 #
-# Bootstrap status (2026-06-08): identity set; user `sjat` (full) created with the
-# operator ed25519 key imported + a vaulted password (vault_switch_admin_password in
-# group_vars/mikrotik.vault.yml). Key login verified. Default `admin` still enabled
-# (not yet hardened). Switch currently on the bench at 192.168.88.1 (defconf, not yet
-# reset/VLAN-configured). Real mgmt addressing below is the FUTURE production plan.
+# Topology (decided 2026-06-09, see docs/superpowers/specs/
+# 2026-06-09-crs310-flat-mgmtvlan-design.md): the switch is a FLAT L2 switch on the
+# makerspace 10.2.30.0/24 network with its management isolated on a dedicated VLAN.
+# - ether1 is the copper UPLINK (SFP+ deferred until connectors arrive).
+# - DATA VLAN 30: flat 10.2.30.0/24 bridged through; the switch does NO routing/DHCP
+# and the CPU is not a member (no switch presence on the user network).
+# - MGMT VLAN 99: isolated; switch mgmt IP 192.168.88.1/24 on vlan-mgmt, reachable
+# only from the dedicated mgmt port ether8. No gateway, no NTP/DNS (no internet).
 
 # Day-2 connection: key auth as the named admin user (overrides the bootstrap
 # default ansible_user=admin in group_vars/mikrotik.yml).
 ansible_user: sjat
 
 switch_identity_name: "crs310-maker"
+
+# ----- Management (isolated VLAN 99) -----
 switch_mgmt_vlan_id: 99
-switch_mgmt_address: "10.0.99.2/24"  # EDIT: real mgmt IP
-switch_mgmt_gateway: "10.0.99.1"  # EDIT: real gateway
-switch_dns_servers: "10.0.99.1"
-switch_ntp_servers: "10.0.99.1"
+switch_mgmt_address: "192.168.88.1/24"
+switch_mgmt_gateway: ""  # isolated mgmt -> no default route
+switch_dns_servers: ""  # no DNS on an isolated mgmt plane
+switch_ntp_enabled: false  # no internet on mgmt -> NTP would only error
 switch_admin_user: "sjat"
 
-# PLACEHOLDER VLAN/port topology — vlans.yml is correct mechanism, but these IDs
-# and the per-port map are NOT the real makerspace plan. Replace with the real
-# VLAN ids + full ether1-8/sfp map before any on-site VLAN run. Notes:
-# - mode: access -> untagged member of `pvid`; mode: trunk -> tagged member of
-# each id in `tagged_vlans`, with `pvid` as the native (untagged) VLAN.
-# - trunk pvid: 1 means untagged frames on the uplink land in VLAN 1 (unused in a
-# hardened design). Decide deliberately whether the uplink should carry any
-# untagged traffic; set pvid to an intended native VLAN or leave 1 as a dead end.
-# - the bridge (CPU) is tagged ONLY on switch_mgmt_vlan_id (see vlans.yml).
+# ----- VLANs + per-port map (all untagged access; no trunks) -----
+# DATA = flat 10.2.30.0/24 (uplink + device ports); MGMT = isolated admin VLAN.
 switch_vlans:
+  - {id: 30, name: "data"}
   - {id: 99, name: "mgmt"}
-  - {id: 10, name: "members"}
+
 switch_bridge_ports:
-  - {interface: "ether1", pvid: 10, mode: access}
-  - {interface: "ether2", pvid: 10, mode: access}
-  - {interface: "sfp-sfpplus1", pvid: 1, mode: trunk, tagged_vlans: [99, 10]}
+  - {interface: "ether1", pvid: 30, mode: access}  # copper uplink
+  - {interface: "ether2", pvid: 30, mode: access}
+  - {interface: "ether3", pvid: 30, mode: access}
+  - {interface: "ether4", pvid: 30, mode: access}
+  - {interface: "ether5", pvid: 30, mode: access}
+  - {interface: "ether6", pvid: 30, mode: access}
+  - {interface: "ether7", pvid: 30, mode: access}
+  - {interface: "sfp-sfpplus1", pvid: 30, mode: access}
+  - {interface: "sfp-sfpplus2", pvid: 30, mode: access}
+  - {interface: "ether8", pvid: 99, mode: access}  # dedicated mgmt port
 
 # Firmware: pinned at the version already installed (no upgrade planned now).
 switch_firmware_target: "7.19.6"

@@ -5,6 +5,7 @@ switch_mgmt_vlan_id: 99
 switch_mgmt_address: "192.168.88.1/24"  # PLACEHOLDER — override in host_vars
 switch_mgmt_gateway: "192.168.88.254"  # PLACEHOLDER — override in host_vars
 switch_dns_servers: "192.168.88.254"
+switch_ntp_enabled: true  # set false for an isolated mgmt plane
 switch_ntp_servers: "192.168.88.254"
 
 # Services to disable for hardening (winbox kept on by default for recovery)

@@ -15,10 +15,18 @@
       - /ip/dns/set servers="{{ switch_dns_servers }}" allow-remote-requests=no
   changed_when: false
 
-- name: Configure NTP client
+- name: Enable NTP client
   community.routeros.command:
     commands:
       - /system/ntp/client/set enabled=yes servers="{{ switch_ntp_servers }}"
+  when: switch_ntp_enabled | bool
+  changed_when: false
+
+- name: Disable NTP client (isolated mgmt plane has no upstream time source)
+  community.routeros.command:
+    commands:
+      - /system/ntp/client/set enabled=no
+  when: not (switch_ntp_enabled | bool)
   changed_when: false
 
 - name: Disable unused IP services (hardening; winbox kept for recovery)

@@ -76,7 +76,23 @@
         interface="{{ switch_bridge_name }}" vlan-id={{ switch_mgmt_vlan_id }} }
   changed_when: false
 
-- name: Assign the management IP address
+# On a defconf switch the mgmt IP lives directly on the bare `bridge`; it must move to
+# vlan-mgmt (same address can't be on both). Removing it drops a session reaching the
+# switch THROUGH the bridge, so during the first cutover this is done out-of-band as one
+# detached flip (see the design doc's runbook), not by a straight play run. In steady
+# state both tasks below are no-ops.
+- name: Remove the legacy management IP from the bare bridge interface
+  # NOTE: RouterOS `find ... address=<v>` does NOT match an ip/address prefix value
+  # (it returns 0 even on an exact string), so match by get-and-compare instead.
+  community.routeros.command:
+    commands:
+      - >-
+        :foreach a in=[/ip/address/find interface="{{ switch_bridge_name }}"]
+        do={ :if ([/ip/address/get $a address]="{{ switch_mgmt_address }}")
+        do={ /ip/address/remove $a } }
+  changed_when: false
+
+- name: Assign the management IP address to vlan-mgmt
   community.routeros.command:
     commands:
       - >-
@@ -92,6 +108,7 @@
         :if ([:len [/ip/route/find dst-address="0.0.0.0/0"]] = 0)
         do={ /ip/route/add dst-address=0.0.0.0/0
         gateway="{{ switch_mgmt_gateway }}" }
+  when: switch_mgmt_gateway | length > 0
   changed_when: false
 
 - name: Enable VLAN filtering (LAST — prove mgmt reachability first)