MakerFLOSS_Troubleshooting/runbooks/switch-crs310.md
sjat 9ff12700ae Initial troubleshooting workspace: access, network map, runbooks
2026-06-09 13:24:26 +02:00

Runbook — CRS310 switch (crs310-maker)

The "new switch" at the makerspace. MikroTik CRS310-8G+2S+IN, RouterOS 7.19.6. Managed by Ansible from ~/Projects/MakerFLOSS_Mikrotik.

Authoritative sources:

  • Live topology + cutover runbook: MakerFLOSS_Mikrotik/docs/superpowers/specs/2026-06-09-crs310-flat-mgmtvlan-design.md
  • Running config snapshot: MakerFLOSS_Mikrotik/backups/crs310-maker/export.rsc
  • Device vars: MakerFLOSS_Mikrotik/host_vars/crs310-maker.yml
  • Field guide: MakerFLOSS_Mikrotik/docs/makerspace-switch-fieldguide.md

Topology recap

  • Transparent L2. The switch is not a router — no inter-VLAN routing, no presence on the data network.
  • Data VLAN 30 (10.2.30.0/24, gw .1): ether1 = uplink, ether2-7 = access (untagged), SFP+ reserved (deferred). Users plug in here.
  • Mgmt VLAN 99 (192.168.88.0/24): switch at 192.168.88.1, reachable only via ether8 (untagged). DHCP pool .10-.254, web UI on. CPU is the only tagged member. No default route, no DNS, no NTP: isolated by design.
  • vlan-filtering=yes went live 2026-06-09.

Reach

  • Mgmt (reconfig, SSH, web UI): you need a host on ether8. On-site: plug mamba into ether8, SSH 192.168.88.1 (or http://192.168.88.1). Remote: there is no standing tunnel to the mgmt VLAN — forward through a host that is on ether8. See access.md §A / §B.
  • Data path test: plug into one of ether2-7, expect a lease in 10.2.30.0/24.
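
For the remote case, one common way to forward through a host that sits on ether8 is OpenSSH's ProxyJump option (-J). The sketch below only prints the command so it can be reviewed before use. The jump alias mamba and the switch user admin are placeholders chosen for illustration, not values taken from the repo; access.md remains the reference for the real path.

```shell
# Print (not run) an SSH command that reaches the switch through a jump host.
# Assumptions: the jump host is plugged into ether8 and holds a 192.168.88.x
# address; "mamba" and "admin" are placeholder defaults, not repo values.
SWITCH_MGMT_IP=192.168.88.1

mgmt_ssh_cmd() {
    jump_host=${1:-mamba}      # one hop, or a comma-separated chain of hops
    switch_user=${2:-admin}    # hypothetical RouterOS user
    printf 'ssh -J %s %s@%s\n' "$jump_host" "$switch_user" "$SWITCH_MGMT_IP"
}
```

Calling mgmt_ssh_cmd with no arguments prints the single-hop form; a chain of hops can be passed as the first argument in the comma-separated form that -J accepts.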

Diagnose

| Symptom | Check |
|---|---|
| No link / no DHCP on an access port | Confirm you're on ether2-7 (data), not ether8 (mgmt). Verify that uplink ether1 is up and that the gateway 10.2.30.1 is reachable. |
| Can't reach switch mgmt | Are you on ether8? Mgmt is reachable nowhere else. Confirm a 192.168.88.x lease. |
| Suspected config drift | Diff live vs repo: run the backup play, then compare the fresh backups/crs310-maker/export.rsc against the committed version in git. |
| Lockout after a change | See recovery below. |
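
Several of these checks come down to "which subnet did my lease land in?". A minimal sketch of that decision, using only the two subnets named in the topology recap; the function name is made up for this example.

```shell
# Map an IPv4 address to the VLAN it implies on this switch.
# Both subnets are /24, so matching the first three octets is enough.
classify_lease() {
    case "$1" in
        10.2.30.*)    echo "data" ;;     # VLAN 30, data network
        192.168.88.*) echo "mgmt" ;;     # VLAN 99, ether8 only
        "")           echo "none" ;;     # no lease at all
        *)            echo "unknown" ;;  # some other network
    esac
}
```

On a Linux client the address can be fed in with something like classify_lease "$(ip -4 -o addr show dev eth0 | awk '{print $4}' | cut -d/ -f1)", where eth0 stands in for whatever the wired interface is actually called.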

Connectivity test via Ansible (from MakerFLOSS_Mikrotik, mamba on ether8):

ansible -m community.routeros.command -a "commands='/system/resource/print'" crs310-maker
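
For the config-drift check, a plain diff of two exports can be noisy if the export carries header comments that change on every run. The sketch below ignores comment lines before comparing. That every line starting with "#" is header noise is an assumption about the export format, and the function name is hypothetical; the repo's backup play may already handle this differently.

```shell
# Succeed (exit 0) if two RouterOS exports differ in substance,
# fail (exit 1) if they match once comment lines are ignored.
# Assumption: lines starting with "#" are header comments only.
rsc_differs() {
    a=$(grep -v '^#' "$1")
    b=$(grep -v '^#' "$2")
    [ "$a" != "$b" ]
}
```

One way to use it after a backup play: write the committed copy to a scratch file with git show HEAD:backups/crs310-maker/export.rsc > /tmp/committed.rsc, then run rsc_differs /tmp/committed.rsc backups/crs310-maker/export.rsc and treat success as drift.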

Fix

Changes land in MakerFLOSS_Mikrotik on main (per the repo's own workflow). Device-touching rules — do not skip:

  • Run any device-touching play twice; the second run must report no changes (idempotency).
  • Enable vlan-filtering last, after bridge/PVID/mgmt-VLAN are proven.
  • Network-affecting changes (mgmt IP/VLAN) should run as a self-reverting detached job (240s timeout) so a bad flip auto-rolls-back.
  • Keep a WinBox MAC-telnet or serial recovery channel open when touching network settings.

# from ~/Projects/MakerFLOSS_Mikrotik, mamba on ether8
yamllint . && ansible-lint && ansible-playbook play_switch.yml --syntax-check
ansible-playbook play_switch.yml                 # full day-2
ansible-playbook play_switch.yml --tags vlans    # one domain
ansible-playbook play_backup.yml                 # snapshot config into the repo
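
The self-reverting job in the rules above is a dead-man's switch: the revert fires after a timeout unless someone confirms the change first. The sketch below shows only that pattern in generic shell. It is not how the repo implements it; a real rollback for a mgmt change has to run somewhere that survives the loss of connectivity. The function name and the confirm-file mechanism are hypothetical.

```shell
# Dead-man's switch: run a revert command after a timeout unless a
# confirmation file exists by then. Generic illustration only.
arm_revert() {
    timeout_s=$1       # seconds to wait, e.g. 240
    confirm_file=$2    # create this file to cancel the revert
    shift 2            # the remaining arguments are the revert command
    (
        sleep "$timeout_s"
        [ -e "$confirm_file" ] || "$@"
    ) >/dev/null 2>&1 &   # detached, so the caller returns immediately
}
```

Usage would look like arm_revert 240 /tmp/change-ok ./revert.sh (revert.sh being a hypothetical script), followed by the change itself and, once mgmt access is confirmed, touch /tmp/change-ok.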

Recovery (lockout)

Documented gotchas from the 2026-06-09 cutover (see the spec):

  • NetworkManager on mamba flapped the connection on the bench. Pin the crs310-bench profile: autoconnect yes, static 192.168.88.2/24.
  • RouterOS find ... address= does not match IP prefixes; use find interface= instead (the mismatch caused a bridge-IP removal bug).
  • If locked out over the network, recover via WinBox MAC-telnet on ether8 or serial console; the detached-job timeout should also self-revert.
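
The NetworkManager pin from the first gotcha can be expressed with nmcli. This is a sketch of one way to do it, not the command recorded in the spec; the property names are standard nmcli ones, while the function name and the DRY_RUN switch are added here for illustration.

```shell
# Pin the bench profile to autoconnect with a static address.
# With DRY_RUN=1 the command is printed instead of executed.
pin_bench_profile() {
    profile=${1:-crs310-bench}
    set -- nmcli connection modify "$profile" \
        connection.autoconnect yes \
        ipv4.addresses 192.168.88.2/24 \
        ipv4.method manual
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Running it once with DRY_RUN=1 shows the exact command before anything on mamba is changed.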

Verify

  • A second run of ansible-playbook play_switch.yml reports no changes.
  • Access-port client gets a 10.2.30.0/24 lease and reaches the gateway.
  • ether8 client gets 192.168.88.x and can SSH 192.168.88.1.
  • export.rsc committed and matches intent.
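
The idempotency check can be automated by reading the PLAY RECAP of the second run. A minimal sketch, assuming the default recap format in which each host line carries a changed=N counter; the function name is made up for this example, and the check deliberately ignores failed= and unreachable= counters.

```shell
# Read ansible-playbook output on stdin. Succeed only if at least one
# recap line was seen and every such line reports changed=0.
recap_is_idempotent() {
    awk '
        /changed=[0-9]+/ { seen = 1 }    # a recap host line exists
        /changed=[1-9]/  { dirty = 1 }   # some host reported changes
        END { exit !(seen && !dirty) }
    '
}
```

For example: ansible-playbook play_switch.yml | recap_is_idempotent && echo "second run clean".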