# MakerFLOSS_Mikrotik
Infrastructure-as-Code for the makerspace's **MikroTik CRS310-8G+2S+IN** switch
(8× 2.5GbE + 2× SFP+ 10G, RouterOS 7). Configuration is managed declaratively with
Ansible over SSH using the `community.routeros` collection — identity, management
access, users/keys, VLAN switching, backups, and firmware — so the switch can be
rebuilt from this repo instead of by hand in WinBox.
## Status
| Area | State |
|---|---|
| Repo scaffolding, role skeleton, vault | ✅ done |
| On-site device prep + **bootstrap** (named user + SSH key + identity) | ✅ done (2026-06-08) |
| `identity` / `users` / `backup` / `firmware` + `play_bootstrap` / `play_backup` | ✅ implemented; idempotency-verified against the device (firmware is opt-in, lint/syntax only) |
| `vlans` (VLAN-aware bridge, ports, mgmt iface) | ✅ **applied & live**: flat data VLAN + isolated mgmt VLAN, `vlan-filtering` on |

**Live topology (2026-06-09):** the switch is a flat L2 bridge on the makerspace
**DATA VLAN 30** (`10.2.30.0/24`), spanning the `ether1` copper uplink, `ether2-7`, and
SFP+. An **isolated MGMT VLAN 99** lives on `ether8` (switch admin at `192.168.88.1`,
no gateway/NTP/DNS). As an experiment, the mgmt port also serves DHCP and the web UI:
plug into `ether8`, get a lease, and open `http://192.168.88.1` (login is still
required; the default `admin` user is disabled). The SFP+ 10G uplink and real VLAN
segmentation are future work. See
`docs/superpowers/specs/2026-06-09-crs310-flat-mgmtvlan-design.md` for the design and
the lockout-safe cutover runbook.
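
To make the two segments concrete, here is a minimal sketch that classifies an IPv4 address by the segment it falls in. `segment_of` is a hypothetical helper, not part of this repo, and the `/24` on the mgmt side is an assumption (the topology note only gives `192.168.88.1`).

```bash
# Classify an IPv4 address by the switch segment it belongs to.
# Assumption: the mgmt network is 192.168.88.0/24; only 192.168.88.1 is documented.
segment_of() {
  case "$1" in
    10.2.30.*)    echo "data (VLAN 30)" ;;   # makerspace data network, 10.2.30.0/24
    192.168.88.*) echo "mgmt (VLAN 99)" ;;   # isolated management port, ether8
    *)            echo "unknown" ;;
  esac
}
```

For example, `segment_of 192.168.88.42` prints `mgmt (VLAN 99)`, which is what a laptop holding a lease from the `ether8` DHCP experiment would be expected to show.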
## Layout
```
inventories/prod/hosts.yml # group `mikrotik` -> the switch host
group_vars/mikrotik.yml # connection vars (network_cli + community.routeros) + enable-flags
group_vars/mikrotik.vault.yml # encrypted admin/user password (makerfloss vault id)
host_vars/crs310-maker.yml # device facts + real addressing + VLAN/port map
roles/makerfloss.mikrotik_switch/ # the role: defaults + per-domain task files
play_switch.yml # day-2 run (key auth), applies all enabled domains
docs/makerspace-switch-fieldguide.md # on-site, printable prep checklist
docs/superpowers/specs|plans/ # design spec + implementation plan
```
## Setup (control node)
```bash
direnv allow # or: python3 -m venv .venv && . .venv/bin/activate
pip install -r requirements.txt
ansible-galaxy collection install -r requirements.yml
```
**Vault:** secrets use a dedicated vault identity `makerfloss`, keyed by
`~/.ansible/vault-keys/makerfloss.txt` (referenced in `ansible.cfg`, kept outside the
repo). View a secret with `ansible-vault view group_vars/mikrotik.vault.yml`.
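
Because the key file lives outside the repo, a fresh control node can easily be missing it. The sketch below is a pre-flight check under that assumption; `check_vault_key` is a hypothetical helper, not something this repo ships.

```bash
# Pre-flight: confirm the makerfloss vault key exists and is non-empty.
# check_vault_key is a hypothetical helper; the default path is the one named above.
check_vault_key() {
  local key="${1:-$HOME/.ansible/vault-keys/makerfloss.txt}"
  if [ ! -s "$key" ]; then
    echo "vault key missing or empty: $key" >&2
    return 1
  fi
  echo "vault key ok: $key"
}
```

Running `check_vault_key && ansible-vault view group_vars/mikrotik.vault.yml` would then fail early with a readable message instead of a decryption error.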
## Connectivity
The role connects with `ansible.netcommon.network_cli` + `ansible_network_os:
community.routeros.routeros`, authenticating with the operator SSH key
(`~/.ssh/id_ed25519`). Day-2 needs no password.
> **Bench note:** while the switch sits on an isolated bench reachable only through a
> jump host, Ansible's paramiko transport won't traverse `ProxyJump`. Run Ansible from a
> host on the switch's network, or forward the port:
> `ssh -J <jump> <user>@<jump-lan> -L 2222:192.168.88.1:22 -N` then set
> `ansible_host=127.0.0.1 ansible_port=2222`. In production (switch directly reachable)
> this is a non-issue.
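
The port-forward workaround from the bench note can be wrapped so the overrides are not retyped on every run. This is a sketch that assumes the tunnel is already up in another terminal; `run_via_tunnel` and the `ANSIBLE_PLAYBOOK` override are hypothetical names introduced here for illustration, not Ansible settings.

```bash
# Run a playbook against the switch through a local SSH port-forward.
# Assumes the tunnel from the bench note is already listening on the given port.
# ANSIBLE_PLAYBOOK is a hypothetical hook: set it to `echo` to preview the command.
run_via_tunnel() {
  local port="$1"
  shift
  # Extra vars (-e) take precedence over the inventory's host and port.
  "${ANSIBLE_PLAYBOOK:-ansible-playbook}" "$@" \
    -e ansible_host=127.0.0.1 \
    -e "ansible_port=${port}"
}

# Example: run_via_tunnel 2222 play_switch.yml --tags identity,users
```

Passing the overrides as extra vars keeps `host_vars/crs310-maker.yml` untouched, so nothing needs reverting once the switch is directly reachable.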
## Usage
```bash
# Validate
yamllint . && ansible-lint && ansible-playbook play_switch.yml --syntax-check
# First contact on a fresh/reset device (password auth, one time)
ansible-playbook play_bootstrap.yml -e ansible_user=admin --ask-pass
# Day-2 configuration (key auth, idempotent)
ansible-playbook play_switch.yml
ansible-playbook play_switch.yml --tags identity,users # safe domains
ansible-playbook play_switch.yml --tags vlans # on-site only — see lockout note
ansible-playbook play_switch.yml --limit crs310-maker
# Backup config into the repo
ansible-playbook play_backup.yml
```
## ⚠️ Lockout safety
When changing management, services, or VLAN/bridge settings, keep an independent
recovery channel open (serial console, or WinBox MAC-telnet) and enable
`vlan-filtering` **last**, after the management path is proven. RouterOS config tasks
use `:if [find]` guards for idempotency; **run every device-touching play twice** and
confirm the second run reports no changes.
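
The "run twice" check can be automated by reading the `changed=` counters in the play recap of the second run. The following is a minimal sketch; `recap_is_clean` is a hypothetical helper, and it assumes the standard `PLAY RECAP` line format printed by the default Ansible output.

```bash
# Read ansible-playbook output on stdin and inspect the PLAY RECAP counters.
# Returns 0 only if at least one changed=N field was seen and all of them are 0.
recap_is_clean() {
  awk '
    {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^changed=[0-9]+$/) {
          seen = 1
          split($i, kv, "=")
          total += kv[2]
        }
      }
    }
    END {
      # No recap at all (e.g. the play aborted early) counts as a failure.
      if (!seen || total > 0) exit 1
      exit 0
    }
  '
}

# Example (second run of a play that was already applied once):
#   ansible-playbook play_switch.yml --tags identity,users | tee /dev/stderr | recap_is_clean
```

The helper only looks at `changed=`; failed or unreachable hosts still need to be caught through the exit status of `ansible-playbook` itself.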
## Preparing a switch on-site
See **`docs/makerspace-switch-fieldguide.md`** — a printable checklist for what to do
physically at the makerspace before Ansible takes over.