From f1d7b3059c1b853da7d613c9d9654ffda5488a7d Mon Sep 17 00:00:00 2001
From: sjat
Date: Sun, 7 Jun 2026 08:04:56 +0200
Subject: [PATCH] docs: CRS310 Ansible management design (brainstorming spec)

Initial design doc for managing the makerspace MikroTik CRS310-8G+2S+IN switch
as IaC over SSH with community.routeros. Single-switch scope, fresh repo in
AnsibleBaobabV4 conventions, separate makerfloss vault.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 ...26-06-07-mikrotik-crs310-ansible-design.md | 172 ++++++++++++++++++
 1 file changed, 172 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-06-07-mikrotik-crs310-ansible-design.md

diff --git a/docs/superpowers/specs/2026-06-07-mikrotik-crs310-ansible-design.md b/docs/superpowers/specs/2026-06-07-mikrotik-crs310-ansible-design.md
new file mode 100644
index 0000000..ef40fe8
--- /dev/null
+++ b/docs/superpowers/specs/2026-06-07-mikrotik-crs310-ansible-design.md
@@ -0,0 +1,172 @@
+# MakerFLOSS_Mikrotik — CRS310 Ansible Management — Design
+
+**Date:** 2026-06-07
+**Status:** Approved (brainstorming complete; pending implementation plan)
+**Author:** sjat + Claude
+
+## Purpose
+
+Manage the makerspace's MikroTik **CRS310-8G+2S+IN** 10-port switch
+(8× 2.5GbE + 2× SFP+ 10G, RouterOS) as Infrastructure-as-Code with Ansible.
+Goal: deterministic, idempotent, version-controlled switch configuration —
+identity, management access, users/keys, VLAN switching, backups, and firmware —
+so the switch can be rebuilt from the repo with no manual WinBox clicking.
+
+## Scope
+
+**In scope (this iteration):** a single CRS310 switch, configured over SSH.
+
+Configuration domains, each gated by an enable-flag:
+1. **Identity + management + services** — hostname/identity, management IP/VLAN,
+   NTP/DNS, enable SSH, disable unused services (telnet, ftp, www, api; winbox decision in Open Items).
+2. **Users + SSH keys** — named admin user, import operator SSH public key,
+   harden/disable the default `admin`.
+3. **VLANs + bridge + ports** — bridge with hardware-offload VLAN filtering,
+   access/trunk port assignments, SFP+ as upstream trunk. Ships with a
+   **placeholder** example topology; real VLAN IDs/port map filled into `host_vars` later.
+4. **Backups + firmware** — scheduled `/export` + `/system backup`, fetched into the
+   repo; RouterOS/RouterBOOT upgrade flow to a pinned target version.
+
+**Out of scope (for now):** additional MikroTik devices, APs, routers; the REST API
+transport; CI/molecule testing; monitoring integration. Structure should not *prevent*
+these later, but we build only the single-switch path.
+
+## Decisions (from brainstorming)
+
+| Topic | Decision |
+|---|---|
+| Project / repo name | `MakerFLOSS_Mikrotik` (underscore; hyphen acceptable) |
+| Repo host | New repo on `forgejo.makerfloss.eu`, remote `origin`, default branch `main` |
+| Location | Sibling directory `~/Projects/MakerFLOSS_Mikrotik` |
+| Transport | **SSH** via `network_cli` (`community.routeros`), **key auth** for day-2 |
+| Role namespace | `makerfloss.*` → role `makerfloss.mikrotik_switch` |
+| Vault | **Separate** identity `makerfloss` at `~/.ansible/vault-keys/makerfloss.txt` — NOT the home `prod` key |
+| Config location | All real values in `host_vars/.yml`; connection vars in `group_vars/mikrotik.yml`; mechanism + placeholders in role `defaults/` |
+| Base | Fresh repo in AnsibleBaobabV4 conventions; cherry-pick narrowin/ansible-mikrotik command sequences for backup/upgrade |
+| Clean slate | Factory-reset switch to **no default configuration**; Ansible owns the entire config |
+| Default admin | Create named admin user + import key; **disable** the default `admin` after key login is proven |
+
+## What to bring over from AnsibleBaobabV4
+
+Copy + trim (independent repos; do not symlink):
+
+- `.envrc` + `.venv` direnv bootstrap — verbatim.
+- `ansible.cfg` — adapted: `host_key_checking=False`, `vault_identity_list = makerfloss@~/.ansible/vault-keys/makerfloss.txt`, network-CLI-friendly defaults.
+- `.ansible-lint` + yamllint config — verbatim.
+- `requirements.txt` — trimmed to `ansible`, `ansible-lint`, `yamllint` (drop molecule/docker/snipe/kuma).
+- `requirements.yml` — `community.routeros` (pulls in `ansible.netcommon`).
+- Inventory cascade pattern: `inventories/prod/hosts.yml` with one host in group `mikrotik`.
+- **Operator SSH public key** `~/.ssh/id_ed25519.pub` → imported onto the switch admin user.
+- Forgejo push key `~/.ssh/id_ed25519_forgejo` already exists (used for `git push`).
+
+## Architecture
+
+### Repo layout
+
+```
+MakerFLOSS_Mikrotik/
+├── .envrc / .ansible-lint / .yamllint / ansible.cfg
+├── requirements.txt / requirements.yml
+├── inventories/
+│   └── prod/hosts.yml            # group: mikrotik -> one switch host
+├── group_vars/
+│   └── mikrotik.yml              # connection/platform vars (network_cli, network_os, user, key)
+├── host_vars/
+│   └── .yml                      # identity, mgmt IP/VLAN, VLAN+port map, firmware_target
+├── roles/
+│   └── makerfloss.mikrotik_switch/
+│       ├── defaults/main.yml     # enable-flags, safe defaults, PLACEHOLDER vlan/port map
+│       ├── tasks/main.yml        # imports domain task files, each gated by a flag
+│       ├── tasks/identity.yml    # identity, mgmt IP, NTP/DNS, SSH on, unused services off
+│       ├── tasks/users.yml       # named admin, import ssh pubkey, disable default admin
+│       ├── tasks/vlans.yml       # bridge + hw VLAN filtering, access/trunk ports, SFP+ uplink
+│       ├── tasks/backup.yml      # /export + /system backup save, fetch into repo
+│       └── tasks/firmware.yml    # RouterOS + RouterBOOT upgrade to firmware_target
+├── playbooks (or top-level):
+│   ├── play_bootstrap.yml        # FIRST CONTACT: password auth -> create user, import key
+│   ├── play_switch.yml           # day-2: key-only, applies all enabled domains
+│   └── play_backup.yml           # on-demand/scheduled backup fetch
+├── backups//                     # fetched config exports + .backup files
+└── docs/superpowers/specs/       # this design doc
+```
+
+### Connection model
+
+`group_vars/mikrotik.yml`:
+- `ansible_connection: ansible.netcommon.network_cli`
+- `ansible_network_os: community.routeros.routeros`
+- `ansible_user: `
+- `ansible_ssh_private_key_file: ~/.ssh/id_ed25519` (day-2, key auth)
+
+`play_bootstrap.yml` overrides with password auth (`--ask-pass`) for first contact only.
+
+### Idempotency strategy (key design challenge)
+
+Over `network_cli`/SSH the primary module is `community.routeros.command` (RouterOS has
+no rich declarative module set like `ios_*`). Idempotency is therefore the main risk and
+must be deliberate:
+- Prefer naturally-idempotent commands: `/.../ set` on known, named items.
+- For `add`-style items, guard with RouterOS scripting: `:if ([find ] = "") do={ add ... }`.
+- Use `changed_when` based on command output where guards are impractical.
+- Keep each domain's command set small and readable; one logical change per task.
+- Cross-check against `community.routeros.facts` / `/export` output where useful.
+
+This is explicitly called out so the implementation plan budgets for testing idempotency
+(run twice, assert no changes on second run).
+
+## Operational flows
+
+### On-switch preparation (manual, before Ansible)
+
+1. Confirm boot OS is **RouterOS** (not SwOS) — VLAN filtering + `community.routeros` require it.
+2. Upgrade RouterOS **and** RouterBOOT firmware to a known-good stable; record as `firmware_target`.
+3. **Factory-reset to no default configuration** so Ansible owns the whole config.
+4. First-contact connectivity: laptop on a port, reach the device, confirm SSH reachable.
+5. Decide addressing (into `host_vars`): mgmt IP/mask, mgmt VLAN, gateway, and which
+   port/SFP+ is the upstream **trunk/uplink** to OPNsense.
+6. Record identity facts: serial, MAC, model, RouterOS version.
+7. Physical: SFP+ module/DAC for the 10G uplink, PSU, mounting.
+
+### Bootstrap (run once)
+
+`play_bootstrap.yml`, SSH **password** auth (default/initial creds):
+- create named admin user; set its password from vault;
+- import `~/.ssh/id_ed25519.pub`, bind to the user;
+- enable SSH service;
+- verify key login works, then disable the default `admin`.
+
+### Day-2 (normal)
+
+`play_switch.yml`, **key-only**, applies all enabled domains idempotently.
+`play_backup.yml` exports config + binary backup into `backups//`.
+
+## Secrets
+
+Vault identity `makerfloss` (`~/.ansible/vault-keys/makerfloss.txt`), referenced in
+`ansible.cfg`. Initial contents: the switch admin password. SSH key auth means day-2
+runs need no secret at runtime. (Vault-less start is possible but we create the identity
+up front.)
+
+## Success criteria
+
+- `play_bootstrap.yml` takes a factory-reset switch to key-based SSH access.
+- `play_switch.yml` applies identity + services + users + a placeholder VLAN/port
+  topology, and is **idempotent** (second run reports no changes).
+- `play_backup.yml` writes a usable `/export` and `.backup` into the repo.
+- All real switch values live in `host_vars`; the role contains no makerspace specifics.
+- `ansible-lint` and `yamllint` pass.
+
+## Open items to confirm during planning
+
+- Exact RouterOS `firmware_target` version to pin.
+- Whether `winbox` service stays enabled (convenience) or is disabled (hardening).
+- Named admin username (e.g. `sjat` vs a service account like `ansible`).
+- Backup scheduling: Ansible-run on demand vs a RouterOS scheduler + fetch.
+
+## Reference
+
+- `narrowin/ansible-mikrotik` (GitHub) — playbook-centric; mine its backup/upgrade
+  command sequences. Not used as a dependency.
+- `community.routeros` Ansible collection.
+- AnsibleBaobabV4 — conventions source (direnv, ansible.cfg, lint, inventory cascade,
+  enable-flag role idiom).
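
A minimal sketch of the guarded-`add` idiom described under "Idempotency strategy" in the spec above, written as a single Ansible task. The bridge name `bridge1`, the registered variable `bridge_result`, and the `CHANGED` marker printed via `:put` are hypothetical and do not come from the design. Whether the marker shows up cleanly in `stdout` over `network_cli` is an assumption that would need checking on the device.

```yaml
# Sketch only: "bridge1", "bridge_result" and the CHANGED marker are
# illustrative names, not part of the design doc.
- name: Ensure the bridge exists (guarded add)
  community.routeros.command:
    commands:
      # Add the bridge only when find returns nothing; print a marker
      # when the add actually ran so Ansible can tell the difference.
      - >-
        :if ([/interface bridge find name="bridge1"] = "")
        do={ /interface bridge add name="bridge1"; :put "CHANGED" }
  register: bridge_result
  # The command module cannot infer whether anything changed,
  # so derive the changed status from the marker in the output.
  changed_when: "'CHANGED' in (bridge_result.stdout | join(' '))"
```

Running the play twice and checking that this task reports no change on the second run is one way to exercise the "run twice, assert no changes" test the spec budgets for.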