MakerFLOSS_Mikrotik/docs/superpowers/specs/2026-06-07-mikrotik-crs310-ansible-design.md
sjat f1d7b3059c docs: CRS310 Ansible management design (brainstorming spec)
Initial design doc for managing the makerspace MikroTik CRS310-8G+2S+IN
switch as IaC over SSH with community.routeros. Single-switch scope,
fresh repo in AnsibleBaobabV4 conventions, separate makerfloss vault.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 08:04:56 +02:00

# MakerFLOSS_Mikrotik — CRS310 Ansible Management — Design
**Date:** 2026-06-07
**Status:** Approved (brainstorming complete; pending implementation plan)
**Author:** sjat + Claude
## Purpose
Manage the makerspace's MikroTik **CRS310-8G+2S+IN** 10-port switch
(8× 2.5GbE + 2× SFP+ 10G, RouterOS) as Infrastructure-as-Code with Ansible.
Goal: deterministic, idempotent, version-controlled switch configuration —
identity, management access, users/keys, VLAN switching, backups, and firmware —
so the switch can be rebuilt from the repo with no manual WinBox clicking.
## Scope
**In scope (this iteration):** a single CRS310 switch, configured over SSH.
Configuration domains, each gated by an enable-flag:
1. **Identity + management + services** — hostname/identity, management IP/VLAN,
NTP/DNS, enable SSH, disable unused services (telnet, ftp, www, api; winbox decision in Open Items).
2. **Users + SSH keys** — named admin user, import operator SSH public key,
harden/disable the default `admin`.
3. **VLANs + bridge + ports** — bridge with hardware-offload VLAN filtering,
access/trunk port assignments, SFP+ as upstream trunk. Ships with a
**placeholder** example topology; real VLAN IDs/port map filled into `host_vars` later.
4. **Backups + firmware** — scheduled `/export` + `/system backup`, fetched into the
repo; RouterOS/RouterBOOT upgrade flow to a pinned target version.

**Out of scope (for now):** additional MikroTik devices, APs, routers; the REST API
transport; CI/molecule testing; monitoring integration. The structure should not *prevent*
these later, but we build only the single-switch path.
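
The enable-flag gating described in this section could be expressed in the role's `defaults/main.yml` roughly as follows. This is a minimal sketch: the `mikrotik_switch_*` variable names are assumptions made for illustration, and which domains default to on or off is left to the implementation plan.

```yaml
---
# roles/makerfloss.mikrotik_switch/defaults/main.yml (sketch)
# All flag names are hypothetical; one flag per configuration domain.
mikrotik_switch_identity_enabled: true   # identity + management + services
mikrotik_switch_users_enabled: true      # users + SSH keys
mikrotik_switch_vlans_enabled: true      # VLANs + bridge + ports (placeholder topology)
mikrotik_switch_backup_enabled: true     # backups
mikrotik_switch_firmware_enabled: true   # RouterOS/RouterBOOT upgrade to firmware_target
```

A host could then switch a single domain off in `host_vars/<switch>.yml` without touching the role.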
## Decisions (from brainstorming)
| Topic | Decision |
|---|---|
| Project / repo name | `MakerFLOSS_Mikrotik` (underscore; hyphen acceptable) |
| Repo host | New repo on `forgejo.makerfloss.eu`, remote `origin`, default branch `main` |
| Location | Sibling directory `~/Projects/MakerFLOSS_Mikrotik` |
| Transport | **SSH** via `network_cli` (`community.routeros`), **key auth** for day-2 |
| Role namespace | `makerfloss.*` → role `makerfloss.mikrotik_switch` |
| Vault | **Separate** identity `makerfloss` at `~/.ansible/vault-keys/makerfloss.txt` — NOT the home `prod` key |
| Config location | All real values in `host_vars/<switch>.yml`; connection vars in `group_vars/mikrotik.yml`; mechanism + placeholders in role `defaults/` |
| Base | Fresh repo in AnsibleBaobabV4 conventions; cherry-pick narrowin/ansible-mikrotik command sequences for backup/upgrade |
| Clean slate | Factory-reset switch to **no default configuration**; Ansible owns the entire config |
| Default admin | Create named admin user + import key; **disable** the default `admin` after key login is proven |
## What to bring over from AnsibleBaobabV4
Copy + trim (independent repos; do not symlink):
- `.envrc` + `.venv` direnv bootstrap — verbatim.
- `ansible.cfg` — adapted: `host_key_checking=False`, `vault_identity_list = makerfloss@~/.ansible/vault-keys/makerfloss.txt`, network-CLI-friendly defaults.
- `.ansible-lint` + yamllint config — verbatim.
- `requirements.txt` — trimmed to `ansible`, `ansible-lint`, `yamllint` (drop molecule/docker/snipe/kuma).
- `requirements.yml`: `community.routeros` (pulls in `ansible.netcommon`).
- Inventory cascade pattern: `inventories/prod/hosts.yml` with one host in group `mikrotik`.
- **Operator SSH public key** `~/.ssh/id_ed25519.pub` → imported onto the switch admin user.
- Forgejo push key `~/.ssh/id_ed25519_forgejo` already exists (used for `git push`).
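
As an illustration of two of the items above, `requirements.yml` and the one-host inventory might look like the sketches below. The inventory host name `crs310-01` and the address are placeholders (the address is taken from the RFC 5737 documentation range), not real makerspace values.

```yaml
---
# requirements.yml (sketch): ansible.netcommon is installed as a dependency
collections:
  - name: community.routeros
```

```yaml
---
# inventories/prod/hosts.yml (sketch)
all:
  children:
    mikrotik:
      hosts:
        crs310-01:                  # hypothetical inventory name
          ansible_host: 192.0.2.10  # placeholder address
```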
## Architecture
### Repo layout
```
MakerFLOSS_Mikrotik/
├── .envrc / .ansible-lint / .yamllint / ansible.cfg
├── requirements.txt / requirements.yml
├── inventories/
│   └── prod/hosts.yml              # group: mikrotik -> one switch host
├── group_vars/
│   └── mikrotik.yml                # connection/platform vars (network_cli, network_os, user, key)
├── host_vars/
│   └── <switch>.yml                # identity, mgmt IP/VLAN, VLAN+port map, firmware_target
├── roles/
│   └── makerfloss.mikrotik_switch/
│       ├── defaults/main.yml       # enable-flags, safe defaults, PLACEHOLDER vlan/port map
│       ├── tasks/main.yml          # imports domain task files, each gated by a flag
│       ├── tasks/identity.yml      # identity, mgmt IP, NTP/DNS, SSH on, unused services off
│       ├── tasks/users.yml         # named admin, import ssh pubkey, disable default admin
│       ├── tasks/vlans.yml         # bridge + hw VLAN filtering, access/trunk ports, SFP+ uplink
│       ├── tasks/backup.yml        # /export + /system backup save, fetch into repo
│       └── tasks/firmware.yml      # RouterOS + RouterBOOT upgrade to firmware_target
├── playbooks (or top-level):
│   ├── play_bootstrap.yml          # FIRST CONTACT: password auth -> create user, import key
│   ├── play_switch.yml             # day-2: key-only, applies all enabled domains
│   └── play_backup.yml             # on-demand/scheduled backup fetch
├── backups/<switch>/               # fetched config exports + .backup files
└── docs/superpowers/specs/         # this design doc
```
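
To make the role layout concrete, `tasks/main.yml` could import each domain file behind its flag, along the lines of the sketch below. The flag names are assumptions for illustration; only the task file names come from the layout above.

```yaml
---
# roles/makerfloss.mikrotik_switch/tasks/main.yml (sketch)
# Flag variable names are hypothetical.
- name: Identity, management and services
  ansible.builtin.import_tasks: identity.yml
  when: mikrotik_switch_identity_enabled | bool

- name: Users and SSH keys
  ansible.builtin.import_tasks: users.yml
  when: mikrotik_switch_users_enabled | bool

- name: VLANs, bridge and ports
  ansible.builtin.import_tasks: vlans.yml
  when: mikrotik_switch_vlans_enabled | bool

- name: Backups
  ansible.builtin.import_tasks: backup.yml
  when: mikrotik_switch_backup_enabled | bool

- name: Firmware
  ansible.builtin.import_tasks: firmware.yml
  when: mikrotik_switch_firmware_enabled | bool
```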
### Connection model
`group_vars/mikrotik.yml`:
- `ansible_connection: ansible.netcommon.network_cli`
- `ansible_network_os: community.routeros.routeros`
- `ansible_user: <admin user>`
- `ansible_ssh_private_key_file: ~/.ssh/id_ed25519` (day-2, key auth)

`play_bootstrap.yml` overrides these with password auth (`--ask-pass`) for first contact only.
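
Written out as a file, the connection variables listed above might look like this. The user name `ansible` is a placeholder, since the named admin username is still an open item.

```yaml
---
# group_vars/mikrotik.yml (sketch)
ansible_connection: ansible.netcommon.network_cli
ansible_network_os: community.routeros.routeros
# Placeholder user. The collection's SSH guide suggests appending a suffix
# such as "+cet1024w" to the user name to disable terminal colours and
# widen the terminal; treat that as something to verify during planning.
ansible_user: ansible
ansible_ssh_private_key_file: ~/.ssh/id_ed25519
```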
### Idempotency strategy (key design challenge)
Over `network_cli`/SSH the primary module is `community.routeros.command`: over SSH the
collection offers no rich declarative module set like `ios_*`, and its `api_*` modules need
the API transport, which is out of scope. Idempotency is therefore the main risk and
must be deliberate:
- Prefer naturally-idempotent commands: `/.../ set` on known, named items.
- For `add`-style items, guard with RouterOS scripting: `:if ([find <selector>] = "") do={ add ... }`.
- Use `changed_when` based on command output where guards are impractical.
- Keep each domain's command set small and readable; one logical change per task.
- Cross-check against `community.routeros.facts` / `/export` output where useful.

This is called out explicitly so that the implementation plan budgets time for idempotency
testing: run each play twice and assert that the second run reports no changes.
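
The guard-plus-`changed_when` approach might be sketched as below. This is an untested illustration, not the planned implementation: the bridge name `bridge1`, the `CHANGED` marker, and the registered variable name are assumptions. The guard makes the `add` run only when the item is missing, and the marker printed by `:put` lets Ansible report a change only in that case.

```yaml
---
# Sketch of an idempotent "add" over community.routeros.command.
- name: Ensure the bridge exists (guarded add)
  community.routeros.command:
    commands:
      # Folded into one line before being sent to the switch.
      - >-
        :if ([:len [/interface bridge find name="bridge1"]] = 0)
        do={ /interface bridge add name=bridge1; :put "CHANGED" }
  register: bridge_add
  # stdout is a list with one entry per command; join it before searching.
  changed_when: "'CHANGED' in (bridge_add.stdout | join(' '))"
```

On a second run the `find` returns the existing item, nothing is printed, and the task reports no change.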
## Operational flows
### On-switch preparation (manual, before Ansible)
1. Confirm boot OS is **RouterOS** (not SwOS) — VLAN filtering + `community.routeros` require it.
2. Upgrade RouterOS **and** RouterBOOT firmware to a known-good stable; record as `firmware_target`.
3. **Factory-reset to no default configuration** so Ansible owns the whole config.
4. First-contact connectivity: laptop on a port, reach the device, confirm SSH reachable.
5. Decide addressing (into `host_vars`): mgmt IP/mask, mgmt VLAN, gateway, and which
port/SFP+ is the upstream **trunk/uplink** to OPNsense.
6. Record identity facts: serial, MAC, model, RouterOS version.
7. Physical: SFP+ module/DAC for the 10G uplink, PSU, mounting.
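
The values decided in steps 2, 5 and 6 could be recorded in `host_vars/<switch>.yml` roughly as follows. Every variable name except `firmware_target` is hypothetical, all values are placeholders, and the port name assumes the default RouterOS interface naming on this model.

```yaml
---
# host_vars/<switch>.yml (sketch with placeholder values)
mikrotik_switch_identity: crs310-01          # hypothetical name
mikrotik_switch_mgmt_address: 192.0.2.10/24  # documentation range, not a real address
mikrotik_switch_mgmt_vlan: 99                # placeholder VLAN ID
mikrotik_switch_gateway: 192.0.2.1
mikrotik_switch_uplink_port: sfp-sfpplus1    # assumed default name of the first SFP+ port
firmware_target: "CHANGE_ME"                 # pin once the stable version is chosen

# Identity facts recorded by hand during preparation.
mikrotik_switch_inventory:
  model: CRS310-8G+2S+IN
  serial: "CHANGE_ME"
  mac: "CHANGE_ME"
```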
### Bootstrap (run once)
`play_bootstrap.yml`, SSH **password** auth (default/initial creds):
- create named admin user; set its password from vault;
- import `~/.ssh/id_ed25519.pub`, bind to the user;
- enable SSH service;
- verify key login works, then disable the default `admin`.
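
A possible shape for `play_bootstrap.yml` is sketched below under several assumptions: the named user is `ansible`, the vaulted password variable is called `vault_switch_admin_password`, the public key is uploaded with `ansible.netcommon.net_put`, and the RouterOS version accepts Ed25519 public keys. None of this has been tested against the switch.

```yaml
---
# play_bootstrap.yml (sketch): run once with --ask-pass
- name: First contact with password auth
  hosts: mikrotik
  gather_facts: false
  vars:
    ansible_user: admin        # default account, first contact only
    bootstrap_user: ansible    # hypothetical named admin
  tasks:
    - name: Upload the operator public key to the switch
      ansible.netcommon.net_put:
        src: "{{ lookup('env', 'HOME') }}/.ssh/id_ed25519.pub"
        dest: id_ed25519.pub

    - name: Create the named admin user if it does not exist yet
      community.routeros.command:
        commands:
          - >-
            :if ([:len [/user find name="{{ bootstrap_user }}"]] = 0)
            do={ /user add name="{{ bootstrap_user }}" group=full
            password="{{ vault_switch_admin_password }}" }
      no_log: true

    - name: Bind the public key to the user and make sure SSH is enabled
      community.routeros.command:
        commands:
          - /user ssh-keys import public-key-file=id_ed25519.pub user={{ bootstrap_user }}
          - /ip service set ssh disabled=no
```

Disabling the default `admin` (`/user disable admin`) is deliberately left out of this play. It fits better in a follow-up run that connects as the named user with key auth and without `--ask-pass`, so that key login is actually proven first.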
### Day-2 (normal)
`play_switch.yml`, **key-only**, applies all enabled domains idempotently.
`play_backup.yml` exports config + binary backup into `backups/<switch>/`.
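
The backup flow might be sketched as follows. It assumes the file base name equals the inventory host name, that `backups/<switch>/` already exists in the repo, and that `ansible.netcommon.net_get` is used for the transfer. Both tasks will report a change on every run, so they sit outside the idempotency target.

```yaml
---
# play_backup.yml (sketch)
- name: Export and fetch switch backups
  hosts: mikrotik
  gather_facts: false
  vars:
    backup_name: "{{ inventory_hostname }}"   # hypothetical naming scheme
  tasks:
    - name: Write a text export and a binary backup on the switch
      community.routeros.command:
        commands:
          - /export file={{ backup_name }}              # creates <name>.rsc
          - /system backup save name={{ backup_name }}  # creates <name>.backup

    - name: Fetch both files into the repo
      ansible.netcommon.net_get:
        src: "{{ item }}"
        dest: "{{ playbook_dir }}/backups/{{ inventory_hostname }}/{{ item }}"
      loop:
        - "{{ backup_name }}.rsc"
        - "{{ backup_name }}.backup"
```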
## Secrets
Vault identity `makerfloss` (`~/.ansible/vault-keys/makerfloss.txt`), referenced in
`ansible.cfg`. Initial contents: the switch admin password. Because day-2 runs use SSH
key auth, they need no vault secret at runtime. (A vault-less start is possible, but we
create the identity up front.)
## Success criteria
- `play_bootstrap.yml` takes a factory-reset switch to key-based SSH access.
- `play_switch.yml` applies identity + services + users + a placeholder VLAN/port
topology, and is **idempotent** (second run reports no changes).
- `play_backup.yml` writes a usable `/export` and `.backup` into the repo.
- All real switch values live in `host_vars`; the role contains no makerspace specifics.
- `ansible-lint` and `yamllint` pass.
## Open items to confirm during planning
- Exact RouterOS `firmware_target` version to pin.
- Whether `winbox` service stays enabled (convenience) or is disabled (hardening).
- Named admin username (e.g. `sjat` vs a service account like `ansible`).
- Backup scheduling: Ansible-run on demand vs a RouterOS scheduler + fetch.
## Reference
- `narrowin/ansible-mikrotik` (GitHub) — playbook-centric; mine its backup/upgrade
command sequences. Not used as a dependency.
- `community.routeros` Ansible collection.
- AnsibleBaobabV4 — conventions source (direnv, ansible.cfg, lint, inventory cascade,
enable-flag role idiom).