sjat f1d7b3059c docs: CRS310 Ansible management design (brainstorming spec)
Initial design doc for managing the makerspace MikroTik CRS310-8G+2S+IN
switch as IaC over SSH with community.routeros. Single-switch scope,
fresh repo in AnsibleBaobabV4 conventions, separate makerfloss vault.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 08:04:56 +02:00


MakerFLOSS_Mikrotik — CRS310 Ansible Management — Design

Date: 2026-06-07
Status: Approved (brainstorming complete; pending implementation plan)
Author: sjat + Claude

Purpose

Manage the makerspace's MikroTik CRS310-8G+2S+IN 10-port switch (8× 2.5GbE + 2× SFP+ 10G, running RouterOS) as Infrastructure-as-Code with Ansible. Goal: a deterministic, idempotent, version-controlled switch configuration covering identity, management access, users/keys, VLAN switching, backups, and firmware, so that the switch can be rebuilt from the repo with no manual WinBox clicking.

Scope

In scope (this iteration): a single CRS310 switch, configured over SSH.

Configuration domains, each gated by an enable-flag:

  1. Identity + management + services — hostname/identity, management IP/VLAN, NTP/DNS, enable SSH, disable unused services (telnet, ftp, www, api; winbox decision in Open Items).
  2. Users + SSH keys — named admin user, import operator SSH public key, harden/disable the default admin.
  3. VLANs + bridge + ports — bridge with hardware-offload VLAN filtering, access/trunk port assignments, SFP+ as upstream trunk. Ships with a placeholder example topology; the real VLAN IDs and port map are filled into host_vars later.
  4. Backups + firmware — scheduled /export + /system backup, fetched into the repo; RouterOS/RouterBOOT upgrade flow to a pinned target version.
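A minimal sketch of how the role's defaults/main.yml could carry one enable-flag per domain plus the placeholder topology. All variable names are hypothetical, since this design does not fix them yet. The VLAN IDs are placeholders, and the interface names assume RouterOS's usual ether1..ether8 / sfp-sfpplus1..2 naming on this model. Leaving firmware off by default is an assumption of this sketch, because an upgrade reboots the switch.

```yaml
---
# roles/makerfloss.mikrotik_switch/defaults/main.yml (sketch; names are hypothetical)

# One enable-flag per configuration domain.
mikrotik_switch_identity_enabled: true
mikrotik_switch_users_enabled: true
mikrotik_switch_vlans_enabled: true
mikrotik_switch_backup_enabled: true
# Assumption: upgrades reboot the switch, so this sketch makes them opt-in.
mikrotik_switch_firmware_enabled: false

# PLACEHOLDER topology only; real values override these in host_vars/<switch>.yml.
mikrotik_switch_bridge: bridge1
mikrotik_switch_vlans:
  - id: 10
    name: mgmt
  - id: 20
    name: members
mikrotik_switch_ports:
  # SFP+ upstream trunk towards OPNsense, all VLANs tagged.
  - interface: sfp-sfpplus1
    mode: trunk
    tagged: [10, 20]
  # Example access port, untagged member of VLAN 20.
  - interface: ether2
    mode: access
    pvid: 20
```

host_vars/<switch>.yml then overrides only the keys that differ. That keeps the role free of makerspace specifics, in line with the success criteria.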

Out of scope (for now): additional MikroTik devices, APs, routers; the REST API transport; CI/molecule testing; monitoring integration. Structure should not prevent these later, but we build only the single-switch path.

Decisions (from brainstorming)

| Topic | Decision |
|---|---|
| Project / repo name | MakerFLOSS_Mikrotik (underscore; hyphen acceptable) |
| Repo host | New repo on forgejo.makerfloss.eu, remote origin, default branch main |
| Location | Sibling directory ~/Projects/MakerFLOSS_Mikrotik |
| Transport | SSH via network_cli (community.routeros), key auth for day-2 |
| Role namespace | makerfloss.* → role makerfloss.mikrotik_switch |
| Vault | Separate identity makerfloss at ~/.ansible/vault-keys/makerfloss.txt (NOT the home prod key) |
| Config location | All real values in host_vars/<switch>.yml; connection vars in group_vars/mikrotik.yml; mechanism + placeholders in role defaults/ |
| Base | Fresh repo in AnsibleBaobabV4 conventions; cherry-pick narrowin/ansible-mikrotik command sequences for backup/upgrade |
| Clean slate | Factory-reset switch to no default configuration; Ansible owns the entire config |
| Default admin | Create named admin user + import key; disable the default admin after key login is proven |

What to bring over from AnsibleBaobabV4

Copy + trim (independent repos; do not symlink):

  • .envrc + .venv direnv bootstrap — verbatim.
  • ansible.cfg — adapted: host_key_checking=False, vault_identity_list = makerfloss@~/.ansible/vault-keys/makerfloss.txt, and network-CLI-friendly defaults (e.g. [persistent_connection] connect_timeout / command_timeout).
  • .ansible-lint + yamllint config — verbatim.
  • requirements.txt — trimmed to ansible, ansible-lint, yamllint (drop molecule/docker/snipe/kuma).
  • requirements.yml — community.routeros (pulls in ansible.netcommon).
  • Inventory cascade pattern: inventories/prod/hosts.yml with one host in group mikrotik.
  • Operator SSH public key ~/.ssh/id_ed25519.pub → imported onto the switch admin user.
  • Forgejo push key ~/.ssh/id_ed25519_forgejo already exists (used for git push).
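The two smallest files from this list might look like the sketch below. The inventory hostname crs310 is a hypothetical placeholder, and version pins are left out until planning fixes them. ansible.netcommon is listed explicitly only to make the network_cli dependency visible.

```yaml
---
# requirements.yml: install with `ansible-galaxy collection install -r requirements.yml`
collections:
  - name: community.routeros
  # Provides the network_cli connection plugin; listed explicitly for visibility.
  - name: ansible.netcommon
---
# inventories/prod/hosts.yml: one switch in group mikrotik.
# The hostname is a placeholder; real values (ansible_host etc.) live in host_vars.
all:
  children:
    mikrotik:
      hosts:
        crs310:
```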

Architecture

Repo layout

MakerFLOSS_Mikrotik/
├── .envrc / .ansible-lint / .yamllint / ansible.cfg
├── requirements.txt / requirements.yml
├── inventories/
│   └── prod/hosts.yml                 # group: mikrotik -> one switch host
├── group_vars/
│   └── mikrotik.yml                   # connection/platform vars (network_cli, network_os, user, key)
├── host_vars/
│   └── <switch>.yml                   # identity, mgmt IP/VLAN, VLAN+port map, firmware_target
├── roles/
│   └── makerfloss.mikrotik_switch/
│       ├── defaults/main.yml          # enable-flags, safe defaults, PLACEHOLDER vlan/port map
│       ├── tasks/main.yml             # imports domain task files, each gated by a flag
│       ├── tasks/identity.yml         # identity, mgmt IP, NTP/DNS, SSH on, unused services off
│       ├── tasks/users.yml            # named admin, import ssh pubkey, disable default admin
│       ├── tasks/vlans.yml            # bridge + hw VLAN filtering, access/trunk ports, SFP+ uplink
│       ├── tasks/backup.yml           # /export + /system backup save, fetch into repo
│       └── tasks/firmware.yml         # RouterOS + RouterBOOT upgrade to firmware_target
├── playbooks (or top-level):
│   ├── play_bootstrap.yml             # FIRST CONTACT: password auth -> create user, import key
│   ├── play_switch.yml                # day-2: key-only, applies all enabled domains
│   └── play_backup.yml                # on-demand/scheduled backup fetch
├── backups/<switch>/                  # fetched config exports + .backup files
└── docs/superpowers/specs/            # this design doc
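The "imports domain task files, each gated by a flag" idiom in tasks/main.yml could be sketched as below. The flag names are hypothetical. Because import_tasks is static, each `when` is copied onto every task of the imported file, so a disabled domain shows up as skipped tasks rather than disappearing.

```yaml
---
# roles/makerfloss.mikrotik_switch/tasks/main.yml (sketch; flag names are hypothetical)
- name: Identity, management and services
  ansible.builtin.import_tasks: identity.yml
  when: mikrotik_switch_identity_enabled | bool

- name: Users and SSH keys
  ansible.builtin.import_tasks: users.yml
  when: mikrotik_switch_users_enabled | bool

- name: Bridge, VLANs and ports
  ansible.builtin.import_tasks: vlans.yml
  when: mikrotik_switch_vlans_enabled | bool

- name: Backups
  ansible.builtin.import_tasks: backup.yml
  when: mikrotik_switch_backup_enabled | bool

- name: Firmware
  ansible.builtin.import_tasks: firmware.yml
  when: mikrotik_switch_firmware_enabled | bool
```

Adding per-domain `tags:` to these imports would also allow partial runs such as `--tags vlans`, if that proves useful.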

Connection model

group_vars/mikrotik.yml:

  • ansible_connection: ansible.netcommon.network_cli
  • ansible_network_os: community.routeros.routeros
  • ansible_user: <admin user>
  • ansible_ssh_private_key_file: ~/.ssh/id_ed25519 (day-2, key auth)

play_bootstrap.yml overrides with password auth (--ask-pass) for first contact only.
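A sketch of group_vars/mikrotik.yml under this connection model. mikrotik_admin_user is a hypothetical host_vars variable, since the username is still an open item. The +cet1024w login suffix asks RouterOS for plain console output: colours and terminal auto-detection off, 1024-column width. This is the collection's SSH-guide recommendation as I understand it, so verify it against that guide and the pinned RouterOS version.

```yaml
---
# group_vars/mikrotik.yml (sketch)
ansible_connection: ansible.netcommon.network_cli
ansible_network_os: community.routeros.routeros
# Hypothetical host_vars variable plus a RouterOS login suffix for plain, wide output.
ansible_user: "{{ mikrotik_admin_user }}+cet1024w"
# Day-2 key auth; bootstrap overrides the user and supplies a password instead.
ansible_ssh_private_key_file: ~/.ssh/id_ed25519
```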

Idempotency strategy (key design challenge)

Over network_cli/SSH the primary module is community.routeros.command. The collection's SSH-based modules are essentially command and facts, with no rich declarative module set like the ios_* modules. Idempotency is therefore the main risk and must be designed in deliberately:

  • Prefer naturally-idempotent commands: /.../ set on known, named items.
  • For add-style items, guard with RouterOS scripting: :if ([find <selector>] = "") do={ add ... }.
  • Use changed_when based on command output where guards are impractical.
  • Keep each domain's command set small and readable; one logical change per task.
  • Cross-check against community.routeros.facts / /export output where useful.

This is explicitly called out so the implementation plan budgets for testing idempotency (run twice, assert no changes on second run).
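A sketch of two guard styles from the list, as they might appear in tasks/identity.yml and tasks/vlans.yml. The variable names are hypothetical. The guards use `[:len [find ...]] = 0`, a variant of the `= ""` test above. Each guarded branch prints a marker that changed_when keys on, so "no changes on the second run" means something. The sketch assumes the module returns one stdout entry per command; check the RouterOS script syntax on the pinned firmware.

```yaml
---
# A plain `set` is idempotent in effect but gives Ansible no change signal,
# so compare first and print a marker only when something is actually modified.
- name: Set system identity only when it differs
  community.routeros.command:
    commands:
      - >-
        :if ([/system identity get name] != "{{ mikrotik_switch_identity }}")
        do={ /system identity set name="{{ mikrotik_switch_identity }}"; :put "ansible-changed" }
  register: identity_result
  changed_when: "'ansible-changed' in identity_result.stdout[0]"

# add-style item: guard with find so a second run adds nothing.
# VLAN filtering stays off here and would be switched on only after ports and
# VLAN entries exist (an ordering assumption meant to avoid a management lockout).
- name: Create the VLAN bridge if it is missing
  community.routeros.command:
    commands:
      - >-
        :if ([:len [/interface bridge find name="{{ mikrotik_switch_bridge }}"]] = 0)
        do={ /interface bridge add name="{{ mikrotik_switch_bridge }}" vlan-filtering=no; :put "ansible-changed" }
  register: bridge_result
  changed_when: "'ansible-changed' in bridge_result.stdout[0]"
```

The folded `>-` scalars join each pair of lines with a single space, so RouterOS receives one `:if (...) do={...}` line per command.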

Operational flows

On-switch preparation (manual, before Ansible)

  1. Confirm boot OS is RouterOS (not SwOS) — VLAN filtering + community.routeros require it.
  2. Upgrade RouterOS, then the RouterBOOT firmware, to a known-good stable release; record that version as firmware_target.
  3. Factory-reset to no default configuration so Ansible owns the whole config.
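  4. First-contact connectivity: laptop on a port. After a no-defaults reset the switch has no IP address, so assign a temporary management IP first (e.g. over a WinBox MAC-address connection), then confirm SSH is reachable.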
  5. Decide addressing (into host_vars): mgmt IP/mask, mgmt VLAN, gateway, and which port/SFP+ is the upstream trunk/uplink to OPNsense.
  6. Record identity facts: serial, MAC, model, RouterOS version.
  7. Physical: SFP+ module/DAC for the 10G uplink, PSU, mounting.
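Step 6 could optionally be done with Ansible as soon as SSH answers. This is a minimal sketch. The play name is hypothetical. It assumes community.routeros.facts exposes ansible_net_model, ansible_net_serialnum and ansible_net_version (verify against the collection's facts documentation). It also assumes the play is run as the factory-default user with --ask-pass. The MAC address is not covered here.

```yaml
---
# play_facts.yml (hypothetical): print identity facts to copy into host_vars.
# e.g. ansible-playbook -i inventories/prod/hosts.yml play_facts.yml -e ansible_user=admin --ask-pass
- name: Record CRS310 identity facts
  hosts: mikrotik
  gather_facts: false
  tasks:
    - name: Gather default RouterOS facts (config subset excluded by default)
      community.routeros.facts:

    - name: Show values to record in host_vars
      ansible.builtin.debug:
        msg:
          - "model: {{ ansible_net_model }}"
          - "serial: {{ ansible_net_serialnum }}"
          - "routeros: {{ ansible_net_version }}"
```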

Bootstrap (run once)

play_bootstrap.yml, SSH password auth (default/initial creds):

  • create named admin user; set its password from vault;
  • import ~/.ssh/id_ed25519.pub, bind to the user;
  • enable SSH service;
  • verify key login works, then disable the default admin.
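The bootstrap steps above could be sketched as one play. Everything here is an assumption to validate during planning:

- mikrotik_admin_user and mikrotik_admin_password (from the makerfloss vault) are hypothetical variable names.
- The public key is uploaded with ansible.netcommon.net_put over SFTP before RouterOS's `/user ssh-keys import` consumes it.
- Key-only login is proven with a BatchMode ssh call from the control node, so a password cannot mask a broken key.

```yaml
---
# play_bootstrap.yml (sketch). Run once with --ask-pass as the factory-default user.
- name: Bootstrap CRS310 for key-based SSH
  hosts: mikrotik
  gather_facts: false
  vars:
    ansible_user: admin+cet1024w  # factory-default account; suffix keeps console output plain
    operator_key: "{{ lookup('ansible.builtin.env', 'HOME') }}/.ssh/id_ed25519"
  tasks:
    - name: Create the named admin user if it is missing
      community.routeros.command:
        commands:
          # Assumes the password contains no `"` or `$` (RouterOS string syntax).
          - >-
            :if ([:len [/user find name="{{ mikrotik_admin_user }}"]] = 0)
            do={ /user add name="{{ mikrotik_admin_user }}" group=full
            password="{{ mikrotik_admin_password }}"; :put "ansible-changed" }
      register: user_result
      changed_when: "'ansible-changed' in user_result.stdout[0]"
      no_log: true  # the rendered command contains the password

    - name: Upload the operator public key
      ansible.netcommon.net_put:
        src: "{{ operator_key }}.pub"
        dest: id_ed25519.pub
        protocol: sftp

    - name: Import the key for the named user (not idempotent; bootstrap runs once)
      community.routeros.command:
        commands:
          - /user ssh-keys import public-key-file=id_ed25519.pub user="{{ mikrotik_admin_user }}"

    - name: Ensure the SSH service is enabled
      community.routeros.command:
        commands:
          - /ip service enable ssh
      changed_when: false

    - name: Prove key-only login before touching the default admin
      ansible.builtin.command:
        cmd: >-
          ssh -o BatchMode=yes -o StrictHostKeyChecking=no -i {{ operator_key }}
          {{ mikrotik_admin_user }}@{{ ansible_host | default(inventory_hostname) }}
          "/system identity print"
      delegate_to: localhost
      changed_when: false

    - name: Disable the default admin
      community.routeros.command:
        commands:
          - /user disable admin
```

Because the proof task fails the play on any key problem, the default admin is only disabled once the named user can demonstrably log in without a password.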

Day-2 (normal)

play_switch.yml, key-only, applies all enabled domains idempotently. play_backup.yml exports config + binary backup into backups/<switch>/.
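Day-2 usage could look like the two sketches below, shown as two YAML documents for brevity. play_switch.yml simply applies the role. play_backup.yml writes a text export and a binary backup on the switch, then pulls both down with ansible.netcommon.net_get. The file names, SFTP as the transfer protocol, and playbooks at the repo top level are all assumptions.

```yaml
---
# play_switch.yml (sketch): key-only, applies every enabled domain.
- name: Apply CRS310 configuration
  hosts: mikrotik
  gather_facts: false
  roles:
    - makerfloss.mikrotik_switch
---
# play_backup.yml (sketch): export + binary backup, fetched into backups/<switch>/.
- name: Back up CRS310 configuration
  hosts: mikrotik
  gather_facts: false
  vars:
    backup_dir: "{{ playbook_dir }}/backups/{{ inventory_hostname }}"
  tasks:
    - name: Ensure the local backup directory exists
      ansible.builtin.file:
        path: "{{ backup_dir }}"
        state: directory
        mode: "0755"
      delegate_to: localhost

    - name: Write backup files on the switch
      community.routeros.command:
        commands:
          # /export file=NAME writes NAME.rsc; /system backup save name=NAME writes NAME.backup
          - "/export file={{ inventory_hostname }}"
          - "/system backup save name={{ inventory_hostname }}"
      changed_when: false

    - name: Fetch both files into the repo
      ansible.netcommon.net_get:
        src: "{{ item }}"
        dest: "{{ backup_dir }}/{{ item }}"
        protocol: sftp
      loop:
        - "{{ inventory_hostname }}.rsc"
        - "{{ inventory_hostname }}.backup"
```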

Secrets

Vault identity makerfloss (~/.ansible/vault-keys/makerfloss.txt), referenced in ansible.cfg. Initial contents: the switch admin password. SSH key auth means day-2 runs need no secret at runtime. (Vault-less start is possible but we create the identity up front.)

Success criteria

  • play_bootstrap.yml takes a factory-reset switch to key-based SSH access.
  • play_switch.yml applies identity + services + users + a placeholder VLAN/port topology, and is idempotent (second run reports no changes).
  • play_backup.yml writes a usable /export and .backup into the repo.
  • All real switch values live in host_vars; the role contains no makerspace specifics.
  • ansible-lint and yamllint pass.

Open items to confirm during planning

  • Exact RouterOS firmware_target version to pin.
  • Whether winbox service stays enabled (convenience) or is disabled (hardening).
  • Named admin username (e.g. sjat vs a service account like ansible).
  • Backup scheduling: Ansible-run on demand vs a RouterOS scheduler + fetch.

Reference

  • narrowin/ansible-mikrotik (GitHub) — playbook-centric; mine its backup/upgrade command sequences. Not used as a dependency.
  • community.routeros Ansible collection.
  • AnsibleBaobabV4 — conventions source (direnv, ansible.cfg, lint, inventory cascade, enable-flag role idiom).