MakerFLOSS_Mikrotik/docs/superpowers/plans/2026-06-07-mikrotik-crs310-ansible.md
sjat 0721ecc34c docs(plan): carry-over notes from skeleton code review
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 08:38:23 +02:00

# MakerFLOSS_Mikrotik — CRS310 Ansible Management — Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Build a fresh Ansible repo that configures the makerspace MikroTik CRS310-8G+2S+IN switch over SSH (identity, services, users/keys, VLANs, backups, firmware), idempotently and version-controlled.
**Architecture:** One role `makerfloss.mikrotik_switch` with per-domain task files gated by enable-flags, driven by `community.routeros` over `network_cli`. Real values live in `host_vars`; connection vars in `group_vars/mikrotik.yml`; mechanism + placeholders in role defaults. A one-time `play_bootstrap.yml` (password auth) imports the operator SSH key; day-2 `play_switch.yml` runs key-only.
**Tech Stack:** Ansible 10.x / ansible-core 2.17, `community.routeros` (+ `ansible.netcommon`), RouterOS 7.x, SSH key auth, ansible-vault (`makerfloss` identity), ansible-lint + yamllint.
**Working directory:** `~/Projects/MakerFLOSS_Mikrotik` (git repo already initialised, `main`, with the design spec committed).
**Reference spec:** `docs/superpowers/specs/2026-06-07-mikrotik-crs310-ansible-design.md`
---
## How verification works in this plan
There is no unit-test framework for RouterOS config. "Tests" in this plan are:
- **Static:** `yamllint .`, `ansible-lint`, `ansible-playbook --syntax-check`.
- **Connectivity:** `ansible -m community.routeros.command -a "commands='/system/resource/print'" <switch>`.
- **Idempotency:** run the play twice; the **second run must report `changed=0`** for that domain.
Device-dependent tasks (Phase 4 onward) require the switch to be prepared and reachable
(Phase 0). Until then, only static checks pass — that is expected and fine.
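
The run-twice idempotency check can be scripted rather than read by eye. The sketch below assumes the usual `PLAY RECAP` line format (`host : ok=N changed=N unreachable=N failed=N ...`); `recap_is_idempotent` is a hypothetical helper name, not part of Ansible.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read ansible-playbook output on stdin and succeed only
# if every PLAY RECAP host line reports changed=0, unreachable=0 and failed=0.
recap_is_idempotent() {
  awk '
    /^PLAY RECAP/ { in_recap = 1; next }
    in_recap && /ok=[0-9]+/ {
      hosts++
      if ($0 !~ /changed=0([^0-9]|$)/)     bad++
      if ($0 !~ /unreachable=0([^0-9]|$)/) bad++
      if ($0 !~ /failed=0([^0-9]|$)/)      bad++
    }
    END { if (hosts > 0 && bad == 0) exit 0; exit 1 }
  '
}

# Usage on the SECOND run of a domain:
#   ansible-playbook play_switch.yml --tags identity --limit crs310-maker \
#     | recap_is_idempotent && echo "idempotent"
```

The helper fails when no recap line is found at all, so a play that aborts before the recap is not mistaken for a clean run.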
> ⚠️ **Lockout safety:** When applying VLAN/bridge or service changes, keep an independent
> recovery channel open (WinBox MAC-telnet, or serial console) so a mistake in management
> reachability doesn't strand the switch. Enable `vlan-filtering` **last**, after the
> management path is proven.
---
## Phase 0: Prerequisites (manual, out-of-band — do before Phase 4)
These are not code tasks; they gate the device-dependent phases.
- [ ] **0.1 Create the empty repo** `MakerFLOSS_Mikrotik` on `forgejo.makerfloss.eu` (no README/license, so the first push isn't rejected).
- [ ] **0.2 Confirm boot OS is RouterOS** (not SwOS) on the CRS310. Switch in RouterBOOT if needed.
- [ ] **0.3 Upgrade + pin firmware:** update RouterOS and RouterBOOT to a known-good stable (e.g. latest 7.x stable). Record the exact version string for `firmware_target` (used in Task 9).
- [ ] **0.4 Factory-reset to NO default configuration** (`/system/reset-configuration no-defaults=yes` or Netinstall). Ansible will own the whole config.
- [ ] **0.5 First contact:** connect a laptop, reach the switch (default has no IP after no-defaults reset — assign a temporary IP via WinBox MAC session, e.g. `/ip/address/add address=192.168.88.1/24 interface=ether1`, and `/ip/service/enable ssh`). Confirm `ssh admin@<ip>` works.
- [ ] **0.6 Record identity facts** into a scratch note for Task 3 host_vars: serial, base MAC, model, RouterOS version, the temporary mgmt IP/port.
- [ ] **0.7 Physical:** fit the SFP+ module/DAC for the 10G uplink; confirm PSU/mounting.
---
## Phase 1: Repo scaffolding (no device required)
### Task 1: direnv, ansible.cfg, lint configs, requirements
**Files:**
- Create: `.envrc`, `ansible.cfg`, `.ansible-lint`, `.yamllint`, `requirements.txt`, `requirements.yml`, `.gitignore`
- [ ] **Step 1: Create `.gitignore`**
```gitignore
.venv/
*.retry
__pycache__/
*.pyc
.DS_Store
```
- [ ] **Step 2: Create `.envrc`** (verbatim from AnsibleBaobabV4)
```bash
# Create .venv automatically if it doesn't exist
if [ ! -d .venv ]; then
  python3 -m venv .venv
  .venv/bin/python -m pip install -U pip setuptools wheel
fi
# Activate the environment manually (avoids Python 3.13 deprecation warning)
export VIRTUAL_ENV=$PWD/.venv
PATH_add .venv/bin
```
- [ ] **Step 3: Create `ansible.cfg`**
```ini
[defaults]
inventory = inventories/prod/hosts.yml
roles_path = roles:~/.ansible/roles
collections_path = ~/.ansible/collections
host_key_checking = False
retry_files_enabled = False
interpreter_python = auto_silent
nocows = 1
timeout = 30
stdout_callback = yaml
bin_ansible_callbacks = True
vault_identity_list = makerfloss@~/.ansible/vault-keys/makerfloss.txt
[persistent_connection]
command_timeout = 60
connect_timeout = 60
[ssh_connection]
pipelining = True
```
- [ ] **Step 4: Create `requirements.txt`**
```text
# Core Ansible
ansible==10.3.0
# Linting & validation
ansible-lint==24.7.0
yamllint==1.35.1
# Network connection plugins / SCP for SSH key transfer to RouterOS
paramiko>=3.4.0
scp>=0.15.0
```
- [ ] **Step 5: Create `requirements.yml`**
```yaml
---
collections:
  - name: community.routeros
    version: ">=3.0.0,<4.0.0"
  - name: ansible.netcommon
    version: ">=6.0.0,<8.0.0"
```
- [ ] **Step 6: Create `.ansible-lint`**
```yaml
---
profile: production
skip_list:
  - line-length
  - no-changed-when
exclude_paths:
  - .venv/
  - backups/
```
- [ ] **Step 7: Create `.yamllint`**
```yaml
---
extends: default
rules:
  line-length: disable
  comments:
    min-spaces-from-content: 1
  truthy:
    allowed-values: ["true", "false", "yes", "no"]
ignore: |
  .venv/
  backups/
```
- [ ] **Step 8: Bootstrap the venv and install**
Run:
```bash
cd ~/Projects/MakerFLOSS_Mikrotik
direnv allow 2>/dev/null || python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
ansible-galaxy collection install -r requirements.yml
```
Expected: ansible, ansible-lint, yamllint installed; `community.routeros` + `ansible.netcommon` installed.
- [ ] **Step 9: Verify yamllint passes on the config files**
Run: `yamllint .`
Expected: no errors.
- [ ] **Step 10: Commit**
```bash
git add .gitignore .envrc ansible.cfg requirements.txt requirements.yml .ansible-lint .yamllint
git commit -m "chore: repo scaffolding (direnv, ansible.cfg, lint, requirements)"
```
---
### Task 2: Vault identity + inventory + connection group_vars
**Files:**
- Create: `~/.ansible/vault-keys/makerfloss.txt` (outside repo), `inventories/prod/hosts.yml`, `group_vars/mikrotik.yml`, `group_vars/all.yml`
- [ ] **Step 1: Create the vault key (outside the repo)**
Run:
```bash
mkdir -p ~/.ansible/vault-keys
( umask 077; openssl rand -base64 48 > ~/.ansible/vault-keys/makerfloss.txt )
chmod 600 ~/.ansible/vault-keys/makerfloss.txt
```
Expected: a 600-perm key file exists. (This is the `makerfloss` identity from `ansible.cfg`.)
- [ ] **Step 2: Create `inventories/prod/hosts.yml`**
> Replace `crs310-maker` and the `ansible_host` IP with the real values from Phase 0.6.
```yaml
---
all:
  children:
    mikrotik:
      hosts:
        crs310-maker:
          ansible_host: 192.168.88.1 # temp mgmt IP until Task 4 sets the real one
```
- [ ] **Step 3: Create `group_vars/mikrotik.yml`** (connection/platform vars)
```yaml
---
ansible_connection: ansible.netcommon.network_cli
ansible_network_os: community.routeros.routeros
ansible_user: admin
ansible_ssh_private_key_file: "~/.ssh/id_ed25519"
# Domain enable-flags (day-2 play). Override per-host if needed.
switch_identity_enabled: true
switch_users_enabled: true
switch_vlans_enabled: true
switch_backup_enabled: true
switch_firmware_enabled: false # opt-in; upgrades are disruptive
```
- [ ] **Step 4: Create `group_vars/all.yml`** (placeholder for shared, non-secret defaults)
```yaml
---
# Shared non-secret defaults across all hosts go here.
# Secrets live in the vault (see host_vars / a vaulted file), not in this file.
org_name: "MakerFLOSS"
```
- [ ] **Step 5: Verify inventory parses**
Run: `ansible-inventory --graph`
Expected: shows the `@mikrotik` group with `crs310-maker` listed beneath it.
- [ ] **Step 6: Commit**
```bash
git add inventories group_vars
git commit -m "feat: inventory, connection group_vars, makerfloss vault identity"
```
---
## Phase 2: Role skeleton
### Task 3: Role skeleton + host_vars + meta
**Files:**
- Create: `roles/makerfloss.mikrotik_switch/defaults/main.yml`
- Create: `roles/makerfloss.mikrotik_switch/meta/main.yml`
- Create: `roles/makerfloss.mikrotik_switch/tasks/main.yml`
- Create: `host_vars/crs310-maker.yml`
- Create: `play_switch.yml`
- [ ] **Step 1: Create `roles/makerfloss.mikrotik_switch/meta/main.yml`**
```yaml
---
galaxy_info:
  role_name: mikrotik_switch
  namespace: makerfloss
  author: sjat
  description: Configure a MikroTik RouterOS switch (CRS310) over SSH.
  license: MIT
  min_ansible_version: "2.17"
  platforms: []
dependencies: []
```
- [ ] **Step 2: Create `roles/makerfloss.mikrotik_switch/defaults/main.yml`** (mechanism + PLACEHOLDER topology)
```yaml
---
# ----- Identity / management -----
switch_identity_name: "{{ inventory_hostname }}"
switch_mgmt_vlan_id: 99
switch_mgmt_address: "192.168.88.1/24" # PLACEHOLDER — override in host_vars
switch_mgmt_gateway: "192.168.88.254" # PLACEHOLDER — override in host_vars
switch_dns_servers: "192.168.88.254"
switch_ntp_servers: "192.168.88.254"
# Services to disable for hardening (winbox kept on by default for recovery)
switch_disabled_services:
  - telnet
  - ftp
  - www
  - www-ssl
  - api
  - api-ssl
switch_ssh_port: 22
# ----- Users -----
switch_admin_user: "sjat"
switch_admin_group: "full"
switch_admin_ssh_pubkey_file: "~/.ssh/id_ed25519.pub"
switch_disable_default_admin: true
# ----- VLAN / bridge / ports (PLACEHOLDER example) -----
# Real topology is defined in host_vars/<switch>.yml.
switch_bridge_name: "bridge"
switch_vlans:
  - { id: 99, name: "mgmt" }
  - { id: 10, name: "members" }
switch_bridge_ports:
  # ether1..ether8 = 2.5GbE access ports; sfp-sfpplus1/2 = 10G uplinks
  - { interface: "ether1", pvid: 10, mode: access }
  - { interface: "sfp-sfpplus1", pvid: 1, mode: trunk, tagged_vlans: [99, 10] }
# ----- Firmware -----
switch_firmware_target: "" # set in host_vars when opting into upgrades
```
- [ ] **Step 3: Create `roles/makerfloss.mikrotik_switch/tasks/main.yml`** (domain dispatch)
```yaml
---
- name: Identity, management and services
  ansible.builtin.import_tasks: identity.yml
  when: switch_identity_enabled | bool
  tags: [identity]

- name: Users and SSH keys
  ansible.builtin.import_tasks: users.yml
  when: switch_users_enabled | bool
  tags: [users]

- name: VLANs, bridge and ports
  ansible.builtin.import_tasks: vlans.yml
  when: switch_vlans_enabled | bool
  tags: [vlans]

- name: Backup configuration
  ansible.builtin.import_tasks: backup.yml
  when: switch_backup_enabled | bool
  tags: [backup]

- name: Firmware upgrade
  ansible.builtin.import_tasks: firmware.yml
  when: switch_firmware_enabled | bool
  tags: [firmware]
```
- [ ] **Step 4: Create stub `identity.yml`, `users.yml`, `vlans.yml`, `backup.yml`, `firmware.yml`**
Each stub (replaced in later tasks) is just:
```yaml
---
- name: Placeholder
  ansible.builtin.debug:
    msg: "not yet implemented"
```
Create all five files in `roles/makerfloss.mikrotik_switch/tasks/` with that content.
- [ ] **Step 5: Create `host_vars/crs310-maker.yml`** (REAL values from Phase 0.6)
```yaml
---
# Identity facts recorded during Phase 0.6 (edit to match the device)
switch_identity_name: "crs310-maker"
switch_mgmt_vlan_id: 99
switch_mgmt_address: "10.0.99.2/24" # EDIT: real mgmt IP
switch_mgmt_gateway: "10.0.99.1" # EDIT: real gateway
switch_dns_servers: "10.0.99.1"
switch_ntp_servers: "10.0.99.1"
switch_admin_user: "sjat"
# Real VLAN/port topology (EDIT to the makerspace plan when known)
switch_vlans:
  - { id: 99, name: "mgmt" }
  - { id: 10, name: "members" }
switch_bridge_ports:
  - { interface: "ether1", pvid: 10, mode: access }
  - { interface: "ether2", pvid: 10, mode: access }
  - { interface: "sfp-sfpplus1", pvid: 1, mode: trunk, tagged_vlans: [99, 10] }
# Firmware (opt-in)
# switch_firmware_enabled: true
# switch_firmware_target: "7.x.y" # EDIT to the version pinned in Phase 0.3
```
- [ ] **Step 6: Create `play_switch.yml`**
```yaml
---
- name: Configure MikroTik switches (day-2, key auth)
  hosts: mikrotik
  gather_facts: false
  roles:
    - makerfloss.mikrotik_switch
```
- [ ] **Step 7: Verify syntax + lint**
Run:
```bash
ansible-playbook play_switch.yml --syntax-check
yamllint .
ansible-lint
```
Expected: syntax OK; yamllint clean; ansible-lint clean (fix any findings).
- [ ] **Step 8: Commit**
```bash
git add roles host_vars play_switch.yml
git commit -m "feat: role skeleton, host_vars, day-2 play (stubbed domains)"
```
---
## Phase 3: Bootstrap play (device-dependent — needs Phase 0 done)
### Task 4: First-contact bootstrap — create user, import SSH key
**Files:**
- Create: `play_bootstrap.yml`
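- Create: `group_vars/mikrotik.vault.yml` (vaulted admin password). Ansible matches `group_vars` files to groups by base name, so confirm that `vault_switch_admin_password` actually resolves for `crs310-maker` (for example with `ansible-inventory --host crs310-maker`) before running the bootstrap play.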
- [ ] **Step 1: Create the vaulted admin password file**
Run:
```bash
ansible-vault create group_vars/mikrotik.vault.yml
```
Put in it:
```yaml
---
vault_switch_admin_password: "CHOOSE-A-STRONG-PASSWORD"
```
Expected: file is encrypted (`head -1` shows `$ANSIBLE_VAULT`).
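
The `head -1` check can be wrapped in a small guard to run before every commit that touches the vault file. This is a minimal sketch; `is_vault_encrypted` is a hypothetical helper name, and it only inspects the header line, it does not try to decrypt anything.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the file starts with the ansible-vault
# header ("$ANSIBLE_VAULT;<version>;<cipher>[;<vault-id>]").
is_vault_encrypted() {
  local first_line
  first_line=$(head -n 1 "$1" 2>/dev/null) || return 1
  case "$first_line" in
    '$ANSIBLE_VAULT;'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   is_vault_encrypted group_vars/mikrotik.vault.yml || echo "NOT encrypted, do not commit" >&2
```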
- [ ] **Step 2: Create `play_bootstrap.yml`**
> Run with password auth the first time: `ansible-playbook play_bootstrap.yml -e ansible_user=admin --ask-pass`.
> `net_put` copies the public key file to the device over SCP, then RouterOS imports it.
```yaml
---
- name: Bootstrap MikroTik switch (first contact, password auth)
  hosts: mikrotik
  gather_facts: false
  vars:
    pubkey_local: "{{ switch_admin_ssh_pubkey_file | default('~/.ssh/id_ed25519.pub') }}"
    pubkey_remote: "id_ansible.pub"
  tasks:
    - name: Create named admin user (idempotent)
      community.routeros.command:
        commands:
          - >-
            :if ([:len [/user find name="{{ switch_admin_user }}"]] = 0) do={
            /user add name="{{ switch_admin_user }}" group="{{ switch_admin_group }}"
            password="{{ vault_switch_admin_password }}" }
      register: user_create
      changed_when: true

    - name: Copy operator public key to the switch
      ansible.netcommon.net_put:
        src: "{{ pubkey_local }}"
        dest: "{{ pubkey_remote }}"

    - name: Import the SSH public key for the admin user
      community.routeros.command:
        commands:
          - /user/ssh-keys/import public-key-file="{{ pubkey_remote }}" user="{{ switch_admin_user }}"
      register: key_import
      changed_when: true

    - name: Ensure SSH service is enabled
      community.routeros.command:
        commands:
          - /ip/service/set ssh disabled=no port={{ switch_ssh_port | default(22) }}
      changed_when: true
```
- [ ] **Step 3: Syntax check**
Run: `ansible-playbook play_bootstrap.yml --syntax-check`
Expected: OK.
- [ ] **Step 4: Run bootstrap against the switch (password auth)**
Run:
```bash
ansible-playbook play_bootstrap.yml -e ansible_user=admin --ask-pass
```
Expected: user created, key file copied, key imported, SSH enabled. (Keep a WinBox MAC session open per the lockout note.)
- [ ] **Step 5: Prove key login works**
Run:
```bash
ansible -m community.routeros.command -a "commands='/user/print'" crs310-maker
```
Expected: succeeds using `~/.ssh/id_ed25519` (no password prompt), and lists your named user.
- [ ] **Step 6: Commit**
```bash
git add play_bootstrap.yml group_vars/mikrotik.vault.yml
git commit -m "feat: first-contact bootstrap play (named admin + SSH key import)"
```
---
## Phase 4: Domain tasks (device-dependent — idempotency-verified)
> For every task below: run `ansible-playbook play_switch.yml --tags <domain> --limit crs310-maker`
> **twice**; the second run must report `changed=0` (or all `changed_when: false`).
> RouterOS `:if ([:len [... find ...]] = 0)` guards make `add` idempotent.
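
To make the guard pattern concrete, the sketch below assembles one guarded `add` command as a string, which is handy for trying a guard by hand over SSH before putting it in a task file. `ros_guard_add` is a hypothetical helper, and the command it prints should be checked against the pinned RouterOS version before use.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a RouterOS one-liner that runs "<menu> add <args>"
# only when "<menu> find <match>" returns no entries.
#   $1 = menu path (e.g. /user)
#   $2 = find criteria (e.g. name="sjat")
#   $3 = arguments for add
ros_guard_add() {
  printf ':if ([:len [%s find %s]] = 0) do={ %s add %s }\n' "$1" "$2" "$1" "$3"
}

# Usage:
#   ros_guard_add /user 'name="sjat"' 'name="sjat" group="full"'
```

Because the `add` only runs when `find` returns nothing, sending the same line a second time leaves the device unchanged.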
### Task 5: identity.yml — identity, mgmt IP, DNS/NTP, service hardening
**Files:**
- Modify: `roles/makerfloss.mikrotik_switch/tasks/identity.yml`
- [ ] **Step 1: Replace `identity.yml` with the real implementation**
```yaml
---
- name: Set system identity
  community.routeros.command:
    commands:
      - /system/identity/set name="{{ switch_identity_name }}"
  changed_when: false

- name: Configure DNS
  community.routeros.command:
    commands:
      - /ip/dns/set servers="{{ switch_dns_servers }}" allow-remote-requests=no
  changed_when: false

- name: Configure NTP client
  community.routeros.command:
    commands:
      - /system/ntp/client/set enabled=yes servers="{{ switch_ntp_servers }}"
  changed_when: false

- name: Disable unused services
  community.routeros.command:
    commands: >-
      {{ switch_disabled_services
      | map('regex_replace', '^(.*)$', '/ip/service/set \1 disabled=yes')
      | list }}
  changed_when: false

- name: Set SSH service port
  community.routeros.command:
    commands:
      - /ip/service/set ssh disabled=no port={{ switch_ssh_port }}
  changed_when: false
```
- [ ] **Step 2: Run twice, assert idempotent**
Run (twice):
```bash
ansible-playbook play_switch.yml --tags identity --limit crs310-maker
```
Expected: completes cleanly both runs; `/system/identity/print` shows the new name; disabled services show `X` (disabled).
- [ ] **Step 3: Commit**
```bash
git add roles/makerfloss.mikrotik_switch/tasks/identity.yml
git commit -m "feat(identity): identity, DNS, NTP, service hardening"
```
---
### Task 6: users.yml — ensure admin user, key, disable default admin
**Files:**
- Modify: `roles/makerfloss.mikrotik_switch/tasks/users.yml`
- [ ] **Step 1: Replace `users.yml`**
```yaml
---
- name: Ensure named admin user exists
  community.routeros.command:
    commands:
      - >-
        :if ([:len [/user find name="{{ switch_admin_user }}"]] = 0) do={
        /user add name="{{ switch_admin_user }}" group="{{ switch_admin_group }}" }
  changed_when: false

- name: Disable the default admin user
  community.routeros.command:
    commands:
      - >-
        :if ([:len [/user find name="admin"]] > 0) do={
        /user/set admin disabled=yes }
  when: switch_disable_default_admin | bool
  changed_when: false
```
- [ ] **Step 2: Run twice, assert idempotent**
Run (twice):
```bash
ansible-playbook play_switch.yml --tags users --limit crs310-maker
```
Expected: clean both runs. **Before this lands, confirm key login as the named user works** (Task 4 Step 5), or disabling `admin` could lock you out.
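
The key-login confirmation can be made a hard preflight instead of a manual step. The sketch below assumes OpenSSH on the control machine; `can_login_with_key` and the `SSH_BIN` override (used only to swap in a stand-in binary for dry runs) are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical preflight: succeed only if key-based SSH login works.
# BatchMode and PasswordAuthentication=no stop ssh from falling back to a
# password prompt, so a missing or unimported key fails fast.
#   $1 = user, $2 = host, $3 = port (defaults to 22)
can_login_with_key() {
  "${SSH_BIN:-ssh}" -o BatchMode=yes -o PasswordAuthentication=no \
    -o ConnectTimeout=10 -p "${3:-22}" "$1@$2" '/system/identity/print' \
    >/dev/null 2>&1
}

# Usage, before running the users domain:
#   can_login_with_key sjat 10.0.99.2 \
#     && ansible-playbook play_switch.yml --tags users --limit crs310-maker
```

Chaining the play behind the preflight with `&&` means the default `admin` account is only disabled after the named user has demonstrably logged in with a key.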
- [ ] **Step 3: Commit**
```bash
git add roles/makerfloss.mikrotik_switch/tasks/users.yml
git commit -m "feat(users): ensure named admin, disable default admin"
```
---
### Task 7: vlans.yml — VLAN-aware bridge, ports, mgmt interface
**Files:**
- Modify: `roles/makerfloss.mikrotik_switch/tasks/vlans.yml`
> Ordering matters to avoid lockout: create bridge (filtering OFF) → add ports → define
> VLANs → add mgmt VLAN interface + IP → enable `vlan-filtering` LAST.
- [ ] **Step 1: Replace `vlans.yml`**
```yaml
---
- name: Create VLAN-aware bridge (filtering off initially)
  community.routeros.command:
    commands:
      - >-
        :if ([:len [/interface/bridge/find name="{{ switch_bridge_name }}"]] = 0) do={
        /interface/bridge/add name="{{ switch_bridge_name }}" vlan-filtering=no }
  changed_when: false

- name: Add bridge ports with PVIDs
  community.routeros.command:
    commands:
      - >-
        :if ([:len [/interface/bridge/port/find interface="{{ item.interface }}"]] = 0) do={
        /interface/bridge/port/add bridge="{{ switch_bridge_name }}"
        interface="{{ item.interface }}" pvid={{ item.pvid }} }
        else={ /interface/bridge/port/set
        [find interface="{{ item.interface }}"] pvid={{ item.pvid }} }
  loop: "{{ switch_bridge_ports }}"
  loop_control:
    label: "{{ item.interface }}"
  changed_when: false

- name: Define bridge VLANs (tagged/untagged membership)
  community.routeros.command:
    commands:
      - >-
        :if ([:len [/interface/bridge/vlan/find vlan-ids={{ item.id }}]] = 0) do={
        /interface/bridge/vlan/add bridge="{{ switch_bridge_name }}" vlan-ids={{ item.id }}
        tagged="{{ ([switch_bridge_name] + (switch_bridge_ports
        | selectattr('mode','equalto','trunk')
        | selectattr('tagged_vlans','defined')
        | selectattr('tagged_vlans','contains', item.id)
        | map(attribute='interface') | list)) | join(',') }}"
        untagged="{{ switch_bridge_ports
        | selectattr('mode','equalto','access')
        | selectattr('pvid','equalto', item.id)
        | map(attribute='interface') | list | join(',') }}" }
  loop: "{{ switch_vlans }}"
  loop_control:
    label: "vlan {{ item.id }}"
  changed_when: false

- name: Create management VLAN interface
  community.routeros.command:
    commands:
      - >-
        :if ([:len [/interface/vlan/find name="vlan-mgmt"]] = 0) do={
        /interface/vlan/add name="vlan-mgmt" interface="{{ switch_bridge_name }}"
        vlan-id={{ switch_mgmt_vlan_id }} }
  changed_when: false

- name: Assign management IP address
  community.routeros.command:
    commands:
      - >-
        :if ([:len [/ip/address/find interface="vlan-mgmt"]] = 0) do={
        /ip/address/add address="{{ switch_mgmt_address }}" interface="vlan-mgmt" }
  changed_when: false

- name: Set default gateway route
  community.routeros.command:
    commands:
      - >-
        :if ([:len [/ip/route/find dst-address="0.0.0.0/0"]] = 0) do={
        /ip/route/add dst-address=0.0.0.0/0 gateway="{{ switch_mgmt_gateway }}" }
  changed_when: false

- name: Enable VLAN filtering (LAST — verify mgmt reachability first)
  community.routeros.command:
    commands:
      - /interface/bridge/set "{{ switch_bridge_name }}" vlan-filtering=yes
  changed_when: false
```
- [ ] **Step 2: Run twice, assert idempotent — WITH a recovery channel open**
Run (twice), keeping WinBox MAC session open:
```bash
ansible-playbook play_switch.yml --tags vlans --limit crs310-maker
```
Expected: clean both runs. Verify `/interface/bridge/vlan/print` shows correct tagged/untagged sets and you can still reach the mgmt IP after `vlan-filtering=yes`.
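
To know what "correct tagged/untagged sets" should look like before comparing against the device, the membership rule used in `vlans.yml` can be worked through offline. The sketch below mirrors that rule under an assumed input format (one port per line: `interface pvid mode tagged_vlans`, with `-` when a port has no tagged list); `vlan_members` is a hypothetical helper, not something the role provides.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the expected membership for one VLAN.
# Rule, as in vlans.yml: the bridge itself plus every trunk port listing the
# VLAN is tagged; every access port whose PVID equals the VLAN is untagged.
#   $1 = VLAN id, $2 = bridge name; port lines are read from stdin.
vlan_members() {
  awk -v vid="$1" -v bridge="$2" '
    {
      iface = $1; pvid = $2; mode = $3; tagged_list = $4
      if (mode == "trunk") {
        n = split(tagged_list, ids, ",")
        for (i = 1; i <= n; i++)
          if (ids[i] == vid) tagged = tagged "," iface
      } else if (mode == "access" && pvid == vid) {
        untagged = (untagged == "" ? iface : untagged "," iface)
      }
    }
    END { printf "tagged=%s%s untagged=%s\n", bridge, tagged, untagged }
  '
}

# Usage with the host_vars example topology:
#   printf '%s\n' 'ether1 10 access -' 'ether2 10 access -' \
#     'sfp-sfpplus1 1 trunk 99,10' | vlan_members 10 bridge
```

For the example topology, VLAN 10 comes out with the bridge and `sfp-sfpplus1` tagged and `ether1,ether2` untagged, while VLAN 99 has the same tagged set and no untagged ports.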
- [ ] **Step 3: Commit**
```bash
git add roles/makerfloss.mikrotik_switch/tasks/vlans.yml
git commit -m "feat(vlans): VLAN-aware bridge, ports, mgmt interface"
```
---
### Task 8: backup.yml — export + binary backup, fetch into repo
**Files:**
- Modify: `roles/makerfloss.mikrotik_switch/tasks/backup.yml`
- Create: `play_backup.yml`
- Create: `backups/.gitkeep`
- [ ] **Step 1: Replace `backup.yml`**
```yaml
---
- name: Generate a config export on the device
  community.routeros.command:
    commands:
      - /export file=export
  changed_when: false

- name: Generate a binary system backup on the device
  community.routeros.command:
    commands:
      - /system/backup/save name=backup dont-encrypt=yes
  changed_when: false

- name: Fetch the export file into the repo
  ansible.netcommon.net_get:
    src: "export.rsc"
    dest: "{{ playbook_dir }}/backups/{{ inventory_hostname }}/export.rsc"

- name: Fetch the binary backup into the repo
  ansible.netcommon.net_get:
    src: "backup.backup"
    dest: "{{ playbook_dir }}/backups/{{ inventory_hostname }}/backup.backup"
```
- [ ] **Step 2: Create `play_backup.yml`**
```yaml
---
- name: Back up MikroTik switch configuration
  hosts: mikrotik
  gather_facts: false
  tasks:
    - name: Ensure local backup directory exists
      ansible.builtin.file:
        path: "{{ playbook_dir }}/backups/{{ inventory_hostname }}"
        state: directory
        mode: "0755"
      delegate_to: localhost

    - name: Run backup tasks
      ansible.builtin.include_role:
        name: makerfloss.mikrotik_switch
        tasks_from: backup.yml
```
- [ ] **Step 3: Create `backups/.gitkeep`** (empty file) so the dir exists.
- [ ] **Step 4: Run the backup play**
Run:
```bash
ansible-playbook play_backup.yml --limit crs310-maker
```
Expected: `backups/crs310-maker/export.rsc` and `backup.backup` appear locally and are non-empty.
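
The "appear locally and are non-empty" check is easy to script. A minimal sketch, with `backup_files_present` as a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if both fetched files exist and are
# non-empty in the given per-host backup directory.
backup_files_present() {
  local dir=$1 f
  for f in export.rsc backup.backup; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      return 1
    fi
  done
}

# Usage:
#   backup_files_present backups/crs310-maker && echo "backup OK"
```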
- [ ] **Step 5: Commit (export only — binary backup may contain secrets)**
```bash
echo 'backups/**/*.backup' >> .gitignore
git add roles/makerfloss.mikrotik_switch/tasks/backup.yml play_backup.yml backups/crs310-maker/export.rsc .gitignore
git commit -m "feat(backup): export + binary backup, fetch into repo"
```
---
### Task 9: firmware.yml — RouterOS/RouterBOOT upgrade to pinned target
**Files:**
- Modify: `roles/makerfloss.mikrotik_switch/tasks/firmware.yml`
> Opt-in only (`switch_firmware_enabled: true` + `switch_firmware_target` set in host_vars).
> Upgrades reboot the switch — run deliberately, with a recovery channel.
- [ ] **Step 1: Replace `firmware.yml`**
```yaml
---
- name: Assert a firmware target is set
  ansible.builtin.assert:
    that:
      - switch_firmware_target | length > 0
    fail_msg: "switch_firmware_target must be set in host_vars to run firmware upgrades."

- name: Read current RouterOS version
  community.routeros.facts:
  register: ros_facts

- name: Upgrade RouterOS to the target channel and reboot
  community.routeros.command:
    commands:
      - /system/package/update/set channel=stable
      - /system/package/update/install
  when: ansible_net_version is version(switch_firmware_target, '<')
  changed_when: true

- name: Pause for device reboot
  ansible.builtin.wait_for_connection:
    delay: 30
    timeout: 300
  when: ansible_net_version is version(switch_firmware_target, '<')

# The facts above were read before the upgrade, so the same condition gates the
# RouterBOOT steps: they run after a RouterOS upgrade and are skipped otherwise,
# which keeps routine runs from rebooting the switch.
- name: Upgrade RouterBOOT firmware to match RouterOS
  community.routeros.command:
    commands:
      - /system/routerboard/upgrade
  when: ansible_net_version is version(switch_firmware_target, '<')
  changed_when: true

- name: Reboot to apply RouterBOOT upgrade
  community.routeros.command:
    commands:
      - /system/reboot
  when: ansible_net_version is version(switch_firmware_target, '<')
  changed_when: true
  ignore_errors: true # connection drops on reboot; expected
```
- [ ] **Step 2: Syntax + lint only (do NOT auto-run upgrades in CI)**
Run:
```bash
ansible-playbook play_switch.yml --syntax-check
ansible-lint
```
Expected: clean.
- [ ] **Step 3: (Manual, optional) run the upgrade deliberately**
Run:
```bash
ansible-playbook play_switch.yml --tags firmware --limit crs310-maker \
-e switch_firmware_enabled=true
```
Expected: upgrades only if current `< switch_firmware_target`; switch reboots and comes back.
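
The "current `< switch_firmware_target`" comparison can be checked by hand before opting in. This sketch assumes a `sort` that supports `-V` (GNU coreutils and recent BSD/macOS) and plain numeric version strings; `version_lt` is a hypothetical helper, and anything after the first space (such as a channel suffix) is ignored.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if version $1 is strictly lower than version $2.
# Uses version sort so that 7.9 orders before 7.10.
version_lt() {
  local cur=${1%% *} tgt=${2%% *}
  [ "$cur" != "$tgt" ] &&
    [ "$(printf '%s\n%s\n' "$cur" "$tgt" | sort -V | head -n 1)" = "$cur" ]
}

# Usage:
#   version_lt "7.15.3" "7.16" && echo "upgrade would run"
```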
- [ ] **Step 4: Commit**
```bash
git add roles/makerfloss.mikrotik_switch/tasks/firmware.yml
git commit -m "feat(firmware): opt-in RouterOS + RouterBOOT upgrade to pinned target"
```
---
## Phase 5: Docs and publish
### Task 10: README, role README, CLAUDE.md, push to Forgejo
**Files:**
- Create: `README.md`, `roles/makerfloss.mikrotik_switch/README.md`, `CLAUDE.md`
- [ ] **Step 1: Create `README.md`** covering: purpose, prerequisites (Phase 0 checklist), setup (`direnv allow`, `pip install`, `ansible-galaxy install`), bootstrap (`play_bootstrap.yml --ask-pass`), day-2 (`play_switch.yml`), backup (`play_backup.yml`), and the lockout-safety note.
- [ ] **Step 2: Create `roles/makerfloss.mikrotik_switch/README.md`** documenting every variable in `defaults/main.yml`, the enable-flags, and the `switch_bridge_ports`/`switch_vlans` data shapes with an example.
- [ ] **Step 3: Create `CLAUDE.md`** — short project guide: tech stack, structure, essential commands (lint, syntax-check, bootstrap, day-2, backup), the idempotency rule, and the lockout-safety rule.
- [ ] **Step 4: Final static verification**
Run:
```bash
yamllint . && ansible-lint && ansible-playbook play_switch.yml --syntax-check
```
Expected: all clean.
- [ ] **Step 5: Add remote and push**
Run:
```bash
git remote add origin git@forgejo.makerfloss.eu:<owner>/MakerFLOSS_Mikrotik.git
git add README.md roles/makerfloss.mikrotik_switch/README.md CLAUDE.md
git commit -m "docs: README, role README, CLAUDE.md"
git push -u origin main
```
Expected: repo populated on `forgejo.makerfloss.eu`.
---
## Self-review checklist (run before execution)
- [ ] **Spec coverage:** identity/services (Task 5), users/keys (Tasks 4,6), VLANs/bridge/ports (Task 7), backups (Task 8), firmware (Task 9), bring-over conventions (Tasks 1–2), separate vault (Task 2), placeholder topology overridable in host_vars (Tasks 3,7). ✔
- [ ] **Open items** from the spec are surfaced in the plan: firmware target (Phase 0.3 / Task 9), winbox on/off (`switch_disabled_services` default keeps winbox), admin username (`switch_admin_user`), backup scheduling (on-demand `play_backup.yml`; RouterOS scheduler left as a future enhancement).
- [ ] **Idempotency** is explicitly tested (run-twice) on every device-touching task.
- [ ] **Lockout safety** called out at the top and on Tasks 6 and 7.
## Notes / risks to validate during execution
- **RouterOS version drift:** exact CLI syntax (NTP `servers=` property, `ssh-keys/import` path) is RouterOS-7 specific; verify against the pinned version from Phase 0.3 and adjust.
- **`net_put`/`net_get` over `network_cli`:** depends on SCP being available on the RouterOS SSH service; if it fails, fall back to importing the key by pasting its contents via `/user/ssh-keys/...` or enabling SCP.
- **`changed_when: false`** is used widely because the `command` module can't detect RouterOS state changes; idempotency comes from the `:if [find]` guards. Revisit if you want accurate change reporting (parse command output).
## Carry-over notes from the skeleton code review (Tasks 1–3, done 2026-06-07)
The no-device tasks (1–3) are implemented, reviewed, and committed on branch
`feat/initial-scaffolding`. The code-quality review of the role skeleton raised these
points to handle WHEN the device task files (Tasks 5–9) are written:
- **`switch_ssh_port` (default 22):** the identity task will *set* the SSH port. If the
device was manually moved to a non-standard port before Ansible manages it, the first
run resets it to 22 and the connection drops. Confirm the live port matches before the
identity task runs, or override `switch_ssh_port` in host_vars.
- **`switch_bridge_name` / `switch_admin_group`:** these default to the CRS310 factory
values (`bridge` / `full`) and are NOT overridden in host_vars. Correct for this one
device; if the bridge/group name ever differs, the VLAN and users tasks silently target
the wrong object. Add explicit host_vars overrides if a second device is ever onboarded.
- **Trunk `pvid: 1` (sfp-sfpplus1):** untagged frames on the uplink land in VLAN 1. In a
hardened VLAN design VLAN 1 is usually unused — when writing `vlans.yml`, decide
deliberately whether the trunk should accept untagged traffic at all, and comment intent.
- **host_vars `# EDIT:` placeholders:** `switch_mgmt_address/gateway/dns/ntp` in
`host_vars/crs310-maker.yml` hold plausible `10.0.99.x` placeholders. Replace with the
real values from the field guide (Step 7) and remove the `# EDIT` comments so it's
unambiguous they were updated.