Compare commits

No commits in common. "67554c0b38065dbdbeb82e4c51e38617f1c0979c" and "12001abac6ab99c6e5487c741486671966b23675" have entirely different histories.

14 changed files with 29 additions and 371 deletions

.gitignore

@@ -3,4 +3,3 @@
 __pycache__/
 *.pyc
 .DS_Store
-backups/**/*.backup

@@ -18,7 +18,7 @@ conventions this repo copies); independent repo on `forgejo.makerfloss.eu`.
 - `group_vars/mikrotik.vault.yml` — encrypted password (excluded from linters)
 - `host_vars/crs310-maker.yml` — device facts, real addressing, VLAN/port map
 - `roles/makerfloss.mikrotik_switch/` — one role, per-domain task files gated by flags
-- `play_switch.yml` (day-2), `play_bootstrap.yml` (first contact), `play_backup.yml`
+- `play_switch.yml` (day-2), `play_bootstrap.yml` / `play_backup.yml` (to implement)
 - `docs/` — field guide, design spec, implementation plan
 ## Essential commands
@@ -43,12 +43,7 @@ ansible-vault view group_vars/mikrotik.vault.yml # read a secret
 ## Status / next
-Bootstrap is done (user `sjat` + key + identity `crs310-maker`, RouterOS 7.19.6 pinned;
-default `admin` now disabled). All per-domain task files are **implemented**:
-`identity`, `users`, `backup`, `firmware` (opt-in) and `play_bootstrap` / `play_backup`
-are idempotency-verified against the device. `vlans` is implemented and Jinja-validated
-but its **device run is deferred** — the `host_vars` topology is still a placeholder.
-Next, on-site with a recovery channel: drop the real VLAN/port map into `host_vars`,
-reconcile the legacy defconf IP (`192.168.88.1/24` lives directly on `bridge`), then run
-`--tags vlans` and confirm mgmt reachability before/after `vlan-filtering=yes`.
+Bootstrap is done (user `sjat` + key + identity `crs310-maker`, RouterOS 7.19.6 pinned).
+The per-domain task files are **stubs**; implement them per
+`docs/superpowers/plans/2026-06-07-mikrotik-crs310-ansible.md` (Tasks 5–9), reading the
+"carry-over notes" at the end of that plan first.

@@ -12,13 +12,10 @@ rebuilt from this repo instead of by hand in WinBox.
 |---|---|
 | Repo scaffolding, role skeleton, vault | ✅ done |
 | On-site device prep + **bootstrap** (named user + SSH key + identity) | ✅ done (2026-06-08) |
-| `identity` / `users` / `backup` / `firmware` + `play_bootstrap` / `play_backup` | ✅ implemented; idempotency-verified against the device (firmware is opt-in, lint/syntax only) |
-| `vlans` (VLAN-aware bridge, ports, mgmt iface) | ✅ implemented + Jinja-validated; **device run deferred** — needs the real VLAN/port plan and an on-site recovery channel before `vlan-filtering` is enabled |
+| Day-2 config: `identity` / `users` / `vlans` / `backup` / `firmware` tasks | ⏳ **stubs** — to implement (see `docs/superpowers/plans/`) |
-The switch is reachable today by key auth as user `sjat`. All task files now carry their
-real RouterOS logic. The `vlans` topology in `host_vars` is still a **placeholder**:
-replace it with the real makerspace VLAN ids + per-port map before running `--tags vlans`
-on the live device, and do so on-site with a serial/WinBox-MAC recovery channel open.
+The switch is reachable today by key auth as user `sjat`; the per-domain task files
+still need their real RouterOS logic written and idempotency-tested.
 ## Layout
@@ -65,16 +62,15 @@ community.routeros.routeros`, authenticating with the operator SSH key
 yamllint . && ansible-lint && ansible-playbook play_switch.yml --syntax-check
 # First contact on a fresh/reset device (password auth, one time)
-ansible-playbook play_bootstrap.yml -e ansible_user=admin --ask-pass
+ansible-playbook play_bootstrap.yml --ask-pass # (play to be implemented)
 # Day-2 configuration (key auth, idempotent)
 ansible-playbook play_switch.yml
-ansible-playbook play_switch.yml --tags identity,users # safe domains
-ansible-playbook play_switch.yml --tags vlans # on-site only — see lockout note
+ansible-playbook play_switch.yml --tags vlans # one domain
 ansible-playbook play_switch.yml --limit crs310-maker
 # Backup config into the repo
-ansible-playbook play_backup.yml
+ansible-playbook play_backup.yml # (play to be implemented)
 ```
 ## ⚠️ Lockout safety

@@ -1,35 +0,0 @@
-# 2025-09-11 09:49:07 by RouterOS 7.19.6
-# software id = 73S3-5F2W
-#
-# model = CRS310-8G+2S+
-# serial number = HM40B8TDNDD
-/interface bridge
-add admin-mac=D0:EA:11:24:F4:AA auto-mac=no comment=defconf name=bridge
-/interface bridge port
-add bridge=bridge comment=defconf interface=ether1
-add bridge=bridge comment=defconf interface=ether2
-add bridge=bridge comment=defconf interface=ether3
-add bridge=bridge comment=defconf interface=ether4
-add bridge=bridge comment=defconf interface=ether5
-add bridge=bridge comment=defconf interface=ether6
-add bridge=bridge comment=defconf interface=ether7
-add bridge=bridge comment=defconf interface=ether8
-add bridge=bridge comment=defconf interface=sfp-sfpplus1
-add bridge=bridge comment=defconf interface=sfp-sfpplus2
-/ip address
-add address=192.168.88.1/24 comment=defconf interface=bridge network=\
-    192.168.88.0
-/ip dns
-set servers=10.0.99.1
-/ip service
-set ftp disabled=yes
-set telnet disabled=yes
-set www disabled=yes
-set api disabled=yes
-set api-ssl disabled=yes
-/system identity
-set name=crs310-maker
-/system ntp client
-set enabled=yes
-/system ntp client servers
-add address=10.0.99.1

@@ -10,10 +10,6 @@
 # group_vars/mikrotik.vault.yml). Key login verified. Default `admin` still enabled
 # (not yet hardened). Switch currently on the bench at 192.168.88.1 (defconf, not yet
 # reset/VLAN-configured). Real mgmt addressing below is the FUTURE production plan.
-# Day-2 connection: key auth as the named admin user (overrides the bootstrap
-# default ansible_user=admin in group_vars/mikrotik.yml).
-ansible_user: sjat
 switch_identity_name: "crs310-maker"
 switch_mgmt_vlan_id: 99
 switch_mgmt_address: "10.0.99.2/24" # EDIT: real mgmt IP
@@ -23,15 +19,7 @@ switch_ntp_servers: "10.0.99.1"
 switch_admin_user: "sjat"
-# PLACEHOLDER VLAN/port topology — vlans.yml is correct mechanism, but these IDs
-# and the per-port map are NOT the real makerspace plan. Replace with the real
-# VLAN ids + full ether1-8/sfp map before any on-site VLAN run. Notes:
-# - mode: access -> untagged member of `pvid`; mode: trunk -> tagged member of
-# each id in `tagged_vlans`, with `pvid` as the native (untagged) VLAN.
-# - trunk pvid: 1 means untagged frames on the uplink land in VLAN 1 (unused in a
-# hardened design). Decide deliberately whether the uplink should carry any
-# untagged traffic; set pvid to an intended native VLAN or leave 1 as a dead end.
-# - the bridge (CPU) is tagged ONLY on switch_mgmt_vlan_id (see vlans.yml).
+# Real VLAN/port topology (EDIT to the makerspace plan when known)
 switch_vlans:
   - {id: 99, name: "mgmt"}
   - {id: 10, name: "members"}

@@ -1,16 +0,0 @@
----
-- name: Back up MikroTik switch configuration
-  hosts: mikrotik
-  gather_facts: false
-  tasks:
-    - name: Ensure local backup directory exists
-      ansible.builtin.file:
-        path: "{{ playbook_dir }}/backups/{{ inventory_hostname }}"
-        state: directory
-        mode: "0755"
-      delegate_to: localhost
-    - name: Run backup tasks
-      ansible.builtin.include_role:
-        name: makerfloss.mikrotik_switch
-        tasks_from: backup.yml

@@ -1,52 +0,0 @@
----
-# FIRST-CONTACT bootstrap (run once, password auth):
-# ansible-playbook play_bootstrap.yml -e ansible_user=admin --ask-pass
-# Creates the named admin user, imports the operator SSH public key over SCP, and
-# enables SSH so day-2 runs (play_switch.yml) can use key auth as that user.
-# Keep a WinBox MAC / serial recovery channel open while running this.
-#
-# vault_switch_admin_password is decrypted automatically from
-# group_vars/mikrotik.vault.yml via the `makerfloss` vault id in ansible.cfg.
-# All device-touching tasks are :if [find] guarded, so the play is safe to re-run.
-- name: Bootstrap MikroTik switch (first contact, password auth)
-  hosts: mikrotik
-  gather_facts: false
-  # The vaulted admin password is NOT auto-loaded: group_vars/mikrotik.vault.yml
-  # doesn't match the group-name convention (only mikrotik.yml or group_vars/mikrotik/
-  # auto-load), so load it explicitly here. Day-2 (play_switch.yml) is key auth and
-  # needs no secret. Decrypted automatically via the makerfloss vault id in ansible.cfg.
-  vars_files:
-    - group_vars/mikrotik.vault.yml
-  vars:
-    pubkey_local: "{{ switch_admin_ssh_pubkey_file | default('~/.ssh/id_ed25519.pub') }}"
-    pubkey_remote: "id_ansible.pub"
-  tasks:
-    - name: Create named admin user (idempotent)
-      community.routeros.command:
-        commands:
-          - >-
-            :if ([:len [/user find name="{{ switch_admin_user }}"]] = 0)
-            do={ /user add name="{{ switch_admin_user }}"
-            group="{{ switch_admin_group | default('full') }}"
-            password="{{ vault_switch_admin_password }}" }
-      changed_when: false
-    - name: Copy operator public key to the switch
-      ansible.netcommon.net_put:
-        src: "{{ pubkey_local }}"
-        dest: "{{ pubkey_remote }}"
-    - name: Import the SSH public key for the admin user (only if none yet)
-      community.routeros.command:
-        commands:
-          - >-
-            :if ([:len [/user/ssh-keys/find user="{{ switch_admin_user }}"]] = 0)
-            do={ /user/ssh-keys/import public-key-file="{{ pubkey_remote }}"
-            user="{{ switch_admin_user }}" }
-      changed_when: false
-    - name: Ensure SSH service is enabled
-      community.routeros.command:
-        commands:
-          - /ip/service/set ssh disabled=no port={{ switch_ssh_port | default(22) }}
-      changed_when: false

@@ -15,11 +15,8 @@ subset with `--tags`.
 | `switch_backup_enabled` | `backup.yml` | `backup` | `/export` + binary backup, fetched into the repo |
 | `switch_firmware_enabled` | `firmware.yml` | `firmware` | RouterOS + RouterBOOT upgrade to `switch_firmware_target` (opt-in) |
-> All per-domain task files are implemented. `identity`, `users`, `backup` and
-> `firmware` are idempotency-verified against the device; `vlans` is implemented and
-> Jinja-validated but its device run is deferred until the real topology is in
-> `host_vars` and an on-site recovery channel is available (it enables
-> `vlan-filtering` last, which can strand management if the mgmt path is wrong).
+> The per-domain task files are currently **stubs** pending implementation (see the
+> plan in `docs/superpowers/plans/`).
 ## Variables (`defaults/main.yml`)

@@ -1,26 +1,4 @@
 ---
-# Generate a human-readable /export and a binary /system backup on the device,
-# then pull both into the repo under backups/<host>/. net_get uses SCP over the
-# RouterOS SSH service (same channel play_bootstrap.yml uses for net_put).
-- name: Generate a config export on the device
-  community.routeros.command:
-    commands:
-      - /export file=export
-  changed_when: false
-- name: Generate a binary system backup on the device
-  community.routeros.command:
-    commands:
-      - /system/backup/save name=backup dont-encrypt=yes
-  changed_when: false
-- name: Fetch the export file into the repo
-  ansible.netcommon.net_get:
-    src: "export.rsc"
-    dest: "{{ playbook_dir }}/backups/{{ inventory_hostname }}/export.rsc"
-- name: Fetch the binary backup into the repo
-  ansible.netcommon.net_get:
-    src: "backup.backup"
-    dest: "{{ playbook_dir }}/backups/{{ inventory_hostname }}/backup.backup"
+- name: Placeholder
+  ansible.builtin.debug:
+    msg: "not yet implemented"

@@ -1,48 +1,4 @@
 ---
-# Opt-in RouterOS + RouterBOOT upgrade to switch_firmware_target.
-# Disabled by default (switch_firmware_enabled: false). Upgrades REBOOT the switch,
-# so run deliberately with a recovery channel open. Naturally a no-op when the device
-# is already at or above the target version (the version guard skips every step).
-- name: Assert a firmware target is set
-  ansible.builtin.assert:
-    that:
-      - switch_firmware_target | length > 0
-    fail_msg: >-
-      switch_firmware_target must be set in host_vars to run firmware upgrades.
-- name: Gather RouterOS facts (current version)
-  community.routeros.facts:
-- name: Upgrade RouterOS to the target and reboot
-  when: ansible_net_version is version(switch_firmware_target, '<')
-  block:
-    - name: Install the target RouterOS package from the stable channel
-      community.routeros.command:
-        commands:
-          - /system/package/update/set channel=stable
-          - /system/package/update/install
-      changed_when: true
-    - name: Wait for the switch to reboot and come back
-      ansible.builtin.wait_for_connection:
-        delay: 30
-        timeout: 300
-    - name: Upgrade RouterBOOT to match the installed RouterOS
-      community.routeros.command:
-        commands:
-          - /system/routerboard/upgrade
-      changed_when: true
-    - name: Reboot to apply the RouterBOOT upgrade
-      community.routeros.command:
-        commands:
-          - /system/reboot
-      changed_when: true
-      ignore_unreachable: true # connection drops on reboot; expected
-    - name: Wait for the switch to come back after the RouterBOOT reboot
-      ansible.builtin.wait_for_connection:
-        delay: 30
-        timeout: 300
+- name: Placeholder
+  ansible.builtin.debug:
+    msg: "not yet implemented"
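
The removed firmware tasks gate every upgrade step on `ansible_net_version is version(switch_firmware_target, '<')`. As a rough illustration of why that guard has to compare versions numerically rather than as strings, here is a minimal Python sketch. The helper names `parse_version` and `needs_upgrade` are hypothetical and not part of the repo, and the sketch assumes plain dotted numeric versions such as `7.19.6`; stripping anything after the first space is also an assumption about how a reported version string might look.

```python
def parse_version(text):
    """Turn '7.19.6' (optionally followed by a space-separated suffix) into (7, 19, 6).

    Assumption: only dotted numeric versions are handled; release-candidate
    style strings such as '7.19rc1' would raise ValueError here.
    """
    core = text.split()[0]
    return tuple(int(part) for part in core.split("."))


def needs_upgrade(current, target):
    """Return True when the current version is strictly below the target."""
    return parse_version(current) < parse_version(target)


print(needs_upgrade("7.18.2", "7.19.6"))  # older patch line
print(needs_upgrade("7.19.6", "7.19.6"))  # already at target
print(needs_upgrade("7.9", "7.19.6"))     # numeric compare: 9 < 19
print("7.9" < "7.19.6")                   # naive string compare disagrees
```

The last two lines show the case that matters: a plain string comparison would treat `7.9` as newer than `7.19.6` and skip the upgrade, while the tuple comparison does not.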

@@ -1,37 +1,4 @@
 ---
-# Identity, management services, DNS/NTP and service hardening.
-# All commands here are `set` on singleton/named items, so they are naturally
-# idempotent; RouterOS `command` cannot report change, hence `changed_when: false`.
-- name: Set system identity
-  community.routeros.command:
-    commands:
-      - /system/identity/set name="{{ switch_identity_name }}"
-  changed_when: false
-- name: Configure DNS servers
-  community.routeros.command:
-    commands:
-      - /ip/dns/set servers="{{ switch_dns_servers }}" allow-remote-requests=no
-  changed_when: false
-- name: Configure NTP client
-  community.routeros.command:
-    commands:
-      - /system/ntp/client/set enabled=yes servers="{{ switch_ntp_servers }}"
-  changed_when: false
-- name: Disable unused IP services (hardening; winbox kept for recovery)
-  community.routeros.command:
-    commands:
-      - /ip/service/set {{ item }} disabled=yes
-  loop: "{{ switch_disabled_services }}"
-  loop_control:
-    label: "{{ item }}"
-  changed_when: false
-- name: Ensure SSH service is enabled on the configured port
-  community.routeros.command:
-    commands:
-      - /ip/service/set ssh disabled=no port={{ switch_ssh_port }}
-  changed_when: false
+- name: Placeholder
+  ansible.builtin.debug:
+    msg: "not yet implemented"

@@ -1,22 +1,4 @@
 ---
-# Ensure the named admin user exists and (optionally) disable the built-in `admin`.
-# The operator SSH key is imported once by play_bootstrap.yml; day-2 only guarantees
-# the user is present and the default account is hardened. Idempotency comes from the
-# RouterOS `:if [find]` guards, so `changed_when: false` is correct here.
-- name: Ensure named admin user exists
-  community.routeros.command:
-    commands:
-      - >-
-        :if ([:len [/user find name="{{ switch_admin_user }}"]] = 0) do={
-        /user add name="{{ switch_admin_user }}" group="{{ switch_admin_group }}" }
-  changed_when: false
-- name: Disable the default admin user
-  community.routeros.command:
-    commands:
-      - >-
-        :if ([:len [/user find name="admin"]] > 0) do={
-        /user/set admin disabled=yes }
-  when: switch_disable_default_admin | bool
-  changed_when: false
+- name: Placeholder
+  ansible.builtin.debug:
+    msg: "not yet implemented"
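
The removed users tasks rely on a RouterOS `:if [find]` guard for idempotency: the `add` only runs when `find` returns no matching item, so a second run does nothing. The Python sketch below only illustrates how such a guarded one-liner is assembled; `guarded_add` is a hypothetical helper, not something the role defines, and the example values (`sjat`, `full`) are taken from the variables shown elsewhere in this diff.

```python
def guarded_add(menu, match, add_args):
    """Build a RouterOS one-liner that adds an item only if `find` matches nothing.

    menu     -- RouterOS menu path, e.g. "/user"
    match    -- the find condition, e.g. 'name="sjat"'
    add_args -- arguments passed to `add` when the item is missing
    """
    return (
        f":if ([:len [{menu} find {match}]] = 0) "
        f"do={{ {menu} add {add_args} }}"
    )


command = guarded_add("/user", 'name="sjat"', 'name="sjat" group="full"')
print(command)
```

Because the command string is the same on every run and the device decides whether to act, the Ansible task cannot observe a change, which is the reason the original tasks set `changed_when: false`.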

@@ -1,101 +1,4 @@
 ---
-# VLAN-aware bridge, access/trunk ports, and the management VLAN interface.
-#
-# ORDERING IS DELIBERATE (lockout safety): bridge (filtering OFF) -> ports+pvid ->
-# VLAN membership -> mgmt VLAN iface + IP -> default route -> vlan-filtering LAST.
-# Enabling vlan-filtering is the point at which a wrong management path strands the
-# switch, so it runs only after the mgmt VLAN/IP exist. Keep a serial/WinBox-MAC
-# recovery channel open when running this against a live device.
-#
-# DEFCONF NOTE: on a factory-default CRS310 the `bridge` already exists with every
-# port as an untagged member and the management IP sits directly on `bridge`
-# (192.168.88.1/24). This role does NOT delete that legacy IP — after you have
-# proven reachability on the new mgmt VLAN, remove the old bridge IP on-site so the
-# device is reachable only via vlan-mgmt. The guards below adopt the existing bridge
-# and ports rather than recreating them.
-#
-# Idempotency comes from the RouterOS `:if [find]` guards (changed_when: false).
-- name: Create VLAN-aware bridge (filtering off initially)
-  community.routeros.command:
-    commands:
-      - >-
-        :if ([:len [/interface/bridge/find name="{{ switch_bridge_name }}"]] = 0)
-        do={ /interface/bridge/add name="{{ switch_bridge_name }}"
-        vlan-filtering=no }
-  changed_when: false
-- name: Add or adopt bridge ports and set their PVID
-  community.routeros.command:
-    commands:
-      - >-
-        :if ([:len [/interface/bridge/port/find interface="{{ item.interface }}"]] = 0)
-        do={ /interface/bridge/port/add bridge="{{ switch_bridge_name }}"
-        interface="{{ item.interface }}" pvid={{ item.pvid }} }
-        else={ /interface/bridge/port/set [find interface="{{ item.interface }}"]
-        pvid={{ item.pvid }} }
-  loop: "{{ switch_bridge_ports }}"
-  loop_control:
-    label: "{{ item.interface }} (pvid {{ item.pvid }})"
-  changed_when: false
-# tagged = trunk ports whose tagged_vlans include this id, plus the bridge (CPU)
-# ONLY on the management VLAN so the vlan-mgmt interface is reachable.
-# untagged = access ports whose pvid equals this id.
-- name: Define bridge VLANs (tagged/untagged membership)
-  community.routeros.command:
-    commands:
-      - >-
-        :local tagged "{{ ((switch_bridge_ports
-        | selectattr('mode', 'equalto', 'trunk')
-        | selectattr('tagged_vlans', 'defined')
-        | selectattr('tagged_vlans', 'contains', item.id)
-        | map(attribute='interface') | list)
-        + ([switch_bridge_name] if item.id == switch_mgmt_vlan_id else []))
-        | join(',') }}";
-        :local untagged "{{ switch_bridge_ports
-        | selectattr('mode', 'equalto', 'access')
-        | selectattr('pvid', 'equalto', item.id)
-        | map(attribute='interface') | list | join(',') }}";
-        :if ([:len [/interface/bridge/vlan/find vlan-ids={{ item.id }}]] = 0)
-        do={ /interface/bridge/vlan/add bridge="{{ switch_bridge_name }}"
-        vlan-ids={{ item.id }} tagged=$tagged untagged=$untagged }
-        else={ /interface/bridge/vlan/set [find vlan-ids={{ item.id }}]
-        tagged=$tagged untagged=$untagged }
-  loop: "{{ switch_vlans }}"
-  loop_control:
-    label: "vlan {{ item.id }} ({{ item.name }})"
-  changed_when: false
-- name: Create the management VLAN interface
-  community.routeros.command:
-    commands:
-      - >-
-        :if ([:len [/interface/vlan/find name="vlan-mgmt"]] = 0)
-        do={ /interface/vlan/add name="vlan-mgmt"
-        interface="{{ switch_bridge_name }}" vlan-id={{ switch_mgmt_vlan_id }} }
-  changed_when: false
-- name: Assign the management IP address
-  community.routeros.command:
-    commands:
-      - >-
-        :if ([:len [/ip/address/find interface="vlan-mgmt"]] = 0)
-        do={ /ip/address/add address="{{ switch_mgmt_address }}"
-        interface="vlan-mgmt" }
-  changed_when: false
-- name: Set the default gateway route
-  community.routeros.command:
-    commands:
-      - >-
-        :if ([:len [/ip/route/find dst-address="0.0.0.0/0"]] = 0)
-        do={ /ip/route/add dst-address=0.0.0.0/0
-        gateway="{{ switch_mgmt_gateway }}" }
-  changed_when: false
-- name: Enable VLAN filtering (LAST — prove mgmt reachability first)
-  community.routeros.command:
-    commands:
-      - /interface/bridge/set "{{ switch_bridge_name }}" vlan-filtering=yes
-  changed_when: false
+- name: Placeholder
+  ansible.builtin.debug:
+    msg: "not yet implemented"
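
The "Define bridge VLANs" task in the removed `vlans.yml` computes each VLAN's tagged and untagged member lists with a chain of Jinja filters. The following Python sketch mirrors that selection logic so it can be checked offline before anything touches the switch. It is an approximation under stated assumptions: `vlan_membership` is a hypothetical helper, and the port map is invented for illustration, since the real `switch_bridge_ports` value does not appear in this diff.

```python
def vlan_membership(vlan_id, ports, bridge_name, mgmt_vlan_id):
    """Return (tagged, untagged) as comma-joined interface lists for one VLAN."""
    # Trunk ports are tagged members of every VLAN listed in their tagged_vlans.
    tagged = [
        p["interface"]
        for p in ports
        if p["mode"] == "trunk" and vlan_id in p.get("tagged_vlans", [])
    ]
    # The bridge (CPU port) is tagged only on the management VLAN.
    if vlan_id == mgmt_vlan_id:
        tagged.append(bridge_name)
    # Access ports are untagged members of the VLAN that equals their pvid.
    untagged = [
        p["interface"]
        for p in ports
        if p["mode"] == "access" and p["pvid"] == vlan_id
    ]
    return ",".join(tagged), ",".join(untagged)


# Hypothetical port map, NOT the real makerspace plan.
ports = [
    {"interface": "ether1", "mode": "access", "pvid": 10},
    {"interface": "ether2", "mode": "access", "pvid": 99},
    {"interface": "sfp-sfpplus1", "mode": "trunk", "pvid": 1, "tagged_vlans": [10, 99]},
]

for vid in (99, 10):
    tagged, untagged = vlan_membership(vid, ports, "bridge", 99)
    print(f"vlan {vid}: tagged={tagged} untagged={untagged}")
```

With this sample map, VLAN 99 gets the trunk plus the bridge as tagged members and `ether2` untagged, while VLAN 10 gets only the trunk tagged and `ether1` untagged. A VLAN with no matching ports yields two empty strings, which is worth checking for in the real topology before `vlan-filtering` is enabled.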