Compare commits

...

7 commits

Author SHA1 Message Date
sjat
67554c0b38 docs: mark domain tasks implemented; note deferred vlans device run
Implements Task 10 doc updates. README/CLAUDE/role-README now reflect that all
task files + play_bootstrap/play_backup are implemented and idempotency-verified,
that vlans is built+validated but its device run is deferred (placeholder topology,
on-site recovery needed), and that the bootstrap/backup plays exist. Corrects the
bootstrap invocation example (-e ansible_user=admin --ask-pass).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 19:45:36 +02:00
sjat
5931542473 feat: first-contact bootstrap play (named admin + SSH key import)
Implements Task 4 (the play was run on-site but never committed). Creates the
named admin user, imports the operator pubkey over SCP (net_put), enables SSH.
Improvements over the plan: the key import is :if [find] guarded so re-runs don't
create duplicate keys, and the vaulted password is loaded via vars_files (it is
not auto-loaded because group_vars/mikrotik.vault.yml doesn't match the group-name
convention). Verified idempotent (changed=0) against crs310-maker; no duplicate key.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 19:42:56 +02:00
sjat
5a5a194437 feat(firmware): opt-in RouterOS + RouterBOOT upgrade to pinned target
Implements Task 9. Version-guarded (no-op when already >= switch_firmware_target,
as crs310-maker is at 7.19.6). Upgrade steps grouped in a block; reboot uses
ignore_unreachable + wait_for_connection instead of ignore_errors so it stays
lint-clean under the production profile. Syntax + lint only; not run (opt-in).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 19:40:24 +02:00
sjat
33dc378c3c feat(vlans): VLAN-aware bridge, ports, mgmt interface (mechanism)
Implements Task 7. Deliberate lockout-safe ordering (vlan-filtering LAST) with
:if [find] guards that adopt the existing defconf bridge/ports rather than
recreating them. Membership Jinja: trunk ports tagged per tagged_vlans, access
ports untagged per pvid, bridge/CPU tagged only on the mgmt VLAN; else={set} makes
membership declarative. Jinja render validated offline against the placeholder
topology. Device run DEFERRED to an on-site session with a recovery channel
(remote bench has no serial/WinBox-MAC fallback). Topology stays placeholder.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 19:39:04 +02:00
sjat
39a12ae23b feat(backup): export + binary backup, fetch into repo
Implements Task 8. play_backup.yml ensures the local dir then includes backup.yml,
which runs /export + /system backup save and pulls both over SCP (net_get).
Binary .backup is gitignored (may contain secrets); export.rsc is committed.
Verified against crs310-maker on the bench: both artifacts fetched non-empty.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 19:36:14 +02:00
sjat
ea7cf5ec03 feat(users): ensure named admin, disable default admin
Implements Task 6. Guards user creation with :if [find]; disables the built-in
admin (switch_disable_default_admin) now that sjat key login is proven. Verified
run-twice idempotent (changed=0); admin disabled=true, sjat reachable on bench.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 19:34:58 +02:00
sjat
cfc6ec9721 feat(identity): identity, DNS, NTP, service hardening
Implements Task 5. Disables telnet/ftp/www/www-ssl/api/api-ssl (winbox kept
for recovery), sets DNS + NTP client, ensures SSH on the configured port.
Verified run-twice idempotent (changed=0) against crs310-maker on the bench.
Also sets ansible_user=sjat in host_vars for day-2 key auth.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 19:33:48 +02:00
14 changed files with 371 additions and 29 deletions

1
.gitignore vendored

@ -3,3 +3,4 @@
__pycache__/
*.pyc
.DS_Store
backups/**/*.backup

View file

@ -18,7 +18,7 @@ conventions this repo copies); independent repo on `forgejo.makerfloss.eu`.
- `group_vars/mikrotik.vault.yml` — encrypted password (excluded from linters)
- `host_vars/crs310-maker.yml` — device facts, real addressing, VLAN/port map
- `roles/makerfloss.mikrotik_switch/` — one role, per-domain task files gated by flags
- `play_switch.yml` (day-2), `play_bootstrap.yml` / `play_backup.yml` (to implement)
- `play_switch.yml` (day-2), `play_bootstrap.yml` (first contact), `play_backup.yml`
- `docs/` — field guide, design spec, implementation plan
## Essential commands
@ -43,7 +43,12 @@ ansible-vault view group_vars/mikrotik.vault.yml # read a secret
## Status / next
Bootstrap is done (user `sjat` + key + identity `crs310-maker`, RouterOS 7.19.6 pinned).
The per-domain task files are **stubs**; implement them per
`docs/superpowers/plans/2026-06-07-mikrotik-crs310-ansible.md` (Tasks 5-9), reading the
"carry-over notes" at the end of that plan first.
Bootstrap is done (user `sjat` + key + identity `crs310-maker`, RouterOS 7.19.6 pinned;
default `admin` now disabled). All per-domain task files are **implemented**:
`identity`, `users`, `backup`, `firmware` (opt-in) and `play_bootstrap` / `play_backup`
are idempotency-verified against the device. `vlans` is implemented and Jinja-validated
but its **device run is deferred** — the `host_vars` topology is still a placeholder.
Next, on-site with a recovery channel: drop the real VLAN/port map into `host_vars`,
reconcile the legacy defconf IP (`192.168.88.1/24` lives directly on `bridge`), then run
`--tags vlans` and confirm mgmt reachability before/after `vlan-filtering=yes`.

View file

@ -12,10 +12,13 @@ rebuilt from this repo instead of by hand in WinBox.
|---|---|
| Repo scaffolding, role skeleton, vault | ✅ done |
| On-site device prep + **bootstrap** (named user + SSH key + identity) | ✅ done (2026-06-08) |
| Day-2 config: `identity` / `users` / `vlans` / `backup` / `firmware` tasks | ⏳ **stubs** — to implement (see `docs/superpowers/plans/`) |
| `identity` / `users` / `backup` / `firmware` + `play_bootstrap` / `play_backup` | ✅ implemented; idempotency-verified against the device (firmware is opt-in, lint/syntax only) |
| `vlans` (VLAN-aware bridge, ports, mgmt iface) | ✅ implemented + Jinja-validated; **device run deferred** — needs the real VLAN/port plan and an on-site recovery channel before `vlan-filtering` is enabled |
The switch is reachable today by key auth as user `sjat`; the per-domain task files
still need their real RouterOS logic written and idempotency-tested.
The switch is reachable today by key auth as user `sjat`. All task files now carry their
real RouterOS logic. The `vlans` topology in `host_vars` is still a **placeholder**:
replace it with the real makerspace VLAN ids + per-port map before running `--tags vlans`
on the live device, and do so on-site with a serial/WinBox-MAC recovery channel open.
## Layout
@ -62,15 +65,16 @@ community.routeros.routeros`, authenticating with the operator SSH key
yamllint . && ansible-lint && ansible-playbook play_switch.yml --syntax-check
# First contact on a fresh/reset device (password auth, one time)
ansible-playbook play_bootstrap.yml --ask-pass # (play to be implemented)
ansible-playbook play_bootstrap.yml -e ansible_user=admin --ask-pass
# Day-2 configuration (key auth, idempotent)
ansible-playbook play_switch.yml
ansible-playbook play_switch.yml --tags vlans # one domain
ansible-playbook play_switch.yml --tags identity,users # safe domains
ansible-playbook play_switch.yml --tags vlans # on-site only — see lockout note
ansible-playbook play_switch.yml --limit crs310-maker
# Backup config into the repo
ansible-playbook play_backup.yml # (play to be implemented)
ansible-playbook play_backup.yml
```
## ⚠️ Lockout safety

0
backups/.gitkeep Normal file

View file

@ -0,0 +1,35 @@
# 2025-09-11 09:49:07 by RouterOS 7.19.6
# software id = 73S3-5F2W
#
# model = CRS310-8G+2S+
# serial number = HM40B8TDNDD
/interface bridge
add admin-mac=D0:EA:11:24:F4:AA auto-mac=no comment=defconf name=bridge
/interface bridge port
add bridge=bridge comment=defconf interface=ether1
add bridge=bridge comment=defconf interface=ether2
add bridge=bridge comment=defconf interface=ether3
add bridge=bridge comment=defconf interface=ether4
add bridge=bridge comment=defconf interface=ether5
add bridge=bridge comment=defconf interface=ether6
add bridge=bridge comment=defconf interface=ether7
add bridge=bridge comment=defconf interface=ether8
add bridge=bridge comment=defconf interface=sfp-sfpplus1
add bridge=bridge comment=defconf interface=sfp-sfpplus2
/ip address
add address=192.168.88.1/24 comment=defconf interface=bridge network=\
192.168.88.0
/ip dns
set servers=10.0.99.1
/ip service
set ftp disabled=yes
set telnet disabled=yes
set www disabled=yes
set api disabled=yes
set api-ssl disabled=yes
/system identity
set name=crs310-maker
/system ntp client
set enabled=yes
/system ntp client servers
add address=10.0.99.1

View file

@ -10,6 +10,10 @@
# group_vars/mikrotik.vault.yml). Key login verified. Default `admin` still enabled
# (not yet hardened). Switch currently on the bench at 192.168.88.1 (defconf, not yet
# reset/VLAN-configured). Real mgmt addressing below is the FUTURE production plan.
# Day-2 connection: key auth as the named admin user (overrides the bootstrap
# default ansible_user=admin in group_vars/mikrotik.yml).
ansible_user: sjat
switch_identity_name: "crs310-maker"
switch_mgmt_vlan_id: 99
switch_mgmt_address: "10.0.99.2/24" # EDIT: real mgmt IP
@ -19,7 +23,15 @@ switch_ntp_servers: "10.0.99.1"
switch_admin_user: "sjat"
# Real VLAN/port topology (EDIT to the makerspace plan when known)
# PLACEHOLDER VLAN/port topology — vlans.yml is correct mechanism, but these IDs
# and the per-port map are NOT the real makerspace plan. Replace with the real
# VLAN ids + full ether1-8/sfp map before any on-site VLAN run. Notes:
# - mode: access -> untagged member of `pvid`; mode: trunk -> tagged member of
# each id in `tagged_vlans`, with `pvid` as the native (untagged) VLAN.
# - trunk pvid: 1 means untagged frames on the uplink land in VLAN 1 (unused in a
# hardened design). Decide deliberately whether the uplink should carry any
# untagged traffic; set pvid to an intended native VLAN or leave 1 as a dead end.
# - the bridge (CPU) is tagged ONLY on switch_mgmt_vlan_id (see vlans.yml).
switch_vlans:
- {id: 99, name: "mgmt"}
- {id: 10, name: "members"}

16
play_backup.yml Normal file

@ -0,0 +1,16 @@
---
- name: Back up MikroTik switch configuration
  hosts: mikrotik
  gather_facts: false
  tasks:
    - name: Ensure local backup directory exists
      ansible.builtin.file:
        path: "{{ playbook_dir }}/backups/{{ inventory_hostname }}"
        state: directory
        mode: "0755"
      delegate_to: localhost

    - name: Run backup tasks
      ansible.builtin.include_role:
        name: makerfloss.mikrotik_switch
        tasks_from: backup.yml

52
play_bootstrap.yml Normal file

@ -0,0 +1,52 @@
---
# FIRST-CONTACT bootstrap (run once, password auth):
# ansible-playbook play_bootstrap.yml -e ansible_user=admin --ask-pass
# Creates the named admin user, imports the operator SSH public key over SCP, and
# enables SSH so day-2 runs (play_switch.yml) can use key auth as that user.
# Keep a WinBox MAC / serial recovery channel open while running this.
#
# vault_switch_admin_password comes from group_vars/mikrotik.vault.yml, which is
# loaded explicitly via vars_files below and decrypted with the `makerfloss`
# vault id in ansible.cfg.
# User creation and key import are :if [find] guarded, and the SSH service task is
# a plain `set`, so the play is safe to re-run.
- name: Bootstrap MikroTik switch (first contact, password auth)
  hosts: mikrotik
  gather_facts: false
  # The vaulted admin password is NOT auto-loaded: group_vars/mikrotik.vault.yml
  # doesn't match the group-name convention (only mikrotik.yml or group_vars/mikrotik/
  # auto-load), so load it explicitly here. Day-2 (play_switch.yml) is key auth and
  # needs no secret. Decrypted automatically via the makerfloss vault id in ansible.cfg.
  vars_files:
    - group_vars/mikrotik.vault.yml
  vars:
    pubkey_local: "{{ switch_admin_ssh_pubkey_file | default('~/.ssh/id_ed25519.pub') }}"
    pubkey_remote: "id_ansible.pub"
  tasks:
    - name: Create named admin user (idempotent)
      community.routeros.command:
        commands:
          - >-
            :if ([:len [/user find name="{{ switch_admin_user }}"]] = 0)
            do={ /user add name="{{ switch_admin_user }}"
            group="{{ switch_admin_group | default('full') }}"
            password="{{ vault_switch_admin_password }}" }
      changed_when: false

    - name: Copy operator public key to the switch
      ansible.netcommon.net_put:
        src: "{{ pubkey_local }}"
        dest: "{{ pubkey_remote }}"

    - name: Import the SSH public key for the admin user (only if none yet)
      community.routeros.command:
        commands:
          - >-
            :if ([:len [/user/ssh-keys/find user="{{ switch_admin_user }}"]] = 0)
            do={ /user/ssh-keys/import public-key-file="{{ pubkey_remote }}"
            user="{{ switch_admin_user }}" }
      changed_when: false

    - name: Ensure SSH service is enabled
      community.routeros.command:
        commands:
          - /ip/service/set ssh disabled=no port={{ switch_ssh_port | default(22) }}
      changed_when: false
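
The effect of the `:if [find]` guard on the key import can be illustrated with a small Python model. This is only a sketch: the in-memory `device_keys` list, the function names and the sample key string are hypothetical stand-ins for the switch's `/user/ssh-keys` table, not part of the repository.

```python
# Hypothetical in-memory model of the switch's /user/ssh-keys table.
# Each entry is a dict with the owning user and the key text.


def import_key_guarded(device_keys, user, key):
    """Mimic ':if ([:len [find user=...]] = 0) do={ import }'.

    Imports only when the user has no key at all; returns True if it imported.
    """
    if not any(entry["user"] == user for entry in device_keys):
        device_keys.append({"user": user, "key": key})
        return True
    return False


def import_key_unguarded(device_keys, user, key):
    """Import unconditionally, as a play without the guard would."""
    device_keys.append({"user": user, "key": key})
    return True


if __name__ == "__main__":
    sample_key = "ssh-ed25519 EXAMPLE-KEY operator@example"  # placeholder text

    guarded = []
    import_key_guarded(guarded, "sjat", sample_key)
    import_key_guarded(guarded, "sjat", sample_key)  # second run is a no-op
    print("guarded entries:", len(guarded))

    unguarded = []
    import_key_unguarded(unguarded, "sjat", sample_key)
    import_key_unguarded(unguarded, "sjat", sample_key)  # duplicates the key
    print("unguarded entries:", len(unguarded))
```

One consequence of guarding on "any key for this user" is that a rotated key would not be imported on a re-run; in this model the old entry would have to be removed first.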

View file

@ -15,8 +15,11 @@ subset with `--tags`.
| `switch_backup_enabled` | `backup.yml` | `backup` | `/export` + binary backup, fetched into the repo |
| `switch_firmware_enabled` | `firmware.yml` | `firmware` | RouterOS + RouterBOOT upgrade to `switch_firmware_target` (opt-in) |
> The per-domain task files are currently **stubs** pending implementation (see the
> plan in `docs/superpowers/plans/`).
> All per-domain task files are implemented. `identity`, `users`, `backup` and
> `firmware` are idempotency-verified against the device; `vlans` is implemented and
> Jinja-validated but its device run is deferred until the real topology is in
> `host_vars` and an on-site recovery channel is available (it enables
> `vlan-filtering` last, which can strand management if the mgmt path is wrong).
## Variables (`defaults/main.yml`)

View file

@ -1,4 +1,26 @@
---
- name: Placeholder
ansible.builtin.debug:
msg: "not yet implemented"
# Generate a human-readable /export and a binary /system backup on the device,
# then pull both into the repo under backups/<host>/. net_get uses SCP over the
# RouterOS SSH service (same channel play_bootstrap.yml uses for net_put).
- name: Generate a config export on the device
  community.routeros.command:
    commands:
      - /export file=export
  changed_when: false

- name: Generate a binary system backup on the device
  community.routeros.command:
    commands:
      - /system/backup/save name=backup dont-encrypt=yes
  changed_when: false

- name: Fetch the export file into the repo
  ansible.netcommon.net_get:
    src: "export.rsc"
    dest: "{{ playbook_dir }}/backups/{{ inventory_hostname }}/export.rsc"

- name: Fetch the binary backup into the repo
  ansible.netcommon.net_get:
    src: "backup.backup"
    dest: "{{ playbook_dir }}/backups/{{ inventory_hostname }}/backup.backup"

View file

@ -1,4 +1,48 @@
---
- name: Placeholder
ansible.builtin.debug:
msg: "not yet implemented"
# Opt-in RouterOS + RouterBOOT upgrade to switch_firmware_target.
# Disabled by default (switch_firmware_enabled: false). Upgrades REBOOT the switch,
# so run deliberately with a recovery channel open. Naturally a no-op when the device
# is already at or above the target version (the version guard skips every step).
- name: Assert a firmware target is set
  ansible.builtin.assert:
    that:
      - switch_firmware_target | length > 0
    fail_msg: >-
      switch_firmware_target must be set in host_vars to run firmware upgrades.

- name: Gather RouterOS facts (current version)
  community.routeros.facts:

- name: Upgrade RouterOS to the target and reboot
  when: ansible_net_version is version(switch_firmware_target, '<')
  block:
    - name: Install the target RouterOS package from the stable channel
      community.routeros.command:
        commands:
          - /system/package/update/set channel=stable
          - /system/package/update/install
      changed_when: true

    - name: Wait for the switch to reboot and come back
      ansible.builtin.wait_for_connection:
        delay: 30
        timeout: 300

    - name: Upgrade RouterBOOT to match the installed RouterOS
      community.routeros.command:
        commands:
          - /system/routerboard/upgrade
      changed_when: true

    - name: Reboot to apply the RouterBOOT upgrade
      community.routeros.command:
        commands:
          - /system/reboot
      changed_when: true
      ignore_unreachable: true  # connection drops on reboot; expected

    - name: Wait for the switch to come back after the RouterBOOT reboot
      ansible.builtin.wait_for_connection:
        delay: 30
        timeout: 300
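
The version guard on the upgrade block (`ansible_net_version is version(switch_firmware_target, '<')`) can be approximated in plain Python. This is a rough sketch under stated assumptions: `parse_version` and `needs_upgrade` are hypothetical helpers, versions are assumed to be plain dotted numbers such as `7.19.6`, and Ansible's own `version` test applies its own comparison rules that this does not reproduce exactly.

```python
def parse_version(text):
    """Turn a dotted numeric version into a tuple of ints.

    Only the first whitespace-separated token is used, so trailing text
    after the number is ignored. Non-numeric parts (for example a beta
    suffix) are NOT handled and raise ValueError.
    """
    head = text.strip().split()[0]
    return tuple(int(part) for part in head.split("."))


def needs_upgrade(current, target):
    """Return True only when current is strictly below target."""
    cur = parse_version(current)
    tgt = parse_version(target)
    width = max(len(cur), len(tgt))
    # Pad the shorter tuple with zeros so 7.20 compares as 7.20.0.
    cur += (0,) * (width - len(cur))
    tgt += (0,) * (width - len(tgt))
    return cur < tgt


if __name__ == "__main__":
    # Device already at the pinned target: the whole block is skipped.
    print(needs_upgrade("7.19.6", "7.19.6"))
    # Device below the target: the upgrade steps would run.
    print(needs_upgrade("7.18.2", "7.19.6"))
```

Comparing integer tuples rather than strings matters here: as strings, `"7.9.1"` would sort above `"7.19.6"`.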

View file

@ -1,4 +1,37 @@
---
- name: Placeholder
ansible.builtin.debug:
msg: "not yet implemented"
# Identity, management services, DNS/NTP and service hardening.
# All commands here are `set` on singleton/named items, so they are naturally
# idempotent; RouterOS `command` cannot report change, hence `changed_when: false`.
- name: Set system identity
  community.routeros.command:
    commands:
      - /system/identity/set name="{{ switch_identity_name }}"
  changed_when: false

- name: Configure DNS servers
  community.routeros.command:
    commands:
      - /ip/dns/set servers="{{ switch_dns_servers }}" allow-remote-requests=no
  changed_when: false

- name: Configure NTP client
  community.routeros.command:
    commands:
      - /system/ntp/client/set enabled=yes servers="{{ switch_ntp_servers }}"
  changed_when: false

- name: Disable unused IP services (hardening; winbox kept for recovery)
  community.routeros.command:
    commands:
      - /ip/service/set {{ item }} disabled=yes
  loop: "{{ switch_disabled_services }}"
  loop_control:
    label: "{{ item }}"
  changed_when: false

- name: Ensure SSH service is enabled on the configured port
  community.routeros.command:
    commands:
      - /ip/service/set ssh disabled=no port={{ switch_ssh_port }}
  changed_when: false

View file

@ -1,4 +1,22 @@
---
- name: Placeholder
ansible.builtin.debug:
msg: "not yet implemented"
# Ensure the named admin user exists and (optionally) disable the built-in `admin`.
# The operator SSH key is imported once by play_bootstrap.yml; day-2 only guarantees
# the user is present and the default account is hardened. Idempotency comes from the
# RouterOS `:if [find]` guards, so `changed_when: false` is correct here.
- name: Ensure named admin user exists
  community.routeros.command:
    commands:
      - >-
        :if ([:len [/user find name="{{ switch_admin_user }}"]] = 0) do={
        /user add name="{{ switch_admin_user }}" group="{{ switch_admin_group }}" }
  changed_when: false

- name: Disable the default admin user
  community.routeros.command:
    commands:
      - >-
        :if ([:len [/user find name="admin"]] > 0) do={
        /user/set admin disabled=yes }
  when: switch_disable_default_admin | bool
  changed_when: false

View file

@ -1,4 +1,101 @@
---
- name: Placeholder
ansible.builtin.debug:
msg: "not yet implemented"
# VLAN-aware bridge, access/trunk ports, and the management VLAN interface.
#
# ORDERING IS DELIBERATE (lockout safety): bridge (filtering OFF) -> ports+pvid ->
# VLAN membership -> mgmt VLAN iface + IP -> default route -> vlan-filtering LAST.
# Enabling vlan-filtering is the point at which a wrong management path strands the
# switch, so it runs only after the mgmt VLAN/IP exist. Keep a serial/WinBox-MAC
# recovery channel open when running this against a live device.
#
# DEFCONF NOTE: on a factory-default CRS310 the `bridge` already exists with every
# port as an untagged member and the management IP sits directly on `bridge`
# (192.168.88.1/24). This role does NOT delete that legacy IP — after you have
# proven reachability on the new mgmt VLAN, remove the old bridge IP on-site so the
# device is reachable only via vlan-mgmt. The guards below adopt the existing bridge
# and ports rather than recreating them.
#
# Idempotency comes from the RouterOS `:if [find]` guards (changed_when: false).
- name: Create VLAN-aware bridge (filtering off initially)
  community.routeros.command:
    commands:
      - >-
        :if ([:len [/interface/bridge/find name="{{ switch_bridge_name }}"]] = 0)
        do={ /interface/bridge/add name="{{ switch_bridge_name }}"
        vlan-filtering=no }
  changed_when: false

- name: Add or adopt bridge ports and set their PVID
  community.routeros.command:
    commands:
      - >-
        :if ([:len [/interface/bridge/port/find interface="{{ item.interface }}"]] = 0)
        do={ /interface/bridge/port/add bridge="{{ switch_bridge_name }}"
        interface="{{ item.interface }}" pvid={{ item.pvid }} }
        else={ /interface/bridge/port/set [find interface="{{ item.interface }}"]
        pvid={{ item.pvid }} }
  loop: "{{ switch_bridge_ports }}"
  loop_control:
    label: "{{ item.interface }} (pvid {{ item.pvid }})"
  changed_when: false

# tagged = trunk ports whose tagged_vlans include this id, plus the bridge (CPU)
# ONLY on the management VLAN so the vlan-mgmt interface is reachable.
# untagged = access ports whose pvid equals this id.
- name: Define bridge VLANs (tagged/untagged membership)
  community.routeros.command:
    commands:
      - >-
        :local tagged "{{ ((switch_bridge_ports
        | selectattr('mode', 'equalto', 'trunk')
        | selectattr('tagged_vlans', 'defined')
        | selectattr('tagged_vlans', 'contains', item.id)
        | map(attribute='interface') | list)
        + ([switch_bridge_name] if item.id == switch_mgmt_vlan_id else []))
        | join(',') }}";
        :local untagged "{{ switch_bridge_ports
        | selectattr('mode', 'equalto', 'access')
        | selectattr('pvid', 'equalto', item.id)
        | map(attribute='interface') | list | join(',') }}";
        :if ([:len [/interface/bridge/vlan/find vlan-ids={{ item.id }}]] = 0)
        do={ /interface/bridge/vlan/add bridge="{{ switch_bridge_name }}"
        vlan-ids={{ item.id }} tagged=$tagged untagged=$untagged }
        else={ /interface/bridge/vlan/set [find vlan-ids={{ item.id }}]
        tagged=$tagged untagged=$untagged }
  loop: "{{ switch_vlans }}"
  loop_control:
    label: "vlan {{ item.id }} ({{ item.name }})"
  changed_when: false

- name: Create the management VLAN interface
  community.routeros.command:
    commands:
      - >-
        :if ([:len [/interface/vlan/find name="vlan-mgmt"]] = 0)
        do={ /interface/vlan/add name="vlan-mgmt"
        interface="{{ switch_bridge_name }}" vlan-id={{ switch_mgmt_vlan_id }} }
  changed_when: false

- name: Assign the management IP address
  community.routeros.command:
    commands:
      - >-
        :if ([:len [/ip/address/find interface="vlan-mgmt"]] = 0)
        do={ /ip/address/add address="{{ switch_mgmt_address }}"
        interface="vlan-mgmt" }
  changed_when: false

- name: Set the default gateway route
  community.routeros.command:
    commands:
      - >-
        :if ([:len [/ip/route/find dst-address="0.0.0.0/0"]] = 0)
        do={ /ip/route/add dst-address=0.0.0.0/0
        gateway="{{ switch_mgmt_gateway }}" }
  changed_when: false

- name: "Enable VLAN filtering (LAST: prove mgmt reachability first)"
  community.routeros.command:
    commands:
      - /interface/bridge/set "{{ switch_bridge_name }}" vlan-filtering=yes
  changed_when: false
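
The tagged/untagged membership rule in the "Define bridge VLANs" task can be re-expressed in plain Python, which makes it easier to check a topology offline. This is a minimal sketch, not the role's code: `vlan_membership` is a hypothetical helper, and the `PORTS` list is an invented example, not the makerspace plan. The field names (`interface`, `mode`, `pvid`, `tagged_vlans`) follow the `host_vars` notes.

```python
def vlan_membership(ports, vlan_id, bridge_name, mgmt_vlan_id):
    """Return (tagged, untagged) interface lists for one VLAN id.

    tagged   = trunk ports whose tagged_vlans include vlan_id, plus the
               bridge (CPU) when vlan_id is the management VLAN.
    untagged = access ports whose pvid equals vlan_id.
    """
    tagged = [
        port["interface"]
        for port in ports
        if port.get("mode") == "trunk" and vlan_id in port.get("tagged_vlans", [])
    ]
    if vlan_id == mgmt_vlan_id:
        tagged.append(bridge_name)
    untagged = [
        port["interface"]
        for port in ports
        if port.get("mode") == "access" and port.get("pvid") == vlan_id
    ]
    return tagged, untagged


# Invented example topology for illustration only.
PORTS = [
    {"interface": "sfp-sfpplus1", "mode": "trunk", "pvid": 1, "tagged_vlans": [99, 10]},
    {"interface": "ether1", "mode": "access", "pvid": 10},
    {"interface": "ether2", "mode": "access", "pvid": 99},
]

if __name__ == "__main__":
    for vlan_id in (99, 10):
        tagged, untagged = vlan_membership(PORTS, vlan_id, "bridge", 99)
        print(vlan_id, "tagged=" + ",".join(tagged), "untagged=" + ",".join(untagged))
```

In this example only VLAN 99 lists `bridge` as a tagged member, which matches the rule that the CPU port is tagged on the management VLAN and nowhere else.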