# Rack Network (Phase 3) Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Add network-cabling data (`links:` feeds + switch/patch-panel peer files) to the rack pipeline, validate it (rule 4), and render a mermaid network graph on the generated rack page, reusing every Phase 1/2 mechanism.

**Architecture:** Extend the existing `scripts/gen_rack.py` with `load_hardware_index` (global hostname→frontmatter map for peer resolution), `validate_links` (rule 4), and `render_network` (a `flowchart LR` with local interface, peer port, and speed on each edge label); insert a `## Network` section into `render_page` between Power and Occupancy. Switch/patch-panel files are normal placed items that Phase 1 already draws and `gen_overview.py` already lists. Mermaid is already enabled.

**Tech Stack:** Python 3 (stdlib + PyYAML only), pytest, MkDocs Material, Forgejo Actions CI.

**Spec:** `notes/dev/specs/2026-06-24-rack-network-design.md`.

## Global Constraints
- Scripts use **stdlib + PyYAML only**; deterministic and offline (copy existing `gen_rack.py` style). No randomness/time in generated output.
- `re` and `yaml` are already imported in `scripts/gen_rack.py`; do not add new imports.
- `_node_id` (Phase 2) is reused for mermaid node ids — do not redefine it.
- Validation failures raise `SchemaError`; `generate` prints `ERROR: …` to stderr and returns `1`, **writing nothing** on failure (existing behaviour).
- Generated files keep the existing `_Auto-generated … do not edit by hand_` banner (already emitted by `render_page`).
- **Peer resolution is global** (against all `docs/hardware/*.md` hostnames), not per-rack — rule 4 says "resolves to a real file".
- `peer_port` range is checked **only when the peer declares an integer `ports`**.
- Edge label format: `{local} → p{peer_port} · {speed}G`, with the ` · {speed}G` suffix omitted when `speed_gbps` is absent. Use the unicode arrow `→` (not `->`) to avoid clashing with mermaid's `-->` syntax.
- A node whose kind is `switch` or `patch-panel` renders as `{name}<br/>{kind}`; all other nodes render as the bare hostname.
- Network data added here is **provisional placeholder data** (like the mfNN positions and the Phase 2 power data), not real values.
- **No edits** to `mkdocs.yml`, `Makefile`, `.forgejo/workflows/docs.yml`, or `scripts/overview_config.yml` (`switch`/`patch-panel`/`ap` already in the enum; drift already covers `racks/`).
- `mkdocs build --strict` must pass; `make docs-check` must exit 0 after regeneration.
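
The two label rules above (edge label and node label) can be sketched in isolation as follows. This is only an illustration of the format: `format_edge_label` and `format_node_label` are hypothetical names, not functions in `scripts/gen_rack.py`, where the same logic is meant to live inline in `render_network`.

```python
from typing import Optional


def format_edge_label(local: str, peer_port: int,
                      speed_gbps: Optional[int] = None) -> str:
    """Hypothetical helper: '{local} → p{peer_port}', plus ' · {speed}G' if set."""
    label = f"{local} → p{peer_port}"
    if speed_gbps is not None:
        label += f" · {speed_gbps}G"
    return label


def format_node_label(name: str, kind: Optional[str] = None) -> str:
    """Hypothetical helper: switches and patch panels get a kind subtitle."""
    if kind in ("switch", "patch-panel"):
        return f"{name}<br/>{kind}"
    return name


print(format_edge_label("eth0", 1, 1))      # eth0 → p1 · 1G
print(format_edge_label("eth0", 1))         # eth0 → p1
print(format_node_label("sw01", "switch"))  # sw01<br/>switch
print(format_node_label("mf00", "server"))  # mf00
```
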
---
### Task 1: `load_hardware_index` + `validate_links` — rule 4 (TDD)
Add the global peer index and link validation, and wire `validate_links` into `generate`. Testable on validation alone.
**Files:**
- Modify: `scripts/gen_rack.py` (add `load_hardware_index`, `validate_links`; build the index and call `validate_links` in `generate`)
- Modify: `tests/test_gen_rack.py` (append tests)
**Interfaces:**
- Consumes: `SchemaError`, `parse_frontmatter`, the `item()`/`_write_item` test helpers, `generate`.
- Produces:
  - `load_hardware_index(hardware_dir: Path) -> dict[str, dict]`: returns `{hostname: frontmatter}` for every `*.md` (excluding `index.md`).
  - `validate_links(items: list[dict], hw_index: dict[str, dict]) -> None`: raises `SchemaError` on a malformed or dangling link.
- [ ] **Step 1: Append failing tests to `tests/test_gen_rack.py`**
```python
def test_load_hardware_index_maps_all_hostnames(tmp_path):
    hw = tmp_path / "hardware"
    hw.mkdir()
    _write_item(
        hw, "sw01",
        "---\nhostname: sw01\nkind: switch\nstatus: in-use\nports: 24\n---\n",
    )
    _write_item(
        hw, "mf00",
        "---\nhostname: mf00\nkind: server\nstatus: in-use\n"
        "rack: rack01\nrack_u: 1\nu_height: 1\nrack_face: front\n---\n",
    )
    idx = gen_rack.load_hardware_index(hw)
    assert set(idx) == {"sw01", "mf00"}
    assert idx["sw01"]["ports"] == 24


def test_validate_links_accepts_valid_link():
    items = [item(hostname="mf01", rack_u=2, u_height=1, rack_face="front",
                  links=[{"local": "eth0", "peer": "sw01",
                          "peer_port": 1, "speed_gbps": 1}])]
    hw_index = {"sw01": item(hostname="sw01", kind="switch", ports=24)}
    gen_rack.validate_links(items, hw_index)


def test_validate_links_rejects_unknown_peer():
    items = [item(hostname="mf01", rack_u=2, u_height=1, rack_face="front",
                  links=[{"local": "eth0", "peer": "ghost", "peer_port": 1}])]
    with pytest.raises(gen_rack.SchemaError):
        gen_rack.validate_links(items, {})


def test_validate_links_rejects_peer_port_over_count():
    items = [item(hostname="mf01", rack_u=2, u_height=1, rack_face="front",
                  links=[{"local": "eth0", "peer": "sw01", "peer_port": 25}])]
    hw_index = {"sw01": item(hostname="sw01", kind="switch", ports=24)}
    with pytest.raises(gen_rack.SchemaError):
        gen_rack.validate_links(items, hw_index)


def test_validate_links_accepts_peer_without_ports():
    items = [item(hostname="mf01", rack_u=2, u_height=1, rack_face="front",
                  links=[{"local": "eth0", "peer": "rtr01", "peer_port": 99}])]
    hw_index = {"rtr01": item(hostname="rtr01", kind="server")}
    gen_rack.validate_links(items, hw_index)  # no ports -> range check skipped


def test_validate_links_rejects_missing_local():
    items = [item(hostname="mf01", rack_u=2, u_height=1, rack_face="front",
                  links=[{"peer": "sw01", "peer_port": 1}])]
    hw_index = {"sw01": item(hostname="sw01", kind="switch", ports=24)}
    with pytest.raises(gen_rack.SchemaError):
        gen_rack.validate_links(items, hw_index)


def test_validate_links_rejects_malformed_entry():
    items = [item(hostname="mf01", rack_u=2, u_height=1, rack_face="front",
                  links=["sw01"])]
    with pytest.raises(gen_rack.SchemaError):
        gen_rack.validate_links(items, {})


def test_generate_returns_1_on_bad_link_peer(tmp_path):
    hw = tmp_path / "hardware"
    out = tmp_path / "out"
    hw.mkdir()
    _write_item(
        hw, "mf00",
        "---\nhostname: mf00\nkind: server\nstatus: in-use\n"
        "rack: rack01\nrack_u: 1\nu_height: 1\nrack_face: front\n"
        "links:\n - { local: eth0, peer: ghost, peer_port: 1 }\n---\n",
    )
    rc = gen_rack.generate(hw, out)
    assert rc == 1
    assert not (out / "rack01.md").exists()
```
- [ ] **Step 2: Run to verify failure**
Run: `pytest tests/test_gen_rack.py -q`
Expected: FAIL — `AttributeError: module 'gen_rack' has no attribute 'load_hardware_index'`.
- [ ] **Step 3: Add `load_hardware_index` and `validate_links` after `check_overlaps` in `scripts/gen_rack.py`**
Add these two functions (place them just after `check_overlaps`, before `_pdu_index`):
```python
def load_hardware_index(hardware_dir: Path) -> dict[str, dict]:
    """Map hostname -> frontmatter for every hardware file (global peer lookup)."""
    index: dict[str, dict] = {}
    for path in sorted(hardware_dir.glob("*.md")):
        if path.name == "index.md":
            continue
        fm = parse_frontmatter(path)
        if fm is None:
            continue
        name = fm.get("hostname")
        if isinstance(name, str) and name:
            index[name] = fm
    return index


def validate_links(items: list[dict], hw_index: dict[str, dict]) -> None:
    """Validate `links` cable declarations (rule 4).

    Every links[].peer must resolve to a real hardware file (global lookup via
    hw_index); peer_port must fall within the peer's declared `ports` when it
    declares an integer count.
    """
    for fm in items:
        links = fm.get("links")
        if links is None:
            continue
        name = fm.get("hostname", "?")
        if not isinstance(links, list):
            raise SchemaError(f"{name}: links must be a list")
        for link in links:
            if not isinstance(link, dict):
                raise SchemaError(f"{name}: links entry must be a mapping")
            local = link.get("local")
            peer = link.get("peer")
            peer_port = link.get("peer_port")
            if not isinstance(local, str) or not local:
                raise SchemaError(f"{name}: links entry needs a non-empty 'local'")
            if not isinstance(peer, str) or not peer:
                raise SchemaError(f"{name}: links entry needs a non-empty 'peer'")
            if not isinstance(peer_port, int):
                raise SchemaError(
                    f"{name}: links entry for {peer} needs an integer 'peer_port'"
                )
            target = hw_index.get(peer)
            if target is None:
                raise SchemaError(
                    f"{name}: links peer={peer!r} is not a known hardware file"
                )
            ports = target.get("ports")
            if isinstance(ports, int) and (peer_port < 1 or peer_port > ports):
                raise SchemaError(
                    f"{name}: peer_port {peer_port} out of range 1..{ports} on {peer}"
                )
```
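One caveat worth knowing about the `isinstance(peer_port, int)` check above: in Python, `bool` is a subclass of `int`, so a frontmatter value such as `peer_port: true` would pass the type check and be treated as port 1. Whether that matters for this plan is a judgement call; if a stricter check is wanted, it could look like the sketch below. `is_port_number` is a hypothetical helper, not part of `gen_rack.py` or the spec.

```python
def is_port_number(value: object) -> bool:
    """Hypothetical helper: accept real integers, reject booleans."""
    # bool is a subclass of int, so it has to be excluded explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


print(isinstance(True, int))   # True: why the plain isinstance check lets booleans in
print(is_port_number(True))    # False
print(is_port_number(24))      # True
print(is_port_number("24"))    # False
```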
- [ ] **Step 4: Wire `validate_links` into `generate` in `scripts/gen_rack.py`**
`generate` currently begins:
```python
def generate(hardware_dir: Path, output_dir: Path) -> int:
    items = load_rack_items(hardware_dir)
    errors: list[str] = []
```
Add the global index right after `items` is loaded:
```python
def generate(hardware_dir: Path, output_dir: Path) -> int:
    items = load_rack_items(hardware_dir)
    hw_index = load_hardware_index(hardware_dir)
    errors: list[str] = []
```
Then extend the per-rack validation loop. Replace:
```python
    if not errors: # only check overlaps once placements are individually valid
        for rack, ritems in racks.items():
            try:
                check_overlaps(ritems)
                validate_power(ritems)
            except SchemaError as e:
                errors.append(f"{rack}: {e}")
```
with:
```python
    if not errors: # only check overlaps once placements are individually valid
        for rack, ritems in racks.items():
            try:
                check_overlaps(ritems)
                validate_power(ritems)
                validate_links(ritems, hw_index)
            except SchemaError as e:
                errors.append(f"{rack}: {e}")
```
- [ ] **Step 5: Run to verify pass**
Run: `pytest tests/test_gen_rack.py -q`
Expected: PASS (all prior tests + 8 new).
- [ ] **Step 6: Commit**
```bash
git add scripts/gen_rack.py tests/test_gen_rack.py
git commit -m "feat(rack): validate network links against peer files and ports"
```
---
### Task 2: `render_network` + page section (TDD)
**Files:**
- Modify: `scripts/gen_rack.py` (add `render_network`; edit `render_page`)
- Modify: `tests/test_gen_rack.py` (append tests)
**Interfaces:**
- Consumes: `_node_id` (Phase 2), `render_page`, `generate`.
- Produces: `render_network(rack: str, items: list[dict]) -> str` — a fenced `mermaid` `flowchart LR` ending in a newline, or `""` when no item has a `links` feed.
- [ ] **Step 1: Append failing tests to `tests/test_gen_rack.py`**
```python
def test_render_network_has_nodes_and_edge_labels():
    items = [
        item(hostname="sw01", kind="switch", rack_u=10, u_height=1,
             rack_face="front", ports=24),
        item(hostname="mf00", rack_u=1, u_height=1, rack_face="front",
             links=[{"local": "eth0", "peer": "sw01",
                     "peer_port": 1, "speed_gbps": 1}]),
    ]
    out = gen_rack.render_network("rack01", items)
    assert "```mermaid" in out
    assert "flowchart LR" in out
    assert "sw01<br/>switch" in out
    assert "mf00" in out
    assert "eth0" in out
    assert "p1" in out
    assert "1G" in out


def test_render_network_patch_panel_subtitle():
    items = [
        item(hostname="pp01", kind="patch-panel", rack_u=24, u_height=1,
             rack_face="front", ports=24),
        item(hostname="mf01", rack_u=2, u_height=1, rack_face="front",
             links=[{"local": "eth0", "peer": "pp01",
                     "peer_port": 1, "speed_gbps": 1}]),
    ]
    out = gen_rack.render_network("rack01", items)
    assert "pp01<br/>patch-panel" in out


def test_render_network_empty_when_no_links():
    items = [item(hostname="mf00", rack_u=1, u_height=1, rack_face="front")]
    assert gen_rack.render_network("rack01", items) == ""


def test_render_network_omits_speed_when_absent():
    items = [
        item(hostname="sw01", kind="switch", rack_u=10, u_height=1,
             rack_face="front", ports=24),
        item(hostname="mf00", rack_u=1, u_height=1, rack_face="front",
             links=[{"local": "eth0", "peer": "sw01", "peer_port": 1}]),
    ]
    out = gen_rack.render_network("rack01", items)
    assert "eth0" in out and "p1" in out
    assert "·" not in out  # no speed suffix rendered


def test_render_network_is_deterministic():
    a = item(hostname="sw01", kind="switch", rack_u=10, u_height=1,
             rack_face="front", ports=24)
    b = item(hostname="mf01", rack_u=2, u_height=1, rack_face="front",
             links=[{"local": "eth0", "peer": "sw01",
                     "peer_port": 2, "speed_gbps": 1}])
    c = item(hostname="mf00", rack_u=1, u_height=1, rack_face="front",
             links=[{"local": "eth0", "peer": "sw01",
                     "peer_port": 1, "speed_gbps": 1}])
    assert gen_rack.render_network("rack01", [a, b, c]) == \
        gen_rack.render_network("rack01", [c, b, a])


def test_generate_includes_network_section(tmp_path):
    hw = tmp_path / "hardware"
    out = tmp_path / "out"
    hw.mkdir()
    _write_item(
        hw, "sw01",
        "---\nhostname: sw01\nkind: switch\nstatus: in-use\n"
        "rack: rack01\nrack_u: 10\nu_height: 1\nrack_face: front\nports: 24\n---\n",
    )
    _write_item(
        hw, "mf00",
        "---\nhostname: mf00\nkind: server\nstatus: in-use\n"
        "rack: rack01\nrack_u: 1\nu_height: 1\nrack_face: front\n"
        "links:\n - { local: eth0, peer: sw01, peer_port: 1, speed_gbps: 1 }\n---\n",
    )
    rc = gen_rack.generate(hw, out)
    assert rc == 0
    page = (out / "rack01.md").read_text()
    assert "## Network" in page
    assert "```mermaid" in page
    assert "eth0" in page
```
- [ ] **Step 2: Run to verify failure**
Run: `pytest tests/test_gen_rack.py -q`
Expected: FAIL — `AttributeError: module 'gen_rack' has no attribute 'render_network'`.
- [ ] **Step 3: Add `render_network` after `render_power` in `scripts/gen_rack.py`**
```python
def render_network(rack: str, items: list[dict]) -> str:
    """Return a mermaid network-cabling flowchart, or '' if no links.

    Assumes `validate_links` has already passed: every link has a non-empty
    `local`/`peer` and an integer `peer_port`, and `peer` resolves to a real
    hardware file. `generate` validates before any render call.
    """
    linked = [fm for fm in items if fm.get("links")]
    if not linked:
        return ""
    by_host = {fm.get("hostname"): fm for fm in items}
    edges: list[tuple[str, str, str, int, object]] = []
    nodes: set[str] = set()
    for fm in linked:
        source = fm.get("hostname", "?")
        nodes.add(source)
        for link in fm["links"]:
            peer = link["peer"]
            nodes.add(peer)
            edges.append(
                (source, link["local"], peer, link["peer_port"],
                 link.get("speed_gbps"))
            )
    # Sort on everything except speed so output is independent of input order.
    edges.sort(key=lambda e: (e[0], e[1], e[2], e[3]))

    def node_label(name: str) -> str:
        fm = by_host.get(name)
        kind = fm.get("kind") if fm else None
        if kind in ("switch", "patch-panel"):
            return f"{name}<br/>{kind}"
        return name

    lines: list[str] = ["```mermaid", "flowchart LR"]
    for name in sorted(nodes):
        lines.append(f' {_node_id(name)}["{node_label(name)}"]')
    for source, local, peer, peer_port, speed in edges:
        label = f"{local} → p{peer_port}"
        if speed is not None:
            label += f" · {speed}G"
        lines.append(f" {_node_id(source)} -->|{label}| {_node_id(peer)}")
    lines.append("```")
    return "\n".join(lines) + "\n"
```
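The determinism requirement rests on sorting nodes and edges before emitting them. A minimal standalone sketch of that idea is shown here, using plain dicts in place of parsed frontmatter; `collect_edges` is a hypothetical name used only for illustration and is not part of `gen_rack.py`.

```python
def collect_edges(items: list[dict]) -> list[tuple[str, str, str, int]]:
    """Flatten every item's links into sorted (source, local, peer, port) tuples."""
    edges = [
        (fm["hostname"], link["local"], link["peer"], link["peer_port"])
        for fm in items
        for link in fm.get("links", [])
    ]
    # Sorting removes any dependence on the order in which files were loaded.
    return sorted(edges)


sw01 = {"hostname": "sw01", "kind": "switch", "ports": 24}
mf00 = {"hostname": "mf00",
        "links": [{"local": "eth0", "peer": "sw01", "peer_port": 1}]}
mf01 = {"hostname": "mf01",
        "links": [{"local": "eth0", "peer": "sw01", "peer_port": 2}]}

# Same edges regardless of input order.
print(collect_edges([sw01, mf01, mf00]) == collect_edges([mf00, mf01, sw01]))  # True
```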
- [ ] **Step 4: Insert the `## Network` section in `render_page` in `scripts/gen_rack.py`**
`render_page` currently has this block (the Power section followed directly by Occupancy):
```python
    power = render_power(rack, items)
    if power:
        lines.append("## Power")
        lines.append("")
        lines.append(power.rstrip())
        lines.append("")
    lines.append("## Occupancy")
```
Insert the Network section between the Power block and the Occupancy line:
```python
    power = render_power(rack, items)
    if power:
        lines.append("## Power")
        lines.append("")
        lines.append(power.rstrip())
        lines.append("")
    network = render_network(rack, items)
    if network:
        lines.append("## Network")
        lines.append("")
        lines.append(network.rstrip())
        lines.append("")
    lines.append("## Occupancy")
```
- [ ] **Step 5: Run to verify pass**
Run: `pytest tests/test_gen_rack.py -q`
Expected: PASS (all prior tests + 6 new).
- [ ] **Step 6: Commit**
```bash
git add scripts/gen_rack.py tests/test_gen_rack.py
git commit -m "feat(rack): render mermaid network graph into the rack page"
```
---
### Task 3: Populate provisional network data, regenerate
**Files:**
- Create: `docs/hardware/sw01.md`, `docs/hardware/pp01.md`
- Modify: `docs/hardware/mf00.md`..`mf04.md` (add `links:`)
- Regenerate: `docs/hardware/index.md`, `docs/infrastructure/racks/rack01.md`, `docs/infrastructure/racks/rack01-elevation.svg`
**Interfaces:**
- Consumes: `python3 scripts/gen_rack.py` / `make docs-index`, `mkdocs build --strict`, `make docs-check`.
> **Operator note — provisional data.** The switch/patch-panel placements and the cable assignments below are placeholders proving the feature, matching the existing fictional mfNN positions and Phase 2 power data. Replace with real values when known; `validate_links` rejects dangling peers and over-count ports loudly. sw01/pp01 deliberately get no `power:` feeds in this phase.
- [ ] **Step 1: Create the switch and patch-panel files**
Create `docs/hardware/sw01.md`:
```markdown
---
hostname: sw01
kind: switch
status: in-use
rack: rack01
rack_u: 10
u_height: 1
rack_face: front
ports: 24
---
## Notes
- Provisional placeholder switch. Port assignments are not yet real.
```
Create `docs/hardware/pp01.md`:
```markdown
---
hostname: pp01
kind: patch-panel
status: in-use
rack: rack01
rack_u: 24
u_height: 1
rack_face: front
ports: 24
links:
- { local: uplink, peer: sw01, peer_port: 24, speed_gbps: 1 }
---
## Notes
- Provisional placeholder patch panel. Devices patch in here; rear uplink to sw01.
```
- [ ] **Step 2: Add `links:` to the five host files**
These files already carry rack-placement and `power:` frontmatter. ADD a `links:` block to each (before the closing `---`); do not remove anything.
In `docs/hardware/mf00.md` add:
```yaml
links:
- { local: eth0, peer: sw01, peer_port: 1, speed_gbps: 1 }
```
In `docs/hardware/mf01.md` add:
```yaml
links:
- { local: eth0, peer: pp01, peer_port: 1, speed_gbps: 1 }
```
In `docs/hardware/mf02.md` add:
```yaml
links:
- { local: eth0, peer: pp01, peer_port: 2, speed_gbps: 1 }
```
In `docs/hardware/mf03.md` add:
```yaml
links:
- { local: eth0, peer: pp01, peer_port: 3, speed_gbps: 1 }
```
In `docs/hardware/mf04.md` add:
```yaml
links:
- { local: eth0, peer: pp01, peer_port: 4, speed_gbps: 1 }
```
- [ ] **Step 3: Regenerate all indices and rack artifacts**
Run: `make docs-index`
Expected: `gen_overview.py` rewrites `docs/hardware/index.md` (now listing sw01 under "Switches" and pp01 under "Patch panels"); `gen_rack.py` prints `Wrote rack01.md + rack01-elevation.svg (9 item(s))`.
- [ ] **Step 4: Confirm the generated page has a network graph and the new boxes**
Run: `grep -c "→ p" docs/infrastructure/racks/rack01.md`
Expected: `6` (one network edge per link: mf00→sw01, mf01..mf04→pp01, pp01→sw01).
Run: `grep -q "sw01" docs/infrastructure/racks/rack01-elevation.svg && grep -q "pp01" docs/infrastructure/racks/rack01-elevation.svg && echo OK`
Expected: `OK` (switch and patch-panel drawn as boxes in the elevation).
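
The expected count of `6` in the first check follows directly from the provisional data in Steps 1 and 2. As a quick sanity check, the tally can be reproduced with the sketch below; the tuple layout and the name `PROVISIONAL_LINKS` are illustrative only, and the label format is the one given in Global Constraints.

```python
# Provisional cabling from Task 3 as (source, local, peer, peer_port) tuples.
PROVISIONAL_LINKS = [
    ("mf00", "eth0", "sw01", 1),
    ("mf01", "eth0", "pp01", 1),
    ("mf02", "eth0", "pp01", 2),
    ("mf03", "eth0", "pp01", 3),
    ("mf04", "eth0", "pp01", 4),
    ("pp01", "uplink", "sw01", 24),
]

# Each link becomes one edge line whose label contains "→ p<port>",
# which is the pattern the grep counts.
edge_labels = [f"{local} → p{port}" for _, local, _, port in PROVISIONAL_LINKS]
matching = sum("→ p" in label for label in edge_labels)
print(matching)  # 6
```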
- [ ] **Step 5: Run the full test suite**
Run: `make test`
Expected: PASS (all tests).
- [ ] **Step 6: Build the site strictly**
Run: `mkdocs build --strict` (if `mkdocs` is not on PATH, use `python3 -m mkdocs build --strict`)
Expected: build succeeds with no warnings-as-errors.
Verify: `grep -c "mermaid" site/infrastructure/racks/rack01/index.html`
Expected: `≥ 2` (a power block and a network block both render as mermaid diagrams).
- [ ] **Step 7: Confirm the drift guard is satisfied**
Run: `make docs-check`
Expected: exit 0 — committed artifacts match a fresh regeneration.
- [ ] **Step 8: Commit**
```bash
git add docs/hardware/ docs/infrastructure/racks/
git commit -m "feat(rack): populate provisional network topology (sw01, pp01, links)"
```
---
## Self-Review
**Spec coverage (`2026-06-24-rack-network-design.md`):**
- `links:` frontmatter on devices/peers — Task 3 (populate); validated Task 1. ✔
- Switch + patch-panel peer files (`ports`, placed 1U front) — Task 3; appear via Phase 1 SVG + gen_overview, no new code. ✔
- Validation rule 4 (peer resolves to a real file globally; peer_port within `ports` when declared; malformed/missing fields) — Task 1 (`validate_links` + `load_hardware_index`), wired into `generate`. ✔
- Global peer resolution (not per-rack) — Task 1 (`load_hardware_index` over all files; `generate` passes `hw_index`). ✔
- Mermaid network graph, full edge label (local → port · speed), kind subtitle for switch/patch-panel, omit-when-empty, deterministic — Task 2 (`render_network`), inserted in `render_page` between Power and Occupancy. ✔
- Node-id sanitization reused (`_node_id`) — Task 2. ✔
- Speed omitted when absent; unicode `→` — Task 2 (label build), tested. ✔
- No mkdocs/Makefile/CI/overview_config changes — honored (Global Constraints); drift covered by existing `racks/` diff — Task 3 Steps 3/7. ✔
- Provisional data (mf01–mf04 → pp01 ports 1–4; pp01 uplink → sw01:24; mf00 → sw01:1) — Task 3 Steps 1–2. ✔

**Placeholder scan:** No "TBD"/"handle edge cases"/"similar to Task N". The only operator-judgement item is provisional network values, explicitly bounded and guarded by `validate_links`.

**Type consistency:** `load_hardware_index` → `dict[str, dict]`; `validate_links(items, hw_index)`/`check_overlaps`/`validate_power` → `None` (raise `SchemaError`); `render_network`/`render_power`/`render_page`/`_node_id` → `str`; `generate` → `int` (0/1). `validate_links(ritems, hw_index)` is called per-rack alongside `check_overlaps`/`validate_power`, with `hw_index` built once at the top of `generate`. `render_network` consumes `_node_id` and feeds `render_page`. Names match across tasks and tests.