# Rack Network (Phase 3) Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Add network-cabling data (`links:` feeds + switch/patch-panel peer files) to the rack pipeline, validate it (rule 4), and render a mermaid network graph on the generated rack page, reusing every Phase 1/2 mechanism.

**Architecture:** Extend the existing `scripts/gen_rack.py` with `load_hardware_index` (global hostname→frontmatter map for peer resolution), `validate_links` (rule 4), and `render_network` (a `flowchart LR` with local interface, peer port, and speed on each edge label); insert a `## Network` section into `render_page` between Power and Occupancy. Switch/patch-panel files are normal placed items that Phase 1 already draws and `gen_overview.py` already lists. Mermaid is already enabled.

**Tech Stack:** Python 3 (stdlib + PyYAML only), pytest, MkDocs Material, Forgejo Actions CI.

**Spec:** `notes/dev/specs/2026-06-24-rack-network-design.md`.

## Global Constraints
- Scripts use **stdlib + PyYAML only**; deterministic and offline (copy existing `gen_rack.py` style). No randomness/time in generated output.
- `re` and `yaml` are already imported in `scripts/gen_rack.py`; do not add new imports.
- `_node_id` (Phase 2) is reused for mermaid node ids — do not redefine it.
- Validation failures raise `SchemaError`; `generate` prints `ERROR: …` to stderr and returns `1`, **writing nothing** on failure (existing behaviour).
- Generated files keep the existing `_Auto-generated … do not edit by hand_` banner (already emitted by `render_page`).
- **Peer resolution is global** (against all `docs/hardware/*.md` hostnames), not per-rack — rule 4 says "resolves to a real file".
- `peer_port` range is checked **only when the peer declares an integer `ports`**.
- Edge label format: `{local} → p{peer_port} · {speed}G`, with the ` · {speed}G` suffix omitted when `speed_gbps` is absent. Use the unicode arrow `→` (not `->`) to avoid clashing with mermaid's `-->` syntax.
- A node whose kind is `switch` or `patch-panel` renders as `{name}<br/>{kind}`; all other nodes render as the bare hostname.
- Network data added here is **provisional placeholder data** (like the mfNN positions and the Phase 2 power data), not real values.
- **No edits** to `mkdocs.yml`, `Makefile`, `.forgejo/workflows/docs.yml`, or `scripts/overview_config.yml` (`switch`/`patch-panel`/`ap` already in the enum; drift already covers `racks/`).
- `mkdocs build --strict` must pass; `make docs-check` must exit 0 after regeneration.
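
The edge-label and node-label rules above can be checked against a small sketch. The helper names `edge_label` and `node_label` are illustrative assumptions rather than functions in `gen_rack.py`; the plan builds the same strings inline inside `render_network`.

```python
from typing import Optional


def edge_label(local: str, peer_port: int, speed_gbps: Optional[int] = None) -> str:
    """Build `{local} → p{peer_port}`, adding ` · {speed}G` only when a speed is given."""
    label = f"{local} → p{peer_port}"
    if speed_gbps is not None:
        label += f" · {speed_gbps}G"
    return label


def node_label(name: str, kind: Optional[str]) -> str:
    """Switches and patch panels get a kind subtitle; other nodes are the bare hostname."""
    if kind in ("switch", "patch-panel"):
        return f"{name}<br/>{kind}"
    return name


print(edge_label("eth0", 1, 1))           # eth0 → p1 · 1G
print(edge_label("uplink", 24))           # uplink → p24
print(node_label("pp01", "patch-panel"))  # pp01<br/>patch-panel
print(node_label("mf00", "server"))       # mf00
```

Because the unicode arrow never contains `-->`, the label cannot be confused with the mermaid edge operator that surrounds it.
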
---
### Task 1: `load_hardware_index` + `validate_links` — rule 4 (TDD)
Add the global peer index and link validation, and wire `validate_links` into `generate`. Testable on validation alone.
**Files:**
- Modify: `scripts/gen_rack.py` (add `load_hardware_index`, `validate_links`; build the index and call `validate_links` in `generate`)
- Modify: `tests/test_gen_rack.py` (append tests)
**Interfaces:**
- Consumes: `SchemaError`, `parse_frontmatter`, the `item()`/`_write_item` test helpers, `generate`.
- Produces:
  - `load_hardware_index(hardware_dir: Path) -> dict[str, dict]`: `{hostname: frontmatter}` for every `*.md` (excluding `index.md`).
  - `validate_links(items: list[dict], hw_index: dict[str, dict]) -> None`: raises `SchemaError` on a malformed or dangling link.
- [ ] **Step 1: Append failing tests to `tests/test_gen_rack.py`**
```python
def test_load_hardware_index_maps_all_hostnames(tmp_path):
    hw = tmp_path / "hardware"
    hw.mkdir()
    _write_item(
        hw, "sw01",
        "---\nhostname: sw01\nkind: switch\nstatus: in-use\nports: 24\n---\n",
    )
    _write_item(
        hw, "mf00",
        "---\nhostname: mf00\nkind: server\nstatus: in-use\n"
        "rack: rack01\nrack_u: 1\nu_height: 1\nrack_face: front\n---\n",
    )
    idx = gen_rack.load_hardware_index(hw)
    assert set(idx) == {"sw01", "mf00"}
    assert idx["sw01"]["ports"] == 24


def test_validate_links_accepts_valid_link():
    items = [item(hostname="mf01", rack_u=2, u_height=1, rack_face="front",
                  links=[{"local": "eth0", "peer": "sw01",
                          "peer_port": 1, "speed_gbps": 1}])]
    hw_index = {"sw01": item(hostname="sw01", kind="switch", ports=24)}
    gen_rack.validate_links(items, hw_index)


def test_validate_links_rejects_unknown_peer():
    items = [item(hostname="mf01", rack_u=2, u_height=1, rack_face="front",
                  links=[{"local": "eth0", "peer": "ghost", "peer_port": 1}])]
    with pytest.raises(gen_rack.SchemaError):
        gen_rack.validate_links(items, {})


def test_validate_links_rejects_peer_port_over_count():
    items = [item(hostname="mf01", rack_u=2, u_height=1, rack_face="front",
                  links=[{"local": "eth0", "peer": "sw01", "peer_port": 25}])]
    hw_index = {"sw01": item(hostname="sw01", kind="switch", ports=24)}
    with pytest.raises(gen_rack.SchemaError):
        gen_rack.validate_links(items, hw_index)


def test_validate_links_accepts_peer_without_ports():
    items = [item(hostname="mf01", rack_u=2, u_height=1, rack_face="front",
                  links=[{"local": "eth0", "peer": "rtr01", "peer_port": 99}])]
    hw_index = {"rtr01": item(hostname="rtr01", kind="server")}
    gen_rack.validate_links(items, hw_index)  # no ports -> range check skipped


def test_validate_links_rejects_missing_local():
    items = [item(hostname="mf01", rack_u=2, u_height=1, rack_face="front",
                  links=[{"peer": "sw01", "peer_port": 1}])]
    hw_index = {"sw01": item(hostname="sw01", kind="switch", ports=24)}
    with pytest.raises(gen_rack.SchemaError):
        gen_rack.validate_links(items, hw_index)


def test_validate_links_rejects_malformed_entry():
    items = [item(hostname="mf01", rack_u=2, u_height=1, rack_face="front",
                  links=["sw01"])]
    with pytest.raises(gen_rack.SchemaError):
        gen_rack.validate_links(items, {})


def test_generate_returns_1_on_bad_link_peer(tmp_path):
    hw = tmp_path / "hardware"
    out = tmp_path / "out"
    hw.mkdir()
    _write_item(
        hw, "mf00",
        "---\nhostname: mf00\nkind: server\nstatus: in-use\n"
        "rack: rack01\nrack_u: 1\nu_height: 1\nrack_face: front\n"
        "links:\n - { local: eth0, peer: ghost, peer_port: 1 }\n---\n",
    )
    rc = gen_rack.generate(hw, out)
    assert rc == 1
    assert not (out / "rack01.md").exists()
```
- [ ] **Step 2: Run to verify failure**
Run: `pytest tests/test_gen_rack.py -q`
Expected: FAIL — `AttributeError: module 'gen_rack' has no attribute 'load_hardware_index'`.
- [ ] **Step 3: Add `load_hardware_index` and `validate_links` after `check_overlaps` in `scripts/gen_rack.py`**
Add these two functions (place them just after `check_overlaps`, before `_pdu_index`):
```python
def load_hardware_index(hardware_dir: Path) -> dict[str, dict]:
    """Map hostname -> frontmatter for every hardware file (global peer lookup)."""
    index: dict[str, dict] = {}
    for path in sorted(hardware_dir.glob("*.md")):
        if path.name == "index.md":
            continue
        fm = parse_frontmatter(path)
        if fm is None:
            continue
        name = fm.get("hostname")
        if isinstance(name, str) and name:
            index[name] = fm
    return index


def validate_links(items: list[dict], hw_index: dict[str, dict]) -> None:
    """Validate `links` cable declarations (rule 4).

    Every links[].peer must resolve to a real hardware file (global lookup via
    hw_index); peer_port must fall within the peer's declared `ports` when it
    declares an integer count.
    """
    for fm in items:
        links = fm.get("links")
        if links is None:
            continue
        name = fm.get("hostname", "?")
        if not isinstance(links, list):
            raise SchemaError(f"{name}: links must be a list")
        for link in links:
            if not isinstance(link, dict):
                raise SchemaError(f"{name}: links entry must be a mapping")
            local = link.get("local")
            peer = link.get("peer")
            peer_port = link.get("peer_port")
            if not isinstance(local, str) or not local:
                raise SchemaError(f"{name}: links entry needs a non-empty 'local'")
            if not isinstance(peer, str) or not peer:
                raise SchemaError(f"{name}: links entry needs a non-empty 'peer'")
            if not isinstance(peer_port, int):
                raise SchemaError(
                    f"{name}: links entry for {peer} needs an integer 'peer_port'"
                )
            target = hw_index.get(peer)
            if target is None:
                raise SchemaError(
                    f"{name}: links peer={peer!r} is not a known hardware file"
                )
            ports = target.get("ports")
            if isinstance(ports, int) and (peer_port < 1 or peer_port > ports):
                raise SchemaError(
                    f"{name}: peer_port {peer_port} out of range 1..{ports} on {peer}"
                )
```
- [ ] **Step 4: Wire `validate_links` into `generate` in `scripts/gen_rack.py`**
`generate` currently begins:
```python
def generate(hardware_dir: Path, output_dir: Path) -> int:
    items = load_rack_items(hardware_dir)
    errors: list[str] = []
```
Add the global index right after `items` is loaded:
```python
def generate(hardware_dir: Path, output_dir: Path) -> int:
    items = load_rack_items(hardware_dir)
    hw_index = load_hardware_index(hardware_dir)
    errors: list[str] = []
```
Then extend the per-rack validation loop. Replace:
```python
    if not errors:  # only check overlaps once placements are individually valid
        for rack, ritems in racks.items():
            try:
                check_overlaps(ritems)
                validate_power(ritems)
            except SchemaError as e:
                errors.append(f"{rack}: {e}")
```
with:
```python
    if not errors:  # only check overlaps once placements are individually valid
        for rack, ritems in racks.items():
            try:
                check_overlaps(ritems)
                validate_power(ritems)
                validate_links(ritems, hw_index)
            except SchemaError as e:
                errors.append(f"{rack}: {e}")
```
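
The wiring above follows a collect-then-bail pattern: each rack contributes at most one message, and nothing is written if any rack failed. A self-contained sketch of that control flow follows. `run_validators`, `finish`, and `reject_ghost` are hypothetical names used only for illustration, and `SchemaError` here is a stand-in for the class already defined in `gen_rack.py`.

```python
import sys


class SchemaError(Exception):
    """Stand-in for the SchemaError already defined in gen_rack.py."""


def run_validators(racks: dict, validators: list) -> list:
    """Run the validators rack by rack, collecting one message per failing rack."""
    errors = []
    for rack, ritems in racks.items():
        try:
            for validate in validators:
                validate(ritems)  # first failure skips the rest for this rack
        except SchemaError as e:
            errors.append(f"{rack}: {e}")
    return errors


def finish(errors: list, write) -> int:
    """Print errors and return 1 without writing, or write and return 0."""
    if errors:
        for message in errors:
            print(f"ERROR: {message}", file=sys.stderr)
        return 1
    write()
    return 0


def reject_ghost(ritems: list) -> None:
    """Toy validator: treat the peer name 'ghost' as unresolvable."""
    for fm in ritems:
        for link in fm.get("links", []):
            if link["peer"] == "ghost":
                raise SchemaError(
                    f"{fm['hostname']}: links peer='ghost' is not a known hardware file"
                )
```

Since the three validators share one `try`, a rack that fails `check_overlaps` never reaches `validate_links`; fixing the first error may therefore surface a second one on the next run.
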
- [ ] **Step 5: Run to verify pass**
Run: `pytest tests/test_gen_rack.py -q`
Expected: PASS (all prior tests + 8 new).
- [ ] **Step 6: Commit**
```bash
git add scripts/gen_rack.py tests/test_gen_rack.py
git commit -m "feat(rack): validate network links against peer files and ports"
```
---
### Task 2: `render_network` + page section (TDD)
**Files:**
- Modify: `scripts/gen_rack.py` (add `render_network`; edit `render_page`)
- Modify: `tests/test_gen_rack.py` (append tests)
**Interfaces:**
- Consumes: `_node_id` (Phase 2), `render_page`, `generate`.
- Produces: `render_network(rack: str, items: list[dict]) -> str` — a fenced `mermaid` `flowchart LR` ending in a newline, or `""` when no item has a `links` feed.
- [ ] **Step 1: Append failing tests to `tests/test_gen_rack.py`**
```python
def test_render_network_has_nodes_and_edge_labels():
    items = [
        item(hostname="sw01", kind="switch", rack_u=10, u_height=1,
             rack_face="front", ports=24),
        item(hostname="mf00", rack_u=1, u_height=1, rack_face="front",
             links=[{"local": "eth0", "peer": "sw01",
                     "peer_port": 1, "speed_gbps": 1}]),
    ]
    out = gen_rack.render_network("rack01", items)
    assert "```mermaid" in out
    assert "flowchart LR" in out
    assert "sw01<br/>switch" in out
    assert "mf00" in out
    assert "eth0" in out
    assert "p1" in out
    assert "1G" in out


def test_render_network_patch_panel_subtitle():
    items = [
        item(hostname="pp01", kind="patch-panel", rack_u=24, u_height=1,
             rack_face="front", ports=24),
        item(hostname="mf01", rack_u=2, u_height=1, rack_face="front",
             links=[{"local": "eth0", "peer": "pp01",
                     "peer_port": 1, "speed_gbps": 1}]),
    ]
    out = gen_rack.render_network("rack01", items)
    assert "pp01<br/>patch-panel" in out


def test_render_network_empty_when_no_links():
    items = [item(hostname="mf00", rack_u=1, u_height=1, rack_face="front")]
    assert gen_rack.render_network("rack01", items) == ""


def test_render_network_omits_speed_when_absent():
    items = [
        item(hostname="sw01", kind="switch", rack_u=10, u_height=1,
             rack_face="front", ports=24),
        item(hostname="mf00", rack_u=1, u_height=1, rack_face="front",
             links=[{"local": "eth0", "peer": "sw01", "peer_port": 1}]),
    ]
    out = gen_rack.render_network("rack01", items)
    assert "eth0" in out and "p1" in out
    assert "·" not in out  # no speed suffix rendered


def test_render_network_is_deterministic():
    a = item(hostname="sw01", kind="switch", rack_u=10, u_height=1,
             rack_face="front", ports=24)
    b = item(hostname="mf01", rack_u=2, u_height=1, rack_face="front",
             links=[{"local": "eth0", "peer": "sw01",
                     "peer_port": 2, "speed_gbps": 1}])
    c = item(hostname="mf00", rack_u=1, u_height=1, rack_face="front",
             links=[{"local": "eth0", "peer": "sw01",
                     "peer_port": 1, "speed_gbps": 1}])
    assert gen_rack.render_network("rack01", [a, b, c]) == \
        gen_rack.render_network("rack01", [c, b, a])


def test_generate_includes_network_section(tmp_path):
    hw = tmp_path / "hardware"
    out = tmp_path / "out"
    hw.mkdir()
    _write_item(
        hw, "sw01",
        "---\nhostname: sw01\nkind: switch\nstatus: in-use\n"
        "rack: rack01\nrack_u: 10\nu_height: 1\nrack_face: front\nports: 24\n---\n",
    )
    _write_item(
        hw, "mf00",
        "---\nhostname: mf00\nkind: server\nstatus: in-use\n"
        "rack: rack01\nrack_u: 1\nu_height: 1\nrack_face: front\n"
        "links:\n - { local: eth0, peer: sw01, peer_port: 1, speed_gbps: 1 }\n---\n",
    )
    rc = gen_rack.generate(hw, out)
    assert rc == 0
    page = (out / "rack01.md").read_text()
    assert "## Network" in page
    assert "```mermaid" in page
    assert "eth0" in page
```
- [ ] **Step 2: Run to verify failure**
Run: `pytest tests/test_gen_rack.py -q`
Expected: FAIL — `AttributeError: module 'gen_rack' has no attribute 'render_network'`.
- [ ] **Step 3: Add `render_network` after `render_power` in `scripts/gen_rack.py`**
```python
def render_network(rack: str, items: list[dict]) -> str:
    """Return a mermaid network-cabling flowchart, or '' if no links.

    Assumes `validate_links` has already passed: every link has a non-empty
    `local`/`peer` and an integer `peer_port`, and `peer` resolves to a real
    hardware file. `generate` validates before any render call.
    """
    linked = [fm for fm in items if fm.get("links")]
    if not linked:
        return ""
    # Only this rack's items are indexed here, so a peer placed elsewhere
    # still gets a node but falls back to the bare-hostname label.
    by_host = {fm.get("hostname"): fm for fm in items}
    edges: list[tuple[str, str, str, int, object]] = []
    nodes: set[str] = set()
    for fm in linked:
        source = fm.get("hostname", "?")
        nodes.add(source)
        for link in fm["links"]:
            peer = link["peer"]
            nodes.add(peer)
            edges.append(
                (source, link["local"], peer, link["peer_port"],
                 link.get("speed_gbps"))
            )
    # Sort on (source, local, peer, peer_port) so output ignores input order.
    edges.sort(key=lambda e: (e[0], e[1], e[2], e[3]))

    def node_label(name: str) -> str:
        fm = by_host.get(name)
        kind = fm.get("kind") if fm else None
        if kind in ("switch", "patch-panel"):
            return f"{name}<br/>{kind}"
        return name

    lines: list[str] = ["```mermaid", "flowchart LR"]
    for name in sorted(nodes):
        lines.append(f' {_node_id(name)}["{node_label(name)}"]')
    for source, local, peer, peer_port, speed in edges:
        label = f"{local} → p{peer_port}"
        if speed is not None:
            label += f" · {speed}G"
        lines.append(f" {_node_id(source)} -->|{label}| {_node_id(peer)}")
    lines.append("```")
    return "\n".join(lines) + "\n"
```
- [ ] **Step 4: Insert the `## Network` section in `render_page` in `scripts/gen_rack.py`**
`render_page` currently has this block (the Power section followed directly by Occupancy):
```python
    power = render_power(rack, items)
    if power:
        lines.append("## Power")
        lines.append("")
        lines.append(power.rstrip())
        lines.append("")
    lines.append("## Occupancy")
```
Insert the Network section between the Power block and the Occupancy line:
```python
    power = render_power(rack, items)
    if power:
        lines.append("## Power")
        lines.append("")
        lines.append(power.rstrip())
        lines.append("")
    network = render_network(rack, items)
    if network:
        lines.append("## Network")
        lines.append("")
        lines.append(network.rstrip())
        lines.append("")
    lines.append("## Occupancy")
```
- [ ] **Step 5: Run to verify pass**
Run: `pytest tests/test_gen_rack.py -q`
Expected: PASS (all prior tests + 6 new).
- [ ] **Step 6: Commit**
```bash
git add scripts/gen_rack.py tests/test_gen_rack.py
git commit -m "feat(rack): render mermaid network graph into the rack page"
```
---
### Task 3: Populate provisional network data, regenerate
**Files:**
- Create: `docs/hardware/sw01.md`, `docs/hardware/pp01.md`
- Modify: `docs/hardware/mf00.md`..`mf04.md` (add `links:`)
- Regenerate: `docs/hardware/index.md`, `docs/infrastructure/racks/rack01.md`, `docs/infrastructure/racks/rack01-elevation.svg`
**Interfaces:**
- Consumes: `python3 scripts/gen_rack.py` / `make docs-index`, `mkdocs build --strict`, `make docs-check`.
> **Operator note — provisional data.** The switch/patch-panel placements and the cable assignments below are placeholders proving the feature, matching the existing fictional mfNN positions and Phase 2 power data. Replace with real values when known; `validate_links` rejects dangling peers and over-count ports loudly. sw01/pp01 deliberately get no `power:` feeds in this phase.
- [ ] **Step 1: Create the switch and patch-panel files**
Create `docs/hardware/sw01.md`:
```markdown
---
hostname: sw01
kind: switch
status: in-use
rack: rack01
rack_u: 10
u_height: 1
rack_face: front
ports: 24
---
## Notes
- Provisional placeholder switch. Port assignments are not yet real.
```
Create `docs/hardware/pp01.md`:
```markdown
---
hostname: pp01
kind: patch-panel
status: in-use
rack: rack01
rack_u: 24
u_height: 1
rack_face: front
ports: 24
links:
- { local: uplink, peer: sw01, peer_port: 24, speed_gbps: 1 }
---
## Notes
- Provisional placeholder patch panel. Devices patch in here; rear uplink to sw01.
```
- [ ] **Step 2: Add `links:` to the five host files**
These files already carry rack-placement and `power:` frontmatter. ADD a `links:` block to each (before the closing `---`); do not remove anything.
In `docs/hardware/mf00.md` add:
```yaml
links:
- { local: eth0, peer: sw01, peer_port: 1, speed_gbps: 1 }
```
In `docs/hardware/mf01.md` add:
```yaml
links:
- { local: eth0, peer: pp01, peer_port: 1, speed_gbps: 1 }
```
In `docs/hardware/mf02.md` add:
```yaml
links:
- { local: eth0, peer: pp01, peer_port: 2, speed_gbps: 1 }
```
In `docs/hardware/mf03.md` add:
```yaml
links:
- { local: eth0, peer: pp01, peer_port: 3, speed_gbps: 1 }
```
In `docs/hardware/mf04.md` add:
```yaml
links:
- { local: eth0, peer: pp01, peer_port: 4, speed_gbps: 1 }
```
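
The provisional cabling from Steps 1 and 2 can be sanity-checked by hand before regenerating. The sketch below restates the six links as plain tuples and checks them against the declared `ports` counts. The double-booking check is an extra illustration and not part of rule 4; the names `LINKS` and `PORTS` are assumptions made for this example.

```python
from collections import Counter

# Provisional cabling from this task: (source, local, peer, peer_port).
LINKS = [
    ("mf00", "eth0", "sw01", 1),
    ("mf01", "eth0", "pp01", 1),
    ("mf02", "eth0", "pp01", 2),
    ("mf03", "eth0", "pp01", 3),
    ("mf04", "eth0", "pp01", 4),
    ("pp01", "uplink", "sw01", 24),
]
PORTS = {"sw01": 24, "pp01": 24}  # declared `ports` on each peer file

edge_count = len(LINKS)
# Rule 4 range check: peer_port must lie in 1..ports for the peer.
out_of_range = [link for link in LINKS if not 1 <= link[3] <= PORTS[link[2]]]
# Extra check (not in rule 4): no peer port is claimed by two cables.
port_use = Counter((peer, port) for _, _, peer, port in LINKS)
double_booked = [key for key, n in port_use.items() if n > 1]

print(edge_count)     # 6
print(out_of_range)   # []
print(double_booked)  # []
```

The edge count of 6 is the same figure the `grep -c` check in Step 4 expects from the generated page.
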
- [ ] **Step 3: Regenerate all indices and rack artifacts**
Run: `make docs-index`
Expected: `gen_overview.py` rewrites `docs/hardware/index.md` (now listing sw01 under "Switches" and pp01 under "Patch panels"); `gen_rack.py` prints `Wrote rack01.md + rack01-elevation.svg (9 item(s))`.
- [ ] **Step 4: Confirm the generated page has a network graph and the new boxes**
Run: `grep -c "→ p" docs/infrastructure/racks/rack01.md`
Expected: `6` (one network edge per link: mf00→sw01, mf01..mf04→pp01, pp01→sw01).
Run: `grep -q "sw01" docs/infrastructure/racks/rack01-elevation.svg && grep -q "pp01" docs/infrastructure/racks/rack01-elevation.svg && echo OK`
Expected: `OK` (switch and patch-panel drawn as boxes in the elevation).
- [ ] **Step 5: Run the full test suite**
Run: `make test`
Expected: PASS (all tests).
- [ ] **Step 6: Build the site strictly**
Run: `mkdocs build --strict` (if `mkdocs` is not on PATH, use `python3 -m mkdocs build --strict`)
Expected: build succeeds with no warnings-as-errors.
Verify: `grep -c "mermaid" site/infrastructure/racks/rack01/index.html`
Expected: `≥ 2` (a power block and a network block both render as mermaid diagrams).
- [ ] **Step 7: Confirm the drift guard is satisfied**
Run: `make docs-check`
Expected: exit 0 — committed artifacts match a fresh regeneration.
- [ ] **Step 8: Commit**
```bash
git add docs/hardware/ docs/infrastructure/racks/
git commit -m "feat(rack): populate provisional network topology (sw01, pp01, links)"
```
---
## Self-Review
**Spec coverage (`2026-06-24-rack-network-design.md`):**
- `links:` frontmatter on devices/peers — Task 3 (populate); validated Task 1. ✔
- Switch + patch-panel peer files (`ports`, placed 1U front) — Task 3; appear via Phase 1 SVG + gen_overview, no new code. ✔
- Validation rule 4 (peer resolves to a real file globally; peer_port within `ports` when declared; malformed/missing fields) — Task 1 (`validate_links` + `load_hardware_index`), wired into `generate`. ✔
- Global peer resolution (not per-rack) — Task 1 (`load_hardware_index` over all files; `generate` passes `hw_index`). ✔
- Mermaid network graph, full edge label (local → port · speed), kind subtitle for switch/patch-panel, omit-when-empty, deterministic — Task 2 (`render_network`), inserted in `render_page` between Power and Occupancy. ✔
- Node-id sanitization reused (`_node_id`) — Task 2. ✔
- Speed omitted when absent; unicode `→` — Task 2 (label build), tested. ✔
- No mkdocs/Makefile/CI/overview_config changes — honored (Global Constraints); drift covered by existing `racks/` diff — Task 3 Steps 3/7. ✔
- Provisional data (mf01..mf04 → pp01:1..4; pp01 uplink → sw01:24; mf00 → sw01:1): Task 3 Steps 1-2. ✔
**Placeholder scan:** No "TBD"/"handle edge cases"/"similar to Task N". The only operator-judgement item is provisional network values, explicitly bounded and guarded by `validate_links`.
**Type consistency:** `load_hardware_index` → `dict[str, dict]`; `validate_links(items, hw_index)`/`check_overlaps`/`validate_power` → `None` (raising `SchemaError` on failure); `render_network`/`render_power`/`render_page`/`_node_id` → `str`; `generate` → `int` (0/1). `validate_links(ritems, hw_index)` is called per-rack alongside `check_overlaps`/`validate_power`, with `hw_index` built once at the top of `generate`. `render_network` consumes `_node_id` and feeds `render_page`. Names match across tasks and tests.