MakerFLOSS/notes/dev/plans/2026-06-24-rack-network.md

Rack Network (Phase 3) Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Add network-cabling data (links: feeds + switch/patch-panel peer files) to the rack pipeline, validate it (rule 4), and render a mermaid network graph on the generated rack page — reusing every Phase 1/2 mechanism.

Architecture: Extend the existing scripts/gen_rack.py with load_hardware_index (global hostname→frontmatter map for peer resolution), validate_links (rule 4), and render_network (a flowchart LR with local interface, peer port, and speed on each edge label); insert a ## Network section into render_page between Power and Occupancy. Switch/patch-panel files are normal placed items that Phase 1 already draws and gen_overview.py already lists. Mermaid is already enabled.

Tech Stack: Python 3 (stdlib + PyYAML only), pytest, MkDocs Material, Forgejo Actions CI.

Spec: notes/dev/specs/2026-06-24-rack-network-design.md.

Global Constraints

  • Scripts use stdlib + PyYAML only; deterministic and offline (copy existing gen_rack.py style). No randomness/time in generated output.
  • re and yaml are already imported in scripts/gen_rack.py; do not add new imports.
  • _node_id (Phase 2) is reused for mermaid node ids — do not redefine it.
  • Validation failures raise SchemaError; generate prints ERROR: … to stderr and returns 1, writing nothing on failure (existing behaviour).
  • Generated files keep the existing _Auto-generated … do not edit by hand_ banner (already emitted by render_page).
  • Peer resolution is global (against all docs/hardware/*.md hostnames), not per-rack — rule 4 says "resolves to a real file".
  • peer_port range is checked only when the peer declares an integer ports.
  • Edge label format: {local} → p{peer_port} · {speed}G, with the · {speed}G suffix omitted when speed_gbps is absent. Use the unicode arrow (not ->) to avoid clashing with mermaid's --> syntax.
  • A node whose kind is switch or patch-panel renders as {name}<br/>{kind}; all other nodes render as the bare hostname.
  • Network data added here is provisional placeholder data (like the mfNN positions and the Phase 2 power data), not real values.
  • No edits to mkdocs.yml, Makefile, .forgejo/workflows/docs.yml, or scripts/overview_config.yml (switch/patch-panel/ap already in the enum; drift already covers racks/).
  • mkdocs build --strict must pass; make docs-check must exit 0 after regeneration.
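
As a minimal sketch of the two label rules in the constraints above (the edge label format and the kind subtitle), the helpers below show the intended strings. The names edge_label and node_label are hypothetical and used only for illustration; in the plan itself this logic lives inside render_network.

```python
def edge_label(local: str, peer_port: int, speed_gbps=None) -> str:
    """Build '{local} → p{peer_port}', adding ' · {speed}G' only when speed is set."""
    label = f"{local} → p{peer_port}"
    if speed_gbps is not None:
        label += f" · {speed_gbps}G"
    return label


def node_label(name: str, kind=None) -> str:
    """Switches and patch panels get a kind subtitle; everything else is bare."""
    if kind in ("switch", "patch-panel"):
        return f"{name}<br/>{kind}"
    return name


print(edge_label("eth0", 1, 1))      # eth0 → p1 · 1G
print(edge_label("uplink", 24))      # uplink → p24
print(node_label("sw01", "switch"))  # sw01<br/>switch
print(node_label("mf00", "server"))  # mf00
```

The unicode arrow keeps the label text free of the ASCII sequence --> that mermaid reserves for edges.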

Task 1: load_hardware_index + validate_links (TDD)

Add the global peer index and link validation, and wire validate_links into generate. This task is testable on validation alone.

Files:

  • Modify: scripts/gen_rack.py (add load_hardware_index, validate_links; build the index and call validate_links in generate)
  • Modify: tests/test_gen_rack.py (append tests)

Interfaces:

  • Consumes: SchemaError, parse_frontmatter, the item()/_write_item test helpers, generate.

  • Produces:

    • load_hardware_index(hardware_dir: Path) -> dict[str, dict]: a {hostname: frontmatter} map for every *.md (excluding index.md).
    • validate_links(items: list[dict], hw_index: dict[str, dict]) -> None — raises SchemaError on a malformed/dangling link.
  • Step 1: Append failing tests to tests/test_gen_rack.py

def test_load_hardware_index_maps_all_hostnames(tmp_path):
    hw = tmp_path / "hardware"
    hw.mkdir()
    _write_item(
        hw, "sw01",
        "---\nhostname: sw01\nkind: switch\nstatus: in-use\nports: 24\n---\n",
    )
    _write_item(
        hw, "mf00",
        "---\nhostname: mf00\nkind: server\nstatus: in-use\n"
        "rack: rack01\nrack_u: 1\nu_height: 1\nrack_face: front\n---\n",
    )
    idx = gen_rack.load_hardware_index(hw)
    assert set(idx) == {"sw01", "mf00"}
    assert idx["sw01"]["ports"] == 24


def test_validate_links_accepts_valid_link():
    items = [item(hostname="mf01", rack_u=2, u_height=1, rack_face="front",
                  links=[{"local": "eth0", "peer": "sw01",
                          "peer_port": 1, "speed_gbps": 1}])]
    hw_index = {"sw01": item(hostname="sw01", kind="switch", ports=24)}
    gen_rack.validate_links(items, hw_index)


def test_validate_links_rejects_unknown_peer():
    items = [item(hostname="mf01", rack_u=2, u_height=1, rack_face="front",
                  links=[{"local": "eth0", "peer": "ghost", "peer_port": 1}])]
    with pytest.raises(gen_rack.SchemaError):
        gen_rack.validate_links(items, {})


def test_validate_links_rejects_peer_port_over_count():
    items = [item(hostname="mf01", rack_u=2, u_height=1, rack_face="front",
                  links=[{"local": "eth0", "peer": "sw01", "peer_port": 25}])]
    hw_index = {"sw01": item(hostname="sw01", kind="switch", ports=24)}
    with pytest.raises(gen_rack.SchemaError):
        gen_rack.validate_links(items, hw_index)


def test_validate_links_accepts_peer_without_ports():
    items = [item(hostname="mf01", rack_u=2, u_height=1, rack_face="front",
                  links=[{"local": "eth0", "peer": "rtr01", "peer_port": 99}])]
    hw_index = {"rtr01": item(hostname="rtr01", kind="server")}
    gen_rack.validate_links(items, hw_index)  # no ports -> range check skipped


def test_validate_links_rejects_missing_local():
    items = [item(hostname="mf01", rack_u=2, u_height=1, rack_face="front",
                  links=[{"peer": "sw01", "peer_port": 1}])]
    hw_index = {"sw01": item(hostname="sw01", kind="switch", ports=24)}
    with pytest.raises(gen_rack.SchemaError):
        gen_rack.validate_links(items, hw_index)


def test_validate_links_rejects_malformed_entry():
    items = [item(hostname="mf01", rack_u=2, u_height=1, rack_face="front",
                  links=["sw01"])]
    with pytest.raises(gen_rack.SchemaError):
        gen_rack.validate_links(items, {})


def test_generate_returns_1_on_bad_link_peer(tmp_path):
    hw = tmp_path / "hardware"
    out = tmp_path / "out"
    hw.mkdir()
    _write_item(
        hw, "mf00",
        "---\nhostname: mf00\nkind: server\nstatus: in-use\n"
        "rack: rack01\nrack_u: 1\nu_height: 1\nrack_face: front\n"
        "links:\n  - { local: eth0, peer: ghost, peer_port: 1 }\n---\n",
    )
    rc = gen_rack.generate(hw, out)
    assert rc == 1
    assert not (out / "rack01.md").exists()
  • Step 2: Run to verify failure

Run: pytest tests/test_gen_rack.py -q
Expected: FAIL with AttributeError: module 'gen_rack' has no attribute 'load_hardware_index'.

  • Step 3: Add load_hardware_index and validate_links after check_overlaps in scripts/gen_rack.py

Add these two functions (place them just after check_overlaps, before _pdu_index):

def load_hardware_index(hardware_dir: Path) -> dict[str, dict]:
    """Map hostname -> frontmatter for every hardware file (global peer lookup)."""
    index: dict[str, dict] = {}
    for path in sorted(hardware_dir.glob("*.md")):
        if path.name == "index.md":
            continue
        fm = parse_frontmatter(path)
        if fm is None:
            continue
        name = fm.get("hostname")
        if isinstance(name, str) and name:
            index[name] = fm
    return index


def validate_links(items: list[dict], hw_index: dict[str, dict]) -> None:
    """Validate `links` cable declarations (rule 4).

    Every links[].peer must resolve to a real hardware file (global lookup via
    hw_index); peer_port must fall within the peer's declared `ports` when it
    declares an integer count.
    """
    for fm in items:
        links = fm.get("links")
        if links is None:
            continue
        name = fm.get("hostname", "?")
        if not isinstance(links, list):
            raise SchemaError(f"{name}: links must be a list")
        for link in links:
            if not isinstance(link, dict):
                raise SchemaError(f"{name}: links entry must be a mapping")
            local = link.get("local")
            peer = link.get("peer")
            peer_port = link.get("peer_port")
            if not isinstance(local, str) or not local:
                raise SchemaError(f"{name}: links entry needs a non-empty 'local'")
            if not isinstance(peer, str) or not peer:
                raise SchemaError(f"{name}: links entry needs a non-empty 'peer'")
            if not isinstance(peer_port, int):
                raise SchemaError(
                    f"{name}: links entry for {peer} needs an integer 'peer_port'"
                )
            target = hw_index.get(peer)
            if target is None:
                raise SchemaError(
                    f"{name}: links peer={peer!r} is not a known hardware file"
                )
            ports = target.get("ports")
            if isinstance(ports, int) and (peer_port < 1 or peer_port > ports):
                raise SchemaError(
                    f"{name}: peer_port {peer_port} out of range 1..{ports} on {peer}"
                )
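
The range rule at the end of validate_links can be read in isolation. The sketch below restates it as a standalone predicate; port_in_range is a hypothetical name used here for illustration and is not part of gen_rack.py.

```python
def port_in_range(peer_port: int, ports) -> bool:
    """Return True when peer_port is acceptable for a peer declaring `ports`.

    Mirrors the plan's rule: the range 1..ports is enforced only when `ports`
    is an integer; any other value (missing, None, a string) skips the check.
    """
    if not isinstance(ports, int):
        return True  # peer declares no integer port count, nothing to check
    return 1 <= peer_port <= ports


print(port_in_range(1, 24))     # True
print(port_in_range(25, 24))    # False
print(port_in_range(0, 24))     # False
print(port_in_range(99, None))  # True
```

The last call corresponds to test_validate_links_accepts_peer_without_ports: a peer without a ports field accepts any integer peer_port.
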
  • Step 4: Wire validate_links into generate in scripts/gen_rack.py

generate currently begins:

def generate(hardware_dir: Path, output_dir: Path) -> int:
    items = load_rack_items(hardware_dir)

    errors: list[str] = []

Add the global index right after items is loaded:

def generate(hardware_dir: Path, output_dir: Path) -> int:
    items = load_rack_items(hardware_dir)
    hw_index = load_hardware_index(hardware_dir)

    errors: list[str] = []

Then extend the per-rack validation loop. Replace:

    if not errors:  # only check overlaps once placements are individually valid
        for rack, ritems in racks.items():
            try:
                check_overlaps(ritems)
                validate_power(ritems)
            except SchemaError as e:
                errors.append(f"{rack}: {e}")

with:

    if not errors:  # only check overlaps once placements are individually valid
        for rack, ritems in racks.items():
            try:
                check_overlaps(ritems)
                validate_power(ritems)
                validate_links(ritems, hw_index)
            except SchemaError as e:
                errors.append(f"{rack}: {e}")
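
To illustrate how this loop behaves on failure, here is a self-contained sketch of the same pattern. SchemaError is a stand-in for the class in gen_rack.py, and collect_rack_errors and reject_ghost_peers are hypothetical names that do not exist in the real script.

```python
class SchemaError(Exception):
    """Stand-in for the SchemaError defined in gen_rack.py."""


def collect_rack_errors(racks: dict, validators: list) -> list[str]:
    """Run every validator per rack; record the first failure of each rack."""
    errors: list[str] = []
    for rack, ritems in racks.items():
        try:
            for validate in validators:
                validate(ritems)
        except SchemaError as e:
            errors.append(f"{rack}: {e}")
    return errors


def reject_ghost_peers(ritems: list) -> None:
    """Toy validator (hypothetical): fail on any link whose peer is 'ghost'."""
    for fm in ritems:
        for link in fm.get("links", []):
            if link["peer"] == "ghost":
                raise SchemaError(
                    f"{fm['hostname']}: links peer={link['peer']!r} "
                    "is not a known hardware file"
                )


racks = {
    "rack01": [{"hostname": "mf00", "links": [{"peer": "ghost"}]}],
    "rack02": [{"hostname": "mf10", "links": [{"peer": "sw01"}]}],
}
errors = collect_rack_errors(racks, [reject_ghost_peers])
print(errors)
```

Because all three checks share one try block, the first SchemaError in a rack skips that rack's remaining validators, while the other racks are still checked.
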
  • Step 5: Run to verify pass

Run: pytest tests/test_gen_rack.py -q
Expected: PASS (all prior tests + 8 new).

  • Step 6: Commit
git add scripts/gen_rack.py tests/test_gen_rack.py
git commit -m "feat(rack): validate network links against peer files and ports"

Task 2: render_network + page section (TDD)

Files:

  • Modify: scripts/gen_rack.py (add render_network; edit render_page)
  • Modify: tests/test_gen_rack.py (append tests)

Interfaces:

  • Consumes: _node_id (Phase 2), render_page, generate.

  • Produces: render_network(rack: str, items: list[dict]) -> str — a fenced mermaid flowchart LR ending in a newline, or "" when no item has a links feed.

  • Step 1: Append failing tests to tests/test_gen_rack.py

def test_render_network_has_nodes_and_edge_labels():
    items = [
        item(hostname="sw01", kind="switch", rack_u=10, u_height=1,
             rack_face="front", ports=24),
        item(hostname="mf00", rack_u=1, u_height=1, rack_face="front",
             links=[{"local": "eth0", "peer": "sw01",
                     "peer_port": 1, "speed_gbps": 1}]),
    ]
    out = gen_rack.render_network("rack01", items)
    assert "```mermaid" in out
    assert "flowchart LR" in out
    assert "sw01<br/>switch" in out
    assert "mf00" in out
    assert "eth0" in out
    assert "p1" in out
    assert "1G" in out


def test_render_network_patch_panel_subtitle():
    items = [
        item(hostname="pp01", kind="patch-panel", rack_u=24, u_height=1,
             rack_face="front", ports=24),
        item(hostname="mf01", rack_u=2, u_height=1, rack_face="front",
             links=[{"local": "eth0", "peer": "pp01",
                     "peer_port": 1, "speed_gbps": 1}]),
    ]
    out = gen_rack.render_network("rack01", items)
    assert "pp01<br/>patch-panel" in out


def test_render_network_empty_when_no_links():
    items = [item(hostname="mf00", rack_u=1, u_height=1, rack_face="front")]
    assert gen_rack.render_network("rack01", items) == ""


def test_render_network_omits_speed_when_absent():
    items = [
        item(hostname="sw01", kind="switch", rack_u=10, u_height=1,
             rack_face="front", ports=24),
        item(hostname="mf00", rack_u=1, u_height=1, rack_face="front",
             links=[{"local": "eth0", "peer": "sw01", "peer_port": 1}]),
    ]
    out = gen_rack.render_network("rack01", items)
    assert "eth0" in out and "p1" in out
    assert "·" not in out  # no speed suffix rendered


def test_render_network_is_deterministic():
    a = item(hostname="sw01", kind="switch", rack_u=10, u_height=1,
             rack_face="front", ports=24)
    b = item(hostname="mf01", rack_u=2, u_height=1, rack_face="front",
             links=[{"local": "eth0", "peer": "sw01",
                     "peer_port": 2, "speed_gbps": 1}])
    c = item(hostname="mf00", rack_u=1, u_height=1, rack_face="front",
             links=[{"local": "eth0", "peer": "sw01",
                     "peer_port": 1, "speed_gbps": 1}])
    assert gen_rack.render_network("rack01", [a, b, c]) == \
        gen_rack.render_network("rack01", [c, b, a])


def test_generate_includes_network_section(tmp_path):
    hw = tmp_path / "hardware"
    out = tmp_path / "out"
    hw.mkdir()
    _write_item(
        hw, "sw01",
        "---\nhostname: sw01\nkind: switch\nstatus: in-use\n"
        "rack: rack01\nrack_u: 10\nu_height: 1\nrack_face: front\nports: 24\n---\n",
    )
    _write_item(
        hw, "mf00",
        "---\nhostname: mf00\nkind: server\nstatus: in-use\n"
        "rack: rack01\nrack_u: 1\nu_height: 1\nrack_face: front\n"
        "links:\n  - { local: eth0, peer: sw01, peer_port: 1, speed_gbps: 1 }\n---\n",
    )
    rc = gen_rack.generate(hw, out)
    assert rc == 0
    page = (out / "rack01.md").read_text()
    assert "## Network" in page
    assert "```mermaid" in page
    assert "eth0" in page
  • Step 2: Run to verify failure

Run: pytest tests/test_gen_rack.py -q
Expected: FAIL with AttributeError: module 'gen_rack' has no attribute 'render_network'.

  • Step 3: Add render_network after render_power in scripts/gen_rack.py
def render_network(rack: str, items: list[dict]) -> str:
    """Return a mermaid network-cabling flowchart, or '' if no links.

    Assumes `validate_links` has already passed: every link has a non-empty
    `local`/`peer` and an integer `peer_port`, and `peer` resolves to a real
    hardware file. `generate` validates before any render call.
    """
    linked = [fm for fm in items if fm.get("links")]
    if not linked:
        return ""

    by_host = {fm.get("hostname"): fm for fm in items}

    edges: list[tuple[str, str, str, int, object]] = []
    nodes: set[str] = set()
    for fm in linked:
        source = fm.get("hostname", "?")
        nodes.add(source)
        for link in fm["links"]:
            peer = link["peer"]
            nodes.add(peer)
            edges.append(
                (source, link["local"], peer, link["peer_port"],
                 link.get("speed_gbps"))
            )
    edges.sort(key=lambda e: (e[0], e[1], e[2], e[3]))

    def node_label(name: str) -> str:
        fm = by_host.get(name)
        kind = fm.get("kind") if fm else None
        if kind in ("switch", "patch-panel"):
            return f"{name}<br/>{kind}"
        return name

    lines: list[str] = ["```mermaid", "flowchart LR"]
    for name in sorted(nodes):
        lines.append(f'    {_node_id(name)}["{node_label(name)}"]')
    for source, local, peer, peer_port, speed in edges:
        label = f"{local} → p{peer_port}"
        if speed is not None:
            label += f" · {speed}G"
        lines.append(f"    {_node_id(source)} -->|{label}| {_node_id(peer)}")
    lines.append("```")
    return "\n".join(lines) + "\n"
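
The determinism test depends on the edge sort in render_network. The following sketch isolates that step using edge tuples in the same shape the function collects; sort_key is a hypothetical helper equivalent to the lambda above.

```python
# Edge tuples in the shape render_network collects:
# (source, local, peer, peer_port, speed_gbps)
edges_a = [
    ("mf01", "eth0", "sw01", 2, 1),
    ("mf00", "eth0", "sw01", 1, 1),
    ("pp01", "uplink", "sw01", 24, None),
]
edges_b = list(reversed(edges_a))  # same cables, different discovery order


def sort_key(edge):
    # Only the first four fields take part in ordering. speed_gbps may be
    # None, and None cannot be compared with an int in Python 3.
    return edge[:4]


assert sorted(edges_a, key=sort_key) == sorted(edges_b, key=sort_key)
print([e[0] for e in sorted(edges_b, key=sort_key)])  # ['mf00', 'mf01', 'pp01']
```

Node lines get the same treatment through sorted(nodes), so the output does not depend on the order in which hardware files are loaded.
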
  • Step 4: Insert the ## Network section in render_page in scripts/gen_rack.py

render_page currently has this block (the Power section followed directly by Occupancy):

    power = render_power(rack, items)
    if power:
        lines.append("## Power")
        lines.append("")
        lines.append(power.rstrip())
        lines.append("")
    lines.append("## Occupancy")

Insert the Network section between the Power block and the Occupancy line:

    power = render_power(rack, items)
    if power:
        lines.append("## Power")
        lines.append("")
        lines.append(power.rstrip())
        lines.append("")
    network = render_network(rack, items)
    if network:
        lines.append("## Network")
        lines.append("")
        lines.append(network.rstrip())
        lines.append("")
    lines.append("## Occupancy")
  • Step 5: Run to verify pass

Run: pytest tests/test_gen_rack.py -q
Expected: PASS (all prior tests + 6 new).

  • Step 6: Commit
git add scripts/gen_rack.py tests/test_gen_rack.py
git commit -m "feat(rack): render mermaid network graph into the rack page"

Task 3: Populate provisional network data, regenerate

Files:

  • Create: docs/hardware/sw01.md, docs/hardware/pp01.md
  • Modify: docs/hardware/mf00.md..mf04.md (add links:)
  • Regenerate: docs/hardware/index.md, docs/infrastructure/racks/rack01.md, docs/infrastructure/racks/rack01-elevation.svg

Interfaces:

  • Consumes: python3 scripts/gen_rack.py / make docs-index, mkdocs build --strict, make docs-check.

Operator note — provisional data. The switch/patch-panel placements and the cable assignments below are placeholders proving the feature, matching the existing fictional mfNN positions and Phase 2 power data. Replace with real values when known; validate_links rejects dangling peers and over-count ports loudly. sw01/pp01 deliberately get no power: feeds in this phase.

  • Step 1: Create the switch and patch-panel files

Create docs/hardware/sw01.md:

---
hostname: sw01
kind: switch
status: in-use
rack: rack01
rack_u: 10
u_height: 1
rack_face: front
ports: 24
---

## Notes

- Provisional placeholder switch. Port assignments are not yet real.

Create docs/hardware/pp01.md:

---
hostname: pp01
kind: patch-panel
status: in-use
rack: rack01
rack_u: 24
u_height: 1
rack_face: front
ports: 24
links:
  - { local: uplink, peer: sw01, peer_port: 24, speed_gbps: 1 }
---

## Notes

- Provisional placeholder patch panel. Devices patch in here; rear uplink to sw01.
  • Step 2: Add links: to the five host files

These files already carry rack-placement and power: frontmatter. ADD a links: block to each (before the closing ---); do not remove anything.

In docs/hardware/mf00.md add:

links:
  - { local: eth0, peer: sw01, peer_port: 1, speed_gbps: 1 }

In docs/hardware/mf01.md add:

links:
  - { local: eth0, peer: pp01, peer_port: 1, speed_gbps: 1 }

In docs/hardware/mf02.md add:

links:
  - { local: eth0, peer: pp01, peer_port: 2, speed_gbps: 1 }

In docs/hardware/mf03.md add:

links:
  - { local: eth0, peer: pp01, peer_port: 3, speed_gbps: 1 }

In docs/hardware/mf04.md add:

links:
  - { local: eth0, peer: pp01, peer_port: 4, speed_gbps: 1 }
  • Step 3: Regenerate all indices and rack artifacts

Run: make docs-index
Expected: gen_overview.py rewrites docs/hardware/index.md (now listing sw01 under "Switches" and pp01 under "Patch panels"); gen_rack.py prints Wrote rack01.md + rack01-elevation.svg (9 item(s)).

  • Step 4: Confirm the generated page has a network graph and the new boxes

Run: grep -c "→ p" docs/infrastructure/racks/rack01.md
Expected: 6 (one network edge per link: mf00→sw01, mf01..mf04→pp01, pp01→sw01).

Run: grep -q "sw01" docs/infrastructure/racks/rack01-elevation.svg && grep -q "pp01" docs/infrastructure/racks/rack01-elevation.svg && echo OK
Expected: OK (switch and patch-panel drawn as boxes in the elevation).
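
The expected edge count of 6 can be cross-checked against the provisional data from Steps 1 and 2. This is a rough sketch that restates the links as plain tuples (local, peer, peer_port); it does not read the real hardware files.

```python
# Provisional data from this task: declared port counts and links per host.
ports = {"sw01": 24, "pp01": 24}
links = {
    "mf00": [("eth0", "sw01", 1)],
    "mf01": [("eth0", "pp01", 1)],
    "mf02": [("eth0", "pp01", 2)],
    "mf03": [("eth0", "pp01", 3)],
    "mf04": [("eth0", "pp01", 4)],
    "pp01": [("uplink", "sw01", 24)],
}

# One rendered edge per declared link.
edge_count = sum(len(host_links) for host_links in links.values())

# Rule 4: every peer_port must fall within 1..ports of its peer.
in_range = all(
    1 <= peer_port <= ports[peer]
    for host_links in links.values()
    for _local, peer, peer_port in host_links
)
print(edge_count, in_range)  # 6 True
```
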

  • Step 5: Run the full test suite

Run: make test
Expected: PASS (all tests).

  • Step 6: Build the site strictly

Run: mkdocs build --strict (if mkdocs is not on PATH, use python3 -m mkdocs build --strict)
Expected: the build succeeds; --strict turns warnings into errors, so none may be emitted.

Verify: grep -c "mermaid" site/infrastructure/racks/rack01/index.html
Expected: ≥ 2 (a power block and a network block both render as mermaid diagrams).

  • Step 7: Confirm the drift guard is satisfied

Run: make docs-check
Expected: exit 0, meaning committed artifacts match a fresh regeneration.
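
For readers unfamiliar with drift guards, the idea can be sketched as "regenerate into a scratch directory and compare bytes". This is only an illustration of the concept: the Makefile is not shown in this plan, so how make docs-check is actually implemented is unknown, and has_drift and its regenerate argument are hypothetical.

```python
import filecmp
import tempfile
from pathlib import Path


def has_drift(committed_dir: Path, regenerate) -> bool:
    """Regenerate into a scratch dir and compare against committed files.

    `regenerate` is any callable that writes fresh artifacts into the
    directory it is given (an assumption made for this sketch).
    """
    with tempfile.TemporaryDirectory() as tmp:
        fresh = Path(tmp)
        regenerate(fresh)
        names = sorted(p.name for p in fresh.iterdir())
        # shallow=False forces a byte-for-byte comparison.
        _match, mismatch, errors = filecmp.cmpfiles(
            committed_dir, fresh, names, shallow=False
        )
        # `errors` holds names missing on one side or unreadable.
        return bool(mismatch or errors)
```

This sketch only checks files that the regeneration produces; it would not notice a stale committed file that is no longer generated.
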

  • Step 8: Commit
git add docs/hardware/ docs/infrastructure/racks/
git commit -m "feat(rack): populate provisional network topology (sw01, pp01, links)"

Self-Review

Spec coverage (2026-06-24-rack-network-design.md):

  • links: frontmatter on devices/peers — Task 3 (populate); validated Task 1. ✔
  • Switch + patch-panel peer files (ports, placed 1U front) — Task 3; appear via Phase 1 SVG + gen_overview, no new code. ✔
  • Validation rule 4 (peer resolves to a real file globally; peer_port within ports when declared; malformed/missing fields) — Task 1 (validate_links + load_hardware_index), wired into generate. ✔
  • Global peer resolution (not per-rack) — Task 1 (load_hardware_index over all files; generate passes hw_index). ✔
  • Mermaid network graph, full edge label (local → port · speed), kind subtitle for switch/patch-panel, omit-when-empty, deterministic — Task 2 (render_network), inserted in render_page between Power and Occupancy. ✔
  • Node-id sanitization reused (_node_id) — Task 2. ✔
  • Speed omitted when absent; unicode arrow in the edge label — Task 2 (label build), tested. ✔
  • No mkdocs/Makefile/CI/overview_config changes — honored (Global Constraints); drift covered by existing racks/ diff — Task 3 Steps 3/7. ✔
  • Provisional data (mf01..mf04 → pp01:1..4; pp01 uplink → sw01:24; mf00 → sw01:1) — Task 3 Steps 1-2. ✔

Placeholder scan: No "TBD"/"handle edge cases"/"similar to Task N". The only operator-judgement item is provisional network values, explicitly bounded and guarded by validate_links.

Type consistency: load_hardware_index → dict[str, dict]; validate_links(items, hw_index)/check_overlaps/validate_power → None (raise SchemaError); render_network/render_power/render_page/_node_id → str; generate → int (0/1). validate_links(ritems, hw_index) is called per-rack alongside check_overlaps/validate_power, with hw_index built once at the top of generate. render_network consumes _node_id and feeds render_page. Names match across tasks and tests.