GCodeOverlay/docs/superpowers/specs/2026-06-08-gcode-overlay-design.md
sjat 08767cf821 Add G-code overlay design spec
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 21:55:06 +02:00

# G-Code Overlay — Design
**Date:** 2026-06-08
**Status:** Approved (pending spec review)
## Purpose
At the makerspace, a fixed camera mounted on the wall films the CNC router table top-down, and the feed is restreamed to monitors around the space. This project is a web app that overlays a G-code toolpath on top of that camera stream, so an operator can:
- **Reality-check a job** — see where the machine will actually cut before/while it runs.
- **Plan fixturing** — see where the tool *won't* go, to decide where screws/clamps can safely be placed.
A user opens a local G-code file in the browser, the toolpath is drawn over the live video, and they align it to the real material.
## Scope
**In scope (v1):**
- Load a local G-code file (no upload; parsed in-browser).
- Draw cutting paths (G1/G2/G3) and rapid/travel moves (G0, styled distinctly) over the video.
- One-time camera→machine calibration (perspective transform), shared via a served config file.
- Per-job alignment of the toolpath to the material (drag+rotate, click-to-set-origin, numeric offset).
**Out of scope (v1), designed to add later:**
- Plunge/screw-placement highlighting.
- Depth cues (color/weight by Z or cut order).
- Broadcasting the overlay to all makerspace monitors (requires shared session state / backend).
- Lens-distortion correction.
- Multi-camera support.
**Audience:** local viewer only. The overlay appears on the screen of whoever loaded the G-code; the makerspace-wide monitors keep showing the existing raw stream. No shared session state.
## Architecture
A pure client-side static single-page app — **TypeScript + Vite + Canvas 2D**, no backend. The page stacks three layers:
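1. **Video layer** — the camera stream, shown in an `<img>` (MJPEG) or `<video>` (HLS/WebRTC) element. Exact format confirmed early in implementation (see Risks / open questions). The overlay never reads the stream's pixels, so the canvas itself creates no CORS dependency and needs no proxy. If the stream turns out to need a script-driven player (for example HLS via Media Source Extensions), that player's own segment requests may still require CORS headers from the stream server.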
2. **Overlay canvas** — a transparent `<canvas>` sized to the video element, on which the toolpaths are drawn.
3. **UI** — file open, calibration panel, alignment controls.
### Transforms
Two transforms do all the geometric work:
- **Calibration homography `H`** (fixed, one-time): maps **machine-mm → image-px**. A 3×3 projective transform. Calibrated once by jogging the spindle to known machine coordinates and clicking the tip in the video — this ties the camera image directly to the machine's own coordinate system (the same system the G-code lives in), collapsing "camera→table" and "table→machine" into one transform with no origin guesswork.
- **Per-job alignment `A`** (per session): maps **G-code work-coords → machine-mm** (rotation + translation), accounting for where the operator zeroed the material this job.
**Render pipeline:**
```
G-code text
→ parse → polylines in work-mm (arcs flattened to segments)
→ apply A (per-job align) → machine-mm
→ apply H (calibration homography) → image-px
→ draw on overlay canvas
```
> **Why transform points in JS rather than the canvas?** A camera-perspective transform is projective, and neither Canvas 2D nor SVG can apply a projective transform natively (both are affine-only). Points are therefore transformed in JS, then drawn as polylines. This also makes Canvas 2D the natural, dependency-light choice.
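The render pipeline above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `Alignment`, `applyAlignment`, `applyHomography` and `workToImage` are hypothetical, and `A` is assumed to be stored as a rotation angle plus a translation.

```typescript
type Vec2 = { x: number; y: number };
type Row3 = [number, number, number];
// Row-major 3x3 matrix (assumed to match the "homography" entry in config.json).
type Mat3 = [Row3, Row3, Row3];
// Hypothetical representation of the per-job alignment A:
// rotation in radians plus a translation in machine-mm.
type Alignment = { rotation: number; tx: number; ty: number };

// A: G-code work coordinates -> machine-mm (rotation, then translation).
function applyAlignment(a: Alignment, p: Vec2): Vec2 {
  const c = Math.cos(a.rotation);
  const s = Math.sin(a.rotation);
  return { x: c * p.x - s * p.y + a.tx, y: s * p.x + c * p.y + a.ty };
}

// H: machine-mm -> image-px. The division by w is the projective step
// that an affine canvas transform cannot express.
function applyHomography(h: Mat3, p: Vec2): Vec2 {
  const w = h[2][0] * p.x + h[2][1] * p.y + h[2][2];
  return {
    x: (h[0][0] * p.x + h[0][1] * p.y + h[0][2]) / w,
    y: (h[1][0] * p.x + h[1][1] * p.y + h[1][2]) / w,
  };
}

// Full pipeline for one polyline: work-mm -> machine-mm -> image-px.
function workToImage(a: Alignment, h: Mat3, polyline: Vec2[]): Vec2[] {
  return polyline.map((p) => applyHomography(h, applyAlignment(a, p)));
}
```

The resulting pixel coordinates can then be drawn with ordinary `moveTo`/`lineTo` calls on an untransformed canvas context.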
## Modules
- **`gcode-parser`** — text → modal-state interpreter → ordered list of segments `{kind: 'cut' | 'rapid', points: Vec2[]}` in mm. Handles G0/G1/G2/G3, G90/G91 (absolute/incremental), G20/G21 (units), G17 (XY plane assumed), arc forms (I/J and R), and full circles. Z is read for modal state but v1 projects to XY only.
- **`geometry`** — homography estimation from ≥4 point-pairs (DLT), apply and invert; affine compose/decompose for the per-job transform; arc flattening to a chord tolerance.
- **`calibration`** — UI to capture point-pairs (click image + type machine X/Y), compute `H`, display per-point residual error (px) to catch a bad click, and export/import the calibration as JSON.
- **`alignment`** — per-job placement: drag+rotate by eye (primary), click-to-set-origin (map a video click through the inverse of `H` into machine space), numeric X/Y/rotation entry. All three set the same `A`.
- **`renderer`** — draw cuts (solid) vs rapids (dashed/faint), redraw on any change, track and match the video element's size/position.
- **`app/config`** — load `config.json`, hold app state, wire modules together.
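To illustrate the modal-state idea behind `gcode-parser`, here is a deliberately reduced sketch. It assumes the segment shape described above and covers only G0/G1, G90/G91 and G20/G21; arcs (G2/G3) and any other G-code are counted as skipped. The function name `parseGcode` and its return shape are illustrative assumptions.

```typescript
type Vec2 = { x: number; y: number };
type Segment = { kind: 'cut' | 'rapid'; points: Vec2[] };

function parseGcode(text: string): { segments: Segment[]; skipped: number } {
  const segments: Segment[] = [];
  let skipped = 0;
  // Modal state, carried from one line to the next.
  let motion: 'cut' | 'rapid' | null = null;
  let absolute = true; // G90 = true, G91 = false
  let scale = 1;       // mm per input unit: 1 for G21, 25.4 for G20
  let pos: Vec2 = { x: 0, y: 0 };

  for (const raw of text.split(/\r?\n/)) {
    // Strip (...) and ; comments, then normalise case.
    const line = raw.replace(/\(.*?\)/g, '').replace(/;.*$/, '').trim().toUpperCase();
    if (line === '') continue;
    const words = line.match(/[A-Z]\s*[-+]?(?:\d+\.?\d*|\.\d+)/g);
    if (words === null) { skipped++; continue; } // garbage line

    let x: number | undefined;
    let y: number | undefined;
    let supported = true;
    for (const word of words) {
      const value = parseFloat(word.slice(1));
      switch (word.charAt(0)) {
        case 'X': x = value; break;
        case 'Y': y = value; break;
        case 'G':
          if (value === 0) motion = 'rapid';
          else if (value === 1) motion = 'cut';
          else if (value === 90) absolute = true;
          else if (value === 91) absolute = false;
          else if (value === 20) scale = 25.4;
          else if (value === 21) scale = 1;
          else if (value === 2 || value === 3) { motion = null; supported = false; }
          else if (value !== 17) supported = false; // G17 is the assumed plane
          break;
        default: break; // Z, F, S, M, N ... are ignored in this sketch
      }
    }
    if (!supported) { skipped++; continue; }
    if (x === undefined && y === undefined) continue; // no XY motion on this line
    if (motion === null) { skipped++; continue; }     // move without an active G0/G1

    // Convert to mm after the whole line is read, so word order does not matter.
    const next: Vec2 = absolute
      ? { x: x !== undefined ? x * scale : pos.x, y: y !== undefined ? y * scale : pos.y }
      : { x: pos.x + (x ?? 0) * scale, y: pos.y + (y ?? 0) * scale };
    const last = segments[segments.length - 1];
    if (last !== undefined && last.kind === motion) last.points.push(next);
    else segments.push({ kind: motion, points: [pos, next] });
    pos = next;
  }
  return { segments, skipped };
}
```

Consecutive moves of the same kind are merged into one polyline, which keeps the renderer's draw calls few. A fuller parser would add arc handling and treat non-motion G-codes (work offsets, tool length compensation and so on) more carefully than this sketch does.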
## Config / persistence
A `config.json` served alongside the app:
```json
{
  "streamUrl": "http://internal-server/stream",
  "calibration": {
    "imagePoints": [[px, py], ...],
    "machinePoints": [[mx, my], ...],
    "homography": [[...], [...], [...]]
  },
  "renderDefaults": { "cutColor": "...", "rapidColor": "...", "lineWidth": 1 }
}
```
There is no backend, so persisting a new calibration means committing or dropping in an updated `config.json`. The in-app calibration tool computes the new values and presents the JSON to paste back into the file. Per-job alignment is ephemeral session state (not persisted).
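On the TypeScript side, the config shape above might be modelled along these lines. The interface and function names are assumptions made for illustration; only the JSON field names come from the spec. `calibration` is marked optional here so that a missing calibration can be detected rather than crash the app.

```typescript
type Point = [number, number];

interface Calibration {
  imagePoints: Point[];
  machinePoints: Point[];
  homography: number[][]; // expected to be 3 rows of 3 numbers
}

interface AppConfig {
  streamUrl: string;
  calibration?: Calibration;
  renderDefaults: { cutColor: string; rapidColor: string; lineWidth: number };
}

// True only when the calibration looks complete enough to draw with:
// at least 4 matching point pairs and a finite 3x3 homography.
function hasUsableCalibration(config: AppConfig): boolean {
  const cal = config.calibration;
  if (cal === undefined) return false;
  const pairsOk =
    cal.imagePoints.length >= 4 && cal.imagePoints.length === cal.machinePoints.length;
  const hOk =
    cal.homography.length === 3 &&
    cal.homography.every((row) => row.length === 3 && row.every((v) => Number.isFinite(v)));
  return pairsOk && hOk;
}

// Formats a freshly computed calibration as the JSON fragment
// that the operator pastes back into config.json.
function calibrationToJson(cal: Calibration): string {
  return JSON.stringify({ calibration: cal }, null, 2);
}
```

A check like `hasUsableCalibration` is one way to decide between drawing the overlay and showing the "calibrate first" prompt.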
## Workflows
### Calibration (one-time)
1. Enter calibration mode.
2. For each of 4 to 6 well-spread points (near the table corners):
- Jog the CNC spindle to a known machine X/Y.
- Type those coordinates.
- Click the spindle tip where it appears in the video.
3. Compute `H`; display residual error per point so a misclick is obvious.
4. Export the JSON and save it into `config.json`.
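Step 3 could look something like the sketch below. It uses the inhomogeneous least-squares form of DLT (fixing `h33 = 1` and solving the normal equations by Gaussian elimination) instead of an SVD, purely to stay dependency-free. That choice is an assumption of this example, as are all function names. At real table and pixel scales the coordinates should probably be normalised first, since the normal equations are poorly conditioned when values span several orders of magnitude.

```typescript
type Vec2 = { x: number; y: number };
type Mat3 = number[][]; // row-major 3x3

// Solve the square system M x = b by Gauss-Jordan elimination with partial pivoting.
function solveLinear(m: number[][], b: number[]): number[] {
  const n = b.length;
  const a = m.map((row, i) => [...row, b[i]]);
  for (let col = 0; col < n; col++) {
    let pivot = col;
    for (let r = col + 1; r < n; r++) {
      if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
    }
    if (Math.abs(a[pivot][col]) < 1e-12) throw new Error('degenerate point configuration');
    [a[col], a[pivot]] = [a[pivot], a[col]];
    for (let r = 0; r < n; r++) {
      if (r === col) continue;
      const f = a[r][col] / a[col][col];
      for (let c = col; c <= n; c++) a[r][c] -= f * a[col][c];
    }
  }
  return a.map((row, i) => row[n] / row[i]);
}

// Estimate H (machine-mm -> image-px) from >= 4 point pairs.
function estimateHomography(machine: Vec2[], image: Vec2[]): Mat3 {
  if (machine.length < 4 || machine.length !== image.length) {
    throw new Error('need at least 4 matching point pairs');
  }
  // Accumulate the normal equations (AtA) h = Atb for the 8 unknowns.
  const ata = Array.from({ length: 8 }, () => new Array<number>(8).fill(0));
  const atb = new Array<number>(8).fill(0);
  machine.forEach((m, i) => {
    const { x: u, y: v } = image[i];
    // Each pair gives two linear equations in h11..h32.
    const rows: [number[], number][] = [
      [[m.x, m.y, 1, 0, 0, 0, -u * m.x, -u * m.y], u],
      [[0, 0, 0, m.x, m.y, 1, -v * m.x, -v * m.y], v],
    ];
    for (const [row, rhs] of rows) {
      for (let r = 0; r < 8; r++) {
        atb[r] += row[r] * rhs;
        for (let c = 0; c < 8; c++) ata[r][c] += row[r] * row[c];
      }
    }
  });
  const h = solveLinear(ata, atb);
  return [[h[0], h[1], h[2]], [h[3], h[4], h[5]], [h[6], h[7], 1]];
}

function applyHomography(h: Mat3, p: Vec2): Vec2 {
  const w = h[2][0] * p.x + h[2][1] * p.y + h[2][2];
  return {
    x: (h[0][0] * p.x + h[0][1] * p.y + h[0][2]) / w,
    y: (h[1][0] * p.x + h[1][1] * p.y + h[1][2]) / w,
  };
}

// Per-point reprojection error in pixels; one large value suggests a misclick.
function residuals(h: Mat3, machine: Vec2[], image: Vec2[]): number[] {
  return machine.map((m, i) => {
    const q = applyHomography(h, m);
    return Math.hypot(q.x - image[i].x, q.y - image[i].y);
  });
}
```

With exactly four pairs the fit is exact and every residual is near zero, so the residual display only becomes informative from the fifth point onward.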
### Per-job
1. Open a G-code file (drag-drop or file picker) — parsed locally.
2. The toolpath appears over the video at a default placement.
3. Align it: drag+rotate by eye (primary), or click-to-set-origin, or type a numeric offset; arrow keys nudge for fine tuning.
4. Reality-check the toolpath against the live cut.
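The click-to-set-origin and arrow-key options in step 3 might be implemented roughly as below. This is a sketch under assumptions: `Alignment` (rotation plus translation) and the function names are hypothetical, and the click is assumed to have already been converted from element coordinates to image pixels.

```typescript
type Vec2 = { x: number; y: number };
type Row3 = [number, number, number];
type Mat3 = [Row3, Row3, Row3];
type Alignment = { rotation: number; tx: number; ty: number };

// Inverse of a 3x3 matrix via the adjugate divided by the determinant.
function invert3(m: Mat3): Mat3 {
  const [[a, b, c], [d, e, f], [g, h, i]] = m;
  const det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g);
  if (Math.abs(det) < 1e-12) throw new Error('homography is not invertible');
  return [
    [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
    [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
    [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
  ];
}

function applyHomography(h: Mat3, p: Vec2): Vec2 {
  const w = h[2][0] * p.x + h[2][1] * p.y + h[2][2];
  return {
    x: (h[0][0] * p.x + h[0][1] * p.y + h[0][2]) / w,
    y: (h[1][0] * p.x + h[1][1] * p.y + h[1][2]) / w,
  };
}

// Click-to-set-origin: map the clicked pixel back into machine-mm and
// move the work origin there, keeping the current rotation.
function setOriginFromClick(h: Mat3, current: Alignment, clickPx: Vec2): Alignment {
  const machine = applyHomography(invert3(h), clickPx);
  return { rotation: current.rotation, tx: machine.x, ty: machine.y };
}

// Arrow-key nudge: shift the toolpath by a small step in machine-mm.
function nudge(current: Alignment, dx: number, dy: number): Alignment {
  return { rotation: current.rotation, tx: current.tx + dx, ty: current.ty + dy };
}
```

Because `A` maps the work origin (0, 0) to `(tx, ty)`, writing the clicked machine position into the translation is enough to place the origin under the cursor.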
## Error handling
- **Unsupported/garbage G-code lines** — skipped, counted, surfaced as a non-fatal warning. Never crash.
- **Arcs** — flattened to a chord tolerance; both R and I/J forms; full circles handled.
- **Missing/incomplete calibration** — app still loads; overlay disabled with a "calibrate first" prompt.
- **Stream not loading** — overlay still usable over a blank/last frame; a notice is shown.
- **Units** — normalized to mm internally regardless of G20/G21.
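As an illustration of flattening arcs to a chord tolerance, the sketch below handles the I/J (centre) form only; converting the R form to a centre is left out. The function name and signature are assumptions. The step size follows from the fact that a chord spanning angle `t` on radius `r` deviates from the arc by `r * (1 - cos(t / 2))`.

```typescript
type Vec2 = { x: number; y: number };

// Flatten an arc from `start` to `end` around `center` (for the I/J form,
// center = start + (I, J)) into points whose chords stay within
// `tolerance` mm of the true arc. start === end is treated as a full circle.
function flattenArc(
  start: Vec2, end: Vec2, center: Vec2, clockwise: boolean, tolerance: number,
): Vec2[] {
  if (tolerance <= 0) throw new Error('tolerance must be positive');
  const r = Math.hypot(start.x - center.x, start.y - center.y);
  if (r === 0) return [start, end]; // degenerate arc: fall back to a straight line

  const a0 = Math.atan2(start.y - center.y, start.x - center.x);
  const a1 = Math.atan2(end.y - center.y, end.x - center.x);
  // Signed sweep: negative for clockwise (G2), positive for counter-clockwise (G3).
  let sweep = a1 - a0;
  if (clockwise && sweep >= 0) sweep -= 2 * Math.PI;
  if (!clockwise && sweep <= 0) sweep += 2 * Math.PI;

  // Largest angular step whose chord error is still within tolerance.
  const maxStep = 2 * Math.acos(Math.max(-1, 1 - tolerance / r));
  const steps = Math.max(1, Math.ceil(Math.abs(sweep) / maxStep));

  const points: Vec2[] = [start];
  for (let k = 1; k < steps; k++) {
    const a = a0 + (sweep * k) / steps;
    points.push({ x: center.x + r * Math.cos(a), y: center.y + r * Math.sin(a) });
  }
  points.push(end); // snap to the programmed end point to avoid drift
  return points;
}
```

Flattening in work-mm before applying `A` and `H` keeps the tolerance meaningful in physical units, although the on-screen error then varies with perspective.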
## Testing
TDD for the testable cores:
- **Parser** — sample G-code files → expected segment lists (modal state, incremental/absolute, units, arcs).
- **Geometry** — known point-pairs → known mapping; round-trip apply/invert; arc flattening within tolerance.
Manual visual check: overlay a known test pattern (e.g. a measured rectangle) over the real stream and confirm alignment.
## Risks / open questions
- **Lens distortion** — a wide-angle camera ~2.5 m up may bow straight cuts; a plain homography assumes straight lines stay straight. The calibration residual-error display will reveal if this is a real problem; if so, add a distortion-correction step (deferred).
- **Stream format unknown** — the internal server's protocol (MJPEG / HLS / WebRTC) is not yet confirmed; this determines the embedding element. Confirm early in implementation.
- **Camera not perfectly top-down** — fine for a homography as long as the table is planar and calibration points are well spread.