Coverage Manifold¶
Overview¶
The Coverage Manifold is a 2D visualization of MCC's high-dimensional configuration space. It shows all proven, failed, and untested configurations as points in a scatter plot, projected via PCA (Principal Component Analysis) from 7 categorical dimensions down to 2D coordinates.
The manifold helps you:
- See where your config sits relative to proven paths
- Identify coverage gaps (dark regions with few points)
- Find nearest proven alternatives when your config is untested
How It Works¶
Dimensions¶
Each configuration is encoded as a vector of 7 categorical dimensions:
| Dimension | Values |
|---|---|
| deployment_config | http-flask, transformers-vllm, transformers-sglang, ... |
| model_family | qwen3, llama3, deepseek-r1, ... |
| instance_family | g5, g6, g6e, p5, ... |
| quantization | none, fp8, int8, awq, gptq, ... |
| tp_degree | 1, 2, 4, 8 |
| enable_lora | true, false |
| deployment_target | realtime-inference, async-inference, batch-transform, hyperpod-eks |
PCA Projection¶
Categories are encoded as integers (see encoding maps in the manifold JSON). PCA reduces this 7-dimensional space to 2 components that capture the most variance, producing x/y coordinates for each point.
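To make the integer encoding concrete, here is a minimal sketch that builds encoding maps from a handful of configs and encodes one of them. The helper names (`buildEncodingMaps`, `encodeConfig`), the sorted-order code assignment, and the sample configs are assumptions for illustration; the real codes are whatever the manifold JSON ships in `encoding_maps`.

```javascript
// Hypothetical helper: give each distinct value of a dimension an integer code.
// Values are sorted first so the same input always yields the same codes.
function buildEncodingMaps(configs, dimensions) {
  const maps = {};
  for (const dim of dimensions) {
    const values = [...new Set(configs.map((c) => String(c[dim])))].sort();
    maps[dim] = Object.fromEntries(values.map((v, i) => [v, i]));
  }
  return maps;
}

// Hypothetical helper: turn one config into a numeric vector, in dimension order.
// Unknown values fall back to 0, mirroring the `?? 0` in the client-side projection.
function encodeConfig(config, dimensions, maps) {
  return dimensions.map((dim) => maps[dim][String(config[dim])] ?? 0);
}

// Illustrative subset of the 7 dimensions, with made-up configs.
const dimensions = ["deployment_config", "quantization", "tp_degree"];
const configs = [
  { deployment_config: "http-flask", quantization: "none", tp_degree: 1 },
  { deployment_config: "transformers-vllm", quantization: "fp8", tp_degree: 4 },
  { deployment_config: "transformers-vllm", quantization: "none", tp_degree: 8 },
];

const maps = buildEncodingMaps(configs, dimensions);
const vector = encodeConfig(configs[1], dimensions, maps);
console.log(vector); // [1, 0, 1]
```

Integer codes put categories in an arbitrary order, so distances in the projected plot depend partly on how the codes were assigned.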
PCA is deterministic: the same data always produces the same projection. New points can also be projected with the stored mean and components, without refitting on the full dataset.
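As a rough illustration of what "2 components that capture the most variance" means, the sketch below fits a two-component PCA using a covariance matrix and power iteration with deflation. This is an assumed, simplified approach and not necessarily how `scripts/codegen-manifold.js` computes the projection; `fitPca2` and the sample rows are hypothetical.

```javascript
// Sketch only: 2-component PCA via covariance + power iteration with deflation.
function fitPca2(rows, iterations = 200) {
  const n = rows.length;
  const d = rows[0].length;

  // Per-dimension mean, kept so new points can be centered the same way later.
  const mean = Array.from({ length: d }, (_, j) =>
    rows.reduce((sum, r) => sum + r[j], 0) / n
  );
  const centered = rows.map((r) => r.map((v, j) => v - mean[j]));

  // Sample covariance matrix (d x d).
  const cov = Array.from({ length: d }, (_, a) =>
    Array.from({ length: d }, (_, b) =>
      centered.reduce((sum, r) => sum + r[a] * r[b], 0) / (n - 1)
    )
  );
  const multiply = (v) => cov.map((row) => row.reduce((s, c, j) => s + c * v[j], 0));

  const components = [];
  for (let k = 0; k < 2; k++) {
    // A fixed start vector keeps the result reproducible across runs.
    let v = Array.from({ length: d }, () => 1 / Math.sqrt(d));
    for (let it = 0; it < iterations; it++) {
      const w = multiply(v);
      const norm = Math.hypot(...w);
      if (norm === 0) break; // no variance left; v stays at its current value
      v = w.map((x) => x / norm);
    }
    // Estimate the eigenvalue (Rayleigh quotient), then remove that direction.
    const lambda = multiply(v).reduce((s, x, j) => s + x * v[j], 0);
    for (let a = 0; a < d; a++) {
      for (let b = 0; b < d; b++) cov[a][b] -= lambda * v[a] * v[b];
    }
    components.push(v);
  }
  return { pca_mean: mean, pca_components: components };
}

// Made-up data: most of the spread is along the first axis.
const { pca_mean, pca_components } = fitPca2([
  [0, 0],
  [2, 0],
  [0, 1],
  [2, 1],
]);
console.log(pca_mean, pca_components);
```

With this toy data the first component lines up with the first axis and the second with the other axis. A production pipeline would more likely rely on a linear-algebra library than on hand-rolled power iteration.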
Visual Encoding¶
| Visual Property | Encodes |
|---|---|
| Color | Status: green = proven/passed, red = failed, grey = untested/unfeasible |
| Size | Throughput (requests per second) — larger = higher throughput |
| Shape | Deployment target: ● circle = realtime, ■ square = async, ▲ triangle = batch, ◆ diamond = hyperpod |
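The table above could be expressed as a small lookup, sketched here. The function name `markerFor`, the shape names, and the linear size scale (including `maxRps`) are assumptions for illustration, not the renderer's actual values.

```javascript
// Assumed lookup tables derived from the Visual Encoding table.
const STATUS_COLOR = { proven: "green", passed: "green", failed: "red" };
const TARGET_SHAPE = {
  "realtime-inference": "circle",
  "async-inference": "square",
  "batch-transform": "triangle",
  "hyperpod-eks": "diamond",
};

// Hypothetical helper: derive marker styling for one point.
function markerFor(point, { minSize = 4, maxSize = 20, maxRps = 100 } = {}) {
  const rps = Math.max(0, point.throughput_rps ?? 0);
  const scale = Math.min(rps / maxRps, 1); // clamp so outliers do not dominate
  return {
    color: STATUS_COLOR[point.status] ?? "grey", // untested/unfeasible fall through to grey
    shape: TARGET_SHAPE[point.deployment_target] ?? "circle",
    size: minSize + scale * (maxSize - minSize),
  };
}

const marker = markerFor({
  status: "proven",
  deployment_target: "batch-transform",
  throughput_rps: 50,
});
console.log(marker); // { color: 'green', shape: 'triangle', size: 12 }
```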
Interactions¶
- Hover a point to see: model_name, instance_type, deployment_config, throughput_rps, ttft_p50_ms, status, run_type
- Click a proven point to view the full Athena record
- Dark regions (semi-transparent grey zones) indicate coverage gaps
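One plausible way to locate coverage gaps is to bin the projected points into a grid and flag cells with few points. The sketch below assumes that definition; the function name, grid size, and threshold are hypothetical, and the page may compute its dark regions differently.

```javascript
// Assumption: a "gap" is a grid cell holding fewer than `minPoints` points.
function findSparseCells(points, { cells = 4, minPoints = 1 } = {}) {
  const xs = points.map((p) => p.x);
  const ys = points.map((p) => p.y);
  const minX = Math.min(...xs);
  const minY = Math.min(...ys);
  // Fall back to 1 so identical coordinates do not cause a division by zero.
  const spanX = Math.max(...xs) - minX || 1;
  const spanY = Math.max(...ys) - minY || 1;

  // counts[row][col] holds the number of points in each cell.
  const counts = Array.from({ length: cells }, () => new Array(cells).fill(0));
  for (const p of points) {
    const col = Math.min(cells - 1, Math.floor(((p.x - minX) / spanX) * cells));
    const row = Math.min(cells - 1, Math.floor(((p.y - minY) / spanY) * cells));
    counts[row][col] += 1;
  }

  const sparse = [];
  counts.forEach((rowCounts, row) =>
    rowCounts.forEach((count, col) => {
      if (count < minPoints) sparse.push({ row, col, count });
    })
  );
  return sparse;
}

// Made-up points: two cluster near the origin, one sits in the opposite corner.
const gaps = findSparseCells(
  [{ x: 0, y: 0 }, { x: 1, y: 1 }, { x: 0.1, y: 0.1 }],
  { cells: 2 }
);
console.log(gaps);
```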
"Plot My Config"¶
When using the Command Generator widget, click "Show in Manifold" to project your current configuration onto the scatter plot:
- Your config appears as a ★ star marker at its projected position
- Connector lines link to the 3 nearest proven configurations
- Hover the connectors to see those configs' metrics
This tells you immediately whether you're in proven territory or uncharted space.
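The nearest-neighbour lookup behind the connector lines might look like the following sketch. It assumes "nearest" means Euclidean distance in the projected x/y plane and that proven points carry `status: "proven"`; `nearestProven` and the sample points are hypothetical.

```javascript
// Assumption: distance is measured in the 2D projection, not the original 7 dimensions.
function nearestProven(points, target, k = 3) {
  return points
    .filter((p) => p.status === "proven")
    .map((p) => ({
      point: p,
      distance: Math.hypot(p.x - target.x, p.y - target.y),
    }))
    .sort((a, b) => a.distance - b.distance)
    .slice(0, k);
}

// Made-up points for illustration.
const samplePoints = [
  { configId: "a", x: 0, y: 0, status: "proven" },
  { configId: "b", x: 1, y: 0, status: "failed" },
  { configId: "c", x: 3, y: 4, status: "proven" },
  { configId: "d", x: 0.5, y: 0, status: "proven" },
  { configId: "e", x: -2, y: 0, status: "proven" },
];

const neighbours = nearestProven(samplePoints, { x: 0.4, y: 0 });
console.log(neighbours.map((n) => n.point.configId)); // [ 'd', 'a', 'e' ]
```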
How Client-Side Projection Works¶
The manifold JSON includes encoding_maps, pca_components, and pca_mean. The browser projects your config without a server round-trip:
```javascript
// Encode categorical dimensions as integers
const encoded = dimensions_used.map(dim => encoding_maps[dim][userConfig[dim]] ?? 0)
// Center using PCA mean
const centered = encoded.map((v, i) => v - pca_mean[i])
// Project using PCA components (dot product)
const x = pca_components[0].reduce((sum, w, i) => sum + w * centered[i], 0)
const y = pca_components[1].reduce((sum, w, i) => sum + w * centered[i], 0)
```
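For experimentation, the same steps can be wrapped in a function and run against a tiny hand-written manifold. Everything below is a sketch: `projectConfig` is a hypothetical name, the manifold values are made up, and the `unknown` list is an added convenience for spotting values that silently fall back to code 0.

```javascript
// Hypothetical wrapper around the projection steps; not part of the site code.
function projectConfig(manifold, userConfig) {
  const { dimensions_used, encoding_maps, pca_components, pca_mean } = manifold;
  const unknown = [];

  // Encode each dimension, remembering which values had no known code.
  const encoded = dimensions_used.map((dim) => {
    const code = encoding_maps[dim]?.[String(userConfig[dim])];
    if (code === undefined) unknown.push(dim);
    return code ?? 0;
  });

  // Center with the stored mean, then take one dot product per component.
  const centered = encoded.map((v, i) => v - pca_mean[i]);
  const dot = (weights) => weights.reduce((sum, w, i) => sum + w * centered[i], 0);
  return { x: dot(pca_components[0]), y: dot(pca_components[1]), unknown };
}

// Made-up two-dimensional manifold, for illustration only.
const toyManifold = {
  dimensions_used: ["quantization", "tp_degree"],
  encoding_maps: {
    quantization: { none: 0, fp8: 1 },
    tp_degree: { 1: 0, 2: 1, 4: 2 },
  },
  pca_components: [[1, 0], [0, 1]],
  pca_mean: [0.5, 1],
};

const star = projectConfig(toyManifold, { quantization: "fp8", tp_degree: 4 });
console.log(star); // { x: 0.5, y: 1, unknown: [] }
```

A non-empty `unknown` list means the star's position is only approximate, because each unrecognized value was placed as if it were the category with code 0.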
Filtering¶
When the point cloud is dense, use filters to reduce visual clutter:
- By deployment_config — Show only transformers-vllm points
- By model_family — Show only qwen3 or llama3
- By instance_family — Show only g5 or g6e points
Filters combine with AND semantics: a point must match every active filter, so each filter you add narrows the visible set.
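A minimal sketch of that narrowing behaviour follows. It assumes each filter is a list of allowed values for one field and that points carry `model_family` and `instance_family` fields; neither detail is confirmed by the schema sample, and `applyFilters` is a hypothetical name.

```javascript
// Assumption: an empty list means "no filter on this field".
function applyFilters(points, filters) {
  const active = Object.entries(filters).filter(([, allowed]) => allowed.length > 0);
  // A point stays visible only if it satisfies every active filter.
  return points.filter((p) =>
    active.every(([field, allowed]) => allowed.includes(p[field]))
  );
}

// Made-up points for illustration.
const cloud = [
  { configId: "a", deployment_config: "transformers-vllm", model_family: "qwen3", instance_family: "g5" },
  { configId: "b", deployment_config: "transformers-vllm", model_family: "llama3", instance_family: "g6e" },
  { configId: "c", deployment_config: "http-flask", model_family: "qwen3", instance_family: "g5" },
];

const byConfig = applyFilters(cloud, {
  deployment_config: ["transformers-vllm"],
  instance_family: [],
});
const byConfigAndInstance = applyFilters(cloud, {
  deployment_config: ["transformers-vllm"],
  instance_family: ["g5"],
});
console.log(byConfig.map((p) => p.configId)); // [ 'a', 'b' ]
console.log(byConfigAndInstance.map((p) => p.configId)); // [ 'a' ]
```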
Data Source¶
The manifold reads from a static JSON file: docs/data/coverage-manifold.json
This file is regenerated by the codegen script:
```bash
# Generate from Athena export
node scripts/codegen-manifold.js --input athena-export.csv --output docs/data/coverage-manifold.json
# Generate synthetic sample data (for development)
node scripts/codegen-manifold.js --sample --output docs/data/coverage-manifold.json
```
In CI, the docs workflow runs the script in --sample mode so the manifold page always has data to render during development. Once real benchmark infrastructure is provisioned, generate the file from the Athena export to show production data.
Manifold JSON Schema¶
```json
{
  "projection_method": "pca",
  "dimensions_used": ["deployment_config", "model_family", ...],
  "encoding_maps": { "deployment_config": {"http-flask": 0, ...}, ... },
  "pca_components": [[0.42, -0.31, ...], [0.18, 0.55, ...]],
  "pca_mean": [2.1, 0.8, 1.3, ...],
  "points": [
    {
      "configId": "abc123...",
      "x": 0.45, "y": -0.23,
      "status": "proven",
      "deployment_config": "transformers-vllm",
      "model_name": "Qwen/Qwen3-4B",
      "instance_type": "ml.g5.xlarge",
      "throughput_rps": 45.2,
      "ttft_p50_ms": 123,
      "deployment_target": "realtime-inference",
      "run_type": "ci"
    }
  ],
  "generated_at": "2026-06-09T12:00:00Z",
  "total_configs": 150
}
```
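Because the client-side projection indexes `pca_mean` and each row of `pca_components` by position in `dimensions_used`, those arrays need matching lengths. The sketch below checks that consistency along with a few basic field shapes. `validateManifold` is a hypothetical helper, and the checks are inferred from the schema above rather than taken from an official validator.

```javascript
// Hypothetical sanity check; returns a list of human-readable problems.
function validateManifold(manifold) {
  const dims = manifold.dimensions_used;
  if (!Array.isArray(dims) || dims.length === 0) {
    return ["dimensions_used must be a non-empty array"];
  }

  const errors = [];
  for (const dim of dims) {
    if (!manifold.encoding_maps?.[dim]) {
      errors.push(`encoding_maps has no entry for "${dim}"`);
    }
  }

  // Every projection array needs one entry per dimension.
  const oneEntryPerDim = (a) => Array.isArray(a) && a.length === dims.length;
  if (!oneEntryPerDim(manifold.pca_mean)) {
    errors.push("pca_mean must have one entry per dimension");
  }
  const comps = manifold.pca_components;
  if (!Array.isArray(comps) || comps.length !== 2 || !comps.every(oneEntryPerDim)) {
    errors.push("pca_components must be 2 rows with one weight per dimension");
  }

  if (!Array.isArray(manifold.points)) {
    errors.push("points must be an array");
  } else {
    manifold.points.forEach((p, i) => {
      if (!Number.isFinite(p.x) || !Number.isFinite(p.y)) {
        errors.push(`points[${i}] needs numeric x and y`);
      }
    });
  }
  return errors;
}

// Made-up minimal manifold for illustration.
const minimal = {
  dimensions_used: ["quantization"],
  encoding_maps: { quantization: { none: 0 } },
  pca_mean: [0],
  pca_components: [[1], [0]],
  points: [{ x: 0, y: 0 }],
};
console.log(validateManifold(minimal)); // []
console.log(validateManifold({ ...minimal, pca_mean: [0, 1] }));
```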
Placeholder State¶
If no manifold data file exists (fresh install before first CI run), the visualization page displays a placeholder message explaining that coverage data will populate after benchmark runs complete.
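The fallback described above could be implemented along these lines. This is a sketch under assumptions: it presumes the page loads the JSON over HTTP and treats a failed request or non-OK response as "no data yet". The function name, the returned `state` values, and the URL in the usage example are hypothetical.

```javascript
// Hypothetical loader; `fetchFn` is injectable so the logic can be exercised without a server.
async function loadManifold(url, fetchFn = fetch) {
  try {
    const response = await fetchFn(url);
    if (!response.ok) return { state: "placeholder" }; // e.g. 404 before the first CI run
    return { state: "ready", manifold: await response.json() };
  } catch {
    return { state: "placeholder" }; // network error or malformed JSON
  }
}

// Usage with a stubbed fetch that simulates a missing file.
const missingFile = async () => ({ ok: false, status: 404 });
loadManifold("data/coverage-manifold.json", missingFile).then((result) => {
  console.log(result.state); // "placeholder"
});
```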
Regenerating the Manifold¶
After new benchmark runs complete:
- Export results from Athena (CSV or JSON)
- Run the codegen script against the export, using the `--input` command shown under Data Source
- Commit the updated docs/data/coverage-manifold.json
- The docs site rebuilds and shows the new points
Related¶
- Command Generator — Interactive config builder with "Show in Manifold" button
- CI Integration — How benchmark data is produced
- Path Prover — Agent that fills coverage gaps
- Bootstrap — Provisioning Athena/Glue infrastructure