Coverage Manifold¶

Overview¶

The Coverage Manifold is a 2D visualization of MCC's high-dimensional configuration space. It shows all proven, failed, and untested configurations as points in a scatter plot, projected via PCA (Principal Component Analysis) from 7 categorical dimensions down to 2D coordinates.

The manifold helps you:

- See where your config sits relative to proven paths
- Identify coverage gaps (dark regions with few points)
- Find nearest proven alternatives when your config is untested

How It Works¶

Dimensions¶

Each configuration is encoded as a vector of 7 categorical dimensions:

Dimension	Values
`deployment_config`	http-flask, transformers-vllm, transformers-sglang, ...
`model_family`	qwen3, llama3, deepseek-r1, ...
`instance_family`	g5, g6, g6e, p5, ...
`quantization`	none, fp8, int8, awq, gptq, ...
`tp_degree`	1, 2, 4, 8
`enable_lora`	true, false
`deployment_target`	realtime-inference, async-inference, batch-transform, hyperpod-eks

PCA Projection¶

Categories are encoded as integers (see encoding maps in the manifold JSON). PCA reduces this 7-dimensional space to 2 components that capture the most variance, producing x/y coordinates for each point.

PCA is deterministic: the same data always produces the same projection, up to the sign of each component axis. A new point can also be projected from the stored mean and components, without re-running PCA on the full dataset.
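The encoding and projection steps can be sketched in plain JavaScript. This is an illustration under stated assumptions, not the codegen script's implementation: the helper names (`buildEncodingMaps`, `encode`, `fitPca2d`) are hypothetical, integer codes are assigned in first-seen order, and power iteration with deflation stands in for whatever PCA routine the script actually uses.

```javascript
// Assign each distinct value of each dimension an integer code.
// First-seen ordering is an assumption made for this sketch.
function buildEncodingMaps(rows, dims) {
  const maps = {};
  for (const dim of dims) {
    maps[dim] = {};
    for (const row of rows) {
      const key = String(row[dim]);
      if (!(key in maps[dim])) maps[dim][key] = Object.keys(maps[dim]).length;
    }
  }
  return maps;
}

// Turn one config into a numeric vector using the encoding maps.
function encode(row, dims, maps) {
  return dims.map(dim => maps[dim][String(row[dim])]);
}

// Dominant eigenvector of a symmetric matrix via power iteration.
// A fixed start vector keeps the result reproducible.
function dominantEigenvector(matrix, iterations = 500) {
  const start = matrix.map((_, i) => 1 / (i + 1));
  const startNorm = Math.hypot(...start);
  let v = start.map(x => x / startNorm);
  for (let it = 0; it < iterations; it++) {
    const next = matrix.map(row => row.reduce((s, m, j) => s + m * v[j], 0));
    const norm = Math.hypot(...next);
    if (norm === 0) break; // no variance left in this matrix
    v = next.map(x => x / norm);
  }
  return v;
}

// Fit a 2-component PCA. Expects at least two vectors of equal length.
function fitPca2d(vectors) {
  const n = vectors.length;
  const d = vectors[0].length;
  const mean = Array.from({ length: d }, (_, j) =>
    vectors.reduce((s, v) => s + v[j], 0) / n);
  const centered = vectors.map(v => v.map((x, j) => x - mean[j]));
  // Sample covariance matrix (d x d).
  let cov = Array.from({ length: d }, (_, a) =>
    Array.from({ length: d }, (_, b) =>
      centered.reduce((s, v) => s + v[a] * v[b], 0) / (n - 1)));
  const components = [];
  for (let k = 0; k < 2; k++) {
    const w = dominantEigenvector(cov);
    // Rayleigh quotient gives the eigenvalue for the unit vector w.
    const lambda = w.reduce((s, wa, a) =>
      s + wa * cov[a].reduce((t, c, b) => t + c * w[b], 0), 0);
    components.push(w);
    // Deflate: remove the variance explained by w before the next pass.
    cov = cov.map((row, a) => row.map((c, b) => c - lambda * w[a] * w[b]));
  }
  const points = centered.map(v =>
    components.map(w => w.reduce((s, wi, i) => s + wi * v[i], 0)));
  return { mean, components, points };
}

// Usage with a made-up two-dimension dataset.
const demoDims = ["model_family", "tp_degree"];
const demoRows = [
  { model_family: "qwen3", tp_degree: 1 },
  { model_family: "llama3", tp_degree: 2 },
  { model_family: "deepseek-r1", tp_degree: 8 },
  { model_family: "qwen3", tp_degree: 4 },
];
const demoMaps = buildEncodingMaps(demoRows, demoDims);
const demoFit = fitPca2d(demoRows.map(r => encode(r, demoDims, demoMaps)));
```

The returned `mean` and `components` play the role of `pca_mean` and `pca_components`: storing them is what allows a new point to be projected later without refitting.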

Visual Encoding¶

Visual Property	Encodes
Color	Status: green = proven/passed, red = failed, grey = untested/unfeasible
Size	Throughput (requests per second) — larger = higher throughput
Shape	Deployment target: ● circle = realtime, ■ square = async, ▲ triangle = batch, ◆ diamond = hyperpod
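The table above amounts to a lookup from point fields to marker attributes. A minimal sketch follows; the function name `markerStyle`, the hex colors, and the 4 to 16 pixel radius range are all assumptions chosen for illustration, and the status strings are taken from the labels in the table.

```javascript
// Status-to-color mapping from the Visual Encoding table (hex values are illustrative).
const STATUS_COLORS = { proven: "#2e7d32", passed: "#2e7d32", failed: "#c62828" };
const DEFAULT_COLOR = "#9e9e9e"; // grey for untested/unfeasible and anything unrecognized

// Deployment target to marker shape.
const TARGET_SHAPES = {
  "realtime-inference": "circle",
  "async-inference": "square",
  "batch-transform": "triangle",
  "hyperpod-eks": "diamond",
};

// Compute marker attributes for one point.
// maxThroughput is the largest throughput_rps in the dataset, used for scaling.
function markerStyle(point, maxThroughput) {
  const rps = point.throughput_rps ?? 0;
  const ratio = maxThroughput > 0 ? Math.min(rps / maxThroughput, 1) : 0;
  return {
    color: STATUS_COLORS[point.status] ?? DEFAULT_COLOR,
    shape: TARGET_SHAPES[point.deployment_target] ?? "circle",
    // Square-root scaling keeps very high throughputs from dominating the plot.
    radius: 4 + 12 * Math.sqrt(ratio),
  };
}
```

Points without a throughput value, such as untested configs, fall back to the smallest radius.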

Interactions¶

Hover a point to see: model_name, instance_type, deployment_config, throughput_rps, ttft_p50_ms, status, run_type
Click a proven point to view the full Athena record
Dark regions (semi-transparent grey zones) indicate coverage gaps

"Plot My Config"¶

When using the Command Generator widget, click "Show in Manifold" to project your current configuration onto the scatter plot:

Your config appears as a ★ star marker at its projected position
Connector lines link to the 3 nearest proven configurations
Hover the connectors to see those configs' metrics

This tells you immediately whether you're in proven territory or uncharted space.
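Finding the connector targets can be sketched as a nearest-neighbour search over proven points. This assumes that "nearest" means Euclidean distance in the projected 2D space and that proven points carry `status: "proven"`, as in the schema sample; the function name `nearestProven` is hypothetical.

```javascript
// Return the k proven points closest to the target position,
// measured by Euclidean distance in projected (x, y) space.
function nearestProven(points, target, k = 3) {
  return points
    .filter(p => p.status === "proven")
    .map(p => ({ point: p, distance: Math.hypot(p.x - target.x, p.y - target.y) }))
    .sort((a, b) => a.distance - b.distance)
    .slice(0, k);
}
```

Because integer encoding assigns arbitrary codes to categories, a small distance here indicates a similar encoded vector, which is not always the same as an operationally similar configuration.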

How Client-Side Projection Works¶

The manifold JSON includes encoding_maps, pca_components, and pca_mean. The browser projects your config without a server round-trip:

// Encode categorical dimensions as integers
const encoded = dimensions_used.map(dim => encoding_maps[dim][userConfig[dim]] ?? 0)

// Center using PCA mean
const centered = encoded.map((v, i) => v - pca_mean[i])

// Project using PCA components (dot product)
const x = pca_components[0].reduce((sum, w, i) => sum + w * centered[i], 0)
const y = pca_components[1].reduce((sum, w, i) => sum + w * centered[i], 0)
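The snippet above assumes the manifold fields and `userConfig` are already in scope. A self-contained variant with a tiny made-up manifold may make it easier to experiment with. The function name `projectConfig` and the `unknown` list are additions for illustration; the list reports dimensions whose value is missing from the encoding map, since the `?? 0` fallback would otherwise silently treat them as the category with code 0.

```javascript
// Project a user config into manifold (x, y) space.
function projectConfig(manifold, userConfig) {
  const { dimensions_used, encoding_maps, pca_components, pca_mean } = manifold;
  const unknown = [];
  // Encode categorical dimensions as integers.
  const encoded = dimensions_used.map(dim => {
    const code = encoding_maps[dim][String(userConfig[dim])];
    if (code === undefined) unknown.push(dim);
    return code ?? 0; // same fallback as the snippet above
  });
  // Center using the PCA mean, then take the dot product with each component.
  const centered = encoded.map((v, i) => v - pca_mean[i]);
  const dot = w => w.reduce((sum, wi, i) => sum + wi * centered[i], 0);
  return { x: dot(pca_components[0]), y: dot(pca_components[1]), unknown };
}

// Made-up manifold with two dimensions and axis-aligned components.
const sampleManifold = {
  dimensions_used: ["model_family", "tp_degree"],
  encoding_maps: {
    model_family: { qwen3: 0, llama3: 1 },
    tp_degree: { "1": 0, "2": 1, "4": 2 },
  },
  pca_components: [[1, 0], [0, 1]],
  pca_mean: [0.5, 1],
};

const star = projectConfig(sampleManifold, { model_family: "llama3", tp_degree: 4 });
// star is { x: 0.5, y: 1, unknown: [] }
```

A non-empty `unknown` list is a signal that the star's position is only approximate and could be surfaced to the user.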

Filtering¶

When the point cloud is dense, use filters to reduce visual clutter:

By deployment_config — Show only transformers-vllm points
By model_family — Show only qwen3 or llama3
By instance_family — Show only g5 or g6e points

Filters combine with AND logic: a point stays visible only if it matches every active filter, so each filter you add narrows the visible set.
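Since combining filters narrows the visible set, the behaviour amounts to a logical AND across fields, with multiple values inside one filter acting as an OR. A minimal sketch, assuming each point carries the filtered fields (the schema sample does not show `model_family` or `instance_family` on points, so their presence is an assumption) and using a hypothetical `applyFilters` helper:

```javascript
// Keep points that match every active filter.
// filters maps a field name to an array of allowed values; empty arrays are ignored.
function applyFilters(points, filters) {
  const active = Object.entries(filters)
    .filter(([, allowed]) => Array.isArray(allowed) && allowed.length > 0);
  return points.filter(p =>
    active.every(([field, allowed]) => allowed.includes(p[field])));
}

// Example: only transformers-vllm points for qwen3 or llama3.
const exampleFilters = {
  deployment_config: ["transformers-vllm"],
  model_family: ["qwen3", "llama3"],
  instance_family: [], // inactive
};
```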

Data Source¶

The manifold reads from a static JSON file: docs/data/coverage-manifold.json

This file is regenerated by the codegen script:

# Generate from Athena export
node scripts/codegen-manifold.js --input athena-export.csv --output docs/data/coverage-manifold.json

# Generate synthetic sample data (for development)
node scripts/codegen-manifold.js --sample --output docs/data/coverage-manifold.json

In CI, the docs workflow runs the script in --sample mode so that the manifold page always has data to render during development. Once real benchmark infrastructure is provisioned, generate the file from the Athena export to show production data.

Manifold JSON Schema¶

{
  "projection_method": "pca",
  "dimensions_used": ["deployment_config", "model_family", ...],
  "encoding_maps": { "deployment_config": {"http-flask": 0, ...}, ... },
  "pca_components": [[0.42, -0.31, ...], [0.18, 0.55, ...]],
  "pca_mean": [2.1, 0.8, 1.3, ...],
  "points": [
    {
      "configId": "abc123...",
      "x": 0.45, "y": -0.23,
      "status": "proven",
      "deployment_config": "transformers-vllm",
      "model_name": "Qwen/Qwen3-4B",
      "instance_type": "ml.g5.xlarge",
      "throughput_rps": 45.2,
      "ttft_p50_ms": 123,
      "deployment_target": "realtime-inference",
      "run_type": "ci"
    }
  ],
  "generated_at": "2026-06-09T12:00:00Z",
  "total_configs": 150
}
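Client-side projection only works if the array lengths in the file agree with each other, so a quick consistency check before rendering can catch a stale or hand-edited file. The sketch below is an assumption about what is worth checking, derived from the schema above; `validateManifold` is a hypothetical helper, not part of the codegen script.

```javascript
// Return a list of problems found in a manifold object (empty list means consistent).
function validateManifold(manifold) {
  const errors = [];
  const dims = manifold.dimensions_used;
  if (!Array.isArray(dims) || dims.length === 0) {
    return ["dimensions_used must be a non-empty array"];
  }
  // Every dimension needs an encoding map.
  for (const dim of dims) {
    const map = manifold.encoding_maps?.[dim];
    if (typeof map !== "object" || map === null) {
      errors.push(`encoding_maps is missing "${dim}"`);
    }
  }
  // pca_mean has one entry per dimension.
  if (!Array.isArray(manifold.pca_mean) || manifold.pca_mean.length !== dims.length) {
    errors.push("pca_mean length must equal dimensions_used length");
  }
  // pca_components is 2 rows, each with one weight per dimension.
  const comps = manifold.pca_components;
  if (!Array.isArray(comps) || comps.length !== 2) {
    errors.push("pca_components must contain exactly 2 rows");
  } else {
    comps.forEach((row, i) => {
      if (!Array.isArray(row) || row.length !== dims.length) {
        errors.push(`pca_components[${i}] length must equal dimensions_used length`);
      }
    });
  }
  // Every point needs finite coordinates.
  (manifold.points ?? []).forEach((p, i) => {
    if (!Number.isFinite(p.x) || !Number.isFinite(p.y)) {
      errors.push(`points[${i}] has non-numeric x/y`);
    }
  });
  return errors;
}
```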

Placeholder State¶

If no manifold data file exists (fresh install before first CI run), the visualization page displays a placeholder message explaining that coverage data will populate after benchmark runs complete.

Regenerating the Manifold¶

After new benchmark runs complete:

1. Export results from Athena (CSV or JSON).
2. Run the codegen script:

   node scripts/codegen-manifold.js --input athena-export.csv

3. Commit the updated docs/data/coverage-manifold.json.
4. The docs site rebuilds and shows the new points.

Related¶

Command Generator: Interactive config builder with "Show in Manifold" button
CI Integration: How benchmark data is produced
Path Prover: Agent that fills coverage gaps
Bootstrap: Provisioning Athena/Glue infrastructure