
Coverage Manifold

Overview

The Coverage Manifold is a 2D visualization of MCC's high-dimensional configuration space. It shows all proven, failed, and untested configurations as points in a scatter plot, projected via PCA (Principal Component Analysis) from 7 categorical dimensions down to 2D coordinates.

The manifold helps you:

  • See where your config sits relative to proven paths
  • Identify coverage gaps (dark regions with few points)
  • Find the nearest proven alternatives when your config is untested


How It Works

Dimensions

Each configuration is encoded as a vector of 7 categorical dimensions:

Dimension           Values
deployment_config   http-flask, transformers-vllm, transformers-sglang, ...
model_family        qwen3, llama3, deepseek-r1, ...
instance_family     g5, g6, g6e, p5, ...
quantization        none, fp8, int8, awq, gptq, ...
tp_degree           1, 2, 4, 8
enable_lora         true, false
deployment_target   realtime-inference, async-inference, batch-transform, hyperpod-eks

PCA Projection

Categories are encoded as integers (see encoding maps in the manifold JSON). PCA reduces this 7-dimensional space to 2 components that capture the most variance, producing x/y coordinates for each point.

PCA is deterministic: the same data always produces the same projection. It also supports projecting new points with the stored components and mean, without re-running on the full dataset.
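To illustrate the integer encoding step, the sketch below builds one encoding map per dimension and then encodes a single config. The helper names `buildEncodingMaps` and `encodeConfig` are hypothetical, and numbering values in order of first appearance is an assumption; the codegen script may assign codes differently.

```javascript
// Hypothetical helper: give each distinct value of a dimension an integer code,
// numbered in order of first appearance (an assumed ordering rule).
function buildEncodingMaps(rows, dimensions) {
  const maps = {};
  for (const dim of dimensions) {
    const codes = new Map();
    for (const row of rows) {
      const key = String(row[dim]); // numbers and booleans become string keys
      if (!codes.has(key)) codes.set(key, codes.size);
    }
    maps[dim] = Object.fromEntries(codes);
  }
  return maps;
}

// Encode one config as a vector of integers, one entry per dimension.
// Unknown values fall back to 0, mirroring the client-side projection code.
function encodeConfig(config, dimensions, maps) {
  return dimensions.map((dim) => maps[dim][String(config[dim])] ?? 0);
}

// Toy data using three of the seven dimensions.
const sampleDims = ["model_family", "tp_degree", "enable_lora"];
const sampleRows = [
  { model_family: "qwen3", tp_degree: 1, enable_lora: false },
  { model_family: "llama3", tp_degree: 2, enable_lora: true },
  { model_family: "qwen3", tp_degree: 4, enable_lora: false },
];
const sampleMaps = buildEncodingMaps(sampleRows, sampleDims);
console.log(encodeConfig(sampleRows[1], sampleDims, sampleMaps)); // [ 1, 1, 1 ]
```

The resulting vectors are what PCA would then reduce to x/y coordinates.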


Visual Encoding

Visual Property   Encodes
Color             Status: green = proven/passed, red = failed, grey = untested/unfeasible
Size              Throughput (requests per second); larger = higher throughput
Shape             Deployment target: ● circle = realtime, ■ square = async, ▲ triangle = batch, ◆ diamond = hyperpod
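The mapping in the table could be expressed as a small lookup function. This is a sketch only: the function name `markerStyle`, the radius range, and the square-root size scale are assumptions, not the page's actual rendering code.

```javascript
// Color and shape lookups follow the Visual Encoding table.
const STATUS_COLORS = { proven: "green", passed: "green", failed: "red" };
const TARGET_SHAPES = {
  "realtime-inference": "circle",
  "async-inference": "square",
  "batch-transform": "triangle",
  "hyperpod-eks": "diamond",
};

// Assumed pixel range for marker radius (hypothetical values).
const MIN_RADIUS = 4;
const MAX_RADIUS = 16;

function markerStyle(point, maxThroughput) {
  // Any status not listed above (untested, unfeasible, ...) renders grey.
  const color = STATUS_COLORS[point.status] ?? "grey";
  const shape = TARGET_SHAPES[point.deployment_target] ?? "circle";
  const rps = Math.max(0, point.throughput_rps ?? 0);
  // Square-root scaling compresses large throughput values.
  const fraction = maxThroughput > 0 ? Math.min(1, Math.sqrt(rps / maxThroughput)) : 0;
  const radius = MIN_RADIUS + (MAX_RADIUS - MIN_RADIUS) * fraction;
  return { color, shape, radius };
}

console.log(
  markerStyle(
    { status: "proven", deployment_target: "realtime-inference", throughput_rps: 25 },
    100
  )
); // { color: 'green', shape: 'circle', radius: 10 }
```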

Interactions

  • Hover a point to see: model_name, instance_type, deployment_config, throughput_rps, ttft_p50_ms, status, run_type
  • Click a proven point to view the full Athena record
  • Dark regions (semi-transparent grey zones) indicate coverage gaps

"Plot My Config"

When using the Command Generator widget, click "Show in Manifold" to project your current configuration onto the scatter plot:

  1. Your config appears as a ★ star marker at its projected position
  2. Connector lines link to the 3 nearest proven configurations
  3. Hover the connectors to see those configs' metrics

This tells you immediately whether you're in proven territory or uncharted space.
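Finding the configurations to connect to can be sketched as a nearest-neighbour search. The example assumes distance is measured as Euclidean distance in the projected 2D space (the page may use a different metric), and the function name `nearestProven` is hypothetical.

```javascript
// Return the k proven points closest to the target in projected (x, y) space.
function nearestProven(points, target, k = 3) {
  return points
    .filter((p) => p.status === "proven")
    .map((p) => ({
      point: p,
      distance: Math.hypot(p.x - target.x, p.y - target.y),
    }))
    .sort((a, b) => a.distance - b.distance)
    .slice(0, k);
}

// Toy points; "c" is closest to the origin but is skipped because it failed.
const demoPoints = [
  { configId: "a", x: 0.5, y: 0, status: "proven" },
  { configId: "b", x: 1, y: 0, status: "proven" },
  { configId: "c", x: 0.1, y: 0, status: "failed" },
  { configId: "d", x: 3, y: 4, status: "proven" },
  { configId: "e", x: -2, y: 0, status: "proven" },
];
const demoNeighbours = nearestProven(demoPoints, { x: 0, y: 0 });
console.log(demoNeighbours.map((n) => n.point.configId)); // [ 'a', 'b', 'e' ]
```

A large distance to even the closest proven point is one signal that a config sits in uncharted space.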

How Client-Side Projection Works

The manifold JSON includes encoding_maps, pca_components, and pca_mean. The browser projects your config without a server round-trip:

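// Encode categorical dimensions as integers.
// Values missing from an encoding map fall back to 0, the same code as that map's first category.
const encoded = dimensions_used.map(dim => encoding_maps[dim][userConfig[dim]] ?? 0)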

// Center using PCA mean
const centered = encoded.map((v, i) => v - pca_mean[i])

// Project using PCA components (dot product)
const x = pca_components[0].reduce((sum, w, i) => sum + w * centered[i], 0)
const y = pca_components[1].reduce((sum, w, i) => sum + w * centered[i], 0)

Filtering

When the point cloud is dense, use filters to reduce visual clutter:

  • By deployment_config — Show only transformers-vllm points
  • By model_family — Show only qwen3 or llama3
  • By instance_family — Show only g5 or g6e points

Filters combine with logical AND across dimensions, so each filter you add narrows the visible set.
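The narrowing behaviour can be sketched as follows. The example assumes each point carries `model_family` and `instance_family` fields (the schema sample only shows `model_name` and `instance_type`), and the function name `applyFilters` is hypothetical.

```javascript
// filters maps a dimension name to the list of allowed values.
// A point is visible only if it matches every dimension that has a non-empty list.
function applyFilters(points, filters) {
  const active = Object.entries(filters).filter(([, allowed]) => allowed.length > 0);
  return points.filter((p) =>
    active.every(([dim, allowed]) => allowed.includes(p[dim]))
  );
}

const filterDemoPoints = [
  { configId: "p1", model_family: "qwen3", instance_family: "g5" },
  { configId: "p2", model_family: "llama3", instance_family: "g6e" },
  { configId: "p3", model_family: "deepseek-r1", instance_family: "g5" },
];

// One filter: two of the three points remain.
console.log(applyFilters(filterDemoPoints, { model_family: ["qwen3", "llama3"] }).length); // 2

// Adding a second filter narrows the set further.
console.log(
  applyFilters(filterDemoPoints, {
    model_family: ["qwen3", "llama3"],
    instance_family: ["g5"],
  }).map((p) => p.configId)
); // [ 'p1' ]
```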


Data Source

The manifold reads from a static JSON file: docs/data/coverage-manifold.json

This file is regenerated by the codegen script:

# Generate from Athena export
node scripts/codegen-manifold.js --input athena-export.csv --output docs/data/coverage-manifold.json

# Generate synthetic sample data (for development)
node scripts/codegen-manifold.js --sample --output docs/data/coverage-manifold.json

In CI, the docs workflow runs the script in --sample mode so the manifold page always has data to render. Once real benchmark infrastructure is provisioned, generate production data from the Athena export instead.

Manifold JSON Schema

{
  "projection_method": "pca",
  "dimensions_used": ["deployment_config", "model_family", ...],
  "encoding_maps": { "deployment_config": {"http-flask": 0, ...}, ... },
  "pca_components": [[0.42, -0.31, ...], [0.18, 0.55, ...]],
  "pca_mean": [2.1, 0.8, 1.3, ...],
  "points": [
    {
      "configId": "abc123...",
      "x": 0.45, "y": -0.23,
      "status": "proven",
      "deployment_config": "transformers-vllm",
      "model_name": "Qwen/Qwen3-4B",
      "instance_type": "ml.g5.xlarge",
      "throughput_rps": 45.2,
      "ttft_p50_ms": 123,
      "deployment_target": "realtime-inference",
      "run_type": "ci"
    }
  ],
  "generated_at": "2026-06-09T12:00:00Z",
  "total_configs": 150
}
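Because client-side projection indexes `pca_mean` and each row of `pca_components` by position in `dimensions_used`, a consumer may want to check that these fields agree before plotting. The sketch below is one possible sanity check; the function name `validateManifold` and the specific rules are assumptions inferred from the schema above, not a published contract.

```javascript
// Returns a list of human-readable problems; an empty list means the
// manifold passed these (assumed) consistency checks.
function validateManifold(m) {
  const problems = [];
  const dims = Array.isArray(m.dimensions_used) ? m.dimensions_used : [];
  if (dims.length === 0) problems.push("dimensions_used is missing or empty");

  for (const dim of dims) {
    const map = m.encoding_maps?.[dim];
    if (typeof map !== "object" || map === null) {
      problems.push(`encoding_maps has no entry for ${dim}`);
    }
  }

  // Two components (x and y), each with one weight per dimension.
  if (!Array.isArray(m.pca_components) || m.pca_components.length !== 2) {
    problems.push("pca_components must contain exactly 2 rows");
  } else {
    m.pca_components.forEach((row, i) => {
      if (!Array.isArray(row) || row.length !== dims.length) {
        problems.push(`pca_components[${i}] length does not match dimensions_used`);
      }
    });
  }

  if (!Array.isArray(m.pca_mean) || m.pca_mean.length !== dims.length) {
    problems.push("pca_mean length does not match dimensions_used");
  }

  if (!Array.isArray(m.points)) {
    problems.push("points must be an array");
  } else {
    m.points.forEach((p, i) => {
      if (!Number.isFinite(p.x) || !Number.isFinite(p.y)) {
        problems.push(`points[${i}] has non-numeric x or y`);
      }
    });
  }
  return problems;
}
```

Calling `validateManifold(manifold)` right after loading the JSON and logging the returned list is one way to surface a stale or truncated file early.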

Placeholder State

If no manifold data file exists (fresh install before first CI run), the visualization page displays a placeholder message explaining that coverage data will populate after benchmark runs complete.
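One way a page could decide between the plot and the placeholder is to treat a missing or unreadable file as "no data". This is a minimal sketch under assumptions: the function names `loadManifold` and `chooseView` are hypothetical, and the URL the site actually serves the JSON from is not specified here.

```javascript
// Fetch the manifold JSON; resolve to null if the file is missing,
// unreachable, or not shaped like a manifold.
async function loadManifold(url, fetchImpl = globalThis.fetch) {
  try {
    const response = await fetchImpl(url);
    if (!response.ok) return null; // e.g. 404 on a fresh install
    const manifold = await response.json();
    return Array.isArray(manifold.points) ? manifold : null;
  } catch {
    return null; // network failure or invalid JSON
  }
}

// Decide what the page should render.
function chooseView(manifold) {
  return manifold === null
    ? { view: "placeholder" }
    : { view: "scatter", pointCount: manifold.points.length };
}
```

Passing the fetch function as a parameter keeps the loader easy to exercise without a network, for example `chooseView(await loadManifold(url))` with a stubbed `fetchImpl`.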


Regenerating the Manifold

After new benchmark runs complete:

  1. Export results from Athena (CSV or JSON)
  2. Run the codegen script:
    node scripts/codegen-manifold.js --input athena-export.csv
    
  3. Commit the updated docs/data/coverage-manifold.json
  4. The docs site rebuilds and shows the new points