Coverage for mcp/mission_judge/rubric.py: 100%
3 statements
« prev ^ index » next coverage.py v7.14.1, created at 2026-06-15 15:07 +0000
« prev ^ index » next coverage.py v7.14.1, created at 2026-06-15 15:07 +0000
1"""The fixed, versioned scoring guidance the judge embeds in every prompt.
3The judge scores how close a goal-directed run is to satisfying its objective
4on a single ``0.0``-``1.0`` scale. That scale is only meaningful if every
5invocation is judged against the *same* yardstick, so this module owns two
6module-level constants and nothing else:
8* :data:`RUBRIC` — the scoring guidance folded verbatim into each prompt. It
9 is byte-identical across all invocations within a shipped build and is
10 never interpolated with a clock, a random value, or any other ambient
11 state, so two identical inputs always produce an identical prompt.
12* :data:`RUBRIC_VERSION` — a short, stable identifier of the rubric text in
13 effect. It is recorded in a result's provenance so an operator can later
14 interpret a historical score against the exact guidance that produced it.
16The two are kept side by side on purpose: the version identifies the text, so
17**any edit to** :data:`RUBRIC` **must be paired with a new** :data:`RUBRIC_VERSION`
18**value**. A reader (or reviewer) changing one is meant to change the other in
19the same edit.
20"""
22from __future__ import annotations
24# Stable identifier of the rubric text below. A non-empty string of at most
25# 64 characters with no whitespace. Bump this whenever RUBRIC changes so a
26# recorded score can be traced back to the exact guidance that produced it.
27RUBRIC_VERSION: str = "spj-v1"
29# Fixed scoring guidance embedded in every prompt. Byte-identical across all
30# invocations within a shipped build and never interpolated with any
31# clock-derived, random, or ambient value. Maps the score onto [0.0, 1.0]:
32# 0.0 = no progress toward the objective, 1.0 = objective fully satisfied.
33# Editing this text requires bumping RUBRIC_VERSION above.
34RUBRIC: str = """\
35You are scoring how close a goal-directed automation run is to satisfying its
36stated objective. Return a single progress score in the closed range 0.0 to 1.0:
37 - 0.0 no meaningful progress toward the objective
38 - 0.25 early progress; foundational steps done, objective far off
39 - 0.5 roughly half the observable work toward the objective is complete
40 - 0.75 most of the objective is satisfied; minor work remains
41 - 1.0 the objective is fully satisfied
42Judge only against the objective and the supplied progress context. Do not
43reward activity that does not advance the objective.
44"""