Coverage for mcp/mission_judge/rubric.py: 100%

"""The fixed, versioned scoring guidance the judge embeds in every prompt.

The judge scores how close a goal-directed run is to satisfying its objective
on a single ``0.0``-``1.0`` scale. That scale is only meaningful if every
invocation is judged against the *same* yardstick, so this module owns two
module-level constants and nothing else:

* :data:`RUBRIC`: the scoring guidance folded verbatim into each prompt. It
  is byte-identical across all invocations within a shipped build and is
  never interpolated with a clock, a random value, or any other ambient
  state, so two identical inputs always produce an identical prompt.
* :data:`RUBRIC_VERSION`: a short, stable identifier of the rubric text in
  effect. It is recorded in a result's provenance so an operator can later
  interpret a historical score against the exact guidance that produced it.

The two are kept side by side on purpose: the version identifies the text, so
**any edit to** :data:`RUBRIC` **must be paired with a new** :data:`RUBRIC_VERSION`
**value**. An author (or reviewer) changing the text is expected to change the
version in the same edit.
"""

from __future__ import annotations

# Stable identifier of the rubric text below. A non-empty string of at most
# 64 characters with no whitespace. Bump this whenever RUBRIC changes so a
# recorded score can be traced back to the exact guidance that produced it.
RUBRIC_VERSION: str = "spj-v1"
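The constraints stated in the comment above (non-empty, at most 64 characters, no whitespace) could be checked mechanically, for example in a unit test. The following is a minimal sketch, not part of this module: `is_valid_rubric_version` and `MAX_VERSION_LENGTH` are hypothetical names introduced only for illustration.

```python
from __future__ import annotations

# Hypothetical constant: mirrors the 64-character limit described in the comment.
MAX_VERSION_LENGTH = 64


def is_valid_rubric_version(value: str) -> bool:
    """Return True if value is non-empty, at most 64 characters, and has no whitespace."""
    if not value or len(value) > MAX_VERSION_LENGTH:
        return False
    # str.isspace() is True for spaces, tabs, newlines and other Unicode whitespace.
    return not any(ch.isspace() for ch in value)
```

A test could then assert `is_valid_rubric_version(RUBRIC_VERSION)` so that a malformed identifier is caught before a build ships.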

# Fixed scoring guidance embedded in every prompt. Byte-identical across all
# invocations within a shipped build and never interpolated with any
# clock-derived, random, or ambient value. Maps the score onto [0.0, 1.0]:
# 0.0 = no progress toward the objective, 1.0 = objective fully satisfied.
# Editing this text requires bumping RUBRIC_VERSION above.
RUBRIC: str = """\
You are scoring how close a goal-directed automation run is to satisfying its
stated objective. Return a single progress score in the closed range 0.0 to 1.0:
  - 0.0   no meaningful progress toward the objective
  - 0.25  early progress; foundational steps done, objective far off
  - 0.5   roughly half the observable work toward the objective is complete
  - 0.75  most of the objective is satisfied; minor work remains
  - 1.0   the objective is fully satisfied
Judge only against the objective and the supplied progress context. Do not
reward activity that does not advance the objective.
"""
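
The module states two rules but does not enforce them itself: a rubric edit must come with a new version, and the prompt must depend only on its inputs. One possible way to guard both is sketched below. Everything here is an assumption made for illustration: the helper names, the idea of pinning a SHA-256 fingerprint per version, and the prompt layout with "Objective" and "Progress context" headings are hypothetical and are not taken from the judge's actual code.

```python
from __future__ import annotations

import hashlib


def rubric_fingerprint(rubric: str) -> str:
    """Return the SHA-256 hex digest of the rubric's UTF-8 bytes."""
    return hashlib.sha256(rubric.encode("utf-8")).hexdigest()


def check_pairing(rubric: str, version: str, pinned: dict[str, str]) -> bool:
    """Return True if `version` is pinned and its pinned digest matches `rubric`.

    `pinned` is a hypothetical version -> digest table kept next to the tests.
    Editing the rubric without adding a new version entry makes this False.
    """
    return pinned.get(version) == rubric_fingerprint(rubric)


def build_prompt(rubric: str, objective: str, progress_context: str) -> str:
    """Assemble a prompt from its arguments only (no clock, no randomness).

    The section headings are an assumed layout, not the judge's real format.
    """
    return (
        f"{rubric}\n"
        f"Objective:\n{objective}\n\n"
        f"Progress context:\n{progress_context}\n"
    )
```

In a test suite, `check_pairing(RUBRIC, RUBRIC_VERSION, pinned)` would fail as soon as the text changes under an unchanged version, which is the situation the module docstring warns against.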