Coverage for mcp/mission_judge/rubric.py: 100%

3 statements  

« prev     ^ index     » next       coverage.py v7.14.1, created at 2026-06-15 15:07 +0000

1"""The fixed, versioned scoring guidance the judge embeds in every prompt. 

2 

3The judge scores how close a goal-directed run is to satisfying its objective 

4on a single ``0.0``-``1.0`` scale. That scale is only meaningful if every 

5invocation is judged against the *same* yardstick, so this module owns two 

6module-level constants and nothing else: 

7 

8* :data:`RUBRIC` — the scoring guidance folded verbatim into each prompt. It 

9 is byte-identical across all invocations within a shipped build and is 

10 never interpolated with a clock, a random value, or any other ambient 

11 state, so two identical inputs always produce an identical prompt. 

12* :data:`RUBRIC_VERSION` — a short, stable identifier of the rubric text in 

13 effect. It is recorded in a result's provenance so an operator can later 

14 interpret a historical score against the exact guidance that produced it. 

15 

16The two are kept side by side on purpose: the version identifies the text, so 

17**any edit to** :data:`RUBRIC` **must be paired with a new** :data:`RUBRIC_VERSION` 

18**value**. A reader (or reviewer) changing one is meant to change the other in 

19the same edit. 

20""" 

21 

22from __future__ import annotations 

23 

24# Stable identifier of the rubric text below. A non-empty string of at most 

25# 64 characters with no whitespace. Bump this whenever RUBRIC changes so a 

26# recorded score can be traced back to the exact guidance that produced it. 

27RUBRIC_VERSION: str = "spj-v1" 

28 

29# Fixed scoring guidance embedded in every prompt. Byte-identical across all 

30# invocations within a shipped build and never interpolated with any 

31# clock-derived, random, or ambient value. Maps the score onto [0.0, 1.0]: 

32# 0.0 = no progress toward the objective, 1.0 = objective fully satisfied. 

33# Editing this text requires bumping RUBRIC_VERSION above. 

34RUBRIC: str = """\ 

35You are scoring how close a goal-directed automation run is to satisfying its 

36stated objective. Return a single progress score in the closed range 0.0 to 1.0: 

37 - 0.0 no meaningful progress toward the objective 

38 - 0.25 early progress; foundational steps done, objective far off 

39 - 0.5 roughly half the observable work toward the objective is complete 

40 - 0.75 most of the objective is satisfied; minor work remains 

41 - 1.0 the objective is fully satisfied 

42Judge only against the objective and the supplied progress context. Do not 

43reward activity that does not advance the objective. 

44"""