Coverage for mcp/mission/sandbox.py: 88%
511 statements
coverage.py v7.14.1, created at 2026-06-15 15:07 +0000
1"""Restricted AST validator for Mission ``Strategy.script`` source.
3Where a ``Criterion(kind="predicate")`` carries a single expression, a
4``Strategy.script`` carries a multi-statement Python module that runs
5inside the Mission sandbox to drive an iteration. Both surfaces accept
6untrusted operator input, so both go through a parse-time AST allowlist
7before any execution. This module owns the script side: it parses
8scripts in ``exec`` mode, walks the tree with an explicit list of
9allowed nodes, and rejects everything else with :class:`ScriptRejected`.
11The script surface is wider than the predicate surface — multi-statement
12control flow, helper function definitions, named-exception ``try`` /
13``except`` / ``finally`` blocks, plus calls to the operator-supplied
14tool allowlist — so this module is its own validator rather than a
15shared base class. The structural decisions (an :class:`ast.NodeVisitor`
16that defines a ``visit_*`` for every accepted node and rejects in
17``generic_visit``, an exception type carrying ``reason`` /
18``failing_node`` / ``lineno`` / ``col_offset``, dunder filtering on
19strings and identifiers, comprehension-target shadowing checks) mirror
20:mod:`mcp.mission.predicate` so the two layers reject the same shapes
21the same way.
23Two layers, same as the predicate sandbox:
251. **Parse-time validation.** :func:`validate_script_ast` parses the
26 source in ``exec`` mode and walks the tree with
27 :class:`_ScriptValidator`. The first disallowed construct raises
28 :class:`ScriptRejected`; the script never runs.
292. **Run-time isolation.** The runtime layer (the
30 :class:`MissionSandbox` wrapper around ``MontySandboxProvider``)
31 executes a validated script under shared duration / memory limits
32 with an explicit namespace that withholds dangerous builtins like
33 ``open`` / ``getattr`` / ``__import__``. Even a tree that slipped
34 past this validator would fail at name lookup.
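
A minimal sketch of the parse-time layer described above, assuming a toy
allowlist of constants and ``+`` / ``-`` only. ``Rejected``,
``TinyValidator``, ``validate`` and ``rejection_reason`` are hypothetical
names for illustration and are not part of this module; only the pattern
(opt in via ``visit_*``, reject in ``generic_visit``) is taken from the
description.

```python
import ast


class Rejected(Exception):
    # Hypothetical stand-in for ScriptRejected; carries a reason token.
    def __init__(self, reason, node):
        super().__init__(f"{reason}: {type(node).__name__}")
        self.reason = reason
        self.node = node


class TinyValidator(ast.NodeVisitor):
    # Each accepted node type opts in through its own visit_* method.
    def visit_Module(self, node):
        for stmt in node.body:
            self.visit(stmt)

    def visit_Expr(self, node):
        self.visit(node.value)

    def visit_Constant(self, node):
        pass  # leaf node, nothing to recurse into

    def visit_BinOp(self, node):
        if not isinstance(node.op, (ast.Add, ast.Sub)):
            raise Rejected("binop_not_allowed", node)
        self.visit(node.left)
        self.visit(node.right)

    def generic_visit(self, node):
        # Any node without a dedicated visitor lands here and is rejected.
        raise Rejected("forbidden_node", node)


def validate(source):
    tree = ast.parse(source, mode="exec")
    TinyValidator().visit(tree)
    return tree


def rejection_reason(source):
    # Return the reason token, or None when the source validates.
    try:
        validate(source)
    except Rejected as exc:
        return exc.reason
    return None
```

Because rejection is the default, a node type added in a later Python
release is refused until someone writes a visitor for it.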
36Allowed surface
37---------------
38**Statements:** ``Module``, ``Expr``, ``Assign``, ``AugAssign``,
39``AnnAssign``, ``If``, ``While``, ``For``, ``Pass``, ``Break``,
40``Continue``, ``Return``, ``FunctionDef`` (no decorators), ``Try``
41(named-exception handlers only), ``Raise``.
43**Expressions:** constants, names from the allowlist, container
44literals (``List`` / ``Tuple`` / ``Set`` / ``Dict``), comprehensions
45(``ListComp`` / ``SetComp`` / ``DictComp`` / ``GeneratorExp``),
46``BinOp`` / ``UnaryOp`` / ``BoolOp`` / ``Compare`` / ``IfExp``,
47subscript and slice access, f-strings, lambdas, the walrus operator,
48plus calls.
50**Names visible to a script (the *base scope*):**
52- ``mission`` — the per-iteration namespace; the only allowed
53 attribute accesses are ``mission.observe`` and ``mission.event``.
54- The pure stdlib callables ``len``, ``min``, ``max``, ``sum``,
55 ``abs``, ``any``, ``all``, ``sorted``, ``range``, ``enumerate``,
56 ``zip``, ``list``, ``dict``, ``tuple``, ``set``, ``str``, ``int``,
57 ``float``, ``bool``.
58- A small set of built-in exception classes so ``raise ValueError(...)``
59 and ``except KeyError as e:`` both work without importing.
60- Every tool name the operator placed on the per-session allowlist.
62**Calls** may target a bare name from the base scope, a name a script
63introduced (a function it defined or a value it bound), or one of the
64two attribute calls ``mission.observe(...)`` / ``mission.event(...)``.
65``exec``, ``eval``, ``compile``, and ``__import__`` are rejected by
66name even if a script binds those identifiers locally.
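
The call rules above can be sketched as a small classifier over the
callee node. This is an illustrative simplification, not the validator
itself: ``call_shape`` and its return tokens are hypothetical, and the
sketch does not check whether a bare name is actually visible.

```python
import ast

MISSION_HELPERS = frozenset({"observe", "event"})
FORBIDDEN = frozenset({"exec", "eval", "compile", "__import__"})


def call_shape(source):
    # Parse a single call expression and classify its callee.
    call = ast.parse(source, mode="eval").body
    if not isinstance(call, ast.Call):
        raise ValueError("expected a call expression")
    func = call.func
    if isinstance(func, ast.Name):
        # Forbidden names are refused even if the script bound them.
        return "forbidden" if func.id in FORBIDDEN else "bare_name"
    if (
        isinstance(func, ast.Attribute)
        and isinstance(func.value, ast.Name)
        and func.value.id == "mission"
        and func.attr in MISSION_HELPERS
    ):
        return "mission_helper"
    # Calls on other attributes, subscripts, or call results.
    return "rejected"
```

For instance, ``call_shape("len(xs)")`` classifies as a bare-name call,
while ``call_shape("f()()")`` and ``call_shape("xs[0]()")`` both fall
through to the rejected shape.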
68Rejected outright
69-----------------
70``Import`` / ``ImportFrom``, ``ClassDef``, ``AsyncFunctionDef`` /
71``AsyncFor`` / ``AsyncWith``, ``Yield`` / ``YieldFrom``, ``Global`` /
72``Nonlocal``, ``Match``, ``With``, ``Assert``, ``Delete``, decorators
73(the allowlist is currently empty), bare ``except:`` clauses,
74attribute access on anything other than ``mission``, calls on
75attributes / subscripts / other calls, dunder strings and identifiers,
76and any binding (``Assign``, ``AnnAssign``, ``AugAssign``, walrus,
77function parameter, function name, comprehension target, ``for``
78target, ``except as`` name) whose name shadows a base-scope identifier.
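
As a rough illustration of the shadowing rule, the sketch below finds
bindings that collide with a protected set. It uses ``ast.walk`` and
``Store`` contexts, which is a different technique from the scope-stack
visitor this module uses; ``PROTECTED`` and ``shadowed_names`` are
hypothetical, and the set is far smaller than the real base scope.

```python
import ast

# Hypothetical protected set for illustration only.
PROTECTED = frozenset({"mission", "len", "sum"})


def shadowed_names(source):
    # Collect protected names that the source tries to bind.
    tree = ast.parse(source, mode="exec")
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            # Assign, AnnAssign, AugAssign, walrus, for and
            # comprehension targets all store through a Name.
            if node.id in PROTECTED:
                found.add(node.id)
        elif isinstance(node, ast.arg) and node.arg in PROTECTED:
            found.add(node.arg)  # function or lambda parameter
        elif isinstance(node, ast.FunctionDef) and node.name in PROTECTED:
            found.add(node.name)  # function name
        elif isinstance(node, ast.ExceptHandler) and node.name in PROTECTED:
            found.add(node.name)  # except ... as name
    return sorted(found)
```

Reading a protected name stays legal in this sketch; only binding one is
reported.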
80``Await`` carries a single, narrow exception: ``await <tool>(...)``
81where ``<tool>`` is a bare name on the per-session tool allowlist.
82The runtime layer below exposes every allowlisted tool through the
83underlying Monty ``external_functions`` channel as a coroutine
84factory, so the script must ``await`` the call to receive the
85dispatcher's return value rather than a coroutine object. Every
86other ``Await`` shape — ``await name`` on a non-call, ``await
87mission.observe(...)``, ``await some_other_tool()`` for a tool not on
88the allowlist, ``await (lambda: ...)()`` — stays rejected with
89reason ``await_not_allowed``.
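
A hedged sketch of the ``Await`` shape check described above. ``TOOLS``,
``await_is_allowed`` and ``all_awaits_allowed`` are hypothetical names,
and the one-entry allowlist is an assumption for the example. It relies
on ``ast.parse`` accepting a top-level ``await`` in ``exec`` mode, which
the script surface described here also depends on.

```python
import ast

TOOLS = frozenset({"find_examples"})  # hypothetical per-session allowlist


def await_is_allowed(node):
    # Accept only ``await <tool>(...)`` where <tool> is a bare
    # allowlisted name; every other operand shape is refused.
    value = node.value
    return (
        isinstance(value, ast.Call)
        and isinstance(value.func, ast.Name)
        and value.func.id in TOOLS
    )


def all_awaits_allowed(source):
    tree = ast.parse(source, mode="exec")
    return all(
        await_is_allowed(node)
        for node in ast.walk(tree)
        if isinstance(node, ast.Await)
    )
```

Under these assumptions ``x = await find_examples(query='gpu')`` passes,
while ``await mission.observe()`` and ``await other_tool()`` do not.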
90"""
92from __future__ import annotations
94import ast
95from collections.abc import Iterable
96from typing import Final, NoReturn
98# <pyflowchart-code-diagram> BEGIN - auto-inserted, do not edit
99# Flowchart(s) generated from this file:
100# * ``validate_script_ast`` -> ``diagrams/code_diagrams/mcp/mission/sandbox.validate_script_ast.html``
101# (PNG: ``diagrams/code_diagrams/mcp/mission/sandbox.validate_script_ast.png``)
102# Regenerate with ``python diagrams/code_diagrams/generate.py``.
103# <pyflowchart-code-diagram> END
106# ---------------------------------------------------------------------------
107# Allowlists
108# ---------------------------------------------------------------------------
110_SAFE_BUILTINS: Final[frozenset[str]] = frozenset(
111 {
112 "len",
113 "min",
114 "max",
115 "sum",
116 "abs",
117 "any",
118 "all",
119 "sorted",
120 "range",
121 "enumerate",
122 "zip",
123 "list",
124 "dict",
125 "tuple",
126 "set",
127 "str",
128 "int",
129 "float",
130 "bool",
131 }
132)
133"""Pure stdlib callables a script may look up by bare name."""
135_ALLOWED_EXCEPTION_NAMES: Final[frozenset[str]] = frozenset(
136 {
137 "Exception",
138 "ValueError",
139 "TypeError",
140 "KeyError",
141 "IndexError",
142 "AttributeError",
143 "LookupError",
144 "RuntimeError",
145 "ArithmeticError",
146 "ZeroDivisionError",
147 "OverflowError",
148 "OSError",
149 "FileNotFoundError",
150 "TimeoutError",
151 "ConnectionError",
152 "StopIteration",
153 "AssertionError",
154 }
155)
156"""Built-in exception classes a script may name in ``raise`` and ``except``.
158Including these in the base scope is what lets a script say
159``except ValueError as e:`` or ``raise RuntimeError("msg")`` without an
160``import``. Constructing an exception instance is side-effect-free, so
161exposing the class is no broader than exposing the safe builtins.
162"""
164_MISSION_NAMESPACE_NAME: Final[str] = "mission"
165"""Top-level identifier reserved for the per-iteration helper namespace."""
167_MISSION_HELPER_ATTRIBUTES: Final[frozenset[str]] = frozenset({"observe", "event"})
168"""Only attributes the validator accepts on the ``mission`` namespace."""
170_FORBIDDEN_CALL_TARGETS: Final[frozenset[str]] = frozenset(
171 {"exec", "eval", "compile", "__import__"}
172)
173"""Names whose call form is rejected by name even if a script shadows them.
175A script could in principle write ``def exec(): ...`` and then call its
176own local. Rejecting these names at the call site as well as via the
177dunder filter (for ``__import__``) closes the gap.
178"""
180_ALLOWED_DECORATORS: Final[frozenset[str]] = frozenset()
181"""Decorator names a function definition may carry.
183Currently empty: any ``@decorator`` on a ``FunctionDef`` is rejected.
184The hook is here so a future iteration can vet a small set of operator-
185facing helpers (e.g. a retry decorator) by editing only this constant.
186"""
188_ALLOWED_BIN_OPS: Final[tuple[type[ast.operator], ...]] = (
189 ast.Add,
190 ast.Sub,
191 ast.Mult,
192 ast.Div,
193 ast.FloorDiv,
194 ast.Mod,
195 ast.Pow,
196 ast.MatMult,
197)
199_ALLOWED_UNARY_OPS: Final[tuple[type[ast.unaryop], ...]] = (
200 ast.UAdd,
201 ast.USub,
202 ast.Not,
203 ast.Invert,
204)
206_ALLOWED_COMPARE_OPS: Final[tuple[type[ast.cmpop], ...]] = (
207 ast.Eq,
208 ast.NotEq,
209 ast.Lt,
210 ast.LtE,
211 ast.Gt,
212 ast.GtE,
213 ast.Is,
214 ast.IsNot,
215 ast.In,
216 ast.NotIn,
217)
219_ALLOWED_BOOL_OPS: Final[tuple[type[ast.boolop], ...]] = (ast.And, ast.Or)
222# ---------------------------------------------------------------------------
223# Exception
224# ---------------------------------------------------------------------------
227class ScriptRejected(Exception):
228 """Raised when a script source contains a disallowed construct.
230 Mirror of :class:`mcp.mission.predicate.PredicateRejected` so callers
231 can render uniform structured errors regardless of which sandbox
232 layer rejected the input. ``reason`` is a short stable token (e.g.
233 ``"forbidden_node"``, ``"shadows_protected_name"``) suitable for
234 machine-readable error envelopes; ``failing_node`` is the
235 :class:`ast.AST` that triggered rejection (``None`` only when the
236 source failed to parse at all).
237 """
239 def __init__(
240 self,
241 reason: str,
242 *,
243 failing_node: ast.AST | None = None,
244 message: str | None = None,
245 ) -> None:
246 self.reason: str = reason
247 self.failing_node: ast.AST | None = failing_node
248 self.lineno: int | None = (
249 getattr(failing_node, "lineno", None) if failing_node is not None else None
250 )
251 self.col_offset: int | None = (
252 getattr(failing_node, "col_offset", None) if failing_node is not None else None
253 )
254 rendered = message if message is not None else reason
255 if self.lineno is not None:
256 rendered = f"{rendered} (line {self.lineno}, col {self.col_offset})"
257 super().__init__(rendered)
260# ---------------------------------------------------------------------------
261# Validator
262# ---------------------------------------------------------------------------
265class _ScriptValidator(ast.NodeVisitor):
266 """Walk a script AST and reject any construct outside the allowlist.
268 The validator tracks two things across the walk:
270 * **The base scope** — the union of the operator-supplied tool
271 allowlist, the safe builtins, the allowed exception names, and
272 the ``mission`` namespace. These names are *protected*: a script
273 may read them but may not bind, rebind, or shadow them with a
274 local of any kind (assignment, walrus, function parameter,
275 function name, comprehension target, ``for`` target,
276 ``except as`` name). Protecting them keeps name resolution
277 auditable at a glance: a Name in the source whose ``id`` is
278 ``submit_job_sqs`` always resolves to the registered
279 tool.
280 * **A scope stack** — each frame on the stack carries the names a
281 script has bound at module level plus the names introduced by
282 function parameters, comprehension targets, ``for`` loops, and
283 ``except as`` clauses. The stack is what makes a helper function
284 that defines a parameter ``i`` validate cleanly without ``i``
285 leaking into the module-level scope.
286 """
288 def __init__(self, allowlist: Iterable[str]) -> None:
289 # Order does not matter; keep as a frozenset for fast membership.
290 self._tool_allowlist: frozenset[str] = frozenset(allowlist)
292 # Names that are visible from the start of the script and that
293 # script-introduced bindings may NOT shadow. The mission
294 # namespace counts as protected: rebinding it would defeat the
295 # one-allowed-attribute-base rule in :meth:`visit_Attribute`.
296 # The forbidden call targets (``eval``, ``exec``, ``compile``,
297 # ``__import__``) are folded into the protected set so that a
298 # script trying to shadow them — ``(eval := 1)``, ``def exec():
299 # ...``, ``for compile in xs:`` — is rejected at the binding
300 # site with ``shadows_protected_name``, in addition to the
301 # call-site rejection in :meth:`visit_Call`. Two layers of
302 # defense for the same risk: a reader does not have to chase
303 # every later use to know whether the shadow is harmful.
304 self._base_scope: frozenset[str] = (
305 self._tool_allowlist
306 | _SAFE_BUILTINS
307 | _ALLOWED_EXCEPTION_NAMES
308 | _FORBIDDEN_CALL_TARGETS
309 | {_MISSION_NAMESPACE_NAME}
310 )
312 # Stack of frozensets of script-bound names (function params,
313 # for-loop targets, comprehension targets, assignment targets,
314 # function definitions). The base frame is empty; each scope
315 # push appends a new frame whose contents accumulate from the
316 # parent frame so a nested lookup can see outer locals.
317 self._scopes: list[frozenset[str]] = [frozenset()]
319 # ---- helpers -------------------------------------------------------
321 def _current_locals(self) -> frozenset[str]:
322 return self._scopes[-1]
324 def _name_is_visible(self, name: str) -> bool:
325 return name in self._base_scope or name in self._current_locals()
327 @staticmethod
328 def _is_dunder(name: str) -> bool:
329 return name.startswith("__")
331 @staticmethod
332 def _reject(reason: str, node: ast.AST, message: str | None = None) -> NoReturn:
333 raise ScriptRejected(reason, failing_node=node, message=message)
335 def _push_scope(self, locals_: frozenset[str]) -> None:
336 self._scopes.append(self._current_locals() | locals_)
338 def _pop_scope(self) -> None:
339 self._scopes.pop()
341 def _bind_local(self, name: str, node: ast.AST) -> None:
342 """Add ``name`` to the current frame, rejecting protected shadows.
344 Used by every binding form (assignment, walrus, function name,
345 function parameter, ``for`` target, comprehension target,
346 ``except as`` name). The shadow check is what prevents a
347 script from rebinding ``submit_job_sqs`` or ``mission`` and
348 thereby sneaking past later name-based validation.
349 """
350 if self._is_dunder(name):
351 self._reject(
352 "dunder_binding",
353 node,
354 f"binding to '{name}' is not allowed (starts with '__')",
355 )
356 if name in self._base_scope:
357 self._reject(
358 "shadows_protected_name",
359 node,
360 f"binding to '{name}' shadows a protected name",
361 )
362 # Frames are immutable frozensets, so we replace the top frame
363 # rather than mutate it in place: every ``_push_scope`` already
364 # copied the parent's names, and a name added here stays local to
365 # this frame.
366 self._scopes[-1] = self._scopes[-1] | {name}
368 def _collect_target_names(self, target: ast.AST) -> list[ast.Name]:
369 """Flatten an assignment / for / comprehension target.
371 Tuples and lists nest (``for (a, b) in pairs``). ``Starred``
372 wraps (``a, *rest = xs``). Anything else under a target —
373 ``Subscript``, ``Attribute`` — would be a write into a
374 non-local namespace and is rejected by the caller via the
375 ``invalid_target`` reason.
376 """
377 if isinstance(target, ast.Name):
378 return [target]
379 if isinstance(target, (ast.Tuple, ast.List)):
380 collected: list[ast.Name] = []
381 for elt in target.elts:
382 collected.extend(self._collect_target_names(elt))
383 return collected
384 if isinstance(target, ast.Starred):
385 return self._collect_target_names(target.value)
386 self._reject(
387 "invalid_target",
388 target,
389 "assignment / loop target must be a plain identifier",
390 )
391 return [] # unreachable; _reject raises
393 def _bind_targets(self, target: ast.AST) -> None:
394 for name_node in self._collect_target_names(target):
395 self._bind_local(name_node.id, name_node)
397 # ---- top-level entry ----------------------------------------------
399 def visit_Module(self, node: ast.Module) -> None:
400 # ``ast.parse(..., mode="exec")`` produces a Module whose body
401 # is a list of statements. Walk each in order so any forward
402 # binding (e.g. a function definition followed by a call)
403 # validates with the binding visible in the same module scope.
404 for stmt in node.body:
405 self.visit(stmt)
407 # ---- catch-all -----------------------------------------------------
409 def generic_visit(self, node: ast.AST) -> None:
410 # Default rejection: the validator opts in to every supported
411 # node via a dedicated ``visit_*`` method. Anything reaching
412 # ``generic_visit`` is something the operator wrote that the
413 # script surface deliberately does not support — ``Import``,
414 # ``ClassDef``, ``Global``, ``Nonlocal``, ``Match``, ``With``,
415 # ``Assert``, ``Delete``, ``Yield``, ``AsyncFunctionDef`` /
416 # ``AsyncFor`` / ``AsyncWith`` (``Await`` is handled by its
417 # own narrow visitor), etc.
418 self._reject(
419 "forbidden_node",
420 node,
421 f"{type(node).__name__} is not allowed in a script",
422 )
424 # ---- statements ----------------------------------------------------
426 def visit_Expr(self, node: ast.Expr) -> None:
427 self.visit(node.value)
429 def visit_Pass(self, node: ast.Pass) -> None:
430 # No children; the visitor still has to opt in to keep
431 # generic_visit from rejecting it.
432 pass
434 def visit_Break(self, node: ast.Break) -> None:
435 pass
437 def visit_Continue(self, node: ast.Continue) -> None:
438 pass
440 def visit_Assign(self, node: ast.Assign) -> None:
441 # Validate the RHS *first* under the current scope, then bind
442 # the LHS targets. This ordering matters for ``x = x + 1``: the
443 # right-hand ``x`` must already exist as a local; if it does
444 # not, the ``visit_Name`` lookup fails. Conversely, ``x = 1``
445 # introduces ``x`` only after the literal validates.
446 self.visit(node.value)
447 for target in node.targets:
448 self._bind_targets(target)
450 def visit_AugAssign(self, node: ast.AugAssign) -> None:
451 if not isinstance(node.op, _ALLOWED_BIN_OPS):  # coverage: 451 ↛ 452, line 451 didn't jump to line 452 because the condition on line 451 was never true
452 self._reject(
453 "binop_not_allowed",
454 node,
455 f"augmented operator {type(node.op).__name__} is not allowed",
456 )
457 # ``x += 1`` reads ``x`` then writes ``x``. The target Name
458 # must be visible already (no defining via aug-assign), and
459 # the target itself must not be a protected name. We re-use
460 # ``_bind_local`` for the shadow check; if ``x`` is already
461 # local the bind is a no-op.
462 if not isinstance(node.target, ast.Name):
463 self._reject(
464 "invalid_target",
465 node.target,
466 "augmented assignment target must be a plain identifier",
467 )
468 # Read-side check: target must already be in scope.
469 self.visit(node.target)
470 self.visit(node.value)
471 # Bind defensively — protects against aug-assign on a
472 # protected name even though the read-side visit above would
473 # already accept it (protected names ARE visible). The
474 # shadow check fires here.
475 self._bind_local(node.target.id, node.target)
477 def visit_AnnAssign(self, node: ast.AnnAssign) -> None:
478 # ``x: int = 1`` and ``x: int`` are accepted; ``obj.attr: int``
479 # is not (target must be a plain identifier).
480 if node.value is not None:  # coverage: 480 ↛ 482, line 480 didn't jump to line 482 because the condition on line 480 was always true
481 self.visit(node.value)
482 if node.annotation is not None:  # coverage: 482 ↛ 484, line 482 didn't jump to line 484 because the condition on line 482 was always true
483 self.visit(node.annotation)
484 if not isinstance(node.target, ast.Name):
485 self._reject(
486 "invalid_target",
487 node.target,
488 "annotated assignment target must be a plain identifier",
489 )
490 self._bind_local(node.target.id, node.target)
492 def visit_If(self, node: ast.If) -> None:
493 self.visit(node.test)
494 for stmt in node.body:
495 self.visit(stmt)
496 for stmt in node.orelse:  # coverage: 496 ↛ 497, line 496 didn't jump to line 497 because the loop on line 496 never started
497 self.visit(stmt)
499 def visit_While(self, node: ast.While) -> None:
500 self.visit(node.test)
501 for stmt in node.body:
502 self.visit(stmt)
503 for stmt in node.orelse:
504 self.visit(stmt)
506 def visit_For(self, node: ast.For) -> None:
507 # Validate the iterable in the *outer* scope, then bind the
508 # loop targets in the same scope as the body. ``for x in xs:``
509 # leaks ``x`` after the loop, matching Python semantics.
510 self.visit(node.iter)
511 self._bind_targets(node.target)
512 for stmt in node.body:
513 self.visit(stmt)
514 for stmt in node.orelse:
515 self.visit(stmt)
517 def visit_Return(self, node: ast.Return) -> None:
518 if node.value is not None:  # coverage: 518 ↛ exit, line 518 didn't return from function 'visit_Return' because the condition on line 518 was always true
519 self.visit(node.value)
521 def visit_Raise(self, node: ast.Raise) -> None:
522 if node.exc is not None:  # coverage: 522 ↛ 524, line 522 didn't jump to line 524 because the condition on line 522 was always true
523 self.visit(node.exc)
524 if node.cause is not None:  # coverage: 524 ↛ 525, line 524 didn't jump to line 525 because the condition on line 524 was never true
525 self.visit(node.cause)
527 def visit_Try(self, node: ast.Try) -> None:
528 # Body of the try block runs in the current scope.
529 for stmt in node.body:
530 self.visit(stmt)
531 for handler in node.handlers:
532 # Bare ``except:`` is rejected — operators must name the
533 # exception class so an unrelated bug is not silently
534 # swallowed by the same handler that catches a tool
535 # timeout.
536 if handler.type is None:
537 self._reject(
538 "bare_except",
539 handler,
540 "bare 'except:' is not allowed; name the exception class",
541 )
542 self.visit(handler.type)
543 # ``except Exc as name:`` introduces ``name`` only inside
544 # the handler block, mirroring Python semantics. Push a
545 # new scope so the binding does not leak to siblings.
546 self._push_scope(frozenset())
547 try:
548 if handler.name is not None:
549 # ``handler`` is the canonical AST node for the
550 # binding location; reuse it as the failing-node
551 # context for shadow rejections.
552 self._bind_local(handler.name, handler)
553 for stmt in handler.body:
554 self.visit(stmt)
555 finally:
556 self._pop_scope()
557 for stmt in node.orelse:
558 self.visit(stmt)
559 for stmt in node.finalbody:
560 self.visit(stmt)
562 def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
563 # Decorators are gated by a dedicated allowlist so the security
564 # surface stays small. The list is currently empty.
565 for deco in node.decorator_list:
566 if not (isinstance(deco, ast.Name) and deco.id in _ALLOWED_DECORATORS):  # coverage: 566 ↛ 565, line 566 didn't jump to line 565 because the condition on line 566 was always true
567 self._reject(
568 "decorator_not_allowed",
569 deco,
570 "decorators are not allowed on script functions",
571 )
572 # Bind the function name in the *current* scope so the rest of
573 # the module can call it. The body opens a new scope under
574 # which arguments live.
575 self._bind_local(node.name, node)
576 self._validate_function_signature_and_body(node.args, node.body, node)
578 def _validate_function_signature_and_body(
579 self,
580 args: ast.arguments,
581 body: list[ast.stmt],
582 owner: ast.AST,
583 ) -> None:
584 # Defaults that touch the outer scope are not forbidden, but
585 # the default expressions still validate under the *outer*
586 # scope (Python evaluates them once at def time, not per call).
587 for default in args.defaults:
588 self.visit(default)
589 for kw_default in args.kw_defaults:
590 if kw_default is not None:
591 self.visit(kw_default)
593 # Collect parameter names. Reject duplicates and protected
594 # shadows up front so the body sees a coherent local frame.
595 param_names: list[tuple[str, ast.AST]] = []
597 def _collect_arg(arg: ast.arg) -> None:
598 param_names.append((arg.arg, arg))
599 if arg.annotation is not None:  # coverage: 599 ↛ 600, line 599 didn't jump to line 600 because the condition on line 599 was never true
600 self.visit(arg.annotation)
602 for arg in args.posonlyargs:  # coverage: 602 ↛ 603, line 602 didn't jump to line 603 because the loop on line 602 never started
603 _collect_arg(arg)
604 for arg in args.args:
605 _collect_arg(arg)
606 if args.vararg is not None:
607 _collect_arg(args.vararg)
608 for arg in args.kwonlyargs:
609 _collect_arg(arg)
610 if args.kwarg is not None:
611 _collect_arg(args.kwarg)
613 # Push a fresh frame; bindings inside the function do not
614 # leak to the module-level scope.
615 self._push_scope(frozenset())
616 try:
617 seen: set[str] = set()
618 for name, owning_node in param_names:
619 if name in seen:
620 self._reject(
621 "duplicate_parameter",
622 owning_node,
623 f"duplicate parameter '{name}'",
624 )
625 seen.add(name)
626 self._bind_local(name, owning_node)
627 for stmt in body:
628 self.visit(stmt)
629 finally:
630 self._pop_scope()
632 # ---- expressions ---------------------------------------------------
634 def visit_Constant(self, node: ast.Constant) -> None:
635 # Reject dunder strings even when used as plain data. The same
636 # rationale as in the predicate sandbox: a string like
637 # ``"__class__"`` only ever appears in source code as part of
638 # an introspection escape pattern (``getattr(x, "__class__")``,
639 # ``locals()["__import__"]``). Forbidding them at the constant
640 # level closes those off even if a future change widened the
641 # call or attribute allowlist.
642 if isinstance(node.value, str) and self._is_dunder(node.value):
643 self._reject(
644 "dunder_string",
645 node,
646 "string constants starting with '__' are not allowed",
647 )
649 def visit_Name(self, node: ast.Name) -> None:
650 if self._is_dunder(node.id):  # coverage: 650 ↛ 651, line 650 didn't jump to line 651 because the condition on line 650 was never true
651 self._reject(
652 "dunder_name",
653 node,
654 f"identifier '{node.id}' starts with '__'",
655 )
656 if not self._name_is_visible(node.id):
657 self._reject(
658 "name_not_allowed",
659 node,
660 f"name '{node.id}' is not in the script allowlist",
661 )
663 def visit_NamedExpr(self, node: ast.NamedExpr) -> None:
664 # ``(x := expr)`` — the walrus binds ``x`` in the enclosing
665 # scope. Validate the value first, then route through the
666 # standard binding helper so the protected-name shadow check
667 # fires for ``(mission := ...)`` etc.
668 self.visit(node.value)
669 if not isinstance(node.target, ast.Name):  # coverage: 669 ↛ 670, line 669 didn't jump to line 670 because the condition on line 669 was never true
670 self._reject(
671 "invalid_target",
672 node.target,
673 "walrus target must be a plain identifier",
674 )
675 self._bind_local(node.target.id, node.target)
677 def visit_Lambda(self, node: ast.Lambda) -> None:
678 # Lambdas are scoped expressions: validate parameters + body
679 # under a fresh frame, exactly like a ``FunctionDef`` minus
680 # the decorator list and statement body. The lambda itself
681 # produces no binding in the enclosing scope.
682 self._validate_function_signature_and_body(node.args, [ast.Expr(value=node.body)], node)
684 # ---- containers ----------------------------------------------------
686 def visit_List(self, node: ast.List) -> None:
687 for elt in node.elts:
688 self.visit(elt)
690 def visit_Tuple(self, node: ast.Tuple) -> None:
691 for elt in node.elts:
692 self.visit(elt)
694 def visit_Set(self, node: ast.Set) -> None:
695 for elt in node.elts:
696 self.visit(elt)
698 def visit_Dict(self, node: ast.Dict) -> None:
699 for key in node.keys:
700 if key is not None:
701 self.visit(key)
702 else:
703 # ``{**other}`` would let a script splat arbitrary
704 # mappings into a dict literal; reject for the same
705 # reason as in the predicate sandbox.
706 self._reject(
707 "dict_unpacking",
708 node,
709 "dict unpacking is not allowed in a script",
710 )
711 for value in node.values:
712 self.visit(value)
714 def visit_Starred(self, node: ast.Starred) -> None:
715 # ``[*xs]``, ``f(*xs)``, ``a, *rest = xs`` — recurse into the
716 # inner expression so the nested Name still hits the
717 # allowlist check.
718 self.visit(node.value)
720 # ---- operators -----------------------------------------------------
722 def visit_BinOp(self, node: ast.BinOp) -> None:
723 if not isinstance(node.op, _ALLOWED_BIN_OPS):  # coverage: 723 ↛ 724, line 723 didn't jump to line 724 because the condition on line 723 was never true
724 self._reject(
725 "binop_not_allowed",
726 node,
727 f"binary operator {type(node.op).__name__} is not allowed",
728 )
729 self.visit(node.left)
730 self.visit(node.right)
732 def visit_UnaryOp(self, node: ast.UnaryOp) -> None:
733 if not isinstance(node.op, _ALLOWED_UNARY_OPS):
734 self._reject(
735 "unaryop_not_allowed",
736 node,
737 f"unary operator {type(node.op).__name__} is not allowed",
738 )
739 self.visit(node.operand)
741 def visit_BoolOp(self, node: ast.BoolOp) -> None:
742 if not isinstance(node.op, _ALLOWED_BOOL_OPS):
743 self._reject(
744 "boolop_not_allowed",
745 node,
746 f"bool operator {type(node.op).__name__} is not allowed",
747 )
748 for value in node.values:
749 self.visit(value)
751 def visit_Compare(self, node: ast.Compare) -> None:
752 for op in node.ops:
753 if not isinstance(op, _ALLOWED_COMPARE_OPS):  # coverage: 753 ↛ 754, line 753 didn't jump to line 754 because the condition on line 753 was never true
754 self._reject(
755 "compareop_not_allowed",
756 node,
757 f"comparison operator {type(op).__name__} is not allowed",
758 )
759 self.visit(node.left)
760 for comparator in node.comparators:
761 self.visit(comparator)
763 def visit_IfExp(self, node: ast.IfExp) -> None:
764 self.visit(node.test)
765 self.visit(node.body)
766 self.visit(node.orelse)
768 # ---- attribute and subscript --------------------------------------
770 def visit_Attribute(self, node: ast.Attribute) -> None:
771 # The script surface allows attribute access on exactly one
772 # name — the ``mission`` namespace — and only for the two
773 # helper attributes ``observe`` and ``event``. Every other
774 # ``foo.bar`` read raises ``ScriptRejected``: tool results are
775 # opaque values, not deep object graphs, so a script that
776 # needs nested data should use subscripting on a return value.
777 if self._is_dunder(node.attr):
778 self._reject(
779 "dunder_attribute",
780 node,
781 f"attribute '{node.attr}' starts with '__'",
782 )
783 if not isinstance(node.value, ast.Name):
784 self._reject(
785 "attribute_target_not_name",
786 node,
787 "attribute access is only allowed on the 'mission' namespace",
788 )
789 if node.value.id != _MISSION_NAMESPACE_NAME:
790 self._reject(
791 "attribute_target_not_allowed",
792 node,
793 "attribute access is only allowed on the 'mission' namespace",
794 )
795 if node.attr not in _MISSION_HELPER_ATTRIBUTES:
796 self._reject(
797 "attribute_not_allowed",
798 node,
799 f"'mission.{node.attr}' is not an allowed helper",
800 )
801 # ``mission`` itself is a base-scope name; visit it for
802 # regularity so any future Name-side check still fires here.
803 self.visit(node.value)
805 def visit_Subscript(self, node: ast.Subscript) -> None:
806 # Recurse into both the value and the slice. The base of the
807 # chain falls out as a ``Name`` lookup that hits the
808 # allowlist; slices may themselves contain Names and Calls
809 # that go through the same validation path.
810 self.visit(node.value)
811 self.visit(node.slice)
813 def visit_Slice(self, node: ast.Slice) -> None:
814 if node.lower is not None:
815 self.visit(node.lower)
816 if node.upper is not None:
817 self.visit(node.upper)
818 if node.step is not None:
819 self.visit(node.step)
821 # ---- calls ---------------------------------------------------------
823 def visit_Call(self, node: ast.Call) -> None:
824 # The callee form decides which rule applies. Two shapes are
825 # allowed:
826 #
827 # * ``name(...)`` — bare name call. The name must already be
828 # visible (base-scope or script-bound local).
829 # * ``mission.observe(...)`` / ``mission.event(...)`` — the
830 # only attribute-call shape supported.
831 #
832 # ``foo()()`` (call returning a callable, then call), ``a[0]()``
833 # (subscript-then-call), and ``x.y()`` for any ``y`` not on the
834 # mission helper list are all rejected outright.
835 func = node.func
836 if isinstance(func, ast.Name):
837 # ``__import__``, ``exec``, ``eval``, ``compile`` are
838 # rejected by name even if a script defined a local with
839 # one of those names. The dunder filter in
840 # :meth:`visit_Name` already rejects ``__import__`` for
841 # plain reads; the explicit list is what blocks the
842 # ``def exec(): ...; exec()`` shadow attempt.
843 if func.id in _FORBIDDEN_CALL_TARGETS:
844 self._reject(
845 "forbidden_call_target",
846 node,
847 f"call to '{func.id}' is not allowed",
848 )
849 # Visit the Name so the visibility / dunder check fires.
850 self.visit(func)
851 elif isinstance(func, ast.Attribute):
852 # Only ``mission.observe`` / ``mission.event``. The
853 # attribute visit raises with a structured reason for
854 # every other shape (non-Name base, non-mission base,
855 # disallowed attribute), so we just recurse here.
856 self.visit(func)
857 else:
858 # ``f()()``, ``xs[0]()``, ``(lambda: ...)()`` — the
859 # callee is neither a Name nor a single ``mission.<x>``
860 # attribute access. Reject without descending; the
861 # blanket ``call_target_shape`` reason captures all three.
862 self._reject(
863 "call_target_shape",
864 node,
865 "script calls must target a bare name or 'mission.<helper>'",
866 )
867 for arg in node.args:
868 self.visit(arg)
869 for kw in node.keywords:
870 # ``**kwargs`` shows up as a keyword with arg=None; allow
871 # the value but recurse so its content is still validated
872 # against the same name and call rules.
873 self.visit(kw.value)
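# Illustrative sketch, not part of sandbox.py: the callee-shape rule
# described in ``visit_Call`` can be approximated by a small standalone
# classifier. The names ``classify_callee``, ``MISSION`` and ``HELPERS``
# are hypothetical, and the helper set is assumed from the comments
# above. The real validator also checks name visibility and raises
# structured reasons; this sketch only reports the shape.

```python
import ast

MISSION = "mission"                         # assumed namespace name
HELPERS = frozenset({"observe", "event"})   # assumed helper set


def classify_callee(source: str) -> str:
    """Classify the callee of a single call expression."""
    call = ast.parse(source, mode="eval").body
    if not isinstance(call, ast.Call):
        raise ValueError("source must be a single call expression")
    func = call.func
    if isinstance(func, ast.Name):
        # ``name(...)``: a bare-name call.
        return "bare_name"
    if (
        isinstance(func, ast.Attribute)
        and isinstance(func.value, ast.Name)
        and func.value.id == MISSION
        and func.attr in HELPERS
    ):
        # ``mission.observe(...)`` / ``mission.event(...)``.
        return "mission_helper"
    # Call-of-call, subscript-then-call, lambda call, other attributes.
    return "rejected"
```

# Under these assumptions, ``classify_callee("f()()")`` and
# ``classify_callee("xs[0]()")`` both fall into the rejected bucket,
# while ``classify_callee("mission.observe(key='k', value=1)")`` is
# recognised as a helper call.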
875 def visit_Await(self, node: ast.Await) -> None:
876 # The runtime layer (:class:`MissionSandbox`) exposes every
877 # allowlisted tool through Monty's ``external_functions``
878 # channel, where a registered async callable surfaces inside
879 # the script as a coroutine factory. Calling
880 # ``find_examples(query="gpu")`` from inside a script returns
881 # a coroutine object, not the dispatcher's return value;
882 # consuming the value requires writing ``await
883 # find_examples(query="gpu")``. The two ``mission`` helpers
884 # ride the same channel — the runtime layer prepends a small
885 # source-level shim that makes ``mission.observe`` /
886 # ``mission.event`` route into host-side closures via the same
887 # coroutine-factory channel, so awaiting them is required for
888 # the side effect (an observation row, an event row) to land
889 # on the iteration's audit log. The validator therefore opens
890 # ``Await`` for exactly two shapes:
891 #
892 # * ``await <name>(...)`` where ``<name>`` is on the per-
893 # session tool allowlist.
894 # * ``await mission.observe(...)`` / ``await mission.event(...)``
895 # — attribute calls on the ``mission`` namespace whose
896 # attribute is one of the two helper names that
897 # :meth:`visit_Attribute` already accepts.
898 #
899 # Both forms route the wrapped Call back through
900 # :meth:`visit_Call` so kwargs, positional args, and the
901 # forbidden-call-target rules apply unchanged.
902 #
903 # Rejected (folded into ``await_not_allowed``):
904 #
905 # * ``await x`` — bare name (no Call inside).
906 # * ``await some_other_tool()`` — call on a Name that is not
907 # on the per-session tool allowlist (a safe builtin, an
908 # exception class, ``mission`` itself, a script-bound local,
909 # or simply unknown).
910 # * ``await mission.foo(...)`` for any ``foo`` outside the
911 # helper set — :meth:`visit_Attribute` would already reject
912 # the inner call, but the early reject here keeps the reason
913 # token stable as ``await_not_allowed``.
914 # * ``await x.observe(...)`` for any ``x`` other than
915 # ``mission`` — same rationale.
916 # * ``await (lambda: ...)()`` / ``await xs[0]()`` —
917 # subscript-then-call / call-of-call shapes; the underlying
918 # Call would already fail :meth:`visit_Call`'s
919 # ``call_target_shape`` check, but reject at the await
920 # level too so the reason token stays ``await_not_allowed``.
921 #
922 # ``AsyncFunctionDef`` / ``AsyncFor`` / ``AsyncWith`` continue
923 # to fall through to :meth:`generic_visit` and stay rejected
924 # with ``forbidden_node`` — the relaxation here covers only
925 # the bare ``Await`` expression on the two accepted call
926 # shapes.
927 inner = node.value
928 if not isinstance(inner, ast.Call):
929 self._reject(
930 "await_not_allowed",
931 node,
932 "'await' may only be used on a call to an allowlisted "
933 "tool or a 'mission.<helper>' call",
934 )
935 func = inner.func
936 if isinstance(func, ast.Name):
937 if func.id not in self._tool_allowlist:  [937 ↛ 938: line 937 didn't jump to line 938 because the condition on line 937 was never true]
938 self._reject(
939 "await_not_allowed",
940 node,
941 "'await' may only be used on a call to an allowlisted "
942 "tool or a 'mission.<helper>' call",
943 )
944 elif isinstance(func, ast.Attribute):  [944 ↛ 958: line 944 didn't jump to line 958 because the condition on line 944 was always true]
945 # Only ``mission.observe(...)`` / ``mission.event(...)``.
946 if not (  [946 ↛ 951: line 946 didn't jump to line 951 because the condition on line 946 was never true]
947 isinstance(func.value, ast.Name)
948 and func.value.id == _MISSION_NAMESPACE_NAME
949 and func.attr in _MISSION_HELPER_ATTRIBUTES
950 ):
951 self._reject(
952 "await_not_allowed",
953 node,
954 "'await' may only be used on a call to an allowlisted "
955 "tool or a 'mission.<helper>' call",
956 )
957 else:
958 self._reject(
959 "await_not_allowed",
960 node,
961 "'await' may only be used on a call to an allowlisted "
962 "tool or a 'mission.<helper>' call",
963 )
964 # Hand the Call node back to the existing call-validation
965 # machinery so kwargs, positional args, and the
966 # forbidden-call-target check all fire exactly as they would
967 # for the non-awaited form.
968 self.visit(inner)
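# Illustrative sketch, not part of sandbox.py: the accepted and rejected
# ``await`` shapes listed in the comment above can be checked with a
# standalone predicate. ``await_is_allowed`` and its constants are
# hypothetical names; the snippet is wrapped in an ``async def`` purely
# so it parses as ordinary Python. It returns a boolean where the real
# validator raises ``await_not_allowed``.

```python
import ast

MISSION = "mission"                         # assumed namespace name
HELPERS = frozenset({"observe", "event"})   # assumed helper set


def await_is_allowed(expr: str, tool_allowlist: frozenset) -> bool:
    """Return True when ``expr`` is an accepted ``await <call>`` shape."""
    module = ast.parse(f"async def _probe():\n    {expr}\n")
    stmt = module.body[0].body[0]
    node = stmt.value if isinstance(stmt, ast.Expr) else None
    if not isinstance(node, ast.Await):
        raise ValueError("expected a single 'await ...' expression")
    inner = node.value
    if not isinstance(inner, ast.Call):
        return False                        # e.g. ``await x``
    func = inner.func
    if isinstance(func, ast.Name):
        return func.id in tool_allowlist    # only allowlisted tools
    if isinstance(func, ast.Attribute):
        return (
            isinstance(func.value, ast.Name)
            and func.value.id == MISSION
            and func.attr in HELPERS
        )
    return False                            # ``await xs[0]()`` and similar
```

# With ``tool_allowlist=frozenset({"find_examples"})``, awaiting
# ``find_examples(...)`` or ``mission.event(...)`` passes, while
# ``await len(xs)`` does not, because ``len`` is not a tool.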
970 # ---- f-strings -----------------------------------------------------
972 def visit_JoinedStr(self, node: ast.JoinedStr) -> None:
973 for value in node.values:
974 self.visit(value)
976 def visit_FormattedValue(self, node: ast.FormattedValue) -> None:
977 self.visit(node.value)
978 if node.format_spec is not None:  [978 ↛ 979: line 978 didn't jump to line 979 because the condition on line 978 was never true]
979 self.visit(node.format_spec)
981 # ---- comprehensions -----------------------------------------------
983 def _validate_comprehensions(self, generators: list[ast.comprehension]) -> frozenset[str]:
984 """Walk comprehension generators and return their target names.
986 Each generator's ``iter`` is validated against the enclosing
987 scope, so it cannot reference comprehension targets. The targets
988 are then added to the accumulated set, which is pushed as a scope
989 while that generator's ``ifs`` are validated. Async generators
990 (``async for``) are rejected; the script body is sync.
991 """
992 accumulated: set[str] = set()
993 for gen in generators:
994 if gen.is_async:  [994 ↛ 995: line 994 didn't jump to line 995 because the condition on line 994 was never true]
995 self._reject(
996 "async_comprehension",
997 gen.iter,
998 "async comprehensions are not allowed",
999 )
1000 self.visit(gen.iter)
1001 target_names = self._collect_target_names(gen.target)
1002 for name_node in target_names:
1003 if self._is_dunder(name_node.id):
1004 self._reject(
1005 "dunder_comprehension_target",
1006 name_node,
1007 f"comprehension target '{name_node.id}' starts with '__'",
1008 )
1009 if name_node.id in self._base_scope:
1010 self._reject(
1011 "shadows_protected_name",
1012 name_node,
1013 f"comprehension target '{name_node.id}' shadows a protected name",
1014 )
1015 accumulated.add(name_node.id)
1016 self._push_scope(frozenset(accumulated))
1017 try:
1018 for if_clause in gen.ifs:
1019 self.visit(if_clause)
1020 finally:
1021 self._pop_scope()
1022 return frozenset(accumulated)
1024 def _visit_comprehension_like(
1025 self,
1026 node: ast.ListComp | ast.SetComp | ast.GeneratorExp,
1027 ) -> None:
1028 locals_ = self._validate_comprehensions(node.generators)
1029 self._push_scope(locals_)
1030 try:
1031 self.visit(node.elt)
1032 finally:
1033 self._pop_scope()
1035 def visit_ListComp(self, node: ast.ListComp) -> None:
1036 self._visit_comprehension_like(node)
1038 def visit_SetComp(self, node: ast.SetComp) -> None:
1039 self._visit_comprehension_like(node)
1041 def visit_GeneratorExp(self, node: ast.GeneratorExp) -> None:
1042 self._visit_comprehension_like(node)
1044 def visit_DictComp(self, node: ast.DictComp) -> None:
1045 locals_ = self._validate_comprehensions(node.generators)
1046 self._push_scope(locals_)
1047 try:
1048 self.visit(node.key)
1049 self.visit(node.value)
1050 finally:
1051 self._pop_scope()
1054# ---------------------------------------------------------------------------
1055# Public API
1056# ---------------------------------------------------------------------------
1059def validate_script_ast(script: str, allowlist: list[str]) -> None:
1060 """Parse and validate a Mission script source string.
1062 On success, the function returns ``None`` and the caller may pass
1063 ``script`` to the sandbox runtime layer. On any disallowed
1064 construct, raises :class:`ScriptRejected` carrying ``reason``,
1065 ``failing_node``, ``lineno``, and ``col_offset``. The script is
1066 *never* executed by this function; it only walks the AST.
1068 ``allowlist`` is the per-session list of MCP tool names the script
1069 may call. Each name becomes a visible bare-Name and a permitted
1070 call target. Names not in the allowlist (and not in the safe
1071 builtin / exception / mission set) are rejected at every Name
1072 lookup.
1073 """
1074 if not isinstance(script, str):  [1074 ↛ 1075: line 1074 didn't jump to line 1075 because the condition on line 1074 was never true]
1075 raise ScriptRejected(
1076 "not_a_string",
1077 message="script source must be a str",
1078 )
1079 try:
1080 parsed = ast.parse(script, mode="exec")
1081 except SyntaxError as exc:
1082 rejection = ScriptRejected(
1083 "syntax_error",
1084 message=f"could not parse script: {exc.msg}",
1085 )
1086 rejection.lineno = exc.lineno
1087 rejection.col_offset = exc.offset
1088 raise rejection from exc
1089 _ScriptValidator(allowlist).visit(parsed)
1092# ===========================================================================
1093# Runtime layer — MissionSandbox wrapper around MontySandboxProvider
1094# ===========================================================================
1095#
1096# Where ``validate_script_ast`` above is the parse-time gate, the wrapper
1097# below is the run-time isolation. A validated script is handed to the
1098# Monty sandbox under shared duration / memory limits, with two extras
1099# layered on top:
1100#
1101# * The operator-supplied tool allowlist is exposed as a set of async
1102# callables in the script's namespace. Each callable forwards into the
1103# engine's tool dispatcher so the existing ``@audit_logged`` /
1104# feature-flag / allowlist semantics still fire — running inside a
1105# script is *not* a way to bypass any of those.
1106# * A ``mission`` namespace object exposes the iteration's read-only
1107# metadata (deep-copied snapshot of the session's directive, criteria,
1108# budget, and prior-iteration summaries) plus the two streaming
1109# helpers ``mission.observe(...)`` / ``mission.event(...)``. The
1110# helpers append into closure-captured lists that ``MissionSandbox.run``
1111# merges into the resulting Observation.
1112#
1113 # On a limit violation (duration, memory) or a runtime / typing /
1114 # syntax error raised inside the script, a ``MontyError`` bubbles out
1115 # of the provider; the wrapper re-raises it as :class:`SandboxTerminated`
1116# carrying whatever the script collected before it was killed so the
1117# engine's ``_decide_phase`` can produce a deterministic ``terminate``
1118# verdict with the partial observation attached.
1120import copy # noqa: E402 — runtime layer below; keep imports near their consumers
1121import os # noqa: E402
1122import time # noqa: E402
1123from collections.abc import Awaitable, Callable # noqa: E402
1124from datetime import UTC, datetime # noqa: E402
1125from types import MappingProxyType # noqa: E402
1126from typing import Any # noqa: E402
1128from . import audit as _audit # noqa: E402
1130# ---------------------------------------------------------------------------
1131# Env helpers — module-level so the constants below are read once at import
1132# time. Tests pin the constants by monkey-patching the module attributes; a
1133# per-call read of os.environ would defeat that.
1134# ---------------------------------------------------------------------------
1137def _int_env(name: str, default: int) -> int:
1138 """Parse an integer env var; fall back to default on missing/empty/non-numeric.
1140 Mirrors the helper in :mod:`mcp.server` so the two code-mode entry
1141 points read the same caps with the same parsing semantics. Empty,
1142 whitespace-only, and non-numeric values all collapse to ``default``
1143 rather than raising — an operator who fat-fingers the env should
1144 still get a working sandbox.
1145 """
1146 raw = os.environ.get(name, "").strip()
1147 if not raw:
1148 return default
1149 try:
1150 return int(raw)
1151 except ValueError:
1152 return default
1155def _float_env(name: str, default: float) -> float:
1156 """Parse a float env var; fall back to default on missing/empty/non-numeric.
1158 Same fall-back semantics as :func:`_int_env`. The duration cap is a
1159 float so fractional seconds remain expressible.
1160 """
1161 raw = os.environ.get(name, "").strip()
1162 if not raw:
1163 return default
1164 try:
1165 return float(raw)
1166 except ValueError:
1167 return default
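# Illustrative sketch, not part of sandbox.py: the fall-back behaviour of
# the env helpers above, restated so the block runs standalone. The
# variable name ``DEMO_SANDBOX_CAP`` is hypothetical and exists only for
# this demonstration.

```python
import os


def int_env(name: str, default: int) -> int:
    """Parse an integer env var, collapsing bad input to ``default``."""
    raw = os.environ.get(name, "").strip()
    if not raw:
        return default
    try:
        return int(raw)
    except ValueError:
        return default


os.environ["DEMO_SANDBOX_CAP"] = "   "      # whitespace only
blank = int_env("DEMO_SANDBOX_CAP", 7)

os.environ["DEMO_SANDBOX_CAP"] = "12MB"     # non-numeric
garbled = int_env("DEMO_SANDBOX_CAP", 7)

os.environ["DEMO_SANDBOX_CAP"] = " 42 "     # valid after stripping
parsed = int_env("DEMO_SANDBOX_CAP", 7)

del os.environ["DEMO_SANDBOX_CAP"]          # missing entirely
missing = int_env("DEMO_SANDBOX_CAP", 7)
```

# One point worth keeping in mind for the float variant: Python's
# ``float()`` accepts strings such as ``"inf"`` and ``"nan"``, so a
# helper built on this pattern passes those values through rather than
# falling back to the default.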
1170# Read the resource caps once at import time. Tests pin behaviour by
1171# monkey-patching these module-level constants before constructing a
1172# MissionSandbox. The defaults match the existing precedent in
1173# ``mcp/server.py`` where the same env names are wired into the
1174# Code Mode discovery transform's sandbox.
1175_DURATION_LIMIT_SECS: float = _float_env("GCO_MCP_CODE_MODE_MAX_DURATION_SECS", 30.0)
1176_MEMORY_LIMIT_BYTES: int = _int_env("GCO_MCP_CODE_MODE_MAX_MEMORY", 200_000_000)
1179# ---------------------------------------------------------------------------
1180# Lazy import of the runtime dependencies
1181# ---------------------------------------------------------------------------
1182#
1183# The AST validator above must remain importable on a host where
1184# ``fastmcp`` and ``pydantic_monty`` are not installed (for example a
1185# CLI-only environment that runs ``gco mission validate`` against a
1186# stored session JSON without ever wiring an engine). The provider class
1187# and the error class are pulled in lazily by ``_import_provider`` and
1188# cached at module level so repeated MissionSandbox constructions in the
1189# same process pay the import cost exactly once.
1191_MONTY_PROVIDER_CLASS: Any = None
1192_MONTY_ERROR_CLASS: Any = None
1195def _import_provider() -> tuple[Any, Any]:
1196 """Lazy-import ``MontySandboxProvider`` and ``MontyError`` and cache them.
1198 Returns the ``(provider_cls, error_cls)`` pair. The provider class
1199 is the value the wrapper instantiates with a ``ResourceLimits``
1200 dict; the error class is the *base* ``pydantic_monty.MontyError``
1201 that covers the whole limit / runtime / typing / syntax family
1202 raised from inside a script. We catch the base class rather than
1203 the leaves so a future Monty release that adds a new error type
1204 still routes through ``SandboxTerminated`` rather than escaping as
1205 an opaque ``Exception``.
1206 """
1207 global _MONTY_PROVIDER_CLASS, _MONTY_ERROR_CLASS
1208 if _MONTY_PROVIDER_CLASS is None:
1209 from fastmcp.experimental.transforms.code_mode import MontySandboxProvider
1210 from pydantic_monty import MontyError
1212 _MONTY_PROVIDER_CLASS = MontySandboxProvider
1213 _MONTY_ERROR_CLASS = MontyError
1214 return _MONTY_PROVIDER_CLASS, _MONTY_ERROR_CLASS
1217# ---------------------------------------------------------------------------
1218# Termination signal
1219# ---------------------------------------------------------------------------
1222class SandboxTerminated(Exception):
1223 """Raised when the Monty sandbox stopped the script on a limit or script error.
1225 The Mission engine catches this exception in its decide-phase and
1226 produces a ``terminate`` verdict for the iteration. Whatever the
1227 script collected via ``mission.observe(...)`` / ``mission.event(...)``
1228 before being killed is carried on the exception so the engine can
1229 surface the partial Observation in the iteration's audit record —
1230 a script that ran for 29 seconds and observed five intermediate
1231 states should not lose those five states just because the 30-second
1232 cap fired before the script returned.
1234 ``cause`` is the underlying Monty exception's class name (e.g.
1235 ``"MontyRuntimeError"``, ``"MontyTypingError"``) so callers can render
1236 a stable structured-error envelope without holding a reference to
1237 the original Monty exception object.
1238 """
1240 def __init__(
1241 self,
1242 cause: str,
1243 *,
1244 partial_observations: list[dict[str, Any]] | None = None,
1245 partial_events: list[dict[str, Any]] | None = None,
1246 partial_script_call_log: list[dict[str, Any]] | None = None,
1247 ) -> None:
1248 self.cause: str = cause
1249 # Defensive copies: callers occasionally inspect these lists
1250 # after the exception has propagated several frames up. A
1251 # shared reference would let a later mutation in the original
1252 # closure corrupt the audit record.
1253 self.partial_observations: list[dict[str, Any]] = list(partial_observations or [])
1254 self.partial_events: list[dict[str, Any]] = list(partial_events or [])
1255 # Partial in-script tool-call log captured by the per-tool
1256 # wrappers up to the moment Monty killed the script. Carrying
1257 # this onto the exception lets the engine's
1258 # ``_execute_script`` stash the partial calls on the iteration
1259 # record so a script that fired ten ``submit_job_sqs(...)``
1260 # calls before tripping the duration cap still records all ten
1261 # in the audit log. Defensive copy for the same reason as the
1262 # observe / event lists above.
1263 self.partial_script_call_log: list[dict[str, Any]] = list(partial_script_call_log or [])
1264 super().__init__(f"sandbox terminated: {cause}")
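# Illustrative sketch, not part of sandbox.py: what the ``list(...)``
# defensive copies above do and do not protect against. ``Terminated``
# is a hypothetical stand-in with a single partial list. ``list(...)``
# is a shallow copy: later appends to the source list are not seen, but
# the entry dicts themselves are still shared.

```python
from typing import Any, Optional


class Terminated(Exception):
    """Minimal stand-in carrying one defensively copied list."""

    def __init__(
        self, cause: str, *, partial: Optional[list[dict[str, Any]]] = None
    ) -> None:
        self.cause = cause
        self.partial = list(partial or [])  # shallow copy of the list
        super().__init__(f"sandbox terminated: {cause}")


live_log = [{"key": "step", "value": 1}]
exc = Terminated("MontyRuntimeError", partial=live_log)

live_log.append({"key": "step", "value": 2})   # not visible on exc
live_log[0]["value"] = 99                      # visible: same dict object
```

# If entries may be mutated after the exception is raised, a deep copy
# (``copy.deepcopy``) would be the stricter option; whether that is
# needed here depends on how the closure lists are used, which this
# sketch does not assume.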
1267# ---------------------------------------------------------------------------
1268# Script rewrite — mission.observe/event → _mission_observe/_mission_event
1269# ---------------------------------------------------------------------------
1270#
1271# The AST gate above accepts ``mission.observe(...)`` and
1272# ``mission.event(...)`` as the only two attribute calls a script may
1273# write on the ``mission`` namespace. The runtime needs those calls to
1274# land on host-side closures so the iteration's ``observe_log`` /
1275# ``event_log`` lists actually receive the appends — passing the
1276# helpers in through ``inputs={"mission": <object>}`` would not work,
1277# because :class:`MontySandboxProvider` round-trips ``inputs`` values
1278# into the Monty VM by value (any in-script mutation lands on the VM
1279# copy, not the host's). Wrapping the helpers in a small host-side
1280# class and prepending it to the script as a preamble would not work
1281# either: Monty's parser does not support ``class`` definitions.
1282#
1283# Instead, after validation, the host re-parses the script and
1284# rewrites every accepted ``mission.<helper>(...)`` Call so its
1285# callee becomes a bare-Name lookup of the corresponding reserved
1286# external-function name. The rewritten source is then handed to
1287# Monty, where ``_mission_observe`` / ``_mission_event`` resolve to
1288# the host-side closures registered via ``external_functions``.
1289# Operator scripts cannot reference these names directly: the AST
1290# validator rejects them under ``name_not_allowed`` (neither is on
1291# the per-session tool allowlist nor in any safe-builtin / exception
1292# / mission base set), so the only path that produces those Name
1293# nodes is the rewrite below.
1295_MISSION_HELPER_RUNTIME_NAMES: Final[dict[str, str]] = {
1296 "observe": "_mission_observe",
1297 "event": "_mission_event",
1298}
1299# The keys must mirror ``_MISSION_HELPER_ATTRIBUTES`` exactly:
1300# the validator opens up ``mission.<attr>`` for those two attributes,
1301# and the rewriter below has to translate the same two and only the
1302# same two. A future widening of the helper set has to add an entry
1303# here too, or the rewriter would leave the new attribute as an
1304# ``Attribute`` callee and Monty's parser would reject it.
1305assert set(_MISSION_HELPER_RUNTIME_NAMES) == set(_MISSION_HELPER_ATTRIBUTES)
1308class _MissionAttributeCallRewriter(ast.NodeTransformer):
1309 """Rewrite ``mission.observe(...)`` / ``mission.event(...)`` callees.
1311 The transformer replaces the ``Attribute`` callee on accepted
1312 ``mission.<helper>`` Call nodes with a ``Name`` referencing the
1313 corresponding external-function key (``_mission_observe`` /
1314 ``_mission_event``). Args and kwargs ride through unchanged: the
1315 AST validator already vetted them. The replacement ``Name`` copies
1316 the callee's location, but ``ast.unparse`` regenerates the source,
1317 so lines and columns reported later may differ from the original.
1319 The validator's :meth:`_ScriptValidator.visit_Attribute` already
1320 rejects every other ``mission.<x>`` shape, so the transformer
1321 only ever encounters the two helper attributes; defensive
1322 fallthrough leaves any other ``Attribute`` callee untouched, but
1323 in practice such a node would not have passed the gate.
1324 """
1326 def visit_Call(self, node: ast.Call) -> ast.AST:
1327 # Recurse into args / kwargs first so a nested
1328 # ``mission.<helper>(...)`` (e.g. inside an f-string used as
1329 # an argument) is rewritten too. ``self.generic_visit``
1330 # walks children and updates them in place.
1331 self.generic_visit(node)
1332 func = node.func
1333 if (
1334 isinstance(func, ast.Attribute)
1335 and isinstance(func.value, ast.Name)
1336 and func.value.id == _MISSION_NAMESPACE_NAME
1337 and func.attr in _MISSION_HELPER_RUNTIME_NAMES
1338 ):
1339 replacement = ast.Name(
1340 id=_MISSION_HELPER_RUNTIME_NAMES[func.attr],
1341 ctx=ast.Load(),
1342 )
1343 ast.copy_location(replacement, func)
1344 node.func = replacement
1345 return node
1348def _rewrite_mission_helpers(script: str) -> str:
1349 """Re-parse ``script``, rewrite mission helper calls, and unparse.
1351 Called after :func:`validate_script_ast` has already accepted the
1352 source — so ``ast.parse`` cannot fail here on syntax that was
1353 valid moments ago. Returns a fresh source string suitable for
1354 handing to ``MontySandboxProvider.run``.
1355 """
1356 tree = ast.parse(script, mode="exec")
1357 rewritten = _MissionAttributeCallRewriter().visit(tree)
1358 ast.fix_missing_locations(rewritten)
1359 return ast.unparse(rewritten)
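# Illustrative sketch, not part of sandbox.py: the rewrite step above,
# restated as a standalone block using only the standard ``ast`` module
# (``ast.unparse`` needs Python 3.9+). ``HelperCallRewriter``,
# ``rewrite`` and ``RUNTIME_NAMES`` are hypothetical names that mirror
# the mapping shown above.

```python
import ast

RUNTIME_NAMES = {"observe": "_mission_observe", "event": "_mission_event"}


class HelperCallRewriter(ast.NodeTransformer):
    """Swap ``mission.<helper>`` callees for reserved bare names."""

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # rewrite nested calls first
        func = node.func
        if (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == "mission"
            and func.attr in RUNTIME_NAMES
        ):
            replacement = ast.Name(id=RUNTIME_NAMES[func.attr], ctx=ast.Load())
            node.func = ast.copy_location(replacement, func)
        return node


def rewrite(source: str) -> str:
    """Parse, rewrite helper callees, and regenerate source text."""
    tree = HelperCallRewriter().visit(ast.parse(source, mode="exec"))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


rewritten = rewrite("mission.observe(key='latency', value=12)")
untouched = rewrite("other.observe(1)")
```

# ``ast.unparse`` emits normalised source, so comments and the
# operator's original formatting do not survive the round trip.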
1362# ---------------------------------------------------------------------------
1363# Tool callable wrapper
1364# ---------------------------------------------------------------------------
1367def _make_tool_wrapper(
1368 tool_name: str,
1369 ctx: Any | None,
1370 tool_dispatcher: Callable[[str, dict[str, Any], Any], Awaitable[Any]],
1371 script_call_log: list[dict[str, Any]],
1372 session_id: str,
1373 iteration_index: int,
1374) -> Callable[..., Awaitable[Any]]:
1375 """Build the per-tool async wrapper inserted into ``external_functions``.
1377 The wrapper is keyword-only by design — the Mission script grammar
1378 passes tool args as kwargs (``submit_job_sqs(manifest_path=...,
1379 region=...)``) and rejecting positionals at call time keeps the
1380 wrapper's record shape aligned with the engine's
1381 :class:`ToolCallRecord`. A script that calls
1382 ``submit_job_sqs("examples/x.yaml")`` with a positional argument
1383 fails immediately with a ``TypeError`` from Python's call
1384 machinery; that error surfaces through Monty as a
1385 ``MontyRuntimeError`` and is caught by the wrapper layer in
1386 :meth:`MissionSandbox.run`.
1388 The wrapper appends one record to ``script_call_log`` per call,
1389 whether the call succeeded or raised. A raised exception still
1390 propagates out of the wrapper (so Monty surfaces it to the script
1391 as a Python exception the script can catch with
1392 ``try``/``except``), but the record carries ``status="failed"``
1393 plus a truncated error message so the engine's audit path sees
1394 every invocation.
1396 On both success and failure the wrapper also emits a
1397 ``mission_script_call_event`` audit row tagged
1398 ``via_script=True``. The dispatch into ``tool_dispatcher`` runs
1399 the registered tool function, so the standard ``@audit_logged``
1400 entry has already fired by the time the wrapper reaches its emit
1401 site — the script-call event is a *second*, distinct row that
1402 lets consumers distinguish in-script invocations from direct
1403 ``tool_calls`` strategy invocations without having to walk
1404 timestamps.
1405 """
1407 async def wrapper(**kwargs: Any) -> Any:
1408 # Snapshot the kwargs into a fresh dict before dispatch so the
1409 # log entry preserves exactly what the script passed even if
1410 # the dispatcher mutates the dict downstream.
1411 args = dict(kwargs)
1412 started = time.monotonic()
1413 try:
1414 result = await tool_dispatcher(tool_name, args, ctx)
1415 except Exception as exc:
1416 duration_ms = max(int((time.monotonic() - started) * 1000), 0)
1417 error_message = f"{type(exc).__name__}: {exc}"[:200]
1418 script_call_log.append(
1419 {
1420 "tool_name": tool_name,
1421 "args": args,
1422 "status": "failed",
1423 "result_summary": None,
1424 "duration_ms": duration_ms,
1425 # Truncated to 200 chars to match the audit
1426 # module's existing convention for error_message
1427 # fields elsewhere in the engine.
1428 "error_message": error_message,
1429 }
1430 )
1431 # Emit the via_script audit row before re-raising so the
1432 # event is recorded even when the script catches the
1433 # exception and continues executing.
1434 _audit.emit_script_call_event(
1435 session_id,
1436 iteration_index,
1437 tool_name,
1438 "failed",
1439 duration_ms,
1440 error_message=error_message,
1441 )
1442 raise
1443 duration_ms = max(int((time.monotonic() - started) * 1000), 0)
1444 record: dict[str, Any] = {
1445 "tool_name": tool_name,
1446 "args": args,
1447 "status": "ok",
1448 "result_summary": result,
1449 "duration_ms": duration_ms,
1450 }
1451 script_call_log.append(record)
1452 _audit.emit_script_call_event(
1453 session_id,
1454 iteration_index,
1455 tool_name,
1456 "ok",
1457 duration_ms,
1458 )
1459 return result
1461 # Setting ``__name__`` makes Monty's traceback render the
1462 # operator's tool name rather than ``wrapper`` when a call goes
1463 # wrong inside the sandboxed script. The script_call_log remains
1464 # the canonical record of what fired.
1465 wrapper.__name__ = tool_name
1466 return wrapper
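# Illustrative sketch, not part of sandbox.py: the log-then-re-raise
# pattern described in the ``_make_tool_wrapper`` docstring, reduced to
# the call log only (no audit emit). ``make_logged_tool`` and
# ``fake_dispatcher`` are hypothetical, and the dispatcher signature is
# simplified to ``(name, args)``.

```python
import asyncio
import time
from collections.abc import Awaitable, Callable
from typing import Any


def make_logged_tool(
    tool_name: str,
    dispatcher: Callable[[str, dict], Awaitable[Any]],
    call_log: list,
) -> Callable[..., Awaitable[Any]]:
    async def wrapper(**kwargs: Any) -> Any:
        args = dict(kwargs)                  # snapshot before dispatch
        started = time.monotonic()
        try:
            result = await dispatcher(tool_name, args)
        except Exception as exc:
            call_log.append({
                "tool_name": tool_name,
                "status": "failed",
                "duration_ms": max(int((time.monotonic() - started) * 1000), 0),
                "error_message": f"{type(exc).__name__}: {exc}"[:200],
            })
            raise                            # the script may still catch it
        call_log.append({
            "tool_name": tool_name,
            "status": "ok",
            "duration_ms": max(int((time.monotonic() - started) * 1000), 0),
        })
        return result

    wrapper.__name__ = tool_name
    return wrapper


async def fake_dispatcher(name: str, args: dict) -> Any:
    # Hypothetical dispatcher: fails on request, echoes otherwise.
    if args.get("fail"):
        raise RuntimeError("boom")
    return {"echo": args}


async def demo() -> list:
    log: list = []
    tool = make_logged_tool("submit_job", fake_dispatcher, log)
    await tool(path="x.yaml")
    try:
        await tool(fail=True)
    except RuntimeError:
        pass                                 # caught, yet still logged
    return log


call_log = asyncio.run(demo())
```

# Because ``wrapper`` accepts only ``**kwargs``, a positional call such
# as ``tool("x.yaml")`` raises ``TypeError`` at call time, before any
# coroutine is created and before anything is logged.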
1469# ---------------------------------------------------------------------------
1470# Observation assembly
1471# ---------------------------------------------------------------------------
1474def _annotate_call_result(call: dict[str, Any]) -> Any:
1475 """Wrap a script-call ``result_summary`` with per-call markers.
1477 Mirrors :meth:`MissionEngine._annotate_tool_result` for the
1478 scripted-strategy path so the Observation's ``tool_results`` list
1479 always carries the ``_status`` and ``tool_name`` markers the
1480 predicate evaluator and the ``tool_call_succeeded`` evaluator
1481 rely on, regardless of the underlying tool's return shape.
1483 Strategy:
1485 * **Dict result_summary** — augment a shallow copy with ``_status`` and
1486 ``tool_name`` only when those keys are absent. This keeps any
1487 caller-supplied marker visible while ensuring evaluators always
1488 find them.
1489 * **Non-dict result_summary** — wrap in a fresh dict carrying
1490 the call's ``_status`` / ``tool_name`` plus a ``result`` field
1491 that holds the original payload so predicates can still walk
1492 into it.
1493 """
1494 result = call.get("result_summary")
1495 status = call.get("status") or "unknown"
1496 tool_name = call.get("tool_name")
1497 if isinstance(result, dict):
1498 annotated = dict(result)
1499 annotated.setdefault("_status", status)
1500 annotated.setdefault("tool_name", tool_name)
1501 return annotated
1502 return {
1503 "_status": status,
1504 "tool_name": tool_name,
1505 "result": result,
1506 }
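# Illustrative sketch, not part of sandbox.py: the two branches of the
# annotation strategy, restated so the block runs standalone. The sample
# call records and the name ``annotate`` are hypothetical.

```python
from typing import Any


def annotate(call: dict[str, Any]) -> dict[str, Any]:
    """Attach ``_status`` / ``tool_name`` markers to a call result."""
    result = call.get("result_summary")
    status = call.get("status") or "unknown"
    tool_name = call.get("tool_name")
    if isinstance(result, dict):
        annotated = dict(result)             # copy; caller's dict untouched
        annotated.setdefault("_status", status)
        annotated.setdefault("tool_name", tool_name)
        return annotated
    return {"_status": status, "tool_name": tool_name, "result": result}


original = {"job_id": "j-1", "_status": "custom"}
from_dict = annotate(
    {"tool_name": "submit", "status": "ok", "result_summary": original}
)
from_scalar = annotate(
    {"tool_name": "count", "status": "ok", "result_summary": 3}
)
```

# The dict branch keeps the tool's own ``_status`` marker because
# ``setdefault`` only fills absent keys, and the non-dict branch moves
# the payload under ``result``.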
1509def _build_script_observation(
1510 *,
1511 script_call_log: list[dict[str, Any]],
1512 observe_log: list[dict[str, Any]],
1513 event_log: list[dict[str, Any]],
1514 phase_started_at: str,
1515 phase_ended_at: str,
1516) -> dict[str, Any]:
1517 """Merge the closure-captured logs into an Observation dict.
1519 Mirrors :meth:`MissionEngine._build_observation` for the
1520 ``tool_calls`` strategy path so a downstream Evaluate_Phase /
1521 Decide_Phase consumer cannot tell, from the Observation shape
1522 alone, whether the iteration ran a scripted or a non-scripted
1523 Strategy:
1525 * ``tool_results`` lists every call's annotated ``result_summary``
1526 failures, for stable indexing against ``script_call_log``).
1527 * ``metrics`` lifts any top-level ``metrics`` dict from a
1528 successful tool result, exactly like the engine does.
1529 * ``events`` pools the events emitted by tool results with the
1530 ``mission.event(...)`` calls so the criteria evaluator only
1531 walks one list.
1532 * ``errors`` carries failed / skipped calls in the same shape the
1533 engine uses, so the decide-phase heuristic that triggers
1534 ``adjust`` on new errors keeps working unchanged.
1536 The ``mission.observe(...)`` rows fold into a dedicated
1537 ``observations`` bucket inside ``metrics`` rather than flat-merging
1538 so a script-collected key cannot silently overwrite a tool-derived
1539 metric of the same name. A criterion that wants a script-collected
1540 key reads ``metrics.observations.<key>``; a criterion that wants a
1541 tool-derived metric reads ``metrics.<key>``. The two namespaces
1542 stay distinct.
1543 """
1544 tool_results: list[Any] = []
1545 metrics: dict[str, Any] = {}
1546 events: list[dict[str, Any]] = []
1547 errors: list[dict[str, Any]] = []
1549 for call in script_call_log:
1550 tool_results.append(_annotate_call_result(call))
1551 if call.get("status") == "ok":  [1551 ↛ 1563: line 1551 didn't jump to line 1563 because the condition on line 1551 was always true]
1552 result = call.get("result_summary")
1553 if isinstance(result, dict):  [1553 ↛ 1554: line 1553 didn't jump to line 1554 because the condition on line 1553 was never true]
1554 result_metrics = result.get("metrics")
1555 if isinstance(result_metrics, dict):
1556 metrics.update(result_metrics)
1557 result_events = result.get("events")
1558 if isinstance(result_events, list):
1559 for event in result_events:
1560 if isinstance(event, dict):
1561 events.append(event)
1562 else:
1563 errors.append(
1564 {
1565 "tool_name": call.get("tool_name"),
1566 "status": call.get("status"),
1567 "error_message": call.get("error_message"),
1568 }
1569 )
1571 # Pool the script-side ``mission.event(...)`` calls with
1572 # tool-derived events. ``dict(ev)`` is a defensive copy so a later
1573 # mutation of the closure list does not bleed into the persisted
1574 # Observation.
1575 for ev in event_log:
1576 events.append(dict(ev))
1578 # ``mission.observe(...)`` rows fold into a dedicated bucket on
1579 # metrics so they remain addressable without colliding with
1580 # tool-derived metric names.
1581 if observe_log:  [1581 ↛ 1587: line 1581 didn't jump to line 1587 because the condition on line 1581 was always true]
1582 observations_bucket: dict[str, Any] = {}
1583 for entry in observe_log:
1584 observations_bucket[entry["key"]] = entry["value"]
1585 metrics["observations"] = observations_bucket
1587 observation: dict[str, Any] = {
1588 "tool_results": tool_results,
1589 "metrics": metrics,
1590 "events": events,
1591 "phase_started_at": phase_started_at,
1592 "phase_ended_at": phase_ended_at,
1593 }
1594 if errors:  # partial branch 1594 ↛ 1595: condition was never true
1595 observation["errors"] = errors
1596 return observation
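The folding rules above can be tried in isolation. The following is a minimal sketch, not the module's implementation: `fold_observation` is a hypothetical name, the tool names `probe` and `scan` are made up for illustration, and the sketch leaves out `_annotate_call_result`, events, and the phase timestamps. It only shows how tool-derived metrics land at the top level of `metrics` while script-collected values sit under `metrics["observations"]`.

```python
from typing import Any


def fold_observation(
    call_log: list[dict[str, Any]],
    observe_log: list[dict[str, Any]],
) -> dict[str, Any]:
    """Simplified stand-in for the fold: metrics and errors only."""
    metrics: dict[str, Any] = {}
    errors: list[dict[str, Any]] = []
    for call in call_log:
        if call.get("status") == "ok":
            summary = call.get("result_summary")
            if isinstance(summary, dict) and isinstance(summary.get("metrics"), dict):
                # Tool-derived metrics land at the top level of ``metrics``.
                metrics.update(summary["metrics"])
        else:
            errors.append(
                {"tool_name": call.get("tool_name"), "status": call.get("status")}
            )
    if observe_log:
        # Script-collected values get their own bucket; a repeated key
        # keeps the last value written.
        metrics["observations"] = {e["key"]: e["value"] for e in observe_log}
    observation: dict[str, Any] = {"metrics": metrics}
    if errors:
        observation["errors"] = errors
    return observation


# Hypothetical call log and observe log for illustration.
calls = [
    {"tool_name": "probe", "status": "ok", "result_summary": {"metrics": {"latency_ms": 12}}},
    {"tool_name": "scan", "status": "error", "error_message": "timeout"},
]
observed = [{"key": "latency_ms", "value": 99}, {"key": "latency_ms", "value": 100}]
folded = fold_observation(calls, observed)
```

Here the tool-derived `latency_ms` and the script-observed `latency_ms` coexist because they live in separate namespaces, which is the property the docstring describes.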
1599# ---------------------------------------------------------------------------
1600# MissionSandbox
1601# ---------------------------------------------------------------------------
1604class MissionSandbox:
1605 """Run a validated Mission script under ``MontySandboxProvider`` limits.
1607 One sandbox per iteration. The constructor freezes the per-iteration
1608 ``mission`` namespace as a :class:`types.MappingProxyType` snapshot
1609 (so a script cannot reach back through ``mission`` and mutate the
1610 session record), pins the operator's tool allowlist, and builds the
1611 underlying ``MontySandboxProvider`` with the duration / memory
1612 limits read from the module-level constants. :meth:`run` then
1613 drives a single script execution end to end:
1615 1. AST validate via :func:`validate_script_ast` — propagation of
1616 :class:`ScriptRejected` is the engine's signal to fail the
1617 Execute_Phase with reason ``script_rejected``.
1618 2. Build the ``external_functions`` map: one async wrapper per
1619 allowlisted tool, each forwarding into the engine's tool
1620 dispatcher so the wrapper preserves the existing
1621 ``@audit_logged`` / feature-flag / allowlist semantics — running
1622 inside a script is *not* a way to bypass any of those.
1623 3. Execute under Monty's caps. Any ``MontyError`` (limit /
1624 runtime / typing / syntax) is re-raised as
1625 :class:`SandboxTerminated` carrying whatever the script
1626 collected before being killed.
1627 4. Fold the closure-captured tool log, observe log, and event log
1628 into an Observation dict whose shape exactly matches the
1629 engine's tool-calls path.
1631 The sandbox is immutable after construction: there are no setters,
1632 no rebuild methods, and the underlying provider is held by
1633 reference rather than recreated per call. Each iteration gets its
1634 own MissionSandbox so a stale frozen namespace cannot leak across
1635 iterations.
1636 """
1638 def __init__(
1639 self,
1640 allowlist: list[str],
1641 session: Any,
1642 ) -> None:
1643 # Defensive copy of the allowlist: the engine pins the
1644 # allowlist on the session at create time, but a shared list
1645 # reference would let later mutations slip past the AST
1646 # validator's frozenset (which is constructed once per
1647 # validation call from ``self._allowlist``).
1648 self._allowlist: list[str] = list(allowlist)
1650 # Build the per-iteration mission namespace as an immutable
1651 # snapshot. Each iteration summary carries only the four
1652 # fields a script needs to reason about prior progress —
1653 # full IterationRecord shapes would be both heavy and
1654 # tempting for a script to walk in ways the engine does not
1655 # support.
1656 iteration_summaries: list[dict[str, Any]] = []
1657 for it in session.get("iterations") or []:  # partial branch 1657 ↛ 1658: loop never started
1658 iteration_summaries.append(
1659 {
1660 "iteration_index": it.get("iteration_index"),
1661 "verdict": it.get("verdict"),
1662 "verdict_reason": it.get("verdict_reason"),
1663 "checkpoint_evaluated": it.get("checkpoint_evaluated"),
1664 }
1665 )
1666 # ``copy.deepcopy`` on criteria + budget so a script that
1667 # walks them via subscripting cannot mutate the session
1668 # record even if Python's MappingProxyType were ever
1669 # bypassed by a future change.
1670 ns: dict[str, Any] = {
1671 "session_id": session["session_id"],
1672 "iteration_index": len(session.get("iterations") or []),
1673 "directive_text": session.get("directive_text", ""),
1674 "criteria": copy.deepcopy(session.get("criteria") or []),
1675 "budget": copy.deepcopy(session.get("budget") or {}),
1676 "iterations": iteration_summaries,
1677 }
1678 self._frozen_mission_ns: MappingProxyType[str, Any] = MappingProxyType(ns)
1680 # Construct the provider once and pin it on the instance.
1681 # The provider holds no per-call state, so reusing it across
1682 # multiple ``run`` calls would be safe in principle, but the
1683 # one-sandbox-per-iteration lifetime keeps the failure
1684 # surface small and matches the rest of the per-iteration
1685 # state above.
1686 provider_cls, _ = _import_provider()
1687 self._provider = provider_cls(
1688 limits={
1689 "max_duration_secs": _DURATION_LIMIT_SECS,
1690 "max_memory": _MEMORY_LIMIT_BYTES,
1691 }
1692 )
1694 # ---- read-only accessors ------------------------------------------
1696 @property
1697 def frozen_mission_ns(self) -> MappingProxyType[str, Any]:
1698 """The iteration's frozen ``mission`` namespace snapshot."""
1699 return self._frozen_mission_ns
1701 @property
1702 def allowlist(self) -> list[str]:
1703 """Defensive copy of the per-session tool allowlist."""
1704 return list(self._allowlist)
1706 # ---- public surface -----------------------------------------------
1708 async def run(
1709 self,
1710 script: str,
1711 ctx: Any | None,
1712 tool_dispatcher: Callable[[str, dict[str, Any], Any], Awaitable[Any]],
1713 ) -> tuple[dict[str, Any], list[dict[str, Any]]]:
1714 """Validate, execute, and observe a Mission script.
1716 Returns ``(observation, script_call_log)`` matching the shape
1717 the engine's ``_execute_script`` expects: the observation is a
1718 plain dict (the engine casts it to :class:`Observation` at the call
1719 site) and the call log is a list of
1720 :class:`ToolCallRecord`-shaped dicts.
1722 On any ``MontyError`` from the provider — duration cap, memory
1723 cap, runtime / typing / syntax error inside the script — the
1724 method re-raises as :class:`SandboxTerminated` carrying the
1725 closure-captured partial observations and events. The engine's
1726 decide-phase pattern-matches on this exception and produces a
1727 ``terminate`` verdict for the iteration.
1729 ``ScriptRejected`` from the AST validator propagates upward
1730 unchanged: the engine's Execute_Phase treats that as a
1731 ``script_rejected`` failure and never reaches the runtime path
1732 below.
1733 """
1734 # Step 1: AST gate. Propagating ``ScriptRejected`` upward is
1735 # deliberate — the engine's _execute_phase wraps it as a
1736 # phase failure with reason ``script_rejected``; doing the
1737 # rejection here means the runtime path never sees a
1738 # disallowed source.
1739 validate_script_ast(script, self._allowlist)
1741 _, monty_error_cls = _import_provider()
1743 # Closure-captured collectors. Populated synchronously by the
1744 # host-side helper closures registered as
1745 # ``external_functions`` and the per-tool wrappers; observed
1746 # post-run (or post-termination) to build the Observation.
1747 # Lists rather than dicts so the order in which the script
1748 # called ``mission.event`` / ``mission.observe`` is preserved
1749 # in the final record.
1750 observe_log: list[dict[str, Any]] = []
1751 event_log: list[dict[str, Any]] = []
1752 script_call_log: list[dict[str, Any]] = []
1754 # Host-side helpers for ``mission.observe`` and
1755 # ``mission.event``. Routing them through the
1756 # ``external_functions`` channel — rather than as bound
1757 # methods on a dataclass shipped via ``inputs`` — is what
1758 # makes script-side mutations visible to the host:
1759 # ``MontySandboxProvider`` round-trips ``inputs`` values into
1760 # the underlying Monty VM by value, so a closure list
1761 # captured on a method body of an ``inputs`` dataclass would
1762 # only ever see the VM-side copy. The external-functions
1763 # channel runs each call back in host Python, so the lists
1764 # below receive the appends.
1765 #
1766 # The signatures match the original ``mission.observe`` /
1767 # ``mission.event`` script-facing surface: ``observe`` takes
1768 # ``(key, value)`` positionally, ``event`` takes ``name``
1769 # positionally plus arbitrary keyword arguments. The AST
1770 # rewrite below replaces the attribute callee with a bare
1771 # Name lookup but leaves args / kwargs unchanged, so the
1772 # call shape that lands on these helpers is exactly what an
1773 # operator would write at the script surface.
1774 async def _mission_observe(key: str, value: Any) -> None:
1775 observe_log.append({"key": key, "value": value})
1777 async def _mission_event(name: str, **kwargs: Any) -> None:
1778 event_row: dict[str, Any] = {"event_name": name}
1779 event_row.update(kwargs)
1780 event_log.append(event_row)
1782 # The frozen mission namespace remains pinned on this
1783 # sandbox instance (``self._frozen_mission_ns``) so a future
1784 # widening of the script surface can expose it without
1785 # rebuilding the construction-time snapshot. It does *not*
1786 # ride through the ``inputs`` channel today: the validator
1787 # never accepts attribute access on anything other than
1788 # ``mission`` (and the only two ``mission`` attributes are
1789 # the ``observe`` / ``event`` helpers handled by the
1790 # preamble below), so a script has no way to read the
1791 # snapshot through Monty's runtime. Holding it on the host
1792 # side is the simpler shape; routing it as a ``Mapping``
1793 # through ``inputs`` would require Monty to convert the
1794 # full dataclass + nested dicts to its own value model and
1795 # pay a per-iteration translation cost for data nothing
1796 # observes.
1798 # Build the external_functions mapping. Each tool name maps
1799 # to an async wrapper; Monty's ``external_functions`` channel
1800 # auto-wraps sync callables to async, but we register native
1801 # async functions so the dispatcher's ``await`` chain stays
1802 # explicit and the wrapper can do its own timing.
1803 external_functions: dict[str, Callable[..., Any]] = {}
1804 # Pull the per-iteration identifiers off the frozen namespace
1805 # snapshot built at construction time so the wrapper records
1806 # the same ``session_id`` / ``iteration_index`` the rest of
1807 # the iteration's audit rows carry.
1808 session_id = self._frozen_mission_ns["session_id"]
1809 iteration_index = self._frozen_mission_ns["iteration_index"]
1810 for tool_name in self._allowlist:
1811 external_functions[tool_name] = _make_tool_wrapper(
1812 tool_name,
1813 ctx,
1814 tool_dispatcher,
1815 script_call_log,
1816 session_id,
1817 iteration_index,
1818 )
1820 # The two helper functions ride alongside the per-tool
1821 # wrappers under reserved underscore-prefixed names. Operator
1822 # scripts cannot collide with these: the AST validator
1823 # rejects ``_mission_observe`` and ``_mission_event`` as
1824 # bare names (neither is on the per-session tool allowlist
1825 # nor any of the safe-builtin / exception / mission base
1826 # sets), so a script that wrote ``_mission_observe(...)``
1827 # directly would fail the gate with ``name_not_allowed``.
1828 # Only the AST rewrite below — applied *after* the gate —
1829 # ever produces those Name nodes.
1830 external_functions["_mission_observe"] = _mission_observe
1831 external_functions["_mission_event"] = _mission_event
1833 # The validated operator source is re-parsed and rewritten
1834 # so every accepted ``mission.<helper>(...)`` Call's callee
1835 # becomes a bare-Name lookup of the corresponding reserved
1836 # external-function name. Monty's parser does not accept
1837 # ``class`` / nested-attribute shims that would otherwise
1838 # let us preserve the surface attribute call, so the
1839 # rewrite happens on the AST itself before the source ever
1840 # reaches the underlying VM. Operator code keeps its
1841 # author-time surface (``await mission.observe(key, value)``);
1842 # only the run-time surface differs.
1843 final_source = _rewrite_mission_helpers(script)
1845 phase_started_at = datetime.now(UTC).isoformat()
1847 try:
1848 await self._provider.run(
1849 code=final_source,
1850 inputs={},
1851 external_functions=external_functions,
1852 )
1853 except monty_error_cls as exc:
1854 # ``MontyError`` is the base of the limit / runtime /
1855 # typing / syntax error family. Catching the base class
1856 # rather than the leaves means a future Monty release
1857 # adding a new error type still routes through
1858 # ``SandboxTerminated`` rather than escaping as an opaque
1859 # ``Exception``.
1860 raise SandboxTerminated(
1861 type(exc).__name__,
1862 partial_observations=list(observe_log),
1863 partial_events=list(event_log),
1864 partial_script_call_log=list(script_call_log),
1865 ) from exc
1867 phase_ended_at = datetime.now(UTC).isoformat()
1869 # The script's return value is intentionally ignored: the
1870 # contract documented for the script surface is "use
1871 # ``mission.observe(...)`` / ``mission.event(...)`` to report
1872 # data". A script that returned a dict would conflict with
1873 # the helper-driven observation list, and the engine's
1874 # observe-phase already accepts a pre-built Observation
1875 # without consulting any return value.
1876 observation = _build_script_observation(
1877 script_call_log=script_call_log,
1878 observe_log=observe_log,
1879 event_log=event_log,
1880 phase_started_at=phase_started_at,
1881 phase_ended_at=phase_ended_at,
1882 )
1883 return observation, list(script_call_log)
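The body of `_rewrite_mission_helpers` is not shown in this part of the report. As a rough illustration of the rewrite the comments in `run` describe, the sketch below uses `ast.NodeTransformer` to replace a `mission.<helper>(...)` callee with a bare reserved name while leaving args and keywords alone. The class name, the mapping constant, and the use of `ast.unparse` (Python 3.9+) are assumptions made for this sketch; the real helper may differ.

```python
import ast

# Assumed mapping from script-facing helper to reserved external-function name.
_HELPER_NAMES = {"observe": "_mission_observe", "event": "_mission_event"}


class _MissionHelperRewriter(ast.NodeTransformer):
    """Turn ``mission.<helper>(...)`` callees into bare reserved names."""

    def visit_Call(self, node: ast.Call) -> ast.AST:
        # Visit children first so helper calls nested in arguments are rewritten too.
        self.generic_visit(node)
        func = node.func
        if (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == "mission"
            and func.attr in _HELPER_NAMES
        ):
            # Replace only the callee; positional and keyword arguments stay as written.
            node.func = ast.copy_location(
                ast.Name(id=_HELPER_NAMES[func.attr], ctx=ast.Load()), func
            )
        return node


def rewrite_mission_helpers_sketch(source: str) -> str:
    """Parse, rewrite, and unparse a script (hypothetical stand-in)."""
    tree = _MissionHelperRewriter().visit(ast.parse(source, mode="exec"))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


rewritten = rewrite_mission_helpers_sketch(
    "async def step():\n"
    "    await mission.observe('latency', 12)\n"
    "    await mission.event('done', ok=True)\n"
)
```

Because the rewrite runs after the AST gate, the reserved names never need to be accepted by the validator; only host-generated source contains them.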
1886# ---------------------------------------------------------------------------
1887# Default factory
1888# ---------------------------------------------------------------------------
1891def make_default_sandbox_runner(
1892 allowlist: list[str],
1893 session: Any,
1894) -> Callable[
1895 [str, Any, Callable[[str, dict[str, Any], Any], Awaitable[Any]]],
1896 Awaitable[tuple[dict[str, Any], list[dict[str, Any]]]],
1897]:
1898 """Build the default ``sandbox_runner`` callable for the engine.
1900 The :class:`MissionEngine` takes a callable matching the
1901 ``SandboxRunner`` protocol (``(script, ctx, tool_dispatcher) ->
1902 (observation_dict, script_call_log)``); this helper builds a fresh
1903 :class:`MissionSandbox` for a given session and returns the bound
1904 :meth:`MissionSandbox.run` method so the engine can drive the
1905 sandbox without depending on the sandbox class itself.
1907 One sandbox per session: the constructor freezes a snapshot of the
1908 session's directive, criteria, budget, and prior-iteration
1909 summaries into the ``mission`` namespace, so reusing a runner
1910 across sessions would leak stale state. The engine's normal
1911 construction path therefore calls this factory once per
1912 ``mission_start`` and pins the returned callable on the engine
1913 instance for the session's lifetime.
1914 """
1915 sandbox = MissionSandbox(
1916 allowlist=allowlist,
1917 session=session,
1918 )
1919 return sandbox.run
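The snapshot behaviour the factory docstring relies on can be illustrated with the standard library alone. This is a small sketch under assumed data: the session dict, the `expr` key, and the predicate text are hypothetical. It shows three properties: the proxy refuses top-level writes, nested values stay mutable (which is why the constructor deep-copies them), and the snapshot does not track later changes to the session.

```python
import copy
from types import MappingProxyType
from typing import Any

# Hypothetical session record for illustration.
session: dict[str, Any] = {
    "session_id": "s-1",
    "criteria": [{"kind": "predicate", "expr": "metrics.latency_ms < 50"}],
    "iterations": [],
}

snapshot = MappingProxyType(
    {
        "session_id": session["session_id"],
        "iteration_index": len(session["iterations"]),
        "criteria": copy.deepcopy(session["criteria"]),
    }
)

# Top-level writes through the proxy are refused with TypeError.
try:
    snapshot["session_id"] = "other"  # type: ignore[index]
    write_refused = False
except TypeError:
    write_refused = True

# Nested values are still mutable, so they are deep-copied: this mutation
# touches the copy held by the snapshot, not the session record.
snapshot["criteria"][0]["kind"] = "tampered"

# The snapshot is computed once, so later session changes do not reach it.
session["iterations"].append({"iteration_index": 0, "verdict": "continue"})
```

After the last line, `snapshot["iteration_index"]` is still `0` even though the session now holds one iteration, which is the stale-state risk the docstring warns about when a runner outlives the state it was built from.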
1922# ---------------------------------------------------------------------------
1923# Public surface
1924# ---------------------------------------------------------------------------
1927__all__ = [
1928 "MissionSandbox",
1929 "ScriptRejected",
1930 "SandboxTerminated",
1931 "make_default_sandbox_runner",
1932 "validate_script_ast",
1933]