Rich Value Pattern
The Rich Value Pattern lets fields carry metadata alongside their actual values. Instead of just "Widget", a field can be {"_value": "Widget", "_confidence": 0.95}, wrapping the value with confidence scores, bounding boxes, or any custom metadata your pipeline produces.
The Convention
A rich value is any JSON dict with a _value key. The underscore prefix prevents collision with user data.
{
  "invoice_number": {"_value": "INV-2024-001", "_confidence": 0.97},
  "vendor": {"_value": "Acme Corp", "_bbox": [0.1, 0.2, 0.3, 0.4]},
  "total": {"_value": 1247.50},
  "notes": "Delivered to front door"
}
- _value (required): the actual field value, unwrapped into the model field
- _confidence (optional): model confidence score, consumed by the confidence evaluation module
- Any other keys (optional): stored as extras, accessible via get_field_extras()
- Plain values (no wrapper): still work, no change needed
How It Works
from_json() walks the JSON tree. When it finds a dict with _value, it:
- Extracts _value as the field value for the model
- Extracts _confidence (if present) into the confidence accessor
- Stores everything else as extras for that field path
After unwrapping, the model behaves like a normal pydantic model. model_dump(), validation, and comparison all work on the unwrapped values.
from stickler import StructuredModel, ComparableField

class Invoice(StructuredModel):
    invoice_number: str = ComparableField()
    vendor: str = ComparableField()
    total: float = ComparableField()

pred = Invoice.from_json({
    "invoice_number": {"_value": "INV-001", "_confidence": 0.97},
    "vendor": {"_value": "Acme Corp", "_handwritten": True, "_source_page": 2},
    "total": 1247.50
})
# Values are unwrapped
pred.invoice_number # "INV-001"
pred.vendor # "Acme Corp"
pred.total # 1247.50
# Confidence is accessible
pred.get_field_confidence("invoice_number") # 0.97
pred.get_field_confidence("total") # None (no rich value)
# Extras are accessible
pred.get_field_extras("vendor") # {"_handwritten": True, "_source_page": 2}
pred.get_all_extras() # {"vendor": {"_handwritten": True, "_source_page": 2}}
Nested Objects and Lists
Rich values work at any nesting depth. Paths use dot notation for nested objects and bracket notation for list items.
{
  "customer": {
    "name": {"_value": "Jane Smith", "_confidence": 0.96},
    "address": {
      "street": {"_value": "123 Main St", "_confidence": 0.85},
      "city": "Boston"
    }
  },
  "items": [
    {
      "product": {"_value": "Laptop", "_confidence": 0.89},
      "price": {"_value": 1299.99, "_bbox": [0.3, 0.4, 0.5, 0.1]}
    }
  ]
}
pred.get_field_confidence("customer.address.street") # 0.85
pred.get_field_extras("items[0].price") # {"_bbox": [0.3, 0.4, 0.5, 0.1]}
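To make the path scheme concrete, here is a small stand-alone walker. It is a rough sketch and not stickler's implementation: the name collect_metadata and its return shape are hypothetical, and it only mirrors the rules described on this page (a dict with _value is a wrapper, _confidence goes to one store, every other key goes to extras).

```python
from typing import Any


def collect_metadata(node: Any, path: str = "") -> tuple[Any, dict, dict]:
    """Return (unwrapped_node, confidences, extras) for a JSON-like tree.

    Hypothetical helper for illustration only; from_json() does this
    internally and may differ in details.
    """
    confidences: dict[str, float] = {}
    extras: dict[str, dict] = {}

    def walk(value: Any, here: str) -> Any:
        if isinstance(value, dict):
            if "_value" in value:
                # Rich value: record metadata under the current path.
                if "_confidence" in value:
                    confidences[here] = value["_confidence"]
                rest = {k: v for k, v in value.items()
                        if k not in ("_value", "_confidence")}
                if rest:
                    extras[here] = rest
                # Assumption: the wrapped value is walked under the same path.
                return walk(value["_value"], here)
            # Regular nested object: extend the path with dot notation.
            return {k: walk(v, f"{here}.{k}" if here else k)
                    for k, v in value.items()}
        if isinstance(value, list):
            # List items: extend the path with bracket notation.
            return [walk(item, f"{here}[{i}]") for i, item in enumerate(value)]
        return value

    return walk(node, path), confidences, extras


payload = {
    "customer": {"address": {"street": {"_value": "123 Main St", "_confidence": 0.85}}},
    "items": [{"price": {"_value": 1299.99, "_bbox": [0.3, 0.4, 0.5, 0.1]}}],
}
plain, confidences, extras = collect_metadata(payload)
print(confidences)  # {'customer.address.street': 0.85}
print(extras)       # {'items[0].price': {'_bbox': [0.3, 0.4, 0.5, 0.1]}}
```

The keys of both stores are the same path strings you pass to get_field_confidence() and get_field_extras().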
Wrapping Null Values
Rich values support explicit null predictions. This distinguishes "I predicted null" from "the field is missing entirely":
{
  "notes": {"_value": null, "_confidence": 0.3}
}
The model field gets None as its value, and the confidence score is preserved.
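The difference between an explicit null and a missing field is easiest to see on the raw payload. The sketch below is illustrative only: read_field and the MISSING sentinel are hypothetical names, not part of stickler.

```python
from typing import Any

MISSING = object()  # sentinel: the key was not in the payload at all


def read_field(payload: dict, key: str) -> tuple[Any, Any]:
    """Return (value, confidence); value is MISSING when the key is absent."""
    if key not in payload:
        return MISSING, None
    raw = payload[key]
    if isinstance(raw, dict) and "_value" in raw:
        # Rich value: the wrapped value may legitimately be None.
        return raw["_value"], raw.get("_confidence")
    return raw, None


predicted_null = {"notes": {"_value": None, "_confidence": 0.3}}
field_absent = {}

print(read_field(predicted_null, "notes"))              # (None, 0.3)
print(read_field(field_absent, "notes")[0] is MISSING)  # True
```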
Value-Only Wrapping
You can wrap a value without any metadata. This is useful when some fields in your pipeline use rich values and you want a consistent format:
{
  "name": {"_value": "Widget"},
  "price": {"_value": 29.99},
  "sku": "ABC123"
}
All three fields produce the same model values. The first two are unwrapped from rich values; the third is a plain value.
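If your pipeline builds payloads in Python, a tiny helper can emit the consistent format. This is a sketch under the assumption that you control the producer side; as_rich is a hypothetical name, not a stickler function.

```python
from typing import Any


def as_rich(value: Any, **metadata: Any) -> dict:
    """Wrap a plain value as a rich value.

    Keyword arguments become underscore-prefixed metadata keys, so
    confidence=0.97 is emitted as "_confidence": 0.97.
    Call this once per value; it does not detect already-wrapped input.
    """
    return {"_value": value, **{f"_{key}": val for key, val in metadata.items()}}


print(as_rich("Widget"))                    # {'_value': 'Widget'}
print(as_rich("INV-001", confidence=0.97))  # {'_value': 'INV-001', '_confidence': 0.97}
```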
What Gets Stored Where
All keys inside a rich value wrapper should use an underscore prefix. The rule is enforced softly: non-prefixed keys are still stored in extras, but a UserWarning is emitted at parse time to help you catch the issue early.
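The warn-but-keep behavior can be sketched with the standard warnings module. This is an illustration and not stickler's code: split_wrapper and the warning text are hypothetical.

```python
import warnings


def split_wrapper(wrapper: dict, path: str) -> tuple[object, dict]:
    """Split a rich value into (value, extras), warning on unprefixed keys."""
    extras = {}
    for key, val in wrapper.items():
        if key in ("_value", "_confidence"):
            continue
        if not key.startswith("_"):
            warnings.warn(
                f"Rich value at '{path}' has non-prefixed key '{key}'",
                UserWarning,
                stacklevel=2,
            )
        extras[key] = val  # stored either way
    return wrapper["_value"], extras


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value, extras = split_wrapper({"_value": "Acme Corp", "source_page": 2}, "vendor")

print(value, extras)       # Acme Corp {'source_page': 2}
print(caught[0].category)  # <class 'UserWarning'>
```

In a test suite, warnings.simplefilter("error", UserWarning) turns such warnings into exceptions, which makes a stray unprefixed key fail fast.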
Reserved namespace
Underscore-prefixed keys (_*) inside a rich value wrapper are a reserved namespace owned by stickler. Any dict containing a _value key is treated as a wrapper, and stickler may extract additional _-prefixed keys (_confidence, _bbox, _source_span, ...) into typed accessors as the schema grows. Don't ship {"_value": x} payloads where _value is meant to be user data — wrap once at the boundary, or use a different key name.
Reserved top-level dunder names
The following keys cannot appear at the top level of a payload passed to StructuredModel.from_json():
- __stickler_raw_json__
- __stickler_field_confidences__
- __stickler_field_extras__
These names back library-managed metadata stores on the model instance. Allowing them through extra: "allow" would let user data silently shadow stickler's own state. from_json() raises ValueError if any of them appear in the input dict.
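If payloads come from an untrusted or third-party source, you may want to reject these names in your own pipeline before they reach from_json(). A minimal pre-flight sketch follows; check_reserved_keys is a hypothetical helper and the error message is not stickler's.

```python
RESERVED_TOP_LEVEL = frozenset({
    "__stickler_raw_json__",
    "__stickler_field_confidences__",
    "__stickler_field_extras__",
})


def check_reserved_keys(payload: dict) -> None:
    """Raise ValueError if the payload uses a reserved top-level name."""
    clashes = sorted(RESERVED_TOP_LEVEL.intersection(payload))
    if clashes:
        raise ValueError(f"Reserved top-level keys in payload: {clashes}")


check_reserved_keys({"vendor": "Acme Corp"})  # passes silently

try:
    check_reserved_keys({"__stickler_raw_json__": {}, "vendor": "Acme Corp"})
except ValueError as exc:
    print(exc)
```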
| Key in rich value | Where it goes | How to access it |
|---|---|---|
| _value | Model field value | pred.field_name |
| _confidence | Confidence store | pred.get_field_confidence("field_name") |
| _bbox, _handwritten, etc. | Extras store | pred.get_field_extras("field_name") |
Detection Rule
A dict is a rich value if and only if it contains a _value key. The underscore prefix is chosen because:
- It's conventionally "private/system" in Python and JSON
- It's extremely unlikely to appear as a real data key
- It's visually distinct from user data keys
Dicts without _value are treated as regular nested data and passed through to pydantic unchanged.
Deprecated legacy shape
The pre-rename {"value": ..., "confidence": ...} shape is still
recognized for one release so existing JSONL corpora keep their
confidence data on upgrade. Both value and confidence keys must be
present for the shim to fire — a plain dict like
{"currency": "USD", "value": 100} is treated as user data, not a rich
value. When the shim does fire, a DeprecationWarning names the
offending field path and the value is unwrapped the same way as the new
form. The legacy shape will be removed in 0.5.0 — migrate payloads to
_value/_confidence as soon as you can.
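For existing corpora, a one-off rewrite is usually enough. The sketch below applies the same both-keys rule to convert legacy wrappers; migrate_legacy is a hypothetical helper, not something stickler ships, and carrying other keys over unchanged is an assumption.

```python
from typing import Any


def migrate_legacy(node: Any) -> Any:
    """Recursively rename legacy {"value", "confidence"} wrappers."""
    if isinstance(node, list):
        return [migrate_legacy(item) for item in node]
    if not isinstance(node, dict):
        return node
    if "_value" in node:
        # Already in the new form; only descend into the wrapped value.
        return {**node, "_value": migrate_legacy(node["_value"])}
    if "value" in node and "confidence" in node:
        # Legacy wrapper: both keys present, so rename them.
        renamed = {"_value": migrate_legacy(node["value"]),
                   "_confidence": node["confidence"]}
        # Assumption: any other keys are carried over unchanged.
        renamed.update({k: v for k, v in node.items()
                        if k not in ("value", "confidence")})
        return renamed
    # Plain nested object, e.g. {"currency": "USD", "value": 100}.
    return {k: migrate_legacy(v) for k, v in node.items()}


old = {
    "total": {"value": 1247.50, "confidence": 0.9},
    "price": {"currency": "USD", "value": 100},
}
print(migrate_legacy(old))
# {'total': {'_value': 1247.5, '_confidence': 0.9}, 'price': {'currency': 'USD', 'value': 100}}
```

Like the shim, this treats any dict with both keys as a wrapper, so review the output if your data legitimately uses value and confidence as field names.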
Integration with Confidence Evaluation
The confidence evaluation module consumes _confidence values from rich values. See Confidence Metrics for AUROC, Brier Score, ECE, and Error Capture at Review Budget metrics.
JSONL Serialization
When a prediction is created via from_json() with rich values, the original JSON is stored on the model instance. When compare_with() runs, this raw JSON is included in the result as prediction_raw. This enables the map/reduce pattern: compare individual documents, save results to JSONL, and later aggregate them via update_from_comparison_result() with full confidence and bounding-box (mAP) metric support.
See Bulk Evaluation: Map/Reduce for the full pattern.
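The JSONL round trip itself needs nothing beyond the standard library. In the sketch below the result dicts are stand-ins: only the prediction_raw key comes from this page, doc_id is a hypothetical field, and an in-memory buffer replaces a real results file.

```python
import io
import json

# Stand-ins for per-document comparison results.
results = [
    {"doc_id": "a", "prediction_raw": {"total": {"_value": 10.0, "_confidence": 0.9}}},
    {"doc_id": "b", "prediction_raw": {"total": {"_value": 12.5, "_confidence": 0.4}}},
]

# Map step: write one JSON object per line.
buffer = io.StringIO()  # stands in for a results.jsonl file
for result in results:
    buffer.write(json.dumps(result) + "\n")

# Reduce step: stream the lines back; the rich values survive intact.
buffer.seek(0)
restored = [json.loads(line) for line in buffer if line.strip()]
confidences = [r["prediction_raw"]["total"]["_confidence"] for r in restored]
print(confidences)  # [0.9, 0.4]
```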
Metadata Types
The pattern is designed for extensibility. Two metadata types are supported today:
- _confidence: consumed by the Confidence Metrics module for calibration scoring.
- _bbox: consumed by the Bounding Box mAP Metrics module for localization scoring.
Known metadata keys are extracted into typed accessors (_confidence via get_field_confidence() / get_all_confidences()), while any other underscore-prefixed keys — including _bbox — are preserved in extras and read via get_field_extras() / get_all_extras(). The bbox metric reads boxes directly from get_all_extras(), so no dedicated get_field_bbox() accessor is needed. Future metadata types (e.g. source spans for provenance tracking) follow the same path: store them in a rich value and the relevant metric picks them up from extras.
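Picking one metadata type out of extras is a plain dict filter. The sketch below assumes get_all_extras() returns a flat mapping from field path to extras dict, as in the flat example earlier on this page; the nested path key is an assumption, and the literal dict stands in for a real call.

```python
# Stand-in for pred.get_all_extras(): field path -> extras dict.
all_extras = {
    "vendor": {"_handwritten": True, "_source_page": 2},
    "items[0].price": {"_bbox": [0.3, 0.4, 0.5, 0.1]},
}

# Keep only the fields that carry a bounding box.
boxes = {path: extra["_bbox"] for path, extra in all_extras.items() if "_bbox" in extra}
print(boxes)  # {'items[0].price': [0.3, 0.4, 0.5, 0.1]}
```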
See Also
- Confidence Metrics: evaluating confidence calibration quality
- Confidence Evaluation Guide: practical guide for using confidence metrics
- Bounding Box mAP Metrics: evaluating bounding-box localization with mean Average Precision