Threshold-Gated Recursive Evaluation
When comparing List[StructuredModel] fields, Stickler only performs detailed nested-field analysis on object pairs whose overall similarity meets a configurable threshold. Pairs that fall below the threshold are classified as False Discovery (FD) and treated as atomic units -- no field-by-field breakdown is generated for them.
Core Principle
Only recurse into nested field evaluation for object pairs that meet the similarity threshold.
This keeps metrics focused on meaningful comparisons and avoids generating misleading field-level statistics for object pairs that are fundamentally different.
Algorithm Flow
1. Hungarian Matching
Use the Hungarian algorithm to find optimal pairings between GT and Pred lists based on overall object similarity.
2. Threshold Classification
For each matched pair, compare the similarity score against StructuredModel.match_threshold:
- similarity >= threshold -- TP: recurse into nested fields
- similarity < threshold -- FD: stop recursion, treat as atomic
3. Unmatched Items
- GT extras -- FN: stop recursion
- Pred extras -- FA: stop recursion
Code Example
from stickler import StructuredModel, ComparableField
from stickler.comparators.levenshtein import LevenshteinComparator
from stickler.comparators.exact import ExactComparator
from typing import List
class Product(StructuredModel):
product_id: str = ComparableField(
comparator=ExactComparator(), threshold=1.0, weight=3.0
)
name: str = ComparableField(
comparator=LevenshteinComparator(), threshold=0.7, weight=2.0
)
price: float = ComparableField(threshold=0.9, weight=1.0)
match_threshold = 0.8 # Gates recursive evaluation
class Order(StructuredModel):
order_id: str = ComparableField(
comparator=ExactComparator(), threshold=1.0, weight=2.0
)
products: List[Product] = ComparableField(weight=3.0)
Worked Scenarios
Given three GT products and three Pred products:
Good Match (similarity >= 0.8)
GT: Product("PROD-001", "Laptop", 999.99)
Pred: Product("PROD-001", "Laptop Computer", 999.99)
- Classification: TP
- Nested field analysis is performed:
product_id: TP (exact match)name: TP (similarity ~0.9)price: TP (exact match)
Poor Match (similarity < 0.8)
GT: Product("PROD-002", "Mouse", 29.99)
Pred: Product("PROD-002", "Different Product", 99.99)
- Classification: FD
- No nested field analysis -- the objects are too dissimilar for field-level breakdown to be useful.
Unmatched Items
GT: Product("PROD-003", "Cable", 14.99) -- FN (no counterpart in Pred)
Pred: Product("PROD-004", "New Product", 19.99) -- FA (no counterpart in GT)
No nested analysis for either.
Result Structure
{
"products": {
"overall": {
"tp": 1, "fd": 1, "fn": 1, "fa": 1,
"derived": { "cm_precision": 0.5, "cm_recall": 0.5, "cm_f1": 0.5 }
},
"fields": {
"product_id": { "tp": 1 },
"name": { "tp": 1 },
"price": { "tp": 1 }
},
"non_matches": [
{
"type": "FD",
"gt_object": "Product(PROD-002, Mouse, 29.99)",
"pred_object": "Product(PROD-002, Different Product, 99.99)",
"similarity": 0.3
},
{ "type": "FN", "gt_object": "Product(PROD-003, Cable, 14.99)" },
{ "type": "FA", "pred_object": "Product(PROD-004, New Product, 19.99)" }
]
}
}
Field-level metrics appear only for the single TP pair. The non_matches list documents every FD, FN, and FA for diagnostic purposes.
Delegation Pattern
Under the hood, comparison logic is distributed across specialized components:
| Component | Responsibility |
|---|---|
| ComparisonEngine | Orchestrates the single-traversal comparison; manages score percolation |
| ComparisonDispatcher | Routes each field to the correct comparator based on type and null state |
| FieldComparator | Handles primitives and single nested StructuredModel fields |
| PrimitiveListComparator | Handles List[str], List[int], etc. via Hungarian matching |
| StructuredListComparator | Handles List[StructuredModel] with threshold-gated recursion |
The dispatcher uses pattern matching on null states for early exits, then routes non-null values by type to the appropriate comparator.
Score Aggregation
Scores percolate upward from leaf fields to the top-level result using weighted averaging:
- Each field comparison produces a raw similarity score (0.0 -- 1.0).
- The score is optionally clipped to 0 if below the field threshold (
clip_under_threshold). - Clipped scores are multiplied by the field weight and summed.
- The overall similarity is
total_weighted_score / total_weight.
The all_fields_matched flag is True only when every field's raw similarity meets its individual threshold.
Edge Cases
Empty lists -- [] vs [] is TN. [] vs [items] generates one FA per item. [items] vs [] generates one FN per item.
Threshold boundary -- similarity >= threshold counts as TP and triggers recursion. Values exactly at the boundary are matches.
Different thresholds per model -- Each StructuredModel subclass can define its own match_threshold. A Product with match_threshold = 0.8 and an Address with match_threshold = 0.6 are each evaluated independently.
Nested lists -- When a StructuredModel contains another List[StructuredModel], the same threshold-gating applies recursively at each nesting level, using the inner model's match_threshold.
See Also
- Hungarian Matching -- the assignment algorithm that produces pairings
- Classification Logic -- full definitions of TP, FD, FA, FN, TN
- Aggregate Metrics -- how metrics roll up through the result tree