Hungarian Matching for List[StructuredModel] A Complete Guide
Overview
This document provides a comprehensive guide to Hungarian matching for List[StructuredModel] fields in the stickler library, including concrete examples, expected behavior, and validation of the current implementation.
Core Algorithm: Threshold-Gated Recursive Evaluation
The Hungarian matching algorithm for structured lists follows this principle:
🔑 Only recurse into nested field evaluation for object pairs that meet the similarity threshold.
Algorithm Steps
- Hungarian Matching: Find optimal pairings between GT and Pred lists based on overall object similarity
- Threshold Classification: For each matched pair, check if similarity ≥
StructuredModel.match_threshold - similarity ≥ threshold → TP + recurse into nested fields
- similarity < threshold → FD + stop recursion (atomic)
- Unmatched Items: Handle items that couldn't be matched
- GT extras → FN + stop recursion (atomic)
- Pred extras → FA + stop recursion (atomic)
Concrete Example: Transaction Matching
Model Definitions
class Transaction(StructuredModel):
transaction_id: str = ComparableField(
comparator=ExactComparator(),
threshold=1.0,
weight=3.0
)
description: str = ComparableField(
comparator=LevenshteinComparator(),
threshold=0.7,
weight=2.0
)
amount: float = ComparableField(
threshold=0.9,
weight=1.0
)
# ⚠️ CRITICAL: This threshold controls Hungarian matching recursion
match_threshold = 0.8
class Account(StructuredModel):
account_id: str = ComparableField(
comparator=ExactComparator(),
threshold=1.0,
weight=2.0
)
# Notice: NO threshold on this field - should use Transaction.match_threshold
transactions: List[Transaction] = ComparableField(weight=3.0)
Test Data
Ground Truth:
gt_account = Account(
account_id="ACC-12345",
transactions=[
Transaction(transaction_id="TXN-001", description="Coffee shop payment", amount=4.95),
Transaction(transaction_id="TXN-002", description="Grocery store", amount=127.43),
Transaction(transaction_id="TXN-003", description="Gas station", amount=45.67)
]
)
Prediction:
pred_account = Account(
account_id="ACC-12345",
transactions=[
Transaction(transaction_id="TXN-001", description="Coffee shop", amount=4.95), # Good match (TP)
Transaction(transaction_id="TXN-002", description="Online purchase", amount=89.99), # Poor match (FD)
Transaction(transaction_id="TXN-004", description="Restaurant", amount=23.45) # Poor match (FD)
]
)
Step-by-Step Hungarian Matching Analysis
Step 1: Pairwise Similarity Calculation
| GT Index | Pred Index | GT Transaction | Pred Transaction | Similarity | Above Threshold? |
|---|---|---|---|---|---|
| 0 | 0 | TXN-001: Coffee shop payment | TXN-001: Coffee shop | 0.860 | ✅ Yes (≥0.8) |
| 0 | 1 | TXN-001: Coffee shop payment | TXN-002: Online purchase | 0.137 | ❌ No |
| 0 | 2 | TXN-001: Coffee shop payment | TXN-004: Restaurant | 0.154 | ❌ No |
| 1 | 0 | TXN-002: Grocery store | TXN-001: Coffee shop | 0.130 | ❌ No |
| 1 | 1 | TXN-002: Grocery store | TXN-002: Online purchase | 0.572 | ❌ No (<0.8) |
| 1 | 2 | TXN-002: Grocery store | TXN-004: Restaurant | 0.135 | ❌ No |
| 2 | 0 | TXN-003: Gas station | TXN-001: Coffee shop | 0.097 | ❌ No |
| 2 | 1 | TXN-003: Gas station | TXN-002: Online purchase | 0.056 | ❌ No |
| 2 | 2 | TXN-003: Gas station | TXN-004: Restaurant | 0.124 | ❌ No |
Step 2: Hungarian Algorithm Assignment
The Hungarian algorithm finds the optimal assignment that maximizes total similarity:
- GT[0] → Pred[0]: 0.860 ✅ Good Match
- GT[1] → Pred[1]: 0.572 ❌ Poor Match
- GT[2] → Pred[2]: 0.124 ❌ Poor Match
Step 3: Threshold-Gated Classification
| Pair | Similarity | Classification | Nested Field Analysis |
|---|---|---|---|
| GT[0] → Pred[0] | 0.860 ≥ 0.8 | TP | ✅ Generate nested metrics |
| GT[1] → Pred[1] | 0.572 < 0.8 | FD | ❌ Skip (atomic treatment) |
| GT[2] → Pred[2] | 0.124 < 0.8 | FD | ❌ Skip (atomic treatment) |
Expected Result Structure
Object-Level Metrics (List Level)
"transactions": {
"overall": {
"tp": 1, # GT[0] → Pred[0] good match
"fd": 2, # GT[1] → Pred[1] AND GT[2] → Pred[2] poor matches
"fa": 0, # No unmatched preds (equal length lists)
"fn": 0, # No unmatched GTs (equal length lists)
"fp": 2 # fd + fa = 2 + 0
},
"fields": {
# Only nested metrics for the TP pair (GT[0] → Pred[0])
"transaction_id": {"tp": 1, "fa": 0, "fd": 0, "fn": 0},
"description": {"tp": 1, "fa": 0, "fd": 0, "fn": 0},
"amount": {"tp": 1, "fa": 0, "fd": 0, "fn": 0}
},
"non_matches": [
{
"type": "FD",
"gt_object": "Transaction(TXN-002, Grocery store, $127.43)",
"pred_object": "Transaction(TXN-002, Online purchase, $89.99)",
"similarity": 0.572
},
{
"type": "FD",
"gt_object": "Transaction(TXN-003, Gas station, $45.67)",
"pred_object": "Transaction(TXN-004, Restaurant, $23.45)",
"similarity": 0.124
}
]
}
✅ Current Implementation Status
Implementation Validation Results
Actual Test Results (Equal Length Lists):
TEST CASE 1: Equal Length Lists (3x3)
Actual Metrics: TP=1, FD=2, FN=0, FA=0
Expected Metrics: TP=1, FD=2, FN=0, FA=0
✅ PASS: Metrics match corrected expected behavior
TEST CASE 2: GT Longer (4x2)
Actual Metrics: TP=1, FD=1, FN=2, FA=0
Expected Metrics: TP=1, FD=1, FN=2, FA=0
✅ PASS
TEST CASE 3: Pred Longer (2x4)
Actual Metrics: TP=1, FD=1, FN=0, FA=2
Expected Metrics: TP=1, FD=1, FN=0, FA=2
✅ PASS
TEST CASE 4: All Above Threshold (3x3)
Actual Metrics: TP=3, FD=0, FN=0, FA=0
Expected Metrics: TP=3, FD=0, FN=0, FA=0
✅ PASS
Key Findings
-
✅ Equal Length Behavior: When GT and Pred lists have equal length, Hungarian algorithm pairs everyone up, resulting in only TP/FD classifications (no FN/FA)
-
✅ Below-Threshold Treatment: Both similarity scores of 0.572 and 0.124 are correctly treated the same way as FD since both are below the 0.8 threshold
-
✅ Length Mismatch Behavior: Only when lists have uneven lengths do we see FN (unmatched GT items) or FA (unmatched Pred items)
-
✅ Threshold Gating: The implementation correctly uses the
match_threshold = 0.8to determine TP vs FD classification
Conclusion
The implementation is already correct and matches the expected behavior described in this document. The key insight was understanding that:
- Equal length lists → Only TP/FD possible (Hungarian pairs everyone)
- Uneven length lists → FN/FA occur for unpaired items
- Below threshold → All treated as FD regardless of specific similarity value
Implementation Plan
Phase 1: Create Extraction Baseline Tests ✅
- Document current behavior with concrete examples
- Create test cases that capture existing Hungarian matching patterns
- Establish bit-for-bit compatibility baseline
Phase 2: Extract Hungarian Logic to Dedicated Class
- Create
StructuredListComparatorclass - Move ~500 lines from
StructuredModelto dedicated class: _compare_struct_list_with_scores()_calculate_nested_field_metrics()_calculate_object_level_metrics()_calculate_struct_list_similarity()_handle_struct_list_empty_cases()- Update
StructuredModel._dispatch_field_comparison()to delegate
Phase 3: Fix Threshold Logic
- Correct threshold source in extracted class
- Implement proper threshold-gated recursion
- Update object-level vs field-level metric separation
Phase 4: Integration Testing
- Verify Phase 1 baseline tests still pass
- Test with different
match_thresholdvalues - Validate nested list scenarios work correctly
Phase 5: Cleanup and Documentation
- Remove old code from
StructuredModel - Update documentation with corrected examples
- Add integration tests for edge cases
Key Architectural Principles
1. Threshold Source Hierarchy
List[StructuredModel] Threshold Resolution:
1. Use StructuredModel.match_threshold (class attribute)
2. Fall back to default 0.7 if not specified
3. NEVER use ComparableField.threshold from parent list field
2. Metric Separation
Object-Level Metrics: Count objects (TP, FD, FA, FN)
Field-Level Metrics: Count field comparisons within good matches only
3. Recursion Gating
IF object_similarity >= match_threshold:
classification = TP
RECURSE into nested field analysis
ELSE:
classification = FD
STOP recursion (treat as atomic)
Testing Strategy
Baseline Compatibility Tests
- Capture exact current behavior before refactoring
- Bit-for-bit comparison of confusion matrix output
- Edge cases: empty lists, single items, all poor matches
Threshold Boundary Tests
- Objects exactly at threshold (0.8000...)
- Objects slightly above/below threshold
- Different model classes with different thresholds
Nested List Scenarios
List[List[StructuredModel]]recursive structures- Mixed threshold values across nesting levels
- Performance with large lists
This comprehensive documentation serves as both a specification for the correct behavior and a baseline for testing the upcoming refactor.