Comparators
Comparators are the algorithms that determine how similar two field values are. Each comparator is optimized for a different data type or comparison strategy, returning a similarity score between 0.0 (completely different) and 1.0 (identical). When you define a StructuredModel, you assign a comparator to each field so Stickler knows how to evaluate that field.
Which Comparator Should I Use?
| Comparator | Best For | Speed | Needs AWS? | Score Type |
|---|---|---|---|---|
| ExactComparator | IDs, codes, booleans | Instant | No | Binary (0.0 or 1.0) |
| LevenshteinComparator | Names, addresses, text with typos | Instant | No | Continuous (0.0--1.0) |
| NumericComparator | Prices, quantities, measurements | Instant | No | Binary (0.0 or 1.0) |
| FuzzyComparator | Flexible text, descriptions, reordered tokens | Fast | No | Continuous (0.0--1.0) |
| SemanticComparator | Meaning-based text similarity | Moderate | Yes (Bedrock) | Continuous (0.0--1.0) |
| BERTComparator | Contextual semantic similarity | Moderate | No (runs locally) | Continuous (0.0--1.0) |
| LLMComparator | Complex semantic evaluation with reasoning | Slow | Yes (Bedrock) | Binary (0.0 or 1.0) |
Comparator Details
ExactComparator
Checks whether two strings match exactly after normalizing whitespace, punctuation, and (by default) case. Returns 1.0 for an exact match and 0.0 otherwise.
When to use: Critical identifiers, status codes, booleans, or any field where partial matches are meaningless.
```python
from stickler import StructuredModel, ComparableField
from stickler.comparators import ExactComparator


class Order(StructuredModel):
    order_id: str = ComparableField(
        comparator=ExactComparator(),
        threshold=1.0,
        weight=3.0
    )
```
Key parameters:
| Parameter | Default | Description |
|---|---|---|
| threshold | 1.0 | Similarity threshold for binary classification |
| case_sensitive | False | Whether comparison is case-sensitive |
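To make the normalize-then-compare behavior concrete, here is a minimal standalone sketch. It does not use Stickler; the helper names `normalize_exact` and `exact_similarity` are hypothetical, and the exact normalization rules (dropping all punctuation and whitespace) are an assumption that may differ from what ExactComparator does internally.

```python
import string


def normalize_exact(value, case_sensitive=False):
    """Drop punctuation and whitespace, and optionally lowercase (assumed rules)."""
    text = str(value)
    drop = string.punctuation + string.whitespace
    text = text.translate(str.maketrans("", "", drop))
    return text if case_sensitive else text.lower()


def exact_similarity(a, b, case_sensitive=False):
    """Binary score: 1.0 if the normalized values are equal, 0.0 otherwise."""
    same = normalize_exact(a, case_sensitive) == normalize_exact(b, case_sensitive)
    return 1.0 if same else 0.0


print(exact_similarity("ORD-123", "ord 123"))               # differs only in punctuation, spacing, case
print(exact_similarity("ORD-123", "ORD-124"))               # a genuinely different ID
print(exact_similarity("abc", "ABC", case_sensitive=True))  # case now matters
```

Because the score is binary, a single differing character after normalization drops the result to 0.0, which is why this style of comparison suits identifiers rather than free text.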
LevenshteinComparator
Calculates the Levenshtein edit distance between two strings and returns a normalized similarity score: 1.0 - (edit_distance / max_length). Optionally normalizes input by stripping whitespace and lowercasing.
When to use: Names, addresses, free-form text where typos and minor variations are expected. This is the default comparator for string fields.
```python
from stickler import StructuredModel, ComparableField
from stickler.comparators import LevenshteinComparator


class Contact(StructuredModel):
    name: str = ComparableField(
        comparator=LevenshteinComparator(threshold=0.8),
        weight=1.5
    )
```
Key parameters:
| Parameter | Default | Description |
|---|---|---|
| threshold | 0.7 | Similarity threshold for binary classification |
| normalize | True | Strip whitespace and lowercase before comparison |
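The scoring formula for this comparator, 1.0 - (edit_distance / max_length), can be worked through with a small pure-Python sketch. The function names are hypothetical and this is not Stickler's implementation; treating two empty strings as identical is an assumption.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance via dynamic programming (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # drop a character from a
                current[j - 1] + 1,      # add a character to a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str, normalize: bool = True) -> float:
    """Normalized similarity: 1.0 - (edit_distance / max_length)."""
    if normalize:
        a, b = a.strip().lower(), b.strip().lower()
    max_length = max(len(a), len(b))
    if max_length == 0:
        return 1.0  # assumption: two empty strings count as identical
    return 1.0 - levenshtein_distance(a, b) / max_length


# One missing letter in a 10-character name: 1 edit / 10 characters.
print(round(levenshtein_similarity("Jon Smith", "John Smith"), 2))
```

With a threshold of 0.8, as in the Contact example, a single typo in a short name still counts as a match, while several edits in the same name would not.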
NumericComparator
Extracts numeric values from strings or numbers and compares them with configurable tolerance. Handles currency symbols, commas, and accounting notation (e.g., (123) for negative values). Returns 1.0 if the numbers match within tolerance, 0.0 otherwise.
When to use: Prices, quantities, measurements, or any numeric field where small differences are acceptable.
```python
from stickler import StructuredModel, ComparableField
from stickler.comparators import NumericComparator


class Invoice(StructuredModel):
    amount: float = ComparableField(
        comparator=NumericComparator(relative_tolerance=0.05),
        weight=2.0
    )
```
Key parameters:
| Parameter | Default | Description |
|---|---|---|
| threshold | 1.0 | Similarity threshold for binary classification |
| relative_tolerance | 0.0 | Relative tolerance (e.g., 0.1 = 10%) |
| absolute_tolerance | 0.0 | Absolute tolerance (e.g., 0.01 for cents) |
| tolerance | None | Alias for absolute_tolerance (backward compatibility) |
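The extraction and tolerance check described in this section can be illustrated with a short standalone sketch. The helper names are hypothetical, the parsing rules are simplified, and combining the two tolerances with `math.isclose` (a match if either tolerance is satisfied) is an assumption rather than a statement about how NumericComparator combines them.

```python
import math
import re


def parse_number(value):
    """Extract a float from a number or a string such as "$1,234.50" or "(123)"."""
    if isinstance(value, (int, float)):
        return float(value)
    text = str(value).strip()
    # Accounting notation: a value wrapped in parentheses is negative.
    negative = text.startswith("(") and text.endswith(")")
    cleaned = re.sub(r"[^0-9.\-]", "", text)  # drop currency symbols, commas, etc.
    try:
        number = float(cleaned)
    except ValueError:
        return None  # nothing numeric could be extracted
    return -abs(number) if negative else number


def numeric_similarity(a, b, relative_tolerance=0.0, absolute_tolerance=0.0):
    """Binary score: 1.0 if both values parse and agree within tolerance."""
    x, y = parse_number(a), parse_number(b)
    if x is None or y is None:
        return 0.0
    close = math.isclose(x, y, rel_tol=relative_tolerance, abs_tol=absolute_tolerance)
    return 1.0 if close else 0.0


print(numeric_similarity("$1,234.50", 1234.5))
print(numeric_similarity("(123)", "-123"))
print(numeric_similarity("$100.00", "104", relative_tolerance=0.05))
print(numeric_similarity("100", "110", relative_tolerance=0.05))
```

With both tolerances left at 0.0, only numerically equal values match, so formatting differences are forgiven but any difference in value is not.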
FuzzyComparator
Uses the rapidfuzz library for advanced fuzzy string matching. Supports multiple matching methods including standard ratio, partial matching, and token-based matching that is order-independent.
When to use: Descriptions, product names, or text where word order may vary or partial matches are valuable.
```python
from stickler import StructuredModel, ComparableField
from stickler.comparators import FuzzyComparator


class Product(StructuredModel):
    description: str = ComparableField(
        comparator=FuzzyComparator(method="token_sort_ratio"),
        threshold=0.7
    )
```
Key parameters:
| Parameter | Default | Description |
|---|---|---|
| threshold | 0.7 | Similarity threshold for binary classification |
| method | "ratio" | Matching method: "ratio", "partial_ratio", "token_sort_ratio", or "token_set_ratio" |
| normalize | True | Strip whitespace and lowercase before comparison |
Matching methods explained:
- `ratio` -- Standard Levenshtein distance ratio (similar to `LevenshteinComparator` but using rapidfuzz's optimized implementation).
- `partial_ratio` -- Finds the best partial match within the longer string. Good when one value is a substring of the other.
- `token_sort_ratio` -- Splits strings into tokens, sorts them, then compares. Handles reordered words (e.g., "John Smith" vs "Smith John").
- `token_set_ratio` -- Splits into token sets, comparing the intersection and remainder. Handles extra or missing words.
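The difference between a plain ratio and a token-sorted ratio is easy to see in a small sketch. To stay dependency-free, this example substitutes the standard library's `difflib.SequenceMatcher` for rapidfuzz, so the absolute scores will not match rapidfuzz's; only the effect of sorting tokens is being illustrated, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def _normalize(text: str) -> str:
    """Strip surrounding whitespace and lowercase."""
    return text.strip().lower()


def plain_ratio(a: str, b: str) -> float:
    """Character-level similarity in [0.0, 1.0] (difflib stand-in, not rapidfuzz)."""
    return SequenceMatcher(None, _normalize(a), _normalize(b)).ratio()


def token_sort_ratio(a: str, b: str) -> float:
    """Sort whitespace-separated tokens first, then compare the rejoined strings."""
    sorted_a = " ".join(sorted(_normalize(a).split()))
    sorted_b = " ".join(sorted(_normalize(b).split()))
    return SequenceMatcher(None, sorted_a, sorted_b).ratio()


# Reordered words hurt the plain ratio but not the token-sorted one.
print(plain_ratio("John Smith", "Smith John"))
print(token_sort_ratio("John Smith", "Smith John"))
```

After sorting, both inputs become the same string, so the token-sorted score is a perfect match even though the raw character sequences differ.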
Dependency
FuzzyComparator requires the rapidfuzz package. Install it with: pip install rapidfuzz
SemanticComparator
Uses AWS Bedrock Titan embeddings to generate vector representations of text, then computes cosine similarity. Captures meaning rather than surface-level string similarity.
When to use: Text fields where meaning matters more than exact wording. See LLM-as-a-Judge Comparators for a detailed guide.
```python
from stickler import StructuredModel, ComparableField
from stickler.comparators import SemanticComparator


class Review(StructuredModel):
    summary: str = ComparableField(
        comparator=SemanticComparator(threshold=0.8),
        weight=1.0
    )
```
Key parameters:
| Parameter | Default | Description |
|---|---|---|
| threshold | 0.7 | Similarity threshold for binary classification |
| model_id | "amazon.titan-embed-text-v2:0" | Bedrock embedding model ID |
| sim_function | "cosine_similarity" | Similarity function to use |
| embedding_function | None | Optional custom embedding function (bypasses Bedrock) |
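The cosine similarity step can be sketched on its own, independent of how the embeddings are produced. The vectors below are toy values rather than real Titan embeddings, the function names are hypothetical, and clamping the result into the 0.0 to 1.0 range is an assumption about how a raw cosine value would be turned into a score.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_u * norm_v)


def similarity_score(u: Sequence[float], v: Sequence[float]) -> float:
    """Clamp cosine similarity into [0.0, 1.0] (assumed handling of negatives)."""
    return max(0.0, min(1.0, cosine_similarity(u, v)))


# Toy 3-dimensional "embeddings": two similar directions and one unrelated one.
great_product = [0.9, 0.1, 0.0]
excellent_item = [0.8, 0.2, 0.1]
late_delivery = [0.0, 0.1, 0.9]

print(similarity_score(great_product, excellent_item))
print(similarity_score(great_product, late_delivery))
```

Vectors that point in nearly the same direction score close to 1.0 regardless of their length, which is why embeddings of differently worded but equivalent text can still compare as similar.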
BERTComparator
Uses the BERTScore metric (via the evaluate library) to calculate contextual semantic similarity. Returns the F1 score component of BERTScore as the similarity measure. Runs entirely locally -- no API calls required.
When to use: Text fields where you need semantic understanding without cloud dependencies. See LLM-as-a-Judge Comparators for a detailed guide.
```python
from stickler import StructuredModel, ComparableField
from stickler.comparators import BERTComparator


class Document(StructuredModel):
    summary: str = ComparableField(
        comparator=BERTComparator(threshold=0.85),
        weight=1.0
    )
```
Key parameters:
| Parameter | Default | Description |
|---|---|---|
| threshold | 0.7 | Similarity threshold for binary classification |
The default model is distilbert-base-uncased, loaded globally via the evaluate library.
LLMComparator
Uses a Large Language Model (via AWS Bedrock and the strands-agents library) to perform intelligent semantic comparisons. The LLM receives both values and optional evaluation guidelines, then returns a binary equivalence judgment. This is the most flexible comparator but also the most expensive.
When to use: Complex comparisons that require reasoning, domain-specific logic, or understanding of abbreviations and conventions. See LLM-as-a-Judge Comparators for a detailed guide.
```python
from stickler import StructuredModel, ComparableField
from stickler.comparators import LLMComparator


class Address(StructuredModel):
    street: str = ComparableField(
        comparator=LLMComparator(
            model="us.amazon.nova-lite-v1:0",
            eval_guidelines="Consider street abbreviations equivalent (St=Street, Ave=Avenue)"
        ),
        threshold=0.8
    )
```
Key parameters:
| Parameter | Default | Description |
|---|---|---|
| model | Required | Bedrock model ID string or a strands.models.Model instance |
| eval_guidelines | None | Custom guidelines for the LLM to follow during comparison |
Dependency
LLMComparator requires the strands-agents package. Install it with: pip install stickler-eval[llm]
Default Comparators by Type
When you do not specify a comparator in ComparableField, Stickler assigns one based on the JSON schema type of the field:
| JSON Schema Type | Default Comparator | Default Threshold | Rationale |
|---|---|---|---|
| string | LevenshteinComparator | 0.5 | Handles typos and minor variations |
| number | NumericComparator | 0.5 | Tolerates small numeric differences |
| integer | NumericComparator | 0.5 | Tolerates small numeric differences |
| boolean | ExactComparator | 1.0 | Must be exactly true or false |
| array (primitives) | Based on item type | Based on item type | Inherits from element type |
| array (objects) | Hungarian matching | 0.7 | Optimal pairing of list elements |
| object | Recursive comparison | 0.7 | Field-by-field nested comparison |
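The table above can be read as a lookup with one recursive case for arrays. The sketch below encodes it as plain data; the function `default_for` and the use of strings for comparator names are hypothetical and only mirror the table, not Stickler's internal resolution code.

```python
from typing import Dict, Tuple

# (default comparator or strategy, default threshold), copied from the table above.
DEFAULTS: Dict[str, Tuple[str, float]] = {
    "string": ("LevenshteinComparator", 0.5),
    "number": ("NumericComparator", 0.5),
    "integer": ("NumericComparator", 0.5),
    "boolean": ("ExactComparator", 1.0),
    "object": ("Recursive comparison", 0.7),
}


def default_for(schema: dict) -> Tuple[str, float]:
    """Resolve the default comparison strategy for a JSON schema fragment."""
    schema_type = schema["type"]
    if schema_type == "array":
        items = schema.get("items", {})
        if items.get("type") == "object":
            return ("Hungarian matching", 0.7)
        # Arrays of primitives inherit the default of their element type.
        return default_for(items)
    return DEFAULTS[schema_type]


print(default_for({"type": "string"}))
print(default_for({"type": "array", "items": {"type": "integer"}}))
print(default_for({"type": "array", "items": {"type": "object"}}))
```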
Custom Comparators
You can create your own comparator by extending BaseComparator. The only requirement is implementing the compare method, which takes two values and returns a float between 0.0 and 1.0.
The BaseComparator Interface
```python
# Interface defined in stickler.comparators.base, reproduced here for reference.
from abc import ABC, abstractmethod
from typing import Any


class BaseComparator(ABC):
    def __init__(self, threshold: float = 0.7):
        self.threshold = threshold

    @abstractmethod
    def compare(self, str1: Any, str2: Any) -> float:
        """Compare two values and return a similarity score.

        Args:
            str1: First value
            str2: Second value

        Returns:
            Similarity score between 0.0 and 1.0
        """
        pass
```
BaseComparator also provides:
- `__call__` -- makes the comparator callable directly (delegates to `compare`).
- `binary_compare` -- converts the continuous similarity score to a `(tp, fp)` tuple based on the threshold.
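A rough picture of how these two helpers could sit on top of `compare` is sketched below with a stand-in base class, so it runs without Stickler installed. The class names are hypothetical, and the details are assumptions: that a score at or above the threshold counts as a true positive, and that the tuple holds integer counts.

```python
from abc import ABC, abstractmethod
from typing import Any, Tuple


class SketchComparator(ABC):
    """Stand-in for BaseComparator, for illustration only."""

    def __init__(self, threshold: float = 0.7):
        self.threshold = threshold

    @abstractmethod
    def compare(self, str1: Any, str2: Any) -> float:
        """Return a similarity score between 0.0 and 1.0."""

    def __call__(self, str1: Any, str2: Any) -> float:
        # Calling the comparator is the same as calling compare().
        return self.compare(str1, str2)

    def binary_compare(self, str1: Any, str2: Any) -> Tuple[int, int]:
        # Assumed rule: score >= threshold is a true positive, else a false positive.
        score = self.compare(str1, str2)
        return (1, 0) if score >= self.threshold else (0, 1)


class LengthRatioComparator(SketchComparator):
    """Toy comparator: ratio of the shorter length to the longer length."""

    def compare(self, str1: Any, str2: Any) -> float:
        a, b = len(str(str1)), len(str(str2))
        if max(a, b) == 0:
            return 1.0
        return min(a, b) / max(a, b)


comparator = LengthRatioComparator(threshold=0.5)
print(comparator("abcd", "ab"))                 # continuous score via __call__
print(comparator.binary_compare("abcd", "ab"))  # thresholded (tp, fp)
print(comparator.binary_compare("abcd", "a"))
```

A subclass only has to supply `compare`; the callable form and the thresholded form are inherited.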
Example: Custom RegexComparator
```python
import re
from typing import Any

from stickler.comparators.base import BaseComparator


class RegexComparator(BaseComparator):
    """Comparator that checks if a value matches a reference regex pattern."""

    def __init__(self, threshold: float = 1.0):
        super().__init__(threshold=threshold)

    def compare(self, pattern: Any, value: Any) -> float:
        if pattern is None or value is None:
            return 0.0
        try:
            return 1.0 if re.fullmatch(str(pattern), str(value)) else 0.0
        except re.error:
            return 0.0
```
Use it like any built-in comparator:
```python
from stickler import StructuredModel, ComparableField


class PhoneRecord(StructuredModel):
    phone: str = ComparableField(
        comparator=RegexComparator(),
        threshold=1.0
    )
```
Next Steps
For a deep dive into the three AI-powered comparators (SemanticComparator, BERTComparator, and LLMComparator), see LLM-as-a-Judge Comparators.