
Comparators

stickler.comparators

Common comparators for key information evaluation.

This package contains comparators that are shared between the traditional and ANLS Star evaluation systems. These comparators implement a unified interface that works with both systems.

stickler.comparators.BaseComparator

Bases: ABC

Base class for all comparators.

This class defines the interface that all comparators must implement. Comparators are used to compare two values and return a similarity score between 0.0 and 1.0, where 1.0 means the values are identical.

Source code in stickler/comparators/base.py
class BaseComparator(ABC):
    """Base class for all comparators.

    This class defines the interface that all comparators must implement.
    Comparators are used to compare two values and return a similarity score
    between 0.0 and 1.0, where 1.0 means the values are identical.
    """

    def __init__(self, threshold: float = 0.7):
        """Initialize the comparator.

        Args:
            threshold: Similarity threshold (0.0-1.0)
        """
        self.threshold = threshold

    @abstractmethod
    def compare(self, str1: Any, str2: Any) -> float:
        """Compare two values and return a similarity score.

        Args:
            str1: First value
            str2: Second value

        Returns:
            Similarity score between 0.0 and 1.0
        """
        pass

    def __call__(self, str1: Any, str2: Any) -> float:
        """Make the comparator callable.

        Args:
            str1: First value
            str2: Second value

        Returns:
            Similarity score between 0.0 and 1.0
        """
        return self.compare(str1, str2)

    def binary_compare(self, str1: Any, str2: Any) -> Tuple[int, int]:
        """Compare two values and return a binary result as (tp, fp) tuple.

        This method converts the continuous similarity score to a binary decision
        based on the threshold. If the similarity is greater than or equal to the
        threshold, it returns (1, 0) indicating true positive. Otherwise, it returns
        (0, 1) indicating false positive.

        Args:
            str1: First value
            str2: Second value

        Returns:
            Tuple of (tp, fp) where tp is 1 if similar, 0 otherwise,
            and fp is the opposite
        """
        score = self.compare(str1, str2)
        if score >= self.threshold:
            return (1, 0)  # True positive
        else:
            return (0, 1)  # False positive

    def __str__(self) -> str:
        """String representation for serialization."""
        return self.__class__.__name__

    def __repr__(self) -> str:
        """Detailed string representation."""
        return f"{self.__class__.__name__}(threshold={self.threshold})"
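The interface above is easiest to see with a concrete subclass. The sketch below re-creates the `BaseComparator` contract in a self-contained form (so it runs without `stickler` installed) and adds a hypothetical `PrefixComparator` to show how `compare`, `__call__`, and `binary_compare` interact; the subclass and its scoring rule are illustrative, not part of the library.

```python
from abc import ABC, abstractmethod
from typing import Any, Tuple

class BaseComparator(ABC):
    """Minimal re-creation of the interface documented above."""

    def __init__(self, threshold: float = 0.7):
        self.threshold = threshold

    @abstractmethod
    def compare(self, str1: Any, str2: Any) -> float:
        """Return a similarity score between 0.0 and 1.0."""

    def __call__(self, str1: Any, str2: Any) -> float:
        return self.compare(str1, str2)

    def binary_compare(self, str1: Any, str2: Any) -> Tuple[int, int]:
        # Threshold the continuous score into a (tp, fp) pair.
        return (1, 0) if self.compare(str1, str2) >= self.threshold else (0, 1)

class PrefixComparator(BaseComparator):
    """Toy subclass: 1.0 when one value is a prefix of the other, else 0.0."""

    def compare(self, str1: Any, str2: Any) -> float:
        a, b = str(str1), str(str2)
        return 1.0 if a.startswith(b) or b.startswith(a) else 0.0

c = PrefixComparator(threshold=0.5)
print(c("inv-001", "inv-001-final"))       # 1.0, via __call__
print(c.binary_compare("inv-001", "receipt"))  # (0, 1)
```

Only `compare` must be implemented; `__call__` and `binary_compare` come for free from the base class.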

__call__(str1, str2)

Make the comparator callable.

Parameters:

Name Type Description Default
str1 Any

First value

required
str2 Any

Second value

required

Returns:

Type Description
float

Similarity score between 0.0 and 1.0

Source code in stickler/comparators/base.py
def __call__(self, str1: Any, str2: Any) -> float:
    """Make the comparator callable.

    Args:
        str1: First value
        str2: Second value

    Returns:
        Similarity score between 0.0 and 1.0
    """
    return self.compare(str1, str2)

__init__(threshold=0.7)

Initialize the comparator.

Parameters:

Name Type Description Default
threshold float

Similarity threshold (0.0-1.0)

0.7
Source code in stickler/comparators/base.py
def __init__(self, threshold: float = 0.7):
    """Initialize the comparator.

    Args:
        threshold: Similarity threshold (0.0-1.0)
    """
    self.threshold = threshold

__repr__()

Detailed string representation.

Source code in stickler/comparators/base.py
def __repr__(self) -> str:
    """Detailed string representation."""
    return f"{self.__class__.__name__}(threshold={self.threshold})"

__str__()

String representation for serialization.

Source code in stickler/comparators/base.py
def __str__(self) -> str:
    """String representation for serialization."""
    return self.__class__.__name__

binary_compare(str1, str2)

Compare two values and return a binary result as (tp, fp) tuple.

This method converts the continuous similarity score to a binary decision based on the threshold. If the similarity is greater than or equal to the threshold, it returns (1, 0) indicating true positive. Otherwise, it returns (0, 1) indicating false positive.

Parameters:

Name Type Description Default
str1 Any

First value

required
str2 Any

Second value

required

Returns:

Type Description
Tuple[int, int]

Tuple of (tp, fp) where tp is 1 if similar, 0 otherwise, and fp is the opposite

Source code in stickler/comparators/base.py
def binary_compare(self, str1: Any, str2: Any) -> Tuple[int, int]:
    """Compare two values and return a binary result as (tp, fp) tuple.

    This method converts the continuous similarity score to a binary decision
    based on the threshold. If the similarity is greater than or equal to the
    threshold, it returns (1, 0) indicating true positive. Otherwise, it returns
    (0, 1) indicating false positive.

    Args:
        str1: First value
        str2: Second value

    Returns:
        Tuple of (tp, fp) where tp is 1 if similar, 0 otherwise,
        and fp is the opposite
    """
    score = self.compare(str1, str2)
    if score >= self.threshold:
        return (1, 0)  # True positive
    else:
        return (0, 1)  # False positive

compare(str1, str2) abstractmethod

Compare two values and return a similarity score.

Parameters:

Name Type Description Default
str1 Any

First value

required
str2 Any

Second value

required

Returns:

Type Description
float

Similarity score between 0.0 and 1.0

Source code in stickler/comparators/base.py
@abstractmethod
def compare(self, str1: Any, str2: Any) -> float:
    """Compare two values and return a similarity score.

    Args:
        str1: First value
        str2: Second value

    Returns:
        Similarity score between 0.0 and 1.0
    """
    pass

stickler.comparators.ExactComparator

Bases: BaseComparator

Comparator that checks for exact string matching.

This comparator removes whitespace and punctuation before comparison. It returns 1.0 for exact matches and 0.0 otherwise.

Example
comparator = ExactComparator()

# Returns 1.0 (exact match after normalization)
comparator.compare("hello, world!", "hello world")

# Returns 0.0 (different strings)
comparator.compare("hello", "goodbye")
Source code in stickler/comparators/exact.py
class ExactComparator(BaseComparator):
    """Comparator that checks for exact string matching.

    This comparator removes whitespace and punctuation before comparison.
    It returns 1.0 for exact matches and 0.0 otherwise.

    Example:
        ```python
        comparator = ExactComparator()

        # Returns 1.0 (exact match after normalization)
        comparator.compare("hello, world!", "hello world")

        # Returns 0.0 (different strings)
        comparator.compare("hello", "goodbye")
        ```
    """

    def __init__(self, threshold: float = 1.0, case_sensitive: bool = False):
        """Initialize the comparator.

        Args:
            threshold: Similarity threshold (default 1.0)
            case_sensitive: Whether comparison is case sensitive (default False)
        """
        super().__init__(threshold=threshold)
        self.case_sensitive = case_sensitive

    def compare(self, str1: Any, str2: Any) -> float:
        """Compare two values with exact string matching.

        Args:
            str1: First value
            str2: Second value

        Returns:
            1.0 if the strings match exactly after normalization, 0.0 otherwise
        """
        if str1 is None and str2 is None:
            return 1.0
        if str1 is None or str2 is None:
            return 0.0

        # Convert to strings if they aren't already
        str1 = str(str1)
        str2 = str(str2)

        # Apply case normalization if needed
        if not self.case_sensitive:
            str1 = lowercase(str1)
            str2 = lowercase(str2)

        # Remove whitespace and punctuation
        normalized1 = strip_punctuation_space(str1)
        normalized2 = strip_punctuation_space(str2)

        # Compare normalized strings
        return 1.0 if normalized1 == normalized2 else 0.0
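The normalization pipeline above depends on stickler's `lowercase` and `strip_punctuation_space` helpers. The sketch below reproduces the same behavior with stdlib stand-ins (the helper implementation here is an assumption about what those utilities do, inferred from the docstring), so the `ExactComparator` examples can be checked directly:

```python
import string

def strip_punctuation_space(s: str) -> str:
    # Stand-in for stickler's helper: drop all whitespace and punctuation.
    return "".join(ch for ch in s if ch not in string.punctuation and not ch.isspace())

def exact_compare(str1, str2, case_sensitive: bool = False) -> float:
    """Self-contained sketch of ExactComparator.compare."""
    if str1 is None and str2 is None:
        return 1.0
    if str1 is None or str2 is None:
        return 0.0
    s1, s2 = str(str1), str(str2)
    if not case_sensitive:
        s1, s2 = s1.lower(), s2.lower()
    return 1.0 if strip_punctuation_space(s1) == strip_punctuation_space(s2) else 0.0

print(exact_compare("hello, world!", "Hello World"))  # 1.0 ("helloworld" == "helloworld")
print(exact_compare("hello", "goodbye"))              # 0.0
```

Because punctuation and whitespace are stripped before comparison, formatting noise like trailing commas or extra spaces does not break an otherwise exact match.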

__init__(threshold=1.0, case_sensitive=False)

Initialize the comparator.

Parameters:

Name Type Description Default
threshold float

Similarity threshold (default 1.0)

1.0
case_sensitive bool

Whether comparison is case sensitive (default False)

False
Source code in stickler/comparators/exact.py
def __init__(self, threshold: float = 1.0, case_sensitive: bool = False):
    """Initialize the comparator.

    Args:
        threshold: Similarity threshold (default 1.0)
        case_sensitive: Whether comparison is case sensitive (default False)
    """
    super().__init__(threshold=threshold)
    self.case_sensitive = case_sensitive

compare(str1, str2)

Compare two values with exact string matching.

Parameters:

Name Type Description Default
str1 Any

First value

required
str2 Any

Second value

required

Returns:

Type Description
float

1.0 if the strings match exactly after normalization, 0.0 otherwise

Source code in stickler/comparators/exact.py
def compare(self, str1: Any, str2: Any) -> float:
    """Compare two values with exact string matching.

    Args:
        str1: First value
        str2: Second value

    Returns:
        1.0 if the strings match exactly after normalization, 0.0 otherwise
    """
    if str1 is None and str2 is None:
        return 1.0
    if str1 is None or str2 is None:
        return 0.0

    # Convert to strings if they aren't already
    str1 = str(str1)
    str2 = str(str2)

    # Apply case normalization if needed
    if not self.case_sensitive:
        str1 = lowercase(str1)
        str2 = lowercase(str2)

    # Remove whitespace and punctuation
    normalized1 = strip_punctuation_space(str1)
    normalized2 = strip_punctuation_space(str2)

    # Compare normalized strings
    return 1.0 if normalized1 == normalized2 else 0.0

stickler.comparators.LevenshteinComparator

Bases: BaseComparator

Comparator using Levenshtein distance for string similarity.

This class implements the Levenshtein distance algorithm for measuring the difference between two strings. It calculates a normalized similarity score between 0 and 1.

Source code in stickler/comparators/levenshtein.py
class LevenshteinComparator(BaseComparator):
    """Comparator using Levenshtein distance for string similarity.

    This class implements the Levenshtein distance algorithm for measuring
    the difference between two strings. It calculates a normalized similarity
    score between 0 and 1.
    """

    def __init__(self, normalize: bool = True, threshold: float = 0.7):
        """Initialize the comparator.

        Args:
            normalize: Whether to normalize input strings
                      (strip whitespace, lowercase) before comparison
            threshold: Similarity threshold (default 0.7)
        """
        super().__init__(threshold=threshold)
        self._normalize = normalize

    @property
    def name(self) -> str:
        """Return the name of the comparator."""
        return "levenshtein"

    @property
    def config(self) -> Optional[Dict[str, Any]]:
        """Return configuration parameters."""
        return {"normalize": self._normalize}

    def compare(self, s1: Any, s2: Any) -> float:
        """
        Compare two strings using Levenshtein distance.

        Args:
            s1: First string or value
            s2: Second string or value

        Returns:
            Similarity score between 0.0 and 1.0, with 1.0 indicating identical

        Raises:
            TypeError: If either input is a dictionary, as dictionaries are not suitable
                      for Levenshtein distance comparison and should be handled through
                      structured models instead.
        """
        # Reject dictionaries - they should be broken down into proper StructuredModel subclasses
        if isinstance(s1, dict) or isinstance(s2, dict):
            raise TypeError(
                "Dictionary objects cannot be compared using LevenshteinComparator. "
                "Use a StructuredModel subclass with properly defined fields instead."
            )

        # Convert to strings and handle None values
        s1 = "" if s1 is None else str(s1)
        s2 = "" if s2 is None else str(s2)

        # Normalize strings if enabled
        if self._normalize:
            s1 = " ".join(s1.strip().lower().split())
            s2 = " ".join(s2.strip().lower().split())

        # Handle empty strings
        if not s1 and not s2:
            return 1.0

        # Calculate Levenshtein distance
        dist = self._levenshtein_distance(s1, s2)
        str_length = max(len(s1), len(s2))

        if str_length == 0:
            return 1.0

        # Convert distance to similarity (1.0 - normalized_distance)
        return 1.0 - (float(dist) / float(str_length))

    @staticmethod
    def _levenshtein_distance(s1: str, s2: str) -> int:
        """
        Calculate the Levenshtein distance between two strings.

        Args:
            s1: First string
            s2: Second string

        Returns:
            The Levenshtein distance as an integer
        """
        if len(s1) > len(s2):
            s1, s2 = s2, s1

        distances = range(len(s1) + 1)
        for i2, c2 in enumerate(s2):
            distances_ = [i2 + 1]
            for i1, c1 in enumerate(s1):
                if c1 == c2:
                    distances_.append(distances[i1])
                else:
                    distances_.append(
                        1 + min((distances[i1], distances[i1 + 1], distances_[-1]))
                    )
            distances = distances_
        return distances[-1]
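To make the distance-to-similarity conversion concrete, here is a runnable, self-contained version of the single-row dynamic-programming algorithm shown above, together with the normalization step `1.0 - dist / max(len(s1), len(s2))`:

```python
def levenshtein_distance(s1: str, s2: str) -> int:
    # Single-row DP over the shorter string, as in the source above.
    if len(s1) > len(s2):
        s1, s2 = s2, s1
    distances = list(range(len(s1) + 1))
    for i2, c2 in enumerate(s2):
        row = [i2 + 1]
        for i1, c1 in enumerate(s1):
            if c1 == c2:
                row.append(distances[i1])          # no edit needed
            else:
                # 1 + min(substitution, deletion, insertion)
                row.append(1 + min(distances[i1], distances[i1 + 1], row[-1]))
        distances = row
    return distances[-1]

def similarity(s1: str, s2: str) -> float:
    """Normalized similarity: 1.0 means identical, 0.0 means maximally different."""
    if not s1 and not s2:
        return 1.0
    return 1.0 - levenshtein_distance(s1, s2) / max(len(s1), len(s2))

print(levenshtein_distance("kitten", "sitting"))   # 3
print(round(similarity("kitten", "sitting"), 3))   # 0.571
```

"kitten" to "sitting" needs three edits (two substitutions, one insertion), and dividing by the longer length (7) gives a similarity of about 0.571, below the default 0.7 threshold.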

config property

Return configuration parameters.

name property

Return the name of the comparator.

__init__(normalize=True, threshold=0.7)

Initialize the comparator.

Parameters:

Name Type Description Default
normalize bool

Whether to normalize input strings (strip whitespace, lowercase) before comparison

True
threshold float

Similarity threshold (default 0.7)

0.7
Source code in stickler/comparators/levenshtein.py
def __init__(self, normalize: bool = True, threshold: float = 0.7):
    """Initialize the comparator.

    Args:
        normalize: Whether to normalize input strings
                  (strip whitespace, lowercase) before comparison
        threshold: Similarity threshold (default 0.7)
    """
    super().__init__(threshold=threshold)
    self._normalize = normalize

compare(s1, s2)

Compare two strings using Levenshtein distance.

Parameters:

Name Type Description Default
s1 Any

First string or value

required
s2 Any

Second string or value

required

Returns:

Type Description
float

Similarity score between 0.0 and 1.0, with 1.0 indicating identical

Raises:

Type Description
TypeError

If either input is a dictionary, as dictionaries are not suitable for Levenshtein distance comparison and should be handled through structured models instead.

Source code in stickler/comparators/levenshtein.py
def compare(self, s1: Any, s2: Any) -> float:
    """
    Compare two strings using Levenshtein distance.

    Args:
        s1: First string or value
        s2: Second string or value

    Returns:
        Similarity score between 0.0 and 1.0, with 1.0 indicating identical

    Raises:
        TypeError: If either input is a dictionary, as dictionaries are not suitable
                  for Levenshtein distance comparison and should be handled through
                  structured models instead.
    """
    # Reject dictionaries - they should be broken down into proper StructuredModel subclasses
    if isinstance(s1, dict) or isinstance(s2, dict):
        raise TypeError(
            "Dictionary objects cannot be compared using LevenshteinComparator. "
            "Use a StructuredModel subclass with properly defined fields instead."
        )

    # Convert to strings and handle None values
    s1 = "" if s1 is None else str(s1)
    s2 = "" if s2 is None else str(s2)

    # Normalize strings if enabled
    if self._normalize:
        s1 = " ".join(s1.strip().lower().split())
        s2 = " ".join(s2.strip().lower().split())

    # Handle empty strings
    if not s1 and not s2:
        return 1.0

    # Calculate Levenshtein distance
    dist = self._levenshtein_distance(s1, s2)
    str_length = max(len(s1), len(s2))

    if str_length == 0:
        return 1.0

    # Convert distance to similarity (1.0 - normalized_distance)
    return 1.0 - (float(dist) / float(str_length))

stickler.comparators.NumericComparator

Bases: BaseComparator

Comparator for numeric values with configurable tolerance.

This comparator extracts and compares numeric values from strings or numbers. It supports relative and absolute tolerance for comparison.

Example
# Default exact matching
exact = NumericComparator()
exact.compare("123", "123.0")  # Returns 1.0
exact.compare("123", "124")    # Returns 0.0

# With tolerance
approx = NumericComparator(relative_tolerance=0.1)  # 10% tolerance
approx.compare("100", "109")   # Returns 1.0 (within 10%)
approx.compare("100", "111")   # Returns 0.0 (beyond 10%)
Source code in stickler/comparators/numeric.py
class NumericComparator(BaseComparator):
    """Comparator for numeric values with configurable tolerance.

    This comparator extracts and compares numeric values from strings or numbers.
    It supports relative and absolute tolerance for comparison.

    Example:
        ```python
        # Default exact matching
        exact = NumericComparator()
        exact.compare("123", "123.0")  # Returns 1.0
        exact.compare("123", "124")    # Returns 0.0

        # With tolerance
        approx = NumericComparator(relative_tolerance=0.1)  # 10% tolerance
        approx.compare("100", "109")   # Returns 1.0 (within 10%)
        approx.compare("100", "111")   # Returns 0.0 (beyond 10%)
        ```
    """

    def __init__(
        self,
        threshold: float = 1.0,
        relative_tolerance: float = 0.0,
        absolute_tolerance: float = 0.0,
        tolerance: Optional[float] = None,
    ):
        """Initialize the comparator.

        Args:
            threshold: Similarity threshold (default 1.0)
            relative_tolerance: Relative tolerance for comparison (default 0.0)
            absolute_tolerance: Absolute tolerance for comparison (default 0.0)
            tolerance: Alias for absolute_tolerance (for backward compatibility)
        """
        super().__init__(threshold=threshold)
        self.relative_tolerance = relative_tolerance

        # Handle tolerance alias for backward compatibility
        if tolerance is not None:
            if absolute_tolerance != 0.0:
                raise ValueError(
                    "Cannot specify both 'tolerance' and 'absolute_tolerance'. Use 'absolute_tolerance'."
                )
            self.absolute_tolerance = tolerance
        else:
            self.absolute_tolerance = absolute_tolerance

    @property
    def config(self) -> Optional[Dict[str, Any]]:
        """Return configuration parameters for serialization."""
        config = {}
        if self.relative_tolerance != 0.0:
            config["relative_tolerance"] = self.relative_tolerance
        if self.absolute_tolerance != 0.0:
            config["absolute_tolerance"] = self.absolute_tolerance
        return config or None

    def compare(self, str1: Any, str2: Any) -> float:
        """Compare two values numerically.

        Args:
            str1: First value
            str2: Second value

        Returns:
            1.0 if the numbers match within tolerance, 0.0 otherwise
        """
        if str1 is None and str2 is None:
            return 1.0
        if str1 is None or str2 is None:
            return 0.0

        # Extract numeric values
        num1 = self._extract_number(str1)
        num2 = self._extract_number(str2)

        if num1 is None or num2 is None:
            return 0.0

        # Check equality with tolerance
        if self._numbers_equal(num1, num2):
            return 1.0

        return 0.0

    def _extract_number(self, value: Any) -> Union[Decimal, None]:
        """Extract a numeric value from a string or number.

        Args:
            value: Value to extract a number from

        Returns:
            Decimal value or None if no valid number could be extracted
        """
        if isinstance(value, (int, float)):
            return Decimal(str(value))

        if not isinstance(value, str):
            value = str(value)

        # Check for accounting notation: (123) means -123
        is_negative = False
        if value.startswith("(") and value.endswith(")"):
            value = value[1:-1]  # Remove the parentheses
            is_negative = True

        # Remove common currency symbols and other non-numeric characters
        value = re.sub(r"[^0-9.-]", "", value)

        # Handle empty string
        if not value:
            return None

        # Try to convert to Decimal
        try:
            decimal_value = Decimal(value)
            # Apply negative sign if accounting notation was used
            if is_negative:
                decimal_value = -decimal_value
            return decimal_value
        except InvalidOperation:
            return None

    def _numbers_equal(self, num1: Decimal, num2: Decimal) -> bool:
        """Check if two numbers are equal within tolerance.

        Args:
            num1: First number
            num2: Second number

        Returns:
            True if numbers are equal within tolerance, False otherwise
        """
        if num1 == num2:
            return True

        # Check with relative tolerance
        if self.relative_tolerance > 0:
            # Handle zero case
            if num1 == 0:
                return abs(num2) <= self.relative_tolerance

            # Calculate relative difference using num1 as base
            relative_diff = abs(num1 - num2) / abs(num1)
            if relative_diff <= self.relative_tolerance:
                return True

        # Check with absolute tolerance
        if self.absolute_tolerance > 0:
            if abs(num1 - num2) <= self.absolute_tolerance:
                return True

        return False
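The two private helpers above do the interesting work: extraction handles currency symbols, thousands separators, and accounting notation, while the equality check applies tolerances. A self-contained sketch of both (function names here are standalone stand-ins for the private methods):

```python
import re
from decimal import Decimal, InvalidOperation

def extract_number(value):
    """Mirror of _extract_number: handles accounting notation and currency noise."""
    if isinstance(value, (int, float)):
        return Decimal(str(value))
    s = str(value)
    # Accounting notation: (123) means -123.
    negative = s.startswith("(") and s.endswith(")")
    if negative:
        s = s[1:-1]
    s = re.sub(r"[^0-9.-]", "", s)  # drop $, commas, units, etc.
    if not s:
        return None
    try:
        n = Decimal(s)
    except InvalidOperation:
        return None
    return -n if negative else n

def numbers_equal(n1, n2, relative_tolerance=Decimal("0"), absolute_tolerance=Decimal("0")):
    """Mirror of _numbers_equal: relative difference is taken against n1."""
    if n1 == n2:
        return True
    if relative_tolerance > 0 and n1 != 0:
        if abs(n1 - n2) / abs(n1) <= relative_tolerance:
            return True
    return absolute_tolerance > 0 and abs(n1 - n2) <= absolute_tolerance

print(extract_number("($1,234.50)"))  # -1234.50
print(numbers_equal(Decimal("100"), Decimal("109"),
                    relative_tolerance=Decimal("0.1")))  # True (9% difference)
```

Note that the relative difference is computed against the first number, so `numbers_equal(a, b, ...)` and `numbers_equal(b, a, ...)` can disagree near the tolerance boundary.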

config property

Return configuration parameters for serialization.

__init__(threshold=1.0, relative_tolerance=0.0, absolute_tolerance=0.0, tolerance=None)

Initialize the comparator.

Parameters:

Name Type Description Default
threshold float

Similarity threshold (default 1.0)

1.0
relative_tolerance float

Relative tolerance for comparison (default 0.0)

0.0
absolute_tolerance float

Absolute tolerance for comparison (default 0.0)

0.0
tolerance Optional[float]

Alias for absolute_tolerance (for backward compatibility)

None
Source code in stickler/comparators/numeric.py
def __init__(
    self,
    threshold: float = 1.0,
    relative_tolerance: float = 0.0,
    absolute_tolerance: float = 0.0,
    tolerance: Optional[float] = None,
):
    """Initialize the comparator.

    Args:
        threshold: Similarity threshold (default 1.0)
        relative_tolerance: Relative tolerance for comparison (default 0.0)
        absolute_tolerance: Absolute tolerance for comparison (default 0.0)
        tolerance: Alias for absolute_tolerance (for backward compatibility)
    """
    super().__init__(threshold=threshold)
    self.relative_tolerance = relative_tolerance

    # Handle tolerance alias for backward compatibility
    if tolerance is not None:
        if absolute_tolerance != 0.0:
            raise ValueError(
                "Cannot specify both 'tolerance' and 'absolute_tolerance'. Use 'absolute_tolerance'."
            )
        self.absolute_tolerance = tolerance
    else:
        self.absolute_tolerance = absolute_tolerance

compare(str1, str2)

Compare two values numerically.

Parameters:

Name Type Description Default
str1 Any

First value

required
str2 Any

Second value

required

Returns:

Type Description
float

1.0 if the numbers match within tolerance, 0.0 otherwise

Source code in stickler/comparators/numeric.py
def compare(self, str1: Any, str2: Any) -> float:
    """Compare two values numerically.

    Args:
        str1: First value
        str2: Second value

    Returns:
        1.0 if the numbers match within tolerance, 0.0 otherwise
    """
    if str1 is None and str2 is None:
        return 1.0
    if str1 is None or str2 is None:
        return 0.0

    # Extract numeric values
    num1 = self._extract_number(str1)
    num2 = self._extract_number(str2)

    if num1 is None or num2 is None:
        return 0.0

    # Check equality with tolerance
    if self._numbers_equal(num1, num2):
        return 1.0

    return 0.0

stickler.comparators.NumericExactC = NumericComparator module-attribute

stickler.comparators.FuzzyComparator

Bases: BaseComparator

Comparator for fuzzy string matching.

This comparator uses the rapidfuzz library to calculate similarity between strings using advanced Levenshtein distance calculations. It provides better fuzzy matching than basic Levenshtein for many use cases.

If rapidfuzz is not available, this will raise an ImportError when instantiated.

Source code in stickler/comparators/fuzzy.py
class FuzzyComparator(BaseComparator):
    """Comparator for fuzzy string matching.

    This comparator uses the rapidfuzz library to calculate similarity between
    strings using advanced Levenshtein distance calculations. It provides better
    fuzzy matching than basic Levenshtein for many use cases.

    If rapidfuzz is not available, this will raise an ImportError when instantiated.
    """

    def __init__(
        self, method: str = "ratio", normalize: bool = True, threshold: float = 0.7
    ):
        """Initialize the fuzzy comparator.

        Args:
            method: The fuzzy matching method to use. Options:
                - "ratio": Standard Levenshtein distance ratio
                - "partial_ratio": Partial string matching
                - "token_sort_ratio": Token-based matching with sorting
                - "token_set_ratio": Token-based matching with set operations
            normalize: Whether to normalize input strings before comparison
                      (strip whitespace, lowercase)
            threshold: Similarity threshold (default 0.7)

        Raises:
            ImportError: If rapidfuzz library is not available
        """
        super().__init__(threshold=threshold)

        if not RAPIDFUZZ_AVAILABLE:
            raise ImportError(
                "The rapidfuzz library is required for FuzzyComparator. "
                "Install it with: pip install rapidfuzz"
            )

        self._method = method
        self._normalize = normalize

        # Select the appropriate fuzzy matching function
        self._fuzzy_func = {
            "ratio": fuzz.ratio,
            "partial_ratio": fuzz.partial_ratio,
            "token_sort_ratio": fuzz.token_sort_ratio,
            "token_set_ratio": fuzz.token_set_ratio,
        }.get(method, fuzz.ratio)

    @property
    def name(self) -> str:
        """Return the name of the comparator."""
        return f"fuzzy_{self._method}"

    @property
    def config(self) -> Optional[Dict[str, Any]]:
        """Return configuration parameters."""
        return {"method": self._method, "normalize": self._normalize}

    def compare(self, value1: Any, value2: Any) -> float:
        """Compare two strings using fuzzy matching.

        Args:
            value1: First string or value
            value2: Second string or value

        Returns:
            Similarity score between 0.0 and 1.0
        """
        # Handle None values
        if value1 is None and value2 is None:
            return 1.0
        elif value1 is None or value2 is None:
            return 0.0

        # Convert to strings
        s1 = str(value1)
        s2 = str(value2)

        # Normalize if enabled
        if self._normalize:
            s1 = s1.strip().lower()
            s2 = s2.strip().lower()

        # Calculate fuzzy match score and normalize to 0.0-1.0
        if s1 == "" and s2 == "":
            return 1.0

        # Use the selected fuzzy matching function
        try:
            return self._fuzzy_func(s1, s2) / 100.0
        except Exception:
            # Fall back to basic comparison if fuzzy match fails
            return 1.0 if s1 == s2 else 0.0
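The comparison flow above (None handling, normalization, score scaled from 0-100 down to 0.0-1.0) can be sketched without rapidfuzz by substituting stdlib `difflib.SequenceMatcher` for `fuzz.ratio`. This is a stand-in, not the library's method: difflib's scores differ from rapidfuzz's Levenshtein-based ratio, so treat it only as an illustration of the surrounding plumbing.

```python
from difflib import SequenceMatcher

def fuzzy_ratio(s1: str, s2: str) -> float:
    # Stand-in for rapidfuzz's fuzz.ratio (returns 0-100); scores are
    # similar in spirit but not numerically identical to rapidfuzz.
    return 100.0 * SequenceMatcher(None, s1, s2).ratio()

def fuzzy_compare(value1, value2, normalize: bool = True) -> float:
    """Sketch of FuzzyComparator.compare with a difflib backend."""
    if value1 is None and value2 is None:
        return 1.0
    if value1 is None or value2 is None:
        return 0.0
    s1, s2 = str(value1), str(value2)
    if normalize:
        s1, s2 = s1.strip().lower(), s2.strip().lower()
    if s1 == "" and s2 == "":
        return 1.0
    return fuzzy_ratio(s1, s2) / 100.0  # scale 0-100 down to 0.0-1.0

print(fuzzy_compare("Hello World", "  hello world "))  # 1.0 after normalization
```

With the real library installed, swapping `fuzzy_ratio` for `rapidfuzz.fuzz.ratio` (or one of the token-based variants listed above) restores the documented behavior.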

config property

Return configuration parameters.

name property

Return the name of the comparator.

__init__(method='ratio', normalize=True, threshold=0.7)

Initialize the fuzzy comparator.

Parameters:

Name Type Description Default
method str

The fuzzy matching method to use. Options:
- "ratio": Standard Levenshtein distance ratio
- "partial_ratio": Partial string matching
- "token_sort_ratio": Token-based matching with sorting
- "token_set_ratio": Token-based matching with set operations

'ratio'
normalize bool

Whether to normalize input strings before comparison (strip whitespace, lowercase)

True
threshold float

Similarity threshold (default 0.7)

0.7

Raises:

Type Description
ImportError

If rapidfuzz library is not available

Source code in stickler/comparators/fuzzy.py
def __init__(
    self, method: str = "ratio", normalize: bool = True, threshold: float = 0.7
):
    """Initialize the fuzzy comparator.

    Args:
        method: The fuzzy matching method to use. Options:
            - "ratio": Standard Levenshtein distance ratio
            - "partial_ratio": Partial string matching
            - "token_sort_ratio": Token-based matching with sorting
            - "token_set_ratio": Token-based matching with set operations
        normalize: Whether to normalize input strings before comparison
                  (strip whitespace, lowercase)
        threshold: Similarity threshold (default 0.7)

    Raises:
        ImportError: If rapidfuzz library is not available
    """
    super().__init__(threshold=threshold)

    if not RAPIDFUZZ_AVAILABLE:
        raise ImportError(
            "The rapidfuzz library is required for FuzzyComparator. "
            "Install it with: pip install rapidfuzz"
        )

    self._method = method
    self._normalize = normalize

    # Select the appropriate fuzzy matching function
    self._fuzzy_func = {
        "ratio": fuzz.ratio,
        "partial_ratio": fuzz.partial_ratio,
        "token_sort_ratio": fuzz.token_sort_ratio,
        "token_set_ratio": fuzz.token_set_ratio,
    }.get(method, fuzz.ratio)

compare(value1, value2)

Compare two strings using fuzzy matching.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `value1` | `Any` | First string or value. | *required* |
| `value2` | `Any` | Second string or value. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `float` | Similarity score between 0.0 and 1.0. |

Source code in stickler/comparators/fuzzy.py
def compare(self, value1: Any, value2: Any) -> float:
    """Compare two strings using fuzzy matching.

    Args:
        value1: First string or value
        value2: Second string or value

    Returns:
        Similarity score between 0.0 and 1.0
    """
    # Handle None values
    if value1 is None and value2 is None:
        return 1.0
    elif value1 is None or value2 is None:
        return 0.0

    # Convert to strings
    s1 = str(value1)
    s2 = str(value2)

    # Normalize if enabled
    if self._normalize:
        s1 = s1.strip().lower()
        s2 = s2.strip().lower()

    # Calculate fuzzy match score and normalize to 0.0-1.0
    if s1 == "" and s2 == "":
        return 1.0

    # Use the selected fuzzy matching function
    try:
        return self._fuzzy_func(s1, s2) / 100.0
    except Exception:
        # Fall back to basic comparison if fuzzy match fails
        return 1.0 if s1 == s2 else 0.0
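
FuzzyComparator requires rapidfuzz, but its contract — None handling, optional normalization, scores mapped into 0.0-1.0 — can be sketched with the standard library alone. The function below uses `difflib.SequenceMatcher` as a stand-in scorer (an assumption for illustration; its scores are not identical to rapidfuzz's Levenshtein ratio):

```python
from difflib import SequenceMatcher
from typing import Any


def fuzzy_ratio(value1: Any, value2: Any, normalize: bool = True) -> float:
    """Dependency-free sketch of FuzzyComparator.compare with method="ratio"."""
    # Same None handling as the comparator above
    if value1 is None and value2 is None:
        return 1.0
    if value1 is None or value2 is None:
        return 0.0
    s1, s2 = str(value1), str(value2)
    if normalize:
        s1, s2 = s1.strip().lower(), s2.strip().lower()
    if s1 == "" and s2 == "":
        return 1.0
    # SequenceMatcher.ratio() already returns 0.0-1.0,
    # whereas rapidfuzz returns 0-100 and must be divided by 100
    return SequenceMatcher(None, s1, s2).ratio()


print(fuzzy_ratio("  Hello ", "hello"))  # 1.0 after normalization
```

The key detail the sketch preserves is the scale difference: rapidfuzz scorers return 0-100, which is why the real comparator divides by 100.0.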

stickler.comparators.BERTComparator

Bases: BaseComparator

Comparator that uses BERT embeddings for semantic similarity.

This comparator uses the BERTScore metric to calculate semantic similarity between strings, returning the f1 score as the similarity measure.

Example:

    comparator = BERTComparator(threshold=0.8)

    # Returns similarity score based on semantic similarity
    score = comparator.compare("The cat sat on the mat", "A feline was sitting on a rug")
Source code in stickler/comparators/bert.py
class BERTComparator(BaseComparator):
    """Comparator that uses BERT embeddings for semantic similarity.

    This comparator uses the BERTScore metric to calculate semantic similarity
    between strings, returning the f1 score as the similarity measure.

    Example:
        ```python
        comparator = BERTComparator(threshold=0.8)

        # Returns similarity score based on semantic similarity
        score = comparator.compare("The cat sat on the mat", "A feline was sitting on a rug")
        ```
    """

    def __init__(self, threshold: float = 0.7):
        """Initialize the BERTComparator.

        Args:
            threshold: Similarity threshold (0.0-1.0)
        """
        super().__init__(threshold=threshold)
        if model is None:
            raise ImportError(
                "BERTScore model could not be loaded. Please install 'evaluate' package."
            )

    def compare(self, str1: Any, str2: Any) -> float:
        """Compare two strings using BERT semantic similarity.

        Args:
            str1: First string
            str2: Second string

        Returns:
            Similarity score between 0.0 and 1.0 based on BERTScore f1
        """
        if str1 is None or str2 is None:
            return 0.0

        # Convert to strings if they aren't already
        str1 = str(str1)
        str2 = str(str2)

        # Strip punctuation and whitespace
        str1_clean = strip_punctuation_space(str1)
        str2_clean = strip_punctuation_space(str2)

        # Handle empty strings
        if not str1_clean or not str2_clean:
            return 1.0 if str1_clean == str2_clean else 0.0

        try:
            # Calculate BERT score
            result = model.compute(
                predictions=[str1_clean], references=[str2_clean], lang="en"
            )

            # Return f1 score
            return result["f1"][0]
        except Exception as e:
            # Fallback to direct comparison
            print(f"BERT comparison error: {str(e)}")
            return 1.0 if str1_clean == str2_clean else 0.0

__init__(threshold=0.7)

Initialize the BERTComparator.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `threshold` | `float` | Similarity threshold (0.0-1.0). | `0.7` |
Source code in stickler/comparators/bert.py
def __init__(self, threshold: float = 0.7):
    """Initialize the BERTComparator.

    Args:
        threshold: Similarity threshold (0.0-1.0)
    """
    super().__init__(threshold=threshold)
    if model is None:
        raise ImportError(
            "BERTScore model could not be loaded. Please install 'evaluate' package."
        )

compare(str1, str2)

Compare two strings using BERT semantic similarity.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `str1` | `Any` | First string. | *required* |
| `str2` | `Any` | Second string. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `float` | Similarity score between 0.0 and 1.0 based on BERTScore f1. |

Source code in stickler/comparators/bert.py
def compare(self, str1: Any, str2: Any) -> float:
    """Compare two strings using BERT semantic similarity.

    Args:
        str1: First string
        str2: Second string

    Returns:
        Similarity score between 0.0 and 1.0 based on BERTScore f1
    """
    if str1 is None or str2 is None:
        return 0.0

    # Convert to strings if they aren't already
    str1 = str(str1)
    str2 = str(str2)

    # Strip punctuation and whitespace
    str1_clean = strip_punctuation_space(str1)
    str2_clean = strip_punctuation_space(str2)

    # Handle empty strings
    if not str1_clean or not str2_clean:
        return 1.0 if str1_clean == str2_clean else 0.0

    try:
        # Calculate BERT score
        result = model.compute(
            predictions=[str1_clean], references=[str2_clean], lang="en"
        )

        # Return f1 score
        return result["f1"][0]
    except Exception as e:
        # Fallback to direct comparison
        print(f"BERT comparison error: {str(e)}")
        return 1.0 if str1_clean == str2_clean else 0.0
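
The None and empty-string checks above are resolved before any (costly) BERTScore model call. A minimal sketch of that short-circuit logic, with a plain `string.punctuation` strip standing in for the library-internal `strip_punctuation_space` helper (an assumption about its behavior, not its actual implementation):

```python
import string
from typing import Any, Optional


def strip_punct_space(text: str) -> str:
    # Assumed stand-in for the library's strip_punctuation_space helper:
    # drop punctuation characters, then surrounding whitespace.
    return text.translate(str.maketrans("", "", string.punctuation)).strip()


def edge_case_score(str1: Any, str2: Any) -> Optional[float]:
    """Return the score BERTComparator settles without the model,
    or None when the BERTScore call is actually needed."""
    if str1 is None or str2 is None:
        return 0.0
    c1, c2 = strip_punct_space(str(str1)), strip_punct_space(str(str2))
    if not c1 or not c2:
        return 1.0 if c1 == c2 else 0.0
    return None  # fall through to model.compute(...)


print(edge_case_score("!!!", "..."))   # 1.0 -- both empty after cleanup
print(edge_case_score(None, "cat"))   # 0.0
print(edge_case_score("cat", "dog"))  # None -- model call required
```

Note the asymmetry this creates: two strings that are pure punctuation compare as identical (1.0) without ever reaching the model.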

stickler.comparators.SemanticComparator

Bases: BaseComparator

Comparator that uses embeddings for semantic similarity.

This comparator uses embeddings from a model (default: Titan) to calculate semantic similarity between strings.

Attributes:

| Name | Description |
| --- | --- |
| `SIMILARITY_FUNCTIONS` | Dictionary of similarity functions. |
| `bc` | BedrockClient instance. |
| `model_id` | Model ID to use for embeddings. |
| `embedding_function` | Function to generate embeddings. |
| `sim_function` | Name of the similarity function to use. |
| `similarity_function` | The actual similarity function. |

Source code in stickler/comparators/semantic.py
class SemanticComparator(BaseComparator):
    """Comparator that uses embeddings for semantic similarity.

    This comparator uses embeddings from a model (default: Titan) to calculate
    semantic similarity between strings.

    Attributes:
        SIMILARITY_FUNCTIONS: Dictionary of similarity functions
        bc: BedrockClient instance
        model_id: Model ID to use for embeddings
        embedding_function: Function to generate embeddings
        sim_function: Name of the similarity function to use
        similarity_function: The actual similarity function
    """

    SIMILARITY_FUNCTIONS = {
        "cosine_similarity": lambda x, y: 1 - spatial.distance.cosine(x, y)
    }

    def __init__(
        self,
        model_id: str = "amazon.titan-embed-text-v2:0",
        sim_function: str = "cosine_similarity",
        embedding_function: Optional[Callable] = None,
        threshold: float = 0.7,
    ):
        """Initialize the SemanticComparator.

        Args:
            model_id: Model ID to use for embeddings
            sim_function: Name of the similarity function to use
            embedding_function: Optional custom embedding function
            threshold: Similarity threshold (0.0-1.0)

        Raises:
            ImportError: If BedrockClient is not available and no embedding_function is provided
        """
        super().__init__(threshold=threshold)

        if embedding_function is not None:
            self.embedding_function = embedding_function
        else:
            self.model_id = model_id
            self.embedding_function = partial(
                generate_bedrock_embedding, model_id=model_id
            )

        self.sim_function = sim_function
        self.similarity_function = self.SIMILARITY_FUNCTIONS[self.sim_function]

    def compare(self, str1: str, str2: str) -> float:
        """Compare two strings using semantic similarity.

        Args:
            str1: First string
            str2: Second string

        Returns:
            Similarity score between 0.0 and 1.0
        """
        if str1 is None or str2 is None:
            return 0.0

        try:
            x, y = self.embedding_function(str1), self.embedding_function(str2)
            return self.similarity_function(x, y)
        except Exception:
            # Fallback to string equality if embedding fails
            return 1.0 if str1 == str2 else 0.0

__init__(model_id='amazon.titan-embed-text-v2:0', sim_function='cosine_similarity', embedding_function=None, threshold=0.7)

Initialize the SemanticComparator.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_id` | `str` | Model ID to use for embeddings. | `'amazon.titan-embed-text-v2:0'` |
| `sim_function` | `str` | Name of the similarity function to use. | `'cosine_similarity'` |
| `embedding_function` | `Optional[Callable]` | Optional custom embedding function. | `None` |
| `threshold` | `float` | Similarity threshold (0.0-1.0). | `0.7` |

Raises:

| Type | Description |
| --- | --- |
| `ImportError` | If BedrockClient is not available and no embedding_function is provided. |

Source code in stickler/comparators/semantic.py
def __init__(
    self,
    model_id: str = "amazon.titan-embed-text-v2:0",
    sim_function: str = "cosine_similarity",
    embedding_function: Optional[Callable] = None,
    threshold: float = 0.7,
):
    """Initialize the SemanticComparator.

    Args:
        model_id: Model ID to use for embeddings
        sim_function: Name of the similarity function to use
        embedding_function: Optional custom embedding function
        threshold: Similarity threshold (0.0-1.0)

    Raises:
        ImportError: If BedrockClient is not available and no embedding_function is provided
    """
    super().__init__(threshold=threshold)

    if embedding_function is not None:
        self.embedding_function = embedding_function
    else:
        self.model_id = model_id
        self.embedding_function = partial(
            generate_bedrock_embedding, model_id=model_id
        )

    self.sim_function = sim_function
    self.similarity_function = self.SIMILARITY_FUNCTIONS[self.sim_function]

compare(str1, str2)

Compare two strings using semantic similarity.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `str1` | `str` | First string. | *required* |
| `str2` | `str` | Second string. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `float` | Similarity score between 0.0 and 1.0. |

Source code in stickler/comparators/semantic.py
def compare(self, str1: str, str2: str) -> float:
    """Compare two strings using semantic similarity.

    Args:
        str1: First string
        str2: Second string

    Returns:
        Similarity score between 0.0 and 1.0
    """
    if str1 is None or str2 is None:
        return 0.0

    try:
        x, y = self.embedding_function(str1), self.embedding_function(str2)
        return self.similarity_function(x, y)
    except Exception:
        # Fallback to string equality if embedding fails
        return 1.0 if str1 == str2 else 0.0
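
Passing a custom `embedding_function` removes the Bedrock dependency entirely. The sketch below mirrors the comparator's pipeline — embed both strings, then apply `1 - spatial.distance.cosine(x, y)` — in pure Python, using a hypothetical bag-of-words embedding chosen only for illustration:

```python
import math
from collections import Counter
from typing import Dict


def toy_embedding(text: str) -> Dict[str, int]:
    # Hypothetical stand-in for generate_bedrock_embedding:
    # a bag-of-words count vector keyed by token.
    return Counter(text.lower().split())


def cosine_similarity(x: Dict[str, int], y: Dict[str, int]) -> float:
    # Same formula the comparator applies via scipy:
    # 1 - cosine distance == dot(x, y) / (|x| * |y|)
    dot = sum(x[k] * y.get(k, 0) for k in x)
    nx = math.sqrt(sum(v * v for v in x.values()))
    ny = math.sqrt(sum(v * v for v in y.values()))
    return dot / (nx * ny) if nx and ny else 0.0


v1, v2 = toy_embedding("the cat sat"), toy_embedding("the cat sat")
print(cosine_similarity(v1, v2))  # ~1.0 for identical texts
```

A real embedding model would of course return dense float vectors, but the similarity step is identical: `SemanticComparator(embedding_function=my_embed)` just swaps out `toy_embedding`.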

stickler.comparators.LLMComparator

Bases: BaseComparator

Large Language Model-based semantic comparator.

This comparator uses LLMs to perform intelligent semantic comparisons that go beyond simple string matching. It can understand context, handle abbreviations, recognize synonyms, and apply domain-specific comparison logic through custom evaluation guidelines.

The comparator returns binary similarity scores (0.0 or 1.0) based on whether the LLM determines the values are semantically equivalent. It handles edge cases like None values and provides detailed comparison information for debugging.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `model` | `Union[Model, str]` | The LLM model identifier or Model instance. |
| `eval_guidelines` | `str` | Custom guidelines for comparison logic. |
| `system_prompt` | `str` | The system prompt used to instruct the LLM. |
| `prompt_template` | `Template` | Jinja2 template for formatting comparison prompts. |
| `agent` | `Agent` | The strands Agent instance for LLM interactions. |
| `threshold` | `float` | Inherited from BaseComparator, used for binary decisions. |

Note

This comparator requires AWS Bedrock access and proper authentication. API calls incur costs and latency, so consider caching for repeated comparisons.

Source code in stickler/comparators/llm.py
class LLMComparator(BaseComparator):
    """Large Language Model-based semantic comparator.

    This comparator uses LLMs to perform intelligent semantic comparisons that go
    beyond simple string matching. It can understand context, handle abbreviations,
    recognize synonyms, and apply domain-specific comparison logic through custom
    evaluation guidelines.

    The comparator returns binary similarity scores (0.0 or 1.0) based on whether
    the LLM determines the values are semantically equivalent. It handles edge cases
    like None values and provides detailed comparison information for debugging.

    Attributes:
        model (Union[Model, str]): The LLM model identifier or Model instance.
        eval_guidelines (str, optional): Custom guidelines for comparison logic.
        system_prompt (str): The system prompt used to instruct the LLM.
        prompt_template (Template): Jinja2 template for formatting comparison prompts.
        agent (Agent): The strands Agent instance for LLM interactions.
        threshold (float): Inherited from BaseComparator, used for binary decisions.

    Note:
        This comparator requires AWS Bedrock access and proper authentication.
        API calls incur costs and latency, so consider caching for repeated comparisons.
    """

    def __init__(
        self,
        model: Union[Model, str] = None,
        eval_guidelines: str = None,
    ):
        """Initialize the LLM comparator.

        Args:
            model: The LLM model to use for comparisons. Can be a model identifier
                string (e.g., "us.anthropic.claude-3-haiku-20240307-v1:0") or a
                strands Model instance. This argument is required.
            eval_guidelines: Optional custom guidelines to include in the comparison
                prompt. These guidelines help the LLM understand domain-specific
                comparison rules (e.g., "Consider abbreviations equivalent").

        Raises:
            ImportError: If strands-agents is not installed.
            ValueError: If the model parameter is not provided.

        Example:
            >>> # Basic initialization
            >>> comparator = LLMComparator(model="us.anthropic.claude-3-haiku-20240307-v1:0")

            >>> # With custom model and guidelines
            >>> comparator = LLMComparator(
            ...     model="us.amazon.nova-lite-v1:0",
            ...     eval_guidelines="Consider street abbreviations equivalent"
            ... )
        """
        super().__init__()

        # Check if strands is available
        if not STRANDS_AVAILABLE:
            raise ImportError(
                "LLMComparator requires the 'strands-agents' package. "
                "Install it with: pip install stickler-eval[llm]"
            )

        if model is None:
            raise ValueError("Model must be provided for LLMComparator.")
        self.model = model
        self.system_prompt = self._default_system_prompt()
        self.prompt_template = self._default_prompt_template()
        if eval_guidelines is not None:
            self.eval_guidelines = html.escape(eval_guidelines)
        else:
            self.eval_guidelines = eval_guidelines

        # Initialize Agent
        self.agent = Agent(
            model=self.model, system_prompt=self.system_prompt, callback_handler=None
        )

    def _default_system_prompt(self) -> str:
        """Generate the default system prompt for the LLM.

        Returns:
            str: System prompt instructing the LLM to perform binary comparisons.
        """
        return "You are a helpful assistant that compares two values and determines if they are equivalent. Only return one word: 'true' or 'false'."

    def _default_prompt_template(self) -> Template:
        """Generate the default Jinja2 template for comparison prompts.

        Returns:
            Template: Jinja2 template that formats comparison prompts with values
                and optional evaluation guidelines.
        """
        prompt_template = """
            Compare these two values and determine if they are equivalent:

            Value 1: {{ value1 }}
            Value 2: {{ value2 }}

            {% if eval_guidelines is not none %}
            <guidelines>
            Here are some guidelines to follow for the comparison:
            {{ eval_guidelines }}
            </guidelines>
            {% endif %}

            If the values are equivalent, return 'true'. If not, return 'false'. Only return one word: 'true' or 'false'.
            """

        template = Template(prompt_template)
        return template

    def _invoke_agent(self, prompt: str) -> str:
        """Invoke the LLM agent with a formatted prompt.

        Args:
            prompt: The formatted prompt string to send to the LLM.

        Returns:
            str: The text response from the LLM.

        Raises:
            Exception: If the agent call fails or response format is unexpected.
        """
        result = self.agent(prompt)
        return result.message["content"][0]["text"]

    def compare(self, value1: Any, value2: Any) -> float:
        """Compare two values using LLM-based semantic analysis.

        This method converts both values to strings and uses the configured LLM
        to determine if they are semantically equivalent. The comparison considers
        context, abbreviations, synonyms, and any provided evaluation guidelines.

        Args:
            value1: First value to compare. Can be any type that converts to string.
            value2: Second value to compare. Can be any type that converts to string.

        Returns:
            float: Binary similarity score:
                - 1.0 if the LLM determines the values are equivalent
                - 0.0 if the LLM determines the values are not equivalent

        Note:
            - None values: Returns 1.0 if both are None, 0.0 if only one is None
            - Error handling: exceptions from the LLM call are re-raised to the caller
            - Cost consideration: Each call incurs API costs and latency

        Example:
            >>> comparator = LLMComparator(model="us.anthropic.claude-3-haiku-20240307-v1:0")
            >>> comparator.compare("St. John's Street", "Saint John's St")
            1.0
            >>> comparator.compare("apple", "orange")
            0.0
            >>> comparator.compare(None, None)
            1.0
        """
        # Handle None values
        if value1 is None and value2 is None:
            return 1.0
        elif value1 is None or value2 is None:
            return 0.0

        # Format the prompt with your values
        formatted_prompt = self.prompt_template.render(
            value1=html.escape(str(value1)),
            value2=html.escape(str(value2)),
            eval_guidelines=self.eval_guidelines,
        )

        try:
            # Get LLM response
            response = self._invoke_agent(formatted_prompt)
            # Parse response to boolean
            response_lower = response.strip().lower()
            if "true" in response_lower:
                return 1.0
            else:
                return 0.0

        except NoCredentialsError:
            print("Error: AWS credentials not found.")
            raise

        except Exception as e:
            print(f"Error during LLM call: {e}")
            raise

    def get_comparison_details(self, value1: Any, value2: Any) -> Dict[str, Any]:
        """Get detailed information about a comparison operation.

        This method provides comprehensive details about the comparison process,
        including the formatted prompt, LLM response, model information, and
        final comparison result. Useful for debugging, auditing, and understanding
        how the LLM made its decision.

        Args:
            value1: First value to compare. Can be any type that converts to string.
            value2: Second value to compare. Can be any type that converts to string.

        Returns:
            Dict[str, Any]: Dictionary containing comparison details:
                - 'prompt' (str): The formatted prompt sent to the LLM
                - 'llm_response' (str): Raw response from the LLM
                - 'model_id' (Union[Model, str]): The model used (string ID or Model instance)
                - 'comparison_result' (float): Final similarity score (0.0 or 1.0)

                On error:
                - 'error' (str): Error message describing what went wrong
                - 'comparison_result' (bool): False to indicate failure

        Example:
            >>> comparator = LLMComparator(model="us.anthropic.claude-3-haiku-20240307-v1:0", eval_guidelines="Consider abbreviations")
            >>> details = comparator.get_comparison_details("St. John", "Saint John")
            >>> print(details['llm_response'])
            'true'
            >>> print(details['comparison_result'])
            1.0
            >>> print('guidelines' in details['prompt'])
            True
        """
        formatted_prompt = self.prompt_template.render(
            value1=html.escape(str(value1)),
            value2=html.escape(str(value2)),
            eval_guidelines=self.eval_guidelines,
        )

        try:
            response = self._invoke_agent(formatted_prompt)
            return {
                "prompt": formatted_prompt,
                "llm_response": response,
                "model_id": self.model,
                "comparison_result": self.compare(value1, value2),
            }
        except Exception as e:
            return {"error": str(e), "comparison_result": False}

__init__(model=None, eval_guidelines=None)

Initialize the LLM comparator.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model` | `Union[Model, str]` | The LLM model to use for comparisons. Can be a model identifier string (e.g., `"us.anthropic.claude-3-haiku-20240307-v1:0"`) or a strands Model instance. This argument is required. | `None` |
| `eval_guidelines` | `str` | Optional custom guidelines to include in the comparison prompt. These guidelines help the LLM understand domain-specific comparison rules (e.g., "Consider abbreviations equivalent"). | `None` |

Raises:

| Type | Description |
| --- | --- |
| `ImportError` | If strands-agents is not installed. |
| `ValueError` | If the model parameter is not provided. |

Example:

    >>> # Basic initialization
    >>> comparator = LLMComparator(model="us.anthropic.claude-3-haiku-20240307-v1:0")

    >>> # With custom model and guidelines
    >>> comparator = LLMComparator(
    ...     model="us.amazon.nova-lite-v1:0",
    ...     eval_guidelines="Consider street abbreviations equivalent"
    ... )

Source code in stickler/comparators/llm.py
def __init__(
    self,
    model: Union[Model, str] = None,
    eval_guidelines: str = None,
):
    """Initialize the LLM comparator.

    Args:
        model: The LLM model to use for comparisons. Can be a model identifier
            string (e.g., "us.anthropic.claude-3-haiku-20240307-v1:0") or a
            strands Model instance. This argument is required.
        eval_guidelines: Optional custom guidelines to include in the comparison
            prompt. These guidelines help the LLM understand domain-specific
            comparison rules (e.g., "Consider abbreviations equivalent").

    Raises:
        ImportError: If strands-agents is not installed.
        ValueError: If the model parameter is not provided.

    Example:
        >>> # Basic initialization
        >>> comparator = LLMComparator(model="us.anthropic.claude-3-haiku-20240307-v1:0")

        >>> # With custom model and guidelines
        >>> comparator = LLMComparator(
        ...     model="us.amazon.nova-lite-v1:0",
        ...     eval_guidelines="Consider street abbreviations equivalent"
        ... )
    """
    super().__init__()

    # Check if strands is available
    if not STRANDS_AVAILABLE:
        raise ImportError(
            "LLMComparator requires the 'strands-agents' package. "
            "Install it with: pip install stickler-eval[llm]"
        )

    if model is None:
        raise ValueError("Model must be provided for LLMComparator.")
    self.model = model
    self.system_prompt = self._default_system_prompt()
    self.prompt_template = self._default_prompt_template()
    if eval_guidelines is not None:
        self.eval_guidelines = html.escape(eval_guidelines)
    else:
        self.eval_guidelines = eval_guidelines

    # Initialize Agent
    self.agent = Agent(
        model=self.model, system_prompt=self.system_prompt, callback_handler=None
    )

compare(value1, value2)

Compare two values using LLM-based semantic analysis.

This method converts both values to strings and uses the configured LLM to determine if they are semantically equivalent. The comparison considers context, abbreviations, synonyms, and any provided evaluation guidelines.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `value1` | `Any` | First value to compare. Can be any type that converts to string. | *required* |
| `value2` | `Any` | Second value to compare. Can be any type that converts to string. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `float` | Binary similarity score: 1.0 if the LLM determines the values are equivalent, 0.0 if not. |

Note
  • None values: Returns 1.0 if both are None, 0.0 if only one is None
  • Error handling: exceptions from the LLM call are re-raised to the caller
  • Cost consideration: Each call incurs API costs and latency

Example:

    >>> comparator = LLMComparator(model="us.anthropic.claude-3-haiku-20240307-v1:0")
    >>> comparator.compare("St. John's Street", "Saint John's St")
    1.0
    >>> comparator.compare("apple", "orange")
    0.0
    >>> comparator.compare(None, None)
    1.0

Source code in stickler/comparators/llm.py
def compare(self, value1: Any, value2: Any) -> float:
    """Compare two values using LLM-based semantic analysis.

    This method converts both values to strings and uses the configured LLM
    to determine if they are semantically equivalent. The comparison considers
    context, abbreviations, synonyms, and any provided evaluation guidelines.

    Args:
        value1: First value to compare. Can be any type that converts to string.
        value2: Second value to compare. Can be any type that converts to string.

    Returns:
        float: Binary similarity score:
            - 1.0 if the LLM determines the values are equivalent
            - 0.0 if the LLM determines the values are not equivalent

    Note:
        - None values: Returns 1.0 if both are None, 0.0 if only one is None
        - Error handling: exceptions from the LLM call are re-raised to the caller
        - Cost consideration: Each call incurs API costs and latency

    Example:
        >>> comparator = LLMComparator(model="us.anthropic.claude-3-haiku-20240307-v1:0")
        >>> comparator.compare("St. John's Street", "Saint John's St")
        1.0
        >>> comparator.compare("apple", "orange")
        0.0
        >>> comparator.compare(None, None)
        1.0
    """
    # Handle None values
    if value1 is None and value2 is None:
        return 1.0
    elif value1 is None or value2 is None:
        return 0.0

    # Format the prompt with your values
    formatted_prompt = self.prompt_template.render(
        value1=html.escape(str(value1)),
        value2=html.escape(str(value2)),
        eval_guidelines=self.eval_guidelines,
    )

    try:
        # Get LLM response
        response = self._invoke_agent(formatted_prompt)
        # Parse response to boolean
        response_lower = response.strip().lower()
        if "true" in response_lower:
            return 1.0
        else:
            return 0.0

    except NoCredentialsError:
        print("Error: AWS credentials not found.")
        raise

    except Exception as e:
        print(f"Error during LLM call: {e}")
        raise
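
One subtlety in the parsing step above: `"true" in response_lower` is a substring test, so a response such as "not true" would still score 1.0. A stricter parse, sketched here as a hypothetical helper (not part of the library), compares the whole normalized response instead:

```python
def parse_equivalence(response: str) -> float:
    """Hypothetical strict version of the response-parsing step in compare()."""
    normalized = response.strip().lower().rstrip(".")
    if normalized == "true":
        return 1.0
    if normalized == "false":
        return 0.0
    # Anything else counts as "not equivalent", preserving the
    # comparator's binary contract of returning 0.0 or 1.0.
    return 0.0


print(parse_equivalence("True"))      # 1.0
print(parse_equivalence("not true"))  # 0.0
```

Since the system prompt already instructs the model to answer with a single word, the two parses agree in the common case; they differ only when the model strays from that instruction.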

get_comparison_details(value1, value2)

Get detailed information about a comparison operation.

This method provides comprehensive details about the comparison process, including the formatted prompt, LLM response, model information, and final comparison result. Useful for debugging, auditing, and understanding how the LLM made its decision.

Parameters:

    value1 (Any): First value to compare. Can be any type that converts to string. [required]
    value2 (Any): Second value to compare. Can be any type that converts to string. [required]

Returns:

    Dict[str, Any]: Dictionary containing comparison details:
        - 'prompt' (str): The formatted prompt sent to the LLM
        - 'llm_response' (str): Raw response from the LLM
        - 'model_id' (Union[Model, str]): The model used (string ID or Model instance)
        - 'comparison_result' (float): Final similarity score (0.0 or 1.0)

    On error:
        - 'error' (str): Error message describing what went wrong
        - 'comparison_result' (bool): False to indicate failure

Example:

    >>> comparator = LLMComparator(eval_guidelines="Consider abbreviations")
    >>> details = comparator.get_comparison_details("St. John", "Saint John")
    >>> print(details['llm_response'])
    'true'
    >>> print(details['comparison_result'])
    1.0
    >>> print('guidelines' in details['prompt'])
    True

Source code in stickler/comparators/llm.py, lines 243-291:
def get_comparison_details(self, value1: Any, value2: Any) -> Dict[str, Any]:
    """Get detailed information about a comparison operation.

    This method provides comprehensive details about the comparison process,
    including the formatted prompt, LLM response, model information, and
    final comparison result. Useful for debugging, auditing, and understanding
    how the LLM made its decision.

    Args:
        value1: First value to compare. Can be any type that converts to string.
        value2: Second value to compare. Can be any type that converts to string.

    Returns:
        Dict[str, Any]: Dictionary containing comparison details:
            - 'prompt' (str): The formatted prompt sent to the LLM
            - 'llm_response' (str): Raw response from the LLM
            - 'model_id' (Union[Model, str]): The model used (string ID or Model instance)
            - 'comparison_result' (float): Final similarity score (0.0 or 1.0)

            On error:
            - 'error' (str): Error message describing what went wrong
            - 'comparison_result' (bool): False to indicate failure

    Example:
        >>> comparator = LLMComparator(eval_guidelines="Consider abbreviations")
        >>> details = comparator.get_comparison_details("St. John", "Saint John")
        >>> print(details['llm_response'])
        'true'
        >>> print(details['comparison_result'])
        1.0
        >>> print('guidelines' in details['prompt'])
        True
    """
    formatted_prompt = self.prompt_template.render(
        value1=html.escape(str(value1)),
        value2=html.escape(str(value2)),
        eval_guidelines=self.eval_guidelines,
    )

    try:
        response = self._invoke_agent(formatted_prompt)
        return {
            "prompt": formatted_prompt,
            "llm_response": response,
            "model_id": self.model,
            "comparison_result": self.compare(value1, value2),
        }
    except Exception as e:
        return {"error": str(e), "comparison_result": False}
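Because `get_comparison_details` returns two different dictionary shapes (the success shape with `prompt`, `llm_response`, `model_id`, and `comparison_result`, or the error shape with `error` and `comparison_result`), callers should branch on the `error` key. A small illustrative helper (`summarize_details` is not part of the stickler API):

```python
from typing import Any, Dict


def summarize_details(details: Dict[str, Any]) -> str:
    """Render a one-line summary of a get_comparison_details() result.

    Handles both the success shape ('prompt', 'llm_response', 'model_id',
    'comparison_result') and the error shape ('error', 'comparison_result').
    """
    if "error" in details:
        return f"comparison failed: {details['error']}"
    return f"model={details['model_id']} -> score={details['comparison_result']}"

print(summarize_details({"error": "no credentials", "comparison_result": False}))
print(summarize_details({
    "prompt": "...", "llm_response": "true",
    "model_id": "example-model-id", "comparison_result": 1.0,
}))
```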

stickler.comparators.StructuredModelComparator

Bases: BaseComparator

Comparator for structured model objects.

This comparator is designed to work with StructuredModel instances, leveraging their built-in comparison capabilities.

Source code in stickler/comparators/structured.py, lines 8-57:
class StructuredModelComparator(BaseComparator):
    """Comparator for structured model objects.

    This comparator is designed to work with StructuredModel instances,
    leveraging their built-in comparison capabilities.
    """

    def __init__(self, threshold: float = 0.7, strict_types: bool = False):
        """Initialize the comparator.

        Args:
            threshold: Similarity threshold (0.0-1.0)
            strict_types: If True, will raise TypeError when non-StructuredModel objects are compared
        """
        super().__init__(threshold)
        self.strict_types = strict_types

    def compare(self, model1: Any, model2: Any) -> float:
        """Compare two structured model instances.

        This method uses the built-in compare method of StructuredModel objects
        if available, otherwise falls back to basic equality comparison.

        Args:
            model1: First model (ideally a StructuredModel instance)
            model2: Second model (ideally a StructuredModel instance)

        Returns:
            Similarity score between 0.0 and 1.0

        Raises:
            TypeError: When strict_types=True and comparing non-StructuredModel objects
        """
        # In strict mode, enforce StructuredModel types (used in tests)
        # For string values, always raise TypeError in strict mode
        if self.strict_types and isinstance(model1, str) and isinstance(model2, str):
            raise TypeError(
                "StructuredModelComparator can only compare StructuredModel instances"
            )

        # Handle None values
        if model1 is None or model2 is None:
            return 1.0 if model1 == model2 else 0.0

        # Check if both objects have a compare method (duck typing)
        if hasattr(model1, "compare") and callable(model1.compare):
            return model1.compare(model2)

        # Fall back to equality check for non-StructuredModel objects
        return 1.0 if model1 == model2 else 0.0

__init__(threshold=0.7, strict_types=False)

Initialize the comparator.

Parameters:

    threshold (float): Similarity threshold (0.0-1.0). Default: 0.7
    strict_types (bool): If True, will raise TypeError when non-StructuredModel objects are compared. Default: False
Source code in stickler/comparators/structured.py, lines 15-23:
def __init__(self, threshold: float = 0.7, strict_types: bool = False):
    """Initialize the comparator.

    Args:
        threshold: Similarity threshold (0.0-1.0)
        strict_types: If True, will raise TypeError when non-StructuredModel objects are compared
    """
    super().__init__(threshold)
    self.strict_types = strict_types

compare(model1, model2)

Compare two structured model instances.

This method uses the built-in compare method of StructuredModel objects if available, otherwise falls back to basic equality comparison.

Parameters:

Name Type Description Default
model1 Any

First model (ideally a StructuredModel instance)

required
model2 Any

Second model (ideally a StructuredModel instance)

required

Returns:

Type Description
float

Similarity score between 0.0 and 1.0

Raises:

Type Description
TypeError

When strict_types=True and comparing non-StructuredModel objects

Source code in stickler/comparators/structured.py, lines 25-57:
def compare(self, model1: Any, model2: Any) -> float:
    """Compare two structured model instances.

    This method uses the built-in compare method of StructuredModel objects
    if available, otherwise falls back to basic equality comparison.

    Args:
        model1: First model (ideally a StructuredModel instance)
        model2: Second model (ideally a StructuredModel instance)

    Returns:
        Similarity score between 0.0 and 1.0

    Raises:
        TypeError: When strict_types=True and comparing non-StructuredModel objects
    """
    # In strict mode, enforce StructuredModel types (used in tests)
    # For string values, always raise TypeError in strict mode
    if self.strict_types and isinstance(model1, str) and isinstance(model2, str):
        raise TypeError(
            "StructuredModelComparator can only compare StructuredModel instances"
        )

    # Handle None values
    if model1 is None or model2 is None:
        return 1.0 if model1 == model2 else 0.0

    # Check if both objects have a compare method (duck typing)
    if hasattr(model1, "compare") and callable(model1.compare):
        return model1.compare(model2)

    # Fall back to equality check for non-StructuredModel objects
    return 1.0 if model1 == model2 else 0.0
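Because `compare` duck-types on a callable `compare` attribute, any object exposing such a method works, not just StructuredModel instances. A minimal, self-contained sketch of that fallback logic (`AddressRecord` and `duck_typed_compare` are illustrative stand-ins; StructuredModelComparator itself is not imported here):

```python
class AddressRecord:
    """Toy object exposing a StructuredModel-style compare() method."""

    def __init__(self, street: str):
        self.street = street

    def compare(self, other: "AddressRecord") -> float:
        # Rough similarity: 1.0 on exact match, 0.5 on case-insensitive
        # match, else 0.0.
        if self.street == other.street:
            return 1.0
        if self.street.lower() == other.street.lower():
            return 0.5
        return 0.0


def duck_typed_compare(model1, model2) -> float:
    """Self-contained replica of the fallback logic shown above."""
    # None values: identical only if both are None
    if model1 is None or model2 is None:
        return 1.0 if model1 == model2 else 0.0
    # Duck typing: delegate to the object's own compare() if present
    if hasattr(model1, "compare") and callable(model1.compare):
        return model1.compare(model2)
    # Fallback: plain equality for everything else
    return 1.0 if model1 == model2 else 0.0

print(duck_typed_compare(AddressRecord("Main St"), AddressRecord("main st")))  # 0.5
print(duck_typed_compare(None, None))  # 1.0
print(duck_typed_compare("a", "a"))    # 1.0 (equality fallback)
```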