Code Style Guide

This guide covers coding conventions and standards for the Stickler project.

Overview

Stickler follows Python best practices with Ruff for linting. The project uses modern Python features (3.12+) including type hints and Pydantic for data validation.

Linting

Ruff

Ruff is the primary linter for the project. The project uses Ruff's default configuration (no custom ruff.toml or [tool.ruff] section in pyproject.toml).

# Install Ruff (if not already installed)
pip install ruff

# Check code for style issues
ruff check .

# Auto-fix issues where possible
ruff check --fix .

# Check specific file or directory
ruff check src/stickler/comparators/

CI Integration

Linting runs automatically on every push and pull request via GitHub Actions. While currently non-blocking, you should address linting issues before submitting PRs.

Naming Conventions

Classes

Use PascalCase for class names:

# Good
class StructuredModel:
    pass

class LevenshteinComparator:
    pass

class NumericComparator:
    pass

# Bad
class structured_model:  # snake_case
    pass

class levenshteincomparator:  # lowercase
    pass

Functions and Methods

Use snake_case for functions and methods:

# Good
def compare_with(self, other: "StructuredModel") -> dict:
    pass

def binary_compare(self, value1: str, value2: str) -> float:
    pass

def get_field_scores(self) -> dict:
    pass

# Bad
def compareWith(self):  # camelCase
    pass

def GetFieldScores(self):  # PascalCase
    pass

Constants

Use UPPER_SNAKE_CASE for constants:

# Good
DEFAULT_THRESHOLD = 0.5
MAX_RECURSION_DEPTH = 100
COMPARISON_TOLERANCE = 1e-6

# Bad
default_threshold = 0.5  # lowercase
DefaultThreshold = 0.5   # PascalCase

Private Members

Use leading underscore for private/internal members:

class StructuredModel:
    # Private attribute
    _comparison_cache: dict

    # Private method
    def _compare_fields(self, other: "StructuredModel") -> dict:
        pass

    # Public method
    def compare_with(self, other: "StructuredModel") -> dict:
        pass

Variables

Use descriptive snake_case names:

# Good
field_scores = {}
comparison_result = model1.compare_with(model2)
total_weight = sum(weights)

# Bad
fs = {}              # Too short, unclear
fieldScores = {}     # camelCase
result = ...         # Too generic

Type Hints

All public APIs should have type hints. The project uses Python 3.12+ typing features.

Basic Type Hints

from typing import Optional, List, Dict, Any

def compare(self, str1: str, str2: str) -> float:
    """Compare two strings and return similarity score."""
    ...

def evaluate(
    self,
    ground_truth: "StructuredModel",
    prediction: "StructuredModel"
) -> Dict[str, Any]:
    """Evaluate prediction against ground truth."""
    ...

Optional Parameters

from typing import Optional

def compare_with(
    self,
    other: "StructuredModel",
    threshold: Optional[float] = None
) -> dict:
    """Compare this model with another.

    Args:
        other: Model to compare against.
        threshold: Optional threshold override.
    """
    ...

Generic Types

from typing import List, Dict, Union

def process_items(
    self,
    items: List["StructuredModel"]
) -> Dict[str, float]:
    ...

def get_value(self) -> Union[str, int, float]:
    ...

Type Aliases

For complex types, use type aliases:

from typing import Dict, List, Any

# Type aliases
FieldScores = Dict[str, float]
ComparisonResult = Dict[str, Any]
ModelList = List["StructuredModel"]

def compare_all(self, models: ModelList) -> List[ComparisonResult]:
    ...

Docstrings

Use Google-style docstrings for all public modules, classes, functions, and methods.

Module Docstring

"""Levenshtein string comparator implementation.

This module provides the LevenshteinComparator class for comparing
strings based on edit distance. It's useful for handling typos and
minor variations in text fields.

Example:
    >>> from stickler.comparators.levenshtein import LevenshteinComparator
    >>> comparator = LevenshteinComparator()
    >>> comparator.compare("hello", "helo")
    0.8
"""

Class Docstring

class LevenshteinComparator(BaseComparator):
    """Comparator using Levenshtein edit distance.

    Calculates string similarity based on the minimum number of
    single-character edits (insertions, deletions, substitutions)
    required to transform one string into another.

    Attributes:
        case_sensitive: Whether to perform case-sensitive comparison.

    Example:
        >>> comparator = LevenshteinComparator()
        >>> comparator.compare("hello", "hello")
        1.0
        >>> comparator.compare("hello", "hallo")
        0.8
    """

Function/Method Docstring

def compare_with(self, other: "StructuredModel") -> dict:
    """Compare this model with another model.

    Performs field-by-field comparison using configured comparators
    and calculates weighted overall score.

    Args:
        other: The model to compare against. Must be same type.

    Returns:
        Dictionary containing:
            - overall_score: Weighted average of field scores (0.0-1.0)
            - field_scores: Dict mapping field names to scores
            - all_fields_matched: Boolean if all fields meet thresholds

    Raises:
        TypeError: If other is not a StructuredModel instance.
        ValueError: If models have incompatible schemas.

    Example:
        >>> model1 = Person(name="John Doe", age=30)
        >>> model2 = Person(name="John Doe", age=30)
        >>> result = model1.compare_with(model2)
        >>> result["overall_score"]
        1.0
    """
    ...

Property Docstring

@property
def field_names(self) -> List[str]:
    """List of comparable field names.

    Returns:
        List of field names that have ComparableField descriptors.
    """
    ...

Import Organization

Organize imports in three groups, separated by blank lines:

Standard library imports
Third-party package imports
Local application imports

# Standard library
import json
from typing import List, Optional, Dict, Any

# Third-party packages
import pytest
from pydantic import Field, BaseModel

# Local imports
from stickler.structured_object_evaluator import StructuredModel
from stickler.structured_object_evaluator.models.comparable_field import ComparableField
from stickler.comparators.levenshtein import LevenshteinComparator

Import Guidelines

Use absolute imports for clarity
Import specific items rather than entire modules when practical
Avoid wildcard imports (from module import *)

# Good
from stickler.comparators.levenshtein import LevenshteinComparator

# Acceptable for commonly used items
from typing import List, Optional, Dict

# Avoid
from stickler.comparators import *

Code Organization

File Structure

Typical module structure:

"""Module docstring describing purpose."""

# Imports (organized as above)
import ...

# Constants
DEFAULT_THRESHOLD = 0.5

# Type aliases
FieldScores = Dict[str, float]

# Helper functions (private)
def _helper_function():
    ...

# Classes
class MainClass:
    ...

# Module-level functions (if any)
def main_function():
    ...

Class Structure

Organize class members in this order:

class MyClass:
    """Class docstring."""

    # Class attributes
    default_value: ClassVar[int] = 0

    # Instance attributes (for dataclasses/pydantic)
    name: str
    value: int

    # __init__ (if not using dataclass)
    def __init__(self, name: str, value: int):
        ...

    # Properties
    @property
    def computed_value(self) -> int:
        ...

    # Public methods
    def public_method(self) -> None:
        ...

    # Private methods
    def _private_helper(self) -> None:
        ...

    # Magic methods
    def __repr__(self) -> str:
        ...

Error Handling

Exception Types

Use appropriate exception types:

# Type errors
if not isinstance(other, StructuredModel):
    raise TypeError(f"Expected StructuredModel, got {type(other)}")

# Value errors
if threshold < 0 or threshold > 1:
    raise ValueError(f"Threshold must be between 0 and 1, got {threshold}")

# Custom exceptions (when needed)
class ComparisonError(Exception):
    """Raised when comparison cannot be performed."""
    pass

Exception Messages

Provide clear, actionable error messages:

# Good
raise ValueError(
    f"Threshold must be between 0.0 and 1.0, got {threshold}. "
    f"Consider using a value like 0.8 for fuzzy matching."
)

# Bad
raise ValueError("Invalid threshold")

Comments

When to Comment

Explain why, not what (code shows what)
Document non-obvious business logic
Note workarounds or TODOs

# Good - explains why
# Hungarian algorithm requires O(n^3) time, so we limit list size
# to prevent performance issues with large datasets
if len(items) > MAX_LIST_SIZE:
    items = items[:MAX_LIST_SIZE]

# Bad - states the obvious
# Increment counter
counter += 1

TODO Comments

Use consistent format:

# TODO: Implement caching for expensive comparisons
# TODO(username): Fix edge case with empty strings
# FIXME: This workaround should be removed after v2.0

Best Practices Summary

Be consistent - Follow existing patterns in the codebase
Be explicit - Use type hints and clear names
Be concise - Avoid unnecessary complexity
Document public APIs - Docstrings for all public interfaces
Handle errors gracefully - Clear error messages
Test your code - See the Testing Guide