Merged

26 commits
8f29d9a
Adds symbol parsers, reference index, and fuzzy matching.
aaronstevenwhite Sep 27, 2025
0d0cd75
Merge remote-tracking branch 'origin/main' into feature/symbol-parsin…
aaronstevenwhite Sep 28, 2025
caf1e79
Adds filters.
aaronstevenwhite Sep 28, 2025
5d22c94
Adds fuzzy search options to CLI and new cross-reference resolution i…
aaronstevenwhite Sep 28, 2025
98f1c72
Adds symbol parsers and upgraded search to use them.
aaronstevenwhite Sep 29, 2025
dec55c3
Adds docker spec.
aaronstevenwhite Sep 29, 2025
078050e
Bumps version and updates documentation.
aaronstevenwhite Sep 29, 2025
3b19df0
Adds API documentation and module docstrings.
aaronstevenwhite Sep 29, 2025
42e4282
Fixes range validation issue.
aaronstevenwhite Sep 29, 2025
ab9762d
Ensures GLAZING_DATA_DIR is used as the default if defined.
aaronstevenwhite Sep 29, 2025
d89af3c
Fixes range validation issue.
aaronstevenwhite Sep 29, 2025
e59d0fe
Fixes JSON serialization issue.
aaronstevenwhite Sep 29, 2025
5bccc5d
Downloads and converts data on docker image build.
aaronstevenwhite Sep 29, 2025
49e40a2
Fixes list formatting.
aaronstevenwhite Sep 29, 2025
2a002ed
Fixes incorrect python API documentation.
aaronstevenwhite Sep 29, 2025
ad1e485
Adds syntax-based search utilities.
aaronstevenwhite Sep 29, 2025
15fa6d8
Makes syntax-based search utilities more abstract and flexible.
aaronstevenwhite Sep 29, 2025
cbd5e6d
Refactors dataset-specific search tests.
aaronstevenwhite Sep 30, 2025
a620541
Adds syntax-based search documentation.
aaronstevenwhite Sep 30, 2025
8b7e883
Edits documentation/docstring wording.
aaronstevenwhite Sep 30, 2025
e207222
Fixes cross-referencing and converts all dataset references to lower-…
aaronstevenwhite Sep 30, 2025
f08c2e2
Adds JSON schemas and full examples.
aaronstevenwhite Sep 30, 2025
7159219
Adds fuzzy and syntax search documentation and makes thematic role an…
aaronstevenwhite Sep 30, 2025
42514b3
Makes download commands case-insensitive.
aaronstevenwhite Sep 30, 2025
087a204
Makes full CLI case-insensitive for dataset names.
aaronstevenwhite Sep 30, 2025
353a060
Adds docker information to README.
aaronstevenwhite Sep 30, 2025
98 changes: 97 additions & 1 deletion CHANGELOG.md
@@ -7,6 +7,102 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

## [0.2.0] - 2025-09-30

### Added

#### Symbol Parsing System
- **Symbol parsers** for all four linguistic resources (FrameNet, PropBank, VerbNet, WordNet)
- **Structured symbol extraction** for parsing and normalizing entity identifiers
- **Type-safe parsed symbol representations** using TypedDict patterns
- **Symbol parser documentation** - Complete API documentation for all symbol parser modules
- **Symbol parser caching** - LRU cache decorators on all parsing functions for better performance
- Support for parsing complex symbols like ARG1-PPT, ?Theme_i, and Core[Agent] (illustrated in the sketch below)
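
As a rough illustration of the kind of normalization these parsers perform (not glazing's actual `symbol_parser` implementation), a PropBank-style label such as `ARG1-PPT` can be split into an argument number and a function tag:

```python
import re

# Illustrative only: a minimal PropBank-style argument label parser,
# not the actual glazing.propbank.symbol_parser implementation.
ARG_PATTERN = re.compile(r"^ARG(?P<number>\d|M|A)(?:-(?P<function_tag>[A-Z]+))?$")


def parse_arg_label(label: str) -> dict[str, str | None]:
    """Split a label like 'ARG1-PPT' into its number and function-tag parts."""
    match = ARG_PATTERN.match(label.upper())
    if match is None:
        raise ValueError(f"Not a PropBank-style argument label: {label!r}")
    return {"arg_number": match.group("number"), "function_tag": match.group("function_tag")}


print(parse_arg_label("ARG1-PPT"))  # {'arg_number': '1', 'function_tag': 'PPT'}
print(parse_arg_label("ARGM-LOC"))  # {'arg_number': 'M', 'function_tag': 'LOC'}
```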

#### Fuzzy Search and Matching
- **Fuzzy search capability** with Levenshtein distance-based matching (see the sketch after this list)
- **Configurable similarity thresholds** for controlling match precision
- **Multi-field fuzzy matching** across names, descriptions, and identifiers
- **Search result ranking** - New ranking module for scoring search results by match type and field relevance
- **Batch search methods** - `batch_by_lemma` method in UnifiedSearch for processing multiple queries
- `--fuzzy` flag in CLI commands with `--threshold` parameter
- `search_with_fuzzy()` method in UnifiedSearch and dataset-specific search classes
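
A minimal sketch of how a similarity threshold gates candidate matches, using the standard library's `difflib` for scoring rather than glazing's Levenshtein-based implementation (the exact scores will differ):

```python
from difflib import SequenceMatcher


def fuzzy_candidates(
    query: str, names: list[str], threshold: float = 0.8
) -> list[tuple[str, float]]:
    """Return (name, score) pairs whose similarity to the query clears the threshold."""
    scored = [(name, SequenceMatcher(None, query.lower(), name.lower()).ratio()) for name in names]
    return sorted(
        [(name, score) for name, score in scored if score >= threshold],
        key=lambda pair: pair[1],
        reverse=True,
    )


# The typo "instrment" still surfaces "instrument" at a 0.8 threshold.
print(fuzzy_candidates("instrment", ["instrument", "agent", "theme"]))
```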

#### Syntax-Based Search
- **Unified syntax patterns** for searching by syntactic structure
- **Hierarchical pattern matching** where general patterns match specific subtypes (see the sketch after this list)
- **Syntax parser** for converting string patterns to unified format
- **Support for wildcards** and optional elements in patterns
- New CLI command: `glazing search syntax`
- `search_by_syntax()` method in UnifiedSearch class
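
A minimal sketch of the hierarchical idea, in which a general element such as `PP` accepts dotted subtypes such as `PP.instrument`; this is illustrative only, not the unified pattern format glazing actually uses:

```python
def element_matches(pattern_elem: str, candidate_elem: str) -> bool:
    """A general element ('PP') matches itself or any dotted subtype ('PP.instrument')."""
    return candidate_elem == pattern_elem or candidate_elem.startswith(pattern_elem + ".")


def pattern_matches(pattern: str, candidate: str) -> bool:
    """Match space-separated syntactic patterns position by position."""
    pattern_elems, candidate_elems = pattern.split(), candidate.split()
    if len(pattern_elems) != len(candidate_elems):
        return False
    return all(element_matches(p, c) for p, c in zip(pattern_elems, candidate_elems, strict=True))


print(pattern_matches("NP V PP", "NP V PP.instrument"))  # True: general PP matches the subtype
print(pattern_matches("NP V PP.instrument", "NP V PP"))  # False: the specific pattern needs the subtype
```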

#### Cross-Reference Enhancements
- **Automatic cross-reference extraction** on first use with progress indicators
- **Fuzzy resolution** for cross-references with typo tolerance
- **Confidence scoring** for mapping quality (0.0 to 1.0 scale)
- **Transitive mapping support** for indirect relationships (see the sketch after this list)
- **Reverse lookup capabilities** for bidirectional navigation
- New CLI commands: `glazing xref resolve`, `glazing xref extract`, `glazing xref clear-cache`
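
One plausible way to score a transitive mapping chain is to multiply per-hop confidences; this is a sketch of the concept, not necessarily how glazing combines its scores:

```python
from math import prod


def chained_confidence(hop_scores: list[float]) -> float:
    """Combine per-hop mapping confidences (each in [0.0, 1.0]) into a single score."""
    if not all(0.0 <= score <= 1.0 for score in hop_scores):
        raise ValueError("Confidence scores must be between 0.0 and 1.0")
    return prod(hop_scores)


# e.g. a PropBank -> VerbNet hop at 0.95 followed by a VerbNet -> FrameNet hop at 0.9
print(chained_confidence([0.95, 0.9]))  # 0.855 (up to floating-point rounding)
```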

#### Structured Role/Argument Search
- **Property-based role search** for VerbNet thematic roles (optional, required, etc.)
- **Argument type filtering** for PropBank arguments (ARGM-LOC, ARGM-TMP, etc.)
- **Frame element search** by core type in FrameNet
- Support for complex queries with multiple property filters

#### Docker Support
- **Dockerfile** for containerized usage without local installation
- Full CLI exposed through Docker container
- Volume support for persistent data storage
- Docker Compose configuration example
- Interactive Python session support via container

#### CLI Improvements
- `--json` output mode for all search and xref commands
- `--progress` flag for long-running operations
- `--force` flag for cache clearing and re-extraction
- Better error messages with actionable suggestions
- Support for batch operations

### Changed

#### Type System Improvements
- Expanded `ArgumentNumber` type to include all modifier patterns (M-LOC, M-TMP, etc.)
- Added "C" and "R" prefixes to `FunctionTag` for continuation/reference support
- Stricter validation for `ThematicRoleType` with proper indexed variants
- More precise TypedDict definitions for parsed symbols

#### API Refinements
- `CrossReferenceIndex` now supports fuzzy matching in `resolve()` method
- `UnifiedSearch` class (renamed from `Search` for clarity)
- Consistent `None` returns for missing values (not empty strings or -1)
- Better separation of concerns between extraction, mapping, and resolution

### Fixed

- **CacheBase abstract methods** now have default implementations instead of raising `NotImplementedError`
- **VerbNet class ID generation** now uses deterministic pattern-based generation instead of hash-based fallback
- **Backward compatibility code removed** from PropBank symbol parser - no longer checks for argnum attribute
- **Legacy MappingSource removed** - "legacy" value no longer accepted in types
- **Documentation language** - removed promotional terms from fuzzy-match.md
- **Test compatibility** - Fixed PropBank symbol parser tests to work without the backward-compatibility code
- PropBank `ArgumentNumber` type corrected to match actual data (removed invalid values like "7", "M-ADJ")
- ARGA argument in PropBank now handled with the correct arg_number value
- VerbNet member `verbnet_key` validation fixed to require proper format (e.g., "give#1")
- ThematicRole validation properly handles indexed role types (Patient_i, Theme_j)
- Import paths corrected for UnifiedSearch class
- Modifier type extraction returns `None` for non-modifiers consistently
- Frame element parsing handles abbreviations correctly
- Test fixtures updated to use correct data models and validation rules

### Technical Improvements

- Full mypy strict mode compliance across all modules
- Comprehensive test coverage for new symbol parsing features
- Performance optimizations for fuzzy matching with large datasets
- Better memory management for cross-reference extraction
- Caching improvements for repeated fuzzy searches

## [0.1.1] - 2025-09-27

### Fixed
@@ -29,7 +125,7 @@ Initial release of `glazing`, a package containing unified data models and inter
- **Unified data models** for all four linguistic resources using Pydantic v2
- **One-command initialization** with `glazing init` to download and convert all datasets
- **JSON Lines format** for efficient storage and streaming of large datasets
- **Type-safe interfaces** with comprehensive type hints for Python 3.13+
- **Type-safe interfaces** with comprehensive type hints using Python 3.13+ conventions
- **Cross-reference resolution** between FrameNet, PropBank, VerbNet, and WordNet
- **Memory-efficient streaming** support for processing large datasets

25 changes: 18 additions & 7 deletions CONTRIBUTING.md
@@ -31,7 +31,7 @@ glazing init

## Code Style

We use `ruff` for code quality:
We use `ruff` for code quality and `mypy` for type checking:

```bash
# Format code
@@ -40,23 +40,34 @@ ruff format src/ tests/
# Lint code
ruff check src/ tests/

# Type checking
# Type checking (strict mode required)
mypy --strict src/
```

## Testing

```bash
# Run all tests
pytest
# Run all tests with verbose output
pytest tests/ -v

# Run with coverage
pytest --cov=glazing
pytest tests/ -v --cov=src/glazing --cov-report=term-missing

# Run specific test
pytest tests/test_verbnet/
# Run specific test module
pytest tests/test_verbnet/test_models.py -v

# Run specific test with debugging output
pytest tests/test_base.py::TestBaseModel::test_model_validation -xvs
```

### Testing Requirements

- All new features must have tests (a minimal example follows this list)
- Tests should cover edge cases and error conditions
- Use descriptive test names that explain what is being tested
- Mock external dependencies and file I/O where appropriate
- Maintain or improve code coverage (aim for >90%)
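
A minimal sketch of a test written to these conventions; the function under test here is a stand-in defined inline, not part of glazing's actual test suite:

```python
# Illustrative only: the code under test is a hypothetical stand-in, shown to
# demonstrate descriptive naming, edge cases, and parametrization conventions.
import pytest


def normalize_dataset_name(name: str) -> str:
    """Stand-in for code under test: dataset names are matched case-insensitively."""
    cleaned = name.strip().lower()
    if cleaned not in {"framenet", "propbank", "verbnet", "wordnet"}:
        raise ValueError(f"Unknown dataset: {name!r}")
    return cleaned


class TestNormalizeDatasetName:
    @pytest.mark.parametrize("raw", ["VerbNet", "VERBNET", "  verbnet "])
    def test_name_is_case_insensitive_and_trimmed(self, raw: str) -> None:
        assert normalize_dataset_name(raw) == "verbnet"

    def test_unknown_dataset_raises_value_error(self) -> None:
        with pytest.raises(ValueError, match="Unknown dataset"):
            normalize_dataset_name("framenot")
```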

## Documentation

47 changes: 47 additions & 0 deletions Dockerfile
@@ -0,0 +1,47 @@
# Use official Python 3.13 slim image as base
FROM python:3.13-slim

# Set working directory
WORKDIR /app

# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
PIP_NO_CACHE_DIR=1 \
PIP_DISABLE_PIP_VERSION_CHECK=1

# Install system dependencies required for building packages
RUN apt-get update && apt-get install -y --no-install-recommends \
gcc \
g++ \
&& rm -rf /var/lib/apt/lists/*

# Copy only requirements first to leverage Docker cache
COPY pyproject.toml README.md ./
COPY src/glazing/__version__.py src/glazing/

# Install package dependencies
RUN pip install --upgrade pip && \
pip install -e .

# Copy the rest of the application code
COPY src/ src/
COPY tests/ tests/

# Create data directory for datasets
RUN mkdir -p /data

# Set environment variable for data directory
ENV GLAZING_DATA_DIR=/data

# Initialize datasets during build
RUN glazing init --data-dir /data

# Expose data directory as volume
VOLUME ["/data"]

# Set the entrypoint to the glazing CLI
ENTRYPOINT ["glazing"]

# Default command shows help
CMD ["--help"]
90 changes: 70 additions & 20 deletions README.md
@@ -14,16 +14,43 @@ Unified data models and interfaces for syntactic and semantic frame ontologies.
- 🚀 **One-command setup**: `glazing init` downloads and prepares all datasets
- 📦 **Type-safe models**: Pydantic v2 validation for all data structures
- 🔍 **Unified search**: Query across all datasets with consistent API
- 🔗 **Cross-references**: Automatic mapping between resources
- 🔗 **Cross-references**: Automatic mapping between resources with confidence scores
- 🎯 **Fuzzy search**: Find matches even with typos or partial queries
- 🐳 **Docker support**: Use via Docker without local installation
- 💾 **Efficient storage**: JSON Lines format with streaming support
- 🐍 **Modern Python**: Full type hints, Python 3.13+ support

## Installation

### Via pip

```bash
pip install glazing
```

### Via Docker

Build and run Glazing in a containerized environment:

```bash
# Build the image
git clone https://github.com/aaronstevenwhite/glazing.git
cd glazing
docker build -t glazing:latest .

# Initialize datasets (persisted in volume)
docker run --rm -v glazing-data:/data glazing:latest init

# Use the CLI
docker run --rm -v glazing-data:/data glazing:latest search query "give"
docker run --rm -v glazing-data:/data glazing:latest search query "transfer" --fuzzy

# Interactive Python session
docker run --rm -it -v glazing-data:/data --entrypoint python glazing:latest
```

See the [installation docs](https://glazing.readthedocs.io/en/latest/installation/#docker-installation) for more Docker usage examples.

## Quick Start

Initialize all datasets (one-time setup, ~54MB download):
@@ -56,8 +83,23 @@ glazing search query "abandon"
# Search specific dataset
glazing search query "run" --dataset verbnet

# Use fuzzy search for typos
glazing search query "giv" --fuzzy
glazing search query "instrment" --fuzzy --threshold 0.7
```

Resolve cross-references:

```bash
# Extract cross-reference index (one-time setup)
glazing xref extract

# Find cross-references
glazing search cross-ref --source propbank --id "give.01" --target verbnet
glazing xref resolve "give.01" --source propbank
glazing xref resolve "give-13.1" --source verbnet

# Use fuzzy matching
glazing xref resolve "giv.01" --source propbank --fuzzy
```

## Python API
@@ -79,24 +121,32 @@ verb_classes = list(vn_loader.classes.values())
Cross-reference resolution:

```python
from glazing.references.extractor import ReferenceExtractor
from glazing.verbnet.loader import VerbNetLoader
from glazing.propbank.loader import PropBankLoader

# Load datasets
vn_loader = VerbNetLoader()
pb_loader = PropBankLoader()

# Extract references
extractor = ReferenceExtractor()
extractor.extract_verbnet_references(list(vn_loader.classes.values()))
extractor.extract_propbank_references(list(pb_loader.framesets.values()))

# Access PropBank cross-references
if "give.01" in extractor.propbank_refs:
refs = extractor.propbank_refs["give.01"]
vn_classes = refs.get_verbnet_classes()
print(f"VerbNet classes for give.01: {vn_classes}")
from glazing.references.index import CrossReferenceIndex

# Automatic extraction on first use (cached for future runs)
xref = CrossReferenceIndex()

# Resolve references for a PropBank roleset
refs = xref.resolve("give.01", source="propbank")
print(f"VerbNet classes: {refs['verbnet_classes']}")
print(f"Confidence scores: {refs['confidence_scores']}")

# Use fuzzy matching for typos
refs = xref.resolve("giv.01", source="propbank", fuzzy=True)
print(f"Found match with fuzzy search: {refs['verbnet_classes']}")
```

Fuzzy search in Python:

```python
from glazing.search import UnifiedSearch

# Use fuzzy search to handle typos
search = UnifiedSearch()
results = search.search_with_fuzzy("instrment", fuzzy_threshold=0.8)

for result in results[:5]:
print(f"{result.dataset}: {result.name} (score: {result.score:.2f})")
```

## Supported Datasets
5 changes: 5 additions & 0 deletions docs/api/framenet/symbol-parser.md
@@ -0,0 +1,5 @@
# glazing.framenet.symbol_parser

FrameNet symbol parsing utilities for frame and frame element names.

::: glazing.framenet.symbol_parser
2 changes: 1 addition & 1 deletion docs/api/index.md
@@ -118,7 +118,7 @@ except ValidationError as e:

## Version Compatibility

This documentation covers Glazing version 0.1.1. Check your installed version:
This documentation covers Glazing version 0.2.0. Check your installed version:

```python
import glazing
```
5 changes: 5 additions & 0 deletions docs/api/propbank/symbol-parser.md
@@ -0,0 +1,5 @@
# glazing.propbank.symbol_parser

PropBank symbol parsing utilities for roleset IDs and argument labels.

::: glazing.propbank.symbol_parser