FACTSlab · aaronstevenwhite · Oct 28, 2025
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -9,6 +9,7 @@ repos:
       - id: check-added-large-files
       - id: check-merge-conflict
       - id: debug-statements
+        language_version: python3.13
       - id: mixed-line-ending
 
   - repo: local

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,6 +7,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [0.2.1] - 2025-10-28
+
+### Fixed
+
+- **FrameNet lexical units now properly loaded during conversion**
+  - Lexical units are now parsed from `luIndex.xml` during frame conversion
+  - All frames now include their associated lexical units with complete metadata
+  - Fixes critical data completeness issue where `frame.lexical_units` was always empty
+  - Enables querying frames by lexical unit name via the frame index
+  - Approximately 13,500 lexical units now correctly associated with their frames
+
 ## [0.2.0] - 2025-09-30
 
 ### Added
@@ -20,7 +31,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Support for parsing complex symbols like ARG1-PPT, ?Theme_i, Core[Agent]
 
 #### Fuzzy Search and Matching
-- **Fuzzy search capability** with Levenshtein distance-based matching
+- **Fuzzy search capability** with Levenshtein distance-based matching to find data with typos, morphological variants, and spelling inconsistencies
 - **Configurable similarity thresholds** for controlling match precision
 - **Multi-field fuzzy matching** across names, descriptions, and identifiers
 - **Search result ranking** - New ranking module for scoring search results by match type and field relevance
@@ -186,7 +197,8 @@ Initial release of `glazing`, a package containing unified data models and inter
 - `tqdm >= 4.60.0` (progress bars)
 - `rich >= 13.0.0` (CLI formatting)
 
-[Unreleased]: https://github.com/aaronstevenwhite/glazing/compare/v0.2.0...HEAD
+[Unreleased]: https://github.com/aaronstevenwhite/glazing/compare/v0.2.1...HEAD
+[0.2.1]: https://github.com/aaronstevenwhite/glazing/releases/tag/v0.2.1
 [0.2.0]: https://github.com/aaronstevenwhite/glazing/releases/tag/v0.2.0
 [0.1.1]: https://github.com/aaronstevenwhite/glazing/releases/tag/v0.1.1
 [0.1.0]: https://github.com/aaronstevenwhite/glazing/releases/tag/v0.1.0
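
Reviewer note on the 0.2.1 changelog entry: the fix parses lexical units from `luIndex.xml` and attaches them to their frames. The sketch below illustrates that grouping step only; it is not the converter's actual code. The inline sample XML, its ID values, the attribute names (`name`, `frameName`), the namespace URI, and both helper functions are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Tiny stand-in for luIndex.xml. The IDs are made up and the attribute
# names and namespace are assumptions, not copied from the real file.
SAMPLE = """<luIndex xmlns="http://framenet.icsi.berkeley.edu">
  <lu ID="1" name="give.v" frameID="100" frameName="Giving"/>
  <lu ID="2" name="donate.v" frameID="100" frameName="Giving"/>
  <lu ID="3" name="abandon.v" frameID="200" frameName="Abandonment"/>
</luIndex>"""

NS = {"fn": "http://framenet.icsi.berkeley.edu"}


def group_lexical_units(xml_text):
    """Map each frame name to the lexical unit names listed for it."""
    root = ET.fromstring(xml_text)
    by_frame = defaultdict(list)
    for lu in root.findall("fn:lu", NS):
        by_frame[lu.get("frameName")].append(lu.get("name"))
    return dict(by_frame)


def frames_for_lexical_unit(by_frame, lu_name):
    """Reverse lookup: which frames list this lexical unit?"""
    return sorted(frame for frame, names in by_frame.items() if lu_name in names)


by_frame = group_lexical_units(SAMPLE)
print(by_frame)
print(frames_for_lexical_unit(by_frame, "abandon.v"))
```

With a mapping like this built during conversion, `frame.lexical_units` would no longer be empty, and a lookup by lexical unit name reduces to the reverse query shown in the second helper.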
diff --git a/README.md b/README.md
@@ -15,7 +15,7 @@ Unified data models and interfaces for syntactic and semantic frame ontologies.
 - 📦 **Type-safe models**: Pydantic v2 validation for all data structures
 - 🔍 **Unified search**: Query across all datasets with consistent API
 - 🔗 **Cross-references**: Automatic mapping between resources with confidence scores
-- 🎯 **Fuzzy search**: Find matches even with typos or partial queries
+- 🎯 **Fuzzy search**: Find data with typos, spelling variants, and inconsistencies
 - 🐳 **Docker support**: Use via Docker without local installation
 - 💾 **Efficient storage**: JSON Lines format with streaming support
 - 🐍 **Modern Python**: Full type hints, Python 3.13+ support
@@ -83,9 +83,9 @@ glazing search query "abandon"
 # Search specific dataset
 glazing search query "run" --dataset verbnet
 
-# Use fuzzy search for typos
-glazing search query "giv" --fuzzy
-glazing search query "instrment" --fuzzy --threshold 0.7
+# Find data with typos or spelling variants
+glazing search query "realize" --fuzzy
+glazing search query "organize" --fuzzy --threshold 0.8
 ```
 
 Resolve cross-references:
@@ -98,8 +98,8 @@ glazing xref extract
 glazing xref resolve "give.01" --source propbank
 glazing xref resolve "give-13.1" --source verbnet
 
-# Use fuzzy matching
-glazing xref resolve "giv.01" --source propbank --fuzzy
+# Find data with variations or inconsistencies
+glazing xref resolve "realize.01" --source propbank --fuzzy
 ```
 
 ## Python API
@@ -131,8 +131,8 @@ refs = xref.resolve("give.01", source="propbank")
 print(f"VerbNet classes: {refs['verbnet_classes']}")
 print(f"Confidence scores: {refs['confidence_scores']}")
 
-# Use fuzzy matching for typos
-refs = xref.resolve("giv.01", source="propbank", fuzzy=True)
+# Find data with variations or inconsistencies
+refs = xref.resolve("realize.01", source="propbank", fuzzy=True)
 print(f"Found match with fuzzy search: {refs['verbnet_classes']}")
 ```
 
@@ -141,9 +141,9 @@ Fuzzy search in Python:
 ```python
 from glazing.search import UnifiedSearch
 
-# Use fuzzy search to handle typos
+# Find data with typos or spelling variants
 search = UnifiedSearch()
-results = search.search_with_fuzzy("instrment", fuzzy_threshold=0.8)
+results = search.search_with_fuzzy("organize", fuzzy_threshold=0.8)
 
 for result in results[:5]:
     print(f"{result.dataset}: {result.name} (score: {result.score:.2f})")

diff --git a/docs/api/index.md b/docs/api/index.md
@@ -118,7 +118,7 @@ except ValidationError as e:
 
 ## Version Compatibility
 
-This documentation covers Glazing version 0.2.0. Check your installed version:
+This documentation covers Glazing version 0.2.1. Check your installed version:
 
 ```python
 import glazing

diff --git a/docs/api/references/index.md b/docs/api/references/index.md
@@ -18,8 +18,8 @@ xref = CrossReferenceIndex()
 refs = xref.resolve("give.01", source="propbank")
 print(refs["verbnet_classes"])  # ['give-13.1']
 
-# Use fuzzy matching for typos
-refs = xref.resolve("giv.01", source="propbank", fuzzy=True)
+# Find data with variations or inconsistencies
+refs = xref.resolve("realize.01", source="propbank", fuzzy=True)
 ```
 
 ## Main Classes
@@ -53,7 +53,7 @@ class CrossReferenceIndex(
 
 - **Automatic Extraction**: References are extracted automatically on first use
 - **Caching**: Extracted references are cached for fast subsequent loads
-- **Fuzzy Matching**: Handle typos and variations with configurable thresholds
+- **Fuzzy Matching**: Find data with typos, morphological variants, and spelling inconsistencies
 - **Confidence Scores**: All mappings include confidence scores
 - **Progress Indicators**: Visual feedback during extraction
 

diff --git a/docs/api/utils/fuzzy-match.md b/docs/api/utils/fuzzy-match.md
@@ -4,7 +4,7 @@ Fuzzy string matching utilities using Levenshtein distance.
 
 ## Overview
 
-The fuzzy_match module provides functions for fuzzy string matching using Levenshtein distance and other similarity metrics. It includes text normalization and caching for performance.
+The fuzzy_match module provides functions for fuzzy string matching using Levenshtein distance and other similarity metrics. It includes text normalization and caching for performance. The primary use case is finding data that contains typos, morphological variants, or spelling inconsistencies in the underlying datasets.
 
 ## Functions
 

diff --git a/docs/citation.md b/docs/citation.md
@@ -12,22 +12,22 @@ If you use Glazing in your research, please cite our work.
   title = {Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies},
   year = {2025},
   url = {https://github.com/aaronstevenwhite/glazing},
-  version = {0.2.0},
+  version = {0.2.1},
   doi = {10.5281/zenodo.17185626}
 }
 ```
 
 ### APA
 
-White, A. S. (2025). *Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies* (Version 0.2.0) [Computer software]. https://github.com/aaronstevenwhite/glazing
+White, A. S. (2025). *Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies* (Version 0.2.1) [Computer software]. https://github.com/aaronstevenwhite/glazing
 
 ### Chicago
 
-White, Aaron Steven. 2025. *Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies*. Version 0.2.0. https://github.com/aaronstevenwhite/glazing.
+White, Aaron Steven. 2025. *Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies*. Version 0.2.1. https://github.com/aaronstevenwhite/glazing.
 
 ### MLA
 
-White, Aaron Steven. *Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies*. Version 0.2.0, 2025, https://github.com/aaronstevenwhite/glazing.
+White, Aaron Steven. *Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies*. Version 0.2.1, 2025, https://github.com/aaronstevenwhite/glazing.
 
 ## Citing Datasets
 

diff --git a/docs/index.md b/docs/index.md
@@ -93,7 +93,7 @@ If you use Glazing in your research, please cite:
   title = {Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies},
   year = {2025},
   url = {https://github.com/aaronstevenwhite/glazing},
-  version = {0.2.0},
+  version = {0.2.1},
   doi = {10.5281/zenodo.17185626}
 }
 ```

diff --git a/docs/installation.md b/docs/installation.md
@@ -168,8 +168,8 @@ docker run --rm -v glazing-data:/data glazing:latest init
 # Search across datasets
 docker run --rm -v glazing-data:/data glazing:latest search query "give"
 
-# Search with fuzzy matching
-docker run --rm -v glazing-data:/data glazing:latest search query "giv" --fuzzy
+# Find data with variations using fuzzy matching
+docker run --rm -v glazing-data:/data glazing:latest search query "realize" --fuzzy
 
 # Extract cross-references
 docker run --rm -v glazing-data:/data glazing:latest xref extract

diff --git a/docs/quick-start.md b/docs/quick-start.md
@@ -81,8 +81,8 @@ refs = xref.resolve("give.01", source="propbank")
 print(f"VerbNet classes: {refs['verbnet_classes']}")
 print(f"Confidence scores: {refs['confidence_scores']}")
 
-# Use fuzzy matching for typos
-refs = xref.resolve("giv.01", source="propbank", fuzzy=True)
+# Find data with variations or inconsistencies
+refs = xref.resolve("realize.01", source="propbank", fuzzy=True)
 print(f"VerbNet classes: {refs['verbnet_classes']}")
 ```
 

diff --git a/docs/user-guide/cli.md b/docs/user-guide/cli.md
@@ -38,15 +38,15 @@ glazing search query "give" --limit 10 --json
 
 ### Fuzzy Search
 
-Use fuzzy matching to find results even with typos or partial matches:
+Use fuzzy matching to find data with typos, morphological variants, or spelling inconsistencies:
 
 ```bash
-# Find matches for typos
-glazing search query "giv" --fuzzy
-glazing search query "instrment" --fuzzy --threshold 0.7
+# Find data with variations
+glazing search query "realize" --fuzzy
+glazing search query "organize" --fuzzy --threshold 0.8
 
 # Adjust the threshold (0.0-1.0, higher is stricter)
-glazing search query "runing" --fuzzy --threshold 0.85
+glazing search query "analyze" --fuzzy --threshold 0.85
 ```
 
 ### Syntactic Pattern Search
@@ -110,9 +110,9 @@ Find mappings between datasets:
 glazing xref resolve "give.01" --source propbank
 glazing xref resolve "give-13.1" --source verbnet
 
-# Use fuzzy matching for typos
-glazing xref resolve "giv.01" --source propbank --fuzzy
-glazing xref resolve "transfer-11.1" --source verbnet --fuzzy --threshold 0.8
+# Find data with variations or inconsistencies
+glazing xref resolve "realize.01" --source propbank --fuzzy
+glazing xref resolve "organize-74" --source verbnet --fuzzy --threshold 0.8
 
 # Get JSON output
 glazing xref resolve "Giving" --source framenet --json

diff --git a/docs/user-guide/cross-references.md b/docs/user-guide/cross-references.md
@@ -30,8 +30,8 @@ refs = xref.resolve("give.01", source="propbank")
 print(f"VerbNet classes: {refs['verbnet_classes']}")
 print(f"Confidence scores: {refs['confidence_scores']}")
 
-# Use fuzzy matching for typos
-refs = xref.resolve("giv.01", source="propbank", fuzzy=True)
+# Find data with variations or inconsistencies
+refs = xref.resolve("realize.01", source="propbank", fuzzy=True)
 print(f"VerbNet classes: {refs['verbnet_classes']}")
 ```
 
@@ -93,13 +93,13 @@ xref.clear_cache()
 
 ### Fuzzy Matching
 
-The system supports fuzzy matching for handling typos and variations:
+The system supports fuzzy matching for finding data with typos, morphological variants, and spelling inconsistencies:
 
 ```python
-# Find matches even with typos
-refs = xref.resolve("transferr.01", source="propbank", fuzzy=True, threshold=0.7)
+# Find data with variations
+refs = xref.resolve("organize.01", source="propbank", fuzzy=True, threshold=0.8)
 
-# The system will find "transfer.01" and return its references
+# The system will find variants if they exist and return their references
 ```
 
 ### Confidence Scores
@@ -111,4 +111,4 @@ All mappings include confidence scores based on:
 
 ## Limitations
 
-Cross-references in these datasets are incomplete and sometimes approximate. VerbNet members don't always have WordNet mappings. PropBank rolesets may lack VerbNet mappings. The quality and coverage of references varies between dataset pairs. Fuzzy matching can occasionally produce false positives at lower thresholds.
+Cross-references in these datasets are incomplete and sometimes approximate. VerbNet members don't always have WordNet mappings. PropBank rolesets may lack VerbNet mappings. The quality and coverage of references varies between dataset pairs. The datasets themselves may contain typos or morphological variants, which fuzzy matching helps to address. Fuzzy matching can occasionally produce false positives at lower thresholds.
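
Reviewer note on the fuzzy-matching wording: the docs describe Levenshtein-based matching with a threshold in the range 0.0-1.0, where higher is stricter. A minimal sketch of one common way to turn edit distance into such a score follows. It assumes similarity is `1 - distance / max(len)` after lowercasing; the diff does not confirm that glazing normalizes this way, and the function names here are hypothetical, not the `fuzzy_match` module's API.

```python
def levenshtein(a, b):
    """Edit distance via the classic two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalized similarity in [0.0, 1.0]; 1.0 means identical after casefolding."""
    a, b = a.casefold().strip(), b.casefold().strip()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def fuzzy_filter(query, candidates, threshold=0.8):
    """Keep candidates scoring at or above the threshold, best match first."""
    scored = [(c, similarity(query, c)) for c in candidates]
    kept = [(c, s) for c, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


for name, score in fuzzy_filter("realize", ["realise", "realize", "relax"]):
    print(f"{name}: {score:.2f}")
```

Under this assumed normalization, "realize" and "realise" differ by one substitution over seven characters, so they score about 0.86 and pass a 0.8 threshold but not 0.9. Short strings are penalized more per edit ("giv" against "give" scores 0.75), which is one reason lower thresholds admit more false positives.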