Merged
1 change: 1 addition & 0 deletions .pre-commit-config.yaml
@@ -9,6 +9,7 @@ repos:
- id: check-added-large-files
- id: check-merge-conflict
- id: debug-statements
language_version: python3.13
- id: mixed-line-ending

- repo: local
16 changes: 14 additions & 2 deletions CHANGELOG.md
@@ -7,6 +7,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

## [0.2.1] - 2025-10-28

### Fixed

- **FrameNet lexical units now properly loaded during conversion**
- Lexical units are now parsed from `luIndex.xml` during frame conversion
- All frames now include their associated lexical units with complete metadata
- Fixes critical data completeness issue where `frame.lexical_units` was always empty
- Enables querying frames by lexical unit name via the frame index
- Approximately 13,500 lexical units now correctly associated with their frames

## [0.2.0] - 2025-09-30

### Added
@@ -20,7 +31,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Support for parsing complex symbols like ARG1-PPT, ?Theme_i, Core[Agent]

#### Fuzzy Search and Matching
- **Fuzzy search capability** with Levenshtein distance-based matching
- **Fuzzy search capability** with Levenshtein distance-based matching to find data with typos, morphological variants, and spelling inconsistencies
- **Configurable similarity thresholds** for controlling match precision
- **Multi-field fuzzy matching** across names, descriptions, and identifiers
- **Search result ranking** - New ranking module for scoring search results by match type and field relevance
@@ -186,7 +197,8 @@ Initial release of `glazing`, a package containing unified data models and inter
- `tqdm >= 4.60.0` (progress bars)
- `rich >= 13.0.0` (CLI formatting)

[Unreleased]: https://github.com/aaronstevenwhite/glazing/compare/v0.2.0...HEAD
[Unreleased]: https://github.com/aaronstevenwhite/glazing/compare/v0.2.1...HEAD
[0.2.1]: https://github.com/aaronstevenwhite/glazing/releases/tag/v0.2.1
[0.2.0]: https://github.com/aaronstevenwhite/glazing/releases/tag/v0.2.0
[0.1.1]: https://github.com/aaronstevenwhite/glazing/releases/tag/v0.1.1
[0.1.0]: https://github.com/aaronstevenwhite/glazing/releases/tag/v0.1.0
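The 0.2.1 fix parses lexical units from FrameNet's `luIndex.xml` and attaches them to their frames. The converter code itself is not part of this diff, so the following is only a minimal standard-library sketch of that grouping step. It assumes each `<lu>` element in `luIndex.xml` carries `ID`, `name`, and `frameName` attributes; `LexicalUnitStub`, `group_lus_by_frame`, and `frames_for_lu` are hypothetical names, not glazing's API, and the sample XML is a toy input with invented IDs.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

# Toy luIndex.xml excerpt. The attribute layout is an assumption and the IDs are invented.
SAMPLE_LU_INDEX = """
<luIndex xmlns="http://framenet.icsi.berkeley.edu">
  <lu ID="1" name="give.v" frameName="Giving"/>
  <lu ID="2" name="hand.v" frameName="Giving"/>
  <lu ID="3" name="abandon.v" frameName="Abandonment"/>
</luIndex>
"""


def local_name(tag: str) -> str:
    """Strip an XML namespace such as '{http://...}lu' down to 'lu'."""
    return tag.rsplit("}", 1)[-1]


@dataclass(frozen=True)
class LexicalUnitStub:
    """Hypothetical minimal record; a real model would carry more metadata."""
    lu_id: int
    name: str
    frame_name: str


def group_lus_by_frame(xml_text: str) -> Dict[str, List[LexicalUnitStub]]:
    """Group every <lu> entry under the frame named in its frameName attribute."""
    root = ET.fromstring(xml_text.strip())
    by_frame: Dict[str, List[LexicalUnitStub]] = defaultdict(list)
    for elem in root.iter():  # document order, namespace-agnostic
        if local_name(elem.tag) != "lu":
            continue
        lu = LexicalUnitStub(
            lu_id=int(elem.get("ID")),
            name=elem.get("name"),
            frame_name=elem.get("frameName"),
        )
        by_frame[lu.frame_name].append(lu)
    return dict(by_frame)


def frames_for_lu(by_frame: Dict[str, List[LexicalUnitStub]], lu_name: str) -> List[str]:
    """Reverse lookup: which frames list a lexical unit with this name?"""
    return sorted(
        frame for frame, lus in by_frame.items() if any(lu.name == lu_name for lu in lus)
    )


by_frame = group_lus_by_frame(SAMPLE_LU_INDEX)
print([lu.name for lu in by_frame["Giving"]])  # ['give.v', 'hand.v']
print(frames_for_lu(by_frame, "abandon.v"))  # ['Abandonment']
```

Matching on the local tag name keeps the sketch independent of the exact namespace URI, which it does not rely on.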
20 changes: 10 additions & 10 deletions README.md
@@ -15,7 +15,7 @@ Unified data models and interfaces for syntactic and semantic frame ontologies.
- 📦 **Type-safe models**: Pydantic v2 validation for all data structures
- 🔍 **Unified search**: Query across all datasets with consistent API
- 🔗 **Cross-references**: Automatic mapping between resources with confidence scores
- 🎯 **Fuzzy search**: Find matches even with typos or partial queries
- 🎯 **Fuzzy search**: Find data with typos, spelling variants, and inconsistencies
- 🐳 **Docker support**: Use via Docker without local installation
- 💾 **Efficient storage**: JSON Lines format with streaming support
- 🐍 **Modern Python**: Full type hints, Python 3.13+ support
@@ -83,9 +83,9 @@ glazing search query "abandon"
# Search specific dataset
glazing search query "run" --dataset verbnet

# Use fuzzy search for typos
glazing search query "giv" --fuzzy
glazing search query "instrment" --fuzzy --threshold 0.7
# Find data with typos or spelling variants
glazing search query "realize" --fuzzy
glazing search query "organize" --fuzzy --threshold 0.8
```

Resolve cross-references:
@@ -98,8 +98,8 @@ glazing xref extract
glazing xref resolve "give.01" --source propbank
glazing xref resolve "give-13.1" --source verbnet

# Use fuzzy matching
glazing xref resolve "giv.01" --source propbank --fuzzy
# Find data with variations or inconsistencies
glazing xref resolve "realize.01" --source propbank --fuzzy
```

## Python API
@@ -131,8 +131,8 @@ refs = xref.resolve("give.01", source="propbank")
print(f"VerbNet classes: {refs['verbnet_classes']}")
print(f"Confidence scores: {refs['confidence_scores']}")

# Use fuzzy matching for typos
refs = xref.resolve("giv.01", source="propbank", fuzzy=True)
# Find data with variations or inconsistencies
refs = xref.resolve("realize.01", source="propbank", fuzzy=True)
print(f"Found match with fuzzy search: {refs['verbnet_classes']}")
```

@@ -141,9 +141,9 @@ Fuzzy search in Python:
```python
from glazing.search import UnifiedSearch

# Use fuzzy search to handle typos
# Find data with typos or spelling variants
search = UnifiedSearch()
results = search.search_with_fuzzy("instrment", fuzzy_threshold=0.8)
results = search.search_with_fuzzy("organize", fuzzy_threshold=0.8)

for result in results[:5]:
print(f"{result.dataset}: {result.name} (score: {result.score:.2f})")
2 changes: 1 addition & 1 deletion docs/api/index.md
@@ -118,7 +118,7 @@ except ValidationError as e:

## Version Compatibility

This documentation covers Glazing version 0.2.0. Check your installed version:
This documentation covers Glazing version 0.2.1. Check your installed version:

```python
import glazing
6 changes: 3 additions & 3 deletions docs/api/references/index.md
@@ -18,8 +18,8 @@ xref = CrossReferenceIndex()
refs = xref.resolve("give.01", source="propbank")
print(refs["verbnet_classes"]) # ['give-13.1']

# Use fuzzy matching for typos
refs = xref.resolve("giv.01", source="propbank", fuzzy=True)
# Find data with variations or inconsistencies
refs = xref.resolve("realize.01", source="propbank", fuzzy=True)
```

## Main Classes
@@ -53,7 +53,7 @@ class CrossReferenceIndex(

- **Automatic Extraction**: References are extracted automatically on first use
- **Caching**: Extracted references are cached for fast subsequent loads
- **Fuzzy Matching**: Handle typos and variations with configurable thresholds
- **Fuzzy Matching**: Find data with typos, morphological variants, and spelling inconsistencies
- **Confidence Scores**: All mappings include confidence scores
- **Progress Indicators**: Visual feedback during extraction

2 changes: 1 addition & 1 deletion docs/api/utils/fuzzy-match.md
@@ -4,7 +4,7 @@ Fuzzy string matching utilities using Levenshtein distance.

## Overview

The fuzzy_match module provides functions for fuzzy string matching using Levenshtein distance and other similarity metrics. It includes text normalization and caching for performance.
The fuzzy_match module provides functions for fuzzy string matching using Levenshtein distance and other similarity metrics. It includes text normalization and caching for performance. The primary use case is finding data that contains typos, morphological variants, or spelling inconsistencies in the underlying datasets.

## Functions

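The `fuzzy_match` module is described as combining text normalization, Levenshtein distance, caching, and a 0.0-1.0 threshold where higher is stricter. Its actual signatures are not shown in this diff, so here is a rough standard-library sketch of that technique. It assumes similarity is defined as 1 minus the edit distance divided by the length of the longer normalized string; glazing may score matches differently, and `normalize`, `levenshtein`, `similarity`, and `fuzzy_filter` are hypothetical names.

```python
import unicodedata
from functools import lru_cache
from typing import List, Tuple


def normalize(text: str) -> str:
    """Lowercase, strip accents, and collapse whitespace (assumed normalization)."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.lower().split())


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


@lru_cache(maxsize=4096)  # cache repeated comparisons of the same string pair
def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1 - distance / length of the longer normalized string."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def fuzzy_filter(query: str, candidates: List[str], threshold: float = 0.8) -> List[Tuple[str, float]]:
    """Keep candidates scoring at or above the threshold, best match first."""
    scored = [(c, similarity(query, c)) for c in candidates]
    return sorted((pair for pair in scored if pair[1] >= threshold), key=lambda pair: -pair[1])


print(levenshtein("kitten", "sitting"))  # 3
print(similarity("organize", "organise"))  # 0.875
print(fuzzy_filter("organize", ["organise", "organ", "realize"], threshold=0.8))  # [('organise', 0.875)]
```

Under this scoring, a single-letter spelling variant of an eight-letter word scores 0.875, so it passes a 0.85 threshold but would fail at 0.9.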
8 changes: 4 additions & 4 deletions docs/citation.md
@@ -12,22 +12,22 @@ If you use Glazing in your research, please cite our work.
title = {Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies},
year = {2025},
url = {https://github.com/aaronstevenwhite/glazing},
version = {0.2.0},
version = {0.2.1},
doi = {10.5281/zenodo.17185626}
}
```

### APA

White, A. S. (2025). *Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies* (Version 0.2.0) [Computer software]. https://github.com/aaronstevenwhite/glazing
White, A. S. (2025). *Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies* (Version 0.2.1) [Computer software]. https://github.com/aaronstevenwhite/glazing

### Chicago

White, Aaron Steven. 2025. *Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies*. Version 0.2.0. https://github.com/aaronstevenwhite/glazing.
White, Aaron Steven. 2025. *Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies*. Version 0.2.1. https://github.com/aaronstevenwhite/glazing.

### MLA

White, Aaron Steven. *Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies*. Version 0.2.0, 2025, https://github.com/aaronstevenwhite/glazing.
White, Aaron Steven. *Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies*. Version 0.2.1, 2025, https://github.com/aaronstevenwhite/glazing.

## Citing Datasets

2 changes: 1 addition & 1 deletion docs/index.md
@@ -93,7 +93,7 @@ If you use Glazing in your research, please cite:
title = {Glazing: Unified Data Models and Interfaces for Syntactic and Semantic Frame Ontologies},
year = {2025},
url = {https://github.com/aaronstevenwhite/glazing},
version = {0.2.0},
version = {0.2.1},
doi = {10.5281/zenodo.17185626}
}
```
4 changes: 2 additions & 2 deletions docs/installation.md
@@ -168,8 +168,8 @@ docker run --rm -v glazing-data:/data glazing:latest init
# Search across datasets
docker run --rm -v glazing-data:/data glazing:latest search query "give"

# Search with fuzzy matching
docker run --rm -v glazing-data:/data glazing:latest search query "giv" --fuzzy
# Find data with variations using fuzzy matching
docker run --rm -v glazing-data:/data glazing:latest search query "realize" --fuzzy

# Extract cross-references
docker run --rm -v glazing-data:/data glazing:latest xref extract
4 changes: 2 additions & 2 deletions docs/quick-start.md
@@ -81,8 +81,8 @@ refs = xref.resolve("give.01", source="propbank")
print(f"VerbNet classes: {refs['verbnet_classes']}")
print(f"Confidence scores: {refs['confidence_scores']}")

# Use fuzzy matching for typos
refs = xref.resolve("giv.01", source="propbank", fuzzy=True)
# Find data with variations or inconsistencies
refs = xref.resolve("realize.01", source="propbank", fuzzy=True)
print(f"VerbNet classes: {refs['verbnet_classes']}")
```

16 changes: 8 additions & 8 deletions docs/user-guide/cli.md
@@ -38,15 +38,15 @@ glazing search query "give" --limit 10 --json

### Fuzzy Search

Use fuzzy matching to find results even with typos or partial matches:
Use fuzzy matching to find data with typos, morphological variants, or spelling inconsistencies:

```bash
# Find matches for typos
glazing search query "giv" --fuzzy
glazing search query "instrment" --fuzzy --threshold 0.7
# Find data with variations
glazing search query "realize" --fuzzy
glazing search query "organize" --fuzzy --threshold 0.8

# Adjust the threshold (0.0-1.0, higher is stricter)
glazing search query "runing" --fuzzy --threshold 0.85
glazing search query "analyze" --fuzzy --threshold 0.85
```

### Syntactic Pattern Search
@@ -110,9 +110,9 @@ Find mappings between datasets:
glazing xref resolve "give.01" --source propbank
glazing xref resolve "give-13.1" --source verbnet

# Use fuzzy matching for typos
glazing xref resolve "giv.01" --source propbank --fuzzy
glazing xref resolve "transfer-11.1" --source verbnet --fuzzy --threshold 0.8
# Find data with variations or inconsistencies
glazing xref resolve "realize.01" --source propbank --fuzzy
glazing xref resolve "organize-74" --source verbnet --fuzzy --threshold 0.8

# Get JSON output
glazing xref resolve "Giving" --source framenet --json
14 changes: 7 additions & 7 deletions docs/user-guide/cross-references.md
@@ -30,8 +30,8 @@ refs = xref.resolve("give.01", source="propbank")
print(f"VerbNet classes: {refs['verbnet_classes']}")
print(f"Confidence scores: {refs['confidence_scores']}")

# Use fuzzy matching for typos
refs = xref.resolve("giv.01", source="propbank", fuzzy=True)
# Find data with variations or inconsistencies
refs = xref.resolve("realize.01", source="propbank", fuzzy=True)
print(f"VerbNet classes: {refs['verbnet_classes']}")
```

@@ -93,13 +93,13 @@ xref.clear_cache()

### Fuzzy Matching

The system supports fuzzy matching for handling typos and variations:
The system supports fuzzy matching for finding data with typos, morphological variants, and spelling inconsistencies:

```python
# Find matches even with typos
refs = xref.resolve("transferr.01", source="propbank", fuzzy=True, threshold=0.7)
# Find data with variations
refs = xref.resolve("organize.01", source="propbank", fuzzy=True, threshold=0.8)

# The system will find "transfer.01" and return its references
# The system will find variants if they exist and return their references
```

### Confidence Scores
@@ -111,4 +111,4 @@ All mappings include confidence scores based on:

## Limitations

Cross-references in these datasets are incomplete and sometimes approximate. VerbNet members don't always have WordNet mappings. PropBank rolesets may lack VerbNet mappings. The quality and coverage of references varies between dataset pairs. Fuzzy matching can occasionally produce false positives at lower thresholds.
Cross-references in these datasets are incomplete and sometimes approximate. VerbNet members don't always have WordNet mappings. PropBank rolesets may lack VerbNet mappings. The quality and coverage of references varies between dataset pairs. The datasets themselves may contain typos or morphological variants, which fuzzy matching helps to address. Fuzzy matching can occasionally produce false positives at lower thresholds.
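The fuzzy resolution described in these docs can be sketched as follows. The lookup tries the ID as given, otherwise falls back to the closest known ID above a threshold and returns that entry's references. This is not how `CrossReferenceIndex.resolve` works internally, since the diff does not show that code. The sketch swaps in the standard library's `difflib.get_close_matches`, whose Ratcliff/Obershelp ratio is a different metric from the Levenshtein distance glazing documents. `fuzzy_resolve` and the toy index are hypothetical; only the `give.01` to `give-13.1` mapping comes from the documentation examples in this PR.

```python
import difflib
from typing import Dict, List, Optional, Tuple

# Toy roleset-ID -> VerbNet-class index. Only give.01 -> give-13.1 comes from
# the docs; the other entries are placeholders (hypothetical).
TOY_INDEX: Dict[str, List[str]] = {
    "give.01": ["give-13.1"],
    "organise.01": ["placeholder-class-a"],
    "realize.01": ["placeholder-class-b"],
}


def fuzzy_resolve(
    query: str, index: Dict[str, List[str]], threshold: float = 0.8
) -> Optional[Tuple[str, List[str]]]:
    """Return (matched_id, references), or None if nothing scores >= threshold."""
    if query in index:  # exact hit: no fuzzy matching needed
        return query, index[query]
    # difflib scores candidates with SequenceMatcher.ratio() in [0, 1];
    # cutoff discards anything below the threshold, n=1 keeps only the best.
    matches = difflib.get_close_matches(query, list(index), n=1, cutoff=threshold)
    if not matches:
        return None
    best = matches[0]
    return best, index[best]


print(fuzzy_resolve("give.01", TOY_INDEX))      # ('give.01', ['give-13.1'])
print(fuzzy_resolve("organize.01", TOY_INDEX))  # ('organise.01', ['placeholder-class-a'])
print(fuzzy_resolve("xyz.99", TOY_INDEX))       # None
```

Raising `threshold` makes the fallback stricter. Lowering it lets more distant IDs qualify, which is where the false positives mentioned above come from.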