fix: Fixes FrameNet lexical unit loading by aaronstevenwhite · Pull Request #5 · FACTSlab/glazing

aaronstevenwhite · 2025-10-28T14:47:00Z

Fix FrameNet Lexical Unit Loading

Fixes #4

Description

This PR fixes a critical data completeness issue where FrameNet lexical units were not being loaded during dataset conversion. All frames had empty lexical_units fields despite the raw FrameNet data containing 13,575 lexical units. This fix parses lexical units from luIndex.xml and properly associates them with their frames during conversion.

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update

Key Changes

Lexical Unit Loading

Added LU parsing: New methods to parse lexical units from luIndex.xml
Frame association: LUs now properly associated with frames by frame_id
Complete metadata: Preserves POS tags, annotation status, sentence counts, lexeme structures
High success rate: 13,572 out of 13,575 LUs successfully parsed (99.98%)

Validation Updates

Relaxed patterns: Updated validators to handle real-world FrameNet data
Proper nouns: Now accepts "April.n", "Monday.n"
Multi-word expressions: Now accepts "a bit.n", "give up.v"
Special characters: Now accepts "(can't) help.v", "American [N and S Am].n"

Version Bump

Version: 0.2.0 → 0.2.1 (patch release)
Updated: All version references across package and documentation

Problem

Frames had empty lexical_units fields after conversion:

>>> from glazing.framenet.loader import FrameNetLoader
>>> loader = FrameNetLoader()
>>> index = loader.build_frame_index(loader.frames)
>>> frame = index.get_frame_by_name("Abandonment")
>>> len(frame.lexical_units)
0  # Should be 5!

The converter only parsed frame XML files and never loaded lexical unit data from luIndex.xml.

Solution

1. Added LU Parsing Methods

_parse_lu_from_index() - Parse individual LU from XML element:

Extract metadata (ID, name, POS, frame association, annotation status)
Create lexemes from multi-word expressions
Build sentence count statistics

convert_lu_index_file() - Convert entire luIndex.xml:

Parse all LU elements from index
Handle errors gracefully with warnings
Return list of LexicalUnit models
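
The two methods above might be sketched roughly as follows. This is an illustration only, not the code in converter.py: the `LexicalUnitSketch` dataclass, the function names, the simplified sample XML, and the attribute names (`ID`, `name`, `frameID`, `status`) are all assumptions made for the example, and the POS is derived from the name suffix purely for brevity.

```python
import warnings
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Made-up, simplified stand-in for luIndex.xml (attribute names are assumptions).
SAMPLE_LU_INDEX = """
<luIndex>
  <lu ID="1" name="abandon.v" frameID="10" status="Finished_Initial"/>
  <lu ID="2" name="give up.v" frameID="11" status="Created"/>
  <lu ID="oops" name="broken.a" frameID="12" status="Created"/>
</luIndex>
"""


@dataclass
class LexicalUnitSketch:
    """Hypothetical, minimal stand-in for the project's LexicalUnit model."""

    id: int
    name: str
    pos: str
    frame_id: int
    status: str
    lexemes: list[str]


def parse_lu(element: ET.Element) -> LexicalUnitSketch:
    """Build one LU from an <lu> element; raises KeyError/ValueError on bad data."""
    name = element.attrib["name"]
    # Split "give up.v" into the lemma part and the POS suffix.
    lemma, _, pos = name.rpartition(".")
    return LexicalUnitSketch(
        id=int(element.attrib["ID"]),
        name=name,
        pos=pos.upper(),
        frame_id=int(element.attrib["frameID"]),
        status=element.attrib.get("status", ""),
        lexemes=lemma.split(),  # one lexeme per word of a multi-word expression
    )


def parse_lu_index(xml_text: str) -> list[LexicalUnitSketch]:
    """Parse every <lu> element, warning about (and skipping) malformed entries."""
    root = ET.fromstring(xml_text)
    units = []
    for element in root.iter("lu"):
        try:
            units.append(parse_lu(element))
        except (KeyError, ValueError) as exc:
            warnings.warn(f"Skipping LU {element.attrib.get('name')!r}: {exc}")
    return units


units = parse_lu_index(SAMPLE_LU_INDEX)
print([unit.name for unit in units])
```

In this sketch the third entry has a non-numeric ID, so it is skipped with a warning instead of aborting the whole conversion.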

2. Updated Frame Conversion

Modified convert_frames_directory() to:

Parse all frame XML files
Load lexical units from luIndex.xml (in the parent of the frames directory)
Group LUs by frame_id
Associate LUs with their frames
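
The grouping and association steps above could look something like this. It is a sketch under stated assumptions: the `Frame` and `LU` dataclasses and the `attach_lexical_units` helper are hypothetical stand-ins, not the project's actual models or the body of convert_frames_directory(), and the sample data is made up.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class LU:
    """Hypothetical minimal lexical unit: just a name and its owning frame ID."""

    name: str
    frame_id: int


@dataclass
class Frame:
    """Hypothetical minimal frame with a list of attached lexical units."""

    id: int
    name: str
    lexical_units: list[LU] = field(default_factory=list)


def attach_lexical_units(frames: list[Frame], lexical_units: list[LU]) -> list[Frame]:
    """Group LUs by frame_id, then assign each group to the matching frame."""
    by_frame: defaultdict[int, list[LU]] = defaultdict(list)
    for lu in lexical_units:
        by_frame[lu.frame_id].append(lu)
    for frame in frames:
        # Frames without any LUs keep an empty list.
        frame.lexical_units = by_frame.get(frame.id, [])
    return frames


frames = [Frame(id=10, name="Frame_A"), Frame(id=11, name="Frame_B")]
lus = [LU("abandon.v", 10), LU("leave.v", 10), LU("orphan.n", 99)]
attach_lexical_units(frames, lus)
print({frame.name: [lu.name for lu in frame.lexical_units] for frame in frames})
```

Grouping once into a dictionary keeps the association linear in the number of LUs and frames, and an LU whose frame_id matches no frame (here `orphan.n`) is simply left unattached.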

3. Relaxed Validators

Before:

LU_NAME_PATTERN = r"^[a-z][a-z0-9_\'-]*\.[a-z]+$"  # Too strict
LEXEME_NAME_PATTERN = r"^[A-Za-z][a-zA-Z0-9\'-]*$"  # Too strict

After:

LU_NAME_PATTERN = r"^.+\.[a-z]+$"  # Permissive: anything.pos
LEXEME_NAME_PATTERN = r"^.+$"      # Permissive: any non-empty string
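
To see the effect of the change, the two LU name patterns quoted above can be run against the names this PR mentions. This is a small standalone check, not part of the test suite; the variable names `OLD_LU_NAME` and `NEW_LU_NAME` are introduced here for illustration.

```python
import re

# Patterns copied from the Before/After snippets above.
OLD_LU_NAME = re.compile(r"^[a-z][a-z0-9_\'-]*\.[a-z]+$")
NEW_LU_NAME = re.compile(r"^.+\.[a-z]+$")

names = [
    "abandon.v",
    "April.n",
    "a bit.n",
    "give up.v",
    "(can't) help.v",
    "American [N and S Am].n",
]

for name in names:
    old_ok = bool(OLD_LU_NAME.match(name))
    new_ok = bool(NEW_LU_NAME.match(name))
    print(f"{name!r}: old={old_ok} new={new_ok}")
```

Of these names, the old pattern accepts only `abandon.v`, because it rejects capital letters, spaces, parentheses, and brackets. The relaxed pattern accepts all six but still requires a dot followed by a lowercase POS suffix, so a bare `abandon` would still fail.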

Impact

Before This Fix

✗ All frames had empty lexical_units fields
✗ Could not query frames by lexical unit name
✗ More than 1,100 LUs rejected by strict validators (91.3% success)
✗ Missing critical FrameNet data

After This Fix

✓ All frames include their lexical units with complete metadata
✓ Can query frames by lexical unit name via frame index
✓ Only 3 LUs rejected (99.98% success)
✓ Complete FrameNet data coverage
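
As a rough picture of why populated lexical_units fields make lookup by LU name possible, here is a toy inverted index. It does not reflect the actual frame index implementation in glazing: the `build_lu_index` helper and the sample frame data are hypothetical.

```python
from collections import defaultdict


def build_lu_index(frames: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each LU name to the names of the frames that list it."""
    index: defaultdict[str, list[str]] = defaultdict(list)
    for frame_name, lu_names in frames.items():
        for lu_name in lu_names:
            index[lu_name].append(frame_name)
    return dict(index)


# Made-up toy data: frame name -> LU names.
toy_frames = {
    "Frame_A": ["abandon.v", "leave.v"],
    "Frame_B": ["leave.v"],
    "Frame_C": [],
}

lu_index = build_lu_index(toy_frames)
print(lu_index.get("leave.v", []))
print(lu_index.get("missing.n", []))
```

With empty lexical_units fields, as before this fix, such an index would contain no entries at all, which is why queries by LU name could not work.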

Example

>>> from glazing.framenet.loader import FrameNetLoader
>>> loader = FrameNetLoader()
>>> index = loader.build_frame_index(loader.frames)
>>> frame = index.get_frame_by_name("Abandonment")
>>> len(frame.lexical_units)
5
>>> frame.lexical_units[0].name
'abandon.v'
>>> frame.lexical_units[0].pos
'V'

Files Changed

Core Implementation

src/glazing/framenet/converter.py - Added LU parsing and frame association
src/glazing/framenet/types.py - Relaxed validation patterns
src/glazing/cli/convert.py - Updated CLI progress messages

Tests

tests/test_framenet/test_converter.py - Added 5 new LU parsing tests
tests/test_framenet/test_types.py - Updated validation tests
tests/test_framenet/test_models.py - Updated model tests

Version and Documentation

pyproject.toml - Version bump to 0.2.1
src/glazing/__version__.py - Version bump to 0.2.1
CHANGELOG.md - Added 0.2.1 entry
docs/ - Updated all version references (9 files)
.pre-commit-config.yaml - Fixed Python 3.13 compatibility for hooks

Testing

All tests pass with 81% code coverage, and type checking, linting, and formatting are clean:

pytest tests/ -v --tb=short -q
# 1,338 tests passed, 81% code coverage

mypy --strict src/
# Success: no issues found

ruff check src/ tests/
# All checks passed

ruff format src/ tests/
# All files formatted correctly

Compatibility

✓ Fully backwards compatible with v0.2.0
✓ No API changes
✓ No breaking changes
⚠️ Users must reconvert FrameNet data to populate lexical units:
```
glazing init --force
```

Checklist

Code follows project style guidelines (ruff, mypy)
All tests pass locally
Added tests for new functionality
Updated documentation
Updated CHANGELOG.md
Version bumped appropriately (0.2.0 → 0.2.1)
Release notes prepared

Fixes FrameNet lexical unit loading by parsing luIndex.xml and rela…

bf362a9

…xing validators to handle all actual data.

aaronstevenwhite merged commit 25044e1 into main Oct 28, 2025
9 checks passed

aaronstevenwhite deleted the fix/framenet-lexical-units-loading branch October 28, 2025 15:48
