6 changes: 6 additions & 0 deletions CHANGELOG.md
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Changed

- Improved documentation clarity and conciseness

## [0.1.0] - 2025-09-23

Initial release of `glazing`, a package containing unified data models and interfaces for syntactic and semantic frame ontologies.
2 changes: 1 addition & 1 deletion docs/api/index.md

## Type Safety

All models use Pydantic v2 for validation and provide complete type hints. This ensures:

- Runtime validation of data
- IDE autocomplete support
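
The runtime-validation point can be sketched with plain Pydantic v2 (the `Role` model below is a made-up illustration, not an actual glazing class; only Pydantic's `BaseModel` and `ValidationError` are assumed):

```python
from pydantic import BaseModel, ValidationError

class Role(BaseModel):  # hypothetical model, for illustration only
    name: str
    index: int

role = Role(name="Agent", index=0)  # validated at construction time

try:
    Role(name="Agent", index="not-a-number")  # wrong type is rejected
except ValidationError:
    print("invalid data rejected")
```

Because validation happens when the model is built, malformed data fails loudly at load time rather than surfacing later as a confusing attribute error.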
28 changes: 6 additions & 22 deletions docs/index.md

- 🚀 **One-command initialization:** Download and convert all datasets with `glazing init`
- 📦 **Type-safe data models:** Using Pydantic v2 for validation and serialization
- 🔍 **Command-line interface:** Download, convert, and search datasets from the command line
- 🔗 **Cross-dataset references:** Find connections between different linguistic resources
- 🐍 **Python 3.13+:** Modern Python with full type hints
- 📊 **Efficient storage:** JSON Lines format for fast loading and streaming
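
As a minimal illustration of why JSON Lines suits streaming (standard-library sketch only; this is not glazing's actual storage code):

```python
import io
import json

# One JSON record per line means a file can be read one record at a time,
# without parsing the whole dataset into memory first.
jsonl = io.StringIO('{"id": "give-13.1"}\n{"id": "run-51.3.2"}\n')
ids = [json.loads(line)["id"] for line in jsonl]
print(ids)  # each line parsed independently
```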

## Supported Datasets
```python
for result in results[:5]:
    print(f"{result.dataset}: {result.name} - {result.description}")
```

## Documentation

Start with [Installation](installation.md) for system requirements, then follow the [Quick Start](quick-start.md) to get running in minutes. The [User Guide](user-guide/cli.md) covers detailed usage, while the [API Reference](api/index.md) documents all classes and methods. See [Contributing](contributing.md) if you'd like to help improve the project.

## Why Glazing?

Working with linguistic resources traditionally requires understanding different data formats (XML, custom databases), writing custom parsers for each resource, managing cross-references manually, and dealing with inconsistent APIs. Glazing solves these problems by providing unified data models across all resources, automatic data conversion to efficient formats, built-in cross-reference resolution, and consistent search and access patterns.

## Project Status

Glazing is actively maintained and welcomes contributions. The project follows semantic versioning and includes extensive test coverage.

## Links

232 changes: 33 additions & 199 deletions docs/quick-start.md
# Quick Start

Get Glazing running in minutes. This guide assumes you have Python 3.13+ and pip installed.

## Installation and Setup

```bash
pip install glazing
glazing init  # Downloads ~54MB, creates ~130MB of data
```

The `init` command downloads all four datasets and converts them to an efficient format. This can take a few minutes but only needs to be done once.

## Command Line

Search across all datasets:

```bash
glazing search query "give"
glazing search query "give" --dataset verbnet  # Limit to one dataset
```

Find cross-references between datasets:

```bash
glazing search cross-ref --source propbank --id "give.01" --target verbnet
```

### Get Dataset Information

Learn about available datasets:

```bash
# List all datasets
glazing download list

# Get info about VerbNet
glazing download info verbnet
```

## Python API

```python
from glazing.search import UnifiedSearch

# Search all datasets
search = UnifiedSearch()

results = search.search("abandon")

for result in results[:5]:
    print(f"{result.dataset}: {result.name} - {result.description}")
```

Load specific datasets:

```python
from glazing.verbnet.loader import VerbNetLoader

loader = VerbNetLoader()
verb_classes = list(loader.classes.values())

give_class = next((vc for vc in verb_classes if vc.id == "give-13.1"), None)
if give_class:
    print(f"Class: {give_class.id}")
    print(f"Members: {[m.name for m in give_class.members]}")
    print(f"Roles: {[tr.role_type for tr in give_class.themroles]}")
```

Work with WordNet synsets:

```python
from glazing.wordnet.loader import WordNetLoader

loader = WordNetLoader()
synsets = list(loader.synsets.values())

# Find synsets for "dog"
dog_synsets = [s for s in synsets if any(l.lemma == "dog" for l in s.lemmas)]
for synset in dog_synsets[:3]:
    print(f"{synset.id}: {synset.definition}")
```

Extract cross-references:

```python
from glazing.references.extractor import ReferenceExtractor
from glazing.references.resolver import ReferenceResolver
from glazing.verbnet.loader import VerbNetLoader
from glazing.propbank.loader import PropBankLoader

vn_loader = VerbNetLoader()
pb_loader = PropBankLoader()

# Extract references
extractor = ReferenceExtractor()
extractor.extract_verbnet_references(list(vn_loader.classes.values()))
extractor.extract_propbank_references(list(pb_loader.framesets.values()))

# Resolve references for a PropBank roleset
resolver = ReferenceResolver(extractor.mapping_index)
related = resolver.resolve("give.01", source="propbank")

print("PropBank roleset: give.01")
print(f"VerbNet classes: {related.verbnet_classes}")
print(f"FrameNet frames: {related.framenet_frames}")
print(f"WordNet senses: {related.wordnet_senses}")
```


### Streaming Large Files

For memory-efficient processing:

```python
from glazing.verbnet.loader import VerbNetLoader

# For memory-efficient streaming, use lazy loading
loader = VerbNetLoader(lazy=True, autoload=False)

# Stream verb classes one at a time
for verb_class in loader.iter_verb_classes():
    # Process each class without loading all into memory
    if "run" in [m.name for m in verb_class.members]:
        print(f"Found 'run' in class: {verb_class.id}")
        break
```

## Common Patterns

### Find Semantic Roles

```python
from glazing.verbnet.search import VerbNetSearch
from glazing.verbnet.loader import VerbNetLoader

# Loader automatically loads data
loader = VerbNetLoader()
search = VerbNetSearch(list(loader.classes.values()))

# Find all classes with an Agent role
agent_classes = []
for vc in search.verb_classes:
    if any(tr.role_type == "Agent" for tr in vc.themroles):
        agent_classes.append(vc.id)

print(f"Classes with Agent role: {len(agent_classes)}")
```

### Export to Custom Format

```python
import json
from glazing.framenet.loader import FrameNetLoader

# Loader automatically uses default path and loads data
loader = FrameNetLoader()
frames = loader.frames # Already loaded

# Export as simple JSON
simple_frames = []
for frame in frames[:10]:
    simple_frames.append({
        "id": frame.id,
        "name": frame.name,
        "definition": frame.definition.plain_text if frame.definition else "",
        "frame_elements": [fe.name for fe in frame.frame_elements],
    })

# Save to file
with open("frames_simple.json", "w") as f:
    json.dump(simple_frames, f, indent=2)
```

## Next Steps

- [CLI Documentation](user-guide/cli.md) for command-line options
- [Python API Guide](user-guide/python-api.md) for programming details
- [Cross-References](user-guide/cross-references.md) for connecting datasets
- [API Reference](api/index.md) for complete documentation