@MALathon
Owner

Summary

Implement a global registry for storing and retrieving site schemas, with support for automatic URL-based schema detection. This builds on #11 (SiteSchema base class).

Changes

New Files

  • fetcharoo/schemas/registry.py - Registry implementation
  • tests/test_schemas_registry.py - 35 comprehensive tests

Updated Files

  • fetcharoo/schemas/__init__.py - Export registry functions

Registry API

from fetcharoo.schemas import (
    SiteSchema,
    register_schema,
    get_schema,
    detect_schema,
    list_schemas,
    schema,  # decorator
)

# Register a schema
my_schema = SiteSchema(
    name='my_site',
    url_pattern=r'https://mysite\.com/.*',
    sort_by='numeric'
)
register_schema(my_schema)

# Or use the decorator
@schema
class SpringerBook(SiteSchema):
    def __init__(self):
        super().__init__(
            name='springer_book',
            url_pattern=r'https?://link\.springer\.com/book/.*'
        )

# Auto-detect schema from URL
detected = detect_schema('https://link.springer.com/book/10.1007/978-3-031-41026-0')
print(detected.name)  # 'springer_book'

# List all schemas
print(list_schemas())  # ['my_site', 'springer_book']

Functions

Function                                    Description
register_schema(schema, overwrite=False)    Add schema to registry
unregister_schema(name)                     Remove schema by name
get_schema(name)                            Get schema by name
detect_schema(url)                          Auto-detect schema from URL
list_schemas()                              List all schema names (sorted)
get_all_schemas()                           Get all schemas as dict
clear_registry()                            Clear all schemas
is_registered(name)                         Check if schema exists
schema_count()                              Count registered schemas
@schema                                     Decorator to register class/instance
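
For reviewers who want a mental model of how the @schema decorator can accept either a class or an instance, here is a minimal sketch. It is not the code in fetcharoo/schemas/registry.py: the SiteSchema class below is a two-field stand-in, the module-level dict is an assumed storage choice, and returning None from get_schema for an unknown name is an assumption rather than confirmed behaviour.

```python
from typing import Dict, List, Optional


class SiteSchema:
    """Hypothetical stand-in for fetcharoo's SiteSchema; only two fields kept."""

    def __init__(self, name: str, url_pattern: str) -> None:
        self.name = name
        self.url_pattern = url_pattern


# Assumed storage: a module-level dict keyed by schema name.
_registry: Dict[str, SiteSchema] = {}


def register_schema(schema_obj: SiteSchema) -> None:
    """Store the schema under its name (validation omitted in this sketch)."""
    _registry[schema_obj.name] = schema_obj


def get_schema(name: str) -> Optional[SiteSchema]:
    """Return the schema registered under `name`, or None (assumed) if absent."""
    return _registry.get(name)


def list_schemas() -> List[str]:
    """Return all registered names in sorted order."""
    return sorted(_registry)


def schema(target):
    """Register a SiteSchema subclass (built with no arguments) or an instance."""
    instance = target() if isinstance(target, type) else target
    register_schema(instance)
    return target  # hand back the original so the decorated name stays usable


@schema
class SpringerBook(SiteSchema):
    def __init__(self) -> None:
        super().__init__(
            name="springer_book",
            url_pattern=r"https?://link\.springer\.com/book/.*",
        )


# The same function also works as a plain call on an instance.
schema(SiteSchema(name="my_site", url_pattern=r"https://mysite\.com/.*"))

print(list_schemas())  # ['my_site', 'springer_book']
```

Returning the original target from the decorator means SpringerBook is still a class after decoration, so it can be subclassed or instantiated again elsewhere.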

Test Plan

  • 35 new tests for registry functionality
  • All 304 tests pass (269 existing + 35 new)

Next Steps

This PR enables:

Closes

Closes #12

Implement a global registry for storing and retrieving site schemas,
with support for automatic URL-based schema detection.

Registry functions:
- register_schema(): Add schema to registry
- unregister_schema(): Remove schema from registry
- get_schema(): Retrieve schema by name
- detect_schema(): Auto-detect schema from URL pattern
- list_schemas(): List all registered schema names
- get_all_schemas(): Get all schemas as dict
- clear_registry(): Clear all schemas (for testing)
- is_registered(): Check if schema exists
- schema_count(): Count registered schemas
- @schema decorator: Register class or instance

Features:
- Duplicate name detection with optional overwrite
- Type validation for registration
- Sorted listing of schema names
- First-match-wins auto-detection

Includes 35 comprehensive tests.

Closes #12
@codecov-commenter

Codecov Report

❌ Patch coverage is 99.59677% with 1 line in your changes missing coverage. Please review.

Files with missing lines          Patch %    Lines
tests/test_schemas_registry.py    99.50%     1 Missing ⚠️

Development

Successfully merging this pull request may close these issues.

Implement schema registry with auto-detection