Open
Summary
Propose a reusable template system for common crawling patterns (e.g., news, e-commerce) to improve developer experience, reduce repetitive configuration, and standardize crawler behavior across similar site types.
Motivation
- Reduce repetitive configuration for common site types
- Standardize crawling patterns across the project
- Improve developer productivity
Detailed Design
Template Structure
{
  "name": "news-site",
  "description": "Template for news websites",
  "config": {
    "maxPages": 500,
    "maxDepth": 3,
    "spaRenderingEnabled": true,
    "extractTextContent": true,
    "politenessDelay": 1000
  },
  "patterns": {
    "articleSelectors": ["article", ".post", ".story"],
    "titleSelectors": ["h1", ".headline", ".title"],
    "contentSelectors": [".content", ".body", ".article-body"]
  }
}
API Endpoints
- GET /api/templates - List available templates
- POST /api/crawl/add-site-with-template - Use template for crawling
- POST /api/templates - Create new template
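A minimal sketch of how the three endpoints might behave, written as plain Python functions over an in-memory registry rather than a real HTTP server. The class name `TemplateRegistry`, the request fields `url`, `template`, and `overrides`, and the rule that per-site overrides win over template defaults are all assumptions made for illustration; the proposal does not specify them.

```python
from copy import deepcopy


class TemplateRegistry:
    """Hypothetical in-memory store standing in for the template API."""

    def __init__(self):
        self._templates = {}

    def create_template(self, template):
        # Would back POST /api/templates.
        self._templates[template["name"]] = deepcopy(template)
        return template["name"]

    def list_templates(self):
        # Would back GET /api/templates.
        return sorted(self._templates)

    def add_site_with_template(self, request):
        # Would back POST /api/crawl/add-site-with-template.
        template = self._templates[request["template"]]
        config = dict(template["config"])
        # Assumption: per-site overrides take precedence over template defaults.
        config.update(request.get("overrides", {}))
        return {
            "url": request["url"],
            "config": config,
            "patterns": deepcopy(template["patterns"]),
        }


registry = TemplateRegistry()
registry.create_template({
    "name": "news-site",
    "description": "Template for news websites",
    "config": {"maxPages": 500, "maxDepth": 3, "politenessDelay": 1000},
    "patterns": {"titleSelectors": ["h1", ".headline", ".title"]},
})
job = registry.add_site_with_template({
    "url": "https://example.com",
    "template": "news-site",
    "overrides": {"maxDepth": 2},
})
```

Here `job["config"]` keeps the template's `maxPages` and `politenessDelay` but uses the overridden `maxDepth`, which is one plausible way to let a template supply defaults without locking every site into them.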
Implementation Plan
- Phase 1: Core template system
- Phase 2: Template API endpoints
- Phase 3: Pre-built templates
- Phase 4: Template validation and testing
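The validation step in Phase 4 could look roughly like the sketch below. It checks only what the example template shows: the three top-level fields, the types of the five config keys, and that each pattern entry is a list of selector strings. The function name `validate_template`, the `CONFIG_TYPES` table, and the decision to reject unknown config keys are assumptions, not part of the proposal.

```python
# Expected types for the config keys that appear in the example template.
CONFIG_TYPES = {
    "maxPages": int,
    "maxDepth": int,
    "spaRenderingEnabled": bool,
    "extractTextContent": bool,
    "politenessDelay": int,
}


def validate_template(template):
    """Return a list of problems; an empty list means the template passed."""
    errors = []
    for key in ("name", "config", "patterns"):
        if key not in template:
            errors.append(f"missing field: {key}")

    for key, value in template.get("config", {}).items():
        expected = CONFIG_TYPES.get(key)
        if expected is None:
            # Assumption: unknown keys are rejected rather than ignored.
            errors.append(f"unknown config key: {key}")
            continue
        wrong_type = not isinstance(value, expected)
        # bool is a subclass of int in Python, so reject it for int fields.
        if expected is int and isinstance(value, bool):
            wrong_type = True
        if wrong_type:
            errors.append(f"config.{key} must be {expected.__name__}")

    for key, selectors in template.get("patterns", {}).items():
        is_str_list = isinstance(selectors, list) and all(
            isinstance(s, str) for s in selectors
        )
        if not is_str_list:
            errors.append(f"patterns.{key} must be a list of strings")
    return errors
```

Returning a list of messages instead of raising on the first failure lets an endpoint such as POST /api/templates report every problem in a single response.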
Testing Strategy
- Unit tests for template loading/application
- Integration tests with real websites
- Performance benchmarks
Migration Strategy
- Backward compatible with existing API
- Gradual migration path
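One way the backward-compatibility goal might be met is to treat the template as an optional layer: a request with no `template` field resolves exactly as it does today. In this sketch the `DEFAULT_CONFIG` values, the `TEMPLATES` store, the `resolve_config` function, and the request fields `template` and `config` are all hypothetical stand-ins for whatever the existing API uses.

```python
# Hypothetical defaults standing in for whatever the existing API applies today.
DEFAULT_CONFIG = {"maxPages": 100, "maxDepth": 2}

# Hypothetical template store, reduced to each template's config section.
TEMPLATES = {
    "news-site": {"maxPages": 500, "maxDepth": 3, "politenessDelay": 1000},
}


def resolve_config(request):
    """Build the effective crawl config for a legacy or template-based request."""
    config = dict(DEFAULT_CONFIG)
    name = request.get("template")
    if name is not None:
        # New path: template values layer on top of the existing defaults.
        config.update(TEMPLATES[name])
    # Assumption: explicit per-request settings always win.
    config.update(request.get("config", {}))
    return config
```

Because the template lookup is skipped when the field is absent, existing callers see no change, and individual sites can adopt templates one at a time.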