Search and Health Endpoints with Multilingual Support #42

Open
RichardCMX wants to merge 36 commits into main from feature/search-health-endpoints

Conversation

@RichardCMX
Collaborator

📋 Summary

Implements intelligent search functionality with fuzzy matching and accent-insensitive search for multilingual support, plus comprehensive health monitoring endpoints. Adds interactive Swagger UI for API testing and exploration.

This PR resolves issue #28 (Search/autocomplete endpoints and health checks)

✨ Features Added

🔍 Search API with Advanced Matching

  • GET /api/search/ - Unified search endpoint with intelligent ranking
    • Query parameters:

      • q (required): Search query string
      • type (optional): 'stops', 'routes', or 'all' (default)
      • limit (optional): Maximum results (1-100), defaults to 20
      • feed_id (optional): Limit search to specific feed
    • Key Features:

      • 🎯 Relevance Scoring (0.0-1.0): Exact matches = 1.0, sorted by relevance
      • 🔍 Fuzzy Text Matching: PostgreSQL pg_trgm extension handles typos
      • 🌐 Multilingual Support: Unaccent extension for accent-insensitive search
        • "San José" matches "San Jose" and vice versa
        • Perfect for Spanish, Portuguese, and other accented languages
      • 📊 Multi-field Search: Searches names, descriptions across stops and routes
      • Optimized Queries: Trigram similarity, falling back to basic text matching (icontains) if the PostgreSQL extensions are unavailable
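
The query parameters listed above could be validated roughly as follows. This is a minimal, framework-free sketch rather than the code in this PR: the names `SearchParams` and `parse_search_params` are hypothetical, and rejecting an out-of-range `limit` (instead of clamping it) is an assumption.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

ALLOWED_TYPES = {"stops", "routes", "all"}
DEFAULT_LIMIT = 20
MIN_LIMIT, MAX_LIMIT = 1, 100


@dataclass(frozen=True)
class SearchParams:
    """Validated search parameters (hypothetical container)."""
    q: str
    search_type: str
    limit: int
    feed_id: Optional[str]


def parse_search_params(params: Mapping[str, str]) -> SearchParams:
    """Validate raw query-string values; raise ValueError on bad input."""
    q = params.get("q", "").strip()
    if not q:
        raise ValueError("q is required")

    search_type = params.get("type", "all")
    if search_type not in ALLOWED_TYPES:
        raise ValueError("type must be 'stops', 'routes', or 'all'")

    raw_limit = params.get("limit")
    if raw_limit is None:
        limit = DEFAULT_LIMIT
    else:
        try:
            limit = int(raw_limit)
        except ValueError:
            raise ValueError("limit must be an integer") from None
        # Assumption: out-of-range values are rejected rather than clamped.
        if not MIN_LIMIT <= limit <= MAX_LIMIT:
            raise ValueError("limit must be between 1 and 100")

    return SearchParams(q, search_type, limit, params.get("feed_id"))
```

In a view, a `ValueError` raised here would typically be translated into a 400 response.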

🏥 Health & Monitoring Endpoints

  • GET /api/health/ - Simple health check

    • Returns {"status": "ok", "timestamp": "..."}
    • Lightweight, no database queries
    • Perfect for basic uptime monitoring and load balancer health checks
  • GET /api/ready/ - Comprehensive readiness check

    • Returns 200 when ready, 503 when not ready
    • Validates:
      • Database connectivity (PostgreSQL)
      • Current GTFS feed availability
    • Returns detailed status including feed_id, database health, and timestamp
    • Ideal for Kubernetes readiness probes and deployment validation
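
The readiness decision described above can be sketched as a pure function. This is an illustration under stated assumptions, not the PR's implementation: `evaluate_readiness`, its two callables, and the `database_ok` field name are hypothetical, while the `ready`/`not_ready` status values and the 200/503 codes follow the description.

```python
from datetime import datetime, timezone
from typing import Callable, Optional, Tuple


def evaluate_readiness(
    check_database: Callable[[], None],
    get_current_feed_id: Callable[[], Optional[str]],
) -> Tuple[int, dict]:
    """Return (HTTP status code, response body) for a readiness probe."""
    database_ok = True
    feed_id = None
    try:
        check_database()                 # should raise if the DB is unreachable
        feed_id = get_current_feed_id()  # None when no current feed exists
    except Exception:
        database_ok = False

    ready = database_ok and feed_id is not None
    body = {
        "status": "ready" if ready else "not_ready",
        "database_ok": database_ok,      # hypothetical field name
        "feed_id": feed_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return (200 if ready else 503), body
```

Keeping the decision separate from the view makes each failure mode (database down, no current feed) easy to test without a live database.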

🗄️ PostgreSQL Extensions

  • pg_trgm extension - Trigram similarity for fuzzy text matching
  • unaccent extension - Accent-insensitive text matching for multilingual support
  • Configured via:
    • docker/db/init.sql - Automatic setup on database creation
    • datahub/test_runner.py - Custom test runner ensures extensions in test database
    • Works in both development and test environments
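
To make the roles of the two extensions concrete, here is a pure-Python approximation of what they compute. It is only an illustration: the real work happens inside PostgreSQL, `strip_accents` approximates `unaccent` via Unicode decomposition (the extension itself uses a rules file), and `trigrams`/`similarity` mimic the pg_trgm convention of padding each lowercased word with two leading spaces and one trailing space, then dividing shared trigrams by the size of the union.

```python
import re
import unicodedata


def strip_accents(text: str) -> str:
    """Approximate unaccent: decompose characters and drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def trigrams(text: str) -> set:
    """Build the trigram set of a string, pg_trgm style."""
    grams = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by total distinct trigrams, after unaccenting."""
    ta, tb = trigrams(strip_accents(a)), trigrams(strip_accents(b))
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(similarity("San José", "San Jose"))        # 1.0
print(similarity("Univercidad", "Universidad"))  # between 0 and 1
```

Because accents are removed before trigrams are built, an accented and an unaccented spelling score as an exact match, while a typo only lowers the score instead of eliminating the result.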

🎮 Interactive API Documentation

  • Swagger UI - Added at /api/docs/swagger/

    • Interactive forms for all API endpoints
    • "Try it out" functionality for live testing
    • Real-time request/response preview
    • Parameter descriptions and validation
    • Perfect for exploring and testing the search API
  • ReDoc - Available at /api/docs/

    • Clean, organized API documentation
    • Detailed endpoint descriptions
    • Request/response examples

🧪 Testing

Search Endpoint Tests (test_search.py)

  • Exact name matching validation
  • Partial name matching tests
  • Description-based search tests
  • Type filtering (stops, routes, all)
  • Limit parameter validation (bounds checking)
  • Relevance score validation
  • Query parameter requirement tests
  • Comprehensive edge case coverage

Health Endpoint Tests (test_health.py)

  • Health endpoint structure validation
  • Ready endpoint with/without current feed
  • Database connectivity error handling
  • Feed availability checks
  • Multiple current feeds handling
  • Response structure validation
  • Status value validation (ready/not_ready)
  • Error condition testing

📚 Documentation

  • CHANGELOG.md - Comprehensive documentation of all features
  • README.md - Updated with:
    • Multilingual search documentation
    • Accent-insensitive search examples
    • Interactive API documentation section
    • Health and readiness endpoint usage
    • Swagger UI and ReDoc information
  • Test Documentation - All tests include descriptive docstrings

📝 Commits

  • Merge from feat/api-read-endpoints (includes DAL and previous features)
  • feat: add unaccent support for multilingual search
  • fix(db): add missing pg_trgm extension setup (cherry-picked from future branch)
  • feat: add Swagger UI for interactive API documentation
  • docs: add CHANGELOG for search and health endpoints feature
  • docs: update README with unaccent and Swagger UI documentation

🔍 Related Issues

Closes #28

✅ Checklist

  • Code follows project style guidelines
  • Tests added and passing (search and health endpoints)
  • Documentation updated (README, CHANGELOG)
  • No breaking changes introduced
  • OpenAPI schema updated with drf-spectacular
  • PostgreSQL extensions configured
  • Interactive Swagger UI added

📸 Example Usage

Search with Multilingual Support

# Search with accented characters
curl "http://localhost:8000/api/search/?q=San+José&type=stops"

# Search without accents (still finds "San José")
curl "http://localhost:8000/api/search/?q=San+Jose&type=stops"

# Fuzzy search (handles typos)
curl "http://localhost:8000/api/search/?q=Univercidad&type=routes"

# Search everything
curl "http://localhost:8000/api/search/?q=UCR&limit=10"
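
When calling the endpoint from code rather than from a shell, it is safer to percent-encode the query than to rely on raw accented characters in the URL. The helper below is a hypothetical client-side sketch (`build_search_url` is not part of this PR) using only the standard library.

```python
from typing import Optional
from urllib.parse import urlencode


def build_search_url(
    base_url: str,
    q: str,
    search_type: Optional[str] = None,
    limit: Optional[int] = None,
    feed_id: Optional[str] = None,
) -> str:
    """Build a /api/search/ URL with a properly encoded query string."""
    params = {"q": q}
    if search_type is not None:
        params["type"] = search_type
    if limit is not None:
        params["limit"] = str(limit)
    if feed_id is not None:
        params["feed_id"] = feed_id
    # urlencode encodes spaces as "+" and non-ASCII characters as UTF-8 percent escapes.
    return f"{base_url.rstrip('/')}/api/search/?{urlencode(params)}"


print(build_search_url("http://localhost:8000", "San José", search_type="stops"))
# http://localhost:8000/api/search/?q=San+Jos%C3%A9&type=stops
```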

Health Checks

# Simple health check
curl "http://localhost:8000/api/health/"

# Readiness check
curl "http://localhost:8000/api/ready/"

Interactive Testing

Visit in your browser:

  • Swagger UI: http://localhost:8000/api/docs/swagger/
  • ReDoc: http://localhost:8000/api/docs/

🎯 Next Steps

After merging, the following branches will need to be rebased/merged to include these changes:

  • feature/auth-rate-limits
  • feature/client-management
  • feature/security-performance
  • feature/admin-panel-metrics
  • feature/unit-integration-contract-tests

💡 Technical Notes

  • Search Performance: Trigram indexing provides sub-second search on thousands of stops/routes
  • Multilingual: Unaccent extension normalizes Unicode characters for matching
  • Graceful Degradation: Search falls back to basic text matching (icontains) if the extensions are unavailable
  • Health Checks: /api/health/ is lightweight (no DB queries), /api/ready/ is comprehensive (validates the database and the current feed)
  • Test Isolation: A custom test runner ensures the PostgreSQL extensions are available in the test database

RichardCMX and others added 30 commits September 28, 2025 17:22
…as HH:MM:SS; validate stop_id; add Fuseki flags and DAL docs; add /api/schedule/departures/ endpoint
…tures endpoint; tests(api): add tests for DAL-backed schedule departures
…add Fuseki dev guide and update README/architecture; fix duplicate FUSEKI_ENDPOINT
…tion (LimitOffsetPagination, page size 50)
- Add /api/alerts route (ServiceAlertViewSet)
- Add /api/arrivals endpoint integrating with ETAS_API_URL (Project 4)
- Add /api/status endpoint reporting DB/Redis/Fuseki health
- Update OpenAPI (datahub.yml) for pagination + new endpoints
…Message, StopTimeUpdate to real model fields
…, stop-time-updates), pagination, OpenAPI docs, ETAs config, and testing instructions
…rules, feed-messages, stop-time-updates; polish examples for core endpoints
- Implement /api/search/ endpoint with ranking for stops and routes
  - Support for fuzzy text matching with relevance scoring
  - Configurable search types (stops, routes, all)
  - Limit and pagination support
  - Feed-specific search capability

- Add /api/health/ endpoint for basic health checks
  - Simple status check returning 200 OK
  - Minimal response for lightweight monitoring

- Add /api/ready/ endpoint for readiness checks
  - Database connectivity verification
  - Current feed availability check
  - Returns 503 when not ready to serve requests

- Comprehensive test coverage for all new endpoints
  - Search functionality tests with various scenarios
  - Health endpoint validation tests
  - Edge cases and error handling tests

- Full OpenAPI documentation integration
- Proper error handling and validation
- Follows existing code patterns and conventions
- Fix missing FloatField import in views.py for search functionality
- Add comprehensive edge case tests for search:
  - Case insensitivity testing
  - Special characters and Unicode handling
  - Numbers and symbols in queries
  - Very long query handling
- Add additional health endpoint test for multiple current feeds
- Ensure robust error handling and graceful degradation
- Add custom API root view to include search, health, and ready endpoints
- Update /api/ to show all endpoints including new search and health services
- Add comprehensive README documentation for Issue #28:
  - Search API with intelligent ranking and fuzzy matching
  - Health monitoring endpoints for load balancers and Kubernetes
  - Complete usage examples with curl commands
  - Response format documentation
  - Integration examples (Docker health checks, K8s probes)

The API root now shows:
- search: http://localhost:8000/api/search/
- health: http://localhost:8000/api/health/
- ready: http://localhost:8000/api/ready/
- Remove Fuseki Docker service from docker-compose.yml
- Remove fuseki_data volume
- Delete storage/fuseki_schedule.py implementation
- Delete api/tests/test_fuseki_schedule.py integration tests
- Remove docker/fuseki/ configuration directory
- Remove docs/dev/fuseki.md documentation
- Update storage/factory.py to use only PostgreSQL repository
- Remove FUSEKI_ENABLED and FUSEKI_ENDPOINT from settings.py
- Remove Fuseki environment variables from .env.local.example
- Update README.md and docs/architecture.md to remove Fuseki references

PostgreSQL with Redis caching is now the sole storage backend.
- Document Data Access Layer implementation
- Document new /api/schedule/departures/ endpoint
- Document Redis caching configuration
- Document Fuseki removal
- Follow Keep a Changelog format
- Add class-level docstring explaining DAL testing
- Document setUp method for test data preparation
- Add docstrings for test_returns_404_when_stop_missing
- Add docstrings for test_returns_departures_with_expected_shape
- Improve test readability and maintainability
- Document test structure and organization
- Explain test coverage for schedule departures endpoint
- Provide examples for running tests
- Document test data setup approach
- Add guidelines for adding new tests
- Document /api/arrivals/ endpoint with ETA service integration
- Document /api/status/ health check endpoint
- Document /api/alerts/, /api/feed-messages/, /api/stop-time-updates/
- Document global pagination implementation
- Document ETAS_API_URL configuration
- Document comprehensive test suite for arrivals endpoint
- Add class-level docstring explaining ETA service integration testing
- Document test_arrivals_returns_expected_shape
- Document test_arrivals_propagates_upstream_error
- Document test_arrivals_requires_stop_id
- Document test_arrivals_accepts_wrapped_results_object
- Document test_arrivals_handles_unexpected_upstream_structure_as_empty_list
- Document limit validation tests
- Document test_arrivals_returns_501_if_not_configured
- Add test_arrivals.py documentation
- Document all 9 test cases for arrivals endpoint
- Add examples for running arrivals tests
- Document mocked HTTP request testing approach
- Update coverage section with new test areas
- Add unittest.mock to dependencies
RichardCMX and others added 6 commits November 13, 2025 13:09
- Update search queries to use __unaccent lookup for accent-insensitive matching
- Support multilingual searches (Spanish, Portuguese, etc.)
- Searches like 'San José' now match 'San Jose' and vice versa
- Trigram similarity now operates on unaccented text for better fuzzy matching

This improves search experience for Costa Rican transit data with accented characters.
BUG DISCOVERED:
Issue #28 (search/autocomplete endpoints) implemented TrigramSimilarity
for fuzzy text matching but never created the required PostgreSQL pg_trgm
extension. The code silently fell back to basic string matching (icontains)
via try/except blocks in api/views.py lines 1064-1104 and 1125-1179.

This bug went undetected because:
- Original tests validated API response structure, not trigram functionality
- Exception handling masked the missing extension
- Fallback logic allowed endpoints to return results

IMPACT:
- Search accuracy degraded (no fuzzy matching)
- Search performance reduced (no trigram indexing)
- Feature deployed incomplete

FIX:
Add PostgreSQL extension setup for both main and test databases:

1. docker/db/init.sql
   - Creates pg_trgm extension in dev/prod database on first container run
   - Mounted via docker-compose.yml at /docker-entrypoint-initdb.d/
   - Enables TrigramSimilarity queries in search endpoints

2. datahub/test_runner.py
   - Custom Django test runner (InfobusTestRunner)
   - Creates pg_trgm extension in isolated test database
   - Required because Django doesn't copy extensions to test DB

3. datahub/settings.py
   - Configure TEST_RUNNER to use InfobusTestRunner
   - Ensures extensions available during test execution

4. docker-compose.yml
   - Mount init.sql to PostgreSQL initialization directory
   - Extension created automatically on database first start

VERIFICATION:
Comprehensive integration tests now verify actual trigram functionality
instead of just API response structure, catching this missing setup.
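
One way to keep a fallback like the one described above from hiding a missing extension is to log the failure and report which strategy produced the results, so that tests can assert on it. The sketch below is hypothetical and framework-free: `search_with_fallback` and the strategy labels are illustrative names, not code from api/views.py.

```python
import logging
from typing import Callable, List, Tuple

logger = logging.getLogger("search")


def search_with_fallback(
    fuzzy_search: Callable[[str], List[str]],
    basic_search: Callable[[str], List[str]],
    query: str,
) -> Tuple[List[str], str]:
    """Try fuzzy search first; on failure, log it and use basic matching.

    Returns the results plus the name of the strategy that produced them.
    """
    try:
        return fuzzy_search(query), "trigram"
    except Exception:
        # Log with traceback so a missing extension is visible in the logs.
        logger.warning("Trigram search failed; using basic matching", exc_info=True)
        return basic_search(query), "basic"
```

A test can then require the `"trigram"` strategy in an environment where the extensions are expected to exist, which fails loudly when the setup is missing.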

Resolves incomplete implementation from commit ea877e2 (Issue #28).
- Add SpectacularSwaggerView to api/urls.py
- Available at /api/docs/swagger/
- Provides interactive forms for testing all API endpoints
- Complements existing ReDoc documentation at /api/docs/
- Document /api/search/ with fuzzy matching and unaccent support
- Document /api/health/ and /api/ready/ endpoints
- Document PostgreSQL extensions (pg_trgm, unaccent)
- Document Swagger UI and ReDoc integration
- Document comprehensive test suites
- Add multilingual search documentation with unaccent extension
- Document accent-insensitive search (San Jose matches San José)
- Add Interactive API Documentation section
- Document Swagger UI at /api/docs/swagger/
- Document ReDoc and DRF Browsable API
- Improve search feature descriptions
Development

Successfully merging this pull request may close these issues.

Search/autocomplete and health checks

2 participants