Skip to content

feat(queries): add classification and metadata to saved queries#134

Merged
shefeek-jinnah merged 4 commits intomainfrom
shefeek/classify_queries
Feb 26, 2026
Merged

feat(queries): add classification and metadata to saved queries#134
shefeek-jinnah merged 4 commits intomainfrom
shefeek/classify_queries

Conversation

@shefeek-jinnah
Copy link
Contributor

@shefeek-jinnah shefeek-jinnah commented Feb 26, 2026

Summary

  • Add automatic SQL query classification that analyzes DataFusion logical plans and categorizes saved queries into: full_scan, projection, filtered_scan, point_lookup, aggregation, join
  • Store rich classification metadata per version: category, num_tables, has_predicate, has_join, has_aggregation, has_group_by, has_order_by, has_limit
  • Add user-provided tags (array) and description (text) fields to saved queries
  • Add category_override and table_size_override columns for future manual overrides (read APIs use COALESCE(category_override, category))
  • Add v7 migrations for both SQLite and Postgres

API Changes

  • POST /v1/queries — accepts optional tags and description; response includes classification fields
  • PUT /v1/queries/{id} — accepts optional tags and description; updates without creating new version when only metadata changes
  • GET /v1/queries/{id} and GET /v1/queries — responses include tags, description, and classification fields
  • GET /v1/queries/{id}/versions — version entries include classification fields
  • Classification fields are omitted from JSON when null (e.g., if plan analysis fails)
  • Tags are trimmed and validated (max 50 tags, max 100 chars each; whitespace-only rejected)
  • Description validated (max 10,000 chars)

Design Decisions

  • Classification is best-effort: if logical plan building fails (e.g., missing tables), fields are stored as NULL and a warning is logged
  • Classification uses a fresh SessionContext per call to avoid racing with concurrent queries
  • Tags stored as JSON text in a single column for simplicity
  • Tags JSON deserialization failures log a warning and fall back to empty (consistent with timestamp parsing pattern)

@shefeek-jinnah shefeek-jinnah force-pushed the shefeek/classify_queries branch from 257fca2 to c6573e4 Compare February 26, 2026 13:55
@shefeek-jinnah shefeek-jinnah marked this pull request as ready for review February 26, 2026 14:18
@shefeek-jinnah shefeek-jinnah merged commit 4c27c5a into main Feb 26, 2026
11 checks passed
@shefeek-jinnah shefeek-jinnah deleted the shefeek/classify_queries branch February 26, 2026 14:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant