-
Notifications
You must be signed in to change notification settings - Fork 18
Description
Problem
The various dimensions graph traversal endpoints (which help answer questions like "what dimensions are shared by these metrics" or "what metrics can use these dimensions") use recursive CTEs to traverse the graph. For large dimension graphs, these queries are slow and can overload the database.
Solution
We can pre-compute the transitive closure of dimension relationships in a dedicated table. This turns expensive recursive queries into simple joins. A transitive closure contains all indirect relationships in a graph. For example, consider the dimensions graph below:
┌─────────┐ ┌─────────┐ ┌─────────┐
│ User │ ──▶ │ Country │ ──▶ │ Region │
└─────────┘ └─────────┘ └─────────┘
│
▼
┌─────────┐
│ Date │
└─────────┘
The direct edges User -> Country, User -> Date, Country -> Region are already captured in the database today through dimension links. Storing the transitive closure would add the indirect relationships:
| From | Can Reach |
|---|---|
| User | User, Country, Date, Region |
| Country | Country, Region |
| Date | Date |
| Region | Region |
With this pre-computation, we can now get from a recursive query at runtime into an O(1) table lookup, at the cost of maintaining the table when dimensions change.
Changes
- New database table dimension_reachability:
| source_dimension_id | target_dimension_id |
|---------------------|---------------------|
| user_dim | user_dim | ← self-loop (can reach itself)
| user_dim | country_dim | ← direct edge
| user_dim | region_dim | ← transitive (via country)
| country_dim | country_dim |
| country_dim | region_dim |
| region_dim | region_dim |
- Functions to pre-compute and maintain this table:
compute_dimension_graph()- computes the full transitive closure of the entire dimension graphadd_dimension_link_reachability()- incremental update for reachability table when a dimension link is createdremove_dimension_node_reachability- incremental delete across reachability table when a dimension link is removed
- Optimized Query Function: we can change the graph traversal to use this pre-computed table instead of a recursive CTE.
Maintenance Strategy
n = total number of dimension nodes
E = total number of direct edges (aka dimension links)
k = number of dimension nodes that can reach the source (when linking from source to target)
m = number of dimension nodes that the target can reach (when linking from source to target)
| Operation | Action | Cost |
|---|---|---|
| Initial | Full rebuild (starts empty) | O(n^3) |
| Dimension created | Add self-loop | O(1) |
| Dimension link created | Incremental add | O(k * m) |
| Dimension link deleted | Incremental delete | O(k * E) |
| Dimension deleted | Remove all edges for that dimension | O(1) |