Skip to content

Pre-Compute Dimension Reachability Graph #1593

@shangyian

Description

@shangyian

Problem

The various dimensions graph traversal endpoints (which help answer questions like "what dimensions are shared by these metrics" or "what metrics can use these dimensions") use recursive CTEs to traverse the graph. For large dimension graphs, these queries are slow and can overload the database.

Solution

We can pre-compute the transitive closure of dimension relationships in a dedicated table. This turns expensive recursive queries into simple joins. A transitive closure contains all indirect relationships in a graph. For example, consider the dimensions graph below:

┌─────────┐      ┌─────────┐      ┌─────────┐
│  User   │ ──▶  │ Country │ ──▶  │ Region  │
└─────────┘      └─────────┘      └─────────┘
     │
     ▼
┌─────────┐
│  Date   │
└─────────┘

The direct edges User -> Country, User -> Date, Country -> Region are already captured in the database today through dimension links. Storing the transitive closure would add the indirect relationships:

From Can Reach
User User, Country, Date, Region
Country Country, Region
Date Date
Region Region

With this pre-computation, we can now get from a recursive query at runtime into an O(1) table lookup, at the cost of maintaining the table when dimensions change.

Changes

  1. New database table dimension_reachability:
| source_dimension_id | target_dimension_id |
|---------------------|---------------------|
| user_dim            | user_dim            |  ← self-loop (can reach itself)
| user_dim            | country_dim         |  ← direct edge
| user_dim            | region_dim          |  ← transitive (via country)
| country_dim         | country_dim         |
| country_dim         | region_dim          |
| region_dim          | region_dim          |
  1. Functions to pre-compute and maintain this table:
  • compute_dimension_graph() - computes the full transitive closure of the entire dimension graph
  • add_dimension_link_reachability() - incremental update for reachability table when a dimension link is created
  • remove_dimension_node_reachability - incremental delete across reachability table when a dimension link is removed
  1. Optimized Query Function: we can change the graph traversal to use this pre-computed table instead of a recursive CTE.

Maintenance Strategy

n = total number of dimension nodes
E = total number of direct edges (aka dimension links)
k = number of dimension nodes that can reach the source (when linking from source to target)
m = number of dimension nodes that the target can reach (when linking from source to target)

Operation Action Cost
Initial Full rebuild (starts empty) O(n^3)
Dimension created Add self-loop O(1)
Dimension link created Incremental add O(k * m)
Dimension link deleted Incremental delete O(k * E)
Dimension deleted Remove all edges for that dimension O(1)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions