Skip to content

Dependency ordering: Generated schema/migration file has wrong DDL ordering with tables and functions, is not valid #11

@SegiNyn

Description

@SegiNyn

Hello, this issue was originally reported in the migra repository djrobstep/migra#196 by @alexkreidler but since it is yet to be fixed, I am reporting it here again.

The original message:

I'm using migra with tusker, a tool that creates a temporary PG database CREATE DATABASE; for both my schema and migrations separately, runs them from my SQL files on disk, and then diffs the output.

When I run tusker diff -r schema migrations, it generates something like this (truncated for brevity):

create schema if not exists "app";

create extension if not exists "pgcrypto" with schema "public" version '1.3';

create table "app"."datasets" (
    "id" uuid not null default generate_ulid_uuid(),
    [...]
);

[...]

CREATE OR REPLACE FUNCTION public.generate_ulid_uuid()
 RETURNS uuid
 LANGUAGE plpgsql
AS $function$
DECLARE
    [...]
END
$function$
;

Generally, this generated diff/migration is good, eg it creates schemas before tables before constraints. But the user defined functions are at the end, and my table as you can see relies on a function, so directly applying this migration without manually reordering it results in an error.

I've looked at some issues that seem related to dependency tracking of DDL statements/db objects: #8 for functions, #142 inherited tables, #7 views.

Based on #8, my impression is that migra currently relies on native Postgres functionality to track the dependencies between DB objects, something like this.

The first bullet from #189:

  • Proper dependency resolution (make a graph of all dependencies and topologically sort it)

makes me think it would be great to implement in Python, eg parsing the DDL with a library, applying some rules to the parsed objects to extract possible references to other objects, then creating a DAG and sorting. I have a few thoughts on that:

  • it could be great to make the dependency sorter a separate library/CLI tools so people with a bunch of SQL scripts could keep them in whatever order they want and then run it.
  • because there are often many valid topological sorts of a given DAG, it could be useful to have some additional rules to sort objects when dependencies are not an issue, like ordering by object type: schemas, tables, constraints, functions, like the existing logic. This "stylistic sort" could make things more consistent and human readable. I don't know how complex it would be to combine those two goals.

I believe that ensuring outputted migrations have a valid DB object creation order, so they can be immediately run would be a serious UX (#25) benefit and hopefully encourage adoption of the tool.

I'd love any feedback anyone has on the next steps towards implementing this. Thanks for this great tool!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions