Proposal: Change migration strategy for database object SQL #11696

bferguso · 2024-12-17T19:26:42Z

Background

With the current patterm for deploying database objects (table, view, function, etc.), where the SQL is directly embedded in the migration .py files, often with several objects embedded in a single file, it can be quite difficult to trace the changes made to those objects over time.

It would be nice to externalize SQL for those database objects (not 1-off SQL statements, but the SQL that represents database objects) so that it can:

be read clearly in its own file, not embedded as a Python string
can be diff'd to previous versions of the same object and
creates an easily visible history of the database object as it changes through time

Proposed Solution

Create a sub-directory of the migrations directory, say migrations/sql
Any SQL that is used to create a database object not be directly included in the migration file, but as a separate file in the sql/ directory, one object per file
SQL filename could be something ala Liquibase or Flyway pattern , something liike vYYYY.MM.DD.SEQ__object_name.sql, eg: v2024.12.17.001__my_function_call.sql
For other projects, we've broken down the SQL by year so the list of files doesn't grow to be too long over time (eg sql/2024, sql/2025..)
Write a Python migration utility to allow those files (1 or more) to be easily digested by the migrations. First cut of ours is something like this:

def format_files_into_sql(files: [str], sql_dir: str):
    sql_string = ""
    for filename in files:
        file_path = os.path.join(sql_dir, filename)
        with open(file_path) as file:
            sql_string = sql_string + "\n" + file.read()
    return sql_string

When referencing those file(s) from the migration it would look something like this:

    sql_dir = os.path.join(os.path.dirname(__file__), "sql")
    files = [
        "2024-07-25_fossil_sample.fossil_name_vw.sql",
        "2024-07-25_fossil_sample.fossil_name_mv.sql",
...
        "2024-08-15_databc_grants.sql"
    ]
    sql_statement = format_files_into_sql(files, sql_dir)
    
    operations = [
        migrations.RunSQL(create_materialized_views, migrations.RunSQL.noop),
    ]

A PR for this could target v8.

The text was updated successfully, but these errors were encountered:

bferguso · 2024-12-17T19:27:23Z

fyi @chiatt, @chrabyrd - Here's a starting ticket from our conversation today.

jacobtylerwalls · 2024-12-19T17:43:47Z

I really like this @bferguso.

I mentioned at standup today my interest in ensuring there's a pathway to use whatever pattern we land on here with existing packages like django-pgtriggers and django-pgviews, but looking again at your proposal, I think it's already pretty compatible 👍

(Reason: the result of format_files_into_sql() can be passed into either a RunSQL forward operation directly as you modeled above or into a python class attribute expected by those packages that will then generate the migrations and reverse ops.)

The minor suggestion I would have on the implementation sketch would be to skip the concat'ing of all the strings together and instead return an array, since it's simple enough for a user to concat themselves with "/n".join() if they desire. (I'd probably want a RunSQL operation for each statement so I could control order and reverse for each one.)

bferguso · 2024-12-19T18:05:25Z

@jacobtylerwalls - Thanks for having a look.

re: the concatenating - I went back and forth on this, and I got caught (probably overthinking) about how to manage the granularity of statements, and landed on the idea that if you wanted to run them independently then that could be done by either writing separate migrations, thereby putting each statement in it's own transaction, or by implementing one-by-one execution by using it like this:

    operations = [
        migrations.RunSQL(format_files_into_sql("2024-07-25_fossil_sample.fossil_name_vw.sql", sql_dir), migrations.RunSQL.noop),
        migrations.RunSQL(format_files_into_sql("2024-07-25_fossil_sample.fossil_name_mv.sql", sql_dir), migrations.RunSQL.noop),
        ...
    ]

Having said all that, I agree that it might be cleaner and give the user more control over how each statement is processed, and the more I think about it the less of an opinion I have.

I guess another question is if we even want to use lists of files within the migrations or if that's opening a can of worms we'd rather leave unopened.

chiatt added this to pipeline Dec 17, 2024

aj-he mentioned this issue Jan 3, 2025

Fixes for 76 search_filters and create_spatial_views sql files archesproject/arches-her#1356

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal: Change migration strategy for database object SQL #11696

Proposal: Change migration strategy for database object SQL #11696

bferguso commented Dec 17, 2024

bferguso commented Dec 17, 2024

jacobtylerwalls commented Dec 19, 2024

bferguso commented Dec 19, 2024

Proposal: Change migration strategy for database object SQL #11696

Proposal: Change migration strategy for database object SQL #11696

Comments

bferguso commented Dec 17, 2024

Background

Proposed Solution

bferguso commented Dec 17, 2024

jacobtylerwalls commented Dec 19, 2024

bferguso commented Dec 19, 2024