Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Change migration strategy for database object SQL #11696

Open
bferguso opened this issue Dec 17, 2024 · 3 comments
Open

Proposal: Change migration strategy for database object SQL #11696

bferguso opened this issue Dec 17, 2024 · 3 comments

Comments

@bferguso
Copy link
Contributor

Background

With the current patterm for deploying database objects (table, view, function, etc.), where the SQL is directly embedded in the migration .py files, often with several objects embedded in a single file, it can be quite difficult to trace the changes made to those objects over time.

It would be nice to externalize SQL for those database objects (not 1-off SQL statements, but the SQL that represents database objects) so that it can:

  1. be read clearly in its own file, not embedded as a Python string
  2. can be diff'd to previous versions of the same object and
  3. creates an easily visible history of the database object as it changes through time

Proposed Solution

  1. Create a sub-directory of the migrations directory, say migrations/sql
  2. Any SQL that is used to create a database object not be directly included in the migration file, but as a separate file in the sql/ directory, one object per file
  3. SQL filename could be something ala Liquibase or Flyway pattern , something liike vYYYY.MM.DD.SEQ__object_name.sql, eg: v2024.12.17.001__my_function_call.sql
  4. For other projects, we've broken down the SQL by year so the list of files doesn't grow to be too long over time (eg sql/2024, sql/2025..)
  5. Write a Python migration utility to allow those files (1 or more) to be easily digested by the migrations. First cut of ours is something like this:
def format_files_into_sql(files: [str], sql_dir: str):
    sql_string = ""
    for filename in files:
        file_path = os.path.join(sql_dir, filename)
        with open(file_path) as file:
            sql_string = sql_string + "\n" + file.read()
    return sql_string
  1. When referencing those file(s) from the migration it would look something like this:
    sql_dir = os.path.join(os.path.dirname(__file__), "sql")
    files = [
        "2024-07-25_fossil_sample.fossil_name_vw.sql",
        "2024-07-25_fossil_sample.fossil_name_mv.sql",
...
        "2024-08-15_databc_grants.sql"
    ]
    sql_statement = format_files_into_sql(files, sql_dir)
    
    operations = [
        migrations.RunSQL(create_materialized_views, migrations.RunSQL.noop),
    ]

A PR for this could target v8.

@chiatt chiatt added this to pipeline Dec 17, 2024
@bferguso
Copy link
Contributor Author

fyi @chiatt, @chrabyrd - Here's a starting ticket from our conversation today.

@jacobtylerwalls
Copy link
Member

I really like this @bferguso.

I mentioned at standup today my interest in ensuring there's a pathway to use whatever pattern we land on here with existing packages like django-pgtriggers and django-pgviews, but looking again at your proposal, I think it's already pretty compatible 👍

(Reason: the result of format_files_into_sql() can be passed into either a RunSQL forward operation directly as you modeled above or into a python class attribute expected by those packages that will then generate the migrations and reverse ops.)


The minor suggestion I would have on the implementation sketch would be to skip the concat'ing of all the strings together and instead return an array, since it's simple enough for a user to concat themselves with "/n".join() if they desire. (I'd probably want a RunSQL operation for each statement so I could control order and reverse for each one.)

@bferguso
Copy link
Contributor Author

@jacobtylerwalls - Thanks for having a look.

re: the concatenating - I went back and forth on this, and I got caught (probably overthinking) about how to manage the granularity of statements, and landed on the idea that if you wanted to run them independently then that could be done by either writing separate migrations, thereby putting each statement in it's own transaction, or by implementing one-by-one execution by using it like this:

    operations = [
        migrations.RunSQL(format_files_into_sql("2024-07-25_fossil_sample.fossil_name_vw.sql", sql_dir), migrations.RunSQL.noop),
        migrations.RunSQL(format_files_into_sql("2024-07-25_fossil_sample.fossil_name_mv.sql", sql_dir), migrations.RunSQL.noop),
        ...
    ]

Having said all that, I agree that it might be cleaner and give the user more control over how each statement is processed, and the more I think about it the less of an opinion I have.

I guess another question is if we even want to use lists of files within the migrations or if that's opening a can of worms we'd rather leave unopened.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

No branches or pull requests

2 participants