Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-2166] [Feature] Use ALTER TABLE to expand column / field length #335

Open
3 tasks done
alejandrofm opened this issue Feb 23, 2023 · 6 comments
Open
3 tasks done
Labels
good_first_issue Good for newcomers pkg:dbt-redshift type:enhancement New feature or request

Comments

@alejandrofm
Copy link

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-redshift functionality, rather than a Big Idea better suited to a discussion

Describe the feature

When there is a change in a field type and the new field size is "bigger" than the previous, DBT should ALTER the column not recreate on a new column.

Describe alternatives you've considered

ALTER TABLE schema.table ALTER COLUMN field TYPE varchar(1024);
instead of creating a new field, copying the data, renaming, and dropping the old column.
This is a notably better approach on Redshift because of the overhead it generates and the time spent later to run VACUUM on a big table with an update on all rows.

It includes another step, but would do the same, when possible when reducing field size:

  • Check if the old values fit on the new size (Select max(len(field)) from schema.table;)
  • If it fits, then ALTER TABLE
  • if not, then ERROR

Who will this benefit?

Redshift users who have changes to their schema

Are you interested in contributing this feature?

No response

Anything else?

No response

@github-actions github-actions bot changed the title [Feature] when lenght of field changes to a higher value run ALTER TABLE not recreate field [CT-2166] [Feature] when lenght of field changes to a higher value run ALTER TABLE not recreate field Feb 23, 2023
@dbeatty10 dbeatty10 changed the title [CT-2166] [Feature] when lenght of field changes to a higher value run ALTER TABLE not recreate field [CT-2166] [Feature] when length of field changes to a higher value run ALTER TABLE not recreate field Feb 23, 2023
@dbeatty10 dbeatty10 self-assigned this Feb 23, 2023
@alejandrofm alejandrofm changed the title [CT-2166] [Feature] when length of field changes to a higher value run ALTER TABLE not recreate field [CT-2166] [Feature] when length of field changes to a higher value run ALTER TABLE don't recreate field Feb 23, 2023
@dbeatty10
Copy link
Contributor

Thanks for reaching out @alejandrofm !

Agreed that a simple alter table ... is preferable if/when possible.

How to try this implementation

Here is the default implementation which all adapters will inherit unless they override the implementation of alter_column_type explicitly.

dbt-snowflake and dbt-spark are examples of dbt adapters that override alter_column_type with an implementation like you are suggesting:

So doing a similar implementation in Redshift might be as simple as adding the following to dbt/include/redshift/macros/adapters.sql:

{% macro redshift__alter_column_type(relation, column_name, new_column_type) -%}
  {% call statement('alter_column_type') %}
    alter table {{ relation }} alter column {{ adapter.quote(column_name) }} type {{ new_column_type }};
  {% endcall %}
{% endmacro %}

You could actually test this out yourself (in a non-production environment!) by adding that macro definition to the macros folder of your dbt project (in a file named macros/adapters.sql for example).

@dbeatty10 dbeatty10 removed their assignment Feb 23, 2023
@dbeatty10 dbeatty10 changed the title [CT-2166] [Feature] when length of field changes to a higher value run ALTER TABLE don't recreate field [CT-2166] [Feature] Use ALTER TABLE to expand column / field length Feb 23, 2023
@dbeatty10 dbeatty10 added the good_first_issue Good for newcomers label Mar 17, 2023
@rriley99
Copy link

rriley99 commented Jan 4, 2024

I believe that will throw an error as you cannot alter dist/sort columns directly, you will need to do something like this:

create table new_table
   distkey (dist_col)
   sortkey (sort_col)
as
select *
from old_table;

Then swap the name. At least that is the best I could come up with for our ops engineers.

@tin-homa
Copy link

+1 on this issue (also commenting to unstale). The current behavior did cause our pipeline to randomly break because some models are written with SELECT * UNION ALL. Admittedly we can just use dbt_utils macros to not require strict column order but we have many contributors/existing code and it's not so easy to enforce.

@dbeatty10
Copy link
Contributor

Thanks for adding a +1 and commenting to un-stale @tin-homa 👍

I don't know if it's relevant to your case at all, but here's another issue where we had a special recommendation for models that utilize union all:
#659 (comment)

@alejandrofm
Copy link
Author

alejandrofm commented Aug 3, 2024

Hi! I am bumping this that still happens and think it will be good to resolve for the dbt-redshift project.
Is A LOT more performant to alter a varchar field size than to copy all the data to a new column, and rename. Altering the size should be tried, at least on varchar fields.
Thanks!

@dbeatty10
Copy link
Contributor

@alejandrofm this isn't something we're able to prioritize right now, but we'd welcome a PR from the community.

There is a possible implementation in #335 (comment) that I didn't try out myself one way or another, so I'm not sure if it works or not.

If anyone finds that it works or comes up with their own working version of alter_column_type, we'd love to hear about it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good_first_issue Good for newcomers pkg:dbt-redshift type:enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants