compress_table: add in_place option for modifying column encodings w/o creating a deep copy #59

Open
wants to merge 2 commits into
base: main

jessicabuzzelli

Description & motivation

In 2020, Redshift introduced support for modifying column compression encodings in place (some details here & official documentation here). In certain cases this is preferable to the older mechanism of recreating the table with the optimal encodings via a deep copy.

For the past 2.5ish years at Lattice, my team has been using this modified version of the compress_table macro to leverage this "new" Redshift feature (credit to @neddonaldson!).

Changes:

  • add an optional in_place argument to compress_table that defaults to the existing deep copy behavior
  • track which columns need to be re-encoded during build_optimized_definition, which now returns the optimized definition (unchanged) together with an updated_encodings dict mapping column name to new encoding
  • introduce an alter_column_encodings helper that modifies column encodings with an ALTER TABLE {{ table }} ALTER {{ column }} ENCODE statement per column when in_place=True and at least one encoding is due to change
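
The changes above can be sketched outside of Jinja as follows. This is a minimal Python illustration of the logic only, not the macro code in this PR: the function names `diff_encodings` and `build_alter_statements` and their signatures are hypothetical, and the statement shape simply mirrors the SQL shown in the example logs.

```python
from typing import Dict, List


def diff_encodings(current: Dict[str, str], recommended: Dict[str, str]) -> Dict[str, str]:
    """Return {column: new_encoding} for columns whose encoding would change.

    Hypothetical stand-in for the updated_encodings dict described above.
    """
    return {
        column: encoding
        for column, encoding in recommended.items()
        if current.get(column) != encoding
    }


def build_alter_statements(table: str, updated_encodings: Dict[str, str]) -> List[str]:
    """Build one ALTER statement per changed column.

    Returns an empty list when nothing changes, so the caller can skip
    running a query altogether (an empty query would otherwise be sent).
    """
    return [
        f"ALTER TABLE {table} ALTER {column} encode {encoding};"
        for column, encoding in updated_encodings.items()
    ]


if __name__ == "__main__":
    current = {"event_id": "lzo", "final_page_in_session": "raw"}
    recommended = {"event_id": "zstd", "final_page_in_session": "raw"}
    updated = diff_encodings(current, recommended)
    for statement in build_alter_statements("dbt_jbuzzelli.model", updated):
        print(statement)
```

Columns whose recommended encoding matches the current one are left out of the dict, so only columns that actually change produce a statement.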

Example logs from build_optimized_definition & alter_column_encodings:

[...]
17:02:40.038226 [debug] [Thread-1  ]: SQL status: SELECT in 0 seconds
17:02:40.043104 [debug] [Thread-1  ]:     Changing event_id: lzo -> zstd (40.81%)
17:02:40.043394 [debug] [Thread-1  ]:     Changing event_source: lzo -> zstd (48.28%)
17:02:40.043550 [debug] [Thread-1  ]:     Changing next_event_at: az64 -> zstd (25.99%)
17:02:40.043691 [debug] [Thread-1  ]:     Changing event_sequence_number: az64 -> zstd (14.97%)
17:02:40.043829 [debug] [Thread-1  ]:     Changing source_sequence_number: az64 -> zstd (24.38%)
17:02:40.043965 [debug] [Thread-1  ]:     Changing next_page: lzo -> zstd (39.10%)
17:02:40.044099 [debug] [Thread-1  ]:     Changing next_page_at: az64 -> raw (0.00%)
17:02:40.044234 [debug] [Thread-1  ]:     Changing page_dwell_time_seconds: az64 -> zstd (7.43%)
17:02:40.044364 [debug] [Thread-1  ]: Not Changing final_page_in_session: raw
17:02:40.044499 [debug] [Thread-1  ]:     Changing product_page: lzo -> raw (0.00%)
17:02:40.044631 [debug] [Thread-1  ]:     Changing last_updated_at: az64 -> zstd (81.95%)
17:02:40.047105 [debug] [Thread-1  ]: Using redshift connection "model.lattice_dwh.model"
17:02:40.047358 [debug] [Thread-1  ]: On model.lattice_dwh.model: /* {"app": "dbt", "dbt_version": "1.4.6", "profile_name": "lattice_dwh", "target_name": "dev", "node_id": "model.lattice_dwh.model"} */

        -- only alter the column encodings if at least one column will change to avoid returning an empty query
      
        
      ALTER TABLE dbt_jbuzzelli.model ALTER event_id encode zstd; 
  
      ALTER TABLE dbt_jbuzzelli.model ALTER event_source encode zstd; 
  
      ALTER TABLE dbt_jbuzzelli.model ALTER next_event_at encode zstd; 
  
      ALTER TABLE dbt_jbuzzelli.model ALTER event_sequence_number encode zstd; 
  
      ALTER TABLE dbt_jbuzzelli.model ALTER source_sequence_number encode zstd; 
  
      ALTER TABLE dbt_jbuzzelli.model ALTER next_page encode zstd; 
  
      ALTER TABLE dbt_jbuzzelli.model ALTER next_page_at encode raw; 
  
      ALTER TABLE dbt_jbuzzelli.model ALTER page_dwell_time_seconds encode zstd; 
  
      ALTER TABLE dbt_jbuzzelli.model ALTER product_page encode raw; 
  
      ALTER TABLE dbt_jbuzzelli.model ALTER last_updated_at encode zstd; 
       
17:02:51.209146 [debug] [Thread-1  ]: SQL status: COMMIT in 11 seconds
[...]

Checklist

  • I have verified that these changes work locally
  • I have updated the README.md (if applicable)
  • I have added tests & descriptions to my models (and macros if applicable)

@jessicabuzzelli changed the title from "add in_place option for compression w/o deep copy" to "compress_table: add in_place option for modifying column encodings w/o creating a deep copy" on Oct 6, 2023