Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] dbt build fails when flag restrict_direct_pg_catalog_access set True #914

Closed
2 tasks done
Tracked by #920
dave-tomkinson opened this issue Sep 19, 2024 · 9 comments · Fixed by #940
Closed
2 tasks done
Tracked by #920

[Bug] dbt build fails when flag restrict_direct_pg_catalog_access set True #914

dave-tomkinson opened this issue Sep 19, 2024 · 9 comments · Fixed by #940
Assignees
Labels
type:bug Something isn't working

Comments

@dave-tomkinson
Copy link

Is this a new bug in dbt-redshift?

  • I believe this is a new bug in dbt-redshift
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When I switch dbt cloud to versionless and I build with no flags set then the build works, but with Warnings to set the flag restrict_direct_pg_catalog_access

When I set the flag restrict_direct_pg_catalog_access to True the build fails, with a number of tables (probably around 10%) Failing

flags:
  restrict_direct_pg_catalog_access: True

Mostly with one of these Errors:

Redshift adapter: Redshift error: column "id" of relation "<RELATION_NAME>" already exists
Redshift adapter: Redshift error: syntax error at or near ")" in context 

Setting it to False or removing the flag means the build works again.

Expected Behavior

The build to succeed with no Warnings.

Steps To Reproduce

In dbt cloud with the project version set to versionless
Set the following flag in dbt_project.yml

flags:
  restrict_direct_pg_catalog_access: True

Run dbt build

Relevant log output

No response

Environment

- OS: dbt cloud
- Python: dbt cloud
- dbt-core: versionless - 'Running latest version'
- dbt-redshift: redshift=1.9.0-post5+04bd2c07f526e226c80f6527e8b0d49033bbd639

Additional Context

image
In the above

  1. Added the flag set to True in the dbt_project.yml
  2. Removed the flag completely
  3. Added the flag back in set to False (to check it wasn't due to mis-typed config)
  4. Change the flag back to True... you know just to check again.
@dave-tomkinson dave-tomkinson added type:bug Something isn't working triage:product labels Sep 19, 2024
@mikealfare mikealfare self-assigned this Sep 19, 2024
@mikealfare
Copy link
Contributor

mikealfare commented Sep 27, 2024

@dave-tomkinson Thanks for the bug report. Would you be able to provide a model with column names and configs for one of the models that fails? You can redact the logic that goes into each column if you prefer. I need to know things like whether the columns are quoted or not, whether they are lowercase or uppercase, etc. I suspect that the SDK returns columns that do not match the case, or which are/aren't quoted as expected. This could cause dbt to attempt to add a column that's already there because of a scenario like "id" <> "Id". I'll setup a skeleton of a reproduction in the meantime, but nothing beats real data. Thanks in advance!

One additional note, I'm making the assumption that this is an incremental model with on_schema_change set to either append_new_columns or sync_all_columns, hence the "column already exists" error. Please let me know if that's incorrect.

@dave-tomkinson
Copy link
Author

dave-tomkinson commented Sep 30, 2024

I think we're getting 3 main categories of error.
column "id" of relation "TABLE_NAME" already exists
Snapshot target is not a snapshot table (missing "dbt_scd_id", "dbt_valid_from", "dbt_valid_to")
and Syntax errors like syntax error at or near ")" in context ... or

syntax error at or near "when" in context "then update set
          
      
  
      when", at line 25, column 5

The snippet of generated code where the error is:

       on (
               DBT_INTERNAL_SOURCE.account_id = "<DATABASE>"."<SCHEMA>"."linkedin_accounts".account_id
           )

   
   when matched then update set
       
   

   when not matched then insert (
       
   )
   values (
       
   )

In the case of the ones with relation already exists it does look like we have:

{{ config(
    materialized="incremental", 
    sort="sort_key", 
    dist="dist_key",
    on_schema_change="append_new_columns"
) }}

A sanitised example of a model with the error syntax error at or near ")" in context ...
column names have been replaced but the capitalisation is as it was in the source.
The apply_env_specific_limits macro just adds a where clause to set a date and I've tried this model with that part completely removed and it fails with the same reason (and works when the flag restrict_direct_pg_catalog_access flag is switched to False)

{{ config(materialized="incremental", dist="col6") }}

with
    source_events as (
        select *
        from {{ source("SOURCE", "events") }}
        where
            {{ apply_env_specific_limits("batch_loaded::timestamptz") }}
            {% if is_incremental() %}
            and batch_loaded > (select max(batch_loaded) from {{ this }})
            {% endif %}
    )

select
    id::bigint as id,
    col::varchar(512) as col,
    col2::varchar(512) as col2,
    col3 as col3,
    col4::varchar(512) as col4,
    occurred_at::timestamptz as occurred_at,
    col5::varchar(128) as col5,
    col6::varchar(128) as col6,
    col7::varchar(56) as col7,
    region::varchar(8) as region,
    batch_loaded
from source_events
where occurred_at::timestamptz >= '2024-01-01'::timestamptz
and occurred_at::timestamptz < '2025-01-01'::timestamptz

All our column names are lower case with underscore separators.

Hope that helps.
I'm hoping to submit some unredacted notes with the support ticket.

I also noticed someone else having an issue with the flag in the dbt Redshift slack
https://getdbt.slack.com/archives/CJARVS0RY/p1727671736302229

@colin-rogers-dbt
Copy link
Contributor

@dave-tomkinson can you share what packages you're depending on in your project?

@dave-tomkinson
Copy link
Author

  • package: dbt-labs/dbt_utils
    version: 1.1.1
  • package: dbt-labs/codegen
    version: 0.12.1

We did have Elementary but have removed it (only mentioning as I saw that someone else in the dbt slack channel said they thought that could be the issue).

@mikealfare
Copy link
Contributor

@dave-tomkinson I searched through the code and the two packages you identified and I can't find the apply_env_specific_limits macro. Do you know where that's coming from? Is it possible that it's a custom macro?

@dave-tomkinson
Copy link
Author

dave-tomkinson commented Oct 11, 2024

Ah yes apologies, That is a custom macro, I thought I'd said that when I mentioned what it did above, but must have lost that in an edit before I posted.
I think that might be a red herring though. I've just been through our code base and removed all references to it and still get the errors.

I get the same error as before in for this model (bear in mind we're getting 3 classes of error across 43 models, this is just one example, not all the models have that macro).

From the logs, it looks like parsing our model is okay (the -- start edit indicates the line where I removed the macro)

  create temporary table
    "TEMP_TABLE__dbt_tmp081635203763"

    diststyle key distkey (col6)

  as (

with
    source_events as (
        select *
        from "DATABASE"."SOURCE"."TABLE"
        where
        -- start edit
        batch_loaded > (select max(batch_loaded) from "DATABASE"."dbt_SCHEMA_staging"."TABLE")

    )

select
    id::bigint as id,
    col::varchar(512) as col,
    col2::varchar(512) as col2,
    col3 as col3,
    col4::varchar(512) as col4,
    occurred_at::timestamptz as occurred_at,
    col5::varchar(128) as col5,
    col6::varchar(128) as col6,
    col7::varchar(56) as col7,
    region::varchar(8) as region,
    batch_loaded
from source_events
where occurred_at::timestamptz >= '2024-01-01'::timestamptz
and occurred_at::timestamptz < '2025-01-01'::timestamptz
  );

Then this part is where the dbt generated code pulls from the temp table back to the main model table; there's no columns in the brackets or columns in the select clause for the insert.
The error in the log is for the first problem it hits syntax error at or near ")" in context ...

 insert into "DATABASE"."dbt_SCHEMA_staging"."TABLE"" ()
    (
        select
        from "TEMP_TABLE__dbt_tmp081635203763"
    );

Just in case though, the macro code is

{% macro apply_env_specific_limits(sort_key) %}

    {%- if target.name == 'dev' %}
        -- Restrict 6 months for devs
        {{ sort_key }} > dateadd(month, -6, current_date)::timestamptz
    {%- elif target.name == 'ci' %}
        -- Restrict to 7 days for CI jobs
        {{ sort_key }} > dateadd(day, -7, current_date)::timestamptz
    {% else %}
        -- Prod gets everything.
        true
    {% endif %}

{% endmacro %}

@mikealfare
Copy link
Contributor

Thanks for the information @dave-tomkinson. This gives me something to go on. I'll continue troubleshooting.

@dave-tomkinson
Copy link
Author

Someone else posting on dbt Slack about facing the same issue. They post another example.
https://getdbt.slack.com/archives/CJARVS0RY/p1729520217596369

@krestogit
Copy link

krestogit commented Oct 22, 2024

Another example: #921 related to snapshots

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment