Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: only evaluate modification time on matched local files #235

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

otosky
Copy link

@otosky otosky commented Dec 23, 2024

contributes to astronomer/astronomer-cosmos#1075

Description

There is a very hard to reproduce race-condition that occurs in some environments where dbt is used on a concurrently accessed filesystem, like Airflow with Celery workers. The symptom is one process walking the dbt project tree and hitting an error similar to:

  File "/usr/local/airflow/dbt_venv/lib/python3.11/site-packages/dbt/parser/search.py", line 74, in filesystem_search
    for result in find_matching(root, relative_dirs, ext, ignore_spec):
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/dbt_venv/lib/python3.11/site-packages/dbt/clients/system.py", line 79, in find_matching
    modification_time = os.path.getmtime(absolute_path)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen genericpath>", line 55, in getmtime
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpf6h80niz/models/intermediate/shipment/.int_easy_post_tracking.sql.GgjJgL'

It's unclear what actually creates those temp files (I would love to know!), but that error should not even occur, since the filename doesn't match any extension spec that dbt should be searching for. Note, I've seen these files created for all extensions that dbt would be concerned with (.md, .yml, .sql); it's not a problem localized to just model files or macros.

This PR reorders the operations in find_matching such that any file names that don't match the regex pattern are skipped before getmtime is even called.

Checklist

@otosky otosky requested a review from a team as a code owner December 23, 2024 20:44
@cla-bot cla-bot bot added the cla:yes label Dec 23, 2024
Copy link

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

@oliver-tosky-sh
Copy link

Not sure who the correct person would be for a review here - @MichelleArk would you be able to assist?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants