Skip to content

FINN Data Org - Data Warehouse - dbt code and data lineage quality checks

License

Notifications You must be signed in to change notification settings

finn-auto/pre-commit-dbt

 
 

Repository files navigation

dbt-pre-commit

pre-commit-dbt

CI black black

List of pre-commit hooks to ensure the quality of your dbt projects.

Goal

Quick ensure the quality of your dbt projects.

dbt is awesome, but when a number of models, sources, and macros grow it starts to be challenging to maintain quality. People often forget to update columns in schema files, add descriptions, or test. Besides, with the growing number of objects, dbt slows down, users stop running models/tests (because they want to deploy the feature quickly), and the demands on reviews increase.

If this is the case, pre-commit-dbt is here to help you!

List of pre-commit-dbt hooks

💡 Click on hook name to view the details.

Model checks:

Script checks:

Source checks:

Macro checks:

Modifiers:

dbt commands:


If you have an idea for a new hook or you found a bug, let us know

Install

For detailed installation and usage, instructions see pre-commit.com site.

pip install pre-commit

Setup

  1. Create a file named .pre-commit-config.yaml in your dbt root folder.
  2. Add list of hooks you want to run befor every commit. E.g.:
repos:
- repo: https://github.com/offbi/pre-commit-dbt
  rev: v1.0.0
  hooks:
  - id: check-script-semicolon
  - id: check-script-has-no-table-name
  - id: dbt-test
  - id: dbt-docs-generate
  - id: check-model-has-all-columns
    name: Check columns - core
    files: ^models/core
  - id: check-model-has-all-columns
    name: Check columns - mart
    files: ^models/mart
  - id: check-model-columns-have-desc
    files: ^models/mart
  1. Optionally, run pre-commit install to set up the git hook scripts. With this, pre-commit will run automatically on git commit! You can also manually run pre-commit run after you stage all files you want to run. Or pre-commit run --all-files to run the hooks against all of the files (not only staged).

Run as Github Action

Unfortunately, you cannot natively use pre-commit-dbt if you are using dbt Cloud. But you can run checks after you push changes into Github.

pre-commit-dbt for the most of the hooks needs manifest.json (see requirements section in hook documentation), that is in the target folder. Since this target folder is usually in .gitignore, you need to generate it. For that you need to run dbt-compile (or dbt-run) command. To be able to compile dbt, you also need profiles.yml file with your credentials. To provide passwords and secrets use Github Secrets (see example).

So you want to e.g. run chach on number of tests:

repos:
- repo: https://github.com/offbi/pre-commit-dbt
 rev: v1.0.0
 hooks:
 - id: check-model-has-tests
   args: ["--test-cnt", "2", "--"]

To be able to run this in Github actions you need to modified it to:

repos:
- repo: https://github.com/offbi/pre-commit-dbt
 rev: v1.0.0
 hooks:
 - id: dbt-compile
   args: ["--cmd-flags", "++profiles-dir", "."]
 - id: check-model-has-tests
   args: ["--test-cnt", "2", "--"]

Create profiles.yml

First step is to create profiles.yml. E.g.

# example profiles.yml file
jaffle_shop:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      user: alice
      password: "{{ env_var('DB_PASSWORD') }}"
      port: 5432
      dbname: jaffle_shop
      schema: dbt_alice
      threads: 4

and store this file in project root ./profiles.yml.

Create new workflow

  • inside your Github repository create folder .github/workflows (unless it already exists).
  • create new file e.g. main.yml
  • specify your workflow e.g.:
name: pre-commit

on:
  pull_request:
  push:
  branches: [main]

jobs:
  pre-commit:
  runs-on: ubuntu-latest
  steps:
  - uses: actions/checkout@v2
  - uses: actions/setup-python@v2
  - id: file_changes
    uses: trilom/[email protected]
    with:
      output: ' '
  - uses: offbi/[email protected]
    env:
      DB_PASSWORD: ${{ secrets.SuperSecret }}
    with:
      args: run --files ${{ steps.file_changes.outputs.files}}

About

FINN Data Org - Data Warehouse - dbt code and data lineage quality checks

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 99.8%
  • Dockerfile 0.2%