This repository has been archived by the owner on Oct 15, 2022. It is now read-only.

fabrice-etanchaud committed Oct 6, 2020
2 parents e82341b + e6c8f8d commit 0dd78ca
Showing 2 changed files with 19 additions and 14 deletions.
31 changes: 18 additions & 13 deletions README.md
@@ -1,5 +1,7 @@
![dbt-dremio](https://www.dremio.com/img/blog/gnarly-wave-data-lake.png)

> *This project is developed in my spare time, alongside my lead dev position at [MAIF-VIE](http://www.maif.fr), and aims to provide a competitive alternative to our current ETL stack.*
# dbt-dremio
[dbt](https://www.getdbt.com/)'s adapter for [dremio](https://www.dremio.com/)

@@ -19,7 +21,7 @@ os dependency :
`pip install dbt-dremio`

# Relation types
A dremio's relation can be a view or a table. A reflection is a special kind of table : a view's materialization with a refresh policy.
In dbt's world, a dremio dataset can be either a `view` or a `table`. A dremio reflection (a dataset's materialization with a refresh policy) is mapped to dbt's `materializedview`.

# Databases
As Dremio is a federation tool, dbt's queries can span locations and so, in dremio's adapter, databases are first class citizens.
@@ -50,6 +52,17 @@ Because dremio accepts almost any string character in the objects' names, the ad
- if schema is equal to `no_schema`, the schema will not be included, leading to a simple `"database"."identifier"` being rendered
- if schema spans multiple folders, each folder's name will be double quoted, leading to `"database"."folder"."sub-folder"."sub-sub-folder"."identifier"`.
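The two rendering rules above can be sketched in a few lines of Python. This is only an illustration, not the adapter's actual code: the function name `render_relation` is hypothetical, and it assumes that a schema spanning several folders is written as a dot-separated string.

```python
def render_relation(database: str, schema: str, identifier: str) -> str:
    """Render a double-quoted dremio path (illustrative sketch only)."""
    parts = [database]
    # Rule 1: the special value `no_schema` means "skip the schema entirely".
    if schema != "no_schema":
        # Rule 2 (assumption): nested folders are separated by dots,
        # and each folder name is quoted on its own.
        parts.extend(schema.split("."))
    parts.append(identifier)
    return ".".join(f'"{part}"' for part in parts)


print(render_relation("database", "no_schema", "identifier"))
print(render_relation("database", "folder.sub-folder.sub-sub-folder", "identifier"))
```

The sketch does not escape double quotes embedded in a name; how the adapter handles that case is not covered here.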

# Sources

## Environments

A single dremio installation can host several data environments. To group sources by environment, you can use the undocumented `target.profile_name` or the adapter-specific `environment` configuration to map environments between dremio and dbt:

- dremio's side: prefix the names of all sources belonging to a given environment, say `prd`, with the environment's name, for example: `prd_crm, prd_hr, prd_accounting`
- dbt's side: prefix every source's `database` config like this: `{{target.environment}}_crm` or `{{target.profile_name}}_crm`

That way you can configure input sources and the output target `database` separately.
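As a rough illustration of the naming convention above, the following Python sketch shows how an environment name and a logical source name combine into the dremio source name. The helper `source_database` is hypothetical; in a real project the same concatenation is done by the Jinja expression in the source's `database` config.

```python
def source_database(environment: str, source: str) -> str:
    """Mimic `{{target.environment}}_<source>` (hypothetical helper)."""
    return f"{environment}_{source}"


# The same logical sources resolve to different dremio sources per environment.
for env in ("prd", "dev"):
    print([source_database(env, name) for name in ("crm", "hr", "accounting")])
```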

# Materializations

## Dremio's SQL specificities
@@ -75,7 +88,7 @@ I tried to keep things secure setting up a kind of logical interface between the
adapter's specific configuration|type|required|default
-|-|-|-
materialization_database|CTAS/DROP TABLE allowed source's name|no|`$scratch`
materialization_schema||no|`no_schema`
materialization_schema||no|`target.environment` (we don't want the environments to share the same underlying table)

CREATE TABLE AS
SELECT *
@@ -92,7 +105,7 @@ As dremio does not support query's bindings, the python value is converted as st
adapter's specific configuration|type|required|default
-|-|-|-
materialization_database|CTAS/DROP TABLE allowed source's name|no|`$scratch`
materialization_schema||no|`no_schema`
materialization_schema||no|`target.environment`
partition| the list of partitioning columns|no|
sort| the list of sorting columns|no|

@@ -104,7 +117,7 @@ sort| the list of sorting columns|no|
adapter's specific configuration|type|required|default
-|-|-|-
materialization_database|CTAS/DROP TABLE allowed source's name|no|`$scratch`
materialization_schema||no|`no_schema`
materialization_schema||no|`target.environment`
partition| the list of partitioning columns|no|
sort| the list of sorting columns|no|

@@ -162,20 +175,12 @@ This materialization creates a table without a view interface. It's an easy way
# Connection
Make sure the adapter-specific `driver` attribute holds the right odbc driver name, that is, the name you gave to your dremio odbc driver installation.

## Environments

You can use the undocumented `target.profile_name` or the adapter specific `environment` attribute as a way to map environments between dremio and dbt :

- dremio's side: prefix all the sources' names of a specific environment `prd` with the environment's name, for example : `prd_crm, prd_hr, prd_accounting`
- dbt's side: prefix all source's database configs with `{{target.environment}}_` or `{{target.profile_name}}_`
That way you can configure input sources and the output target `database` separately.

## Managed or unmanaged target
Inspired by [Ronald Damhof's article](https://prudenza.typepad.com/files/english---the-data-quadrant-model-interview-ronald-damhof.pdf), I wanted a clear separation between managed environments (prod, preprod...) and unmanaged ones (developers' environments). So there are two distinct targets: managed and unmanaged.

In an unmanaged environment, if no target database is provided, all models are materialized in the user's home space, under the target schema.

In a managed environment, target and custom databases and schemas are used as usual. If no target database is provided, `target.profile_name` will be used as the default value.
In a managed environment, target and custom databases and schemas are used as usual. If no target database is provided, `target.environment` will be used as the default value.
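The default-database rule for both kinds of target can be summarized with a small Python sketch. It is a simplified reading of the two paragraphs above, not the adapter's macros: the function name and parameters are hypothetical, and it assumes the user's home space is named `@<user>`.

```python
from typing import Optional


def default_database(managed: bool, target_database: Optional[str],
                     environment: str, user: str) -> str:
    """Pick the database models are materialized in (illustrative sketch)."""
    if target_database:
        # An explicit target database is used as-is in both cases.
        return target_database
    if managed:
        # Managed target: fall back to `target.environment`.
        return environment
    # Unmanaged target: fall back to the user's home space
    # (assumption: dremio names it "@" followed by the user name).
    return f"@{user}"


print(default_database(True, None, "prd", "alice"))
print(default_database(False, None, "prd", "alice"))
```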

You will find in [the macros' directory](https://github.com/fabrice-etanchaud/dbt-dremio/tree/master/dbt/include/dremio/macros) an environment-aware implementation for custom database and schema names.

2 changes: 1 addition & 1 deletion setup.py
@@ -12,7 +12,7 @@
description=description,
long_description=description,
author="Fabrice Etanchaud",
author_email="fabrice.etanchaud@maif.fr",
author_email="fabrice.etanchaud@netc.fr",
url="https://github.com/fabrice-etanchaud/dbt-dremio",
packages=find_packages(),
package_data={
