Merge branch 'master' of https://github.com/fabrice-etanchaud/dbt-dremio

fabrice-etanchaud · Oct 6, 2020 · 0dd78ca
2 parents e82341b + e6c8f8d
commit 0dd78ca

Showing 2 changed files with 19 additions and 14 deletions.
diff --git a/README.md b/README.md
@@ -1,5 +1,7 @@
 ![dbt-dremio](https://www.dremio.com/img/blog/gnarly-wave-data-lake.png)
 
+> *This project is developed in my spare time, alongside my lead dev position at [MAIF-VIE](http://www.maif.fr), and aims to provide a competitive alternative to our current ETL stack.*
+
 # dbt-dremio
 [dbt](https://www.getdbt.com/)'s adapter for [dremio](https://www.dremio.com/)
 
@@ -19,7 +21,7 @@ os dependency :
 `pip install dbt-dremio`
 
 # Relation types
-A dremio's relation can be a view or a table. A reflection is a special kind of table : a view's materialization with a refresh policy.
+In dbt's world, a dremio dataset can be either a `view` or a `table`. A dremio reflection (a dataset's materialization with a refresh policy) is mapped to dbt's `materializedview`.
 
 # Databases
 As Dremio is a federation tool, dbt's queries can span locations and so, in dremio's adapter, databases are first class citizens.
@@ -50,6 +52,17 @@ Because dremio accepts almost any string character in the objects' names, the ad
  - if schema is equal to `no_schema`, the schema will not be included, leading to a simple `"database"."identifier"` being rendered
  - if schema spans multiple folders, each folder's name will be double quoted, leading to `"database"."folder"."sub-folder"."sub-sub-folder"."identifier"`.
 
+# Sources
+
+## Environments
+
+A single dremio installation can handle several data environments. To group sources by environment, you can use the undocumented `target.profile_name` or the adapter-specific `environment` configuration to map environments between dremio and dbt:
+
+ - dremio's side: prefix all the sources' names of a specific environment `prd` with the environment's name, for example : `prd_crm, prd_hr, prd_accounting`
+ - dbt's side: prefix all source's database configs like this : `{{target.environment}}_crm` or `{{target.profile_name}}_crm`
+
+That way you can configure input sources and the output target `database` separately.
+
 # Materializations
 
 ## Dremio's SQL specificities
@@ -75,7 +88,7 @@ I tried to keep things secure setting up a kind of logical interface between the
 adapter's specific configuration|type|required|default
 -|-|-|-
 materialization_database|CTAS/DROP TABLE allowed source's name|no|`$scratch`
-materialization_schema||no|`no_schema`
+materialization_schema||no|`target.environment` (we don't want the environments to share the same underlying table)
 
     CREATE TABLE AS
     SELECT *
@@ -92,7 +105,7 @@ As dremio does not support query's bindings, the python value is converted as st
 adapter's specific configuration|type|required|default
 -|-|-|-
 materialization_database|CTAS/DROP TABLE allowed source's name|no|`$scratch`
-materialization_schema||no|`no_schema`
+materialization_schema||no|`target.environment`
 partition| the list of partitioning columns|no|
 sort| the list of sorting columns|no|
 
@@ -104,7 +117,7 @@ sort| the list of sorting columns|no|
 adapter's specific configuration|type|required|default
 -|-|-|-
 materialization_database|CTAS/DROP TABLE allowed source's name|no|`$scratch`
-materialization_schema||no|`no_schema`
+materialization_schema||no|`target.environment`
 partition| the list of partitioning columns|no|
 sort| the list of sorting columns|no|
 
@@ -162,20 +175,12 @@ This materialization creates a table without a view interface. It's an easy way
 # Connection
 Be careful to provide the right odbc driver's name in the adapter specific `driver` attribute, the one you gave to your dremio's odbc driver installation.
 
-## Environments
-
-You can use the undocumented `target.profile_name` or the adapter specific `environment` attribute as a way to map environments between dremio and dbt :
-
- - dremio's side: prefix all the sources' names of a specific environment `prd` with the environment's name, for example : `prd_crm, prd_hr, prd_accounting`
- - dbt's side: prefix all source's database configs with `{{target.environment}}_` or `{{target.profile_name}}_`
-That way you can configure seperately input sources and output target `database`.
-
 ## Managed or unmanaged target
 Thanks to [Ronald Damhof's article](https://prudenza.typepad.com/files/english---the-data-quadrant-model-interview-ronald-damhof.pdf), I wanted to have a clear separation between managed environments (prod, preprod...) and unmanaged ones (developers' environments). So there are two distinct targets : managed and unmanaged.
 
 In an unmanaged environment, if no target database is provided, all models are materialized in the user's home space, under the target schema.
 
-In a managed environment, target and custom databases and schemas are used as usual. If no target database is provided,  `target.profile_name` will be used as the default value.
+In a managed environment, target and custom databases and schemas are used as usual. If no target database is provided,  `target.environment` will be used as the default value.
 
 You will find in [the macros' directory](https://github.com/fabrice-etanchaud/dbt-dremio/tree/master/dbt/include/dremio/macros) an environment aware implementation for custom database and schema names.
 

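The README changes above describe two naming conventions: prefixing source names with the environment (`prd_crm`) and rendering relations with every path component double quoted. The sketch below illustrates both in plain Python. It is not the adapter's code: `source_database` and `render_relation` are hypothetical helpers, the dot used to separate folders inside a schema is an assumption, and escaping of double quotes embedded in names is not handled.

```python
def source_database(environment: str, source: str) -> str:
    """Prefix a source name with its environment, like `{{target.environment}}_crm`."""
    return f"{environment}_{source}"


def render_relation(database: str, schema: str, identifier: str) -> str:
    """Render a double-quoted relation name following the README's rules."""
    parts = [database]
    # `no_schema` means the schema is left out of the rendered name.
    if schema != "no_schema":
        # Assumption: folders inside a schema are separated by dots.
        parts.extend(schema.split("."))
    parts.append(identifier)
    # Quote each component separately (embedded quotes are not escaped here).
    return ".".join(f'"{part}"' for part in parts)


if __name__ == "__main__":
    db = source_database("prd", "crm")
    print(render_relation(db, "no_schema", "customers"))
    print(render_relation(db, "folder.sub-folder", "customers"))
```

Under these assumptions, switching the environment from `prd` to another value changes only the database prefix, which is what keeps input sources and the output target configurable independently.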
diff --git a/setup.py b/setup.py
@@ -12,7 +12,7 @@
     description=description,
     long_description=description,
     author="Fabrice Etanchaud",
-    author_email="fabrice.etanchaud@maif.fr",
+    author_email="fabrice.etanchaud@netc.fr",
     url="https://github.com/fabrice-etanchaud/dbt-dremio",
     packages=find_packages(),
     package_data={