
Welcome to Eon Collective dbt® Practices


Versions

Revision History

| Version | Revision Date |
| --- | --- |
| 1.0 | May 06, 2023 |

dbt Project Version

| Version | Latest Version used for testing |
| --- | --- |
| 1.5 | 1.5 |

Overview

This project makes use of dbt, a data transformation tool, to derive insights from data stored in Snowflake.

It demonstrates a simple implementation of a dbt ELT job.

Objective

To establish an organizational standard for transforming and analyzing data stored in a data warehouse using the data build tool (dbt).

Application Functional Overview

A high-level overview of the application:

graph LR
A[orders] -- ELT --> W{this dbt project}
B[customers] -- ELT --> W
C[payments] -- ELT --> W
W -- dbt build: on_error --> X(notifications sent)
X -- via --> V>email]
X -- via --> U>slack]
W -- dbt build: on_success --> Y{snowflake objects built}
W -- dbt build --> S((metadata))
S -- stored in --> P>manifest.json]
S -- stored in --> Q>run_results.json]
W -- dbt docs generate --> O((docs site))
O -- generates --> S
S -- stored in --> R>catalog.json]
Y -- consumed by --> T>application]
Y -- consumed by --> Z>dashboard]

Project Input

Source Data from Raw Layer

The following raw tables are consumed by this project:

| DATABASE | SCHEMA | TABLE | NOTES |
| --- | --- | --- | --- |
| RAW | JAFFLE_SHOP | CUSTOMERS | |
| RAW | JAFFLE_SHOP | ORDERS | |
| RAW | STRIPE | PAYMENT | |

Static/Seeds

The following static tables are used by this project. Non-seed files are located in the 's3://bose-bucket/*' folder; they are loaded into the appropriate tables with COPY INTO using a storage integration (see the sketch after the table below).

| FILENAME | DATABASE | SCHEMA | SEED | NOTES |
| --- | --- | --- | --- | --- |
| | RAW | DBT_JANEROSE | DBT_PROJECT_EVALUATOR_EXCEPTIONS | |
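
A hedged sketch of the COPY INTO pattern mentioned above, assuming a storage integration and an external stage; the stage, integration, target table, and file names below are placeholders, not this project's actual objects:

```sql
-- illustrative only: stage, storage integration, target table, and file names are placeholders
create stage if not exists RAW.DBT_JANEROSE.STATIC_FILES_STAGE
    url = 's3://bose-bucket/'
    storage_integration = MY_S3_INTEGRATION;

copy into RAW.DBT_JANEROSE.MY_STATIC_TABLE
from @RAW.DBT_JANEROSE.STATIC_FILES_STAGE/my_file.csv
file_format = (type = 'CSV' skip_header = 1);
```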

Cleansing/Transformation

This project handles the following transformation, cleansing, and curation details during its execution.

Project output

| DATABASE | SCHEMA | MODEL | MATERIALIZATION | TAGS | NOTES |
| --- | --- | --- | --- | --- | --- |
| RAW | DBT_JANEROSE | FCT_DBT__MODEL_INFORMATION | VIEW | [] | |
| RAW | DBT_JANEROSE | FCT_DBT__TEST_EXECUTIONS | VIEW | [] | |
| RAW | DBT_JANEROSE | FCT_DBT__SNAPSHOT_EXECUTIONS | VIEW | [] | |
| RAW | DBT_JANEROSE | DIM_DBT__TESTS | VIEW | [] | |
| RAW | DBT_JANEROSE | DIM_DBT__SNAPSHOTS | VIEW | [] | |
| RAW | DBT_JANEROSE | FCT_DBT__SEED_EXECUTIONS | VIEW | [] | |
| RAW | DBT_JANEROSE | FCT_DBT__MODEL_EXECUTIONS | VIEW | [] | |
| RAW | DBT_JANEROSE | DIM_DBT__CURRENT_MODELS | VIEW | [] | |
| RAW | DBT_JANEROSE | DIM_DBT__EXPOSURES | VIEW | [] | |
| RAW | DBT_JANEROSE | DIM_DBT__MODELS | VIEW | [] | |
| RAW | DBT_JANEROSE | DIM_DBT__SOURCES | VIEW | [] | |
| RAW | DBT_JANEROSE | DIM_DBT__SEEDS | VIEW | [] | |
| RAW | DBT_JANEROSE | FCT_DBT__INVOCATIONS | VIEW | [] | |
| RAW | DBT_JANEROSE | MODEL_INFORMATION | INCREMENTAL | [] | |
| RAW | DBT_JANEROSE | SNAPSHOT_EXECUTIONS | INCREMENTAL | [] | |
| RAW | DBT_JANEROSE | TEST_EXECUTIONS | INCREMENTAL | [] | |
| RAW | DBT_JANEROSE | EXPOSURES | INCREMENTAL | [] | |
| RAW | DBT_JANEROSE | MODELS | INCREMENTAL | [] | |
| RAW | DBT_JANEROSE | SEEDS | INCREMENTAL | [] | |
| RAW | DBT_JANEROSE | SOURCES | INCREMENTAL | [] | |
| RAW | DBT_JANEROSE | TESTS | INCREMENTAL | [] | |
| RAW | DBT_JANEROSE | SNAPSHOTS | INCREMENTAL | [] | |
| RAW | DBT_JANEROSE | SEED_EXECUTIONS | INCREMENTAL | [] | |
| RAW | DBT_JANEROSE | MODEL_EXECUTIONS | INCREMENTAL | [] | |
| RAW | DBT_JANEROSE | INVOCATIONS | INCREMENTAL | [] | |
| RAW | DBT_JANEROSE | STG_DBT__INVOCATIONS | VIEW | [] | |
| RAW | DBT_JANEROSE | STG_DBT__MODEL_EXECUTIONS | VIEW | [] | |
| RAW | DBT_JANEROSE | STG_DBT__SEED_EXECUTIONS | VIEW | [] | |
| RAW | DBT_JANEROSE | STG_DBT__SNAPSHOTS | VIEW | [] | |
| RAW | DBT_JANEROSE | STG_DBT__TESTS | VIEW | [] | |
| RAW | DBT_JANEROSE | STG_DBT__EXPOSURES | VIEW | [] | |
| RAW | DBT_JANEROSE | STG_DBT__MODEL_INFORMATION | VIEW | [] | |
| RAW | DBT_JANEROSE | STG_DBT__SNAPSHOT_EXECUTIONS | VIEW | [] | |
| RAW | DBT_JANEROSE | STG_DBT__TEST_EXECUTIONS | VIEW | [] | |
| RAW | DBT_JANEROSE | STG_DBT__SEEDS | VIEW | [] | |
| RAW | DBT_JANEROSE | STG_DBT__MODELS | VIEW | [] | |
| RAW | DBT_JANEROSE | STG_DBT__SOURCES | VIEW | [] | |

Data Lineage

The full data lineage image is not reader friendly when output as an image, as it is too long and too wide.

To generate and serve the docs on your machine see dbt Commands: How to Run.

Note: In dbt data lineage diagrams green bubbles are sources and blue bubbles are models.

Sample Data Lineages

Lineage graph for jaffle shop

Lineage graph for Stripe payments

Functional Context

Using This dbt Project

Project structure

The structure defined for this dbt project is as shown below:

dbt_snowflake_tutorial_v2
 ┣ analyses
 ┃ ┣ interactive_ad_hoc_testing.sql
 ┃ ┣ orders_by_day.sql
 ┃ ┗ total_revenues.sql
 ┣ etc
 ┃ ┗ images
 ┃ ┃ ┣ jaffle_shop_lineage.png
 ┃ ┃ ┗ stripe_payment_lineage.png
 ┃ ┗ project_tree_structure.md
 ┣ logs
 ┃ ┗ dbt.log
 ┣ macros
 ┃ ┣ cents_to_dollars.sql
 ┃ ┣ clean_stale_models.sql
 ┃ ┣ grant_select.sql
 ┃ ┣ limit_data_dev.sql
 ┃ ┗ union_tables_by_prefix.sql
 ┣ models
 ┃ ┣ marts
 ┃ ┃ ┗ core
 ┃ ┃ ┃ ┣ dim_customers.sql
 ┃ ┃ ┃ ┣ fct_orders.sql
 ┃ ┃ ┃ ┣ int_orders_pivoted.sql
 ┃ ┃ ┃ ┣ int_order_status_pivoted.sql
 ┃ ┃ ┃ ┗ str_pivoted_customers.sql
 ┃ ┣ staging
 ┃ ┃ ┣ jaffle_shop
 ┃ ┃ ┃ ┣ jaffle_shop.md
 ┃ ┃ ┃ ┣ sources.yml
 ┃ ┃ ┃ ┣ src_jaffle_shop.yml
 ┃ ┃ ┃ ┣ stg_customers.sql
 ┃ ┃ ┃ ┣ stg_jaffle_shop.yml
 ┃ ┃ ┃ ┣ stg_jaffle_shop_orders.sql
 ┃ ┃ ┃ ┗ stg_orders.sql
 ┃ ┃ ┣ stripe
 ┃ ┃ ┃ ┣ sources.yml
 ┃ ┃ ┃ ┣ src_stripe.yml
 ┃ ┃ ┃ ┣ stg_payments.sql
 ┃ ┃ ┃ ┣ stg_stripe.yml
 ┃ ┃ ┃ ┗ stripe.md
 ┃ ┃ ┗ all_dates.sql
 ┃ ┗ .sqlfluff
 ┗ tests
   ┗ assert_positive_total_for_payments.sql

The same project tree view above is persisted/versioned under etc/project_tree_structure.md. If the project structure changes, please update both markdown documents.
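
As an illustration of what lives under macros/, here is a minimal sketch of what cents_to_dollars could look like; the actual implementation in this repository may differ:

```sql
-- macros/cents_to_dollars.sql -- illustrative sketch; the real macro in this project may differ
{% macro cents_to_dollars(column_name, scale=2) %}
    ({{ column_name }} / 100)::numeric(16, {{ scale }})
{% endmacro %}
```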

Permissions

In order for this project to run successfully, the following permissions are needed on Snowflake.

| DATABASE NAME | SCHEMA | TABLE NAME | PERMISSION |
| --- | --- | --- | --- |
| <ENV>_RAW | BODS, SAPE03, SAPSHP, SFTPGTSI | ALL | READ |
| <ENV>_MARTS | UMM_EMEA_REF_DATA, UMM_EMEA_STAGE, UMM_EMEA_DW | ALL | READ & WRITE |

NOTE: Replace <ENV> with any of the following environment codes: DEV, TEST, or PROD
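
A hedged sketch of the Snowflake grants behind the READ and READ & WRITE rows above, assuming a role named TRANSFORMER; the role name and exact privilege set are assumptions, and actual grants are managed by the platform team:

```sql
-- illustrative only: role name and privilege granularity are assumptions
-- READ on the raw layer
grant usage on database DEV_RAW to role TRANSFORMER;
grant usage on all schemas in database DEV_RAW to role TRANSFORMER;
grant select on all tables in database DEV_RAW to role TRANSFORMER;

-- READ & WRITE on the marts layer
grant usage on database DEV_MARTS to role TRANSFORMER;
grant usage, create table, create view on schema DEV_MARTS.UMM_EMEA_DW to role TRANSFORMER;
grant select, insert, update, delete on all tables in schema DEV_MARTS.UMM_EMEA_DW to role TRANSFORMER;
```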

Project Settings

The settings for this project are persisted in the dbt_project.yml file.

Application Variables

This application uses the following variables defined in the dbt_project.yml file under the vars section.

| Application Argument | Default Value | Options | Description/Notes |
| --- | --- | --- | --- |
| env | TEST | TEST, DEV, PROD | Controls which environment the source data comes from |

The variables should be passed to the application from the command line at runtime. Examples:

  1. Run a monthly refresh using the TEST_RAW environment: dbt run --vars '{refresh_type: MONTHLY}'
  2. Run a refresh from 201801 - 202201 using the PROD_RAW environment: dbt run --vars '{refresh_type: CUSTOM, fmthid_start: 201801, fmth_id_end: 202201, env: PROD}'
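
For reference, a minimal sketch of how the env variable might be declared and then consumed in a source definition; the source excerpt below is an assumption, not necessarily how this project wires it up:

```yaml
# dbt_project.yml (excerpt)
vars:
  env: TEST   # default; override at runtime, e.g. dbt run --vars '{env: PROD}'

# models/staging/jaffle_shop/sources.yml (excerpt) -- assumed usage of the variable
sources:
  - name: jaffle_shop
    database: "{{ var('env') }}_RAW"
    schema: jaffle_shop
    tables:
      - name: customers
      - name: orders
```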

Project Packages/Dependencies

| Package | Version | Usage |
| --- | --- | --- |
| codegen | 0.9.0 | Macros that generate dbt code and log it to the command line. |
| audit_helper | 0.7.0 | Useful macros when performing data audits. |
| dbt_meta_testing | 0.3.6 | Contains macros to assert test and documentation coverage from dbt_project.yml configuration settings. |
| dbt_readme_helper | 0.1.1 | Logs information about source, seed, and model nodes to log files in pipe-delimited csv format. |
| dbt_artifacts_eon | 0.2.0 | Builds a mart of tables and views describing the dbt project. |
| dbt_project_evaluator | 0.5.0 | Highlights areas of a dbt project that are misaligned with dbt Labs' best practices. |
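
A hedged sketch of how these dependencies might be declared in packages.yml; the hub namespaces and git URLs below are assumptions, so confirm against this project's actual packages.yml:

```yaml
# packages.yml -- illustrative only; namespaces and git locations are assumptions
packages:
  - package: dbt-labs/codegen
    version: 0.9.0
  - package: dbt-labs/audit_helper
    version: 0.7.0
  - package: tnightengale/dbt_meta_testing
    version: 0.3.6
  # dbt_readme_helper and dbt_artifacts_eon are assumed to be git-installed; replace with the real repository URLs
  - git: "<git-url-for-dbt_readme_helper>"
    revision: 0.1.1
  - git: "<git-url-for-dbt_artifacts_eon>"
    revision: 0.2.0
  - package: dbt-labs/dbt_project_evaluator
    version: 0.5.0
```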

Development Environment Setup

There are two possible environments where development can happen while building in the Bose COE environment, namely:

  1. Local Development - any IDE leveraging dbt Core
  2. Cloud IDE - the dbt Cloud environment

Below are instructions to get set up in both of these environments.

Local Development: VSCode/Atom etc

  1. Install Requirements

    Install dbt.
    Optionally, you can set up venv to allow for environment switching.

  2. Setup

    Set up a profile called dbt_project_template to connect to a data warehouse by following these instructions. If you have access to a data warehouse, you can use those credentials.

    We recommend setting your target schema to be a new schema (dbt will create the schema for you, as long as you have the right privileges). If you don't have access to an existing data warehouse, you can also set up a local Postgres database and connect to it in your profile. Open your terminal and navigate to your profiles.yml. This is in the .dbt hidden folder on your computer, located in your home directory.

    On macOS, you can open the file from your terminal similar to this (which is using the Atom text editor to open the file):

    $ atom ~/.dbt/profiles.yml

    Insert the following into your profiles.yml file and change out the bracketed lines with your own information. Here is further documentation for setting up your profile.

    dbt_project_template:
        outputs:
            dev:
                type: snowflake
                account: <account>.<region>

                # SSO auth via external browser
                user: <email/username>
                authenticator: externalbrowser

                role: <insert role>
                database: <insert db>
                warehouse: <insert warehouse>
                schema: DBT_<YOUR_USERNAME>
                threads: 50
                client_session_keep_alive: True
                query_tag: <insert tag>
        target: dev

    | Configuration Key | Definition |
    | --- | --- |
    | dbt_project_template: | This defines a profile; this specific name should match the profile referenced in our dbt_project.yml. |
    | target: dev | The default environment (target) used during runs. |
    | outputs: | Starts the definition of targets and their configurations. You likely won't need more than dev, but this and any other targets you define can be used for specific functionality throughout dbt. |
    | dev: | Defines a target named dev. |
    | type: snowflake | The type of target connection, based on our warehouse. For Bose COE, snowflake should be used. |
    | warehouse: [warehouse_name] | Can be [env]_EDW_WH or [env]_MARTS_WH; for example, in the DEV environment the warehouse for the EDW layer would be DEV_EDW_WH. |
    | threads: 50 | The number of concurrent models that can run against our warehouse, for this user, at one time during a dbt run. |
    | account: <account>.<region> | Change this to the warehouse's account. Defaults to the Bose COE Snowflake URL. |
    | user: [your_username] | Change this to the username you use to log in to the warehouse. Use a Bose® userid of format ABC1234. |
    | authenticator: externalbrowser | To use Bose® Single Sign-On, use this authenticator value to be prompted with a Snowflake sign-on window. MFA is required during sign-in. |
    | role: transformer | The role that has the correct permissions for working in this project. |
    | database: analytics | The database name where our models will build. |
    | schema: dbt_[userid] | Change this to a custom name following the convention dbt_userid. This is the schema that models build into / are tested from when running locally. |
  3. Clone this repository

    $ git clone git@github.com:eon-collective/dbt_snowflake_tutorial_v2.git
  4. Change into the dbt_snowflake_tutorial_v2 directory from the command line:

    $ cd dbt_snowflake_tutorial_v2
  5. Try running the following generic/common commands one at a time from your command line:

    • dbt debug - tests your connection. If this fails, check your profiles.yml.
    • dbt deps - installs any packages defined in the packages.yml file.

    For project specific commands see the section below dbt Commands: How to Run

Developing in the Cloud IDE

The easiest way to contribute to this project is by developing in dbt Cloud. If you have access, navigate to the develop tab in the menu and fill out any required information to get connected.

In the command line bar at the bottom of the interface, run the following generic/common command:

  • dbt deps - installs any packages defined in the packages.yml file.

For project specific commands see the section below: dbt Commands: How to Run

dbt Commands: How to Run

Assuming you have your environment set up as described in the previous section, you can run the following commands to manage this dbt project during deployment.

To install dependencies:

See dbt docs on deps command

dbt deps

To compile the dbt project

Generates executable SQL from source model, test, and analysis files. You can find these compiled SQL files in the target/ directory of your dbt project in your development environment. See dbt docs on compile

dbt compile

To get the documentation:

Generate documentation for the project. See dbt docs on docs generate

dbt docs generate --no-compile

NOTE: Since this project uses call statements in some of the dbt models that execute directly in the DW (Snowflake), if dbt runs any compile step (which includes dbt docs generate), existing data in the mart is lost, because the call statements execute delete from {{this}} <where condition>.

To view the generated docs in a local IDE, run the following command to start a web server on default port 8000 and serve the generated documents. See dbt docs on docs serve

dbt docs serve

Loading static/seed files

Builds any .csv files as tables in the warehouse. These are located in the data/seed folder of the project, unless otherwise specified in dbt_project.yml. This materializes the CSVs as tables in your target schema. Note that not all projects require this step, since the assumption is that raw data is already in your warehouse. See dbt docs on seed

dbt seed

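If seed files live somewhere other than dbt's default seeds folder, the location is declared in dbt_project.yml; a minimal sketch, where the folder name is illustrative (check this project's dbt_project.yml for the real value):

```yaml
# dbt_project.yml (excerpt) -- folder name is an example, not necessarily this project's setting
seed-paths: ["data"]
```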

Mart build

There are three types of Mart refreshes that get run at different times of the fiscal month. The variables passed to the project determine how the Mart will get refreshed. To understand the three refresh types, assume that today is April 14, 2022 (202204):

  1. Full Refresh - will use the IDs from April 2019 (201904) to April 2022 (202204)
  2. Monthly Refresh - will use the IDs from January 2022 (202201) to March 2022 (202203)
  3. Weekly Refresh - will use the IDs from February 2022 (202202) to April 2022 (202204)

The Default Values (SOFT Full Refresh w/ TEST_RAW Environment)

dbt run

A Monthly Refresh w/ the TEST_RAW environment

dbt run --vars '{refresh_type: MONTHLY}'

A Custom Refresh from 201801 - 202201 w/ the PROD_RAW Environment

dbt run --vars '{refresh_type: CUSTOM, fmthid_start: 201801, fmth_id_end: 202201, env: PROD}'

A HARD Full Refresh w/ the PROD_RAW Environment

dbt run --full-refresh --vars '{env: PROD}'

Both a HARD and a SOFT Full Refresh reload the previous 36 months of data. A HARD refresh drops and recreates the table, whereas a SOFT refresh deletes and reloads the data.
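
A heavily hedged sketch of how an incremental model could consume these variables; this is not this project's actual refresh logic, and the source, column, and default values below are placeholders:

```sql
-- illustrative only: source, column, and defaults are placeholders; the real refresh-window logic may differ
{{ config(materialized='incremental') }}

select *
from {{ source('some_source', 'some_table') }}
{% if var('refresh_type', 'FULL') == 'CUSTOM' %}
    where fmth_id between {{ var('fmthid_start') }} and {{ var('fmth_id_end') }}
{% endif %}
```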

List Project resources

Lists project resources. See dbt docs on the ls command and the reference for resource selection syntax.

dbt ls

CRT Models

In the models/onboarding/* folder there is a collection of models with the prefix crt_. These models are a reference/tool to help migrate data from Redshift to Snowflake. They simply execute a CREATE TABLE IF NOT EXISTS command to create tables in Snowflake with structures identical to Redshift. Once those tables are converted to dbt models, the crt_ model becomes irrelevant because dbt uses the CREATE OR REPLACE command.
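
The difference in DDL behavior, sketched with placeholder names:

```sql
-- illustrative contrast; database, schema, table, and columns are placeholders
create table if not exists MY_DB.MY_SCHEMA.MY_TABLE (id number);      -- crt_ model: no-op if the table already exists
create or replace table MY_DB.MY_SCHEMA.MY_TABLE as select 1 as id;   -- dbt table materialization: rebuilds the table
```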

Control/Audit Framework

Introduction

Mart Auditor is a dbt package developed to capture runtime information about marts executed in Bose COE data loads. These runtime metrics can be used to analyze data marts executed in the past and for auditing purposes.

Design

Any data mart developed in a Bose COE project can plug in the Mart Auditor package to capture data mart runtime metrics. Mart Auditor creates two tables, Batch and Batch_Logging, for all marts in the database. These tables can be queried to review past executions, errors, execution retries, etc.

Mart Auditor is a lightweight dbt package that inserts records into the above tables as the mart progresses and creates tables or views as part of model creation and execution. There is only one entry per mart execution in the Batch table, and multiple entries, one per model, in the Batch_Logging table. The association between the batch and batch_logging tables is 1 -> M.

Batch table structure

To get the DDL Structure from Snowflake, run the command: select get_ddl('table', 'dev_marts.cf.batch'); and it will output the following:

create or replace TABLE BATCH (
	BATCHID NUMBER(38,0),
	RETRY_CNT NUMBER(38,0),
	DBT_MART VARCHAR(16777216),
	DBT_MART_REFRESH_TYPE VARCHAR(16777216),
	START_DATE TIMESTAMP_NTZ(9),
	END_DATE TIMESTAMP_NTZ(9),
	DURATION_IN_SECONDS NUMBER(38,0),
	LOAD_STATUS VARCHAR(16777216)
);

Batch_logging table structure

To get the DDL Structure from Snowflake, run the command: select get_ddl('table', 'dev_marts.cf.batch_logging'); and it will output the following:

create or replace TABLE BATCH_LOGGING (
	BLG_KEY NUMBER(38,0),
	BATCHID NUMBER(38,0),
	DBT_MART VARCHAR(16777216),
	DBT_MART_REFRESH_TYPE VARCHAR(16777216),
	DBT_MODEL_NAME VARCHAR(16777216),
	SF_TABLE_NAME VARCHAR(16777216),
	ROWS_BEFORE NUMBER(38,0),
	ROWS_AFTER NUMBER(38,0),
	STARTDATE TIMESTAMP_NTZ(9),
	COMPLETEDATE TIMESTAMP_NTZ(9),
	ERROR_MESSAGE VARCHAR(16777216)
);
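
With both tables in place, past runs can be reviewed with a query along these lines (illustrative; column names follow the DDL above):

```sql
-- one batch row joined to its per-model batch_logging rows (1 -> M)
select
    b.batchid,
    b.dbt_mart,
    b.load_status,
    b.duration_in_seconds,
    l.dbt_model_name,
    l.rows_before,
    l.rows_after,
    l.error_message
from dev_marts.cf.batch as b
left join dev_marts.cf.batch_logging as l
    on l.batchid = b.batchid
order by b.start_date desc, l.startdate;
```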

Macros developed in Mart Auditor:

Four macros are developed as part of the Mart Auditor dbt package. These macros can be called by any dbt mart by deploying the package into that mart's project.

insert_batch:

This macro inserts a record into the batch table as soon as the mart starts execution. The status in the batch table is inserted as "Started" along with the mart name, BatchID, StartDateTime, etc.

insert_batch_logging:

This macro inserts a record into the batch_logging table as soon as any model in the dbt mart starts executing. It inserts information about the models executed as part of the data mart. Important details such as the model name, mart name, rows in the table/view before the model executes, model execution start date time, and a model execution status of "Started" are captured in the table.

update_batch_logging:

This macro validates the model executed by the data mart. If a model in the mart executes successfully, this macro updates the previously inserted record in the batch_logging table: the status is set to completed, and the row count after the model executes, the end date time for the model, and the table or view name created in the database are captured.

update_batch:

Once all the models in the data mart have executed successfully, this macro updates the batch table record by changing the status to completed and captures information such as duration in seconds, retry_count, etc. If any model in the mart fails, this macro updates the batch table with a failed status and increases the retry count by one.

Steps to use Mart Auditor in your dbt mart:

Note: Mart Auditor is available in Bitbucket at the following location:

Bitbucket Location:

  1. Add the Mart Auditor package to your packages.yml file (see the sketch after this list).

  2. Execute the dbt deps command to deploy the Mart Auditor package into your dbt mart.

  3. You can find Mart Auditor deployed in the packages folder of your project.

  4. You can call the above macros from your dbt_project.yml file as shown in the sketch after this list:

    1. insert_batch(dbt_mart_name, dbt_mart_refresh_type): This macro accepts two parameters, Mart_name and Mart_refresh_type. Mart_refresh_type is optional; if you don't provide a refresh type, a NULL value is inserted into the batch table. Call this macro in the on-run-start section of dbt_project.yml.

    2. insert_batch_logging(dbt_mart_name, dbt_model_name, table_name, refresh_type): This macro accepts four parameters: Mart_name, model_name, table_name, and refresh_type. Table_name and refresh_type are optional; table_name defaults to the model name and refresh_type defaults to NULL if not supplied. Call this macro in the pre-hook section of dbt_project.yml. Pass the current model name to the macro with the keyword "this".

    3. update_batch_logging(dbt_mart_name, dbt_model_name, table_name, refresh_type): This macro accepts four parameters: Mart_name, model_name, table_name, and refresh_type. Table_name and refresh_type are optional; table_name defaults to the model name and refresh_type defaults to NULL if not supplied. Call this macro in the post-hook section of dbt_project.yml.

    4. update_batch(dbt_mart_name, dbt_mart_refresh_type): This macro accepts two parameters, Mart_name and Mart_refresh_type. Mart_refresh_type is optional; if you don't provide a refresh type, a NULL value is inserted into the batch table. Call this macro in the on-run-end section of dbt_project.yml.
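
A hedged sketch of both the package entry and the dbt_project.yml wiring, using the macro signatures described above; the git URL, project name, and mart name are placeholders, and the exact call syntax may differ from the real package:

```yaml
# packages.yml -- replace the placeholder with the Bitbucket location referenced above
packages:
  - git: "<bitbucket-url-for-mart-auditor>"
    revision: main

# dbt_project.yml (excerpt) -- wiring the four macros into the run lifecycle
on-run-start:
  - "{{ insert_batch('my_mart', var('refresh_type', none)) }}"
on-run-end:
  - "{{ update_batch('my_mart', var('refresh_type', none)) }}"

models:
  my_mart_project:
    +pre-hook: "{{ insert_batch_logging('my_mart', this) }}"    # current model passed via `this`, per the notes above
    +post-hook: "{{ update_batch_logging('my_mart', this) }}"
```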

DataOps queries for auditing this mart's runs are provided in the analyses folder with the file name mart_auditor_dataops.sql.

Testing

dbt docs on testing
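
The tests folder contains the singular test assert_positive_total_for_payments.sql; a hedged reconstruction of what such a test typically looks like (the actual file in this repo may differ):

```sql
-- tests/assert_positive_total_for_payments.sql (illustrative)
-- a singular test fails if it returns any rows
select
    order_id,
    sum(amount) as total_amount
from {{ ref('stg_payments') }}
group by order_id
having sum(amount) < 0
```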

Continuous Integration: Bitbucket Pipelines

Credentials

Credentials in a bitbucket pipe are passed through Repository variables. See more on Bitbucket pipeline and environment variables: User-defined variables. The following variables have been configured with DEV service account details.

| Custom Variable | Variable Description |
| --- | --- |
| DBT_TARGET_TYPE | Type of DW for dbt to use; the default is snowflake |
| DBT_TARGET_URL | Snowflake account URL, i.e. <account>.<region> |
| DBT_USER | Snowflake username for authentication |
| DBT_PASSWORD | Snowflake password for authentication |

Most parameters are configured as a secret in Bitbucket and cannot be viewed.

Bitbucket Pipe Setup

This dbt project's CI pipe is configured as a step in a shared Bitbucket pipeline YAML file (at the root level of this repo). The following are the settings for this mart.

      - step:
          name: 'dbt_project_template_DBT - Compile'
          trigger: manual
          script:
            - echo "Starting build for dbt_project_template_DBT"
            - echo "$DBT_PROJECT_BITBUCKET_PUB_CERT" > ~/.ssh/authorized_keys
            - mkdir ~/.dbt
            - touch ~/.dbt/profiles.yml
            - 'echo -e "dbt_project_template:\n  outputs:\n    dev:\n      type: $DBT_TARGET_TYPE\n      account: $DBT_TARGET_URL\n\n      user: $DBT_USER\n      password: $DBT_PASSWORD\n\n      role: $DBT_ROLE\n      database: DEV_MARTS\n      warehouse: $DBT_WH\n      schema: dbt_dbt_project_template\n      threads: $DBT_NUM_THREADS\n      client_session_keep_alive: True\n      query_tag: dbt\n  target: dev\n " >> ~/.dbt/profiles.yml'
            - cd dbt_project_template/
            - ls -ltr
            - dbt deps
            - dbt compile
            - echo "Done build for dbt_project_template_DBT"

dbt Tests

Deployment

Continuous Deployment: dbt_runner

This project can be run in the Bose COE TMC environment using the dbt_runner template. This template is available in all environments and is leveraged from the SIT through Deployment phases of a build project. The dbt_runner artifact/template is responsible for:

  1. Cloning the latest code from the specified branch and URL of a repo
  2. Setting up a profiles.yml for the task's run based on the provided credentials
  3. Executing dbt deps to prepare the project for any commands passed
  4. Running the provided dbt commands and sending a notification in case of an error during the run
  5. At job end, uploading the dbt artifact files to S3 for technical lineage harvesting. The following files are uploaded to S3: manifest.json, catalog.json (if available), and run_results.json

The three Bose COE environments available for building and deployment of dbt projects:

| ENVIRONMENT | PURPOSE | Branch used |
| --- | --- | --- |
| DEV | System Integration Testing | feature/[more_descriptive_name] |
| TEST | User Acceptance Testing | feature/[more_descriptive_name] |
| PROD | Deployment (Go Live) | main |

All developer unit testing is done in the local/Cloud IDE. See Development Environment Setup.

Bose® corp leverages a “continuous deployment” or “direct promotion” branching strategy. The diagram below illustrates the branching strategy used at Bose® for dbt project development.

graph TD
A((main)) -- clone --> B{FeatureA branch}
B -- pull request --> A
B -- fork --> D>builder1 branch]
D -- pull request--> B
B -- fork --> E>builder2 branch]
E -- pull request --> B
A -- clone --> F{FeatureB branch}
F -- pull request --> A 
F -- fork --> G>builder3 branch]
G -- pull request--> F
F -- fork --> H>builder4 branch]
H -- pull request --> F

Credentials

The dbt_runner in Bose COE TMC uses the Credentials Vaulting Framework to store credentials for various projects. dbt_runner needs two main credentials to run. The following credentials are used to run this project across all Bose COE environments.

| | DEV | TEST | PROD |
| --- | --- | --- | --- |
| Bose COE_SNOWFLAKE_CredVaultKey | dev/snowflake/mart/dbt_project_template | test/snowflake/mart/dbt_project_template | prod/snowflake/mart/dbt_project_template |
| Bose COE_BUGTRACKER_CredVaultKey | N/A - Not Applicable | N/A - Not Applicable | bose/prod/operations/alerts/bug_tracker_service |

Alerts

The dbt_runner template has a notification/alerting mechanism in case of any WARNING, ERROR, or FATAL messages. To configure notifications in dbt_runner for this project:

  1. In the advanced parameter Email_Notify, provide the value true
  2. In the advanced parameter Bose COE_BUGTRACKER_CredVaultKey, provide a valid credential key supplied by the Bose COE platform team. See the values to use per environment in the section above.

Job Names

The following is a list of dbt_runner tasks needed to run this mart successfully end to end

| JOB_NAME | JOB_DESCRIPTION | JOB_RUN_PARAMETERS | NOTES |
| --- | --- | --- | --- |
| DBT_RUNNER-MCA-SEED | Load static files with seed | dbt seed | Run one time during project deployment/initialization, or in case of a change in the csv files within the project |

Adhoc Runs

(Re-)Loading seeded files

In case of changes to the static files hosted within the dbt project in the data or seeds folder (this varies with the project, as specified by the seed-paths config in dbt_project.yml), the steps below should be followed to reload the seeded data once this project has been deployed to PRODUCTION.

  1. Create a new feature branch from main
  2. Check out the feature branch in your IDE
  3. Replace the existing files in the data/ref_data_seeds folder
  4. Commit the changes to the seed files and push to the remote feature branch
  5. Create a Pull Request from the feature branch to main
  6. Merge the feature branch into main
  7. Run the TMC task DBT_RUNNER-MCA-SEED to refresh the Ref Data for the MCA project
    • If no new files are provided but seed data has to be refreshed, then only run the TMC task DBT_RUNNER-MCA-SEED

(Re-)Loading static S3 files

In case of changes to the static files hosted within Bose COE S3 folders, the steps below should be followed to reload the Snowflake data once this project has been deployed to PRODUCTION.

  1. Load the new files to S3 production bucket: check in macros/init_project.sql for precise S3 data locations.
  2. Run the TMC Task DBT_RUNNER-MARTS-GTSi_EMEA-INIT_PROJECT to refresh the Ref Data for UMM_EMEA.
    • If no new files are provided but seed data has to be refreshed, then only run the TMC task DBT_RUNNER-MARTS-GTSi_EMEA-INIT_PROJECT

Orchestration

To run this project end to end, the following diagrams illustrate all the steps, dependencies, and their orchestration flow.

Below are the orchestrated steps to execute the full refresh run for this project:

(diagram)

Below are the orchestrated steps to execute the openweek run for this project:

(diagram)

Below are the orchestrated steps to execute the closeweek run for this project:

(diagram)

Support Team Structure

Below is the support team structure for this project:

| Team | Contact Information |
| --- | --- |
| Build Team | EON Collective - Talend to dbt Migration project |
| Run Team | run-team@bose |
| Source Team | source-team@bose |
| Consumer Team | consumer-team@bose |

Service Level Agreements (SLA)

This project utilizes the following SLA schedule.

| PRIORITY LEVEL | TARGET RESPONSE TIME | TARGET RESOLUTION TIME | SLA CLOCK |
| --- | --- | --- | --- |
| 1 - Critical | 30 Minutes | 4 Hours | 24 x 7 |
| 2 - High | 40 Hours | 1 Business Day | In Region |
| 3 - Medium | N/A | 5 Business Days | In Region |
| 4 - Low | N/A | 10 Business Days | In Region |

Troubleshoot/FAQ

dbt FAQs

Known Issues

Existing data is lost or deleted after running dbt docs generate without the --no-compile flag. This is a known issue: since the project uses call statements that execute directly on the DW, if dbt internally executes a compile step, the call statements are executed. To avoid this behavior, use the --no-compile flag with the dbt docs generate command. See the documentation on dbt docs generate with the --no-compile flag and dbt execute.

If existing data is lost, please rerun the data mart build using the FULLREFRESH plan and tasks.

Debugging

See dbt docs on Debugging errors - this has a comprehensive list and categorization of common errors seen in a typical dbt project. Note also the common pitfalls section.

Additional Notes

This project builds on the tutorial from the dbt Fundamentals course.

The raw data was sourced from s3://dbt-tutorial-public and loaded into Snowflake. The raw data is stored as CSV files in a public S3 bucket.

The transformed data is stored in the RAW database. For best practices, the transformed data should be stored in a separate database in the data warehouse.

Rules were defined in a .sqlfluff file which can be found in the models directory. These rules enforce a consistent code style in all models.

Resources

Cutover Plan

TODO: ../etc/CUTOVER_PLAN.md
