- Versions
- Overview
- Application Functional Overview
- Using This dbt Project
- dbt Commands: How to Run
- Control/Audit Framework/Monitoring
- Testing
- Deployment
- Support Team Structure
- Service Level Agreements (SLA)
- Troubleshoot/FAQs
- Migration Notes
- Resources
- Cutover Plan
Versions ↑
Revision History ↑
| Version | Revision Date |
|---|---|
| 1.0 | May 06, 2023 |
dbt Project Version ↑
| Version | Latest Version used for testing |
|---|---|
| 1.5 | 1.5 |
Overview ↑
This project makes use of dbt, a data transformation tool, to derive insights from data stored in Snowflake.
It demonstrates a simple implementation of a dbt ELT job.
Its goal is to establish an organizational standard for transforming and analyzing data stored in a data warehouse using the data build tool (dbt).
The diagram below gives a high-level overview of the application.
graph LR
A[orders] -- ELT --> W{this dbt project}
B[customers] -- ELT --> W
C[payments] -- ELT --> W
W -- dbt build: on_error --> X(notifications sent)
X -- via --> V>email]
X -- via --> U>slack]
W -- dbt build: on_success --> Y{snowflake objects built}
W -- dbt build --> S((metadata))
S -- stored in --> P>manifest.json]
S -- stored in --> Q>run_results.json]
W -- dbt docs generate --> O((docs site))
O -- generates --> S
S -- stored in --> R>catalog.json]
Y -- consumed by --> T>application]
Y -- consumed by --> Z>dashboard]
The following raw tables are consumed by this project:
| DATABASE | SCHEMA | TABLE | NOTES |
|---|---|---|---|
| RAW | JAFFLE_SHOP | CUSTOMERS | |
| RAW | JAFFLE_SHOP | ORDERS | |
| RAW | STRIPE | PAYMENT | |
The following static tables are used by this project. Non-seed files are located in the
's3://bose-bucket/*' folder; they are loaded into the appropriate tables with COPY INTO using a storage integration.
| FILENAME | DATABASE | SCHEMA | SEED | NOTES |
|---|---|---|---|---|
| | RAW | DBT_JANEROSE | DBT_PROJECT_EVALUATOR_EXCEPTIONS | |
This project performs the following transformations, cleansing, and curation during its execution:
| DATABASE | SCHEMA | MODEL | MATERIALIZATION | TAGS | NOTES |
|---|---|---|---|---|---|
| RAW | DBT_JANEROSE | FCT_DBT__MODEL_INFORMATION | VIEW | [] | |
| RAW | DBT_JANEROSE | FCT_DBT__TEST_EXECUTIONS | VIEW | [] | |
| RAW | DBT_JANEROSE | FCT_DBT__SNAPSHOT_EXECUTIONS | VIEW | [] | |
| RAW | DBT_JANEROSE | DIM_DBT__TESTS | VIEW | [] | |
| RAW | DBT_JANEROSE | DIM_DBT__SNAPSHOTS | VIEW | [] | |
| RAW | DBT_JANEROSE | FCT_DBT__SEED_EXECUTIONS | VIEW | [] | |
| RAW | DBT_JANEROSE | FCT_DBT__MODEL_EXECUTIONS | VIEW | [] | |
| RAW | DBT_JANEROSE | DIM_DBT__CURRENT_MODELS | VIEW | [] | |
| RAW | DBT_JANEROSE | DIM_DBT__EXPOSURES | VIEW | [] | |
| RAW | DBT_JANEROSE | DIM_DBT__MODELS | VIEW | [] | |
| RAW | DBT_JANEROSE | DIM_DBT__SOURCES | VIEW | [] | |
| RAW | DBT_JANEROSE | DIM_DBT__SEEDS | VIEW | [] | |
| RAW | DBT_JANEROSE | FCT_DBT__INVOCATIONS | VIEW | [] | |
| RAW | DBT_JANEROSE | MODEL_INFORMATION | INCREMENTAL | [] | |
| RAW | DBT_JANEROSE | SNAPSHOT_EXECUTIONS | INCREMENTAL | [] | |
| RAW | DBT_JANEROSE | TEST_EXECUTIONS | INCREMENTAL | [] | |
| RAW | DBT_JANEROSE | EXPOSURES | INCREMENTAL | [] | |
| RAW | DBT_JANEROSE | MODELS | INCREMENTAL | [] | |
| RAW | DBT_JANEROSE | SEEDS | INCREMENTAL | [] | |
| RAW | DBT_JANEROSE | SOURCES | INCREMENTAL | [] | |
| RAW | DBT_JANEROSE | TESTS | INCREMENTAL | [] | |
| RAW | DBT_JANEROSE | SNAPSHOTS | INCREMENTAL | [] | |
| RAW | DBT_JANEROSE | SEED_EXECUTIONS | INCREMENTAL | [] | |
| RAW | DBT_JANEROSE | MODEL_EXECUTIONS | INCREMENTAL | [] | |
| RAW | DBT_JANEROSE | INVOCATIONS | INCREMENTAL | [] | |
| RAW | DBT_JANEROSE | STG_DBT__INVOCATIONS | VIEW | [] | |
| RAW | DBT_JANEROSE | STG_DBT__MODEL_EXECUTIONS | VIEW | [] | |
| RAW | DBT_JANEROSE | STG_DBT__SEED_EXECUTIONS | VIEW | [] | |
| RAW | DBT_JANEROSE | STG_DBT__SNAPSHOTS | VIEW | [] | |
| RAW | DBT_JANEROSE | STG_DBT__TESTS | VIEW | [] | |
| RAW | DBT_JANEROSE | STG_DBT__EXPOSURES | VIEW | [] | |
| RAW | DBT_JANEROSE | STG_DBT__MODEL_INFORMATION | VIEW | [] | |
| RAW | DBT_JANEROSE | STG_DBT__SNAPSHOT_EXECUTIONS | VIEW | [] | |
| RAW | DBT_JANEROSE | STG_DBT__TEST_EXECUTIONS | VIEW | [] | |
| RAW | DBT_JANEROSE | STG_DBT__SEEDS | VIEW | [] | |
| RAW | DBT_JANEROSE | STG_DBT__MODELS | VIEW | [] | |
| RAW | DBT_JANEROSE | STG_DBT__SOURCES | VIEW | [] | |
The full data lineage image isn't reader friendly when exported as an image, as it is too long and too wide.
To generate and serve the docs on your machine see dbt Commands: How to Run.
Note: In dbt data lineage diagrams green bubbles are sources and blue bubbles are models.
Lineage graph for Stripe payments

Functional Context ↑
Using This dbt Project ↑
Project structure ↑
The structure of this dbt project is as follows:
dbt_snowflake_tutorial_v2
┣ analyses
┃ ┣ interactive_ad_hoc_testing.sql
┃ ┣ orders_by_day.sql
┃ ┗ total_revenues.sql
┣ etc
┃ ┗ images
┃ ┃ ┣ jaffle_shop_lineage.png
┃ ┃ ┗ stripe_payment_lineage.png
┃ ┗ project_tree_structure.md
┣ logs
┃ ┗ dbt.log
┣ macros
┃ ┣ cents_to_dollars.sql
┃ ┣ clean_stale_models.sql
┃ ┣ grant_select.sql
┃ ┣ limit_data_dev.sql
┃ ┗ union_tables_by_prefix.sql
┣ models
┃ ┣ marts
┃ ┃ ┗ core
┃ ┃ ┃ ┣ dim_customers.sql
┃ ┃ ┃ ┣ fct_orders.sql
┃ ┃ ┃ ┣ int_orders_pivoted.sql
┃ ┃ ┃ ┣ int_order_status_pivoted.sql
┃ ┃ ┃ ┗ str_pivoted_customers.sql
┃ ┣ staging
┃ ┃ ┣ jaffle_shop
┃ ┃ ┃ ┣ jaffle_shop.md
┃ ┃ ┃ ┣ sources.yml
┃ ┃ ┃ ┣ src_jaffle_shop.yml
┃ ┃ ┃ ┣ stg_customers.sql
┃ ┃ ┃ ┣ stg_jaffle_shop.yml
┃ ┃ ┃ ┣ stg_jaffle_shop_orders.sql
┃ ┃ ┃ ┗ stg_orders.sql
┃ ┃ ┣ stripe
┃ ┃ ┃ ┣ sources.yml
┃ ┃ ┃ ┣ src_stripe.yml
┃ ┃ ┃ ┣ stg_payments.sql
┃ ┃ ┃ ┣ stg_stripe.yml
┃ ┃ ┃ ┗ stripe.md
┃ ┃ ┗ all_dates.sql
┃ ┗ .sqlfluff
┗ tests
┗ assert_positive_total_for_payments.sql
The same project tree view is persisted/versioned under etc/project_tree_structure.md. If the project structure changes, please update both markdown documents.
In order for this project to run successfully, the following permissions are needed on Snowflake.
| DATABASE NAME | SCHEMA | TABLE NAME | PERMISSION |
|---|---|---|---|
| <ENV>_RAW | BODS, SAPE03, SAPSHP, SFTPGTSI | ALL | READ |
| <ENV>_MARTS | UMM_EMEA_REF_DATA, UMM_EMEA_STAGE, UMM_EMEA_DW | ALL | READ & WRITE |
NOTE: Replace <ENV> with any of the following environment codes: DEV, TEST or PROD.
Project Settings ↑
The project settings for this project are persisted in the dbt_project.yml file.
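For orientation, a minimal sketch of what this file typically contains for this project is shown below. The exact names, paths, and materialization defaults are assumptions; refer to the actual dbt_project.yml in the repo.

```yaml
# dbt_project.yml - illustrative sketch only, not the verbatim project file
name: dbt_snowflake_tutorial_v2        # assumed to match the repository name
profile: dbt_project_template          # must match the profile name in ~/.dbt/profiles.yml
version: "1.0.0"

model-paths: ["models"]
macro-paths: ["macros"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["data"]                   # assumption - see the seed and (re-)loading sections below

# vars for this project are described in the Application Variables section below

models:
  dbt_snowflake_tutorial_v2:
    staging:
      +materialized: view              # assumption; actual materializations are listed in the Overview tables
    marts:
      +materialized: table
```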
Application Variables ↑
This application uses the following variables defined in the dbt_project.yml file under the vars section.
| Application Argument | Default Value | Options | Description/Notes |
|---|---|---|---|
| env | TEST | TEST, DEV, PROD | Controls which environment the source data comes from |
The variables should be passed to the application from the command line at runtime. Examples:
- Run a monthly refresh using the TEST_RAW environment:
  dbt run --vars '{refresh_type: MONTHLY}'
- Run a refresh from 201801 to 202201 using the PROD_RAW environment:
  dbt run --vars '{refresh_type: CUSTOM, fmthid_start: 201801, fmth_id_end: 202201, env: PROD}'
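A sketch of how these variables might be declared under vars in dbt_project.yml. Only env appears in the table above; the other variables and their defaults are inferred from the command examples and should be treated as assumptions.

```yaml
# vars section of dbt_project.yml - illustrative sketch
vars:
  env: TEST            # TEST | DEV | PROD - controls which environment the source data comes from
  refresh_type: FULL   # assumed default; override at runtime (MONTHLY, WEEKLY, CUSTOM)
  fmthid_start: null   # fiscal month id lower bound, used when refresh_type is CUSTOM
  fmth_id_end: null    # fiscal month id upper bound, used when refresh_type is CUSTOM
```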
Project Packages/Dependencies ↑
| Package | Version | Usage |
|---|---|---|
| codegen | 0.9.0 | Macros that generate dbt code, and log it to the command line. |
| audit_helper | 0.7.0 | Useful macros when performing data audits |
| dbt_meta_testing | 0.3.6 | Contains macros to assert test and documentation coverage from dbt_project.yml configuration settings. |
| dbt_readme_helper | 0.1.1 | Logs information about source, seed, & model nodes to log files in pipe-delimited csv format. |
| dbt_artifacts_eon | 0.2.0 | Builds a mart of tables and views describing the dbt project |
| dbt_project_evaluator | 0.5.0 | Highlights areas of a dbt project that are misaligned with dbt Labs' best practices. |
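If you need to reproduce this dependency set, a packages.yml along the following lines should work. The hub namespaces shown (dbt-labs/..., tnightengale/...) are the usual ones for these packages, and the git URLs for dbt_readme_helper and dbt_artifacts_eon are placeholders; verify everything against the project's actual packages.yml.

```yaml
# packages.yml - sketch; confirm against the file checked into this repo
packages:
  - package: dbt-labs/codegen
    version: 0.9.0
  - package: dbt-labs/audit_helper
    version: 0.7.0
  - package: tnightengale/dbt_meta_testing
    version: 0.3.6
  - package: dbt-labs/dbt_project_evaluator
    version: 0.5.0
  # dbt_readme_helper and dbt_artifacts_eon are assumed to be git-sourced packages;
  # replace the URLs below with the real repository locations
  - git: "https://example.com/dbt_readme_helper.git"
    revision: 0.1.1
  - git: "https://example.com/dbt_artifacts_eon.git"
    revision: 0.2.0
```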
Development Environment Setup ↑
There are two possible environments where development can happen while building in the Bose COE environment, namely:
- Local Development - this can be IDE that is leveraging dbt core
- Cloud IDE - dbt cloud environment
Below are instructions to get setup on both of these environments.
Local Development: VSCode/Atom etc ↑
- Install Requirements ↑

  Install dbt. Optionally, you can set up a virtual environment (venv) to allow for environment switching.

- Setup ↑

  Set up a profile called dbt_project_template to connect to a data warehouse by following these instructions. If you have access to a data warehouse, you can use those credentials. We recommend setting your target schema to a new schema (dbt will create the schema for you, as long as you have the right privileges). If you don't have access to an existing data warehouse, you can also set up a local Postgres database and connect to it in your profile.

  Open your terminal and navigate to your profiles.yml. This is in the .dbt hidden folder in your home directory. On macOS, you can open the file from your terminal similar to this (which uses the Atom text editor to open the file):

  $ atom ~/.dbt/profiles.yml

  Insert the following into your profiles.yml file and change out the bracketed lines with your own information. Here is further documentation for setting up your profile.

  dbt_project_template:
    outputs:
      dev:
        type: snowflake
        account: <account>.<region>
        # User/password auth
        user: <email/username>
        authenticator: externalbrowser
        role: <insert role>
        database: <insert db>
        warehouse: <insert warehouse>
        schema: DBT_<YOUR_USERNAME>
        threads: 50
        client_session_keep_alive: True
        query_tag: <insert tag>
    target: dev
| Configuration Key | Definition |
|---|---|
| dbt_project_template: | This defines a profile - this specific name should be the profile that is referenced in our dbt_project.yml. |
| target: dev | This is the default environment that will be used during our runs. |
| outputs: | This is a prompt to start defining targets and their configurations. You likely won't need more than dev, but this and any other targets you define can be used to accomplish certain functionalities throughout dbt. |
| dev: | This defines a target named dev. |
| type: snowflake | This is the type of target connection we are using, based on our warehouse. For Bose COE, snowflake should be used. |
| warehouse: [warehouse_name] | Can be [env]_EDW_WH or [env]_MARTS_WH; for example, in the DEV environment the warehouse for the EDW layer would be DEV_EDW_WH. |
| threads: 50 | This is the number of concurrent models that can run against our warehouse, for this user, at one time when conducting a dbt run. |
| account: <account>.<region> | Change this out to the warehouse's account. Defaults to the Bose COE Snowflake URL. |
| user: [your_username] | Change this to the username you use to log in to the warehouse. Use the Bose® userid of format ABC1234. |
| authenticator: externalbrowser | To use Bose® Single Sign On, use externalbrowser as the value to be prompted with a sign-on window from Snowflake. MFA must be used during sign-in. |
| role: transformer | This is the role that has the correct permissions for working in this project. |
| database: analytics | This is the database name where our models will build. |
| schema: dbt_[userid] | Change this to a custom name following the convention dbt_[userid]. This is the schema that models will build into / test from when conducting runs locally. |
- Clone this repository:

  $ git clone git@github.com:eon-collective/dbt_snowflake_tutorial_v2.git

- Change into the dbt_snowflake_tutorial_v2 directory from the command line:

  $ cd dbt_snowflake_tutorial_v2

- Try running the following generic/common commands one at a time from your command line:

  dbt debug - tests your connection. If this fails, check your profiles.yml.
  dbt deps - installs any packages defined in the packages.yml file.
For project specific commands see the section below dbt Commands: How to Run
Developing in the Cloud IDE ↑
The easiest way to contribute to this project is by developing in dbt Cloud. If you have access, navigate to the develop tab in the menu and fill out any required information to get connected.
In the command line bar at the bottom of the interface, run the following generic/common command:
dbt deps - installs any packages defined in the packages.yml file.
For project specific commands see the section below: dbt Commands: How to Run
dbt Commands: How to Run ↑
Assuming you have your environment setup as described in previous section, you can run the following commands to manage this dbt project during deployment.
To install dependencies: ↑
dbt deps
To compile the dbt project ↑
Generates executable SQL from source model, test, and analysis files. You can find these compiled SQL files in the target/ directory of your dbt project in your development environment. See dbt docs on compile
dbt compile
To get the documentation: ↑
Generate documentation for the project. See dbt docs on docs generate
dbt docs generate --no-compile
NOTE: Since this project uses call statements in some of the dbt models that execute directly in the DW (Snowflake), if dbt runs any compile step (which includes dbt docs generate), existing data in the mart is lost, because the call statements execute delete from {{this}} <where condition>.
To view the generated docs in a local IDE, run the following command to start a webserver on the default port and serve the generated documents. See dbt docs on docs serve
dbt docs serve
Loading static/seed files ↑
Builds any .csv files as tables in the warehouse. These are located in the data/seed folder of the project, unless otherwise specified in dbt_project.yml. This materializes the CSVs as tables in your target schema. Note that not all projects require this step, since the assumption is that raw data is already in your warehouse. See dbt docs on seed
dbt seed
Mart build ↑
There are three types of Mart refreshes that get run at different times of the fiscal month. The variables passed to the project determine how the Mart will get refreshed. To understand the three refresh types, assume that today is April 14, 2022 (202204):
- Full Refresh - will use the IDs from April 2019 (201904) to April 2022 (202204)
- Monthly Refresh - will use the IDs from January 2022 (202201) to March 2022 (202203)
- Weekly Refresh - will use the IDs from February 2022 (202202) to April 2022 (202204)
dbt run
dbt run --vars '{refresh_type: MONTHLY}'
dbt run --vars '{refresh_type: CUSTOM, fmthid_start: 201801, fmth_id_end: 202201, env: PROD}'
dbt run --full-refresh --vars '{env: PROD}'
Both a HARD and a SOFT Full Refresh reload the previous 36 months of data. A HARD refresh drops and recreates the table, whereas a SOFT refresh deletes and reloads the data.
List Project resources ↑
Lists project resources. See the dbt docs on the ls command and the reference on resource selection syntax.
dbt ls
CRT Models ↑
In the models/onboarding/* folder there is a collection of models with the prefix crt_. These models are a reference/tool to help migrate the data from Redshift to Snowflake.
They simply execute a CREATE TABLE IF NOT EXISTS command to create tables in Snowflake with structures identical to Redshift. Once those tables are converted to dbt models, the crt_ models
become irrelevant because dbt uses the CREATE OR REPLACE command.
Control/Audit Framework ↑
Introduction ↑
Mart Auditor is a dbt package developed to capture runtime information about marts being executed in Bose COE data loads. These runtime metrics can be used to analyze past data mart executions and for auditing purposes.
Any data mart developed in the Bose COE project can plug in the Mart Auditor package to capture data mart runtime metrics. Mart Auditor creates two tables, Batch and Batch_Logging, for all marts in the database. These tables can be queried to see past executions, errors, execution retries, etc.
Mart Auditor is a lightweight dbt package that inserts records into the above tables as the mart progresses and creates tables or views as part of model creation and execution. There is only one entry per mart execution in the Batch table and multiple entries, one per model, in the Batch_Logging table. The association between the batch and batch_logging tables is 1 -> M.
Batch table structure ↑
To get the DDL Structure from Snowflake, run the command: select get_ddl('table', 'dev_marts.cf.batch'); and it will output the following:
create or replace TABLE BATCH (
BATCHID NUMBER(38,0),
RETRY_CNT NUMBER(38,0),
DBT_MART VARCHAR(16777216),
DBT_MART_REFRESH_TYPE VARCHAR(16777216),
START_DATE TIMESTAMP_NTZ(9),
END_DATE TIMESTAMP_NTZ(9),
DURATION_IN_SECONDS NUMBER(38,0),
LOAD_STATUS VARCHAR(16777216)
);
Batch_logging table structure ↑
To get the DDL Structure from Snowflake, run the command: select get_ddl('table', 'dev_marts.cf.batch_logging'); and it will output the following:
create or replace TABLE BATCH_LOGGING (
BLG_KEY NUMBER(38,0),
BATCHID NUMBER(38,0),
DBT_MART VARCHAR(16777216),
DBT_MART_REFRESH_TYPE VARCHAR(16777216),
DBT_MODEL_NAME VARCHAR(16777216),
SF_TABLE_NAME VARCHAR(16777216),
ROWS_BEFORE NUMBER(38,0),
ROWS_AFTER NUMBER(38,0),
STARTDATE TIMESTAMP_NTZ(9),
COMPLETEDATE TIMESTAMP_NTZ(9),
ERROR_MESSAGE VARCHAR(16777216)
);
Macros developed in Mart Auditor: ↑
Four macros are developed as part of the Mart Auditor dbt package. These macros can be called by any dbt mart by deploying the package into that mart's dbt project.
Insert_batch : ↑
This macro inserts a record into the batch table as soon as the mart starts execution. The status in the batch table is inserted as "Started", along with the Mart name, BatchID, StartDateTime, etc.
insert_batch_logging : ↑
This macro inserts a record into the batch_logging table as soon as any model in the dbt mart starts executing. It inserts all the information about the models executed as part of the data mart. Important information like the model name, mart name, rows in the table/view before the model executes, model execution start date time, and model execution status of "Started" is captured into the table.
update_batch_logging: ↑
This macro validates the model executed by the data mart. If a model in the mart executes successfully, this macro updates the previously inserted record in the batch_logging table with updated values: the status is set to completed, the row count after the model executes, the end date time for the model, and the name of the table or view created in the database.
Update_batch : ↑
Once all the models in the data mart have executed successfully, this macro updates the batch table record by changing the status to completed and captures information such as duration in seconds, retry_count, etc. If any model in the mart fails, this macro updates the batch table with a failed status and increases the retry count value by one.
Steps to use Mart Auditor in your dbt mart: ↑
Note: Mart Auditor is available in Bitbucket at the following location:
Bitbucket Location:
- Add the Mart Auditor package to your packages.yml file (see the sketch after this list).
- Execute the dbt deps command to deploy the Mart Auditor package into your dbt mart.
- You can find Mart Auditor deployed in the dbt packages folder of your project.
- You can call the above macros in your dbt_project.yml file (see the sketch after this list):
  - insert_batch(dbt_mart_name, dbt_mart_refresh_type): This macro accepts two parameters, Mart_name and Mart_refresh_type. Mart_refresh_type is optional; if you don't provide it, a NULL value is inserted into the batch table. Call this macro in the on-run-start section of dbt_project.yml.
  - insert_batch_logging(dbt_mart_name, dbt_model_name, table_name, refresh_type): This macro accepts four parameters: Mart_name, model_name, table_name, and refresh_type. Table_name and refresh_type are optional; table_name defaults to the model name and refresh_type defaults to null if not supplied. Call this macro in the pre-hook section of dbt_project.yml. Pass the current model name to the macro with the keyword "this".
  - update_batch_logging(dbt_mart_name, dbt_model_name, table_name, refresh_type): This macro accepts four parameters: Mart_name, model_name, table_name, and refresh_type. Table_name and refresh_type are optional; table_name defaults to the model name and refresh_type defaults to null if not supplied. Call this macro in the post-hook section of dbt_project.yml.
  - update_batch(dbt_mart_name, dbt_mart_refresh_type): This macro accepts two parameters, Mart_name and Mart_refresh_type. Mart_refresh_type is optional; if you don't provide it, a null value is inserted into the batch table. Call this macro in the on-run-end section of dbt_project.yml.
- DataOps queries for auditing this mart's runs are provided in the analysis folder in the file mart_auditor_dataops.sql.
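The two configuration snippets referenced in the list above might look roughly like the following. The Bitbucket URL, the mart name my_mart, and the exact macro call arguments are placeholders/assumptions; confirm them against the Mart Auditor package itself.

```yaml
# packages.yml - sketch; replace the git URL with the actual Mart Auditor Bitbucket location
packages:
  - git: "https://bitbucket.org/<workspace>/mart_auditor.git"   # placeholder URL
    revision: main
```

```yaml
# dbt_project.yml - sketch of where the Mart Auditor macros are called ('my_mart' is a placeholder)
on-run-start:
  - "{{ insert_batch('my_mart') }}"            # refresh_type omitted, so NULL is recorded

on-run-end:
  - "{{ update_batch('my_mart') }}"

models:
  my_mart:
    +pre-hook:
      - "{{ insert_batch_logging('my_mart', this) }}"   # current model passed via the keyword 'this'
    +post-hook:
      - "{{ update_batch_logging('my_mart', this) }}"
```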
Testing ↑
Continuous Integration: Bitbucket Pipelines ↑
Credentials ↑
Credentials in a bitbucket pipe are passed through Repository variables. See more on Bitbucket pipeline and environment variables: User-defined variables.
The following variables have been configured with DEV service account details.
| Custom Variable | Variable Description |
|---|---|
| DBT_TARGET_TYPE | Type of DW for dbt to use : default is snowflake |
| DBT_TARGET_URL | Snowflake URL for the account |
| DBT_USER | snowflake username for authentication |
| DBT_PASSWORD | snowflake password for authentication |
Most parameters are configured as a secret in Bitbucket and cannot be viewed.
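For reference, these repository variables map onto the profile fields that the pipeline writes out in the next section. An equivalent profiles.yml written with dbt's env_var() function (an alternative pattern, not what the pipeline below actually does) would look roughly like this:

```yaml
# profiles.yml sketch using env_var() - an alternative to the echo-built file shown below
dbt_project_template:
  target: dev
  outputs:
    dev:
      type: "{{ env_var('DBT_TARGET_TYPE', 'snowflake') }}"
      account: "{{ env_var('DBT_TARGET_URL') }}"
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
      role: "{{ env_var('DBT_ROLE') }}"
      database: DEV_MARTS
      warehouse: "{{ env_var('DBT_WH') }}"
      schema: dbt_dbt_project_template
      threads: 4            # or wire up DBT_NUM_THREADS
      client_session_keep_alive: True
```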
Bitbucket Pipe Setup ↑
This dbt project's CI pipe is configured as a step in a shared Bitbucket pipeline YAML file (at the root level of this repo). The following are the settings for this mart.
- step:
name: 'dbt_project_template_DBT - Compile'
trigger: manual
script:
- echo "Starting build for dbt_project_template_DBT"
- echo "$DBT_PROJECT_BITBUCKET_PUB_CERT" > ~/.ssh/authorized_keys
- mkdir ~/.dbt
- touch ~/.dbt/profiles.yml
- 'echo -e "dbt_project_template:\n outputs:\n dev:\n type: $DBT_TARGET_TYPE\n account: $DBT_TARGET_URL\n\n user: $DBT_USER\n password: $DBT_PASSWORD\n\n role: $DBT_ROLE\n database: DEV_MARTS\n warehouse: $DBT_WH\n schema: dbt_dbt_project_template\n threads: $DBT_NUM_THREADS\n client_session_keep_alive: True\n query_tag: dbt\n target: dev\n " >> ~/.dbt/profiles.yml'
- cd dbt_project_template/
- ls -ltr
- dbt deps
- dbt compile
- echo "Done build for dbt_project_template_DBT"dbt Tests ↑
Deployment ↑
Continuous Deployment: dbt_runner ↑
This project can be run in the Bose COE TMC environment using the dbt_runner template. This template is available in all environments and is leveraged from the SIT phase through the Deployment phase of a build project. The dbt_runner artifact/template is responsible for:
- Cloning the latest code from the specified branch and URL of a repo
- Setting up a profiles.yml for the task run based on the provided credentials
- Executing a dbt deps to get the project prepared for any commands passed
- Running the provided dbt commands and sending a notification in case of an error during the run
- At job end, uploading the dbt artifact files to S3 for technical lineage harvesting. The following files are uploaded to S3: manifest.json, catalog.json (if available), and run_results.json
The three Bose COE environments available for building and deployment of dbt projects:
| ENVIRONMENT | PURPOSE | Branch used |
|---|---|---|
| DEV | System Integration Testing | feature/[more_descriptive_name] |
| TEST | User Acceptance Testing | feature/[more_descriptive_name] |
| PROD | Deployment (Go Live) | main |
All developer unit testing is done in the local/Cloud IDE. See Development Environment Setup.
Bose® corp leverages "continuous deployment" or "direct promotion" as its branching strategy. The diagram below illustrates the branching strategy used at Bose® for dbt project development.
graph TD
A((main)) -- clone --> B{FeatureA branch}
B -- pull request --> A
B -- fork --> D>builder1 branch]
D -- pull request--> B
B -- fork --> E>builder2 branch]
E -- pull request --> B
A -- clone --> F{FeatureB branch}
F -- pull request --> A
F -- fork --> G>builder3 branch]
G -- pull request--> F
F -- fork --> H>builder4 branch]
H -- pull request --> F
Credentials ↑
The dbt_runner in Bose COE TMC uses the Credentials Vaulting Framework to store credentials for various projects. dbt_runner needs two main credentials to run. The following credentials are used to run this project across all Bose COE environments:
| \ | DEV | TEST | PROD |
|---|---|---|---|
| Bose COE_SNOWFLAKE_CredVaultKey | dev/snowflake/mart/dbt_project_template | test/snowflake/mart/dbt_project_template | prod/snowflake/mart/dbt_project_template |
| Bose COE_BUGTRACKER_CredVaultKey | N/A - Not Applicable | N/A - Not Applicable | bose/prod/operations/alerts/bug_tracker_service |
Alerts ↑
The dbt_runner template has a notification/alerting mechanism in case of any WARNING, ERROR or FATAL messages. To configure notifications in dbt_runner for this project:
- In the advanced parameter Email_Notify, provide the value true
- In the advanced parameter Bose COE_BUGTRACKER_CredVaultKey, provide a valid credential key provided by the Bose COE platform team. See the values to use per environment in the section above.
Job Names ↑
The following is a list of dbt_runner tasks needed to run this mart successfully end to end
| JOB_NAME | JOB_DESCRIPTION | JOB_RUN_PARAMETERS | NOTES |
|---|---|---|---|
| DBT_RUNNER-MCA-SEED | Load static files with seed | dbt seed | Run one time during project deployment/initialization, or in case of a change in the CSV files within the project |
(Re-)Loading seeded files ↑
In case of changes to the static files hosted within the dbt project in the data or seeds folder (the location varies per project, as specified by the seed-paths config in dbt_project.yml), the steps below should be followed to reload the seeded data once this project has been deployed to PRODUCTION.
- Create a new feature branch from main
- Checkout feature branch to IDE
- Replace the existing files in the data/ref_data_seeds folder.
- Commit the changes to seed files and push to remote feature branch
- Create a Pull Request from feature branch to main
- Merge the feature branch into main
- Run the TMC Task DBT_RUNNER-MCA-SEED to refresh the Ref Data for the MCA project.
- If no new files are provided but seed data has to be refreshed, then only run the TMC Task DBT_RUNNER-MCA-SEED.
(Re-)Loading static S3 files ↑
In case of changes to the static files hosted within Bose COE S3 folders, the steps below should be followed to reload the Snowflake data once this project has been deployed to PRODUCTION.
- Load the new files to the S3 production bucket: check macros/init_project.sql for the precise S3 data locations.
- Run the TMC Task DBT_RUNNER-MARTS-GTSi_EMEA-INIT_PROJECT to refresh the Ref Data for UMM_EMEA.
- If no new files are provided but seed data has to be refreshed, then only run the TMC Task DBT_RUNNER-MARTS-GTSi_EMEA-INIT_PROJECT.
Orchestration ↑
To run this project end to end, the following diagram illustrates all the steps, their dependencies, and their orchestration flow.
Below are the orchestrated steps to execute the full refresh run for this project.
Below are the orchestrated steps to execute the openweek run for this project.
Below are the orchestrated steps to execute the closeweek run for this project.
Support Team Structure ↑
Below is the support team structure for this project:
| Team | Contact Information |
|---|---|
| Build Team | EON Collective - Talend to dbt Migration project |
| Run Team | run-team@bose |
| Source Team | source-team@bose |
| Consumer Team | consumer-team@bose |
Service Level Agreements (SLA) ↑
This project utilizes the following SLA schedule.
| PRIORITY LEVEL | TARGET RESPONSE TIME | TARGET RESOLUTION TIME | SLA CLOCK |
|---|---|---|---|
| 1 - Critical | 30 Minutes | 4 Hours | 24 x 7 |
| 2 - High | 40 Hours | 1 Business Day | In Region |
| 3 - Medium | N/A | 5 Business Days | In Region |
| 4 - Low | N/A | 10 Business Days | In Region |
Troubleshoot/FAQ ↑
Known Issues ↑
Existing data is lost or deleted after running dbt docs generate without the --no-compile flag.
This is a known issue: since the project uses call statements that execute on the DW directly, if dbt internally executes a compile step, the call statements are executed. To avoid this behavior, use the --no-compile flag with the dbt docs generate command. See the documentation on dbt docs generate with the --no-compile flag and dbt execute.
If existing data is lost, please rerun the data mart build using the FULLREFRESH plan and tasks.
Debugging ↑
See the dbt docs on Debugging errors - this has a comprehensive list and categorization of common errors seen in a typical dbt project. Note also the common pitfalls section.
Additional Notes ↑
This project builds on the tutorial from the dbt Fundamentals course.
The raw data was sourced from s3://dbt-tutorial-public and loaded into Snowflake. The raw data is stored as CSV files in a public S3 bucket.
The transformed data is stored in the RAW database. For best practices, the transformed data should be stored in a separate database in the data warehouse.
Rules were defined in a .sqlfluff file which can be found in the models directory. These rules enforce a consistent code style in all models.
Resources ↑
- Learn more about dbt in the docs
- Check out Discourse for commonly asked questions and answers
- Join the chat on Slack for live discussions and support
- Find dbt events near you
- Check out the blog for the latest news on dbt's development and best practices
- Check out dbt Developer blog on git cherry pick
Cutover Plan ↑
TODO: ../etc/CUTOVER_PLAN.md




