In today's data-driven world, having access to up-to-date information is crucial. However, manually refreshing tables is time-consuming and error-prone. Enter dynamic tables: a powerful option that automates data updates, ensuring your analyses are always based on the latest information.
What are Dynamic Tables?
Think of dynamic tables as self-updating spreadsheets. You define a query that transforms data from one or more base tables, and a built-in refresh process takes care of the rest. Whenever the underlying data changes, the dynamic table automatically updates itself (based on the configured target lag), reflecting the latest information.
Example Analytics Use Case
A global retail company's data analyst needs to calculate the quarter-to-date order amount and the number of customers in each country for market research. The required data is loaded into Snowflake through multiple ETL (Extract, Transform, and Load) pipelines with different refresh frequencies. The analyst accesses the data via role-based access control and develops a SQL query that produces the desired aggregates by joining several tables (customer, orders, nation). There is no need to worry about CDC (Change Data Capture), pipeline building, or dependency management; the dynamic table solves all of that.
Dynamic Table Syntax
CREATE DYNAMIC TABLE REPORT.QUARTERLY_CUSTOMER_AND_ORDER_BY_COUNTRY
  TARGET_LAG = 'DOWNSTREAM'    -- other options: '<num> seconds | minutes | hours | days' (specifies the lag for the dynamic table)
  WAREHOUSE = SF_US_DE_WH_01
  INITIALIZE = ON_CREATE       -- other option: ON_SCHEDULE (specifies the behavior of the initial refresh)
  REFRESH_MODE = INCREMENTAL   -- other options: FULL | AUTO (specifies the refresh mode for the dynamic table)
AS
SELECT
    nat.N_NAME                 AS Country_Name,
    YEAR(ord.O_ORDERDATE)      AS Year,
    QUARTER(ord.O_ORDERDATE)   AS Quarter,
    COUNT(DISTINCT cus.C_NAME) AS Number_Of_Unique_Customer,
    SUM(ord.O_TOTALPRICE)      AS Total_Order_Amount
FROM STN.STN_NATION nat
INNER JOIN STN.STN_CUSTOMER cus
    ON cus.C_NATIONKEY = nat.N_NATIONKEY
INNER JOIN STN.STN_ORDERS ord
    ON cus.C_CUSTKEY = ord.O_CUSTKEY
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3;
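Once created, the dynamic table can be queried like any regular table, and a refresh can also be triggered on demand (the filter values below are illustrative):

-- Query the latest aggregates
SELECT *
FROM REPORT.QUARTERLY_CUSTOMER_AND_ORDER_BY_COUNTRY
WHERE Year = 2024 AND Quarter = 1;

-- Trigger a manual refresh in addition to the automatic schedule
ALTER DYNAMIC TABLE REPORT.QUARTERLY_CUSTOMER_AND_ORDER_BY_COUNTRY REFRESH;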
DAG View of Dynamic Tables
Dynamic tables can be chained together to create a Directed Acyclic Graph (DAG), allowing you to use the output of one table as the input for another. This enables you to build complex data transformations in a modular and efficient way.
For example, a new dynamic table (YTD_ORDER_BY_COUNTRY) with a target lag of 1 minute can be created on top of the QUARTERLY_CUSTOMER_AND_ORDER_BY_COUNTRY dynamic table. This allows Snowflake to detect changes in the underlying dynamic and base tables every minute; because the upstream table's target lag is set to DOWNSTREAM, it refreshes whenever this consumer needs fresh data.
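A minimal sketch of that downstream table, assuming the column names produced by the upstream definition above:

CREATE DYNAMIC TABLE REPORT.YTD_ORDER_BY_COUNTRY
  TARGET_LAG = '1 minute'
  WAREHOUSE = SF_US_DE_WH_01
AS
SELECT
    Country_Name,
    Year,
    SUM(Total_Order_Amount) AS YTD_Order_Amount   -- year-to-date roll-up of the quarterly aggregates
FROM REPORT.QUARTERLY_CUSTOMER_AND_ORDER_BY_COUNTRY
GROUP BY 1, 2;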
The Refresh Process Explained
The magic behind dynamic tables lies in the automated refresh process. This process works in two key steps:
- Change Detection: Snowflake analyzes the dynamic table's underlying query and creates a DAG based on its dependencies. To identify incremental changes to the base tables, Snowflake creates a stream in the backend that captures the row ID and the insert/update/delete action. (As this is just three metadata columns, it results in only a small storage cost.)
- Merging Updates: Only the changes are incorporated into the dynamic table, not the entire dataset. This keeps the refresh process efficient and minimizes resource consumption.
When we loaded new records into the orders table for today's order date and the dynamic table refreshed, it added only 24 new records (i.e., aggregated records for 24 countries).
When a source record was deleted for today's order date, only one record in the dynamic table (the current quarter's aggregate) was affected: Snowflake deleted that record and inserted an updated one.
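This behavior can be verified from the refresh history, which reports what each refresh inserted and deleted. A sketch using the information schema table function (exact output columns may vary by Snowflake version):

SELECT name, state, refresh_start_time, statistics
FROM TABLE(
    INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY(
        NAME => 'REPORT.QUARTERLY_CUSTOMER_AND_ORDER_BY_COUNTRY'
    )
)
ORDER BY refresh_start_time DESC;

-- The STATISTICS column is a JSON object that includes, among other details,
-- the number of rows inserted and deleted by each refresh.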
Balancing Freshness and Cost
When setting up a dynamic table, you get to define the ideal "freshness" for your data, also known as the target lag. This tells the system how far behind the base tables the data in the dynamic table can be.
Here is the trade-off:
- High Freshness (Low Target Lag): Get the most recent data, ideal for real-time analytics. This, however, requires more frequent refreshes and consumes more compute resources.
- Lower Freshness (Higher Target Lag): Reduce refresh frequency and save on resources. This might be suitable for scenarios where near real-time data is not critical.
The key is to find the sweet spot between data freshness and cost-effectiveness.
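Because the target lag is just a table property, you can move along this trade-off at any time with a simple ALTER statement; for example, relaxing the 1-minute lag on the downstream table from the DAG example to reduce refresh costs:

ALTER DYNAMIC TABLE REPORT.YTD_ORDER_BY_COUNTRY
  SET TARGET_LAG = '6 hours';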
A dynamic table has the following cost components:
- Compute cost:
  - Usage of the virtual warehouse defined in the table DDL.
  - Cloud services compute to identify changes in the underlying base objects and to determine whether the virtual warehouse needs to be invoked.
- Storage cost: active, Time Travel, and Fail-safe storage; cloning and replication; and the metadata that tracks source table record changes.
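To monitor the compute side of this trade-off, the standard ACCOUNT_USAGE views can show the credits consumed by the warehouse that services the refreshes (note this attributes all of the warehouse's usage, not only dynamic table refreshes):

SELECT DATE_TRUNC('day', start_time) AS usage_day,
       SUM(credits_used)             AS credits
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE warehouse_name = 'SF_US_DE_WH_01'   -- warehouse from the example DDL
GROUP BY 1
ORDER BY 1 DESC;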
Use Cases
- The analyst wants to process data sourced from various tables in Snowflake without depending on the Data Engineering team to build ETL pipelines.
- Continuously merge CDC data into target tables for batch and streaming.
- Generating Slowly Changing Dimension (SCD) tables without depending on stored procedures and ETL jobs.
- Continuous data aggregation and transformation from multiple source tables.
- Building functional subsets by filtering data from one table into separate tables at low latency (see the sketch after this list).
- Materializing the results of complex views (SQL on top of one or more tables) for improved performance.
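As a sketch of the filtering use case, reusing the TPC-H-style sample schema from the earlier example (the table name and country value are illustrative):

-- Low-latency functional subset: US orders only
CREATE DYNAMIC TABLE REPORT.US_ORDERS
  TARGET_LAG = '1 minute'
  WAREHOUSE = SF_US_DE_WH_01
AS
SELECT ord.O_ORDERKEY, ord.O_CUSTKEY, ord.O_ORDERDATE, ord.O_TOTALPRICE
FROM STN.STN_ORDERS ord
INNER JOIN STN.STN_CUSTOMER cus ON cus.C_CUSTKEY   = ord.O_CUSTKEY
INNER JOIN STN.STN_NATION nat   ON nat.N_NATIONKEY = cus.C_NATIONKEY
WHERE nat.N_NAME = 'UNITED STATES';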
Advantages of Dynamic Tables
- Simplified data engineering: Reduces complexity and manual effort compared to traditional methods like streams and tasks.
- Automated data freshness: Maintains desired data recency without manual intervention. Dynamic Tables adapt to changing latency needs. Easily adjust data freshness from hours to seconds with a simple ALTER statement.
- Auto-incremental updates: Snowflake leverages streams to track source table changes and performs incremental updates to the dynamic table.
- Efficient query performance: Pre-computes complex queries for faster access and analysis.
- Flexible data pipelines: Supports both batch and streaming data processing within a unified framework.
- Reduced development time: Streamlines the ETL process and accelerates time-to-market for insights.
- Improved data quality: Ensures data consistency and accuracy through automated updates.
- Cost-effective: Optimizes resource utilization by automating data management tasks.
How to Choose Between Dynamic Tables, Streams & Tasks, and Materialized Views
As a rough rule of thumb: use a dynamic table when the transformation can be expressed declaratively in SQL and you want Snowflake to manage refresh orchestration and dependencies; use streams and tasks when you need imperative control, such as invoking stored procedures, UDFs, or external services within the pipeline; and use a materialized view when you need to accelerate queries over a single table with always-current results, keeping in mind that materialized views do not support joins.
Dynamic Iceberg Tables: Blending Flexibility and Performance
Dynamic Iceberg tables combine the best of both worlds: the flexibility of dynamic tables and the efficiency of Snowflake-managed Iceberg tables. They allow you to store massive datasets in cloud storage like S3 or Azure Blob Storage while enjoying the power of Snowflake for data transformation and analysis. These tables automatically update, optimize performance, and handle complex data pipelines with ease.
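A minimal sketch of the DDL, assuming an external volume named my_ext_volume has already been configured over your cloud storage (names and locations below are placeholders):

CREATE DYNAMIC ICEBERG TABLE REPORT.QUARTERLY_ORDER_BY_COUNTRY_ICE
  TARGET_LAG = '1 hour'
  WAREHOUSE = SF_US_DE_WH_01
  EXTERNAL_VOLUME = 'my_ext_volume'              -- placeholder volume over S3/Azure Blob
  CATALOG = 'SNOWFLAKE'                          -- Snowflake-managed Iceberg catalog
  BASE_LOCATION = 'quarterly_order_by_country'   -- folder within the external volume
AS
SELECT * FROM REPORT.QUARTERLY_CUSTOMER_AND_ORDER_BY_COUNTRY;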
Snowflake Dynamic Tables offer a powerful and efficient way to build and manage data pipelines. By automating incremental updates, optimizing performance, and providing robust observability, Dynamic Tables empower data teams to focus on deriving insights rather than managing infrastructure. Embrace the simplicity and scalability of Dynamic Tables to transform your data operations and drive business value.
Author: Asish Patro, Senior Manager, Data Engineering