Databricks Serverless SQL is a robust, fully managed platform capable of powering next-generation analytics and data applications. It provides a unified platform, especially for collaborating with data analysts and data scientists.
Before delving into the serverless architecture, let’s first understand the control plane and compute plane, which form the operational foundation of Databricks.
- Control Plane: This encompasses all the backend services managed by Databricks within its Azure subscription. It also stores Databricks SQL queries, notebooks, and various other workspace configurations, all of which are encrypted at rest to enhance security.
- Compute Plane: This contains the clusters of compute resources available to process data.
Serverless Compute Compared with Classic Compute
The above image highlights the workings of the Serverless and Classic compute planes.
| Features | Serverless compute plane | Classic compute plane |
| --- | --- | --- |
| Location of compute resources | The compute layer stays within the Databricks account | The compute layer stays within the customer’s Azure subscription |
| Public IP address for nodes | No | No if secure cluster connectivity is enabled; otherwise each VM has a public IP |
| Unallocated VM costs are covered by | Databricks | The customer pays Azure |
| Data protection | Operates within the network boundary of the workspace, bolstered by extra layers of security to ensure isolation | Natural isolation, as compute stays in the customer’s environment |
What are the primary hurdles we often encounter with the classic compute plane?
- Reduced Productivity: Analysts, data engineers, and data scientists lose productive time waiting for clusters to start up.
- Always-On: Customers often keep the warehouse active on a fixed cluster to avoid start-up delays. However, this approach is inefficient and costly.
- Limited Scalability: Fixed cluster sizes become a bottleneck as data volume and processing needs grow. Dynamic scaling can be challenging and may lead to over- or under-provisioning.
- Infrastructure Overhead: Clusters require platform administrators to provision, configure, and manage them, a process that can be time-consuming and error-prone given diverse user requirements and fluctuating workloads.
- Added Cost: The warehouse remains operational with fixed clusters even during low activity or no usage, leading to cost inefficiency.
- Resource Contention: Sharing the same cluster among multiple users or applications can lead to resource contention, resulting in poor performance or longer query execution time.
Why should we consider transitioning to serverless compute?
- Increased Productivity: Serverless SQL provides on-demand instant compute to perform SQL workloads. This approach is efficient as there is no wait time for cluster start-up.
- Improved Performance: Serverless compute incorporates optimization techniques like query caching, automatic indexing, and adaptive query execution, ultimately reducing latency and improving productivity.
- Auto Scaling: In serverless, scaling is automatic based on the workload. Customers need not be concerned about over- or under-provisioning compute resources.
- Managed Infrastructure: Databricks fully manages the infrastructure, handling tasks such as provisioning, scaling clusters, performance monitoring, and ensuring high availability.
- Reduced Cost: Customers pay only for the compute resources their workload actually uses. This approach is cost-efficient, as there is no need to keep an idle cluster running.
What are the essential requirements for implementing Serverless SQL?
- The Databricks workspace must be on the Premium plan.
- The workspace must be in a supported region (see Azure Databricks regions).
- If Azure Storage data is used as a source and is protected by an Azure Storage firewall, additional configuration is required to permit access from serverless compute nodes.
Guidelines to Implement Serverless SQL
Step 1: Create a Serverless warehouse
- Log in to the workspace and click SQL Warehouses in the sidebar.
- Click Create SQL Warehouse and complete the following details:
- Name
- Cluster Size
- Auto Stop Time Limit
- Scaling Option
- Type: Serverless
- Advanced options, such as tags and Unity Catalog (if enabled in the workspace), are optional.
- Provide access to the serverless SQL warehouse:
- Search for the AD group or user and add it
- Select the required permission level (Can Use or Can Manage)
- A successfully created serverless SQL warehouse will look as follows:
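For teams that prefer automation over the UI, the same warehouse can be created through the Databricks SQL Warehouses REST API. The sketch below only builds the request body; the warehouse name, tag values, and scaling numbers are illustrative assumptions, and you would POST the payload to `https://<workspace-url>/api/2.0/sql/warehouses` with a bearer token.

```python
import json

def build_warehouse_payload(name, cluster_size="Small",
                            auto_stop_mins=10,
                            min_clusters=1, max_clusters=2):
    """Mirror the UI fields from Step 1: name, cluster size,
    auto stop time limit, scaling range, and type Serverless."""
    return {
        "name": name,
        "cluster_size": cluster_size,        # e.g. 2X-Small up to 4X-Large
        "auto_stop_mins": auto_stop_mins,    # Auto Stop time limit
        "min_num_clusters": min_clusters,    # Scaling option (min)
        "max_num_clusters": max_clusters,    # Scaling option (max)
        "enable_serverless_compute": True,   # Type: Serverless
        "warehouse_type": "PRO",             # serverless requires the PRO type
        # Optional advanced details, as in the UI (hypothetical tag):
        "tags": {"custom_tags": [{"key": "team", "value": "analytics"}]},
    }

payload = build_warehouse_payload("analytics-serverless")
print(json.dumps(payload, indent=2))
```

Because the payload is plain JSON, the same definition can be checked into source control and applied consistently across dev and prod workspaces.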
Step 2: Use the Serverless option for data analysis
- Click SQL Editor in the sidebar.
- In the New query window, select the serverless compute (users may see other options as well, based on their permissions).
- Analysts can start running their queries, and access is instantaneous.
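Queries can also be submitted to the serverless warehouse programmatically through the SQL Statement Execution API. This sketch builds the request body for `POST /api/2.0/sql/statements`; the warehouse ID, table name, and workspace URL are placeholder assumptions.

```python
import json

def build_statement_request(sql, warehouse_id, wait_timeout="30s"):
    """Request body for the SQL Statement Execution API:
    run a statement on a given warehouse, waiting synchronously
    up to wait_timeout for the result."""
    return {
        "statement": sql,
        "warehouse_id": warehouse_id,
        "wait_timeout": wait_timeout,
    }

req = build_statement_request(
    "SELECT order_id, amount FROM sales.orders LIMIT 10",  # hypothetical table
    warehouse_id="1234567890abcdef",                       # hypothetical ID
)
print(json.dumps(req, indent=2))
```

Since the warehouse is serverless, the first statement does not pay a cluster start-up penalty, which is exactly the productivity gain described above.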
Step 3: Use Serverless SQL in Databricks Workflows
- Click Workflows in the sidebar.
- Click Create Job and complete the following details:
- Task Name
- Type (Notebook, SQL, Python script, etc.)
- Options vary based on the selected type
- To start with, I have selected SQL as the type
- SQL task – (Query, Dashboard, Alert, File)
- SQL query
- SQL warehouse
- Other optional settings include Parameters, Notifications, Retries, and Duration threshold.
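The job configured above can equally be expressed as a Jobs API (2.1) payload with a `sql_task`, which is useful for deploying workflows as code. The job name, query ID, and warehouse ID below are hypothetical placeholders.

```python
import json

def build_sql_job(job_name, task_key, query_id, warehouse_id):
    """Jobs API 2.1 payload: one SQL task (of the Query flavor)
    running on a serverless SQL warehouse."""
    return {
        "name": job_name,
        "tasks": [
            {
                "task_key": task_key,                 # Task Name in the UI
                "sql_task": {                         # Type: SQL
                    "query": {"query_id": query_id},  # SQL task: Query
                    "warehouse_id": warehouse_id,     # the serverless warehouse
                },
                "max_retries": 2,          # optional: Retries
                "timeout_seconds": 3600,   # optional: Duration threshold
            }
        ],
    }

job = build_sql_job("daily-sales-refresh", "refresh_orders",
                    query_id="q-123", warehouse_id="1234567890abcdef")
print(json.dumps(job, indent=2))
```

The `sql_task` block accepts one of Query, Dashboard, Alert, or File, matching the SQL task options listed in the UI.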
Serverless SQL does come with the following limitations:
- Policies: There is no support for establishing cluster policies or spot instance policies.
- Metastores: Using external hive metastores is not supported.
- Public IP: Serverless SQL has no public IP address and does not require or use secure cluster connectivity.
- Disk Storage: Disks in serverless SQL are short-lived and do not require customer-managed keys.
Conclusion
Databricks Serverless SQL provides an efficient, scalable, cost-effective, and reliable platform to perform data analysis and handle analytical workloads. By eliminating the need to manage infrastructure coupled with its robust security and governance features, it stands out as a compelling choice for data teams. The serverless plane outperforms the classic plane in various aspects, making it a desirable option for modern analytics.
Author
Srinivasan Shanmugasundaram
Senior Manager, Data Engineering, Tredence Inc.