UC Managed Tables in Databricks: Performance, Governance & Secure Sharing

Date: 04/22/2025

Discover how UC Managed Tables in Databricks enhance performance, enable secure data sharing with Delta Sharing, and simplify governance. Learn key features, integration tips, and migration steps.

Shakti Prasad Mohapatra
Senior Solutions Architect

Table of contents


  • What is UC Managed Table?
  • Key Benefits
  • Deep Dive into Features
  • Open API Integration Example (Product Pricing Comparison Use Case)
  • Use Case: Real-Time Product Pricing with External API Integration
  • Step-by-Step Implementation in Databricks
  • Performance Optimization (Predictive, Statistics, Liquid Clustering)
  • Migration Path & Myths Debunked
  • Do you want to know how to migrate external tables to UC Managed Tables to get all the native benefits?


UC Managed Tables are Databricks’ flagship table format. At the recently concluded Technical Kickoff (TKO), Databricks revealed some of the coolest features of UC Managed Tables along with the future roadmap. Below are a few highlights.

What is UC Managed Table?

Databricks supports different types of tables as highlighted below:

Foreign tables:

  • These are tables created in foreign catalogs such as Hive Metastore or AWS Glue and federated with Databricks.
  • Databricks cannot write to these tables; they are read-only from Databricks, although third-party clients can read and write them.
  • They do not get native Databricks benefits such as automated performance optimization or faster ETL/ELT.
  • The main benefit is that they can be governed within UC and are discoverable and accessible in one place.

External tables:

  • These are created within Databricks with an explicitly set location.
  • The file system is the catalog’s source of truth, so anyone with direct access to the file system can bypass governance.
  • Databricks can read data from and write data into the table.
  • But there is no automated performance optimization.

Managed tables:

  • Created within Databricks without specifying any location (see the sketch after this list).
  • The data is stored in UC-managed storage locations inside customer-managed cloud storage.
  • Reads and writes are possible both from Databricks and from third-party clients.
  • The biggest benefit is automated performance optimization powered by the Databricks Data Intelligence Platform.
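
The practical difference shows up at creation time. The sketch below, which uses placeholder catalog, schema, column, and path names, contrasts the two: an external table pins its data to an explicit LOCATION, while a managed table omits the location and lets UC place the data within customer-managed cloud storage.

# External table: an explicit LOCATION ties the data to a path you manage directly
spark.sql("""
    CREATE TABLE my_catalog.my_schema.sales_external (id INT, amount DOUBLE)
    USING DELTA
    LOCATION 'abfss://data@mystorageaccount.dfs.core.windows.net/sales'
""")

# Managed table: no LOCATION clause; UC manages the path inside customer-managed storage
spark.sql("""
    CREATE TABLE my_catalog.my_schema.sales_managed (id INT, amount DOUBLE)
    USING DELTA
""")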

Key Benefits

  • UC governs access for all clients.
  • The storage location is managed by UC but resides within customer-managed cloud storage.
  • Allows reads and writes from third-party clients via open APIs, against a single copy of data within Unity Catalog.
  • Automated optimization powered by the Databricks Data Intelligence Platform.
  • Continuous improvements on the Databricks roadmap.

Deep Dive into Features

  • UC Managed tables are stored in open formats such as Parquet, Delta, and Iceberg, with metadata on top.
  • They can be accessed by different clients via open catalog APIs (UC REST APIs, Iceberg REST Catalog APIs).
  • They can be shared across organizations using Delta Sharing, with a single copy maintained within UC.

Delta Sharing in Action

Ref: What is Delta Sharing? — Azure Databricks | Microsoft Learn

With Delta Sharing, managed table data can be shared with third-party clients over the secure and reliable Delta Sharing protocol, using the single copy that is managed and governed by UC.

As shown in the diagram, the objects that can be shared are:

  • Tables
  • AI Models
  • Files
  • Functions
  • Views

Let’s now dive into a simple example of how to create and consume a Delta share securely.

Step 1: Enable Delta Sharing (If Not Already Enabled)

In Databricks, enable Delta Sharing in the workspace settings.

Step 2: Create a Delta Share

CREATE SHARE my_data_share;

Step 3: Add a Delta Table to the Share

ALTER SHARE my_data_share
ADD TABLE my_catalog.my_schema.my_table;

Step 4: Grant Access to a Recipient

CREATE RECIPIENT my_client;
GRANT SELECT ON SHARE my_data_share TO RECIPIENT my_client;

For open sharing, Databricks generates an activation link for the recipient, who downloads the credential (profile) file used in the consumer example below.

Access Delta Share (Python) — Consumer Side — Data Receiver

Install Delta Sharing Client

pip install delta-sharing

Read Data from Delta Share (using Pyspark)

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("DeltaSharingConsumer") \
    .getOrCreate()

# Path to the credential (profile) file downloaded from the provider's activation link
profile_file = "/path/to/config.share"

df = spark.read.format("deltaSharing") \
    .load(profile_file + "#share_name.schema_name.table_name")

df.show()
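
For consumers that do not run Spark, the delta-sharing client installed above can load the shared table straight into pandas. This is a minimal sketch; the credential file path and the share, schema, and table names are placeholders.

import delta_sharing

# Credential (profile) file downloaded from the provider's activation link
profile_file = "/path/to/config.share"

# Table URL format: <profile-file>#<share-name>.<schema-name>.<table-name>
table_url = profile_file + "#share_name.schema_name.table_name"

# Load the shared table directly into a pandas DataFrame
pdf = delta_sharing.load_as_pandas(table_url)
print(pdf.head())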

Open API Integration Example (Product Pricing Comparison Use Case)

  • Delta Lake tables can also be accessed via Unity Catalog REST APIs (a minimal example follows this list).
  • Iceberg tables can be accessed using Iceberg REST Catalog APIs.
  • Both work against a single copy of data within UC. Clients authenticate themselves to UC and use short-lived storage credentials, issued just for them, to access only the table being shared.
  • UC Managed Tables also allow third-party writes, a one-of-a-kind capability.
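
As a quick illustration of the open catalog APIs mentioned above, the sketch below reads a table’s metadata through the Unity Catalog REST API. The workspace URL, token secret, and table name are placeholder assumptions.

import requests

WORKSPACE_URL = "https://<your-workspace-host>"
TOKEN = dbutils.secrets.get(scope="openapi-secrets", key="databricks-token")  # hypothetical secret
TABLE_FULL_NAME = "my_catalog.my_schema.my_table"

# Fetch table metadata (table type, columns, storage location) from Unity Catalog
resp = requests.get(
    f"{WORKSPACE_URL}/api/2.1/unity-catalog/tables/{TABLE_FULL_NAME}",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
table_info = resp.json()
print(table_info["table_type"], [c["name"] for c in table_info["columns"]])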

Here are the steps for accessing external data via a generic OpenAPI (REST API)

Install Required Library (if not available)

%pip install requests

Use Case: Real-Time Product Pricing with External API Integration

Business Objective:

An e-commerce company wants to:

  • Pull real-time competitor pricing data from a third-party pricing intelligence API.
  • Combine it with their internal product catalog.
  • Recommend competitive pricing using business rules or ML models.
  • Push results to dashboards or downstream systems.

Generic Open API request example in Databricks (mock pricing API):

GET https://api.pricingintel.com/v1/competitor-price?product_id=12345

Returns:

{
  "product_id": "12345",
  "competitor_prices": [
    {"store": "Amazon", "price": 99.99},
    {"store": "Walmart", "price": 97.49},
    {"store": "Target", "price": 98.75}
  ]
}
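
The pricing endpoint above is a mock, so the helper below (used in the steps that follow) is a hedged sketch of how it could be wrapped with the requests library; the function name fetch_competitor_prices and the absence of authentication are assumptions for illustration.

import requests

PRICING_API_URL = "https://api.pricingintel.com/v1/competitor-price"

def fetch_competitor_prices(product_id):
    """Call the (mock) external pricing API for one product and return the parsed JSON."""
    response = requests.get(PRICING_API_URL, params={"product_id": product_id}, timeout=10)
    response.raise_for_status()
    return response.json()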

Step-by-Step Implementation in Databricks

Step 1: Simulate the Internal Product Catalog

from pyspark.sql import SparkSession

# Sample internal product catalog
products = [
    {"product_id": "12345", "product_name": "Wireless Mouse", "our_price": 100.00},
    {"product_id": "67890", "product_name": "Bluetooth Headphones", "our_price": 150.00},
]

df_products = spark.createDataFrame(products)
df_products.display()

Step 2: Call External Pricing API for Each Product

import pandas as pd

# Convert Spark to Pandas for row-wise API calls
product_list = df_products.toPandas()

# Call the external pricing API once per product and keep the raw responses
external_prices = {}
for _, row in product_list.iterrows():
    external_prices[row["product_id"]] = fetch_competitor_prices(row["product_id"])

Step 3: Combine External and Internal Data:

# Merge the internal catalog fields with each competitor price returned by the API
pricing_records = []
for _, row in product_list.iterrows():
    ext_data = external_prices[row["product_id"]]
    for competitor in ext_data["competitor_prices"]:
        pricing_records.append({
            "product_id": row["product_id"],
            "product_name": row["product_name"],
            "our_price": row["our_price"],
            "competitor": competitor["store"],
            "competitor_price": competitor["price"]
        })

# Create a Spark DataFrame from the combined results
df_pricing = spark.createDataFrame(pd.DataFrame(pricing_records))
df_pricing.display()

Step 4: Analyze & Recommend Pricing Strategy

from pyspark.sql.functions import avg, col

# Calculate the average competitor price and recommend undercutting it by 50 cents
df_avg_price = df_pricing.groupBy("product_id", "product_name", "our_price") \
    .agg(avg("competitor_price").alias("avg_competitor_price")) \
    .withColumn("recommended_price", col("avg_competitor_price") - 0.50)

df_avg_price.display()

Step 5: Push Results to Downstream Dashboard or Table

# Save results as Delta Table
df_avg_price.write.format("delta").mode("overwrite").saveAsTable("recommended_product_pricing")

Secure API Key Using Databricks Secrets:

API_KEY = dbutils.secrets.get(scope="openapi-secrets", key="pricing-api-key")
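
The retrieved key can then be attached to the API call. How the key is passed (header versus query parameter) depends on the provider, so the Bearer header below is an assumption.

import requests

# Assumed: the pricing API expects the key as a Bearer token in the Authorization header
response = requests.get(
    "https://api.pricingintel.com/v1/competitor-price",
    params={"product_id": "12345"},
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=10,
)
print(response.status_code)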

Performance Optimization (Predictive, Statistics, Liquid Clustering)

Predictive Optimization:

Predictive optimization continuously learns in the background using the following inputs:

  • Telemetry: information about the table, the most popular queries, the columns used in those queries, and the common predicates.
  • Once telemetry is collected, model evaluation takes place to decide which optimizations will improve performance.
  • Once model evaluation is done, the optimizations are run on serverless infrastructure.

Below is a pictorial representation of the entire cycle of how predictive optimization works.

Supported optimizations (GA)

  • Compaction: merges small files into larger ones to reduce scanning.
  • Clustering of newly arriving data.
  • VACUUM: clears unneeded files from older table versions.
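
Predictive optimization is enabled at the account level and can also be controlled per catalog or schema. A minimal sketch, with placeholder names:

# Enable predictive optimization for a whole catalog (name is a placeholder)
spark.sql("ALTER CATALOG my_catalog ENABLE PREDICTIVE OPTIMIZATION")

# Or control it at the schema level; DISABLE and INHERIT are also supported
spark.sql("ALTER SCHEMA my_catalog.my_schema ENABLE PREDICTIVE OPTIMIZATION")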

Automated Statistics (public preview)

Without automated statistics, the challenges are:

  • Users must manually run ANALYZE to collect query-optimization statistics.
  • Users also need to ensure that columns requiring data skipping are among the first 32 columns.

With automatic statistics:

  • Query-optimization statistics are collected and maintained automatically.
  • With automated, intelligent collection of statistics on the required columns, clients should see around a 33% improvement in query execution.
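
For comparison, the manual approach that automatic statistics replaces looks like the following (table name is a placeholder); with automatic statistics there is no need to schedule this yourself.

# Manually compute query-optimization statistics for a table
spark.sql("ANALYZE TABLE my_catalog.my_schema.my_table COMPUTE STATISTICS FOR ALL COLUMNS")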

Liquid Clustering

Delta Lake liquid clustering replaces table partitioning and ZORDER, simplifying data layout decisions and optimizing query performance.

  • Liquid clustering provides the flexibility to redefine clustering keys without rewriting existing data, allowing the data layout to evolve alongside analytic needs over time.
  • Liquid clustering applies to both streaming tables and materialized views.
  • It is widely adopted, with 3,000+ customers using liquid clustering monthly.
  • 170+ PB of data is written into liquid tables monthly.
  • The largest active liquid tables today exceed 2 PB.
  • It is available for both managed and external tables.
  • It is much easier to use with UC Managed Tables.

With external tables, the columns to cluster by must be specified explicitly.

Ex: CREATE TABLE TBL1 (date DATE, product_id STRING, price DOUBLE) CLUSTER BY (date, product_id);

Whereas for UC Managed Tables, automatic liquid clustering is available:

CREATE TABLE TBL1 (date DATE, product_id STRING, price DOUBLE) CLUSTER BY AUTO;

This will take care of:

  • Clustering key selection
  • Clustering new data on write
  • Background clustering
  • No need to select which columns to cluster by, as this is handled automatically based on access patterns
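
Automatic liquid clustering can also be switched on for an existing UC managed table, and a manual OPTIMIZE run clusters data that is already written. A minimal sketch, with a placeholder table name:

# Switch an existing UC managed table to automatic liquid clustering
spark.sql("ALTER TABLE my_catalog.my_schema.tbl1 CLUSTER BY AUTO")

# Optionally trigger clustering of existing data right away
spark.sql("OPTIMIZE my_catalog.my_schema.tbl1")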

Low Latency Queries (GA)

  • Caches Delta metadata for UC managed tables
  • Reduces expensive cloud I/O

Migration Path & Myths Debunked

Myth

Customer objection: External tables are more open since I can point directly to the storage path, while managed tables are more closed because their paths are GUIDs.

Fact

  • Path-based access is insecure, since access is granted directly to the path, bypassing governance.
  • It is also error prone: if two different tables use the same path and are loaded at the same time, the data can be corrupted.
  • On the other hand, UC Managed Tables are completely open in terms of secure access via Delta Sharing, open APIs, and more.
  • The path for a UC Managed Table is managed within customer storage accounts, with a single copy of data governed by UC.

Do you want to know how to migrate external tables to UC Managed Tables to get all the native benefits?

Databricks has a new capability in Private Preview that allows seamless migration from external to managed tables with minimal to zero downtime.

Run a single ALTER command as below:

ALTER TABLE catalog.schema.external_table SET MANAGED

Characteristics:

  • No data movement or duplication
  • Path-based code from DBR continues to work; it is simply redirected automatically to UC metadata
  • Minimal downtime (<1 min) while the Delta log is created in the UC-managed location (see the verification sketch below)
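
Once the conversion has run, the table type can be verified directly; DESCRIBE TABLE EXTENDED should now report the table as MANAGED.

# Verify the table type after conversion; the "Type" row should show MANAGED
spark.sql("DESCRIBE TABLE EXTENDED catalog.schema.external_table").display()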