What is Unity Catalog and Why is it a Game Changer?

Data Governance

Date : 09/19/2023


Delve into Unity Catalog's role in overcoming data governance challenges. Discover its security model, catalog organization, and roles for asset management.

Maulik Divakar Dixit

Director, Data Engineering,
Databricks Champion
Databricks MVP

About Unity Catalog

Before Unity Catalog, each Databricks workspace had its own Hive metastore. Metadata was stored there and was accessible only from within that workspace, so the objects defined in a workspace were confined to it.

One can argue that we could create external tables in each Databricks workspace that point to common data in the data lake. Although this approach works, you still need to know exactly where the data lives in the data lake (the exact path) and must create an external table in the local metastore to make it available in the workspace. Furthermore, ACLs defined on objects at the workspace level apply only to that workspace's objects and cannot be managed centrally.

Let’s understand this with a practical example. An organization typically has several functions and divisions; take Marketing and Supply Chain. Each has its own workspace so that its teams can work independently, keeping development, deployment, and objects separate between the functions. Each may well have its own data lake account too.

Both functions need internal sales data to derive business-specific KPIs. Suppose the Marketing function has built pipelines to pull the data from the source and cleanse it. The Supply Chain function may not know that such an object exists, and even if it does, it faces the additional complication of creating a mount point to the Marketing data lake and being granted the right level of access through ACLs. This is a challenge.

Databricks Unity Catalog enables the creation of a centralized metastore whose objects are stored once and are accessible from every workspace attached to it.

[Figure: Databricks Unity Catalog]

The way objects are addressed in Unity Catalog is different as well. Before Unity Catalog, data was accessed through a two-level structure: database and object name.

With Unity Catalog, however, there are three levels: catalog, database, and object. The extra level not only makes objects easier to reach but also adds governance capability.
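To make the difference between two-level and three-level names concrete, here is a minimal Python sketch. It assumes that a legacy two-part reference is resolved against a default catalog named `hive_metastore`; the helper names (`qualify`, `to_sql`) and the example object names are hypothetical and exist only for illustration.

```python
from typing import NamedTuple


class ObjectName(NamedTuple):
    catalog: str
    schema: str  # the "database" level in this article's wording
    table: str


def qualify(name: str, default_catalog: str = "hive_metastore") -> ObjectName:
    """Expand a two- or three-part dotted name into catalog.schema.table."""
    parts = name.split(".")
    if len(parts) == 2:
        # Legacy two-level reference: prepend the assumed default catalog.
        return ObjectName(default_catalog, parts[0], parts[1])
    if len(parts) == 3:
        return ObjectName(*parts)
    raise ValueError(f"expected 2 or 3 name parts, got {len(parts)}: {name!r}")


def to_sql(name: ObjectName) -> str:
    """Render the name with backtick quoting for use in a SQL statement."""
    return ".".join(f"`{part}`" for part in name)


legacy = qualify("sales.orders")               # two-level, pre-Unity Catalog style
governed = qualify("marketing.sales.orders")   # three-level, Unity Catalog style
print(to_sql(legacy))
print(to_sql(governed))
```

The point of the sketch is only that the catalog becomes an explicit, leading part of every object reference, which gives governance rules one more level to attach to.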

[Figure: Databricks object hierarchy]

This allows users to create a catalog per division or function and organize objects within it by creating multiple databases under it.
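As a rough illustration of organizing catalogs by function, the sketch below generates the `CREATE CATALOG` and `CREATE SCHEMA` statements for a small layout. The catalog and schema names are made up for the Marketing and Supply Chain example, and the helper `catalog_ddl` is hypothetical; in a workspace the resulting statements would be run through a SQL editor or notebook.

```python
def catalog_ddl(layout: dict[str, list[str]]) -> list[str]:
    """Return DDL statements that create one catalog per function and its schemas."""
    statements = []
    for catalog, schemas in layout.items():
        statements.append(f"CREATE CATALOG IF NOT EXISTS {catalog}")
        for schema in schemas:
            statements.append(f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema}")
    return statements


# Hypothetical layout: one catalog per business function.
layout = {
    "marketing": ["sales", "campaigns"],
    "supply_chain": ["sales", "inventory"],
}

for statement in catalog_ddl(layout):
    print(statement)
```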

Additionally, users who have browse access on all objects (USE CATALOG / USE SCHEMA) can see those objects without having access to the data itself.
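The distinction between browsing and reading can be sketched as two sets of `GRANT` statements. This is an illustration under assumptions: the group name `supply_chain_analysts`, the object names, and the helper `access_grants` are hypothetical, and the browse-only behavior follows the description above rather than a tested configuration.

```python
from typing import Optional


def access_grants(principal: str, catalog: str, schema: str,
                  table: Optional[str] = None) -> list[str]:
    """Build GRANT statements: usage privileges only, or usage plus SELECT on a table."""
    statements = [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{principal}`",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO `{principal}`",
    ]
    if table is not None:
        # Reading the data additionally requires SELECT on the table itself.
        statements.append(
            f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO `{principal}`"
        )
    return statements


browse_only = access_grants("supply_chain_analysts", "marketing", "sales")
read_access = access_grants("supply_chain_analysts", "marketing", "sales", "orders")
for statement in read_access:
    print(statement)
```

Keeping the usage grants separate from `SELECT` is what lets a team discover that a dataset exists before anyone decides whether it should be allowed to read it.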

Great. So now we know what Unity Catalog can do, but how do we attach our workspaces (new and existing) to it? Databricks has delivered a new way to centralize users, groups, and service principals: workspaces are attached to Unity Catalog, and access to each workspace is assigned through the account console.

Now This Is Significant for Two Reasons

You can have all your users, groups, and service principals synced to the centralized account console through a SCIM connector.
Unity Catalog brings governance to the workspaces attached to the metastore: users, groups, and service principals are assigned workspace access from the account console. Previously this had to be managed per workspace, which created significant administrative overhead.
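For readers unfamiliar with SCIM, the sketch below shows roughly what a user record looks like when an identity provider pushes it to a SCIM endpoint. It uses only the standard SCIM 2.0 core User schema; the helper `scim_user` and the example user are hypothetical, and in practice the identity provider's connector builds and sends these records rather than hand-written code.

```python
import json

# Schema URN defined by the SCIM 2.0 core specification for user resources.
SCIM_USER_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:User"


def scim_user(user_name: str, display_name: str, active: bool = True) -> dict:
    """Build a minimal SCIM 2.0 user resource as a plain dictionary."""
    return {
        "schemas": [SCIM_USER_SCHEMA],
        "userName": user_name,
        "displayName": display_name,
        "emails": [{"value": user_name, "primary": True}],
        "active": active,
    }


# Hypothetical user, for illustration only.
payload = scim_user("analyst@example.com", "Example Analyst")
print(json.dumps(payload, indent=2))
```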

Summarizing Unity Catalog

A central metastore to store all Databricks objects
The ability to attach multiple workspaces to the central metastore
A three-level hierarchy of objects, with access that can be granted at any level of the hierarchy
A centralized account console to sync users, groups, and service principals from an identity management solution
The ability to attach workspaces to Unity Catalog and assign users, groups, and service principals from the central account console
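Granting access "at any level of the hierarchy" can be pictured with a small model in which a privilege granted on a catalog or database also covers everything beneath it. This is a deliberately simplified sketch: the `has_privilege` helper, the group names, and the object names are hypothetical, and it ignores the usage privileges on parent containers that real access also depends on.

```python
# A grant is (principal, privilege, securable path); the path has one to three
# parts: (catalog,), (catalog, schema), or (catalog, schema, table).
Grant = tuple[str, str, tuple[str, ...]]


def has_privilege(grants: set[Grant], principal: str, privilege: str,
                  path: tuple[str, ...]) -> bool:
    """Return True if the privilege was granted on the object or any of its parents."""
    for depth in range(1, len(path) + 1):
        if (principal, privilege, path[:depth]) in grants:
            return True
    return False


grants: set[Grant] = {
    ("analysts", "SELECT", ("marketing",)),                   # whole catalog
    ("planners", "SELECT", ("marketing", "sales", "orders")), # one table only
}

orders = ("marketing", "sales", "orders")
returns = ("marketing", "sales", "returns")
print(has_privilege(grants, "analysts", "SELECT", returns))
print(has_privilege(grants, "planners", "SELECT", orders))
print(has_privilege(grants, "planners", "SELECT", returns))
```

In this model the catalog-level grant for `analysts` reaches both tables, while the table-level grant for `planners` reaches only `orders`.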

The Above Features Enable the Following Benefits

Centralized governance for data
Built-in data search and discovery
Fine-grained access control of data at catalog, database and object level
Automated lineage across objects, notebooks, and workflows

Now that you know what Unity Catalog is and what its benefits are, it is time to dive deep into setting it up.

So stay tuned for the next chapter to learn how we can organize objects in Unity Catalog.
