As data requirements have become more complex, centralized systems such as data warehouses and data lakes, which have served us well over the years, have been lacking in many aspects.
Demand for an advanced system—one that blends the best of data warehouses and lakes while making the process more decentralized, flexible, and agile—has been rising globally.
Among the options, the adoption of data mesh is growing, not as a replacement for its predecessors but as an evolution of their strengths. The global data mesh market was valued at USD 1.2 billion in 2023 and is projected to expand at a CAGR of 16.4% from 2023 to 2028 (Source: MarketsandMarkets).
Before we discuss how data mesh benefits organizations, let us understand what data mesh is.
What is Data Mesh and How It Differs from Data Fabric
Data mesh refers to a decentralized data architecture that aligns data management with business departments, ensuring high-quality, well-documented, and easily accessible data across the organization. The concept of data as a product, self-serviceability, and continuous adaptability is central to the theme of data mesh.
Data mesh facilitates the decentralization of data ownership. Instead of one central team managing everything, several departments or micro-teams assume responsibility and ownership for their data.
Netflix, a global streaming company, has adopted such an approach, allowing its data engineering and analytics teams to control data streams. This enables Netflix to obtain insights more quickly without requiring approvals from various departments. (Source: Netflix)
Data mesh enables the elimination of data silos. Data is spread across various systems in many organizations, making integration and sharing difficult. Data mesh ensures that rules and standards are shared to enable collaboration while allowing teams to maintain control over their data.
Zalando, a European e-commerce company with vast customer and product data sets, found this approach particularly useful. By decentralizing ownership, the company improved not just its efficiency but also the accuracy and usability of analytics (Source: BARC Research).
Scalability is perhaps the most important argument in favor of data mesh. As organizations grow, so do their data challenges. Traditional architecture struggles to keep up, leading to delays and inefficiencies. Data mesh is built for scalability, enabling seamless growth by empowering teams to own their data and providing a framework for collaboration.
JP Morgan Chase benefited from adopting the data mesh framework by decentralizing ownership and enabling different departments to manage their own data products. This approach helped the company reduce bottlenecks and improve data quality, enabling faster data-driven decision-making (Source: JP Morgan Chase).
Zhamak Dehghani, Director of Emerging Technologies at ThoughtWorks, introduced data mesh in 2019 to address the limitations of traditional centralized data architecture. The data mesh model is based on the core concepts of department-oriented ownership, data as a product, self-service infrastructure, and federated governance and focuses on decentralization and empowering department teams (Source: ThoughtWorks).
Why Data Mesh
Why data mesh? Its creator, Zhamak Dehghani, recognized that data lakes and other centralized models, despite their widespread adoption, struggled to serve the needs of individual domains. Data mesh enables more scalable and flexible data management by decentralizing ownership and empowering domain experts.
Data lakes also suffer from data quality and discoverability issues. Moreover, domain experts were excluded from the data lake framework and had little say in helping end-users make sense of the data. Data mesh largely, if not entirely, fills this gap.
Data Mesh vs. Data Fabric
If you ask any data engineer about the latest industry buzz, data mesh is likely to come up. Dig deeper, and you will hear another term being tossed around: data fabric.
Data fabric focuses primarily on integration. It enables data integration, governance, and orchestration through an automated virtual layer that spans the entire ecosystem. Data fabric differs from data mesh in one key aspect: it does not offer decentralization of ownership. Instead, it focuses on creating a unified real-time view of all data.
For example, a global bank using a data fabric could integrate customer data from an on-premises system in one region with cloud data from another region, thus creating a unified source of truth.
Data Mesh vs. Data Fabric: A Quick Summary
| Aspect | Data mesh | Data fabric |
| --- | --- | --- |
| Key focus | Decentralized ownership of data | Integration and unification of data |
| Approach | Domain-oriented and distributed | Centralized, with a virtual automated layer |
| Ownership | Owned by individual domains | Centralized governance without decentralization |
| Primary goal | Empower teams with domain-specific data | Provide a unified, real-time view of data |
| Example use case | Autonomous teams managing their own datasets | Integration of on-premises and cloud data |
Both data mesh and data fabric provide distinct solutions to data management challenges. However, their choice depends on an organization’s specific structure, goals, and scale.
Key Principles of Data Mesh
Data mesh addresses the limitations of traditional centralized systems by encouraging decentralized, department-oriented data management. This paradigm shift rests on four core principles: department ownership, data as a product, self-serve data platforms, and federated computational governance.
Let us look at each in detail:
1. Department Ownership
Department ownership is a key aspect of the data mesh framework. A data mesh creates a decentralized access and ownership system instead of one central team 'owning' the data.
Teams closest to the data—the ones most likely to use it—are often in the best position to understand the subtleties. Allowing teams to manage, process, and serve their own data helps reduce bottlenecks.
Picture a large e-commerce company. Under a data mesh architecture, its product team owns data related to product listings, inventory, and pricing, while the marketing team maintains data on campaigns, customer engagement, and conversion rates.
Each team ensures its data is accurate, up-to-date, and ready for analysis, allowing them to respond quickly to market changes or customer needs.
2. Data as a Product
A key tenet of a data mesh framework is treating data as a product. This requires a shift in mindset, from perceiving data as a byproduct of business operations to regarding it as a valuable resource capable of resolving problems for both the organization and its customers.
Traditionally, data has been treated as a byproduct of business processes, a passive resource generated and stored primarily for compliance or historical purposes. Data as a product reverses this narrative.
Data as a product refers to the sourcing, unifying, and storing of data with the same diligence and attention a company bestows on any customer-facing product. Adopting the data as a product mindset will help create a pan-organization data mesh framework.
Just as product managers focus on refining product features, those implementing a robust data mesh model must take a user-centric approach, where the users are internal stakeholders rather than external customers.
This includes a deep focus on data quality, accuracy, and ease of access and use. You also need to pay attention to documentation and clearly describe every dataset. This may involve developing APIs that return real-time transaction data, with associated metadata indicating timestamps, location, and type of transactions.
Data security and regulatory compliance are also core components of a robust data mesh framework. Data mesh security must adhere to privacy regulations such as GDPR or PCI DSS. Role-based access control ensures that only authorized users can view sensitive transaction details.
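To make this concrete, here is a minimal sketch of what a "data as a product" interface with role-based access control might look like. The roles, field names, and masking rule below are hypothetical illustrations, not part of any specific framework:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical roles permitted to read sensitive transaction amounts.
AUTHORIZED_ROLES = {"risk_analyst", "compliance_officer"}

@dataclass
class TransactionRecord:
    amount: float
    location: str
    tx_type: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def read_transactions(role: str, records: list) -> list:
    """Serve the data product, masking sensitive fields for unauthorized roles."""
    allowed = role in AUTHORIZED_ROLES
    return [
        {
            "amount": r.amount if allowed else None,  # masked for other roles
            "location": r.location,
            "type": r.tx_type,
            "timestamp": r.timestamp,  # metadata is always included
        }
        for r in records
    ]

records = [TransactionRecord(120.0, "Amsterdam", "card_payment")]
print(read_transactions("risk_analyst", records)[0]["amount"])      # 120.0
print(read_transactions("marketing_intern", records)[0]["amount"])  # None
```

The point of the sketch is the product mindset: the dataset ships with metadata (timestamp, location, type) and enforces access policy at the interface, rather than leaving both to downstream consumers.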
3. Self-serve Data Platform
Data mesh uses the principles of department-oriented design to implement a self-serve data platform. A properly built and implemented data mesh model offloads the responsibility of data pipelines and infrastructure management from central teams to individual departments and replaces it with a shared foundation of capabilities.
According to Zhamak Dehghani, one of the originators of the data mesh concept, a key risk of department-oriented design is redundant effort across departments. Without careful planning, each department could duplicate similar tools, processes, and infrastructure, creating inefficiency and inconsistency. Data mesh addresses this by centralizing such department-agnostic infrastructure components in a shared platform.
This centralized platform focuses on data management. It consists of engines to build and run data pipelines, tools for storage and streaming, and shared services that provide monitoring, access control, and compliance.
The individual departments fully own and control their respective data. At the same time, the central platform provides tools that allow departments to build and implement their custom Extract, Transform, Load (ETL) pipelines specific to their needs.
Thus, the architecture strikes a balance between centralization and decentralization. The departments benefit from a standardized foundation but can innovate and customize as required.
In an e-commerce company, the product department might use the shared platform to transform product catalog data for analytics. In contrast, the marketing department creates pipelines for customer engagement data, tailoring workflows to their specific needs.
Data mesh combines shared, department-agnostic infrastructure with department-level autonomy to ensure an organization's scalability, consistency, and agility. This architecture empowers teams to assume data ownership while leveraging centralized capabilities to avoid redundant effort and maintain cohesion.
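One way to picture this balance between shared infrastructure and department autonomy: the platform team provides a generic pipeline runner, and each department supplies only its own transform logic. The transforms and field names below are illustrative assumptions, not taken from any specific product:

```python
from typing import Callable, Iterable, List

def run_pipeline(source: Iterable,
                 transform: Callable,
                 sink: List) -> None:
    """Shared, department-agnostic ETL runner: extract, transform, load."""
    for row in source:               # extract
        sink.append(transform(row))  # transform, then load into the sink

# Product department: normalize catalog prices to integer cents.
def product_transform(row: dict) -> dict:
    return {"sku": row["sku"], "price_cents": int(round(row["price"] * 100))}

# Marketing department: derive an engagement flag from click counts.
def marketing_transform(row: dict) -> dict:
    return {"customer": row["id"], "engaged": row["clicks"] > 5}

catalog_out, marketing_out = [], []
run_pipeline([{"sku": "A1", "price": 19.99}], product_transform, catalog_out)
run_pipeline([{"id": "c42", "clicks": 7}], marketing_transform, marketing_out)
print(catalog_out)    # [{'sku': 'A1', 'price_cents': 1999}]
print(marketing_out)  # [{'customer': 'c42', 'engaged': True}]
```

Both departments reuse the same runner (the centralized capability) while fully owning their transformation logic (the decentralized part).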
4. Federated Computational Governance
The data mesh model gives teams greater autonomy over their own data, which makes governance more important than ever. Teams have the flexibility to manage their data, but they must still follow shared rules and standards: even though departments are autonomous, overarching governance policies ensure compliance with regulations. A robust federated computational model may determine how sensitive information, such as personally identifiable information (PII), should be treated; all departments handling such information must therefore apply the same security measures.
Consider an analogy: the data departments are runners in a relay race. Each runner has individual responsibilities, but the baton must be passed without a break. The race is compromised if a runner drops the baton because of misalignment.
Similarly, in a data mesh, standardized governance ensures smooth data flow across the organization.
By balancing independence with alignment, the data mesh can achieve agility and cohesion, thereby unlocking the full potential of distributed data in organizations.
Benefits of Data Mesh
Organizations are increasingly adopting data mesh owing to its decentralized structure and greater autonomy.
Data mesh provides the following advantages to organizations:
1. Accelerated Time-to-Insights
In the traditional centralized data model, obtaining insights can feel like an endless wait, hindered by the lack of cross-functional collaboration. Teams request data from a central group, wait in a queue, and hope that what they receive aligns with their needs.
Data mesh flips this process, enabling teams to access and use data independently. This accelerates the path from raw data to actionable insights.
ABN AMRO's implementation of a data mesh is a well-known example of how a data mesh model accelerates time to insights. One of the largest banks in the Netherlands, ABN AMRO, adopted the data mesh framework to address the growing challenges of managing its complex and diverse data ecosystem.
Through data mesh principles, ABN AMRO successfully transformed its approach to data management, allowing for quicker and better decision-making throughout the enterprise.
The bank aligned its data architecture with business departments such as retail banking, corporate banking, and risk management. Each department was responsible for owning, managing, and serving its data. Thus, the teams closest to the data held complete accountability for its quality, relevance, and timeliness.
For example, the risk management department developed data products that provided near real-time credit risk assessments. Decision-makers across other departments, such as corporate banking, used this data to make speedy and correct lending decisions.
ABN AMRO has significantly reduced the time it takes to generate insights. Decision-makers no longer have to wait weeks for reports; instead, they access up-to-date data directly from the relevant departments.
This upgrade to the data mesh framework helped the bank during the COVID-19 pandemic. It could quickly analyze customer behavior and roll out targeted relief measures, such as loan deferrals, in record time (Source: The Wall Street Journal).
2. Cost Efficiency
Data mesh architecture can provide cost efficiency to the organization by decentralizing data management and allowing real-time data processing. This has the potential to optimize resources, reduce operational expenses, and enhance scalability.
Data mesh architecture usually relies on cloud platforms. Resources can be scaled up or down in case of an increase or decrease in demand. This cloud-first approach and flexibility optimize resource utilization by helping you avoid overprovisioning.
Saxo Bank, a Denmark-based investment bank specializing in online trading and investment, adopted a data mesh framework to achieve cost efficiency.
Handling large datasets is a common challenge in the financial services industry, and Saxo Bank faced the same issue. In addition to the high cost of data management, the bank’s centralized data infrastructure caused delays in accessing actionable insights.
The bank adopted the data mesh model to address this issue. It transitioned from batch processing to a real-time pipeline using data streams. Under the old approach, batch processing jobs were often scheduled at off-peak hours, resulting in underutilized resources and extended wait times for insights.
Through data mesh-enabled real-time pipelines, the bank could analyze its data as it was being generated, avoiding delays and reducing the need for large, overnight jobs that required vast computing resources. With cloud flexibility, Saxo Bank could easily allocate additional computing power when required (Source: Business Today).
3. Technical Debt Reduction
Reduced technical debt is a natural outcome of decentralizing data. But what is technical debt, and how does it accumulate?
Technical debt refers to the hidden cost of extra work resulting from opting for quick, short-term solutions instead of more effective, long-term ones in software development or system design.
It builds up when code, architecture, or processes are not optimized, causing inefficiencies, maintenance issues, and the need for future refactoring.
The shift in narrative and outlook toward treating data as a product encourages adopting modern development practices such as continuous integration and deployment (CI/CD), which mitigates the risks associated with technical debt.
To fully grasp the depth of such a relationship, let us examine how CI/CD enables all data-as-a-product practices within each stage of the data lifecycle: ingestion, processing, quality validation, and deployment.
CI/CD pipelines automate software development and delivery. Data pipelines are complex systems that ingest, transform, and deliver data for downstream applications; unlike traditional software systems, they depend not only on code but also on the data itself.
CI continuously integrates new changes into a central repository, validating the pipeline's integrity and transformations through automated tests. CD takes this further by automatically deploying changes into production environments once they pass rigorous testing.
When applied to data as a product, CI/CD frameworks ensure that every data product—be it a dataset, a transformation pipeline, or an API—is always deployable, discoverable, and error-free.
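A CI stage for a data product typically runs automated checks like the sketch below against a sample of incoming data before a deployment is allowed. The required fields and rules are hypothetical, not Spotify's or Netflix's actual tests:

```python
def validate_metadata_batch(rows: list) -> list:
    """Return a list of validation errors; an empty list means the batch passes CI."""
    errors = []
    required = {"track_id", "artist", "genre"}
    for i, row in enumerate(rows):
        missing = required - row.keys()
        if missing:
            errors.append(f"row {i}: missing fields {sorted(missing)}")
        elif not row["track_id"]:
            errors.append(f"row {i}: empty track_id")
    return errors

good = [{"track_id": "t1", "artist": "A", "genre": "jazz"}]
bad = [{"track_id": "", "artist": "B", "genre": "rock"}, {"artist": "C"}]
print(validate_metadata_batch(good))  # [] -- batch would be deployed
print(validate_metadata_batch(bad))   # two errors -- deployment blocked
```

In a real pipeline these assertions would run on every commit, so a batch of malformed metadata never reaches production.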
At Spotify, CI/CD of data pipelines involves automated checks on its music metadata ingest pipelines. Running tests on millions of metadata entries about artists helps Spotify ensure the integrity of its music catalog before updating production, preventing errors that could misclassify songs or genres (Spotify Engineering).
Netflix also uses CI/CD to deploy updates to its recommendation model. Each change to the recommendation engine is tested on a small cohort of users before being rolled out to the entire user base. A new data pipeline introducing suboptimal predictions is automatically rolled back to the previous version (Source: ResearchGate).
As data generation volume and complexity grow rapidly, an increasing number of organizations are turning to data mesh as a data management model.
Data Mesh Use Cases
Data mesh is used across a range of industries and end-users. Let us look at some of the specific use cases of data mesh in organizations.
1. Real-time Fraud Detection
Decentralized ownership of transaction data allows fraud detection teams to process and analyze transactions as they happen. Using department-specific pipelines, they can flag unusual patterns in real time without waiting for central processing.
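As a hedged sketch of what "flagging unusual patterns in real time" can mean at the simplest level, the class below compares each new transaction against a rolling average of recent ones. The window size, threshold factor, and rule are illustrative assumptions; production systems use far more sophisticated models:

```python
from collections import deque

class FraudFlagger:
    """Flag transactions far above a customer's rolling average (illustrative rule)."""

    def __init__(self, window: int = 5, factor: float = 3.0):
        self.history = deque(maxlen=window)  # recent transaction amounts
        self.factor = factor                 # how many times the average is "unusual"

    def check(self, amount: float) -> bool:
        # Only flag once the window is full, to avoid noise on sparse history.
        suspicious = (
            len(self.history) == self.history.maxlen
            and amount > self.factor * (sum(self.history) / len(self.history))
        )
        self.history.append(amount)
        return suspicious

flagger = FraudFlagger()
stream = [20, 25, 22, 18, 24, 500]  # the final transaction should stand out
flags = [flagger.check(a) for a in stream]
print(flags)  # only the last transaction is flagged
```

Because the department owns its own pipeline, such a check can run on the stream as transactions arrive, without a round trip through a central processing queue.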
2. Operational Monitoring and Efficiency
Data mesh is applied in operations-heavy environments to track real-time performance metrics such as equipment uptime or delivery schedules. Thus, issues can be solved proactively by empowering department-specific teams to own and act on their operational data.
3. Dynamic Pricing Models
Data mesh decentralizes data pipelines that provide real-time data for pricing models used in dynamic pricing systems, such as travel or retail. Department teams can independently update and optimize models for their specific market segments.
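A minimal sketch of per-segment pricing ownership: each department maintains the parameters for its own market segment and can tune them independently. The segments, base prices, and sensitivity values are made-up numbers for illustration:

```python
# Each department owns and tunes the parameters for its own segment.
SEGMENT_PARAMS = {
    "travel": {"base": 100.0, "demand_sensitivity": 0.5},
    "retail": {"base": 20.0, "demand_sensitivity": 0.1},
}

def dynamic_price(segment: str, demand_index: float) -> float:
    """Scale the segment's base price by current demand; 1.0 is neutral demand."""
    p = SEGMENT_PARAMS[segment]
    return round(p["base"] * (1 + p["demand_sensitivity"] * (demand_index - 1.0)), 2)

print(dynamic_price("travel", 1.4))  # demand 40% above neutral
print(dynamic_price("retail", 0.8))  # demand 20% below neutral
```

The travel team can raise its sensitivity for peak season without touching, or waiting on, the retail team's model.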
4. Regulatory Compliance and Reporting
Compliance teams use data mesh to simplify regulatory reporting by owning department-specific datasets, ensuring that data is accurate and timely. Federated governance ensures consistency and compliance with legal standards.
5. Customized Business Intelligence Dashboards
Custom dashboards are vital in monitoring performance and enabling decision-makers to access tailored insights.
Traditional centralized systems often take longer to update dashboards due to bottlenecks in processing and delivering relevant data. Data mesh overcomes this by allowing individual teams to own their data and build views as needed.
6. Machine Learning (ML) Projects
Data mesh simplifies data preparation for ML by standardizing data across departments. Centralized data systems often become bottlenecks for data scientists who have to wait to gain access to specific datasets. Data mesh gives departments ownership and control of their datasets, allowing faster access to datasets for ML experimentation.
The use cases above demonstrate data mesh's versatility and impact across various organizations.
Best Practices for Transitioning to Data Mesh
Transitioning to the data mesh framework requires proper planning and organizational readiness.
Here are the best practices to follow for the transition:
1. Start with Small Pilot Projects
Begin with a department that offers clear and immediate business value, such as marketing, sales, customer analytics, fraud detection, or supply chain optimization, rather than implementing data mesh across the entire organization. Focusing on high-impact areas first allows for a more effective and manageable rollout.
A pilot project allows the organization to test the data mesh model, learn from the challenges, and refine its approach before scaling.
2. Invest in Self-serve Data Platforms
Build infrastructure that enables your teams to independently manage their own data pipelines, reducing reliance on central IT teams. This would include the development of ingestion, transformation, and visualization tools.
You also need to focus on enabling seamless integration of data mesh with existing data lakes, warehouses, or cloud storage. A self-serve platform lets teams manage their own data independently, democratizing data access across the organization.
3. Develop Strong Governance Policy
Establish a federated governance model where each department adheres to organization-wide data security, privacy, and interoperability standards.
Key elements include standardizing metadata schemas and API protocols for seamless data sharing, enforcing compliance with regulations such as GDPR or HIPAA, and setting up a central committee to oversee governance policies. This balance of autonomy and oversight ensures consistency and prevents silos.
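Federated computational governance is often enforced as automated policy checks that every department's data product must pass before publication. The policy below is a hypothetical example of such a check, not a real regulatory rule set:

```python
# Hypothetical organization-wide policy: every published dataset must declare
# an owner, a PII flag, and a retention period within the allowed limit.
GOVERNANCE_POLICY = {
    "required_fields": {"owner", "contains_pii", "retention_days"},
    "max_retention_days": 365,
}

def passes_governance(dataset_meta: dict) -> bool:
    """Check a dataset's metadata against the shared governance policy."""
    missing = GOVERNANCE_POLICY["required_fields"] - dataset_meta.keys()
    if missing:
        return False
    if dataset_meta["retention_days"] > GOVERNANCE_POLICY["max_retention_days"]:
        return False
    # Datasets containing PII must declare an access-control list.
    if dataset_meta["contains_pii"] and not dataset_meta.get("access_roles"):
        return False
    return True

ok = {"owner": "marketing", "contains_pii": False, "retention_days": 90}
pii_no_acl = {"owner": "risk", "contains_pii": True, "retention_days": 30}
print(passes_governance(ok), passes_governance(pii_no_acl))  # True False
```

The check is "computational" in the data mesh sense: departments stay autonomous in what they publish, but the policy itself is code that runs the same way for everyone.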
4. Alignment of People and Processes
Collaboration between technical and business teams is required for a data mesh transition. Department teams should be encouraged to establish goals with data engineers, analysts, and product managers.
They should regularly receive training, playbooks, or guidance on best practices for managing and governing data products. The leadership team should take the initiative to embrace and promote the new approach to data architecture.
If you are considering a transition to a data mesh model, start with small pilot projects and create strong governance and privacy policies. Strong leadership and open communication can help ensure a seamless shift for your organization.
Embracing Data Mesh for a Scalable Future
Data mesh represents a pragmatic approach to data management that enables teams to discover and unlock the maximum value of their data assets. It decentralizes ownership by treating data as a product, enhancing accountability, scalability, and agility—key qualities needed in today’s fast-paced business environment.
Data mesh architecture provides a sustainable framework for democratizing data access without compromising governance and security. It is built on a foundation of department-driven design, self-serve infrastructure, and federated governance, creating a robust and resilient data ecosystem.
Tredence helps businesses of all sizes to build and deploy data mesh models effectively and successfully. Its expert team understands your unique data requirements and helps you transition to a data mesh framework in an agile manner.
Contact Tredence today to tap into the full potential of your data and enhance your decision-making with data-driven insights.
FAQs
1. What is a data mesh?
Data mesh is a decentralized data management approach where business domains own and manage their data as a product, enabling scalability, flexibility, and better alignment with business objectives.
2. What are the key principles of a data mesh model?
The core principles of data mesh include domain ownership, treating data as a product, providing self-serve infrastructure for data management, and ensuring federated governance to maintain consistency and security across the organization.
3. What are the primary challenges of adopting a data mesh framework?
Adopting the data mesh framework can be challenging due to cultural resistance to change, technical complexity of setting up infrastructure, and the risk of creating new silos.
AUTHOR
Editorial Team
Tredence