The client, a Fortune 500 MRO distributor, faced significant challenges in real-time data processing and machine learning. To address them, the client set out to build a data-product-oriented Data Mesh on AWS and Databricks. The initiative aimed to improve data accessibility, processing efficiency, and governance across the new architecture.
Deploying Databricks for real-time data processing and adopting a Data Mesh Architecture on AWS to enhance operational agility and scalability for a Fortune 500 MRO distributor.
In the AWS environment, Tredence implemented a robust Databricks architecture using a hub-and-spoke model:
The spoke was built and managed by the Tredence team in collaboration with the client’s platform hub team. The goal was to construct a scalable backend infrastructure to support the Databricks platform while adhering to best-in-class standards.
According to the design:
Developed a library of 70+ reusable functions to standardize data processing.
Implemented a micro application for handling late-arriving data, ensuring timely processing.
Created audit tables with integrated alerts to reconcile data downstream after ingestion.
Implemented a cutting-edge architecture focused on domain-specific insights and overall data management.
Enabled API-based job deployment through GitHub Actions, enhancing deployment efficiency.
Enhanced the Tredence Accelerator to support Near Real-Time (NRT) and streaming data processing.
Designed and deployed a Notebook-based tool for flexible, user-driven data exports based on distribution rules.
Developed dashboards on Databricks for insightful data visualization and enhanced decision-making.
Achieved a 4x reduction in cost and runtime by migrating workflows to Databricks and AWS during the proof of concept (POC).
Operationalized 10+ Near Real-Time data pipelines to support continuous and efficient data processing.
Automated the setup of 6 Shared Databricks Workspaces, improving resource management and collaboration.
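The audit tables with integrated alerts mentioned above can be illustrated with a minimal sketch. This is not the client's or Tredence's implementation; it only shows the general idea of comparing row counts recorded before and after ingestion and raising an alert on a mismatch. The `AuditRecord` fields, the `reconcile` function, and the `tolerance` parameter are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class AuditRecord:
    """One row of a hypothetical audit table for a single ingestion run."""
    table: str
    source_rows: int  # rows reported by the upstream source (assumed field)
    target_rows: int  # rows counted downstream after ingestion (assumed field)


def reconcile(records, tolerance=0):
    """Return one alert message per table whose counts differ by more than `tolerance`."""
    alerts = []
    for rec in records:
        diff = rec.source_rows - rec.target_rows
        if abs(diff) > tolerance:
            alerts.append(
                f"{rec.table}: source={rec.source_rows}, "
                f"target={rec.target_rows}, diff={diff}"
            )
    return alerts


if __name__ == "__main__":
    # Illustrative values only, not real project data.
    audit = [
        AuditRecord("orders", 1000, 1000),
        AuditRecord("inventory", 500, 498),
    ]
    for message in reconcile(audit):
        print("ALERT", message)
```

In a Databricks setting, the counts would more plausibly come from queries against the audit tables than from in-memory objects, and the alert messages would be routed to a notification channel; both of those details are assumptions outside what the case study states.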