A large global hospitality and entertainment firm specializes in hotels, resorts, dining services, sports venues, gaming, and retail. Due to the diversity of data sources, the firm's technical environment lacked a unified perspective on transactional data. Additionally, data quality challenges inhibited effective data consumption by business units.
To address these issues, Tredence developed a modern data platform utilizing Databricks and Alation to enhance data quality, categorization, and governance.
The customer needed a consolidated view of transactional data, which was dispersed across various systems and therefore hard to report on. Data quality problems persisted, and the existing design could not scale to support large analytical solutions or growing data storage. Data governance and stewardship were also lacking.
The objective was to develop a fault-tolerant, uniform, and scalable ingestion framework that would be easy to configure.
To achieve this, Tredence utilized Databricks and AWS services to build a unified data platform, and Alation to enhance data governance, quality, and accuracy.
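As a rough illustration of what "easy to configure" and "fault-tolerant" can mean for an ingestion framework, the sketch below drives ingestion from a small per-source configuration object and retries transient read failures. It is plain Python written for this article, not the customer's implementation: `SourceConfig`, `PARSERS`, `ingest`, and the `max_retries` setting are all hypothetical names, and the real platform would read from S3 and run on Databricks rather than on in-memory strings.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


def parse_csv(text: str) -> List[Row]:
    # Each CSV line becomes one row keyed by the header names.
    return [dict(record) for record in csv.DictReader(io.StringIO(text))]


def parse_json(text: str) -> List[Row]:
    # Accept either a single JSON object or a list of objects.
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


# Hypothetical registry: supporting a new format means adding one entry here.
PARSERS: Dict[str, Callable[[str], List[Row]]] = {
    "csv": parse_csv,
    "json": parse_json,
}


@dataclass(frozen=True)
class SourceConfig:
    """Hypothetical per-source settings; onboarding a source is a config change."""
    name: str
    file_format: str
    max_retries: int = 3


def ingest(config: SourceConfig, read_file: Callable[[], str]) -> List[Row]:
    """Read and parse one file, retrying when the read fails with an I/O error."""
    parser = PARSERS[config.file_format]
    last_error = None
    for _attempt in range(config.max_retries):
        try:
            return parser(read_file())
        except OSError as exc:  # treated here as a transient failure
            last_error = exc
    raise RuntimeError(
        f"{config.name}: gave up after {config.max_retries} attempts"
    ) from last_error
```

Because every source goes through the same `ingest` function, the framework stays uniform: only the configuration and the reader callable differ between sources.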
The platform is primarily built around AWS services, with some exceptions such as Databricks, Terraform, and Alation.
Tredence collaborated with the customer to develop and build a modern data platform utilizing AWS cloud services and Databricks.
Tredence designed a medallion architecture with three layers: raw, core, and presentation. Ingestion begins in the S3 landing zone, where files in several formats (CSV, JSON, XML, and plain text) arrive from the source systems and are loaded into the raw layer. The raw layer flattens these files into a row-and-column format. The core layer uses a star schema to simplify the data model and accelerate aggregation. The presentation layer provides several materialized views based on the consumption requirements of specific groups.
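To make the three layers concrete, here is a minimal sketch of the same flow in plain Python. It is an illustration only: on the actual platform these steps would run as Databricks jobs over tables, and the record shape (`venue`, `amount`) and the function names `flatten`, `build_star`, and `revenue_by_venue` are assumptions made up for this example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def flatten(record: dict, prefix: str = "") -> dict:
    """Raw layer: turn a nested record into one flat row.

    Nested keys are joined with underscores, e.g. {"venue": {"name": ...}}
    becomes the column "venue_name".
    """
    row = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{column}_"))
        else:
            row[column] = value
    return row


def build_star(rows: List[dict]) -> Tuple[Dict[str, int], List[dict]]:
    """Core layer: split flat rows into a venue dimension and a sales fact table."""
    dim_venue: Dict[str, int] = {}  # venue name -> surrogate key
    fact_sales: List[dict] = []
    for row in rows:
        # Assign the next surrogate key the first time a venue is seen.
        venue_key = dim_venue.setdefault(row["venue_name"], len(dim_venue) + 1)
        fact_sales.append({"venue_key": venue_key, "amount": row["amount"]})
    return dim_venue, fact_sales


def revenue_by_venue(dim_venue: Dict[str, int], fact_sales: List[dict]) -> Dict[str, float]:
    """Presentation layer: an aggregate that a reporting group might consume."""
    names = {key: name for name, key in dim_venue.items()}
    totals: Dict[str, float] = defaultdict(float)
    for fact in fact_sales:
        totals[names[fact["venue_key"]]] += fact["amount"]
    return dict(totals)
```

Feeding a few nested landing-zone records through `flatten`, then `build_star`, then `revenue_by_venue` mirrors the raw, core, and presentation steps: the fact table carries only keys and measures, so the final aggregation is a simple grouped sum.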
The solution architecture for the platform is shown in the diagram below:
Configurable end-to-end ingestion architecture
Scalable automated data consumption pipelines
Minimal costs for operations and maintenance