
Scaling Media Mix Models and Unifying Data with AWS at a Global Athletic Footwear and Apparel Supplier

Summary

A major global athletic footwear and apparel supplier sought to enhance its Media Mix Models (MMMs) and establish robust data pipelines to efficiently manage and analyze large volumes of data from diverse sources.

  • Challenge 1: Integrate disparate data from both internal and external sources to enable comprehensive data modeling and actionable insights.
  • Challenge 2: Reconcile inconsistent data frequencies across sources, which previously required manual intervention and hampered timely analysis.

Goal

Tredence expanded the client's Media Mix Models globally, building robust data pipelines that span a wide range of data sources. Using PySpark on Databricks on AWS, Tredence consolidated disparate inputs into uniform DataFrames and integrated them seamlessly with Amazon S3. Automated Delta Lake pipelines ensured dependable data delivery to downstream applications.
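
For illustration, here is a minimal PySpark sketch of that consolidation step. The bucket paths, table names, and column names are hypothetical, chosen only to show the pattern of mapping disparate sources onto one uniform frame:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("mmm-consolidation").getOrCreate()

# Hypothetical sources: a vendor CSV feed landed in S3 and an internal table.
vendor = spark.read.csv("s3://example-bucket/raw/vendor_spend/",
                        header=True, inferSchema=True)
internal = spark.table("marketing.internal_spend")

# Standardize both sources to one common frame: date, channel, region, spend.
vendor_std = vendor.select(
    F.to_date("activity_date").alias("date"),
    F.col("media_channel").alias("channel"),
    F.col("market").alias("region"),
    F.col("cost").cast("double").alias("spend"),
)
internal_std = internal.select(
    F.to_date("txn_date").alias("date"),
    F.col("channel_name").alias("channel"),
    "region",
    F.col("amount").cast("double").alias("spend"),
)

# One uniform DataFrame regardless of where each source originated.
unified = vendor_std.unionByName(internal_std)
```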

Approach

The project streamlined data collection and transformation processes for the client's data science team. Our solution, sketched in code after the list below, included:

  • Data Collection:
    • Aggregating data from various sources, such as CSV files, AWS S3 buckets, internal tables, and third-party applications.
    • Storing CSV files from third-party applications in S3 buckets.
  • Data Transformation:
    • Reading the stored data from S3 buckets.
    • Transforming the data on Databricks on AWS to meet downstream modeling requirements.
  • Data Storage:
    • Storing the transformed data back into S3 as Delta Tables.
  • Data Consumption:
    • Enabling the client's Data Science Team to conduct further analysis and processing using the newly created tables.

This architecture provided a streamlined approach to data ingestion, transformation, and storage, using Amazon S3 and Databricks on AWS to build a robust data pipeline for the client's data science team.
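
As a concrete illustration of the four steps above, here is a minimal PySpark sketch of the read-transform-write loop. The paths and the weekly-aggregation rule are assumptions for illustration, not the client's actual pipeline; on Databricks, the `delta` format is available by default:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("mmm-pipeline").getOrCreate()

RAW_PATH = "s3://example-bucket/raw/third_party/"       # hypothetical landing zone
DELTA_PATH = "s3://example-bucket/curated/media_spend"  # hypothetical Delta location

# 1. Collection: read vendor CSVs that were dropped into S3.
raw = spark.read.csv(RAW_PATH, header=True, inferSchema=True)

# 2. Transformation: normalize types and roll everything up to a weekly
#    grain so sources reporting at different frequencies line up.
weekly = (raw
    .withColumn("date", F.to_date("date"))
    .withColumn("week_start", F.date_trunc("week", F.col("date")))
    .groupBy("week_start", "channel", "region")
    .agg(F.sum(F.col("spend").cast("double")).alias("spend")))

# 3. Storage: persist back to S3 as a Delta table for reliable reads.
weekly.write.format("delta").mode("overwrite").save(DELTA_PATH)

# 4. Consumption: the data science team reads the curated table directly.
curated = spark.read.format("delta").load(DELTA_PATH)
```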

Solution Architecture

AWS Services Used

  • Amazon S3

Key Benefits

  • Successfully scaled media mix models across multiple geographies for comprehensive marketing analysis.
  • Unified Data Ingestion: Used PySpark on Databricks to standardize data frames and land them in Amazon S3, simplifying ingestion from diverse sources.
  • Automated Data Pipelines: Implemented Delta Lake on Databricks on AWS and scripted the pipelines end to end, ensuring consistent data flow to downstream applications (a minimal incremental-load sketch follows).
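
The source does not specify the automation scripts, but one common shape for such a pipeline is an idempotent incremental upsert using Delta Lake's MERGE API. The sketch below assumes a hypothetical grain key of (week_start, channel, region) and reuses the curated path from the earlier example:

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mmm-incremental").getOrCreate()

DELTA_PATH = "s3://example-bucket/curated/media_spend"  # hypothetical

# New batch of already-standardized weekly rows (see the earlier sketch).
updates = spark.read.format("delta").load(
    "s3://example-bucket/staging/weekly_batch")

target = DeltaTable.forPath(spark, DELTA_PATH)

# Upsert on the grain key so reruns are idempotent: matched rows are
# refreshed, new rows are inserted, and nothing is duplicated.
(target.alias("t")
    .merge(
        updates.alias("u"),
        "t.week_start = u.week_start AND t.channel = u.channel "
        "AND t.region = u.region")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```

Scheduled as a recurring Databricks job, a script along these lines can keep downstream tables consistent without manual intervention.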

Results

Efficient Data Pipelines:
Developed robust pipelines to efficiently manage and process extensive data volumes from varied sources.

Improved Data Consistency:
Ensured reliable and consistent data delivery to downstream applications, enhancing the accuracy and timeliness of insights.

Optimization Opportunities Identified:
Analyzed and optimized the client's data sources and collection methods, improving data quality and reducing redundancy.
