A major hospitality client had their vital revenue management dashboards running on Python Bokeh, which required significant overhead for data processing, transformation, and plot rendering. This setup resulted in challenges with user experience and scalability.
Given that the report inputs came from S3 sources and PostgreSQL, AWS serverless was a natural choice for processing and reporting. Additionally, using AWS Glue and other managed services offered benefits such as faster processing, robust logging, and efficient error handling.
To design and build a cost-effective solution that could extract, transform, and load daily report data for up to 8,000 properties. The aim was to ensure that all data is fully refreshed within a span of 6 hours to enable real-time analysis.
The process started with focused data extraction, listing all input sources were listed along with their data sizes for database capacity planning. Tredence designed and developed ETL pipeline using AWS serverless architecture with managed services to load data from multiple sources into snowflake.
The solution architecture included a common glue job, an initial job to load historic data, and four other glue jobs for daily data load. The common job generated a list of entities with the latest data available for extraction and triggered daily jobs using Lambda. This job created a file and placed it in an S3 location to trigger subsequent jobs via Lambda. The four individual daily jobs then transformed the input data into four different reports consumed by the final dashboard plots. In case of any failures, a secondary Lambda job was triggered to send email alerts using SNS.
Airflow was employed for robust scheduling and workflow execution, with triggers based on the arrival of a file in the final S3 source bucket. Airflow also provided monitoring features such as logging, error handling, and alerting. Snowflake tables were created for final data storage, and seven Tableau dashboards were designed and developed with live connections to provide up-to-date information on the properties. The entire process was scheduled to run four times a day, with a full data refresh performed from all sources.
Observed steady performance with the onboarding of additional properties, decreasing the total execution time by 25x at a minimal cost compared to the Bokeh dashboard.
Onboarded up to 3000 properties on to the pipeline, and their dashboards are ready for use.
Observed consistent performance improvements while onboarding more properties, achieving a 25x reduction in total execution
Total estimated cost for 8000 properties is only $1000 per month