[ Case Study ]
Data Revamp: From Chaos to Efficiency with Airflow ETL, 6x Query Boost, and 98.8% Resource Savings
Customer Request
BPMobile was grappling with an increasingly cluttered data storage system.
Data extraction had become so slow that a full day's data could no longer be loaded within a single day, and frequent database errors kept disrupting day-to-day work.
Project Goal
Our primary objectives were to streamline BPMobile's data storage system, hasten the data retrieval process, and substantially reduce the occurrence of errors during data collection.
Challenges Faced
- Transitioning all ETL pipelines from conventional Python scripts to Apache Airflow.
- Navigating the cluttered, inefficient state of the client's previous data storage system.
Solution and Technologies
We contributed our own code base and migrated all ETL pipelines from standalone Python scripts to Apache Airflow, gaining its built-in scheduling, retries, monitoring, and dependency management.
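For illustration, this is the general shape a pipeline takes after such a migration. It is a minimal sketch assuming Airflow 2.x; the DAG id, schedule, and task callables are hypothetical placeholders rather than BPMobile's production code.

```python
# Minimal sketch of an ETL pipeline expressed as an Airflow DAG.
# DAG id, schedule, and task callables are illustrative placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_raw_events(**context):
    # Placeholder: pull the previous day's raw data from the source.
    ...


def load_to_warehouse(**context):
    # Placeholder: load the extracted batch into the warehouse.
    ...


with DAG(
    dag_id="daily_raw_events",            # hypothetical DAG name
    schedule_interval="@daily",
    start_date=datetime(2023, 1, 1),
    catchup=False,
    default_args={
        "retries": 3,                      # automatic retries replace ad-hoc error handling
        "retry_delay": timedelta(minutes=5),
    },
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_raw_events)
    load = PythonOperator(task_id="load", python_callable=load_to_warehouse)

    extract >> load                        # explicit dependency, visible in the Airflow UI
```

Scheduling, retries, and task dependencies that previously lived in ad-hoc script logic become declarative and visible in the Airflow UI.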
Migrated the existing DWH, reworking the Redshift cluster and moving the heaviest data sources to Redshift Spectrum.
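Offloading a heavy source to Redshift Spectrum typically means registering an external schema over the AWS Glue Data Catalog and an external table over files in S3. The sketch below issues those statements through the redshift_connector Python driver; the connection details, IAM role, schema and table names, and S3 path are all hypothetical.

```python
# Sketch: exposing a heavy raw-events dataset stored in S3 through Redshift Spectrum.
# Connection details, IAM role, schema/table names, and the S3 path are hypothetical.
import redshift_connector

conn = redshift_connector.connect(
    host="example-cluster.abc123.eu-west-1.redshift.amazonaws.com",
    database="analytics",
    user="etl_user",
    password="...",
)
conn.autocommit = True  # CREATE EXTERNAL TABLE cannot run inside a transaction block

cur = conn.cursor()

# External schema backed by the AWS Glue Data Catalog.
cur.execute("""
    CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum
    FROM DATA CATALOG
    DATABASE 'raw_events_db'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-spectrum-role'
    CREATE EXTERNAL DATABASE IF NOT EXISTS;
""")

# External table over Parquet files in S3; it is queried like a normal table,
# but the data never has to live on the Redshift cluster itself.
cur.execute("""
    CREATE EXTERNAL TABLE spectrum.raw_events (
        event_id   VARCHAR(64),
        user_id    VARCHAR(64),
        event_time TIMESTAMP,
        payload    VARCHAR(65535)
    )
    STORED AS PARQUET
    LOCATION 's3://example-raw-events-bucket/events/';
""")

cur.close()
conn.close()
```

Because Spectrum scans the data in place in S3, the largest sources no longer consume cluster storage, which is what drives the storage-cost reduction described below.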
Implemented a data pipeline for ML model calculations using AWS Batch and Docker.
This change not only streamlined processes but also enhanced efficiency.
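One way such a step can be wired up is by submitting a containerized job to AWS Batch via boto3's standard submit_job call; the job name, queue, job definition, and command below are hypothetical placeholders.

```python
# Sketch: triggering a Dockerized ML model calculation as an AWS Batch job.
# Job name, queue, job definition, and command are hypothetical placeholders.
import boto3

batch = boto3.client("batch", region_name="eu-west-1")

response = batch.submit_job(
    jobName="ml-model-scoring-2023-01-01",   # hypothetical
    jobQueue="ml-batch-queue",                # hypothetical
    jobDefinition="ml-model-scoring:3",       # points at a registered Docker image
    containerOverrides={
        "command": ["python", "score.py", "--date", "2023-01-01"],
        "environment": [
            {"name": "DWH_SCHEMA", "value": "analytics"},
        ],
    },
)

print("Submitted AWS Batch job:", response["jobId"])
```

Airflow's Amazon provider also ships a Batch operator, so a call like this can appear as a regular task in the same DAGs that run the rest of the ETL.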
Conclusions on the Project
We executed a comprehensive redesign of BPMobile's Data Warehouse, resulting in a 50% reduction in storage costs and amplifying query speeds sixfold. Our revamped ETL processes accelerated raw data collection by a factor of 12. Notably, previous pipeline instability issues were entirely resolved.
The introduction of a specialized data pipeline for ML model calculations led to an impressive resource saving of 98.8%.