
[ Case Study ]

Data Revamp: From Chaos to Efficiency with Airflow ETL, 6x Query Boost, and 98.8% Resource Savings


Customer Request

BPMobile was grappling with an increasingly cluttered data storage system.


Data extraction had become so slow that a single day's data could no longer be loaded within a day, and frequent database errors disrupted the team's work.

Project Goal

Our primary objectives were to streamline BPMobile's data storage system, hasten the data retrieval process, and substantially reduce the occurrence of errors during data collection.

Challenges Faced

  1. Transitioning all ETL pipelines from conventional Python scripts to Apache Airflow.

  2. Navigating the cluttered, inefficient state of the client's previous data storage system.

Solution and Technologies


We brought in our own code base and migrated all ETL pipelines from standalone Python scripts to Apache Airflow, gaining its scheduling, retry handling, and monitoring.
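A minimal sketch of what such a migration can look like, assuming Airflow 2.4 or later. The function names, the DAG id, and the sample rows are hypothetical and are not taken from BPMobile's code base. The script logic stays in plain Python functions, and a thin DAG wrapper adds scheduling and retries.

```python
from datetime import datetime, timedelta


def extract(ds: str) -> list[dict]:
    """Hypothetical extract step: return raw rows for the logical date `ds`."""
    return [
        {"event_date": ds, "installs": 10},
        {"event_date": ds, "installs": 5},
    ]


def transform(rows: list[dict]) -> dict:
    """Hypothetical transform step: aggregate raw rows into one daily record."""
    return {
        "event_date": rows[0]["event_date"],
        "installs": sum(r["installs"] for r in rows),
    }


def run_daily(ds: str) -> dict:
    """The logic a standalone script would run; reused as the task callable."""
    return transform(extract(ds))


def build_dag():
    """Wrap the same logic in a DAG. Assumes Airflow 2.4+ is installed."""
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="daily_installs_etl",  # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
        # Failed runs are retried automatically instead of rerun by hand.
        default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    ) as dag:
        PythonOperator(
            task_id="run_daily",
            python_callable=run_daily,
            op_kwargs={"ds": "{{ ds }}"},  # templated logical date
        )
    return dag
```

In a real DAG file the result would be assigned at module level, for example `dag = build_dag()`, so that the scheduler can discover it.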

We migrated the existing data warehouse (DWH) by changing the Redshift cluster and moving the heaviest data sources to Redshift Spectrum.
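Moving a data source to Redshift Spectrum generally means keeping the files in S3 and exposing them to the cluster as an external table. The sketch below only builds the DDL strings for that. Every schema, table, column, bucket, and role name in it is a hypothetical placeholder, and identifiers are interpolated without escaping, so it is an illustration rather than production code.

```python
def external_schema_sql(schema: str, catalog_db: str, iam_role_arn: str) -> str:
    """DDL that maps a data catalog database to a Redshift external schema."""
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM DATA CATALOG DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def external_table_sql(
    schema: str,
    table: str,
    columns: dict[str, str],
    partition: tuple[str, str],
    s3_location: str,
) -> str:
    """DDL for a partitioned Parquet table whose data stays in S3."""
    cols = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    part_name, part_type = partition
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n    {cols}\n)\n"
        f"PARTITIONED BY ({part_name} {part_type})\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{s3_location}';"
    )


# Hypothetical example values.
schema_ddl = external_schema_sql(
    "spectrum", "raw_events_db", "arn:aws:iam::123456789012:role/spectrum-role"
)
table_ddl = external_table_sql(
    "spectrum",
    "raw_events",
    {"user_id": "VARCHAR(64)", "event_name": "VARCHAR(128)"},
    ("event_date", "DATE"),
    "s3://example-bucket/raw_events/",
)
```

Once such a table exists, it can be queried and joined with local Redshift tables while the heavy raw data no longer occupies cluster storage.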

We implemented a dedicated data pipeline for ML model calculations using AWS Batch and Docker.
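One way to picture this step: each model calculation is packaged as a Docker image and submitted to AWS Batch as a job, so compute is provisioned for the job rather than kept running. The sketch below is an assumption about the general shape of such a submission, not the project's actual code. The queue name, job definition name, command, and environment variable are all hypothetical.

```python
def batch_job_request(model_name: str, run_date: str) -> dict:
    """Build the keyword arguments for the AWS Batch SubmitJob call."""
    return {
        "jobName": f"{model_name}-{run_date}",
        "jobQueue": "ml-calc-queue",  # hypothetical queue
        "jobDefinition": "ml-calc-job",  # hypothetical job definition (Docker image)
        "containerOverrides": {
            # Hypothetical entry point inside the Docker image.
            "command": ["python", "calculate.py", "--date", run_date],
            "environment": [{"name": "MODEL_NAME", "value": model_name}],
        },
    }


def submit(request: dict) -> str:
    """Submit the job and return its id. Requires boto3 and AWS credentials."""
    import boto3

    response = boto3.client("batch").submit_job(**request)
    return response["jobId"]


# Build a request for a hypothetical model and date (nothing is submitted here).
request = batch_job_request("churn-model", "2024-01-01")
```

Keeping request construction separate from the `submit` call makes the payload easy to inspect and test without touching AWS.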


Together, these changes streamlined BPMobile's data processes and made them more efficient.

Conclusions on the Project


We executed a comprehensive redesign of BPMobile's Data Warehouse, resulting in a 50% reduction in storage costs and a sixfold increase in query speed. Our revamped ETL processes accelerated raw data collection by a factor of 12. Notably, the previous pipeline instability issues were entirely resolved.

The introduction of a specialized data pipeline for ML model calculations led to an impressive resource saving of 98.8%.
