The customer's data storage was chaotically structured, data extraction had become very slow, and a day's worth of data could no longer be downloaded within a day. Frequent database errors also disrupted the work.
We proposed a solution built on our own code base:
Migrated all ETL pipelines from Python scripts to Airflow, gaining its scheduling, retry, and monitoring features
Migrated the existing DWH by changing the Redshift cluster and moving the heaviest data sources to Redshift Spectrum
Implemented a data pipeline that computes the ML model using AWS Batch and Docker
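As a rough illustration of the AWS Batch step, the sketch below builds the request for one daily model run. It is not the project's actual code: the function name `build_batch_job_request`, the queue name `ml-queue`, the job definition `ml-model-job:1`, and the `RUN_DATE` environment variable are all hypothetical. Only the request keys (`jobName`, `jobQueue`, `jobDefinition`, `containerOverrides`) follow the parameters of the Batch `submit_job` call.

```python
from datetime import date


def build_batch_job_request(run_date: date, job_queue: str, job_definition: str) -> dict:
    """Build keyword arguments for the AWS Batch `submit_job` call.

    The queue and job definition names are supplied by the caller;
    the values used below are hypothetical placeholders.
    """
    day = run_date.isoformat()
    return {
        # One job per processing day, so reruns are easy to identify.
        "jobName": f"ml-model-{day}",
        "jobQueue": job_queue,
        # The job definition points at the Docker image that runs the model code.
        "jobDefinition": job_definition,
        "containerOverrides": {
            # Assumed convention: the container reads the day to process
            # from the RUN_DATE environment variable.
            "environment": [{"name": "RUN_DATE", "value": day}],
        },
    }


request = build_batch_job_request(date(2024, 1, 15), "ml-queue", "ml-model-job:1")
print(request["jobName"])
```

With AWS credentials configured, a request like this could be submitted with `boto3.client("batch").submit_job(**request)`. Keeping the request construction in a plain function makes it easy to call from a scheduler task and to check without touching AWS.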
Cut storage costs by 50% and made queries 6 times faster. The new ETL processes reduced raw data collection time by a factor of 12. Resource savings amounted to about 95%.