Overview
This project was delivered as a freelance engagement (~30 hours total) to modernize the data architecture and resolve critical issues caused by manual pipeline runs and on-premises infrastructure.
The problem
Processing relied on Python scripts executed manually, backed by a local PostgreSQL database. That setup had several limitations:
⚠️ No automation or orchestration for pipelines.
⚠️ High operational risk from manual execution.
⚠️ Limited scalability and maintainability.
⚠️ No centralized, reliable cloud environment.
⚠️ Weak handling of credentials and sensitive variables.
⚠️ Limited traceability and monitoring of runs.
The solution
The architecture was modernized with a cloud migration and orchestrated pipelines:
- Database migration
  - Move from local PostgreSQL to Cloud SQL (PostgreSQL 14) on GCP
  - Higher availability, scalability, and reliability
- Orchestration with Airflow (Cloud Composer)
  - DAGs to automate pipelines
  - Daily, weekly, and monthly schedules
  - Robust, reusable workflows
- SFTP integration
  - Ingestion over SFTP
  - End-to-end automation of collection and loading
- Secrets and sensitive variables
  - GCP-native secret storage integrated with DAGs
- Standards and governance
  - Code and pipelines aligned to best practices
  - Better observability and traceability
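
As an illustration of how these pieces can fit together, the sketch below shows one reusable DAG factory instantiated per cadence, with an SFTP download task followed by a load into PostgreSQL. It is not the delivered code. It assumes Airflow 2.4+ with the SFTP and Postgres provider packages installed. The cron expressions, connection IDs (`sftp_source`, `cloudsql_postgres`), remote and staging paths, and the `staging.ingest` table are all hypothetical. It also assumes the connections are resolved through a secrets backend (for example Secret Manager), so no credentials appear in the DAG code.

```python
import os
from datetime import datetime, timedelta

# Hypothetical cron expressions: the write-up only states daily, weekly, and monthly runs.
SCHEDULES = {
    "daily": "0 6 * * *",    # every day at 06:00
    "weekly": "0 6 * * 1",   # every Monday at 06:00
    "monthly": "0 6 1 * *",  # first day of each month at 06:00
}

# Hypothetical connection IDs. With a secrets backend configured, Airflow looks
# these up in the secret store, so host, user, and password stay out of the code.
SFTP_CONN_ID = "sftp_source"
POSTGRES_CONN_ID = "cloudsql_postgres"

# Assumed staging location: Cloud Composer's bucket-backed data folder, used here
# because tasks may run on different workers that do not share /tmp.
STAGING_DIR = "/home/airflow/gcs/data/ingest"


def local_path(cadence: str, run_date: str) -> str:
    """Build a deterministic staging path so a rerun overwrites the same file."""
    return f"{STAGING_DIR}/{cadence}/{run_date}.csv"


def fetch_from_sftp(cadence: str, run_date: str) -> None:
    """Download the file for one logical date over SFTP."""
    # Airflow imports are deferred into the functions so the pure helpers above
    # can be unit tested without an Airflow installation.
    from airflow.providers.sftp.hooks.sftp import SFTPHook

    target = local_path(cadence, run_date)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    remote = f"/outbound/{cadence}/{run_date}.csv"  # hypothetical remote layout
    SFTPHook(ssh_conn_id=SFTP_CONN_ID).retrieve_file(remote, target)


def load_to_cloud_sql(cadence: str, run_date: str) -> None:
    """Bulk-load the staged CSV into PostgreSQL with COPY."""
    from airflow.providers.postgres.hooks.postgres import PostgresHook

    PostgresHook(postgres_conn_id=POSTGRES_CONN_ID).copy_expert(
        "COPY staging.ingest FROM STDIN WITH (FORMAT csv, HEADER true)",
        local_path(cadence, run_date),
    )


def build_dag(cadence: str):
    """Create one ingestion DAG for the given cadence."""
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # "{{ ds }}" is rendered by Airflow to the run's logical date (YYYY-MM-DD).
    kwargs = {"cadence": cadence, "run_date": "{{ ds }}"}
    with DAG(
        dag_id=f"sftp_ingest_{cadence}",
        schedule=SCHEDULES[cadence],
        start_date=datetime(2024, 1, 1),
        catchup=False,
        default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    ) as dag:
        fetch = PythonOperator(
            task_id="fetch_from_sftp",
            python_callable=fetch_from_sftp,
            op_kwargs=kwargs,
        )
        load = PythonOperator(
            task_id="load_to_cloud_sql",
            python_callable=load_to_cloud_sql,
            op_kwargs=kwargs,
        )
        fetch >> load  # load only runs after a successful download
    return dag
```

In a DAG file, each cadence would then be registered at module level, for example with `globals()[f"sftp_ingest_{c}"] = build_dag(c)` for every `c` in `SCHEDULES`, so the scheduler discovers three DAGs from a single definition. Keeping the schedule table and path helper free of Airflow imports makes them straightforward to test in isolation.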
Results
- 100% automated runs, removing manual steps.
- Stronger security for credential management.
- Scalable, reliable infrastructure on GCP.
- Fewer operational errors and less rework.
- Better visibility via Airflow.
- More consistent, predictable ingestion.
- A solid foundation for future platform evolution.