• Developed and maintained highly scalable ETL pipelines using Databricks and dbt, ensuring data integrity and seamless integration across multiple data sources.
• Designed and optimized data workflows using Apache Spark on Databricks, cutting processing times by 30% and improving overall pipeline efficiency.
• Orchestrated complex workflows using Apache Airflow to automate daily pipeline operations, increasing system reliability by 25% and minimizing system downtime.
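The Airflow orchestration described above could look roughly like the following; this is a minimal sketch, and the DAG name, task names, and commands are all hypothetical (assuming Airflow 2.x with the Bash operator):

```python
# Minimal sketch of a daily orchestration DAG; names and commands are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_etl_pipeline",           # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                     # run once per day
    catchup=False,
    default_args={
        "retries": 2,                      # retry transient failures to reduce downtime
        "retry_delay": timedelta(minutes=5),
    },
) as dag:
    extract = BashOperator(task_id="extract", bash_command="python extract.py")
    transform = BashOperator(task_id="transform", bash_command="dbt run")
    validate = BashOperator(task_id="validate", bash_command="dbt test")

    # Linear dependency chain: extract, then transform, then validate.
    extract >> transform >> validate
```

Retries with a delay are one common way to absorb transient source-system failures without manual intervention.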
Key aspects of the pipelines:
• Utilized dbt macros to standardize transformations and ensure code reusability.
• Implemented Liquid Clustering to optimize query performance, particularly on large tables.
• Applied partitioning strategies for faster data retrieval and processing.
• Leveraged Delta Lake for data versioning and for ensuring data consistency at scale.
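A dbt macro of the kind mentioned above might look like this; it is a minimal sketch, and the macro name, column name, and model name are hypothetical:

```sql
-- macros/normalize_id.sql: hypothetical reusable transformation
{% macro normalize_id(column_name) %}
    lower(trim({{ column_name }}))
{% endmacro %}

-- usage in a model, e.g. models/customers.sql
select
    {{ normalize_id('customer_id') }} as customer_id,
    created_at
from {{ source('raw', 'customers') }}
```

Centralizing a transformation in a macro means every model applies it identically, which is what makes the code reusable and the results consistent.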
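The Liquid Clustering, partitioning, and Delta Lake versioning techniques above can be sketched together as follows; this assumes a Databricks/Spark environment, and all table and column names are hypothetical (note that Databricks does not allow combining `CLUSTER BY` and `PARTITIONED BY` on the same table, so the two techniques are shown on separate tables):

```python
# Sketch of Liquid Clustering, partitioning, and Delta versioning on Databricks.
# Table and column names are hypothetical; requires a running Spark/Databricks session.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Liquid Clustering: cluster a large Delta table by common filter columns.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales.events (
        event_id    STRING,
        event_date  DATE,
        customer_id STRING
    )
    USING DELTA
    CLUSTER BY (event_date, customer_id)
""")

# Partitioning: for tables not using Liquid Clustering,
# partition by a low-cardinality key to prune files at read time.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales.daily_summary (
        event_date DATE,
        revenue    DOUBLE
    )
    USING DELTA
    PARTITIONED BY (event_date)
""")

# Delta Lake versioning: time travel to an earlier snapshot of the table.
previous = spark.read.option("versionAsOf", 0).table("sales.events")
```

Time travel via `versionAsOf` is what makes Delta useful for reproducing historical pipeline runs and auditing changes.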