• Overhauled the team’s user activity report generation and maintenance process, built and maintained backend pipelines and packages, and identified and resolved 20+ tickets.
• Refactored the dedicated report generation Spark jobs into a single overarching Spark job that produces all reports, with vendor requirements supplied as JSON configuration files.
• Identified and resolved 5+ production-breaking discrepancies.
• Condensed 7,000 lines of code to 1,500 using Scala, Spark, Python, and Airflow, reducing system complexity by over 75%.
• Shortened the team’s new-report onboarding time from over 20 days to just under 3 days.
• Improved job runtime and decreased cost by 20% by migrating to Spark 3.4.0 and AWS EMR 6.12.