Experienced Software Engineer with expertise in architecting and implementing real-time and batch data processing pipelines. Proven ability to design and scale data-intensive distributed applications, and enable analytics to drive business growth.
Experience
2023 — Now
London, England, United Kingdom
2022 — 2023
Singapore
• Scaled the in-house Customer Profiling product for the largest telecom company in the Philippines. The system now processes 600,000+ events per second in streaming mode and 25+ terabytes of data daily in batch mode, enabling comprehensive customer analysis and profiling at that scale.
• Designed and led the implementation of Customer 360 for one of the first digital banks in the Philippines. Onboarded 100+ banking domain profile attributes for comprehensive customer analysis.
• Collaborated with cross-functional teams to architect and implement robust, real-time data processing pipelines using AWS, Flink, and Kafka to consume banking transactions for one of the first digital banks in the Philippines.
• Designed the roadmap for Customer Profiling V2, with support for Spark Structured Streaming, language-agnostic transformations, and containerized execution.
• Led the design and roadmap of a data observability product for the internal platform, planned to feature a data catalog, data lineage, intelligent attribute onboarding hints, and platform monitoring. These capabilities aim to reduce the effort needed during product outages and to improve overall data management efficiency.
2019 — 2022
Singapore
• Authored Cadenz Profiles, a dynamic customer profiling product with both batch and real-time capabilities that enables automated, intelligent marketing and service decisions. It raised the onboarding rate for profile attributes to 10 attributes per day per engineer.
• Authored a set of transformation modules, data sources, and data sinks that plug into Cadenz Profiles and connect it to various object stores, databases, and other data stores, simplifying integration with different storage systems.
• Led the implementation of an Early Warning System for a major bank in India, built on our in-house Customer 360 product. The system flagged suspicious transactions and collected user feedback, resulting in a notable reduction in duplicate loans obtained by individuals through multiple subsidiaries.
• Implemented a framework that mirrors Kinesis events to a Kerberized Kafka cluster, enriching them with additional metadata and ensuring exactly-once processing. This integrates the two systems while maintaining data integrity and reliability.
• Supervised a team that reimplemented and revamped a 7-year-old analytics platform on Spark and AWS.
• Developed a config-driven, generic framework on Apache Spark that supports customizable, pluggable SQL-based data transformations, speeding up attribute onboarding.
2016 — 2019
Chennai, Tamil Nadu, India
• Implemented an AWS-based Data Lake, utilizing event-based EMR job creation and data processing on Spark. Implemented role-based security for table and column-level access control, ensuring data protection and privacy.
• Collaborated with a team to build an intelligent decision support system that draws insights from historical trends in tenders, bids, competition, and other factors to give a competitive advantage in pursuing government and public-sector opportunities. Designed the complete architecture of the data-gathering stage, including the implementation of a crawling and parsing engine.
• Focused on performance optimization of Hive queries on the Azure platform. Conducted comparative analysis between HDInsight offerings and Azure Data Lake Analytics, exploring different file formats and compression techniques for improved efficiency.
• Collaborated with a team to add new features and tests to generic MapReduce-based ingestion and extraction frameworks, and Kerberized both frameworks to run on a Kerberized cluster.
Education
National Institute of Technology Nagaland