I'm a Software Engineer specializing in distributed systems and backend engineering, currently building real-time data processing platforms at ZoomInfo that handle 1M+ events daily across a 145M-entity database.
Experience
2024 — Now
San Mateo, California, United States
• Architected and developed microservices for a real-time entity resolution platform processing 1M+ Kafka events daily across a 145M-company database, implementing event-driven data pipelines with Spring Boot that orchestrate data ingestion, profiling, scoring, and persistence to BigQuery and Solr.
• Designed and implemented a pluggable profiling engine with multiple specialized profilers, enabling extensible normalization and scoring logic for company attributes across 150+ data sources.
• Contributed backend features across 8 production repositories serving multiple engineering teams, ensuring code quality through comprehensive testing and cross-team code reviews.
• Improved pipeline validation speed by 50% by building an Airflow-orchestrated validation framework in Scala that processes large-scale Parquet/ORC datasets from S3, queries Snowflake for validation rules, and sends Slack alerts for real-time anomaly detection across ETL workflows.
• Reduced production incident investigation time by 40% by designing and implementing a distributed traceability system in Scala that captures data lineage, scoring algorithms, and decision rationale across 150+ data sources; architected a CDC pipeline using Apache Hudi that writes Parquet files to S3, orchestrated via Airflow.
2023 — 2024
Chicago, Illinois, United States
• Built a scalable image processing pipeline using Apache Beam on AWS EMR to process millions of images from S3, designing the schema and ETL workflows to populate a metadata warehouse in Amazon Redshift
• Evaluated and prototyped zero-shot object detection models and presented technical findings to 20+ engineers; developed a PyTorch proof of concept demonstrating production viability for computer vision use cases
• Developed a fine-tuning pipeline for the Pix2Struct vision transformer, achieving 93% precision on specialized document detection tasks (license plates, odometers, VINs), and implemented its training infrastructure and evaluation frameworks
2021 — 2022
Mumbai, Maharashtra, India
• Developed 20+ production ETL pipelines integrating vendor APIs with Salesforce and SAP using Airflow, Python, Docker, and PySpark, supporting a customer retention initiative that improved retention by 30%
• Improved system reliability by 50% by building Airflow-based monitoring for 50+ GCS storage locations, with an automated alerting system that sends customized email notifications to stakeholders
• Designed and implemented SQL-based analytics pipelines on Snowflake using dbt for data transformations, enabling sales KPI reporting for the Indian market in collaboration with senior leadership
2019 — 2021
Mumbai, Maharashtra, India
• Architected and built ELT pipelines from Salesforce to BigQuery using Google Cloud Dataflow, designing data models and implementing a data quality validation framework to support customer targeting systems (25% accuracy improvement)
• Optimized large-scale aggregation of terabytes of user activity data in BigQuery, improving data pipeline efficiency by 40% through query optimization and partitioning strategies
• Developed a Python-based image deduplication system for binary-encoded property images, implementing similarity detection algorithms and hierarchical batching to improve database quality by 50%
• Built end-to-end ML pipeline automation using Airflow, Python, Docker, and PySpark, orchestrating data migration, feature engineering, model prediction, and retraining workflows, reducing manual intervention by 66%
• Implemented serverless data processing using Google Cloud Functions to extract daily prediction results from BigQuery and export them to GCS, accelerating downstream system consumption by 50%
Education
North Carolina State University
Master's degree
University of Mumbai