New York, NY
Architected a Spark-based stream processing system that ingests AWS CloudTrail audit logs from Kafka, computes precise batch-hour offsets to minimize data crossing hour boundaries, and loads the results into our in-house data lakehouse.
Led the Azure Activity Logs ingestion modernization, replacing a legacy batch pipeline with a real-time Kafka-based streaming architecture that enables Spark summarization, significantly reducing Snowflake dependency and storage costs.
Engineered a distributed metadata service that resolves event-ordering challenges in high-throughput streaming pipelines, enabling Spark SQL to compute accurate batch windows for summarization; it is now a foundational component for every data pipeline migrating from Snowflake to Spark, delivering significant cost and performance improvements.
Resolved critical data integrity failures in a high-volume ingestion pipeline by implementing fault-tolerant chunked writes with checkpointing; independently identified and fixed a long-standing partial-read bug through a custom record-level parser.
Designed Kafka consumer offset bootstrapping infrastructure for new customer onboarding, enabling accurate event replay boundaries and unblocking downstream pipeline deployments at scale.
Drove the migration of Kafka clusters to Kubernetes, enabling self-service platform management; engineered a cohort-based topic migration strategy with data-volume-aware segmentation, custom tooling, and observability metrics to ensure a zero-disruption rollout.
Led the end-to-end modernization of the Amazon Elastic Kubernetes Service (EKS) audit log pipeline to deprecate Snowflake: designed virtual topic epoch switching, full task lifecycle orchestration (startup, shutdown, config-driven restarts), and reliable chunked Kafka writes.
2024 — 2025
San Francisco Bay Area
2023 — 2025
Mountain View, California, United States
Led a complex GCP infrastructure migration spanning multiple backend components for over 300 customers, using an automated approach that ensured zero data loss and cut migration time 5x.
Integrated Terraform into the migration strategy to handle diverse use cases, prevent configuration corruption, and ensure deployment consistency, reducing manual intervention by 50%.
Developed new CLI capabilities in Go backed by new API endpoints, and modified existing Java ingestion services to increase ingestion efficiency.
Leveraged Google Cloud Pub/Sub-based audit logs to double system speed and improve availability and scalability. Enhanced the Java data ingestion architecture to support advanced security features, reducing alert response time from hours to seconds.
Designed and implemented an Azure Active Directory events processor in Rust that reads from customers’ Azure Event Hubs, enhancing data integration capabilities.
Built a watchdog for the Snowflake connection pool in the pipeline's Rust backend component, using thread channels to validate connections and restarting the affected K8s pods whenever the ODBC driver connection hangs; this kept data flowing continuously and eliminated on-call intervention.
Designed and implemented a config poller in Java to ensure consistency between the database and the configuration store, addressing a critical reliability issue.
Developed a stateful retry mechanism in Java to handle AWS key deletion issues, improving system resilience and halving on-call incidents.
Helped develop automated tests for our GCPv2 ingestion service, increasing test coverage to over 80%.
Fixed several data races in a key Java component, stabilizing log ingestion for integrations and improving system stability by 60%.
Worked regularly with SQL, Snowflake, and Kafka, as well as AWS and GCP services (IAM, EC2, S3, RDS, Pub/Sub), to build robust systems.
2022
San Jose, California, United States
Implemented a Kubernetes-based Event Driven Autoscaling system for an internal service to scale K8s pods automatically based on the number of replicas required, reducing maintenance overhead.
Built a Kafka pipeline with Redpanda to move an existing ingestion pipeline from AWS Kinesis to Kafka, reducing overall costs and making data ingestion cloud-agnostic.
Wrote unit tests, fixed bugs, and thoroughly monitored K8s logs before shipping code.
Deployed code to production across all regions, including US, EU, and AU.
Education
Purdue University
Bachelor of Science - BS
Tagore International School (Vasant Vihar)