Interested in streaming and batch processing.
Sunnyvale, California, United States
Project: AI powered Data Understanding
Building ADK agents and A2A for Data Governance.
Using LLMs and supervised learning to label data and infra assets across Google to enable Data Governance.
Used few-shot examples and prompt engineering to build LLM annotators for labelling code and policy artifacts.
Worked with expert human labelers to create golden datasets and Evals.
Researching Agentic Labelling flows using Gemini Reasoning models and internal MCP alternatives.
Project: AI Chatbot
Created an AI chatbot to answer customer queries using Retrieval Augmented Generation on top of the Google Gemini LLM.
Built the knowledge base pipelines for RAG by ingesting documentation, code, tickets and incidents into an embeddings index.
Built a custom team benchmark to generate a numeric eval score. Rated a few permutations of model, knowledge corpus and context window size to find the best-scoring configuration.
2018 — 2024
Sunnyvale, California, United States
Tech lead for a data engine primarily used for embeddings pipelines.
Built from the ground up in C++ with a custom self-service control plane. We offer real-time, micro-batch and batch processing for Gmail, Google Photos, Gpay and Location, and enable processing of over 700 trillion rows and over an exabyte of data.
General Responsibilities
Acted as Tech Lead for 4 devs. Scoped out projects by working with Product and Clients. Broke down projects into incremental milestones. Worked with developers to execute and land milestones.
Owned Client engagement as an Infra Tech Lead. Led Client Office hours meetings to review designs and bugs.
Project: Batch Processing Service
Implemented a next-gen batch processing scheduler using topological ordering, reducing CPU cost by up to 50%.
Designed detailed cost attribution and processing filters so clients could understand and reduce expenses.
Implemented UIs, load shedding, isolation and automation to allow clients to operate in a self-service manner.
Project: Embeddings pipelines for Google Photos
Scaled our batch processing system from 5 million to over 10 million QPS to enable the launch of the Photos Memories feature.
Indexed 250 Trillion rows for the Photos app search feature.
Designed aggregation pipelines for Gpay to reduce dashboard freshness latency from 2 days to 5 minutes.
Miscellaneous Projects
Improved stream processing SLO from 99% to 99.7% by improving load distribution while maintaining high utilization.
Evolved our processing system to support multiple namespaces by sharding each layer in our distributed monolith.
Onboarded Gpay as a client. They used the sharded system I built for data aggregation and executive dashboards.
San Francisco Bay Area
Developed a fraud detection engine using Java 8, Spring MVC and Cassandra (chosen because the workload required high write throughput)
Built React components for our Merchant Configuration portal. Developed a REST API on a Node + Express backend
Improved maintainability by implementing Behavior Driven Development using Mocha, Expect and Rewire
Technologies: React | ES6 | NodeJs | Express | Mocha | Java | Spring Boot | Cassandra | AWS (RDS, EC2)
San Francisco Bay Area
Sped up Cassandra driver installation by 85% by delivering prebuilt binaries for all platforms via the Python Package Index
Worked on a framework to automatically create Jenkins jobs by parsing GitHub repos. Set up builds for the Python and C++ drivers.
Technologies: Docker | Python | Git | Jenkins | Shell Scripting
2013 — 2016
Bangalore
Developed a data pipeline using Kafka and Cassandra to store aggregated data
Designed a JavaScript framework for embedding chat clients that decreased client onboarding time by 70%
Implemented a JAX-RS RESTful API using Jersey that served the user roles used to render UI components
Technologies: Java | Jersey | JPA | MySQL | Kafka | Cassandra | ES5 | Jasmine | BackboneJs | Grunt
Education
2007 — 2011
Indian Institute of Technology, Bombay
Bachelor's degree
2016 — 2018
San José State University
Master’s degree