I am a software engineer at LinkedIn. I have a BS and MS in Computer Science from the Electrical Engineering and Computer Science (EECS) department at the University of California, Berkeley. I have experience in data and ML infrastructure engineering.
• Owned and optimized the performance of the Abacus external data connector suite, which supports all major cloud provider storage offerings and SaaS database services such as Snowflake, Databricks, and BigQuery
• Increased throughput, decreased memory footprint, and enabled parallel scaling of the distributed batched inference service
• Designed and implemented asynchronous LLM APIs / bots that allow multiple users to have ChatGPT-like group conversations with Abacus LLMs in communication platforms such as Slack and Microsoft Teams
• Created PySpark-based incremental datasets from database services for ETL into the product
• Built periodic full execution of cloud-hosted Jupyter notebooks for data science, using scripts running inside Kubernetes deployments
• Implemented functionality to create / update custom ML models using customer-provided Python code
• Worked on the RISELab Cloudburst project with Professor Joseph Hellerstein
• Deployed pre-trained ML models on the Cloudburst serverless framework to provide autoscaling resource allocation for inference functions
• Built a space-efficient algorithm for asynchronously aggregating and reporting keys with high and low access frequencies across distributed server nodes in Cloudburst
• Worked on the Ground project (www.ground-context.org) with graduate student Vikram Sreekanti, in Professor Joseph Hellerstein's group
• Added support for non-primitive datatype inputs for Ground
• Created a Ground implementation backed by Git repositories, with functionality for constructing a time-efficient RESTful API for accessing Ground objects
• Created an automated health report generator to monitor CPU and memory usage of Apache Flink pipelines run by the Uber AthenaX streaming analytics platform
• Implemented a Python-based metric scraping framework for analyzing and storing time-series data for AthenaX
• Analyzed the resource usage efficiency of different types of AthenaX pipelines