SWE at Celonis | MS CS grad at Columbia University
I'm interested in distributed databases and machine learning, and have around 3 years of experience building high-volume data systems that serve ML and data science applications.
Reduced API latency by 40% by optimizing SQL queries, and identified pagination changes that yielded a further 95% improvement.
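The resume doesn't say which pagination change was made; a common one with this kind of payoff is replacing offset pagination with keyset (seek) pagination. The table and column names below are purely illustrative:

```python
# Hypothetical illustration of an offset-to-keyset pagination change;
# the schema and queries are invented for this sketch.

# Offset pagination: the database scans and discards `offset` rows,
# so deep pages get progressively slower.
OFFSET_PAGE = """
SELECT id, payload FROM events
ORDER BY id
LIMIT %(page_size)s OFFSET %(offset)s
"""

# Keyset pagination: resume from the last id seen, which an index on
# `id` can satisfy directly no matter how deep the page is.
KEYSET_PAGE = """
SELECT id, payload FROM events
WHERE id > %(last_seen_id)s
ORDER BY id
LIMIT %(page_size)s
"""

def next_page_params(last_seen_id: int, page_size: int = 100) -> dict:
    """Build bind parameters for the keyset query."""
    return {"last_seen_id": last_seen_id, "page_size": page_size}
```

The keyset form turns an O(offset) scan into an index seek, which is where order-of-magnitude improvements on deep pages typically come from.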
• Profiled production Kubernetes pod usage patterns based on CPU and memory metrics and identified a 15% cost-reduction opportunity.
• Designed and implemented an error framework that differentiates between user and system errors, reducing alert noise and ultimately improving developer productivity.
Architected and built a Dynamic Error Classification System.
• Dynamically categorizes any error in the system with a readable error message to improve UX, and controls retry behaviour based on the error type.
• Decreased the time to deploy an error-classification change from multiple hours to 1 minute.
• Removed the dependency on engineers and code changes to classify errors; Product Managers and Support staff can now handle errors themselves.
• Reduced errors displayed to users by 50% for specific sources.
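One way such a system can work, consistent with the points above, is config-driven classification rules that non-engineers can edit. Everything here (rule patterns, fields, retry policies) is an illustrative sketch, not the actual design:

```python
import re
from dataclasses import dataclass

# Hypothetical sketch of a config-driven error classifier; rule names,
# fields, and retry policies are illustrative only.

@dataclass
class Classification:
    category: str        # "USER" or "SYSTEM"
    user_message: str    # readable message shown in the UI
    retryable: bool      # drives retry behaviour

# Rules live in config (editable by PMs/support), not in code, so a
# classification change is a config deploy rather than a code release.
RULES = [
    (re.compile(r"permission denied|invalid credentials", re.I),
     Classification("USER", "Please check your connection credentials.", False)),
    (re.compile(r"timeout|connection reset", re.I),
     Classification("SYSTEM", "A temporary issue occurred; retrying automatically.", True)),
]

DEFAULT = Classification("SYSTEM", "Something went wrong; our team has been notified.", True)

def classify(error_text: str) -> Classification:
    """Return the first matching classification, else a safe default."""
    for pattern, classification in RULES:
        if pattern.search(error_text):
            return classification
    return DEFAULT
```

Because the rule table is data rather than code, updating a message or retry policy takes minutes and requires no engineer.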
Designed and implemented a feature in our job scheduler (Handyman) to automatically schedule jobs based on resource needs across machines with different hardware resources (RAM, disk storage).
• Implemented mainly to support ingestion jobs that download multi-GB files: these jobs are automatically scheduled on nodes with large disk storage, and reruns are pinned to the same node so they can continue ingesting the same file without re-downloading it.
Built a Destination Cost Recommendation Framework.
• Automatically collects metadata statistics about the data warehouses used with Hevo and stores them in a data lake.
• Automatically analyzes these statistics and recommends ways for users to reduce the cost of using their warehouse with Hevo.
Improved ingestion rate by 8x for Google Analytics Connector by sampling data volume and intelligently distributing workload across parallel jobs.
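Distributing sampled volume across parallel jobs can be sketched as a balancing problem. The greedy longest-processing-time heuristic below is an assumed approach for illustration, not the connector's actual algorithm:

```python
import heapq

# Hypothetical sketch of volume-aware work distribution: sample the row
# count per day, then greedily assign days to parallel jobs so each job
# gets a roughly equal share of total volume.

def distribute(day_volumes, n_jobs):
    """day_volumes: {day: sampled_row_count}. Returns n_jobs lists of
    days, balanced by sampled volume (greedy LPT heuristic)."""
    heap = [(0, i) for i in range(n_jobs)]  # (assigned volume, job index)
    heapq.heapify(heap)
    assignments = [[] for _ in range(n_jobs)]
    # Largest days first, always onto the currently lightest job.
    for day, volume in sorted(day_volumes.items(), key=lambda kv: -kv[1]):
        load, idx = heapq.heappop(heap)
        assignments[idx].append(day)
        heapq.heappush(heap, (load + volume, idx))
    return assignments
```

Balancing by sampled volume rather than by date count is what prevents one job from being stuck with all the heavy days.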
Designed and implemented an autonomous and robust integration with Kafka as a source.
• Designed to scale out when it detects a high data volume at the source and scale back in to save costs; uses linear regression and source data retention thresholds to decide when to expand capacity to accommodate the extra data.
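The scale-out decision can be sketched as fitting a linear trend to recent consumer-lag samples and projecting it against the retention horizon. The sampling shape, thresholds, and formula below are illustrative assumptions:

```python
# Hypothetical sketch: fit a least-squares line to recent lag samples
# and scale out if the projected lag at the retention horizon exceeds
# what the current consumers can safely drain.

def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for y ~ a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def should_scale_out(lag_samples, retention_secs, max_safe_lag):
    """lag_samples: list of (t_secs, lag_msgs) pairs, at distinct times."""
    xs = [t for t, _ in lag_samples]
    ys = [lag for _, lag in lag_samples]
    slope, intercept = linear_fit(xs, ys)
    projected = slope * (xs[-1] + retention_secs) + intercept
    return projected > max_safe_lag
```

Projecting against the retention window matters because data a consumer falls too far behind on is deleted by the broker; scaling out before that point is what makes the integration autonomous.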
Integrated Firebolt as a Destination.
• Worked through ambiguous requirements and early-stage documentation to deliver Firebolt on time, relying on library greps and debugging tools.
• Delivered the Firebolt integration first in the market, giving Hevo a competitive advantage and exclusive partnership deals.
• Added new features such as Parquet support and new key types in our Mapping component.
Optimized the sidelined-events flow.
• Reduced the time for each sidelined event to become visible to users from 5+ minutes to 1 minute (at least a 5x improvement).
• Added visibility so users can understand the state of their events as soon as possible.