Software Engineer @ Nvidia ♦ ex-Amazon Robotics Data Science intern
I am a Software Engineer with a strong Machine Learning and Data Science background. At Shoreline.io, I design and implement semantic search solutions for the website and the remediation runbook catalog using large language models and embedding techniques.
• Designed and applied four comprehensive GPT prompt-based test metrics to assess search results and embedding models
• Evaluated embedding models (Ada, Multiqa); chunked and embedded runbooks into FAISS and SQLite databases using LangChain
• Established a Git workflow for automated embedding file generation and Docker image creation for deployment
• Implemented a responsive search endpoint for 800 runbooks with associated commands, achieving a 500 ms response time
• Developed a server-specific custom search index workflow that lets users search their private runbooks, using an optimistic concurrency control approach, caching with S3 storage, and Redis background workers
• Mentored and supervised two interns through onboarding, collaborating to successfully launch the search endpoint in production
End-to-end data pipelines for ticketing tools (PagerDuty, Opsgenie, ServiceNow)
• Created and configured multiple Airbyte connectors (TypeScript) and data streaming scripts for data extraction, transformation, daily synchronization, and loading into PostgreSQL, providing users with near-instant data access
• Unified the schemas of diverse data sources and migrated 14,000 tables using Flask and SQLAlchemy, improving codebase clarity
• Implemented interactive UI data visualization reports (React.js), transforming backend SQL queries into filter-enabled graphs
• Enabled runbook generation using GPT-3.5 and prompting techniques to auto-create generative AI incident remediation
• Designed and implemented a data preprocessing and modeling pipeline using AWS SageMaker for dynamic chute-mapping updates on the sortation floor, replacing an expensive runtime simulator
• Embedded high-dimensional destination distribution data with station spatial locations into multi-channel tensors and scatter images with aggregation to extract floor information; built multi-layer perceptron (MLP) surrogate models to predict package throughput evaluation metrics, improving on the baseline by 28.8% in MAE and 0.527 in R2
• Compared the performance of PCA-reduced linear regression, MLP, CNN, and pretrained networks; interpreted features that measure floor congestion; and developed recommendations for further modeling improvements
Co-lead of the FairVision Cam2 subteam, using Generative Adversarial Networks (GANs) to remove dataset biases from the training data of image recognition and classification algorithms.