Experience
2025 — Now
• Worked with stakeholders to define real agricultural problems and translate them into clear use cases.
• Collected and cleaned data from FAO, USDA, and research sources to create reliable training datasets.
• Designed the data pipeline for ingestion, validation, and versioning so models are always trained on trusted data.
• Trained and fine-tuned domain LLMs, optimizing accuracy, cost, and latency for production-ready deployment.
• Mentored a graduate student in data preprocessing, pipeline design, and model evaluation, speeding up delivery and knowledge transfer.
2024 — 2025
• Built an XGBoost model to predict federal contract bids, improving forecast accuracy by 20% and guiding pricing decisions.
• Engineered and validated interpretable features to help pricing teams understand and justify model predictions.
• Developed PySpark ETL pipelines for automated ingestion, reducing data latency by 60%.
• Tuned hyperparameters and compared XGBoost with baseline models to justify the final model choice.
2024 — 2025
Boulder, Colorado, United States
Project 1: Researchers were spending hours reviewing speech data and segmenting trials. The goal was to build a Streamlit dashboard powered by IBM Watson to segment trials and reduce review time.
• Developed a Streamlit dashboard with IBM Watson Speech-to-Text to transcribe and segment 100+ learning trials, enabling faster identification of spoken number responses and reducing manual review time by an estimated 60%.
Project 2: The goal was to train an attention-based LSTM to convert between numbers, words, and visual blocks, helping compare how children learn numbers with how well a model can mimic that learning.
• Provided data-driven insights to advance the study of human cognition and the development, education, and learning of children, leading to publications.
• Trained an attention-based LSTM model to translate between Arabic numerals, number words, and visual blocks under three conditions: (1) numerals + words, (2) blocks + words, and (3) all three combined.
Project 3: The goal was to build a data analysis pipeline to study how colors versus stickers influence how quickly new learners pick up typing on a keyboard.
• Engineered data workflows for trial segmentation and gaze metric extraction (fixation area, duration, switching), resulting in high-quality derived datasets for downstream statistical modeling.
• Boosted frame-level classification accuracy of egocentric video data by 30% using a Vision Transformer, enabling precise AOI tracking.
2020 — 2023
Project 1: The goal was to develop backfill pipelines to process late-arriving or missing subscriber data and identify the source of latency, ensuring accurate and timely campaign execution.
• Extracted late-arriving records from upstream systems, then processed and analyzed their payloads to categorize errors by type, enabling targeted fixes.
• Conducted multi-day analysis to identify patterns and error sources, informing preventive measures and improving pipeline reliability.
Project 2: Developed a data pipeline to generate clean, reliable datasets for weekly leadership reporting and business analysis.
• Built an ETL pipeline to extract data from PostgreSQL tables and aggregate and transform it weekly, ensuring accurate numbers for senior leadership's reporting to stakeholders.
• Optimized pipeline performance to efficiently process growing historical data in HDFS, improving processing speed and resource utilization while maintaining data accuracy.
Project 3: Built a PySpark ETL job for automated monitoring that checks data is processed reliably each day and raises alerts when it is not.
• Developed PySpark transformations to calculate record counts and other metrics, ensuring daily data consistency and correctness across inputs and outputs.
• Reduced manual monitoring efforts and improved data reliability by 45%, enabling faster detection of issues and minimizing potential disruptions.
2020 — 2020
Kochi, Kerala, India
Improved epilepsy surgery planning by identifying seizure-onset zones from SEEG data and visualizing neural patterns to inform clinical decisions.
• Developed statistical and ML models to localize seizure onset zones from SEEG recordings.
• Built a clinical visualization tool that maps seizure-origin nodes to support surgical decisions, and authored a peer-reviewed journal publication in the UK.
Education
University of Colorado Boulder
Master of Science - MS
Amrita Vishwa Vidyapeetham