# Sasi Bonu

> AI/ML Engineer @ Network Theory Applied Research Institute | Ex-Wipro | 5 Years in AI, NLP & Large-Scale ETL Systems | Actively looking for new opportunities.

Location: United States
Profile: https://flows.cv/sasibonu

AI/ML Engineer | Data Science Specialist | Research-Driven Problem Solver

I’m Sasi Jyothirmai Bonu, an AI/ML Engineer with 5+ years of experience building machine learning and data science solutions across research, telecom, and agriculture. I enjoy taking real-world data and turning it into models and pipelines that actually help teams make better decisions and move faster.

My work spans end-to-end ML systems, from data ingestion and feature engineering to model training, evaluation, and deployment. I’ve built production ETL pipelines, trained predictive models for pricing and decision support, and developed research-grade ML systems that led to peer-reviewed publications. I’m especially excited about applied multimodal ML and building tools that bridge research and real-world impact.

Key Achievements:
• Built ML models (XGBoost, LSTM, Vision Transformers) for pricing prediction, learning research, and vision-based tracking, improving accuracy by up to 30%
• Developed PySpark ETL pipelines that cut data latency by 60% and improved data reliability by 45%
• Created ML-powered dashboards to automate speech segmentation and trial analysis, reducing manual review time by ~60%
• Published research on medical ML and brain signal analysis, with multiple ongoing academic collaborations

Skills & Tools:
Languages & Data: Python, SQL, PySpark, R
ML/AI: NLP, XGBoost, LSTM (attention), Vision Transformers, supervised & unsupervised learning
Data Engineering: ETL pipelines, HDFS, PostgreSQL, MySQL
Visualization & Apps: Streamlit, Tableau
Domains: Telecom, Research, Agriculture

Let's connect!
I’m actively seeking ML Engineer / Applied Scientist / Data Engineer roles where I can build impactful ML systems at scale. 
Email: sasibonu98@gmail.com
Phone: +1 (720) 341-3315

## Work Experience
### AI/ML Engineer @ Network Theory Applied Research Institute
Jan 2025 – Present
- Worked with stakeholders to define real agricultural problems and translate them into clear use cases.
- Collected and cleaned data from FAO, USDA, and research sources to create reliable training datasets.
- Designed the data pipeline for ingestion, validation, and versioning so models are always trained on trusted data.
- Trained and fine-tuned domain LLMs, optimizing accuracy, cost, and latency for production-ready deployment.
- Mentored a graduate student in data preprocessing, pipeline design, and model evaluation, speeding up delivery and knowledge transfer.

### AI/ML Engineer @ SAMstream
Jan 2024 – Jan 2025
- Built an XGBoost model to predict federal contract bids, improving forecast accuracy by 20% and guiding pricing decisions.
- Engineered and validated interpretable features to help pricing teams understand and justify model predictions.
- Developed PySpark ETL pipelines for automated ingestion, reducing data latency by 60%.
- Tuned hyperparameters and compared XGBoost with baseline models to justify the final model choice.

### Data Scientist @ DEL Lab
Jan 2024 – Jan 2025 | Boulder, Colorado, United States
Project 1: Researchers were spending hours reviewing speech data and segmenting trials. The goal was to build a Streamlit dashboard powered by IBM Watson to segment trials and reduce review time.
- Developed a Streamlit dashboard with IBM Watson Speech-to-Text to transcribe and segment 100+ learning trials, enabling faster identification of spoken number responses and reducing manual review time by an estimated 60%.

Project 2: The goal was to train an attention-based LSTM to convert between numbers, words, and visual blocks, helping compare how children learn numbers with how well a model can mimic that learning.
- Provided data-driven insights into human cognition and children's development, education, and learning, leading to publications.
- Trained an attention-based LSTM model to translate between Arabic numerals, number words, and visual blocks under three conditions: (1) numerals + words, (2) blocks + words, and (3) all three combined.

Project 3: The goal was to build a data analysis pipeline to study how colors versus stickers influence how quickly new learners pick up typing on a keyboard.
- Engineered data workflows for trial segmentation and gaze metric extraction (fixation area, duration, switching), resulting in high-quality derived datasets for downstream statistical modeling.
- Boosted frame-level classification accuracy of egocentric video data by 30% using a Vision Transformer, enabling precise AOI tracking.

### Data Engineer (Project Engineer): Data Analytics & AI @ Wipro Limited
Jan 2020 – Jan 2023
Project 1: The goal was to develop backfill pipelines to process late-arriving or missing subscriber data and identify the source of latency, ensuring accurate and timely campaign execution.
- Extracted late-arriving records from upstream systems, then processed and analyzed their payloads to categorize errors by type, enabling targeted fixes.
- Conducted multi-day analysis to identify patterns and error sources, informing preventive measures and improving pipeline reliability.


Project 2: Developed a data pipeline to generate clean, reliable datasets for weekly leadership reporting and business analysis.
- Built an ETL pipeline to extract data from PostgreSQL tables and aggregate and transform it weekly, ensuring accurate numbers in senior leadership's reports to stakeholders.
- Optimized pipeline performance to efficiently process growing historical data in HDFS, improving processing speed and resource utilization while maintaining data accuracy.


Project 3: Built a PySpark ETL job for automated monitoring that checks daily data processing for reliability and raises alerts when a check fails.
- Developed PySpark transformations to calculate record counts and other metrics, ensuring daily data consistency and correctness across inputs and outputs.
- Reduced manual monitoring efforts and improved data reliability by 45%, enabling faster detection of issues and minimizing potential disruptions.

### Research Assistant @ Amrita Institute of Medical Sciences and Research Centre
Jan 2020 – Jan 2020 | Kochi, Kerala, India
Improved epilepsy surgery planning by identifying seizure-onset zones from SEEG data and visualizing neural patterns to inform clinical decisions.
- Developed statistical and ML models to localize seizure onset zones from SEEG recordings.
- Built a clinical visualization tool mapping seizure origin nodes to support surgical decisions, and authored a paper published in a peer-reviewed UK journal.


## Education
### Master of Science - MS in Data Science
University of Colorado Boulder

### Bachelor of Technology - BTech in Electronics and Communications Engineering
Amrita Vishwa Vidyapeetham


## Contact & Social
- LinkedIn: https://linkedin.com/in/sasi-bonu-98878116a
- Portfolio: https://sasibonu.github.io/

---
Source: https://flows.cv/sasibonu
JSON Resume: https://flows.cv/sasibonu/resume.json
Last updated: 2026-04-18