• Built data cleaning and feature engineering pipelines for insurance claims and policy datasets using Python, Pandas, and NumPy to support risk prediction modeling, reducing data preparation time by 30%.
• Developed insurance risk classification and claims severity prediction models using Scikit-learn and XGBoost, applying feature selection, cross-validation, and hyperparameter tuning, and using SHAP explainability to interpret underwriting risk drivers.
• Designed and trained deep learning models in TensorFlow and PyTorch to detect fraud and predict policyholder behavior, implementing custom neural network architectures and mini-batch training to improve fraud detection accuracy by 18%.
• Ran model experiments under version-controlled ML workflows using Git and GitHub, tracking experiment parameters and results to improve reproducibility of model development pipelines.
• Containerized insurance analytics and fraud detection pipelines with Docker, using MLflow for experiment tracking and scalable deployment of underwriting and claims models, supporting 25+ training experiments per month.
• Managed large-scale insurance policy and claims data pipelines using Snowflake and AWS Glue, integrating cloud warehousing with distributed processing workflows for actuarial analytics across 8M+ policy and claims records.
• Built NLP pipelines for insurance claim notes using NLTK and ONNX, enabling entity extraction for fraud detection across 120K+ documents, and prototyped federated learning for privacy-preserving models.
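The feature engineering pipeline in the first bullet could be sketched as below; this is a minimal illustration, and all column names (`claim_amount`, `claim_date`, `policy_start`, `claim_type`) are hypothetical stand-ins for the actual claims schema.

```python
import numpy as np
import pandas as pd

def engineer_claim_features(claims: pd.DataFrame) -> pd.DataFrame:
    """Clean raw claims and derive model-ready features."""
    df = claims.copy()
    # Impute missing claim amounts with the median (robust to outliers).
    df["claim_amount"] = df["claim_amount"].fillna(df["claim_amount"].median())
    # Log-transform the heavy-tailed claim amount.
    df["log_claim_amount"] = np.log1p(df["claim_amount"])
    # Policy tenure in days at the time of the claim.
    df["tenure_days"] = (df["claim_date"] - df["policy_start"]).dt.days
    # One-hot encode the claim type category.
    df = pd.get_dummies(df, columns=["claim_type"], prefix="type")
    return df

# Tiny synthetic sample to exercise the pipeline.
claims = pd.DataFrame({
    "claim_amount": [1200.0, None, 540.0],
    "claim_date": pd.to_datetime(["2023-03-01", "2023-05-10", "2023-06-15"]),
    "policy_start": pd.to_datetime(["2022-01-01", "2023-01-01", "2023-06-01"]),
    "claim_type": ["auto", "home", "auto"],
})
features = engineer_claim_features(claims)
print(features[["log_claim_amount", "tenure_days"]])
```

Keeping each derivation inside one function makes the same transformations reusable at training and scoring time, which is where most of the preparation-time saving tends to come from.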
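The cross-validation and tuning workflow in the second bullet could look like the sketch below. For a self-contained example it substitutes scikit-learn's `GradientBoostingClassifier` for XGBoost and `permutation_importance` for SHAP; the data, grid, and thresholds are all illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for underwriting features and a binary risk label.
X, y = make_classification(n_samples=400, n_features=8, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Cross-validated hyperparameter search over a small illustrative grid.
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 3]},
    cv=3,
)
search.fit(X_train, y_train)
model = search.best_estimator_

# Model-agnostic importance of each feature on held-out data,
# standing in for SHAP-style attribution of risk drivers.
imp = permutation_importance(model, X_test, y_test, n_repeats=5,
                             random_state=0)
ranked = np.argsort(imp.importances_mean)[::-1]
print("best params:", search.best_params_)
print("top features:", ranked[:3])
```

The same shape carries over directly to `xgboost.XGBClassifier` and `shap.TreeExplainer`, since both follow the scikit-learn estimator interface.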
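The mini-batch training described in the third bullet can be shown framework-agnostically; the sketch below implements a one-hidden-layer classifier in NumPy rather than TensorFlow or PyTorch, and the synthetic "fraud" data, layer sizes, and learning rate are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic fraud-like data: two features, label depends on their sum.
X = rng.normal(size=(600, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)

# One-hidden-layer network trained with mini-batch gradient descent.
W1 = rng.normal(scale=0.5, size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr, batch = 0.5, 32
for epoch in range(200):
    idx = rng.permutation(len(X))          # reshuffle each epoch
    for start in range(0, len(X), batch):
        b = idx[start:start + batch]
        xb, yb = X[b], y[b]
        # Forward pass.
        h = np.tanh(xb @ W1 + b1)
        p = sigmoid(h @ W2 + b2)
        # Backward pass for binary cross-entropy loss.
        dlogits = (p - yb) / len(xb)
        dW2 = h.T @ dlogits; db2 = dlogits.sum(0)
        dh = dlogits @ W2.T * (1 - h**2)
        dW1 = xb.T @ dh; db1 = dh.sum(0)
        W2 -= lr * dW2; b2 -= lr * db2
        W1 -= lr * dW1; b1 -= lr * db1

pred = sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2) > 0.5
acc = (pred == (y > 0.5)).mean()
print(f"training accuracy: {acc:.3f}")
```

In TensorFlow or PyTorch the same loop collapses to a `Dataset`/`DataLoader` plus an optimizer step, with autograd replacing the hand-written backward pass.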
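The reproducibility tracking in the fourth bullet could be reduced to recording each run's parameters and metrics against the Git commit that produced them. The helper below is a stdlib-only sketch; the file name, record layout, and example values are assumptions, and the commit lookup falls back gracefully outside a repository.

```python
import json
import subprocess
import tempfile
from pathlib import Path

def current_commit() -> str:
    """Return the Git commit hash, or 'unknown' outside a repo."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"

def log_experiment(path: Path, params: dict, metrics: dict) -> dict:
    """Append one experiment record to a JSON-lines log."""
    record = {"commit": current_commit(),
              "params": params, "metrics": metrics}
    with path.open("a") as f:
        f.write(json.dumps(record) + "\n")
    return record

# Example run with illustrative parameter and metric values.
log_path = Path(tempfile.gettempdir()) / "run_log.jsonl"
record = log_experiment(log_path,
                        {"max_depth": 3, "n_estimators": 100},
                        {"auc": 0.91})
print(record["params"])
```

Tying every metric to a commit hash is what makes an experiment re-runnable: `git checkout <commit>` plus the logged parameters reconstructs the run.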
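The entity extraction in the last bullet could be illustrated as below. The real pipeline used NLTK; this stdlib sketch substitutes simple regular expressions, and the entity types and patterns (amounts, dates, a `POL-`-prefixed policy number) are hypothetical examples of what a fraud reviewer might pull from claim notes.

```python
import re

# Illustrative patterns for entities worth surfacing from claim notes.
PATTERNS = {
    "amount": re.compile(r"\$\d[\d,]*(?:\.\d{2})?"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "policy_id": re.compile(r"\bPOL-\d{6}\b"),
}

def extract_entities(note: str) -> dict:
    """Return all pattern matches found in a single claim note."""
    return {name: pat.findall(note) for name, pat in PATTERNS.items()}

note = "Claimant on policy POL-123456 reported $2,400.00 damage on 2023-07-19."
entities = extract_entities(note)
print(entities)
```

An NLTK version would tokenize and POS-tag the note first (`nltk.pos_tag(nltk.word_tokenize(note))`) and chunk named entities on top, with the rule-based patterns kept for domain-specific identifiers like policy numbers.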