• Led end-to-end development of a speech deepfake detection system pairing a wav2vec audio encoder with a Mamba state-space classifier; achieved a 6.5% Equal Error Rate (EER) on an unseen dataset, reaching state-of-the-art performance and enabling financial institutions to proactively flag fraudulent calls in customer support operations.
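A minimal sketch of how an EER like the one above is typically computed from detector scores (illustrative only; the helper name and toy data are hypothetical, not the project's code):

```python
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(labels, scores):
    # EER is the operating point where the false-accept rate (fpr)
    # equals the false-reject rate (fnr = 1 - tpr).
    fpr, tpr, _ = roc_curve(labels, scores, drop_intermediate=False)
    fnr = 1.0 - tpr
    idx = int(np.nanargmin(np.abs(fnr - fpr)))
    return (fpr[idx] + fnr[idx]) / 2.0

# Toy scores: higher means "more likely deepfake" (label 1).
labels = np.array([0, 0, 0, 1, 1, 1])
scores = np.array([0.1, 0.3, 0.6, 0.4, 0.8, 0.9])
```

In practice the crossing point rarely falls exactly on a threshold, so production evaluation code usually interpolates between the two nearest ROC points rather than taking the closest one.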
• Architected and deployed a real-time speech emotion recognition system by modifying wav2vec and applying post-training regularization (batch-level variance maximization and dimension-level covariance minimization), raising six-class accuracy to 88%. Enabled quantification of customer emotional states, helping agents respond with greater empathy and adapt support to boost customer satisfaction and team professionalism.
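The two regularization terms named above (variance maximization across the batch, covariance minimization across embedding dimensions) follow a VICReg-style recipe; a minimal NumPy sketch of such penalties, with a hypothetical function name and default hyperparameters, might look like:

```python
import numpy as np

def variance_covariance_penalties(z, var_target=1.0, eps=1e-4):
    """z: (batch, dim) array of embeddings. Returns (var_penalty, cov_penalty)."""
    z = z - z.mean(axis=0)
    # Batch-level variance term: hinge loss pushing each embedding
    # dimension's standard deviation up toward var_target.
    std = np.sqrt(z.var(axis=0) + eps)
    var_penalty = np.maximum(var_target - std, 0.0).mean()
    # Dimension-level covariance term: penalize squared off-diagonal
    # entries of the covariance matrix to decorrelate dimensions.
    n, d = z.shape
    cov = (z.T @ z) / (n - 1)
    off_diag = cov - np.diag(np.diag(cov))
    cov_penalty = (off_diag ** 2).sum() / d
    return var_penalty, cov_penalty
```

Both terms are zero for embeddings whose dimensions already have unit variance and no cross-correlation, which is why they are commonly used to prevent representation collapse.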
• Built digital signal processing (DSP) data augmentation pipelines to simulate diverse recording conditions, ran Weights & Biases (W&B) hyperparameter sweeps, and deployed the system on AWS/GCP for scalable, robust performance.
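One common building block in a DSP augmentation pipeline like this is mixing in noise at a controlled signal-to-noise ratio; a self-contained sketch under that assumption (function name hypothetical, not the project's code):

```python
import numpy as np

def add_noise_at_snr(signal, snr_db, rng=None):
    # Mix white Gaussian noise into `signal` at a target SNR (in dB),
    # simulating a noisier recording condition.
    if rng is None:
        rng = np.random.default_rng(0)
    noise = rng.standard_normal(signal.shape)
    sig_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that sig_power / scaled_noise_power = 10^(snr_db/10).
    scale = np.sqrt(sig_power / (noise_power * 10 ** (snr_db / 10)))
    return signal + scale * noise
```

Analogous transforms (band-limiting, gain perturbation, reverberation via impulse-response convolution) can be chained the same way to cover a range of recording conditions.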
• Analyzed cross-dataset domain shift using signal processing techniques and comparative experiments, surfacing insights into the robustness and failure modes of the fraud detection models.
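One simple signal-processing probe for domain shift of this kind is comparing low-level spectral statistics (e.g. spectral centroid) across datasets; a minimal sketch under that assumption (function name hypothetical):

```python
import numpy as np

def spectral_centroid(signal, sr=16000, n_fft=512):
    # Centroid of the magnitude spectrum, a crude summary of recording
    # conditions (bandwidth, channel coloration) useful for comparing datasets.
    spectrum = np.abs(np.fft.rfft(signal, n=n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    return float((freqs * spectrum).sum() / (spectrum.sum() + 1e-12))
```

Comparing the distribution of such statistics between a training corpus and an unseen evaluation set gives a quick, model-free indication of how far apart their recording conditions are.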