I am a Master’s student in Computer Science at Georgia Tech working on large-scale multimodal AI systems across the full stack, from model behavior to inference infrastructure.
Experience
2025 — Now
Atlanta, GA
• ICML 2026 submission (first author)
Built a causally controlled audit framework for LLM decision revision, distinguishing genuine belief updating from reputation-driven compliance. Introduced token-level log-odds probing (sketch below) and preference–report divergence analysis across 31K+ trials, revealing systematic expertise-sensitivity failures.
• NeurIPS 2026 submission (first author)
Identified a post-retrieval evidence-ignoring failure mode in multimodal RAG, and introduced a retrieval-conditioned auditing framework revealing that matched retrieval success can still hide sharply different evidence-use behavior across VLMs.
• NeurIPS 2026 submission (first author)
Developed a distillation framework that teaches compressed vision–language models when to doubt by transferring uncertainty trajectories from large teachers, significantly improving calibration, robustness, and selective prediction under visual corruption.
• EMNLP 2026 (in preparation; first author)
Developed a localization-based evaluation framework for event-boundary understanding in LLMs, using temporal negative controls and human calibration to reveal fragile alignment with human temporal segmentation.
• Software Engineering (Multi-LLM engine)
Architected a scalable evaluation platform integrating 9+ chat services and 100+ API models, enabling reproducible large-scale reliability audits. Reduced browser automation memory footprint by ~40% via a custom BrowserView layer (vs. Playwright/Selenium) and built a structured LLM-as-a-Judge backend with persistent storage and automated deployment.
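A minimal sketch of the token-level log-odds probe mentioned in the ICML bullet above. It assumes access to the model's log-probabilities over the first answer token (e.g., via an OpenAI-compatible API with logprobs enabled); the token names and numbers below are illustrative, not values from the study.

```python
# Minimal sketch of token-level log-odds probing (illustrative, not the paper's code).
# Assumes we can read the model's log-probabilities over the first answer token;
# the probabilities below are made up for demonstration.
import math

def log_odds(token_logprobs: dict, pos: str = "Yes", neg: str = "No") -> float:
    """Log-odds of the positive vs. negative answer token: log p(pos) - log p(neg)."""
    return token_logprobs[pos] - token_logprobs[neg]

# Hypothetical first-token logprobs before and after an "expert disagrees" challenge.
before = {"Yes": math.log(0.80), "No": math.log(0.15)}
after = {"Yes": math.log(0.35), "No": math.log(0.60)}

shift = log_odds(after) - log_odds(before)  # negative => revision toward "No"
print(f"log-odds shift after challenge: {shift:+.2f}")
```

Comparing this continuous shift with the model's stated (reported) answer is what lets the preference–report divergence analysis separate real belief change from surface compliance.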
2025 — Now
Atlanta, GA
• ME 4710: Foundations in Machine Learning for Engineers (Fall 2025).
• MGT 6655: Business Data Preparation & Visualization (Spring 2026).
• Campus Academic Integrity TA Team supporting OMS Analytics and OMS Cybersecurity (Spring 2026).
• Built a privacy-preserving TA Q&A system for MGT 6655 (graduate level, 100+ students) from the ground up: an end-to-end Python pipeline that transforms Ed Discussion data into RAG and SFT datasets (JSONL) with schema-tolerant parsing and metadata traceability (sketch below), plus a grounded LLM-based assistant with configurable retrieval and embedding backends and GPU-ready evaluation workflows for scalable, reproducible inference.
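A minimal sketch of the schema-tolerant parsing step in the pipeline above. Field names (`title`, `answers`, etc.) and file paths are hypothetical, since Ed Discussion export schemas vary; the point is that every lookup degrades gracefully rather than assuming one fixed schema.

```python
# Sketch of the schema-tolerant Ed Discussion -> JSONL step (field names and
# paths are hypothetical; real exports vary, hence the fallbacks on every lookup).
import json

def to_record(thread: dict):
    """Map one exported thread to a RAG/SFT record, tolerating missing keys."""
    question = thread.get("title") or thread.get("subject") or ""
    answers = thread.get("answers") or thread.get("comments") or []
    answer = answers[0].get("text", "") if answers else ""
    if not (question and answer):
        return None  # skip threads without a usable Q/A pair
    return {
        "question": question.strip(),
        "answer": answer.strip(),
        # metadata kept for traceability back to the source thread
        "meta": {"thread_id": thread.get("id"), "category": thread.get("category")},
    }

with open("ed_export.json") as src, open("qa_pairs.jsonl", "w") as dst:
    for thread in json.load(src):
        rec = to_record(thread)
        if rec is not None:
            dst.write(json.dumps(rec, ensure_ascii=False) + "\n")
```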
2025 — 2025
Mountain View, CA
• Optimized Flux-Schnell (12B DiT) multimodal inference on H100 by implementing GPU memory persistence, offload strategies, and kernel-level tuning, reaching ~30 images/min at 1–2 s latency per request on a single GPU, a 10–15× speedup over the baseline.
• Designed a multi-GPU–ready inference architecture (NCCL-compatible, ONNX → TensorRT conversion pipeline) and validated linear-scaling behavior on single-GPU prototypes to support future distributed deployment.
• Built production-grade serving infrastructure including queueing, heartbeat monitoring, structured logging, GCS integration, and safety filtering, enabling stable long-running operations under high request volume.
• Implemented a video super-resolution pipeline (Real-ESRGAN + FastAPI) with PSNR/SSIM evaluation (sketch below), reducing runtime for a 5 s, 24 fps clip by ~65% (284 s → 100 s) when integrated with Wan2.2 text-to-video.
• Developed an AI-powered e-commerce try-on service (ComfyUI, Flux-Kontext + Segformer) delivering outfit changing, background removal, and style transfer in <5 s per image via secure RESTful APIs.
• Synthesized research papers and open-source model documentation to produce a technical review of multimodal generation systems, covering text-to-image, text-to-video, and super-resolution/upscaling model families and summarizing key benchmark findings for internal evaluation.
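A minimal sketch of the PSNR/SSIM evaluation from the super-resolution bullet above, using scikit-image's standard metrics; the random frames here stand in for real reference/restored clips.

```python
# Sketch of the PSNR/SSIM evaluation used to compare super-resolved frames
# against references (frame loading is stubbed with random data for illustration).
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def score_frames(reference, restored):
    """Mean PSNR/SSIM over a clip of shape (T, H, W, 3) with values in [0, 255]."""
    psnr = np.mean([
        peak_signal_noise_ratio(r, x, data_range=255)
        for r, x in zip(reference, restored)
    ])
    ssim = np.mean([
        structural_similarity(r, x, channel_axis=-1, data_range=255)
        for r, x in zip(reference, restored)
    ])
    return float(psnr), float(ssim)

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(8, 64, 64, 3), dtype=np.uint8)
out = np.clip(ref.astype(np.int16) + rng.integers(-10, 10, ref.shape), 0, 255).astype(np.uint8)
print("PSNR %.2f dB, SSIM %.3f" % score_frames(ref, out))
```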
2025 — 2025
Atlanta, GA
• Developed a spatiotemporal modeling framework for high-frequency sensor data (947K samples) with large-scale training on HPC infrastructure.
• Proposed a physics-informed sequence model with structured inductive biases (toy sketch below), achieving strong out-of-distribution generalization across unseen locations (Temp RMSE 0.43 °C; RH RMSE 1.3%).
• Built a scalable sparse-to-dense inference pipeline for high-resolution prediction, resulting in a first-author Q1 journal submission.
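A toy sketch of the physics-informed loss idea from the bullets above. The submission's actual inductive biases are not specified here, so a finite-difference smoothness penalty stands in as an illustrative physics term.

```python
# Toy sketch of a physics-informed training loss: a data term plus a physics
# penalty (here, a finite-difference smoothness term standing in for the actual
# constraint used in the work, which is not detailed in this resume).
import torch

def physics_informed_loss(pred, target, lam: float = 0.1):
    """pred/target: (batch, time) temperature sequences."""
    data_term = torch.mean((pred - target) ** 2)
    # Penalize abrupt step-to-step jumps, encoding that the field evolves smoothly.
    physics_term = torch.mean((pred[:, 1:] - pred[:, :-1]) ** 2)
    return data_term + lam * physics_term

pred = torch.randn(4, 32, requires_grad=True)
target = torch.randn(4, 32)
loss = physics_informed_loss(pred, target)
loss.backward()
print(f"loss = {loss.item():.3f}")
```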
2024 — 2024
Atlanta, GA
• Reworked the llama.cpp decode path (C++) for multi-request inference by introducing request batching and a concurrency-aware scheduler, improving throughput by 1.5–2.0× while reducing tail latency under load.
• Performed system-level profiling over long-horizon generations (10K+ tokens) and 1–16 concurrent requests, identifying KV-cache reads and memory bandwidth pressure as the primary bottlenecks in autoregressive decoding.
• Optimized KV-cache access and memory behavior across CPU and GPU paths, including CUDA kernel-level improvements that cut redundant memory movement during decoding.
• Refined KV-cache reuse and allocation strategy to mitigate fragmentation and stabilize latency, achieving a 30%+ reduction in latency variance under sustained workloads.
• Built a modular benchmarking framework for throughput (tokens/sec), latency, and scaling curves (sketch below), enabling reproducible evaluation of batching, scheduling, and memory optimization strategies.
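A minimal sketch of that benchmarking harness, assuming a locally running llama.cpp HTTP server; the endpoint, payload, and response field are assumptions and should be matched to the actual deployment.

```python
# Sketch of the benchmarking harness: per-request latency and aggregate tokens/sec
# against a locally running llama.cpp server (endpoint/payload/response field are
# assumptions based on its HTTP server; adjust to the real deployment).
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8080/completion"  # hypothetical local server
N_PREDICT = 128

def one_request(prompt: str):
    t0 = time.perf_counter()
    resp = requests.post(URL, json={"prompt": prompt, "n_predict": N_PREDICT})
    resp.raise_for_status()
    n_tokens = resp.json().get("tokens_predicted", N_PREDICT)
    return time.perf_counter() - t0, n_tokens

def run(concurrency: int, n_requests: int = 16) -> None:
    t_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(one_request, ["Tell me a story."] * n_requests))
    wall = time.perf_counter() - t_start
    lats = sorted(r[0] for r in results)
    total_tokens = sum(r[1] for r in results)
    print(f"c={concurrency:2d}  {total_tokens / wall:7.1f} tok/s  "
          f"p50={statistics.median(lats):.2f}s  p95={lats[int(0.95 * (len(lats) - 1))]:.2f}s")

for c in (1, 2, 4, 8, 16):
    run(c)
```

Sweeping concurrency this way produces the throughput/latency scaling curves used to compare batching, scheduling, and memory optimizations.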
Education
Georgia Institute of Technology
Master of Science (MS), Computer Science
Shandong University
Bachelor of Engineering (BE)
Bazhong Tanghu Foreign Language Experimental School