# Yupeng T.

LLMs, Multimodal AI, ML Systems & Inference | M.S. CS @ Georgia Tech | MLE Intern @ GMI Cloud | Research Assistant

Location: Mountain View, California, United States
Profile: https://flows.cv/yupengt

I am a Master’s student in Computer Science at Georgia Tech working on large-scale multimodal AI systems across the full stack, from model behavior to inference infrastructure. My work centers on a simple goal: building foundation models and AI systems that are reliable, efficient, and deployable at scale, not just impressive on benchmarks.

I study how large language models and vision-language models represent information, revise decisions, use evidence, and preserve uncertainty across language, vision, and real-world signals. In parallel, I build the systems required to train, evaluate, and deploy them in practice, including GPU-optimized generation and inference pipelines, large-scale evaluation platforms, and production-grade retrieval and serving infrastructure.

I am most interested in problems where frontier modeling and hard systems engineering have to coexist, especially reliable multimodal inference, evidence-grounded generation, scalable evaluation, and robust deployment under real operational constraints. I care about work that is empirically rigorous, technically demanding, and built to survive contact with real data, real workloads, and real-world failure modes.

I am currently exploring opportunities in multimodal applied research, machine learning engineering, and roles where strong engineering meets advanced modeling. I am also open to software engineering and data science positions where this foundation can create impact.

## Work Experience

### Research Assistant @ Georgia Institute of Technology

Jan 2025 – Present | Atlanta, GA

- ICML 2026 submission (first author): Built a causally controlled audit framework for LLM decision revision, distinguishing genuine belief updating from reputation-driven compliance.
  Introduced token-level log-odds probing and preference–report divergence analysis across 31K+ trials, revealing systematic expertise-sensitivity failures.
- NeurIPS 2026 submission (first author): Identified a post-retrieval evidence-ignoring failure mode in multimodal RAG and introduced a retrieval-conditioned auditing framework showing that matched retrieval success can still hide sharply different evidence-use behavior across VLMs.
- NeurIPS 2026 submission (first author): Developed a distillation framework that teaches compressed vision–language models when to doubt by transferring uncertainty trajectories from large teachers, significantly improving calibration, robustness, and selective prediction under visual corruption.
- EMNLP 2026 (in preparation; first author): Developed a localization-based evaluation framework for event-boundary understanding in LLMs, using temporal negative controls and human calibration to reveal fragile alignment with human temporal segmentation.
- Software engineering (multi-LLM engine): Architected a scalable evaluation platform integrating 9+ chat services and 100+ API models, enabling reproducible large-scale reliability audits. Reduced browser-automation memory footprint by ~40% via a custom BrowserView layer (vs. Playwright/Selenium) and built a structured LLM-as-a-Judge backend with persistent storage and automated deployment.

### Teaching Assistant @ Georgia Institute of Technology

Jan 2025 – Present | Atlanta, GA

- ME 4710: Foundations in Machine Learning for Engineers (Fall 2025).
- MGT 6655: Business Data Preparation & Visualization (Spring 2026).
- Campus Academic Integrity TA team supporting OMS Analytics and OMS Cybersecurity (Spring 2026).
- Built a privacy-preserving TA Q&A system for MGT 6655 (graduate level, 100+ students) from the ground up: an end-to-end Python pipeline that transforms Ed Discussion data into RAG and SFT datasets (JSONL) with schema-tolerant parsing and metadata traceability, plus a grounded LLM-based assistant with configurable retrieval and embedding backends and GPU-ready evaluation workflows for scalable, reproducible inference.

### Machine Learning Engineer @ GMI Cloud

Jan 2025 – Jan 2025 | Mountain View, CA

- Optimized Flux-Schnell (12B DiT) multimodal inference on H100 by implementing GPU memory persistence, offload strategies, and kernel-level tuning, achieving ~30 images/min and 1–2 s latency per request on a single GPU versus a 10–15× slower baseline.
- Designed a multi-GPU-ready inference architecture (NCCL-compatible, ONNX → TensorRT conversion pipeline) and validated linear-scaling behavior on single-GPU prototypes to support future distributed deployment.
- Built production-grade serving infrastructure including queueing, heartbeat monitoring, structured logging, GCS integration, and safety filtering, enabling stable long-running operation under high request volume.
- Implemented a video super-resolution pipeline (Real-ESRGAN + FastAPI) with PSNR/SSIM evaluation, reducing the runtime of a 5 s @ 24 fps clip by ~65% (284 s → 100 s) when integrated with Wan2.2 text-to-video.
- Developed an AI-powered e-commerce try-on service (ComfyUI, Flux-Kontext + Segformer), delivering <5 s per-image outfit changing, background removal, and style transfer via secure RESTful APIs.
- Synthesized research papers and open-source model documentation into a technical review of multimodal generation systems, covering text-to-image, text-to-video, and super-resolution/upscaling model families and summarizing key benchmark findings for internal evaluation.
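The queueing-plus-heartbeat pattern from the serving bullets above can be sketched minimally as follows. This is an illustrative stdlib-only sketch, not the production system: `InferenceWorker`, `run_model`, and all parameter names are hypothetical, and the real infrastructure additionally handled structured logging, GCS integration, and safety filtering.

```python
import queue
import threading
import time


class InferenceWorker:
    """Minimal sketch of a serving loop with a request queue and a
    heartbeat timestamp that a monitor thread could poll for liveness.
    `run_model` stands in for the actual inference call."""

    def __init__(self, run_model, heartbeat_interval=1.0):
        self.requests = queue.Queue()
        self.results = {}
        self.run_model = run_model
        self.heartbeat_interval = heartbeat_interval
        self.last_heartbeat = time.monotonic()
        self._stop = threading.Event()

    def submit(self, request_id, payload):
        self.requests.put((request_id, payload))

    def healthy(self, timeout=5.0):
        # A monitor would call this to detect a stalled worker.
        return time.monotonic() - self.last_heartbeat < timeout

    def serve(self):
        while not self._stop.is_set():
            self.last_heartbeat = time.monotonic()  # refreshed every loop
            try:
                request_id, payload = self.requests.get(
                    timeout=self.heartbeat_interval
                )
            except queue.Empty:
                continue  # idle pass; loop top refreshes the heartbeat
            self.results[request_id] = self.run_model(payload)

    def stop(self):
        self._stop.set()
```

A caller would run `serve()` on a background thread, `submit()` requests, and have a separate monitor poll `healthy()`; the bounded `get` timeout is what keeps the heartbeat fresh even when the queue is idle.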
### Research Assistant @ Georgia Institute of Technology

Jan 2025 – Jan 2025 | Atlanta, GA

- Developed a spatiotemporal modeling framework for high-frequency sensor data (947K samples), with large-scale training on HPC infrastructure.
- Proposed a physics-informed sequence model with structured inductive biases, achieving strong out-of-distribution generalization to unseen locations (temperature RMSE 0.43 °C; relative-humidity RMSE 1.3%).
- Built a scalable sparse-to-dense inference pipeline for high-resolution prediction; the work resulted in a first-author Q1 journal submission.

### Graduate Student Researcher @ Georgia Institute of Technology

Jan 2024 – Jan 2024 | Atlanta, GA

- Reworked the llama.cpp decode path (C++) for multi-request inference by introducing request batching and a concurrency-aware scheduler, improving throughput by 1.5–2.0× while reducing tail latency under load.
- Performed system-level profiling over long-horizon generations (10K+ tokens) and 1–16 concurrent requests, identifying KV-cache reads and memory-bandwidth pressure as the primary bottlenecks in autoregressive decoding.
- Optimized KV-cache access and memory behavior across CPU and GPU paths, including CUDA kernel-level improvements that reduce redundant memory movement and improve access efficiency during decoding.
- Refined the KV-cache reuse and allocation strategy to mitigate fragmentation and stabilize latency, achieving a 30%+ reduction in latency variance under sustained workloads.
- Built a modular benchmarking framework for throughput (tokens/sec), latency, and scaling curves, enabling reproducible evaluation of batching, scheduling, and memory-optimization strategies.

### Independent AI Systems Engineer @ Stealth Startup

Jan 2024 – Jan 2024

- Architected an AI-powered recommendation system that analyzes millions of Amazon product reviews, helping users quickly discover the most relevant, high-quality items through semantic search and LLM-based understanding.
- Developed a PySpark ETL pipeline to clean, tokenize, and embed reviews (768-dim vectors via text-embedding-005), storing vectors and metadata efficiently in BigQuery for hybrid semantic retrieval.
- Designed a hybrid retrieval engine (ScaNN + metadata filters) that improved nDCG@3 by 21% (0.85 vs. 0.70) and achieved MRR = 0.88, using approximate nearest neighbors (TreeAH + AVQ) with reranking served by a FastAPI microservice.
- Integrated Google Gemini with LangChain for RAG-based sentiment analysis and feature summarization, achieving 88% accuracy and 4.3/5 relevance for explainable recommendations.
- Provisioned scalable infrastructure on GCP (Cloud Run, BigQuery, Cloud Storage) using Terraform, sustaining ~6 s query latency and 92% product-category coverage across 500 test queries.

### Undergraduate Researcher @ Shandong University

Jan 2023 – Jan 2023 | Shandong, China

- Developed an enhanced Bidirectional Rapidly-Exploring Random Tree (Bi-RRT) algorithm for autonomous-vehicle path planning in complex parking-lot environments.
- Implemented adaptive probabilistic sampling and local trajectory-smoothing modules, improving exploration efficiency and reducing path curvature in dense obstacle fields.
- Integrated real-time collision detection, dynamic obstacle avoidance, and kinematic feasibility validation for continuous, safe navigation under motion constraints.
- Achieved 2× faster planning, ~35% smoother paths, and 15% shorter average trajectory length than baseline RRT and RRT*, validated across 100+ randomized test scenarios.

### Kaggle Competitor – American Express Default Prediction @ Shandong University

Jan 2022 – Jan 2022 | Shandong, China

- Ranked 20th out of 4,874 teams (top 0.4%) in the American Express – Default Prediction global Kaggle competition.
- Developed a weighted ensemble of LightGBM (DART) and GPU-accelerated XGBoost models on 16 GB of tabular time-series data covering transactions, balances, delinquencies, and repayments.
- Led model tuning and ensemble strategy, optimizing hyperparameters via grid search and stratified 5-fold cross-validation.
- Designed diverse feature sets, including lag features, rolling statistics, and trend indicators, and trained multiple seeds to boost stability, delivering a compact, high-performing solution that outperformed all baselines.

### Research Intern @ Shandong University

Jan 2021 – Jan 2022 | Shandong, China

- Co-first and corresponding author of a peer-reviewed international conference paper on automatic image colorization.
- Designed a novel lightweight GAN pipeline (U-Net generator + ResNet18 discriminator) and introduced a YUV-channel separation technique, reducing training cost while boosting structural fidelity and perceptual sharpness.
- Stabilized adversarial training with optimized objectives (a re-weighted "realness" reliability term and tuned loss balance), improving color fidelity and transfer robustness across diverse textures and scenes.
- Scaled experiments to 4.3K+ natural and animated images in PyTorch with extensive visual comparisons, consistently outperforming baselines in visual quality and detail preservation.

### Publicity Manager – Starlight Art Troupe @ Shandong University

Jan 2019 – Jan 2021 | Shandong, China

- Directed the design and production of 30+ posters, flyers, and digital media assets to promote events, boosting audience turnout by 25% and strengthening brand recognition.
- Managed social-media operations and curated engaging content, streamlining workflows and driving a 40% increase in follower engagement over two semesters.
- Coordinated 170+ photo/video shoots and post-production using Photoshop, Canva, Adobe Illustrator, Lightworks, and CapCut, delivering polished outputs on tight timelines.
- Led event planning and promotion with cross-functional teams, fostering community participation and earning the Outstanding Individual Award for Student Organizations at Shandong University.
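The lag and rolling-statistic features mentioned in the Kaggle entry above can be sketched as follows. This is an illustrative stdlib-only sketch under assumed data: the column name `balance` and the window sizes are hypothetical, and the actual solution built these features at scale for LightGBM/XGBoost rather than with plain Python lists.

```python
def lag_feature(values, lag):
    """Shift a series back by `lag` >= 1 steps; the first `lag`
    positions have no history and are left as None."""
    return [None] * lag + values[:-lag]


def rolling_mean(values, window):
    """Trailing mean over the last `window` observations (inclusive);
    positions without enough history are None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            recent = values[i + 1 - window : i + 1]
            out.append(sum(recent) / window)
    return out


# Hypothetical monthly balance series for one customer.
balance = [100, 120, 90, 130, 110]
features = {
    "balance_lag1": lag_feature(balance, 1),
    "balance_rollmean3": rolling_mean(balance, 3),
}
```

Trend indicators follow the same pattern, e.g. the difference between a short and a long rolling mean; in the competition such features were then fed to the gradient-boosted ensemble.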
### Undergraduate Teaching Assistant @ Shandong University

Jan 2020 – Jan 2021 | Shandong, China

- Supported a 400+ student Linear Algebra course, handling grading, records management, and personalized learning support; recognized as Outstanding Teaching Assistant for dedication to student success.
- Assessed 600+ handwritten assignments with clear, actionable feedback, directly improving student understanding and measurable performance.
- Tracked weekly attendance and maintained meticulous, audit-ready academic records, enabling accurate progress reviews and timely academic interventions.
- Answered 30+ student questions weekly via online forums, providing detailed explanations and real-world examples to clarify key concepts such as eigenvalues, matrix operations, and vector spaces.

## Education

### Master of Science (MS) in Computer Science

Georgia Institute of Technology

### Bachelor of Engineering (BE) in Artificial Intelligence

Shandong University

### Science Stream

Bazhong Tanghu Foreign Language Experimental School

## Contact & Social

- LinkedIn: https://linkedin.com/in/yupeng-tang

---

Source: https://flows.cv/yupengt
JSON Resume: https://flows.cv/yupengt/resume.json
Last updated: 2026-04-17