# Yupeng T.

LLMs, Multimodal AI, ML Systems & Inference | M.S. CS @ Georgia Tech | MLE Intern @ GMI Cloud | Research Assistant

Location: Mountain View, California, United States
Profile: https://flows.cv/yupengt

I am a Master’s student in Computer Science at Georgia Tech working on large-scale multimodal AI systems across the full stack, from model behavior to inference infrastructure. My work centers on a simple goal: building foundation models and AI systems that are reliable, efficient, and deployable at scale, not just impressive on benchmarks.

I study how large language models and vision-language models represent information, revise decisions, use evidence, and preserve uncertainty across language, vision, and real-world signals. In parallel, I build the systems required to train, evaluate, and deploy them in practice, including GPU-optimized generation and inference pipelines, large-scale evaluation platforms, and production-grade retrieval and serving infrastructure.

I am most interested in problems where frontier modeling and hard systems engineering have to coexist, especially reliable multimodal inference, evidence-grounded generation, scalable evaluation, and robust deployment under real operational constraints. I care about work that is empirically rigorous, technically demanding, and built to survive contact with real data, real workloads, and real-world failure modes.

I am currently exploring opportunities in multimodal applied research, machine learning engineering, and roles where strong engineering meets advanced modeling. I am also open to software engineering and data science positions where this foundation can create impact.

## Work Experience

### Research Assistant @ Georgia Institute of Technology

Jan 2025 – Present | Atlanta, GA

- ICML 2026 submission (first author): Built a causally controlled audit framework for LLM decision revision, distinguishing genuine belief updating from reputation-driven compliance.
  Introduced token-level log-odds probing and preference–report divergence analysis across 31K+ trials, revealing systematic expertise-sensitivity failures.
- NeurIPS 2026 submission (first author): Identified a post-retrieval evidence-ignoring failure mode in multimodal RAG and introduced a retrieval-conditioned auditing framework showing that matched retrieval success can still hide sharply different evidence-use behavior across VLMs.
- NeurIPS 2026 submission (first author): Developed a distillation framework that teaches compressed vision–language models when to doubt by transferring uncertainty trajectories from large teachers, significantly improving calibration, robustness, and selective prediction under visual corruption.
- EMNLP 2026 (in preparation; first author): Developed a localization-based evaluation framework for event-boundary understanding in LLMs, using temporal negative controls and human calibration to reveal fragile alignment with human temporal segmentation.
- Software engineering (multi-LLM engine): Architected a scalable evaluation platform integrating 9+ chat services and 100+ API models, enabling reproducible large-scale reliability audits. Reduced browser-automation memory footprint by ~40% via a custom BrowserView layer (vs. Playwright/Selenium) and built a structured LLM-as-a-Judge backend with persistent storage and automated deployment.

### Teaching Assistant @ Georgia Institute of Technology

Jan 2025 – Present | Atlanta, GA

- ME 4710: Foundations in Machine Learning for Engineers (Fall 2025).
- MGT 6655: Business Data Preparation & Visualization (Spring 2026).
- Campus Academic Integrity TA team supporting OMS Analytics and OMS Cybersecurity (Spring 2026).
- Built a privacy-preserving TA Q&A system for MGT 6655 (graduate level, 100+ students) from the ground up: an end-to-end Python pipeline that transforms Ed Discussion data into RAG and SFT datasets (JSONL) with schema-tolerant parsing and metadata traceability, plus a grounded LLM-based assistant with configurable retrieval and embedding backends and GPU-ready evaluation workflows for scalable, reproducible inference.

### Machine Learning Engineer @ GMI Cloud

Jan 2025 – Jan 2025 | Mountain View, CA

- Optimized Flux-Schnell (12B DiT) multimodal inference on H100 by implementing GPU memory persistence, offload strategies, and kernel-level tuning, achieving ~30 images/min and 1–2 s latency per request on a single GPU versus a 10–15× slower baseline.
- Designed a multi-GPU-ready inference architecture (NCCL-compatible, ONNX → TensorRT conversion pipeline) and validated linear-scaling behavior on single-GPU prototypes to support future distributed deployment.
- Built production-grade serving infrastructure including queueing, heartbeat monitoring, structured logging, GCS integration, and safety filtering, enabling stable long-running operation under high request volume.
- Implemented a video super-resolution pipeline (Real-ESRGAN + FastAPI) with PSNR/SSIM evaluation, reducing the runtime of a 5 s @ 24 fps clip by ~65% (284 s → 100 s) when integrated with Wan2.2 text-to-video.
- Developed an AI-powered e-commerce try-on service (ComfyUI, Flux-Kontext + Segformer), delivering <5 s per-image outfit changing, background removal, and style transfer via secure RESTful APIs.
- Synthesized research papers and open-source model documentation into a technical review of multimodal generation systems, covering text-to-image, text-to-video, and super-resolution/upscaling model families and summarizing key benchmark findings for internal evaluation.
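The queueing-plus-heartbeat pattern from the serving bullets above can be sketched minimally as follows. This is an illustrative stdlib-only sketch, not the production system: `InferenceWorker`, `run_model`, and all parameter names are hypothetical, and the real infrastructure additionally handled structured logging, GCS integration, and safety filtering.

```python
import queue
import threading
import time


class InferenceWorker:
    """Minimal sketch of a serving loop with a request queue and a
    heartbeat timestamp that a monitor thread could poll for liveness.
    `run_model` stands in for the actual inference call."""

    def __init__(self, run_model, heartbeat_interval=1.0):
        self.requests = queue.Queue()
        self.results = {}
        self.run_model = run_model
        self.heartbeat_interval = heartbeat_interval
        self.last_heartbeat = time.monotonic()
        self._stop = threading.Event()

    def submit(self, request_id, payload):
        self.requests.put((request_id, payload))

    def healthy(self, timeout=5.0):
        # A monitor would call this to detect a stalled worker.
        return time.monotonic() - self.last_heartbeat < timeout

    def serve(self):
        while not self._stop.is_set():
            self.last_heartbeat = time.monotonic()  # refreshed every loop
            try:
                request_id, payload = self.requests.get(
                    timeout=self.heartbeat_interval
                )
            except queue.Empty:
                continue  # idle pass; loop top refreshes the heartbeat
            self.results[request_id] = self.run_model(payload)

    def stop(self):
        self._stop.set()
```

A caller would run `serve()` on a background thread, `submit()` requests, and have a separate monitor poll `healthy()`; the bounded `get` timeout is what keeps the heartbeat fresh even when the queue is idle.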
### Research Assistant @ Georgia Institute of Technology

Jan 2025 – Jan 2025 | Atlanta, GA

- Developed a spatiotemporal modeling framework for high-frequency sensor data (947K samples), with large-scale training on HPC infrastructure.
- Proposed a physics-informed sequence model with structured inductive biases, achieving strong out-of-distribution generalization to unseen locations (temperature RMSE 0.43 °C; relative-humidity RMSE 1.3%).
- Built a scalable sparse-to-dense inference pipeline for high-resolution prediction; the work resulted in a first-author Q1 journal submission.

### Graduate Student Researcher @ Georgia Institute of Technology

Jan 2024 – Jan 2024 | Atlanta, GA

- Reworked the llama.cpp decode path (C++) for multi-request inference by introducing request batching and a concurrency-aware scheduler, improving throughput by 1.5–2.0× while reducing tail latency under load.
- Performed system-level profiling over long-horizon generations (10K+ tokens) and 1–16 concurrent requests, identifying KV-cache reads and memory-bandwidth pressure as the primary bottlenecks in autoregressive decoding.
- Optimized KV-cache access and memory behavior across CPU and GPU paths, including CUDA kernel-level improvements that reduce redundant memory movement and improve access efficiency during decoding.
- Refined the KV-cache reuse and allocation strategy to mitigate fragmentation and stabilize latency, achieving a 30%+ reduction in latency variance under sustained workloads.
- Built a modular benchmarking framework for throughput (tokens/sec), latency, and scaling curves, enabling reproducible evaluation of batching, scheduling, and memory-optimization strategies.

### Independent AI Systems Engineer @ Stealth Startup

Jan 2024 – Jan 2024

- Architected an AI-powered recommendation system that analyzes millions of Amazon product reviews, helping users quickly discover the most relevant, high-quality items through semantic search and LLM-based understanding.
- Developed a PySpark ETL pipeline to clean, tokenize, and embed reviews (768-dim vectors via text-embedding-005), storing vectors and metadata efficiently in BigQuery for hybrid semantic retrieval.
- Designed a hybrid retrieval engine (ScaNN + metadata filters) that improved nDCG@3 by 21% (0.85 vs. 0.70) and achieved MRR = 0.88, using approximate nearest neighbors (TreeAH + AVQ) with reranking served by a FastAPI microservice.
- Integrated Google Gemini with LangChain for RAG-based sentiment analysis and feature summarization, achieving 88% accuracy and 4.3/5 relevance for explainable recommendations.
- Provisioned scalable infrastructure on GCP (Cloud Run, BigQuery, Cloud Storage) using Terraform, sustaining ~6 s query latency and 92% product-category coverage across 500 test queries.

### Undergraduate Researcher @ Shandong University

Jan 2023 – Jan 2023 | Shandong, China

- Developed an enhanced Bidirectional Rapidly-Exploring Random Tree (Bi-RRT) algorithm for autonomous-vehicle path planning in complex parking-lot environments.
- Implemented adaptive probabilistic sampling and local trajectory-smoothing modules, improving exploration efficiency and reducing path curvature in dense obstacle fields.
- Integrated real-time collision detection, dynamic obstacle avoidance, and kinematic feasibility validation for continuous, safe navigation under motion constraints.
- Achieved 2× faster planning, ~35% smoother paths, and 15% shorter average trajectory length than baseline RRT and RRT*, validated across 100+ randomized test scenarios.

### Kaggle Competitor – American Express Default Prediction @ Shandong University

Jan 2022 – Jan 2022 | Shandong, China

- Ranked 20th out of 4,874 teams (top 0.4%) in the American Express – Default Prediction global Kaggle competition.
- Developed a weighted ensemble of LightGBM (DART) and GPU-accelerated XGBoost models on 16 GB of tabular time-series data covering transactions, balances, delinquencies, and repayments.
- Led model tuning and ensemble strategy, optimizing hyperparameters via grid search and stratified 5-fold cross-validation.
- Designed diverse feature sets, including lag features, rolling statistics, and trend indicators, and trained multiple seeds to boost stability, delivering a compact, high-performing solution that outperformed all baselines.

### Research Intern @ Shandong University

Jan 2021 – Jan 2022 | Shandong, China

- Co-first and corresponding author of a peer-reviewed international conference paper on automatic image colorization.
- Designed a novel lightweight GAN pipeline (U-Net generator + ResNet18 discriminator) and introduced a YUV-channel separation technique, reducing training cost while boosting structural fidelity and perceptual sharpness.
- Stabilized adversarial training with optimized objectives (a re-weighted "realness" reliability term and tuned loss balance), improving color fidelity and transfer robustness across diverse textures and scenes.
- Scaled experiments to 4.3K+ natural and animated images in PyTorch with extensive visual comparisons, consistently outperforming baselines in visual quality and detail preservation.

### Publicity Manager – Starlight Art Troupe @ Shandong University

Jan 2019 – Jan 2021 | Shandong, China

- Directed the design and production of 30+ posters, flyers, and digital media assets to promote events, boosting audience turnout by 25% and strengthening brand recognition.
- Managed social-media operations and curated engaging content, streamlining workflows and driving a 40% increase in follower engagement over two semesters.
- Coordinated 170+ photo/video shoots and post-production using Photoshop, Canva, Adobe Illustrator, Lightworks, and CapCut, delivering polished outputs on tight timelines.
- Led event planning and promotion with cross-functional teams, fostering community participation and earning the Outstanding Individual Award for Student Organizations at Shandong University.
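The lag and rolling-statistic features mentioned in the Kaggle entry above can be sketched as follows. This is an illustrative stdlib-only sketch under assumed data: the column name `balance` and the window sizes are hypothetical, and the actual solution built these features at scale for LightGBM/XGBoost rather than with plain Python lists.

```python
def lag_feature(values, lag):
    """Shift a series back by `lag` >= 1 steps; the first `lag`
    positions have no history and are left as None."""
    return [None] * lag + values[:-lag]


def rolling_mean(values, window):
    """Trailing mean over the last `window` observations (inclusive);
    positions without enough history are None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            recent = values[i + 1 - window : i + 1]
            out.append(sum(recent) / window)
    return out


# Hypothetical monthly balance series for one customer.
balance = [100, 120, 90, 130, 110]
features = {
    "balance_lag1": lag_feature(balance, 1),
    "balance_rollmean3": rolling_mean(balance, 3),
}
```

Trend indicators follow the same pattern, e.g. the difference between a short and a long rolling mean; in the competition such features were then fed to the gradient-boosted ensemble.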
### Undergraduate Teaching Assistant @ Shandong University

Jan 2020 – Jan 2021 | Shandong, China

- Supported a 400+ student Linear Algebra course, handling grading, records management, and personalized learning support; recognized as Outstanding Teaching Assistant for dedication to student success.
- Assessed 600+ handwritten assignments with clear, actionable feedback, directly improving student understanding and measurable performance.
- Tracked weekly attendance and maintained meticulous, audit-ready academic records, enabling accurate progress reviews and timely academic interventions.
- Answered 30+ student questions weekly via online forums, providing detailed explanations and real-world examples to clarify key concepts such as eigenvalues, matrix operations, and vector spaces.

## Education

### Master of Science (MS) in Computer Science

Georgia Institute of Technology

### Bachelor of Engineering (BE) in Artificial Intelligence

Shandong University

### Science Stream

Bazhong Tanghu Foreign Language Experimental School

## Contact & Social

- LinkedIn: https://linkedin.com/in/yupeng-tang

---

Source: https://flows.cv/yupengt
JSON Resume: https://flows.cv/yupengt/resume.json
Last updated: 2026-04-17