# Rajeev P.

> Principal Engineer, AI @ AMD

Location: San Francisco Bay Area, United States
Profile: https://flows.cv/rajeevp

I architect GenAI inference systems at AMD, enabling state-of-the-art LLM, VLM, and Stable Diffusion model optimization and efficient inference on NPUs and GPUs. My work spans architecture, software, kernels, and runtime, building the infrastructure and frameworks that enable AMD to deploy next-generation AI models at scale. I lead cross-disciplinary teams, collaborate with top-tier AI companies (Meta, Google, emerging startups), and partner with AMD executives to define the roadmap for GenAI inference innovation.

Key contributions:

• Architected AMD’s LLM inference backend for NPUs/iGPUs, now foundational for deploying LLMs, Stable Diffusion, and VLMs efficiently across PyTorch and ONNX frameworks.
• Pioneered low-bit inference innovations (3- and 4-bit operators, multi-LoRA, LoRA-based fine-tuning), redefining the performance/efficiency frontier.
• Filed 20+ patents in AI inference, operator design, and system-level optimizations.
• Published and submitted research papers on generative AI inference, bridging industry and academic systems research.
• Mentored and led cross-org innovation teams, ensuring solutions scale beyond individual projects.

My mission: build systems, frameworks, and reproducible artifacts that shape AMD’s position as the leader in generative AI inference, from kernel-level operators to industry-scale deployment.

Full research record: https://scholar.google.com/citations?hl=en&user=6BI01aMAAAAJ

## Work Experience

### Principal Software Engineer - LLMs, GenAI @ AMD
Jan 2023 – Present

• Principal Architect for LLM inference on AMD NPUs (Ryzen AI), defining the end-to-end system architecture enabling production deployment of state-of-the-art LLMs on client AI platforms.
• Technical owner of the Ryzen AI LLM inference stack, spanning operator/runtime frameworks, performance optimization, the PyTorch backend, ONNX Runtime enablement, and Day-0 model deployment.
• Pioneered advanced inference techniques (3/4-bit operators, multi-LoRA, speculative decoding, inference forecasting), delivering industry-leading gains in efficiency, throughput, and latency while preserving model fidelity.
• Primary technical interface to AMD executive leadership (CVP/EVP) and external AI labs, driving adoption of LLMs, agents, LoRA-fine-tuned models, and VLMs across Ryzen AI.
• Recognized by CEO Lisa Su (2024) for delivering the first LLM on AMD Ryzen at Computex 2024; work later presented by the CEO at Computex 2025.
• Inventor on 20+ patents and published researcher in ML systems and inference optimization; built and led applied AI teams, mentoring PhDs, hiring interns, and scaling research into production systems.

### Machine Learning Architect @ Rivian
Jan 2022 – Jan 2023

• Architected silicon-aware ML inference optimization pipelines for autonomous driving, spanning fine-tuning, quantization, and model transformation to enable efficient execution on power- and compute-constrained hardware.
• Drove hardware–software co-design for perception and fusion ML models, aligning compute constraints for real-time inference.
• Invented and deployed a novel low-power inference optimization technique for self-driving ML models; patent filed, demonstrating early leadership in hardware-aware ML optimization.
• Influenced early processor architecture (Rivian RAP) by translating ML workload characteristics into system-level compute requirements, shaping silicon design decisions upstream.

### Senior Member of Technical Staff @ AMD
Jan 2021 – Jan 2022

* AMD acquired Xilinx in Feb 2021.
* Promoted to Senior Staff in 2022.
* Architected and implemented FP32 Super-Resolution CNN and Perceptron accelerators on AI-Engines and FPGA fabric, including configurable compute kernels and programmable non-linear activation engines, enabling efficient mapping of large NLP models (BERT, Transformers) to custom hardware.
* Developed end-to-end ML model deployment tools for heterogeneous CPU+FPGA+AI-Engine systems, including automatic kernel generation, performance analysis, and codegen pipelines; presented innovations at TVMCon 2021.
* Analyzed and optimized state-of-the-art ML models (DLRM, Transformer, Transformer-Transducer, Depth Estimation, SRCNN); architected and developed operators and software for custom accelerators.

### Staff Design Engineer @ Xilinx
Jan 2016 – Jan 2021 | San Jose, California

* Architected and delivered high-performance heterogeneous accelerators for 5G beamformer and HPC applications, including a 16-antenna Massive MIMO beamformer (1 GBps) and an FP32 N-body solver achieving 2 TFLOPS on a single ACAP. Demonstrated at Xilinx Developers Forum; solutions deployed in the field and released on GitHub for the HPC community. (Patent granted for on-chip memory access optimization.)
* Pioneered sparse neural network inference techniques, including structured sparse data compression/decompression for block-sparse ResNet-50 on ACAP, enabling competitive performance and efficient FPGA utilization. (Patent pending.)
* Developed FPGA prototyping systems and automated software tooling for MAC/FEC IP validation, including multi-SLR FPGA integration, C++ runtime software, and Python-based codegen and verification tools, reducing design and analysis time 3x; adopted across multiple hard-IP teams.
* Led micro-architecture design, system integration, and runtime software development for AI-Engine and FPGA IPs; mentored junior engineers and interns, establishing design and verification best practices.
* Published and presented at ISSCC 2020: co-authored “A Versatile 7nm Adaptive Compute Acceleration Platform Processor.”

Skills & Tools: FPGA/ACAP Architecture, AI Engine, Python, C++, Verilog, HLS, Vivado, Vitis, RTL Linting, CDC, codegen pipelines, system integration, sparse NN inference

### Senior Member of Technical Staff @ Inphi Corporation
Jan 2010 – Jan 2016 | Westlake Village, California

Rambus acquired Inphi’s memory division. Promoted to Senior Staff Engineer; led software/validation responsibilities.

### Staff Engineer @ Inphi Corporation
Jan 2010 – Jan 2016 | Westlake Village, CA

• Operated in a fast-paced start-up, scaling from pre-IPO execution to full-scale production while delivering critical memory subsystem platforms.
• Architected and delivered memory interface training software and systems for enterprise servers using Load-Reduced DIMM SoCs, enabling reliable operation of up to 784 GB of DDR memory at scale.
• Led end-to-end architecture, validation, and deployment of software platforms for characterization and functional validation of high-speed memory buffer ICs, accelerating silicon bring-up and product readiness.
• Designed and owned Zynq MPSoC-based FPGA validation platforms for DDR4/DDR5 subsystems, spanning RTL design, embedded software, and customer-facing tooling.
• Built high-performance signal-processing and parallel compute frameworks for time/frequency-domain analysis, anomaly detection, and hardware-in-the-loop memory training, reducing field bring-up time and failure rates by ~4× while mentoring and scaling engineering talent.

### Graduate Teaching Assistant @ Dept. of ECE, North Carolina State University
Jan 2009 – Jan 2009 | Raleigh, NC

Teaching assistant for "Fundamentals of Logic Design" for sophomore students at the Department of Electrical and Computer Engineering, North Carolina State University.
### Independent Research @ Centre for Efficient, Scalable & Reliable Computing, North Carolina State University
Jan 2009 – Jan 2009 | Raleigh, NC

1. Investigated retention-time-aware allocation in DRAMs using software profiling. Developed scheduling algorithms in C for optimal memory allocation.
2. Designed and implemented an FPGA-based SDRAM memory controller to prototype the hypothesis.

### Intern @ Infineon Technologies
Jan 2007 – Jan 2008 | Warstein, Germany

• Architected and built systems for analysis of Insulated Gate Bipolar Transistors (IGBTs) under special operating conditions.

### Intern @ ECIL
Jan 2006 – Jan 2006 | Hyderabad, India

• Designed PIC microcontroller assembly code for position-measurement sensor systems.

## Education

### Graduate Coursework (SCPD) in Artificial Intelligence
Stanford University

### Master of Science in Computer Engineering
North Carolina State University

### Bachelor of Technology in Electronics & Communication Engineering (Honors)
SASTRA University

## Contact & Social

- LinkedIn: https://linkedin.com/in/rajpatwari

---

Source: https://flows.cv/rajeevp
JSON Resume: https://flows.cv/rajeevp/resume.json
Last updated: 2026-04-12