# Dr. Shahzad Ahmad

> ML/AI Accelerators for Deep Learning and Small LLMs

Location: San Jose, California, United States
Profile: https://flows.cv/drshahzadahmad

Experienced Design Engineer with a demonstrated history of working on the design of Embedded Hardware and Software Systems for Digital Signal Processing and High Performance Computing. Skilled in Embedded Systems, Software Programming, Hardware Architectures, Hardware Modeling and Synthesis, DSP Algorithm Modeling and Implementation, HPC, and Hardware Implementation using Intel/AMD FPGAs and tool chains.

## Work Experience

### FPGA AI Software Development Engineer @ Altera
Jan 2024 – Present | San Jose, California, United States

At Altera, I focus on implementing AI models for optimized FPGA deployment in various fields, including wireless communication and robotics. My role involves performance analysis and the development of custom hardware accelerators, ensuring efficient edge and PCIe-based platform implementations. I have experience in high-level design flows and deep learning applications. Overall, I am extensively skilled and experienced in taking algorithmic and application specifications to FPGA implementation targeting high-performance acceleration.

### FPGA Software Development Engineer @ Intel Corporation
Jan 2021 – Jan 2026 | San Jose, California, United States

### Senior Software Engineer @ Xilinx
Jan 2017 – Jan 2021 | San Jose

Design of Hardware IPs for Digital Signal Processing (DSP), High Level Synthesis (HLS) Libraries for FPGA Implementation, Hardware Accelerators, and HPC using FPGAs.

### Assistant Professor @ FAST-NUCES
Jan 2014 – Jan 2017 | Lahore, Pakistan

I worked as an assistant professor in the electrical engineering department of the National University of Computer and Emerging Sciences, Lahore. I taught courses such as Embedded System Design, Computer Aided Design of Digital Systems, and Digital System Design. My research interests include system-level hardware/software design and modeling techniques, hardware design and RTL architectures for digital signal processing systems, video and image processing, reconfigurable hardware design, and GPS/GNSS communication receivers.

### Intern in System Level Design Group at Cadence Design Systems @ Cadence Design Systems
Jan 2013 – Jan 2013

Worked with the C-to-Silicon High Level Synthesis tool R&D team. C-to-Silicon is a high-level synthesis tool that synthesizes an RTL description for hardware implementation starting from untimed C/C++ and SystemC.

### Research Assistant @ Politecnico di Torino
Jan 2013 – Jan 2013 | Turin Area, Italy

Worked on a European project called FASTCUDA. The stated purpose of this project is:

"FASTCUDA is a platform that provides the necessary software tools, hardware architecture, and design methodology to efficiently adapt CUDA (a parallel-computing architecture and API driven by the GPU industry, with wide adoption in many diverse fields ranging from molecular dynamics, to computational chemistry, to image or video processing, etc.) into a new FPGA design flow. With FASTCUDA, the CUDA kernels of a CUDA-based application are automatically partitioned into two groups: some are compiled and executed in parallel software, while the remaining are synthesized and implemented in hardware. A modern low power FPGA provides the processing power (via hundreds of embedded micro-CPUs) and the logic capacity for the execution and communication needs of all software and hardware components."

### PhD Student Worked on Model-based High Level Synthesis in MicroElectronics Group @ Politecnico di Torino
Jan 2010 – Jan 2013

I worked on model-based high-level hardware/software synthesis and design space exploration techniques. I tried to exploit the capabilities offered by high-level hardware synthesis at a higher level of abstraction during modeling. I incorporated information in higher-level models that can be exploited later in the design flow for multiple purposes, such as hardware/software partitioning, hardware/software trade-off analysis, low-power hardware synthesis, automatic design space exploration, and automatic incorporation of standard low-power techniques and hardware IPs for high-level synthesis.

### Visiting Researcher @ Grenoble INP - Institut polytechnique de Grenoble
Jan 2012 – Jan 2012

Worked on the FPGA implementation of image processing algorithms, specifically the design of non-linear image processing kernels using high-level hardware synthesis techniques and a memory optimization framework.

### Student @ UET Lahore
Jan 2003 – Jan 2006 | Lahore, Pakistan

## Education

### Doctor of Philosophy - PhD in Electronics Engineering
Politecnico di Torino

### Masters in Wireless and Communication and Related Technologies
Politecnico di Torino

### Bachelor's degree in Electrical and Electronics Engineering
UET Lahore

## Contact & Social

- LinkedIn: https://linkedin.com/in/drsab

---

Source: https://flows.cv/drshahzadahmad
JSON Resume: https://flows.cv/drshahzadahmad/resume.json
Last updated: 2026-04-11