# Hao(Frank) Wu

> Software Engineer, CPU+SoC Architect

Location: San Francisco Bay Area, United States
Profile: https://flows.cv/haofrank

- Adept at the Linux kernel, OS, computer architecture, SoC architecture, performance modeling (cycle-accurate CPU/SoC architecture simulation), algorithms, and C/C++/x86/Python/Java/Go/SQL/Bash scripting languages
- Experience in machine learning, cache side-channel attacks, computer security, computer networks, distributed systems, etc.

## Work Experience

### Staff Software Engineer (Kernel + LLM) @ NIO
Jan 2023 – Present | San Jose, CA

- Hypervisor, virtualization and kernel related design work
  - PMU and SPE sharing support
  - Task affinity & cpu_mask and sched balancing work
  - VM-exit profiling design and performance measurement
  - Suspend-to-RAM related support
  - Various passthrough device support and config (CoreSight, CPU & cache topology, etc.)
  - Various shmem driver unification work
- LLM infra related backend-op/engine/serving layer implementations and optimizations for inference
- Emulation
  - S32G multicore IPC emulation
  - Sysbus UFS device emulation support + seL4 UFS driver support for the QEMU UFS device

### SW Engineer/SoC + Platform Performance Architect (Google Cloud) @ Google
Jan 2020 – Jan 2023 | Sunnyvale, CA

- Currently working on the Google SoC project. Responsible for Google SoC performance simulation and for maintaining critical collaboration between the US teams and the Israel CI2 team to deliver Cedar (2nd gen SoC) POR study results and the Cedar tape-out.
- Initiated the first version of the Google Mesh Simulator under gem5. Implemented the important Cedar features in the Mesh XPs (input VCs, route computation, dual channel support, RNI support, E2E tests, output-unit connections, code-base refactoring, etc.). Mentored and ramped up multiple colleagues; worked with them on other necessary features (arbiter connections, credit links, the bypass feature, etc.).
Generated study results (performance, mesh utilization) for review meetings with the US and CI2 performance/architecture/design teams.
- Took responsibility for most of the initial mesh studies and verifications under gem5 (Cypress (1st gen SoC) mesh utilization correlation under various SLC hit rates, mesh back-pressure scheme study, gem5 arbitration scheme, buffer-size and link-latency verifications, gem5 request transaction ladder chart correlation, mesh req-latency avalanche point study, 3-cycle vs 2-cycle per-hop latency study, etc.).
- Contributed to most of the initial simulator infrastructure development in gem5, including but not limited to the stats collection protobuf framework and mesh performance/utilization visualization automation.
- Platform performance related projects: engaged in many aspects of the system architecture design for Google's internal services and cloud platforms (e.g. computing units, servers, storage, networking, accelerators) by applying computer architecture, OS, perf modeling, and data analysis skills.
  1. CCX-aware scheduling (study of the IPC, QPS, kernel scheduling latency, query latency, and throughput differences for high-tier Google workloads).
  2. Silent Data Corruption project.
  3. Hyperthreading performance and efficiency studies for various workloads on AMD/Intel processors.

### CPU Architect @ Intel Corporation
Jan 2016 – Jan 2020 | Santa Clara, CA

- Currently working on a multi-purpose companion core project for trending workloads under the Advanced Architecture Group
- Took responsibility for CPU OOO/EXE performance modeling in the C++ simulator for the evolutionary next generation core (NGC) development; worked with other CPU architects and design/validation engineers to guide the NGC design
- Developed path-finding features (mostly OOO/EXE units) in the CPU simulator (dual-dest uops study, port-ganging vs dual-dispatching study, double-banked PRF + freelists, EXE unit hibernation study, etc.); collected and analyzed performance data to help the micro-architecture team understand the benefits and trade-offs of the newly proposed micro-architectures

### CPU Performance Validation Architect/SW Engineer (SMI acquired by Intel) @ Intel Corporation
Jan 2015 – Jan 2016 | Santa Clara, CA

- Maintained, developed, and improved infrastructure and scripts: bug filing automation, dashboard label/ticket linking automation, data collection automation, benchmark performance drift tracking tools
- Performed Shasta CPU RTL debugging for OS boot; debugged and fixed psim simulator bugs (codec, decoder, front-end, scheduler, MMU, etc.); monitored test regressions for functional and performance issues
- Developed new features in the psim simulator code (schedulers with different cancellation policies, etc.), measured benchmark data with different psim/JIT configurations (different-size BBR schedulers, ins blocks with/without pairing and packing, etc.), added new features to the disassembler and to trace-driven mode, and collected performance data for presentation to the micro-architecture team.

### Graduate Research Assistant @ Princeton University
Jan 2011 – Jan 2015 | Princeton, New Jersey, United States

- Did a thorough performance measurement (e.g. IPC, cache miss rate, etc.)
of a new secure cache design (Newcache) as data cache, L2 cache and instruction cache for carefully selected cloud server benchmarks under gem5
- Reconstructed representative RSA instruction cache side-channel attacks (against libgcrypt 1.5.3 under Linux using a normal 8-way SA L1 I-cache), and ran experiments to evaluate Newcache's security mechanism as an instruction cache
- Reconstructed the hooking functions for each operation of the square-and-multiply implementation of RSA (in libgcrypt 1.5.3 under Linux), trained an SVM classifier, and used it to classify operations; the classification accuracy serves as a metric of the vulnerability of different cache configurations
- Visited the Institute of Parallel and Distributed Systems (IPADS) in Shanghai for 3 months, and tried to extend the I-cache side-channel attacks to ARM TrustZone with the TrustKernel OS developed by IPADS
- Extended representative side-channel techniques to GUI-related shared libraries (libgtk, libX11) under Linux
- Studied secure processor designs such as Bastion, Intel SGX, and ARM TrustZone, and tried to implement Bastion on gem5
- Built a regression framework and a regression report generator, which help the group store and compare the historical running times of benchmarks under different LLVM backend optimization techniques
- Implemented a theoretical offline x86 pass (similar to a compiler pass) that uses induction variable expansion to perform dynamic instruction renaming optimization
- Tried to extend the implementation of the renaming architecture under gem5; the main idea is to move work usually done by compiler optimizations, such as induction variable expansion, into a simple hardware component and evaluate the benefits

### Graduate Teaching Assistant @ Princeton University
Jan 2011 – Jan 2015 | Princeton, New Jersey, United States

Teaching Assistant for COS333: Advanced Programming Techniques by Prof.
Brian Kernighan - Spring 2013, Princeton University
- Held office hours, graded programming assignments, and held weekly status meetings for 10 different software engineering projects for web applications

Teaching Assistant for COS217: Introduction to Programming Systems by Prof. Aarti Gupta - Spring 2015, Princeton University
- Held office hours, studied course materials, did course projects, gave 1-hour precepts twice a week, and graded programming assignments

### Undergrad Research Assistant @ University of Michigan
Jan 2010 – Jan 2011 | Ann Arbor, Michigan, United States

- Designed and manufactured a control box for miniature Knudsen pumps
- Selected Siargo's flow sensors and Freescale's pressure sensors to measure the flow rates and pressures of the pumps
- Assembled a development board, microchip, voltage dividers, buffers, signal amplifiers, AC-DC converter and voltage scale-up circuit to support the sensors and LCD display, and to supply the Knudsen pumps with a working voltage from 0V to 9.6V
- Reconstructed the entire internal circuitry of a remote-controllable power supply, the GW-INSTEK-GPS-1850D
- Did embedded programming for the DAC and ADC ports of the dsPIC33FJ16GS502 microchip

### OGX member @ AIESEC SJTU
Jan 2008 – Jan 2009

- Matched Exchange Participants (EPs) in Mainland China with foreign companies that provide international intern positions
- Made short videos to introduce AIESEC and our Local Committee (LC) SJTU and to strengthen the relations between LCs and companies of different countries
- Recruited excellent and diligent students as our EPs

### Technical Support @ Parametric Technology Corporation
Jan 2008 – Jan 2008

- Translated technical and tutorial documents concerning Pro/E, Arbortext, Windchill and other software for customers and staff
- Wrote an application in VBScript that automatically generates rebate letters from a letter template for PTC's global resellers

## Education

### Master of Science - MS in Electrical Engineering (quit PhD)
Princeton
University

### Bachelor of Science - BS in Electrical and Computer Engineering
Shanghai Jiao Tong University

### Bachelor of Science - BS in Electrical Engineering
University of Michigan

## Contact & Social

- LinkedIn: https://linkedin.com/in/icelegend

---

Source: https://flows.cv/haofrank
JSON Resume: https://flows.cv/haofrank/resume.json
Last updated: 2026-04-12