Research: Optimizing neural network inference, especially for transformer models such as BERT, on edge hardware through heterogeneous computing (CPU+GPU) and quantization.
Thesis: Low-Latency BERT Inference for Heterogeneous Multi-Processor Edge Devices
•Research focused on accelerating edge inference through heterogeneous computing.
•Developed a genetic algorithm for optimizing the assignment of neural network operations to the CPU and GPU of an edge SoC, evaluated on a HiKey970 development board.
•Combined heterogeneous computing with quantization, developing an algorithm that searches for quantization configurations with Pareto-optimal accuracy/latency trade-offs.
•Developed an ARM Compute Library implementation of BERT for latency measurements.
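The CPU/GPU assignment search above can be illustrated with a minimal genetic-algorithm sketch. This is not the thesis implementation: the per-operation latency tables, the transfer penalty, and the GA hyperparameters are all hypothetical stand-ins for values that would come from on-device profiling (e.g. on a HiKey970).

```python
import random

# Hypothetical per-operation latency tables (ms) for CPU and GPU;
# real values would come from profiling each operation on-device.
CPU_LAT = [4.0, 2.5, 6.0, 1.2, 3.3, 5.1, 2.0, 4.4]
GPU_LAT = [1.5, 3.0, 2.2, 2.8, 1.1, 2.6, 3.5, 1.9]
SWITCH_COST = 0.5  # assumed penalty for moving data between processors

def latency(assign):
    """Estimated end-to-end latency of an assignment (0 = CPU, 1 = GPU)."""
    total = sum(GPU_LAT[i] if a else CPU_LAT[i] for i, a in enumerate(assign))
    # Add a transfer penalty each time consecutive ops change processor.
    total += SWITCH_COST * sum(a != b for a, b in zip(assign, assign[1:]))
    return total

def evolve(pop_size=20, generations=50, mut_rate=0.1, seed=0):
    rng = random.Random(seed)
    n = len(CPU_LAT)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=latency)
        survivors = pop[: pop_size // 2]      # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)         # one-point crossover
            child = p1[:cut] + p2[cut:]
            for i in range(n):                # bit-flip mutation
                if rng.random() < mut_rate:
                    child[i] = 1 - child[i]
            children.append(child)
        pop = survivors + children
    return min(pop, key=latency)

best = evolve()
print(best, latency(best))
```

The fitness function here only models compute time plus a fixed switch cost; a real objective would also account for memory traffic and parallel execution across the two processors.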
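The Pareto-optimal quantization search can likewise be sketched as a dominance filter over candidate configurations. The configuration names and the accuracy/latency numbers below are illustrative assumptions, not measured results; in practice each candidate would be profiled on-device.

```python
# Hypothetical candidate quantization configurations:
# (name, accuracy %, latency ms). Real numbers would come from
# evaluating each quantized model on the target hardware.
candidates = [
    ("fp32",      92.0, 200.0),
    ("int8-all",  90.5,  80.0),
    ("int8-attn", 91.5, 120.0),
    ("int4-all",  85.0,  60.0),
    ("mixed-8/4", 89.0,  70.0),
    ("int4-ffn",  88.0,  90.0),  # dominated by mixed-8/4
]

def pareto_front(configs):
    """Keep configurations not dominated on both accuracy and latency."""
    front = []
    for name, acc, lat in configs:
        dominated = any(
            a >= acc and l <= lat and (a > acc or l < lat)
            for _, a, l in configs
        )
        if not dominated:
            front.append((name, acc, lat))
    return front

for cfg in pareto_front(candidates):
    print(cfg)
```

A config survives only if no other config is at least as accurate and at least as fast, and strictly better on one axis; the surviving set is the accuracy/latency trade-off curve a deployer chooses from.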