• Designed and implemented a time-sequenced Bird’s-Eye-View (BEV) model for multi-task perception, enhancing object detection, drivable space segmentation, velocity estimation, and trajectory prediction, achieving a >10% AP improvement in near-range detection.
• Developed a transformer-based post-fusion model to refine multi-sensor (LiDAR, camera, radar) detection results, enabling obstacle-level refinement and outputting motion states.
• Applied generative models (VAE, conditional diffusion models) to high-precision lane detection, and used VQ-VAE for efficient HD map data compression.
• Optimized backbone architectures, including ResNet and Transformers, and implemented custom variants with tailored hyperparameters to maximize performance.
• Deployed models on embedded systems via ONNX/TensorRT conversion, accelerating inference through quantization and enhancing accuracy via Quantization-Aware Training (QAT).