Hung-Hao Chen1, Chia-Hung Wang1, Hsueh-Wei Chen1, Pei-Yung Hsiao2, Li-Chen Fu1
and Yi-Feng Su3, 1National Taiwan University, Taiwan, 2National University of Kaohsiung,
Taiwan, 3Automotive Research and Testing Center (ARTC), Taiwan
Current fusion-based methods transform LiDAR data into bird's eye view (BEV) representations or 3D voxels, leading to information loss and the heavy computational cost of 3D convolution. In contrast, we directly consume raw point clouds and perform fusion between the two modalities. We employ the concept of a region proposal network to generate proposals from each of the two streams. To make the two sensors compensate for each other's weaknesses, we utilize the calibration parameters to project proposals from one stream onto the other. With the proposed multi-scale feature aggregation module, we combine the extracted region-of-interest-level (RoI-level) features of the RGB stream from different receptive fields, thereby enriching the features. Experiments on the KITTI dataset show that our proposed network outperforms other fusion-based methods, with meaningful improvements over 3D object detection methods under challenging settings.
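The cross-stream projection mentioned above relies on standard camera calibration. As a rough illustration (not the paper's actual code), a KITTI-style projection of 3D points in rectified camera coordinates onto the image plane with a 3x4 projection matrix P could be sketched as:

```python
import numpy as np

def project_to_image(pts_3d, P):
    """Project Nx3 points in rectified camera coordinates onto the image
    plane using a 3x4 projection matrix P (KITTI-style calibration)."""
    n = pts_3d.shape[0]
    pts_h = np.hstack([pts_3d, np.ones((n, 1))])  # homogeneous coords, Nx4
    pts_2d = pts_h @ P.T                          # Nx3 projected points
    return pts_2d[:, :2] / pts_2d[:, 2:3]         # perspective divide

# Toy example: a unit-focal-length projection matrix (hypothetical values)
P = np.array([[1., 0., 0., 0.],
              [0., 1., 0., 0.],
              [0., 0., 1., 0.]])
pts = np.array([[2., 1., 4.]])   # a point 4 m in front of the camera
print(project_to_image(pts, P))  # pixel coordinates of the projected point
```

Projecting the eight corners of a 3D proposal this way and taking their 2D bounding box is one common way to transfer a LiDAR-stream proposal onto the RGB stream.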
Machine Learning, 3D Object Detection, Data Fusion, Autonomous Driving.