TY - JOUR
T1 - Data collection through translation network based on end-to-end deep learning for autonomous driving
AU - Zhang, Zelin
AU - Ohya, Jun
N1 - Publisher Copyright: © 2021, Society for Imaging Science and Technology
PY - 2021
Y1 - 2021
AB - To avoid manually collecting the large amount of labeled image data needed to train autonomous driving models, this paper proposes a novel automatic method for collecting annotated image data for autonomous driving through a translation network that transforms simulation CG images into real-world images. The translation network has an end-to-end structure that contains two encoder-decoder networks. The front part of the translation network represents the structure of the original simulation CG image as a semantic segmentation. The rear part of the network then translates the segmentation into a real-world image by applying a cGAN. After training, the translation network has learned a mapping from simulation CG pixels to real-world image pixels. To confirm the validity of the proposed system, we conducted three experiments under different learning policies, evaluating the MSE of the steering angle and vehicle speed. The first experiment demonstrates that the L1+cGAN loss performs best among the loss functions tested in the translation network. The second experiment, conducted under different learning policies, shows that the ResNet architecture works best. The third experiment demonstrates that a model trained with the real-world images generated by the translation network still works well in the real world. Together, the experimental results demonstrate the validity of the proposed method.
UR - http://www.scopus.com/inward/record.url?scp=85111585990&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85111585990&partnerID=8YFLogxK
U2 - 10.2352/ISSN.2470-1173.2021.17.AVM-115
DO - 10.2352/ISSN.2470-1173.2021.17.AVM-115
M3 - Conference article
AN - SCOPUS:85111585990
SN - 2470-1173
VL - 2021
JO - IS&T International Symposium on Electronic Imaging Science and Technology
JF - IS&T International Symposium on Electronic Imaging Science and Technology
IS - 18
M1 - 115
T2 - 2021 3D Imaging and Applications, 3DIA 2021
Y2 - 11 January 2021 through 28 January 2021
ER -