TY - JOUR
T1 - Single Camera Face Position-Invariant Driver’s Gaze Zone Classifier Based on Frame-Sequence Recognition Using 3D Convolutional Neural Networks
AU - Lollett, Catherine
AU - Kamezaki, Mitsuhiro
AU - Sugano, Shigeki
N1 - Publisher Copyright:
© 2022 by the authors.
PY - 2022/8
Y1 - 2022/8
N2 - Estimating the driver’s gaze in a natural real-world setting can be problematic for different challenging scenario conditions. For example, faces will undergo facial occlusions, illumination, or various face positions while driving. In this effort, we aim to reduce misclassifications in driving situations when the driver has different face distances regarding the camera. Three-dimensional Convolutional Neural Networks (CNN) models can make a spatio-temporal driver’s representation that extracts features encoded in multiple adjacent frames that can describe motions. This characteristic may help ease the deficiencies of a per-frame recognition system due to the lack of context information. For example, the front, navigator, right window, left window, back mirror, and speed meter are part of the known common areas to be checked by drivers. Based on this, we implement and evaluate a model that is able to detect the head direction toward these regions having various distances from the camera. In our evaluation, the 2D CNN model had a mean average recall of 74.96% across the three models, whereas the 3D CNN model had a mean average recall of 87.02%. This result show that our proposed 3D CNN-based approach outperforms a 2D CNN per-frame recognition approach in driving situations when the driver’s face has different distances from the camera.
AB - Estimating the driver’s gaze in a natural real-world setting can be problematic for different challenging scenario conditions. For example, faces will undergo facial occlusions, illumination, or various face positions while driving. In this effort, we aim to reduce misclassifications in driving situations when the driver has different face distances regarding the camera. Three-dimensional Convolutional Neural Networks (CNN) models can make a spatio-temporal driver’s representation that extracts features encoded in multiple adjacent frames that can describe motions. This characteristic may help ease the deficiencies of a per-frame recognition system due to the lack of context information. For example, the front, navigator, right window, left window, back mirror, and speed meter are part of the known common areas to be checked by drivers. Based on this, we implement and evaluate a model that is able to detect the head direction toward these regions having various distances from the camera. In our evaluation, the 2D CNN model had a mean average recall of 74.96% across the three models, whereas the 3D CNN model had a mean average recall of 87.02%. This result show that our proposed 3D CNN-based approach outperforms a 2D CNN per-frame recognition approach in driving situations when the driver’s face has different distances from the camera.
KW - convolutional neural networks
KW - driver monitoring
KW - gaze classification
UR - http://www.scopus.com/inward/record.url?scp=85136343186&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85136343186&partnerID=8YFLogxK
U2 - 10.3390/s22155857
DO - 10.3390/s22155857
M3 - Article
C2 - 35957412
AN - SCOPUS:85136343186
SN - 1424-8220
VL - 22
JO - Sensors
JF - Sensors
IS - 15
M1 - 5857
ER -