TY - GEN
T1 - Benchmark datasets for fault detection and classification in sensor data
AU - De Bruijn, Bas
AU - Nguyen, Tuan Anh
AU - Bucur, Doina
AU - Tei, Kenji
N1 - Publisher Copyright:
© Copyright 2016 by SCITEPRESS- Science and Technology Publications, Lda. All rights reserved.
PY - 2016
Y1 - 2016
N2 - Data measured and collected from embedded sensors often contains faults, i.e., data points which are not an accurate representation of the physical phenomenon monitored by the sensor. These data faults may be caused by deployment conditions outside the operational bounds for the node, and short- or long-term hardware, software, or communication problems. On the other hand, the applications will expect accurate sensor data, and recent literature proposes algorithmic solutions for the fault detection and classification in sensor data. In order to evaluate the performance of such solutions, however, the field lacks a set of benchmark sensor datasets. A benchmark dataset ideally satisfies the following criteria: (a) it is based on real-world raw sensor data from various types of sensor deployments; (b) it contains (natural or artificially injected) faulty data points reflecting various problems in the deployment, including missing data points; and (c) all data points are annotated with the ground truth, i.e., whether or not the data point is accurate, and, if faulty, the type of fault. We prepare and publish three such benchmark datasets, together with the algorithmic methods used to create them: A dataset of 280 temperature and light subsets of data from 10 indoor Intel Lab sensors, a dataset of 140 subsets of outdoor temperature data from SensorScope sensors, and a dataset of 224 subsets of outdoor temperature data from 16 Smart Santander sensors. The three benchmark datasets total 5.783.504 data points, containing injected data faults of the following types known from the literature: random, malfunction, bias, drift, polynomial drift, and combinations. We present algorithmic procedures and a software tool for preparing further such benchmark datasets.
AB - Data measured and collected from embedded sensors often contains faults, i.e., data points which are not an accurate representation of the physical phenomenon monitored by the sensor. These data faults may be caused by deployment conditions outside the operational bounds for the node, and short- or long-term hardware, software, or communication problems. On the other hand, the applications will expect accurate sensor data, and recent literature proposes algorithmic solutions for the fault detection and classification in sensor data. In order to evaluate the performance of such solutions, however, the field lacks a set of benchmark sensor datasets. A benchmark dataset ideally satisfies the following criteria: (a) it is based on real-world raw sensor data from various types of sensor deployments; (b) it contains (natural or artificially injected) faulty data points reflecting various problems in the deployment, including missing data points; and (c) all data points are annotated with the ground truth, i.e., whether or not the data point is accurate, and, if faulty, the type of fault. We prepare and publish three such benchmark datasets, together with the algorithmic methods used to create them: A dataset of 280 temperature and light subsets of data from 10 indoor Intel Lab sensors, a dataset of 140 subsets of outdoor temperature data from SensorScope sensors, and a dataset of 224 subsets of outdoor temperature data from 16 Smart Santander sensors. The three benchmark datasets total 5.783.504 data points, containing injected data faults of the following types known from the literature: random, malfunction, bias, drift, polynomial drift, and combinations. We present algorithmic procedures and a software tool for preparing further such benchmark datasets.
KW - Benchmark dataset
KW - Data quality
KW - Fault tolerance
KW - Sensor data
KW - Sensor data labelling
UR - http://www.scopus.com/inward/record.url?scp=84971393455&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84971393455&partnerID=8YFLogxK
U2 - 10.5220/0005637901850195
DO - 10.5220/0005637901850195
M3 - Conference contribution
AN - SCOPUS:84971393455
T3 - SENSORNETS 2016 - Proceedings of the 5th International Confererence on Sensor Networks
SP - 185
EP - 195
BT - SENSORNETS 2016 - Proceedings of the 5th International Confererence on Sensor Networks
A2 - Ahrens, Andreas
A2 - Postolache, Octavian
A2 - Benavente-Peces, Cesar
PB - SciTePress
T2 - 5th International Confererence on Sensor Networks, SENSORNETS 2016
Y2 - 19 February 2016 through 21 February 2016
ER -