Data stream clustering for low-cost machines

Christophe Cérin*, Keiji Kimura, Mamadou Sow


Research output: Article, peer-reviewed

2 Citations (Scopus)


Nowadays, the operations performed by Internet of Things (IoT) systems are no longer trivial, since they rely on more sophisticated devices than in the past. An IoT system is physically composed of connected computing, digital, and mechanical devices such as sensors or actuators. Most of the time, each of them incorporates an arithmetic logic unit that can pre-compute or compute on the device. To extract value from the data produced at the edge, the processing power offered by cloud computing is still used. However, streaming data to the cloud has limitations: the increased communication and data transfer introduce delays and consume network bandwidth. Clustering data is one example of a treatment that can be executed in the cloud. In this paper, we propose a methodology for solving the data stream clustering problem at the edge. Data stream clustering is defined as the clustering of data that arrive continuously, such as telephone records, multimedia data, sensor data, financial transactions, etc. Since we use low-cost and low-capacity devices, the objective is, given a sequence of points, to construct a good clustering of the stream using a small amount of memory and time. We propose a ‘windowing’ scheme coupled with a sampling scheme to meet this objective. Under our experimental conditions, experiments show that the clustering solutions can be controlled, with difficulties for time-stamped data but not for random data or data with well-delimited clusters. The main advantage of our scheme is that we cluster data “on the fly” with no knowledge or assumption regarding the available data. We do not assume that all the data are known before a batch-by-batch treatment. Our scheme also has the potential to be adapted to other classes of machine learning algorithms.
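
The abstract does not spell out the windowing and sampling schemes, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes fixed-size windows, uniform reservoir sampling inside each window, farthest-first seeding, and a few Lloyd (k-means) iterations warm-started from the previous window's centers. All names and parameters (`cluster_stream`, `window_size`, `sample_size`) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Pick k seeds: the first point, then repeatedly the point farthest from the seeds."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def lloyd(points, centers, iters=10):
    """Run a fixed number of Lloyd (k-means) iterations from the given centers."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # An empty group keeps its previous center.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers


def update(sample, centers, k):
    """Refine the centers with the sample drawn from one window."""
    if centers is None:
        if len(sample) < k:
            return None  # not enough points yet to seed k centers
        centers = farthest_first(sample, k)
    return lloyd(sample, centers)


def cluster_stream(stream, k, window_size=200, sample_size=40, seed=0):
    """Cluster a stream while holding at most `sample_size` points in memory."""
    rng = random.Random(seed)
    centers, sample, seen = None, [], 0
    for point in stream:
        # Reservoir sampling (Algorithm R) within the current window.
        if seen < sample_size:
            sample.append(point)
        else:
            j = rng.randint(0, seen)
            if j < sample_size:
                sample[j] = point
        seen += 1
        if seen == window_size:  # window is full: cluster its sample, then reset
            centers = update(sample, centers, k)
            sample, seen = [], 0
    if sample:  # trailing partial window
        centers = update(sample, centers, k)
    return centers
```

Calling `cluster_stream(points, k=2)` on any iterable of equal-length numeric tuples returns a list of `k` center tuples, or `None` if the stream is too short to seed them. Because each window is clustered only from its own sample, this sketch gradually forgets older data; whether that matches the paper's treatment of time-stamped data is not stated in the abstract.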

Journal: Journal of Parallel and Distributed Computing
Publication status: Published - Aug 2022

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computer Networks and Communications
  • Artificial Intelligence

