SIRSYN: Improved Synthetic Sampling for Imbalanced Data Using SIR

Akiyoshi Sutou*, Jinfang Wang

*Corresponding author for this work

Research output: Conference contribution

Abstract

Imbalanced data, in which the class labels of a training dataset are significantly skewed, often reduce prediction accuracy for minority classes when traditional algorithms are used. To address this, various oversampling techniques such as SMOTE have been proposed, but they often generate minority-class data in majority-class regions. We propose an improved method, SIRSYN, which applies the Sampling Importance Resampling (SIR) method to existing synthetic data generation techniques. This reduces the risk of generating data in inappropriate locations. Using 60 imbalanced datasets from the KEEL repository, we compared SIRSYN with 13 existing methods. SIRSYN achieved superior G-means and F1 scores, indicating its effectiveness in enhancing oversampling techniques for imbalanced classification tasks.
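The general idea of combining SIR with a synthetic generator can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how SIRSYN defines its importance weights, so the weight used here (the fraction of minority points among a candidate's k nearest training points) is an assumption chosen only for illustration. The function names `smote_candidates`, `importance_weights`, and `sir_oversample`, and the `pool_factor` parameter, are likewise hypothetical.

```python
import numpy as np

def smote_candidates(X_min, n_candidates, k=5, rng=None):
    """SMOTE-style proposals: interpolate between a minority point and
    one of its k nearest minority neighbours (needs >= 2 minority points)."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise distances among minority points; exclude self-matches.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_candidates)
    neigh = nn[base, rng.integers(0, k, size=n_candidates)]
    gap = rng.random((n_candidates, 1))
    return X_min[base] + gap * (X_min[neigh] - X_min[base])

def importance_weights(cands, X, y, minority_label, k=5):
    """ASSUMED weight (not from the paper): share of minority points among
    the k nearest training points, normalised to sum to 1."""
    d = np.linalg.norm(cands[:, None, :] - X[None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, :k]
    frac = (y[nn] == minority_label).mean(axis=1)
    w = frac + 1e-12  # small floor so the weights never sum to zero
    return w / w.sum()

def sir_oversample(X, y, minority_label, n_new, pool_factor=5, k=5, seed=0):
    """Sampling Importance Resampling over a pool of synthetic candidates."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    # 1. Sampling: draw a candidate pool larger than the number needed.
    cands = smote_candidates(X_min, pool_factor * n_new, k=k, rng=rng)
    # 2. Importance: score each candidate.
    w = importance_weights(cands, X, y, minority_label, k=k)
    # 3. Resampling: draw the final points in proportion to their weights.
    idx = rng.choice(len(cands), size=n_new, replace=True, p=w)
    return cands[idx]
```

Under this assumed weighting, candidates that land among majority-class points receive a weight close to zero and are rarely kept, which is the kind of filtering the abstract describes. Any other proposal generator or weighting scheme could be substituted without changing the three-step structure.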

Original language: English
Title of host publication: Proceedings - 2024 IEEE International Conference on Knowledge Graph, ICKG 2024
Editors: Huajun Chen, Anna Fensel, Xingquan Zhu, Roger Wattenhofer, Xindong Wu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 342-351
Number of pages: 10
ISBN (Electronic): 9798331508821
DOI
Publication status: Published - 2024
Event: 15th IEEE International Conference on Knowledge Graph, ICKG 2024 - Abu Dhabi, United Arab Emirates
Duration: 11 Dec 2024 to 12 Dec 2024

Publication series

Name: Proceedings - 2024 IEEE International Conference on Knowledge Graph, ICKG 2024

Conference

Conference: 15th IEEE International Conference on Knowledge Graph, ICKG 2024
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 11 Dec 2024 to 12 Dec 2024

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
