Abstract
Mining outliers in high-dimensional data remains an open problem because of the peculiarities of high dimensionality. Existing full-space methods find distinct outliers but neglect those hidden in subspaces. Subspace-based approaches detect most outliers that are apparent in low-dimensional spaces, but miss outliers that are invisible in those subspaces. This paper proposes a novel two-phase inspection model. The first phase measures the density of each object's neighbors in subspaces to find low-dimensional outliers. The second phase evaluates the degree of deviation of neighbors in connected subspaces. The previously undiscovered outliers disperse quickly and scatter more than their neighbors. We analyze the results of the two phases statistically and merge them into a single score for each object. The objects with the top scores are reported as outliers. Evaluation on synthetic and real data sets shows that our proposal outperforms state-of-the-art algorithms on high-dimensional outlier detection.
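The two-phase idea can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not define how subspaces are chosen, how density or deviation is measured, or how the two results are merged. The sketch assumes that subspaces are consecutive axis pairs, that "connected" subspaces are pairs sharing one axis, that sparsity is approximated by the mean k-nearest-neighbor distance, and that the two phases are merged by summing z-scores. All function names and the parameter `k` are hypothetical.

```python
import numpy as np

def knn(P, k):
    """Return indices and distances of the k nearest neighbours of each row (self excluded)."""
    d = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    idx = np.argsort(d, axis=1)[:, :k]
    return idx, np.take_along_axis(d, idx, axis=1)

def zscore(v):
    """Standardise a score vector; a constant vector maps to zeros."""
    s = v.std()
    return (v - v.mean()) / s if s > 0 else np.zeros_like(v)

def two_phase_scores(X, k=5):
    n, d = X.shape
    if d < 3:
        raise ValueError("need at least 3 dimensions to form connected subspaces")
    # Assumption: subspaces are consecutive axis pairs (0,1), (1,2), ...;
    # neighbouring pairs share one axis and are treated as "connected".
    subspaces = [[j, j + 1] for j in range(d - 1)]
    phase1, phase2 = [], []
    prev_idx = None
    for dims in subspaces:
        P = X[:, dims]
        idx, dist = knn(P, k)
        # Phase 1 (assumed measure): a sparse neighbourhood gives a large mean k-NN distance.
        phase1.append(zscore(dist.mean(axis=1)))
        # Phase 2 (assumed measure): how far the neighbours found in the previous
        # subspace lie from the object once projected into this connected subspace.
        if prev_idx is not None:
            spread = np.linalg.norm(P[prev_idx] - P[:, None, :], axis=2).mean(axis=1)
            phase2.append(zscore(spread))
        prev_idx = idx
    # Keep the strongest evidence per phase, then merge (assumed rule: sum of z-scores).
    s1 = np.max(np.stack(phase1), axis=0)
    s2 = np.max(np.stack(phase2), axis=0)
    return zscore(s1) + zscore(s2)

# Usage: 200 Gaussian inliers plus one planted outlier at index 200.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 4)), np.full((1, 4), 8.0)])
scores = two_phase_scores(X, k=5)
top = int(np.argmax(scores))
```

Taking the maximum over subspaces within each phase is one design choice among several; it lets an object that is unusual in a single subspace keep a high score instead of being averaged away.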
Original language | English |
---|---|
Title of host publication | International Conference on Information and Knowledge Management, Proceedings |
Publisher | Association for Computing Machinery |
Pages | 57-62 |
Number of pages | 6 |
Volume | 2014-November |
Edition | November |
Publication status | Published - 2014 Nov 3 |
Event | 7th PhD Workshop in Information and Knowledge Management, PIKM 2014, in Conjunction with the ACM CIKM 2014 Conference - Shanghai, China Duration: 2014 Nov 3 → … |
Other
Other | 7th PhD Workshop in Information and Knowledge Management, PIKM 2014, in Conjunction with the ACM CIKM 2014 Conference |
---|---|
Country/Territory | China |
City | Shanghai |
Period | 14/11/3 → … |
Keywords
- Connected subspace
- Dimensional projection
- High dimension
- Outlier score
ASJC Scopus subject areas
- Business, Management and Accounting(all)
- Decision Sciences(all)