Duong Van Hieu. A combination of graph-based and cell-based clustering techniques for big datasets. Doctoral Degree (Information Technology). King Mongkut's University of Technology North Bangkok. Central Library. King Mongkut's University of Technology North Bangkok, 2016.
A combination of graph-based and cell-based clustering techniques for big datasets
Abstract:
Analyzing big datasets poses significant challenges for data scientists. Setting up a big data project raises technological difficulties in choosing the right platform technologies and suitable algorithms. This study proposes a fast outlier detection algorithm for big datasets (Cell-RDOS) and two clustering algorithms for big datasets on a limited-memory computer (Cell-MST-based and Weighted Cell-MST-based). The Cell-RDOS algorithm combines cell-based algorithms with a revised version of the ranking-based outlier detection algorithm with various depths (RDOS). It produces the same results as the RDOS algorithm while reducing execution time by up to 99% on big datasets. The proposed clustering algorithms combine cell-based algorithms, MST-based algorithms, and the K-means clustering algorithm. They outperform many other algorithms in terms of memory usage, accuracy, and speed. First, they can reduce the required memory size by up to 99% compared to previous methods such as the Similarity-based, Decision-theoretic Rough Set, and MST-based algorithms.
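The abstract does not spell out how the cell-based and MST-based steps fit together, so the following is only a minimal sketch of the general idea, not the thesis's algorithm. It assumes that "cell-based" means binning points into a uniform grid and keeping one summary (count and centroid) per non-empty cell, and that the minimum spanning tree is then built over those cell centroids instead of the raw points. The names `summarize_cells`, `mst_edges`, and `cell_size` are hypothetical and chosen for illustration.

```python
from math import dist, floor


def summarize_cells(points, cell_size):
    """Bin points into a uniform grid and keep (count, centroid) per cell.

    Assumption: this grid summary stands in for the "cell-based" step;
    the thesis may define cells differently.
    """
    cells = {}
    for p in points:
        key = tuple(floor(x / cell_size) for x in p)
        count, total = cells.get(key, (0, [0.0] * len(p)))
        cells[key] = (count + 1, [t + x for t, x in zip(total, p)])
    return {
        key: (count, tuple(t / count for t in total))
        for key, (count, total) in cells.items()
    }


def mst_edges(centroids):
    """Prim's algorithm on the complete Euclidean graph over centroids.

    Returns a list of (parent_index, child_index, length) tuples.
    """
    n = len(centroids)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [(float("inf"), -1)] * n  # (distance to tree, parent index)
    best[0] = (0.0, -1)
    edges = []
    for _ in range(n):
        # Pick the closest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i][0])
        in_tree[u] = True
        if best[u][1] >= 0:
            edges.append((best[u][1], u, best[u][0]))
        # Relax distances of the remaining vertices through u.
        for v in range(n):
            if not in_tree[v]:
                d = dist(centroids[u], centroids[v])
                if d < best[v][0]:
                    best[v] = (d, u)
    return edges
```

Under these assumptions, memory scales with the number of non-empty cells rather than the number of points, which is one plausible way such a method could save memory. How the thesis weights cells or combines the tree with K-means is not stated in the abstract and is not modeled here.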