Gloriana Joseph Monko and Masaomi Kimura, Shibaura Institute of Technology, Japan
This research presents a methodology for estimating the epsilon (eps) and minimum samples (MinPts) parameters of the DBSCAN clustering algorithm, combining stratified sampling with a grid search. Across nine diverse datasets, our method estimated eps more precisely than conventional techniques. Because stratified sampling accounts for variations in structure and density within a dataset, it leads to better cluster formation. The k-nearest distance graph further refines this picture of the data's densities, and the method highlights the importance of the individual strata within each dataset. We also introduce a grid-search technique for estimating MinPts, guided by the silhouette score, which challenges traditional rule-of-thumb settings. Our approach suggests setting MinPts flexibly according to the dataset's specific attributes, and it improved clustering results in both the SS-DBSCAN and traditional DBSCAN frameworks. This study highlights the potential of parameter estimation to improve clustering outcomes and computational efficiency.
Epsilon selection, MinPts determination, stratified sampling, grid-search, SS-DBSCAN.
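
To make the eps-estimation idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the strata are supplied by the caller as one label per point, draws the same fraction from every stratum, builds the sorted k-nearest distance curve on the sample, and reads eps off the curve with a simple "farthest below the chord" knee heuristic. The function names (`stratified_sample`, `k_distances`, `knee_eps`), the way strata are defined, and the knee rule are all illustrative assumptions; the MinPts grid search is not covered here.

```python
import math
import random


def stratified_sample(points, strata, fraction, seed=0):
    """Draw the same fraction from every stratum (at least one point each).

    `strata` holds one label per point; how strata are defined is an
    assumption of this sketch, not something taken from the paper.
    """
    rng = random.Random(seed)
    groups = {}
    for point, label in zip(points, strata):
        groups.setdefault(label, []).append(point)
    sample = []
    for label in sorted(groups):
        members = groups[label]
        n = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, n))
    return sample


def k_distances(points, k):
    """Return the ascending list of each point's distance to its k-th nearest neighbour."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)


def knee_eps(sorted_kdist):
    """Pick the curve value that lies farthest below the chord joining its endpoints.

    This is one simple knee heuristic, chosen for illustration only.
    """
    n = len(sorted_kdist)
    first, last = sorted_kdist[0], sorted_kdist[-1]
    best_index, best_gap = 0, -1.0
    for i, value in enumerate(sorted_kdist):
        chord = first + (last - first) * i / (n - 1)
        gap = chord - value
        if gap > best_gap:
            best_index, best_gap = i, gap
    return sorted_kdist[best_index]


# Synthetic demo data: one dense blob and one sparse blob, each treated as a stratum.
rng = random.Random(1)
dense = [(rng.gauss(0.0, 0.2), rng.gauss(0.0, 0.2)) for _ in range(60)]
sparse = [(rng.gauss(5.0, 1.0), rng.gauss(5.0, 1.0)) for _ in range(40)]
points = dense + sparse
strata = [0] * 60 + [1] * 40

sample = stratified_sample(points, strata, fraction=0.5)
kdist = k_distances(sample, k=4)
eps = knee_eps(kdist)
print(f"sample size = {len(sample)}, estimated eps = {eps:.3f}")
```

Sampling per stratum keeps the sparse region represented in the k-distance curve even when the dense region dominates the dataset, which is the motivation the abstract gives for stratification. The brute-force neighbour search is quadratic in the sample size and is only meant for small illustrative inputs.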