Seksan Kiatsupaibul. An efficiency comparison of distance measures in K-Nearest Neighbors imputation for spatial data. King Mongkut's University of Technology North Bangkok. Central Library, 2022.
An efficiency comparison of distance measures in K-Nearest Neighbors imputation for spatial data
Abstract:
Spatial data are used in many fields. However, geography, topography, and the limitations of measuring instruments
cause missing values in spatial data sets, which hinders the analysis of association and causation. The
objective of this study is to compare the efficiency of distance measures for the K-Nearest Neighbors (KNN)
imputation method. The distance measures studied are 1) Euclidean distance (ECD), 2) Manhattan
distance (MHA), 3) Canberra distance (CBR), 4) Sorensen distance (SRS), 5) Euclidean distance of Cartesian
coordinates (EOC), and 6) arc length by the haversine function (ALH). The comparisons are made across different
amounts of missing values and across different values of the constant K (the number of nearest neighbors). The study is
performed on simulated data sets whose distributions replicate those of an actual hourly air quality data set retrieved
from the Department of Pollution Control. The statistical hypothesis tests show that imputation efficiency differs
significantly among the distance measures and among the numbers of nearest neighbors. The results show that the
distance measures fall into three groups. The first group consists of ECD, MHA, and CBR; the second group consists
of SRS; and the third group consists of EOC and ALH. The efficiencies differ significantly between the three groups
but are similar within each group. In addition, the first group is the most efficient, followed by the third group
and then the second. Furthermore, the results show that the number of nearest neighbors affects the efficiency:
efficiency increases with the number of nearest neighbors up to an optimal number, beyond which it decreases. The
interaction between the distance measure and the number of nearest neighbors also affects the efficiency.
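The six distance measures and the KNN imputation step named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract gives no formulas, so the standard textbook definitions are assumed (Sorensen in its Bray-Curtis form, EOC as the chord between points converted to 3-D Cartesian coordinates, ALH as the great-circle arc). The Earth radius of 6371 km, the use of an unweighted mean of the K neighbors, and all function names and sample coordinates are assumptions made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; not stated in the abstract


def euclidean(x, y):
    """ECD: straight-line distance between two coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """MHA: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def canberra(x, y):
    """CBR: terms where both coordinates are zero are skipped to avoid 0/0."""
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(x, y) if a != 0 or b != 0)


def sorensen(x, y):
    """SRS (Bray-Curtis form): sum|a - b| / sum|a + b|."""
    denom = sum(abs(a + b) for a, b in zip(x, y))
    return sum(abs(a - b) for a, b in zip(x, y)) / denom if denom else 0.0


def to_cartesian(lat_deg, lon_deg, radius=EARTH_RADIUS_KM):
    """Convert latitude/longitude in degrees to 3-D Cartesian coordinates."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (radius * math.cos(lat) * math.cos(lon),
            radius * math.cos(lat) * math.sin(lon),
            radius * math.sin(lat))


def euclidean_of_cartesian(p, q):
    """EOC: chord length between two (lat, lon) points on a sphere."""
    return euclidean(to_cartesian(*p), to_cartesian(*q))


def haversine_arc_length(p, q, radius=EARTH_RADIUS_KM):
    """ALH: great-circle arc length between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius * math.asin(min(1.0, math.sqrt(h)))


def knn_impute(target, stations, k, distance):
    """Impute the value at `target` as the unweighted mean of the k nearest
    stations that have an observed value (an assumed aggregation rule).

    stations: list of (coords, value) pairs; value is None when missing.
    """
    observed = [(distance(target, c), v) for c, v in stations if v is not None]
    if not observed:
        raise ValueError("no observed neighbors available")
    nearest = sorted(observed, key=lambda pair: pair[0])[:k]
    return sum(v for _, v in nearest) / len(nearest)


# Hypothetical stations as ((latitude, longitude), reading) pairs.
stations = [((0.0, 1.0), 10.0), ((0.0, 2.0), 20.0),
            ((0.0, 10.0), 100.0), ((0.0, 0.5), None)]
print(knn_impute((0.0, 0.0), stations, k=2, distance=haversine_arc_length))  # 15.0
```

Passing a different function as `distance` swaps the measure without changing the imputation step, which mirrors how the study compares measures across values of K.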
King Mongkut's University of Technology North Bangkok. Central Library
Address:
BANGKOK
Email:
library@kmutnb.ac.th
Created:
2022
Modified:
2025-09-16
Issued:
2025-09-16
Article
application/pdf
Bibliographic Citation:
In King Mongkut's University of Technology North Bangkok Faculty of Applied Science, Thai Statistical Association (TSA) and Statistics Cooperative Research Network (Statistics CRN). The Proceeding of International Conference on Applied Statistics (ICAS 2022) (pp.173-178). Bangkok : King Mongkut's University of Technology North Bangkok