Malaysian Journal of Mathematical Sciences, January 2020, Vol. 14, No. 1


New Approaches to Normalization Techniques to Enhance K-Means Clustering Algorithm

Dalatu, P. I. and Midi, H.

Corresponding Email: dalatup@gmail.com

Received date: 25 December 2017
Accepted date: 25 October 2019

Abstract:
Clustering is one of the fundamental tools of data mining; it lets researchers uncover the natural grouping of attributes in datasets. The main aim of clustering is to identify similarities and structure within a large dataset by partitioning the data into clusters. Distance measures such as the Euclidean distance should not be used without first normalizing the dataset. A limitation of both the Min-Max (MM) and Decimal Scaling (DS) normalization methods is that the minimum and maximum values may fall outside the sample when the dataset is unknown. We therefore propose two new normalization approaches that prevent attributes with initially large magnitudes from outweighing attributes with initially smaller magnitudes: the New Approach to Min-Max (NAMM) and the New Approach to Decimal Scaling (NADS). To evaluate the performance of the proposed approaches, we consider a simulation study and applications to real data. The two proposed approaches perform well compared with the existing methods: they achieve nearly the maximum score on the average external validity measures, record lower computing time, and assign almost all object points to their cluster centres. The results therefore indicate that NAMM and NADS perform better as data preprocessing methods that down-weight the magnitudes of large values.

Keywords: Normalization, k-means, simulation, clustering
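
For readers unfamiliar with the two baseline methods named in the abstract, the following is a minimal sketch of conventional Min-Max (MM) and Decimal Scaling (DS) normalization in plain Python. It is not the authors' code and does not implement NAMM or NADS, whose formulas are not given in the abstract. The function names `min_max` and `decimal_scaling`, the target range [0, 1], and the sample values are assumptions made for illustration only.

```python
def min_max(values, new_min=0.0, new_max=1.0):
    """Conventional Min-Max: map the observed [min, max] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant attribute: every value maps to the lower bound (a convention chosen here).
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


def decimal_scaling(values):
    """Conventional Decimal Scaling: divide by 10**j, where j is the smallest
    non-negative integer such that every scaled absolute value is below 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10**j >= 1:
        j += 1
    return [v / 10**j for v in values]


# Hypothetical attributes on very different scales (illustrative values only).
income = [20000, 50000, 98000]
age = [25, 40, 60]

# After normalization both attributes lie on comparable scales, so neither
# dominates a Euclidean distance purely because of its units.
print(min_max(income), min_max(age))
print(decimal_scaling(income), decimal_scaling(age))
```

Both functions take the minimum and maximum from the sample at hand, which is the dependence on observed extremes that the abstract identifies as a limitation when future values fall outside that range.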

  


