OPTIMISING MACHINE LEARNING TECHNIQUES FOR IRREGULAR SAMPLING
DOI: https://doi.org/10.52152/5943t304

Keywords: Machine Learning, Random Forest, Simple Linear Interpolation, XGBoost

Abstract
This study examines how simple linear interpolation (SLI) and natural-neighbour interpolation (NNI) affect machine learning model performance on irregularly sampled commercial data. The Seoul bike-sharing rental dataset is pre-processed with SLI and NNI to manage missing values and inconsistencies. The two interpolation methods are then evaluated by constructing several machine learning models, including XGBoost, Random Forest, k-nearest neighbours (KNN), and a stacking model. Results show that SLI consistently improved accuracy, particularly in the stacking model, as demonstrated by the area under the receiver operating characteristic curve (AUC) and the Kolmogorov-Smirnov (KS) statistic. Conversely, NNI produced more variable outcomes, occasionally reducing performance. These findings underscore the critical role of data pre-processing in machine learning, particularly in domains where data irregularities are prevalent, and provide empirical support for employing interpolation methods to improve model reliability and accuracy in business data modelling.
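As a rough illustration of the pipeline the abstract describes, the sketch below (not the authors' code) linearly interpolates gaps in an irregularly sampled feature, fits a small stacking ensemble, and scores it with AUC and the KS statistic. The synthetic features, the binary "high demand" target, and the choice of scikit-learn base learners (XGBoost omitted here to avoid an extra dependency) are all illustrative assumptions.

# Minimal sketch: SLI imputation, stacking model, AUC and KS evaluation.
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, irregularly sampled toy data with missing readings
# (a stand-in for the Seoul bike-sharing features used in the study).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "temperature": rng.normal(20, 8, n),
    "humidity": rng.uniform(20, 90, n),
})
df.loc[rng.choice(n, 60, replace=False), "temperature"] = np.nan  # gaps

# Simple linear interpolation (SLI) over the ordered index fills the gaps.
df["temperature"] = df["temperature"].interpolate(method="linear", limit_direction="both")

# Illustrative binary target: "high demand" hour vs. not.
y = (df["temperature"] + 0.1 * df["humidity"] + rng.normal(0, 3, n) > 25).astype(int)

X_train, X_test, y_train, y_test = train_test_split(df, y, test_size=0.3, random_state=0)

# Stacking ensemble over base learners of the kind named in the abstract.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=7)),
    ],
    final_estimator=LogisticRegression(),
)
stack.fit(X_train, y_train)
scores = stack.predict_proba(X_test)[:, 1]

# Evaluation: AUC, and KS as the maximum separation between the score
# distributions of the positive and negative classes.
pos = scores[y_test.to_numpy() == 1]
neg = scores[y_test.to_numpy() == 0]
auc = roc_auc_score(y_test, scores)
ks = ks_2samp(pos, neg).statistic
print(f"AUC = {auc:.3f}, KS = {ks:.3f}")

Running the same script with a different imputation step (for example, natural-neighbour interpolation) and comparing the resulting AUC and KS values mirrors the comparison the study reports.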
License
Copyright (c) 2025 Lex localis - Journal of Local Self-Government

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.