OPTIMISING MACHINE LEARNING TECHNIQUES FOR IRREGULAR SAMPLING
DOI: https://doi.org/10.52152/5943t304

Keywords: Machine Learning, Random Forest, Simple Linear Interpolation, XGBoost

Abstract
This study examines how simple linear interpolation (SLI) and natural-neighbour interpolation (NNI) affect machine learning model performance on irregularly sampled commercial data. The Seoul bike-sharing rental dataset is pre-processed with SLI and NNI to manage missing values and inconsistencies. The two interpolation methods are then evaluated by constructing several machine learning models, including XGBoost, Random Forest, k-nearest neighbours (KNN), and a stacking model. Results show that SLI consistently improved accuracy, particularly in the stacking model, as demonstrated by the area under the receiver operating characteristic curve (AUC) and the Kolmogorov-Smirnov (KS) statistic. Conversely, NNI produced more variable outcomes, occasionally reducing performance. These findings underscore the critical role of data pre-processing in machine learning, particularly in domains where data irregularities are prevalent, and provide empirical support for employing interpolation methods to improve model reliability and accuracy in business data modelling.
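As a rough illustration of the pipeline the abstract describes, the sketch below (not the authors' code) linearly interpolates gaps in an irregularly sampled feature, fits a small stacking ensemble, and scores it with AUC and the KS statistic. The synthetic features, the binary "high demand" target, and the choice of scikit-learn base learners (XGBoost omitted here to avoid an extra dependency) are all illustrative assumptions.

# Minimal sketch: SLI imputation, stacking model, AUC and KS evaluation.
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, irregularly sampled toy data with missing readings
# (a stand-in for the Seoul bike-sharing features used in the study).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "temperature": rng.normal(20, 8, n),
    "humidity": rng.uniform(20, 90, n),
})
df.loc[rng.choice(n, 60, replace=False), "temperature"] = np.nan  # gaps

# Simple linear interpolation (SLI) over the ordered index fills the gaps.
df["temperature"] = df["temperature"].interpolate(method="linear", limit_direction="both")

# Illustrative binary target: "high demand" hour vs. not.
y = (df["temperature"] + 0.1 * df["humidity"] + rng.normal(0, 3, n) > 25).astype(int)

X_train, X_test, y_train, y_test = train_test_split(df, y, test_size=0.3, random_state=0)

# Stacking ensemble over base learners of the kind named in the abstract.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=7)),
    ],
    final_estimator=LogisticRegression(),
)
stack.fit(X_train, y_train)
scores = stack.predict_proba(X_test)[:, 1]

# Evaluation: AUC, and KS as the maximum separation between the score
# distributions of the positive and negative classes.
pos = scores[y_test.to_numpy() == 1]
neg = scores[y_test.to_numpy() == 0]
auc = roc_auc_score(y_test, scores)
ks = ks_2samp(pos, neg).statistic
print(f"AUC = {auc:.3f}, KS = {ks:.3f}")

Running the same script with a different imputation step (for example, natural-neighbour interpolation) and comparing the resulting AUC and KS values mirrors the comparison the study reports.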
License
Copyright (c) 2025 Lex localis - Journal of Local Self-Government

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.