Prediction of Compressive Strength of SCBA-Based Concrete Using KNN
In response to the global emission reduction goals set by the Paris Agreement (2015), the construction industry has adopted strategies to reduce cement consumption by using sustainable materials. Sugarcane bagasse ash (SCBA), an agro-waste by-product, has shown potential as a sand or cement replacement in concrete. However, its heterogeneous nature and complex interactions across mix designs make the compressive strength of SCBA concrete difficult to predict with conventional approaches. This study applies k-nearest neighbors (KNN), a machine-learning model, to predict the compressive strength of concrete incorporating SCBA as a sand or cement replacement. A total of 844 experimental data points were collected from articles published between 2015 and 2025 and retrieved from three databases: Web of Science, Scopus, and Google Scholar. The independent variables were SCBA usage, SCBA replacement level, cement content, fine and coarse aggregate contents, water-to-binder ratio, superplasticizer dosage, and curing age; compressive strength was the dependent variable. Prior to model development, SCBA usage, the only categorical variable, was one-hot encoded. The features were standardized with StandardScaler so that all variables contribute on a comparable scale to the distance calculations. The dataset was split into 75% training and 25% testing subsets, and 5-fold cross-validation was carried out to support model generalization and robustness. Hyperparameters were tuned using Optuna to improve predictive accuracy. The optimized KNN model achieved coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE) values of 1, 0.329, and 0.063, respectively, in the training phase and 0.81, 7.378, and 5.674, respectively, in the testing phase. Hyperparameter tuning with Optuna raised the mean cross-validated R² from 0.78 to 0.84.
While the results indicate that KNN can capture localized patterns in the heterogeneous SCBA concrete dataset, more robust ML models such as eXtreme Gradient Boosting and Random Forest may generalize better across the dataset. KNN nonetheless serves as a reliable baseline model for data-driven prediction of compressive strength, highlighting the potential of simple, interpretable machine-learning models to support sustainable concrete design aligned with global climate mitigation goals.
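
As a rough illustration of the workflow summarized above (one-hot encoding, standardization, a 75%-25% split, 5-fold cross-validation, and KNN regression scored with R², RMSE, and MAE), the following is a minimal Python sketch under stated assumptions. It runs on synthetic data, not the study's 844 data points; the reduced feature set, the target formula, and the helper names (`fit_scaler`, `transform`, `knn_predict`, `metrics`, `cv_r2`) are hypothetical. To stay self-contained it uses only the standard library, so a hand-written z-score scaler stands in for StandardScaler and a plain search over k stands in for Optuna. Uniform neighbor weighting and Euclidean distance are also assumptions, since the abstract does not report the tuned settings.

```python
import math
import random
import statistics

def fit_scaler(rows):
    """Column-wise mean and population standard deviation for z-scoring."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return means, stds

def transform(rows, scaler):
    means, stds = scaler
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def knn_predict(X_train, y_train, x, k):
    """Uniform-weight KNN regression: mean target of the k nearest points."""
    pairs = sorted(zip(X_train, y_train), key=lambda p: math.dist(p[0], x))
    return statistics.fmean([t for _, t in pairs[:k]])

def metrics(y_true, y_pred):
    n = len(y_true)
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return {"r2": 1 - ss_res / ss_tot, "rmse": math.sqrt(ss_res / n), "mae": mae}

def cv_r2(X, y, k, folds=5):
    """Mean R² over the folds; the scaler is refit on each training fold."""
    scores = []
    for f in range(folds):
        val = list(range(f, len(X), folds))
        val_set = set(val)
        X_tr = [x for i, x in enumerate(X) if i not in val_set]
        y_tr = [t for i, t in enumerate(y) if i not in val_set]
        scaler = fit_scaler(X_tr)
        X_tr_s = transform(X_tr, scaler)
        X_va_s = transform([X[i] for i in val], scaler)
        preds = [knn_predict(X_tr_s, y_tr, x, k) for x in X_va_s]
        scores.append(metrics([y[i] for i in val], preds)["r2"])
    return statistics.fmean(scores)

# Synthetic stand-in data (ASSUMPTION: not the study's dataset or variables).
random.seed(0)
rows, target = [], []
for _ in range(200):
    usage = random.choice(["cement", "sand"])       # categorical SCBA usage
    level = random.uniform(0, 30)                   # replacement level, %
    wb = random.uniform(0.3, 0.6)                   # water-to-binder ratio
    age = random.choice([7, 28, 56, 90])            # curing age, days
    one_hot = [1.0, 0.0] if usage == "cement" else [0.0, 1.0]
    rows.append(one_hot + [level, wb, float(age)])
    # Hypothetical strength formula, used only to generate example targets.
    target.append(60 - 50 * wb - 0.4 * level + 6 * math.log(age) + random.gauss(0, 2))

# 75%-25% train-test split (rows are already in random order).
split_at = int(0.75 * len(rows))
X_train, X_test = rows[:split_at], rows[split_at:]
y_train, y_test = target[:split_at], target[split_at:]

# Simple search over k by 5-fold CV on the training set (stand-in for Optuna).
best_k = max(range(1, 16), key=lambda k: cv_r2(X_train, y_train, k))

# Fit the scaler on training data only, then score the held-out test set.
scaler = fit_scaler(X_train)
X_train_s, X_test_s = transform(X_train, scaler), transform(X_test, scaler)
test_pred = [knn_predict(X_train_s, y_train, x, best_k) for x in X_test_s]
test_scores = metrics(y_test, test_pred)
print(best_k, test_scores)
```

Fitting the scaler inside each fold, and on the training subset only for the final evaluation, keeps information from the validation and test rows out of the distance calculations. In a scikit-learn workflow the same steps would typically be expressed with `StandardScaler`, `KNeighborsRegressor`, and `cross_val_score`, with Optuna proposing the hyperparameter values.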