Pavement temperature is a critical factor in winter road maintenance because it directly affects operational decisions on de-icing, anti-icing, and other safety measures. Accurate forecasting of pavement temperature enables road agencies to optimize maintenance strategies, reduce operational costs, and improve roadway safety outcomes. This study proposes a novel machine-learning algorithm, termed LSTM-CNN, which integrates convolutional neural networks (CNNs) with long short-term memory (LSTM) networks for pavement temperature prediction. In the proposed model, the LSTM component captures sequential dependencies, whereas the CNN component extracts local and spatial features embedded in time-series temperature records. The model can therefore identify long-range temporal relationships while also uncovering localized features within the dataset. The input data (pavement, atmospheric, and soil temperatures) were obtained at the entrance of a tunnel where a multivehicle pile-up caused by black ice had previously occurred. The proposed LSTM-CNN model achieved an average prediction error of 0.61 °C and was benchmarked against other well-established machine-learning models, including Transformer and standalone LSTM architectures. The results show that the proposed algorithm delivers statistically superior predictive performance. The LSTM-CNN approach offers significant potential for enhancing the efficiency and effectiveness of winter road maintenance operations.
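This study identified the factor structure of subjective color sensibility for 100 same-tone two-color combinations prepared from silk fabrics dyed with various natural dyes under different dyeing conditions. It analyzed how objective variables (same-tone type, chromatic/achromatic tone, physical color characteristics, and color-combination variables) affect the color sensibility factors, and it proposed prediction models for those factors using Random Forest, a machine-learning algorithm. Four color sensibility factors were extracted for the same-tone two-color combinations of naturally dyed silk fabrics: 'Pleasant', 'Classic', 'Soft', and 'Modern'. Each factor was confirmed to be significantly affected by objective color variables, including the physical color characteristics of the single colors, the same-tone combination type, chromatic/achromatic status, and quantitative color-combination variables. Among the Random Forest prediction models built for each color sensibility factor, the models for 'Pleasant' and 'Soft' showed the best predictive performance. Variable importance and the structure of surrogate linear models indicated that lightness-related variables had the greatest influence on 'Pleasant' and color-depth-related variables had the greatest influence on 'Soft'. The high correlation between experimental and predicted values further suggests that Random Forest can be used to predict the color sensibility of naturally dyed fabrics.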
The application of machine learning in concrete technology has expanded rapidly, yet its reliability is often constrained by limited experimental data, heterogeneous testing conditions, and inconsistencies across published studies. This study investigates the integration of machine learning and synthetic data augmentation to predict the compressive strength of concrete incorporating biochar as a partial replacement for cement. An experimental dataset was compiled from peer-reviewed journal articles indexed in Web of Science, focusing on biochar-modified concrete mixtures. Input variables included cement content, fine and coarse aggregates, biochar dosage, water-to-binder ratio, superplasticizer content, and curing age, with compressive strength as the target variable. Extreme Gradient Boosting was adopted because of its strong performance on nonlinear tabular data. Model performance was evaluated using the mean absolute error (MAE), mean squared error (MSE), and coefficient of determination (R²), alongside five-fold cross-validation. Hyperparameter optimization was performed using Optuna. To address data scarcity, a synthetic dataset of 1000 samples was generated using ChatGPT. The large language model approach relied solely on natural-language prompts. Only feature definitions and the target variable were provided, without exposing the original data or implementing data generation algorithms. Three modeling strategies were examined. First, a model trained and tested solely on experimental data achieved a testing R² of approximately 0.91. Second, a model trained on synthetic data and evaluated exclusively on experimental data showed reduced generalization, achieving a testing R² of about 0.42, which indicates pronounced domain shift effects. Third, when synthetic and experimental data were combined through data augmentation and jointly modeled, a testing R² of 0.93 was achieved. The results showed that using LLMs for augmentation improved the performance of the model.
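The three evaluation metrics named in this abstract can be computed directly from paired observed and predicted values. The following is a minimal sketch rather than the authors' evaluation code; the function name `regression_metrics` is hypothetical, and any strength values passed to it would be made-up illustrations, not data from the study.

```python
def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R^2) for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error

    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot
    return mae, mse, r2
```

Because R² compares the model against a predictor that always returns the mean of the test targets, it can drop sharply, or even become negative, when the training and test distributions differ.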
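This study aimed to develop a machine learning-based predictive maintenance system for industrial equipment in the port logistics sector. Using a dataset from the UCI Repository, 10,000 data points were analyzed for two tasks: binary classification to predict whether an equipment failure occurs, and multi-class classification to identify the failure type. During preprocessing, SMOTE was applied to address class imbalance, and normalization was performed with StandardScaler. Principal component analysis identified the temperature variables, machine power, and tool wear as the main predictors. Five machine-learning algorithms were applied and compared: logistic regression, K-nearest neighbors (KNN), support vector machine, random forest, and XGBoost. KNN showed relatively low performance but provided fast response times, whereas XGBoost performed best on both tasks, achieving an F1 score of 0.958 for binary classification and 0.989 for multi-class classification.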
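The core idea behind SMOTE, the oversampling technique named in this abstract, is to create synthetic minority-class samples by interpolating between an existing minority sample and one of its nearest minority neighbours. The sketch below illustrates only that idea under simplifying assumptions; it is not the study's preprocessing code or any library implementation, and the function name `smote_like` and its parameters are hypothetical.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points from a list of minority-class samples.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other samples by squared Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + u * (b - a) for a, b in zip(base, neighbour)])
    return synthetic
```

Synthetic samples should be generated from the training split only, so that no interpolated copy of a test sample leaks into training.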
As renewable energy penetration continues to increase, the output variability and forecasting uncertainty of photovoltaic (PV) generation have emerged as major operational risks in power systems. This study establishes a sensor-based data quality control (QC) procedure to ensure the reliability of meteorological data collected at a PV plant. For temperature, humidity, and wind speed, a four-stage QC process was applied: physical range check, persistence check, step change check, and median filtering. Solar radiation, which exhibits strong temporal and distributional characteristics, was processed using a three-stage QC procedure consisting of physical range, step change, and frequency distribution checks. Using the quality-controlled meteorological data, PV generation forecasting was performed with SVM and XGBoost models. After QC was applied, the MAPE values improved to 6.32% for SVM and 6.08% for XGBoost. The findings confirm that meteorological data quality control significantly enhances PV forecasting accuracy and can support future strategies for distributed energy resource management, curtailment mitigation, and power system risk reduction.
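The four QC stages listed for temperature, humidity, and wind speed can be sketched as simple functions over a list of sensor readings. This is an illustrative sketch only: the abstract does not give the thresholds, window lengths, or exact rules, so every parameter below (`low`, `high`, `window`, `min_change`, `max_step`, `size`) and every function name is an assumption.

```python
from statistics import median


def range_check(values, low, high):
    """Pass (True) readings that fall inside the physically plausible range."""
    return [low <= v <= high for v in values]


def persistence_check(values, window, min_change):
    """Fail every reading in a run of `window` samples that barely changes,
    which may indicate a stuck sensor."""
    ok = [True] * len(values)
    for i in range(len(values) - window + 1):
        chunk = values[i:i + window]
        if max(chunk) - min(chunk) < min_change:
            for j in range(i, i + window):
                ok[j] = False
    return ok


def step_check(values, max_step):
    """Fail a reading that jumps more than max_step from the previous one.
    Note that the return from a spike is flagged as well."""
    ok = [True] * len(values)
    for i in range(1, len(values)):
        if abs(values[i] - values[i - 1]) > max_step:
            ok[i] = False
    return ok


def median_filter(values, size=3):
    """Smooth readings with a centred sliding median (window shrinks at edges)."""
    half = size // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(median(values[lo:hi]))
    return smoothed


# Hypothetical air-temperature readings in degrees Celsius with one spike.
temps = [12.1, 12.3, 12.2, 55.0, 12.4, 12.5]
in_range = range_check(temps, low=-40.0, high=50.0)
no_jump = step_check(temps, max_step=5.0)
passed = [a and b for a, b in zip(in_range, no_jump)]
```

Keeping each check as a separate flag list makes it possible to report which rule rejected a reading before the flags are combined.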
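In this study, the corrosion depth of water supply pipes was predicted with machine learning using the field measurement data of Romanoff (1957). The predicted corrosion depths were applied to actual water distribution networks to analyze cumulative damage, and the results were compared with the cumulative damage obtained using Monte Carlo simulation (MCS). Among the corrosion depth prediction models, the MLP-ReLU model gave the fastest corrosion rate and the MLP-sigmoid model the slowest. The MCS-based and machine learning-based cumulative damage analyses were compared for the water distribution networks of Sinbang-dong and Seonghwan-eup in Cheonan. In Sinbang-dong, pipe No. 2 reached its service limit first under both methods, and in Seonghwan-eup, pipe No. 4 reached its service limit first. The pipes that reached the service limit first were found to be older or under higher water pressure than the other pipes. In the MCS-based cumulative damage analysis, the service limit was exceeded after 45 years in Sinbang-dong and after 47 years in Seonghwan-eup.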
As demand grows for electric vehicles and advanced mobility technologies, developing materials for permanent magnets has become increasingly essential. Among them, SmCo-based permanent magnets are gaining attention due to their superior thermal stability compared to conventional NdFeB magnets, making them promising candidates for high-temperature motor applications. However, optimizing the magnetic properties of SmCo alloys remains challenging due to their complex phase structures and elemental interactions. In this study, we develop and optimize machine learning (ML) models to predict the saturation magnetization of SmCo permanent magnets using only composition-based descriptors. A dataset comprising various SmCo alloys was analyzed, with features extracted using the Matminer and Pymatgen modules. We applied Random Forest (RF), eXtreme Gradient Boosting (XGB), and Support Vector Regression (SVR) models and compared their regression performance using the R² score and root-mean-square error (RMSE). The RF model demonstrated the best generalization and prediction accuracy. To identify the most influential features, we used permutation feature importance. Further, we refined the feature set using a genetic algorithm (GA), ultimately selecting 9 key features that yielded the highest model performance (R² = 0.963, RMSE = 4.22 emu/g). This study highlights the potential of combining machine learning with genetic optimization to accelerate the design of high-performance, thermally stable SmCo permanent magnets.
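Permutation feature importance, mentioned in this abstract, measures how much a model's error grows when the values of one feature are shuffled across samples. The sketch below shows the general technique with mean squared error as the score; it is not the authors' pipeline, and the function name, the `predict` callable, and the parameters `n_repeats` and `seed` are assumptions made for illustration.

```python
import random


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Return, per feature, the mean increase in MSE after shuffling it.

    predict: callable mapping one feature row (a list) to a prediction.
    X: list of feature rows (lists of equal length); y: list of targets.
    """
    rng = random.Random(seed)

    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    baseline = mse(X)
    importances = []
    for col in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle one column while leaving all other columns untouched.
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            permuted = [row[:col] + [v] + row[col + 1:]
                        for row, v in zip(X, shuffled)]
            increases.append(mse(permuted) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

A feature that the model ignores gets an importance of zero, because shuffling it leaves every prediction unchanged.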
This study aimed to develop an optimal model for predicting the growth of cut chrysanthemums based on environmental factors. To this end, the performance of 13 models (Linear Regression, Lasso Regression, Ridge Regression, ElasticNet Regression, K-Nearest Neighbors (KNN), Support Vector Regression (SVR), Neural Network, Decision Tree, Random Forest, XGBoost, AdaBoost, CatBoost, and Stacking) was compared using R², MAE, and RMSE as evaluation metrics. Among the single models, Decision Tree performed best, with R² values between 0.90 and 0.91. Among the ensemble models, CatBoost performed best (R² = 0.90 to 0.92), and Random Forest and XGBoost showed similar performance. Overall, tree-based ensemble models proved suitable for predicting chrysanthemum growth.
Given the hazards posed by black ice, it is crucial to investigate the conditions that contribute to its formation. Two ensemble machine-learning algorithms, Random Forest (RF) and Extreme Gradient Boosting (XGBoost), were employed to forecast the occurrence of black ice using atmospheric data. Additionally, explainable artificial intelligence techniques, including Feature Importance (FI) and Partial Dependence Plot (PDP), were utilized to identify atmospheric conditions that significantly increase the likelihood of black ice formation. The machine-learning algorithms achieved a forecasting accuracy of 90%, demonstrating reliable performance. FI analysis revealed distinct key predictors between the algorithms: relative humidity was the most critical for RF, whereas wind speed was paramount for XGBoost. The PDP analysis identified the specific atmospheric conditions under which black ice was likely to form. This study provides detailed insights into the atmospheric precursors of frost/fog-induced black ice formation. These findings enable road managers to implement proactive winter road maintenance strategies, such as optimizing anti-icing patrol routes and displaying warnings on various message signs, thereby enhancing road safety.
This study aimed to improve the accuracy of road pavement design by comparing and analyzing various statistical and machine-learning techniques for predicting asphalt layer thickness, focusing on regional roads in Pakistan. The explanatory variables selected for this study included the annual average daily traffic (AADT), subbase thickness, and subgrade California bearing ratio (CBR) values from six cities in Pakistan. The statistical prediction models used were multiple linear regression (MLR), support vector regression (SVR), random forest, and XGBoost. The performance of each model was evaluated using the mean absolute percentage error (MAPE) and root-mean-square error (RMSE). The analysis results indicated that the AADT was the most influential variable affecting the asphalt layer thickness. Among the models, the MLR demonstrated the best predictive performance. While XGBoost had a relatively strong performance among the machine-learning techniques, the traditional statistical model, MLR, still outperformed it in certain regions. This study emphasized the need for customized pavement designs that reflect the traffic and environmental conditions specific to regional roads in Pakistan. This finding suggests that future research should incorporate additional variables and data for a more in-depth analysis.
New motor development requires high-speed load testing using dynamo equipment to calculate the efficiency of the motor. Abnormal noise and vibration may occur in test equipment rotating at high speed because of misalignment of the connecting shaft or looseness of the fixation, which may lead to safety accidents. In this study, three single-axis vibration sensors for the X, Y, and Z axes were attached to the surface of the test motor to measure vibration. Analog data collected from these sensors were used in classification models for anomaly detection. Since the classification accuracy was only around 93%, commonly used hyperparameter optimization techniques such as grid search, random search, and Bayesian optimization were applied to increase accuracy. In addition, the Response Surface Method based on Design of Experiments was also used for hyperparameter optimization. However, these methods were found to be limited in how far they could improve accuracy, because data sampled from an analog signal do not by themselves reflect the patterns hidden in the signal. Therefore, to capture pattern information in the sampled data, we computed descriptive statistics of the analog data, such as mean, variance, skewness, kurtosis, and percentiles, and applied them to the classification models. Classification models using descriptive statistics showed excellent performance improvement. The developed model can be used as a monitoring system that detects abnormal conditions during motor testing.
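Turning a window of raw vibration samples into descriptive statistics, as described in this abstract, can be sketched as follows. The window contents, the choice of quartiles as the percentiles, and the function name `window_features` are assumptions; the abstract does not state the window length or which percentiles were used.

```python
from statistics import fmean, pvariance, quantiles


def window_features(samples):
    """Summarise one window of single-axis samples as a feature dictionary."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples per window")
    mean = fmean(samples)
    var = pvariance(samples, mu=mean)  # population variance
    if var == 0:
        skew = 0.0
        kurt = 0.0
    else:
        std = var ** 0.5
        # Standardised third and fourth moments; kurtosis is reported as
        # excess kurtosis (0 for a normal distribution).
        skew = sum(((x - mean) / std) ** 3 for x in samples) / n
        kurt = sum(((x - mean) / std) ** 4 for x in samples) / n - 3.0
    q1, q2, q3 = quantiles(samples, n=4)  # 25th, 50th, 75th percentiles
    return {
        "mean": mean,
        "variance": var,
        "skewness": skew,
        "kurtosis": kurt,
        "p25": q1,
        "p50": q2,
        "p75": q3,
    }
```

Applying this function to each axis of each window and concatenating the results gives one fixed-length feature row per window, which is the form a tabular classifier expects.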
Machine learning-based algorithms have been used to construct species distribution models (SDMs), but their performance depends on the selection of backgrounds. This study attempted to develop a novel method for selecting backgrounds in machine-learning SDMs. Two machine learning-based SDMs (MaxEnt and Random Forest) were applied to an example species (Spodoptera litura), and different background selection methods (random sampling, biased sampling, and ensemble sampling using CLIMEX) were tested with multiple performance metrics (TSS, Kappa, and F1-score). The model with ensemble sampling predicted the widest occurrence area with the highest performance, suggesting that the developed method could be applied to enhance machine-learning SDMs.
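The three performance metrics named in this abstract are all functions of a binary confusion matrix of predicted versus observed presence. The sketch below uses their standard definitions; the function name `sdm_scores` is hypothetical, and any counts passed to it would be illustrative rather than results from the study.

```python
def sdm_scores(tp, fp, fn, tn):
    """Return (TSS, Cohen's kappa, F1) from confusion-matrix counts.

    tp/fn: presences predicted as present/absent.
    tn/fp: absences (backgrounds) predicted as absent/present.
    """
    total = tp + fp + fn + tn
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one presence and one absence")

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    tss = sensitivity + specificity - 1.0  # true skill statistic

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / total
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (observed - expected) / (1.0 - expected)

    if 2 * tp + fp + fn == 0:
        raise ValueError("F1 is undefined with no positives")
    f1 = 2 * tp / (2 * tp + fp + fn)
    return tss, kappa, f1
```

Because the absence counts come from the selected backgrounds, changing the background selection method changes `tn` and `fp`, and therefore TSS and Kappa, even when the presence records are the same.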
Increased turbidity in rivers during flood events affects water environmental management in various ways, including drinking water supply systems. Prediction of turbid water is therefore essential for water environmental management. Recently, advanced machine learning algorithms have been increasingly used in this field. Ensemble machine learning algorithms such as random forest (RF) and gradient boosting decision tree (GBDT) are among the most popular, along with deep learning algorithms such as recurrent neural networks. In this study, GBDT, an ensemble machine learning algorithm, and the gated recurrent unit (GRU), a recurrent neural network algorithm, are used to develop models that predict turbidity in a river. The observation intervals of the input data used for the models were 2, 4, 8, 24, 48, 120, and 168 h. The ratio of the root-mean-square error to the standard deviation of the observations (RSR) ranges from 0.182 to 0.766 for GRU and from 0.400 to 0.683 for GBDT. Both models show similar prediction accuracy, with an RSR of 0.682 for GRU and 0.683 for GBDT. GRU shows better prediction accuracy when the observation interval is relatively short (i.e., 2, 4, and 8 h), whereas GBDT shows better prediction accuracy when the observation interval is relatively long (i.e., 48, 120, and 168 h). The results suggest that the characteristics of the input data should be considered when developing an appropriate model to predict turbidity.
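The RSR statistic used in this abstract divides the root-mean-square error by the standard deviation of the observations, so 0 indicates a perfect fit and values near 1 indicate a model no better than predicting the observed mean. A minimal sketch follows, assuming paired lists of observed and simulated values; the function name `rsr` is hypothetical.

```python
import math


def rsr(observed, simulated):
    """RMSE divided by the standard deviation of the observations."""
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_err = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    if ss_obs == 0:
        raise ValueError("RSR is undefined when all observations are equal")
    # The 1/n factors of the RMSE and the standard deviation cancel.
    return math.sqrt(ss_err / ss_obs)
```

Normalising by the spread of the observations makes RSR values comparable across datasets with different turbidity ranges, which is useful when models are compared across observation intervals.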
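Purpose: Research using machine learning or deep learning, both forms of artificial intelligence, is being attempted in various fields. This study automated the collection of public visual acuity data and applied the collected data to machine learning for prediction. By comparing the performance of various learning models, it aims to present an optimized machine learning model applicable to vision science.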
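Methods: Data on the national distribution of visual acuity published by the National Health Insurance (NHISS) and the statistics portal (KOSIS) were searched and collected automatically using crawling, a data retrieval technique that targets material containing specific indexes. All data reported from 2011 to 2018 were collected, and eight models were each trained on the data: Linear Regression, LASSO, Ridge, Elastic Net, Huber Regression, LASSO/LARS, Passive Aggressive Regressor, and RANSAC Regressor.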
Results: Based on the collected data, the machine learning models were used to predict the values for 2018. For each model, the difference between the actual and predicted 2018 data was expressed as MAE (Mean Absolute Error) and RMSE (Root Mean Square Error) scores. The MAE values (right/left) were Linear Regression (0.22/0.22), LASSO (0.83/0.81), Ridge (0.31/0.31), Elastic Net (0.86/0.84), Huber Regression (0.14/0.07), LASSO/LARS (0.15/0.14), Passive Aggressive Regressor (0.29/0.18), and RANSAC Regressor (0.22/0.22). The RMSE values were Linear Regression (0.40/0.40), LASSO (1.08/1.06), Ridge (0.54/0.54), Elastic Net (1.19/1.17), Huber Regression (0.20/0.20), LASSO/LARS (0.24/0.23), Passive Aggressive Regressor (0.21/0.58), and RANSAC Regressor (0.40/0.40).
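Conclusions: This study collected data using a crawling technique for automated data search and collection. On this basis, classical linear models were made applicable to machine learning, and the performance of the eight learning models used for data training was compared.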