This study aimed to improve the accuracy of road pavement design by comparing and analyzing statistical and machine-learning techniques for predicting asphalt layer thickness, focusing on regional roads in Pakistan. The explanatory variables selected for this study were the annual average daily traffic (AADT), subbase thickness, and subgrade California bearing ratio (CBR) values from six cities in Pakistan. The prediction models compared were multiple linear regression (MLR), support vector regression (SVR), random forest, and XGBoost. The performance of each model was evaluated using the mean absolute percentage error (MAPE) and root-mean-square error (RMSE). The analysis results indicated that the AADT was the most influential variable affecting the asphalt layer thickness. Among the models, MLR demonstrated the best predictive performance; although XGBoost performed relatively well among the machine-learning techniques, the traditional statistical model still outperformed it in certain regions. This study emphasized the need for customized pavement designs that reflect the traffic and environmental conditions specific to regional roads in Pakistan, and it suggests that future research should incorporate additional variables and data for a more in-depth analysis.
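As a rough illustration of the workflow described in this abstract, the sketch below fits the four models on synthetic stand-in data and scores them with MAPE and RMSE; the variable ranges, the data-generating relation, and the 80/20 split are assumptions, not the study's actual setup.

```python
# Hedged sketch: compare MLR, SVR, random forest, and XGBoost on synthetic
# AADT / subbase thickness / CBR data using MAPE and RMSE.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_percentage_error, mean_squared_error

rng = np.random.default_rng(0)
n = 300
X = np.column_stack([
    rng.uniform(500, 20000, n),   # AADT (veh/day), assumed range
    rng.uniform(100, 400, n),     # subbase thickness (mm), assumed range
    rng.uniform(3, 15, n),        # subgrade CBR (%), assumed range
])
# Hypothetical relation: thicker asphalt for more traffic and weaker support.
y = 50 + 0.004 * X[:, 0] - 0.05 * X[:, 1] - 2.0 * X[:, 2] + rng.normal(0, 5, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "MLR": LinearRegression(),
    "SVR": SVR(C=10.0, epsilon=0.5),
    "RandomForest": RandomForestRegressor(n_estimators=300, random_state=0),
    "XGBoost": XGBRegressor(n_estimators=300, learning_rate=0.05, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    mape = mean_absolute_percentage_error(y_te, pred) * 100
    rmse = mean_squared_error(y_te, pred) ** 0.5
    print(f"{name:13s} MAPE={mape:5.2f}%  RMSE={rmse:5.2f} mm")
```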
New motor development requires high-speed load testing using dynamo equipment to calculate the efficiency of the motor. Abnormal noise and vibration may occur in test equipment rotating at high speed due to misalignment of the connecting shaft or looseness of the fixture, which may lead to safety accidents. In this study, three single-axis vibration sensors for the X, Y, and Z axes were attached to the surface of the test motor to measure vibration values. The analog data collected from these sensors were used in classification models for anomaly detection. Since the classification accuracy was only around 93%, commonly used hyperparameter optimization techniques such as grid search, random search, and Bayesian optimization were applied to increase accuracy. In addition, the response surface method based on design of experiments was also used for hyperparameter optimization. However, these methods showed limited ability to improve accuracy, because the data sampled from an analog signal do not reflect the patterns hidden in the signal. Therefore, to capture the pattern information in the sampled data, we computed descriptive statistics such as the mean, variance, skewness, kurtosis, and percentiles of the analog data and applied them to the classification models. The classification models using descriptive statistics showed an excellent performance improvement. The developed model can be used as a monitoring system that detects abnormal conditions during motor testing.
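A minimal sketch of the descriptive-statistics feature step described here, assuming windowed X/Y/Z signals and a random forest classifier; the window length, synthetic signals, labels, and classifier settings are illustrative stand-ins, not the study's data or model.

```python
# Hedged sketch: summarize each vibration window with descriptive statistics
# and feed the features to a classifier for anomaly detection.
import numpy as np
from scipy.stats import skew, kurtosis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def window_features(window):
    """Descriptive statistics for one window of a single-axis signal."""
    return [
        window.mean(),
        window.var(),
        skew(window),
        kurtosis(window),
        np.percentile(window, 25),
        np.percentile(window, 50),
        np.percentile(window, 75),
    ]

rng = np.random.default_rng(1)
n_windows, window_len = 400, 1024
X_feat, y = [], []
for i in range(n_windows):
    abnormal = i % 2                     # synthetic label: half normal, half abnormal
    feats = []
    for _ in range(3):                   # X, Y, Z axes
        sig = rng.normal(0, 1.0 + 0.5 * abnormal, window_len)  # stand-in signal
        feats.extend(window_features(sig))
    X_feat.append(feats)
    y.append(abnormal)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, np.array(X_feat), np.array(y), cv=5)
print("5-fold accuracy:", scores.mean().round(3))
```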
Machine learning-based algorithms have been used for constructing species distribution models (SDMs), but their performance depends on the selection of backgrounds. This study attempted to develop a novel method for selecting backgrounds in machine-learning SDMs. Two machine learning-based SDMs (MaxEnt and Random Forest) were employed with an example species (Spodoptera litura), and different background selection methods (random sampling, biased sampling, and ensemble sampling using CLIMEX) were tested with multiple performance metrics (TSS, Kappa, and F1-score). As a result, the model with ensemble sampling predicted the widest occurrence areas with the highest performance, suggesting the potential of the developed method for enhancing machine-learning SDMs.
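The sketch below illustrates, under simplified assumptions, how background-selection strategies could be compared for a presence/background model: Random Forest stands in for the SDM, the "CLIMEX suitability" index is a synthetic placeholder, and evaluation uses a held-out split of simulated data, so it shows the workflow only, not the study's method.

```python
# Hedged sketch: compare random vs. suitability-weighted ("ensemble")
# background sampling for a presence/background Random Forest SDM.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import cohen_kappa_score, f1_score, confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n_cells = 2000
env = rng.normal(size=(n_cells, 4))                        # environmental covariates
suitability = 1 / (1 + np.exp(-(env[:, 0] - env[:, 1])))   # stand-in "CLIMEX" index
presence_idx = rng.choice(n_cells, 200, replace=False,
                          p=suitability / suitability.sum())

def evaluate(background_idx):
    """Fit an RF SDM with the given background and score it on a held-out split."""
    X = np.vstack([env[presence_idx], env[background_idx]])
    y = np.r_[np.ones(len(presence_idx)), np.zeros(len(background_idx))]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=0, stratify=y)
    pred = RandomForestClassifier(n_estimators=300,
                                  random_state=0).fit(X_tr, y_tr).predict(X_te)
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    tss = tp / (tp + fn) + tn / (tn + fp) - 1              # true skill statistic
    return tss, cohen_kappa_score(y_te, pred), f1_score(y_te, pred)

candidates = np.setdiff1d(np.arange(n_cells), presence_idx)
weights = 1 - suitability[candidates]                      # prefer unsuitable cells
strategies = {
    "random":   rng.choice(candidates, 200, replace=False),
    "ensemble": rng.choice(candidates, 200, replace=False, p=weights / weights.sum()),
}
for name, bg in strategies.items():
    tss, kappa, f1 = evaluate(bg)
    print(f"{name:9s} TSS={tss:.2f}  Kappa={kappa:.2f}  F1={f1:.2f}")
```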
Radioactive contaminants, such as 137Cs, are a significant concern for the long-term storage of nuclear waste. The migration and retention of these contaminants in various environmental media can pose a risk to the surrounding environment. The distribution coefficient (Kd) is a critical parameter for assessing the behavior of these contaminants, and uncertainty in Kd can introduce significant errors in predicting migration and selecting remediation options. Accurate prediction of Kd values is therefore essential to assess the behavior of radioactive contaminants and to ensure environmental safety. In this study, we present machine learning models based on the Japan Atomic Energy Agency Sorption Database (JAEA-SDB) to predict Kd values for Cs in soils. We used three machine learning models, namely a random forest (RF), an artificial neural network (ANN), and a convolutional neural network (CNN). The models were trained on 14 input variables from the JAEA-SDB, including the Cs concentration, solid phase properties, and solution conditions, which were preprocessed by normalization and log transformation. We evaluated the performance of our models using the coefficient of determination (R2). The RF, ANN, and CNN models achieved R2 values of over 0.97, 0.86, and 0.88, respectively. Additionally, we analyzed variable importance using the out-of-bag (OOB) error for the RF and an attention module for the CNN. Our results showed that the initial radionuclide concentration and the properties of the solid phase were important variables for Kd prediction. Our machine learning models provide accurate predictions of Kd values for different soil conditions. The predicted Kd values can be used to assess the behavior of radioactive contaminants in various environmental media, which can help in predicting the potential migration and retention of contaminants in soils and in selecting appropriate site remediation options. Our study provides a reliable and efficient method for predicting Kd values that can be used in environmental risk assessment and waste management.
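As a sketch of the RF portion of this workflow, the snippet below fits a random forest to log-transformed Kd values, reports the out-of-bag R2, and lists impurity-based variable importances; the 14 JAEA-SDB inputs are replaced by synthetic placeholders, and all names and the data-generating relation are illustrative assumptions.

```python
# Hedged sketch: RF regression of log(Kd) with OOB score and variable importances.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
n = 500
cols = [f"x{i}" for i in range(1, 15)]                 # 14 placeholder predictors
X = pd.DataFrame(rng.uniform(0.001, 1, size=(n, 14)), columns=cols)
# Hypothetical dependence: Kd driven mainly by initial Cs concentration (x1)
# and a solid-phase property (x2), plus noise.
log_kd = 2.0 - 1.5 * np.log10(X["x1"]) + 1.0 * X["x2"] + rng.normal(0, 0.2, n)

X_scaled = StandardScaler().fit_transform(X)           # normalization step
rf = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=0)
rf.fit(X_scaled, log_kd)

print("OOB R2:", round(rf.oob_score_, 3))
importance = pd.Series(rf.feature_importances_, index=cols).sort_values(ascending=False)
print(importance.head(5))                              # most influential variables
```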
The increased turbidity in rivers during flood events has various effects on water environmental management, including drinking water supply systems; thus, the prediction of turbid water is essential for water environmental management. Recently, various advanced machine learning algorithms have been increasingly used in this field. Ensemble machine learning algorithms such as random forest (RF) and gradient boosting decision tree (GBDT) are among the most popular, along with deep learning algorithms such as recurrent neural networks. In this study, GBDT, an ensemble machine learning algorithm, and the gated recurrent unit (GRU), a recurrent neural network algorithm, are used to develop models for predicting turbidity in a river. The observation frequencies of the input data used for the models were 2, 4, 8, 24, 48, 120, and 168 h. The root-mean-square error-observations standard deviation ratio (RSR) of GRU and GBDT ranges between 0.182-0.766 and 0.400-0.683, respectively. Both models show similar prediction accuracy, with an RSR of 0.682 for GRU and 0.683 for GBDT. The GRU shows better prediction accuracy when the observation frequency is relatively short (i.e., 2, 4, and 8 h), whereas GBDT shows better prediction accuracy when the observation frequency is relatively long (i.e., 48, 120, and 168 h). The results suggest that the characteristics of the input data should be considered to develop an appropriate model for predicting turbidity.
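A minimal sketch of the GBDT side of this comparison is given below: lagged turbidity and a stand-in rainfall series predict the next turbidity value, and accuracy is reported as RSR (RMSE divided by the standard deviation of observations). The data, lag structure, and model settings are assumptions; the GRU counterpart would be built analogously in a deep learning framework.

```python
# Hedged sketch: GBDT turbidity prediction from lagged inputs, scored by RSR.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(4)
n = 1000
rain = rng.gamma(2.0, 2.0, n)                                   # stand-in rainfall
turbidity = np.convolve(rain, [0.5, 0.3, 0.2], mode="same") + rng.normal(0, 0.5, n)

lags = 3                                                        # assumed lag depth
X = np.column_stack([turbidity[i:n - lags + i] for i in range(lags)] +
                    [rain[i:n - lags + i] for i in range(lags)])
y = turbidity[lags:]

split = int(0.8 * len(y))                                       # time-ordered split
model = GradientBoostingRegressor(random_state=0).fit(X[:split], y[:split])
pred = model.predict(X[split:])

rmse = np.sqrt(np.mean((y[split:] - pred) ** 2))
rsr = rmse / np.std(y[split:])                                  # RSR: lower is better
print("RSR:", round(rsr, 3))
```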
Purpose: Research using machine learning or deep learning in artificial intelligence is being attempted in various fields. In this study, public visual acuity data were collected automatically, and the collected data were applied to machine learning for prediction. By comparing the performance of various learning models, we aimed to present an optimal machine learning model applicable to the field of vision science.
Methods: Data on the national distribution of visual acuity published by the National Health Insurance Service (NHISS) and the Korean Statistical Information Service (KOSIS) were searched and collected automatically using crawling, a data retrieval technique based on specific indices. All data reported from 2011 to 2018 were collected, and eight models were used for data learning: Linear Regression, LASSO, Ridge, Elastic Net, Huber Regression, LASSO/LARS, Passive Aggressive Regressor, and RANSAC Regressor.
Results: Based on the collected data, the values for 2018 were predicted using the machine learning models. The differences between the actual and predicted 2018 data for each model are reported as MAE (mean absolute error) and RMSE (root mean square error) scores. For MAE (right/left eye), the models showed Linear Regression (0.22/0.22), LASSO (0.83/0.81), Ridge (0.31/0.31), Elastic Net (0.86/0.84), Huber Regression (0.14/0.07), LASSO/LARS (0.15/0.14), Passive Aggressive Regressor (0.29/0.18), and RANSAC Regressor (0.22/0.22). For RMSE, the models showed Linear Regression (0.40/0.40), LASSO (1.08/1.06), Ridge (0.54/0.54), Elastic Net (1.19/1.17), Huber Regression (0.20/0.20), LASSO/LARS (0.24/0.23), Passive Aggressive Regressor (0.21/0.58), and RANSAC Regressor (0.40/0.40).
Conclusions: In this study, data were collected using a crawling technique for automated data retrieval and collection. On this basis, classical linear models were applied to machine learning, and the performance of the eight learning models used for data learning was compared.
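A minimal sketch of fitting the eight listed models with scikit-learn and scoring each with MAE and RMSE is shown below; the year-indexed visual acuity values and the 2018 hold-out are synthetic stand-ins, so the printed numbers are purely illustrative.

```python
# Hedged sketch: fit eight linear-family regressors on a year-indexed series
# and score the 2018 prediction with MAE and RMSE.
import numpy as np
from sklearn.linear_model import (LinearRegression, Lasso, Ridge, ElasticNet,
                                  HuberRegressor, LassoLars,
                                  PassiveAggressiveRegressor, RANSACRegressor)
from sklearn.metrics import mean_absolute_error, mean_squared_error

rng = np.random.default_rng(5)
years = np.arange(2011, 2018)
X = (years - 2011).reshape(-1, 1)                               # year index feature
y = 1.0 - 0.01 * (years - 2011) + rng.normal(0, 0.01, len(years))  # stand-in acuity
X_2018, y_2018 = np.array([[2018 - 2011]]), np.array([0.93])        # held-out year

models = {
    "Linear Regression": LinearRegression(),
    "LASSO": Lasso(alpha=0.001),
    "Ridge": Ridge(alpha=1.0),
    "Elastic Net": ElasticNet(alpha=0.001),
    "Huber Regression": HuberRegressor(),
    "LASSO/LARS": LassoLars(alpha=0.001),
    "Passive Aggressive": PassiveAggressiveRegressor(random_state=0),
    "RANSAC": RANSACRegressor(random_state=0),
}
for name, model in models.items():
    model.fit(X, y)
    pred = model.predict(X_2018)
    mae = mean_absolute_error(y_2018, pred)
    rmse = mean_squared_error(y_2018, pred) ** 0.5
    print(f"{name:20s} MAE={mae:.3f}  RMSE={rmse:.3f}")
```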
PURPOSES: The purpose of this study is to compare the applicability, explanatory power, and flexibility of traffic accident models estimated using a statistical method and a machine learning method.
METHODS: To compare and analyze traffic accident models estimated using a statistical method and a machine learning method, data were acquired, and traffic accident models were estimated using a statistical method (a negative binomial regression model) and a machine learning method (a generalized regression neural network, GRNN). The goodness of fit of each model, such as R2, root mean square error (RMSE), mean absolute percentage error (MAPE), and accuracy, was then determined to compare the traffic accident models.
RESULTS: The results showed that the annual average daily traffic (AADT), speed limits, number of lanes, land use, exclusive right-turn lanes, and front signals were significant in both traffic accident models. The GRNN model of total traffic accidents showed better statistical fit, with R2 = 0.829, RMSE = 2.495, MAPE = 32.158, and accuracy = 66.761, compared with the negative binomial regression model (R2 = 0.363, RMSE = 9.033, MAPE = 68.987, and accuracy = 8.807). The GRNN model of injury traffic accidents showed similarly better fit.
CONCLUSIONS: Traffic accident models estimated with the GRNN showed better statistical fit than models estimated with statistical methods such as the negative binomial regression model.
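As a sketch of the two model families compared here, the snippet below fits a negative binomial regression with statsmodels and a GRNN written as a Gaussian-kernel weighted average (no standard GRNN class is assumed); the accident counts are synthetic and the spread parameter sigma is a guess, so this only illustrates the comparison, not the study's results.

```python
# Hedged sketch: negative binomial regression vs. a hand-rolled GRNN on
# synthetic intersection accident counts, compared by in-sample RMSE.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 300
X = np.column_stack([rng.uniform(5000, 50000, n),    # AADT (assumed range)
                     rng.integers(2, 7, n)])         # number of lanes
mu = np.exp(-4 + 0.00006 * X[:, 0] + 0.15 * X[:, 1])
accidents = rng.poisson(mu)                          # stand-in accident counts

# Negative binomial regression (statistical model)
nb = sm.GLM(accidents, sm.add_constant(X),
            family=sm.families.NegativeBinomial()).fit()

def grnn_predict(X_train, y_train, X_new, sigma=0.5):
    """GRNN prediction: Gaussian-kernel-weighted average of training targets."""
    mean, std = X_train.mean(0), X_train.std(0)
    Xs, Xn = (X_train - mean) / std, (X_new - mean) / std
    d2 = ((Xn[:, None, :] - Xs[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2 * sigma ** 2))
    return (w @ y_train) / w.sum(axis=1)

pred_nb = nb.predict(sm.add_constant(X))
pred_grnn = grnn_predict(X, accidents.astype(float), X)

for name, pred in [("NegBin", pred_nb), ("GRNN", pred_grnn)]:
    rmse = np.sqrt(np.mean((accidents - pred) ** 2))
    print(f"{name:6s} RMSE={rmse:.3f}")
```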
The road surface condition in winter is important for road maintenance and safety. To estimate the road surface condition in winter, the Road Weather Information System (RWIS) is used. However, the RWIS does not measure continuous road surface information; it measures road surface information only at fixed locations. To overcome this limitation, thermal mapping sensors, which can collect road surface conditions continuously, have been employed in some countries. Although a thermal mapping sensor can collect continuous road surface information, it is difficult to collect large amounts of data because only a few probe cars are used. This study suggests a specific methodology for predicting road surface temperature using vehicular ambient temperature sensors; road surface and vehicular ambient temperature data were collected on a defined survey route in 2015 and 2016, respectively. To find the correlation between road surface and ambient temperature, which may affect the patterns of road surface temperature variation, various weather and topographical conditions along the test route were considered. For modelling, all of the collected temperature data were classified into response and predictor variables before applying a machine learning tool such as MATLAB. In this study, the collected road surface temperatures were treated as the response, while the vehicular ambient temperatures were defined as predictors. Through data learning with the machine learning tool, models were developed, and predicted and actual temperatures were finally compared based on the average absolute error. According to the comparison results, the models were able to estimate the actual pattern of road surface temperature variation along the roads very well, and Model III was slightly better than the other models in terms of estimation performance. When the correlation between the response and predictors is high, when plenty of historical data exist, and when many predictors are available, the estimation performance would be much better.
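A minimal sketch of the response/predictor setup described above, written in Python rather than MATLAB: current and lagged vehicular ambient temperatures serve as predictors, road surface temperature as the response, and the fit is judged by the average absolute error; the data and the simple linear model are illustrative assumptions.

```python
# Hedged sketch: predict road surface temperature from vehicular ambient
# temperature (current and lagged) and report the average absolute error.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(7)
n = 500
ambient = 5 + 10 * np.sin(np.linspace(0, 6 * np.pi, n)) + rng.normal(0, 1, n)
surface = 1.2 * ambient - 3 + rng.normal(0, 1.5, n)   # stand-in relation

# Predictors: current and lagged ambient temperature; response: surface temperature.
X = np.column_stack([ambient[1:], ambient[:-1]])
y = surface[1:]

split = int(0.7 * len(y))                              # time-ordered split
model = LinearRegression().fit(X[:split], y[:split])
pred = model.predict(X[split:])
print("Average absolute error:",
      round(mean_absolute_error(y[split:], pred), 2), "deg C")
```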
In the reverse osmosis (RO) seawater desalination process, membrane fouling causes a decrease in permeate production and an increase in the energy consumption of the process. In addition to the increase in transmembrane pressure and the decrease in permeate production, the increase in membrane resistance can be used as an indicator of the degree of membrane fouling. In particular, cleaning based on the membrane resistance value can be used to maintain membrane performance in the RO seawater desalination process through fouling control. Accordingly, this study proposes an algorithm that predicts the membrane resistance value based on seawater quality parameters and process operation parameters. The algorithm was developed to learn the relationship between these parameters and the membrane resistance value from the operation data of a seawater desalination plant and, after a validation process, to predict the onset of membrane fouling in advance. The prediction accuracy was analyzed, and the developed algorithm was refined to further improve the prediction accuracy.
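A minimal sketch of the prediction idea described above: a regressor maps assumed seawater quality and operating parameters to membrane resistance, and a fouling alarm is raised when the predicted resistance exceeds a hypothetical threshold; the feature names, units, threshold, and model choice are all assumptions, not the study's algorithm.

```python
# Hedged sketch: predict membrane resistance from water quality and operating
# parameters, then flag predicted fouling against a hypothetical threshold.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(8)
n = 800
X = np.column_stack([
    rng.uniform(33, 37, n),     # salinity (psu), assumed
    rng.uniform(15, 30, n),     # feed temperature (deg C), assumed
    rng.uniform(0.5, 5.0, n),   # feed turbidity (NTU), assumed
    rng.uniform(50, 70, n),     # feed pressure (bar), assumed
])
# Stand-in resistance (arbitrary units): rises with turbidity and pressure.
resistance = 1.0 + 0.2 * X[:, 2] + 0.01 * (X[:, 3] - 50) + rng.normal(0, 0.05, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, resistance, test_size=0.25, random_state=0)
model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)

threshold = 1.8                 # hypothetical cleaning trigger
print("R2:", round(model.score(X_te, y_te), 3))
print("Predicted fouling alarms:", int((pred > threshold).sum()), "of", len(pred))
```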
Water resources planning and management are increasingly becoming important issues for water use and flood control due to population increase, urbanization, and climate change. In particular, estimating and forecasting dam inflow is one of the most important hydrologic tasks for flood control and reliable water supply. Therefore, this study forecasted the monthly inflow of the Soyang River dam using a VARMA model and three machine learning models. The forecasting models were constructed using monthly inflow data for the period from 1974 to 2016, and the inflows were then forecasted at 12- and 24-month-ahead lead times. As a result, the monthly inflows forecasted by the models were mostly less than the observed ones, but the peak timing and the variation pattern were forecasted well. In particular, the VARMA model showed very good forecasting performance. Therefore, the results of this study indicate that the VARMA model can be used efficiently to forecast hydrologic data and to establish water supply and management plans.
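A minimal sketch of the VARMA forecasting step using statsmodels' VARMAX on synthetic monthly inflow and precipitation series (a VARMA model needs more than one series, and pairing inflow with precipitation is an assumption here); the order (1, 1) and the 12-month horizon are illustrative rather than the study's settings.

```python
# Hedged sketch: fit a VARMA(1,1) model to synthetic monthly series and
# produce a 12-month-ahead inflow forecast.
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.varmax import VARMAX

rng = np.random.default_rng(9)
months = pd.date_range("1974-01", "2016-12", freq="MS")
season = np.sin(2 * np.pi * (months.month - 1) / 12)
precip = 100 + 80 * season + rng.normal(0, 20, len(months))   # stand-in precipitation
inflow = 50 + 0.5 * precip + rng.normal(0, 10, len(months))   # stand-in inflow
data = pd.DataFrame({"inflow": inflow, "precip": precip}, index=months)

results = VARMAX(data, order=(1, 1)).fit(disp=False)
forecast = results.forecast(steps=12)                         # 12-month-ahead forecast
print(forecast["inflow"].round(1))
```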