In this study, the magnetocaloric effect and transition temperature of bulk metallic glasses, which are amorphous materials, were predicted by machine learning from compositional features. A total of 174 compositional features were generated with the Python package Matminer, and prediction performance was compared as the number of features was reduced to prevent overfitting. After a random forest (an ensemble model) was optimized, the change in prediction performance with the number of compositional features was analyzed. The R² score was used as the performance metric for this regression task. The best performance was obtained with only 90 features for predicting the transition temperature and with only 20 features for predicting the magnetocaloric effect. The most important feature for predicting the magnetocaloric effect was the Fe composition ratio. The compositional features were ranked with the feature importance method provided by scikit-learn. This method was judged appropriate by comparing prediction performance on the Fe-containing dataset with that on the full dataset.
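The feature-reduction procedure can be sketched as follows: rank features by importance, keep the top k, and score each subset with R². scikit-learn exposes impurity-based importances as `RandomForestRegressor.feature_importances_` and the metric as `sklearn.metrics.r2_score`. This sketch re-implements the ranking and the metric in plain Python so it runs without dependencies. The feature names are hypothetical, and so is the `evaluate` callback, which stands in for retraining the random forest on each subset.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def top_k_features(names, importances, k):
    """Return the k feature names with the largest importance scores."""
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def sweep_feature_counts(names, importances, ks, evaluate):
    """Map each k to evaluate(top-k feature subset).

    `evaluate` is a hypothetical callback: in the study's setting it would
    retrain the model on the given subset and return R^2 on held-out data.
    """
    return {k: evaluate(top_k_features(names, importances, k)) for k in ks}


# Hypothetical compositional features and importance scores.
names = ["frac_Fe", "frac_Gd", "frac_B", "mean_atomic_radius"]
importances = [0.45, 0.25, 0.10, 0.20]
```

Sweeping k over a grid (for example 10, 20, ..., 174) and picking the k with the highest held-out R² mirrors the comparison described above.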
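The compressive strength of high-performance concrete (HPC) is difficult to predict because of the additional cementitious materials it contains, so improved prediction models are needed. The objective of this study is therefore to develop an HPC compressive strength prediction model using an ensemble technique that combines bagging and stacking. The key contribution of this paper is a new ensemble technique that integrates the existing techniques of bagging and stacking. Its aim is to overcome the limitations of single machine learning models and so improve predictive performance. Nonlinear regression, support vector machines, artificial neural networks, and Gaussian process regression were used as single machine learning methods, and bagging and stacking were used as ensemble techniques. The proposed model achieved higher accuracy than the single machine learning models and than the bagging and stacking models alone. This was confirmed by comparing four representative performance metrics, which validated the effectiveness of the proposed method.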
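The two building blocks being combined can be illustrated separately. Bagging fits a base learner on bootstrap resamples and averages the results. Stacking fits a meta-learner on the base learners' predictions. This is a minimal sketch, not the authors' integration scheme, which the abstract does not detail:

- The one-feature least-squares line is a hypothetical stand-in for the study's base learners (nonlinear regression, SVM, ANN, GPR).
- The meta-learner is assumed to be a linear combiner without intercept.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x for a single input feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0  # degenerate resample (all x equal): flat line
    a = my - b * mx
    return lambda x: a + b * x


def bagged(fit, xs, ys, n_models=25, seed=0):
    """Bagging: fit `fit` on bootstrap resamples and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)


def fit_stack_weights(p1, p2, y):
    """Stacking meta-learner (assumed linear, no intercept).

    Finds weights w1, w2 minimising sum((y - w1*p1 - w2*p2)^2) by solving the
    2x2 normal equations with Cramer's rule. In practice p1 and p2 should be
    out-of-fold predictions of the base models to avoid leakage.
    """
    s11 = sum(a * a for a in p1)
    s22 = sum(b * b for b in p2)
    s12 = sum(a * b for a, b in zip(p1, p2))
    r1 = sum(a * t for a, t in zip(p1, y))
    r2 = sum(b * t for b, t in zip(p2, y))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:  # collinear base predictions: fall back to a plain average
        return 0.5, 0.5
    return (r1 * s22 - s12 * r2) / det, (s11 * r2 - s12 * r1) / det
```

One way to combine the two is to bag each base learner and then feed the bagged predictions to the meta-learner. That pipeline is an assumption made here for illustration.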
Algal blooms are an ongoing issue in managing freshwater systems used for drinking water supply, and chlorophyll-a concentration is commonly used to represent bloom status. Predicting chlorophyll-a concentration is therefore essential for proper water quality management. However, chlorophyll-a concentration is affected by various water quality and environmental factors, which makes it difficult to predict. In recent years, advanced machine learning algorithms have increasingly been used to develop surrogate models that predict chlorophyll-a concentration in freshwater systems such as rivers and reservoirs. This study used the light gradient boosting machine (LightGBM), a gradient boosting decision tree algorithm, to develop an ensemble machine learning model for predicting chlorophyll-a concentration. The model was developed with field water quality data observed at Daecheong Lake and obtained from Korea's real-time water information system. The data include temperature, pH, electrical conductivity, dissolved oxygen, total organic carbon, total nitrogen, total phosphorus, and chlorophyll-a.

First, a LightGBM model was developed to predict chlorophyll-a concentration using the other seven items as independent input variables. Second, time-lagged values of all input variables were added as further inputs to examine how the time lag affects model performance; the time lag i ranged from 1 to 50 days. Model performance was evaluated with three indices: the root-mean-square error to observation standard deviation ratio (RSR), the Nash-Sutcliffe efficiency (NSE), and the mean absolute error (MAE). The model performed best when the dataset with a one-day time lag (i = 1) was added, with RSR, NSE, and MAE of 0.359, 0.871, and 1.510, respectively. Model performance also improved when datasets with time lags of up to about 15 days (i = 15) were added.
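The lag-feature construction and the three evaluation indices can be sketched in plain Python. The sketch assumes that "adding a dataset with time lag i" means appending each input variable's value from i days earlier. It also assumes the records are daily dicts in chronological order with hypothetical field names; the LightGBM model itself is not shown. RSR and NSE share the same denominator, so NSE = 1 − RSR². The reported pair is consistent with this: 1 − 0.359² ≈ 0.871.

```python
import math


def add_lag(rows, inputs, lag):
    """Append each input's value from `lag` days earlier as '<name>_lag<lag>'.

    rows: daily records (dicts) in chronological order. The first `lag` days
    have no lagged value and are dropped.
    """
    out = []
    for t in range(lag, len(rows)):
        rec = dict(rows[t])
        for name in inputs:
            rec[f"{name}_lag{lag}"] = rows[t - lag][name]
        out.append(rec)
    return out


def _sse_and_sst(obs, sim):
    """Sum of squared errors and total sum of squares around the observed mean."""
    mean = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean) ** 2 for o in obs)
    return sse, sst


def rsr(obs, sim):
    """RMSE divided by the standard deviation of the observations."""
    sse, sst = _sse_and_sst(obs, sim)
    return math.sqrt(sse) / math.sqrt(sst)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / SST."""
    sse, sst = _sse_and_sst(obs, sim)
    return 1.0 - sse / sst


def mae(obs, sim):
    """Mean absolute error."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)
```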
Increased turbidity in rivers during flood events affects many aspects of water environment management, including drinking water supply systems, so predicting turbid water is essential. Advanced machine learning algorithms have recently been used increasingly in water environment management. Ensemble algorithms such as random forest (RF) and gradient boosting decision trees (GBDT) are among the most popular, along with deep learning algorithms such as recurrent neural networks. In this study, two algorithms were used to develop models that predict river turbidity: GBDT, an ensemble machine learning algorithm, and the gated recurrent unit (GRU), a recurrent neural network algorithm. The observation intervals of the model input data were 2, 4, 8, 24, 48, 120, and 168 h. The root-mean-square error to observation standard deviation ratio (RSR) ranged from 0.182 to 0.766 for GRU and from 0.400 to 0.683 for GBDT. Both models reached similar prediction accuracy, with RSR values of 0.682 for GRU and 0.683 for GBDT. GRU was more accurate when the observation interval was relatively short (2, 4, and 8 h), whereas GBDT was more accurate when it was relatively long (48, 120, and 168 h). The results suggest that the characteristics of the input data should be considered when developing a model to predict turbidity.
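Varying the observation interval can be illustrated by resampling an hourly series to coarser intervals and framing it as supervised samples (a window of past values predicting the next value). Either GRU or GBDT could then be trained on those samples. This is a minimal sketch, and two choices are assumptions not stated in the abstract: coarser intervals are formed by block averaging, and a fixed-length window is used.

```python
def resample_mean(hourly, step_h):
    """Average consecutive blocks of `step_h` hourly values.

    A trailing partial block is dropped. Block averaging is an assumption;
    simple subsampling (hourly[::step_h]) would be another option.
    """
    return [
        sum(hourly[i:i + step_h]) / step_h
        for i in range(0, len(hourly) - step_h + 1, step_h)
    ]


def make_windows(series, window):
    """Turn a series into (past `window` values -> next value) training pairs."""
    X, y = [], []
    for t in range(window, len(series)):
        X.append(series[t - window:t])
        y.append(series[t])
    return X, y
```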
The sunspot area is a key physical quantity for assessing the level of solar activity, and forecasts of sunspot area are important for studies of solar activity and space weather. We developed a hybrid prediction method that integrates complementary ensemble empirical mode decomposition (CEEMD) and the extreme learning machine (ELM). The time series is first decomposed by CEEMD into intrinsic mode functions (IMFs) with different frequencies. These IMFs are divided into three groups: a high-frequency group, a low-frequency group, and a trend group. A separate ELM model is then built to forecast each group, and the final forecast is obtained by summing the group forecasts. The proposed hybrid model was applied to the smoothed monthly mean sunspot area archived at NASA's Marshall Space Flight Center (MSFC). It achieved a mean absolute percentage error (MAPE) of 1.80% and a root mean square error (RMSE) of 9.75. These results indicate that (1) the sunspot area predicted by the CEEMD-ELM model agrees well with the observed values, and (2) the proposed model outperforms previous approaches in prediction accuracy and operational efficiency.
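An extreme learning machine is a single-hidden-layer network whose input weights and biases are drawn at random and never trained. Only the output weights are fitted, by (here, ridge-regularised) least squares. The plain-Python sketch below illustrates that idea. The following are illustrative assumptions, not values from the study: the tanh activation, the hidden-layer size, the ridge term, and the uniform(-1, 1) initialisation. CEEMD is not shown. In the hybrid scheme, one such model would be trained per IMF group, typically on lagged windows of that group, and the three group forecasts summed.

```python
import math
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def train_elm(xs, ys, n_hidden=15, ridge=1e-6, seed=0):
    """Fit an ELM: random tanh hidden layer, least-squares output weights.

    xs: list of input vectors (e.g. lagged values); ys: targets.
    Returns a prediction function for a single input vector.
    """
    rng = random.Random(seed)
    dim = len(xs[0])
    w = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_hidden)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]

    def hidden(x):
        return [math.tanh(sum(wi * xi for wi, xi in zip(wj, x)) + bj)
                for wj, bj in zip(w, b)]

    H = [hidden(x) for x in xs]
    # Normal equations (H^T H + ridge * I) beta = H^T y.
    A = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0)
          for j in range(n_hidden)] for i in range(n_hidden)]
    rhs = [sum(h[i] * y for h, y in zip(H, ys)) for i in range(n_hidden)]
    beta = solve(A, rhs)
    return lambda x: sum(bk * hk for bk, hk in zip(beta, hidden(x)))
```

Because the hidden layer is fixed, training reduces to one linear solve. This is the usual reason ELMs are described as fast to train.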
Interest rate spreads reflect economic conditions and serve as an indicator of recessions. The purpose of this study is to predict Korea's interest rate spread using US data, which offer long-term continuity. To this end, 27 US economic data series were used, and the full dataset was reduced to five dimensions through principal component analysis (PCA) to build the dataset required for prediction. In the proposed model, three RNN models (BasicRNN, LSTM, and GRU) predict the US interest rate spread. Their predictions are then fed into an SVR ensemble model that predicts the Korean interest rate spread. The SVR ensemble model predicted Korea's interest rate spread with an RMSE of 0.0658, which is more accurate than the general ensemble model (RMSE 0.0905). It also tracked fluctuations well. Prediction performance improved further when the data were divided into periods according to policy changes. This study presents a new approach to interest rate prediction that yielded better results. Follow-up studies using refined data that represent the global economic situation are expected to improve interest rate predictions further. Such studies could also help predict economic conditions in Korea and in other countries.
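The dimensionality reduction step (27 series down to 5 components) can be illustrated with a small PCA. The sketch centres the data, forms the sample covariance matrix, and extracts the leading eigenvectors by power iteration with deflation. This is a plain-Python teaching sketch chosen here for self-containment. A library PCA based on SVD, such as scikit-learn's `PCA`, is the more robust choice in practice, and the abstract does not say which implementation the study used.

```python
import math


def pca(rows, k, iters=500):
    """Return (components, scores) for the top-k principal components.

    rows: list of observations (each a list of d variables).
    Components are unit vectors; scores are projections of the centred rows.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix.
    C = [[sum(x[i] * x[j] for x in X) / (n - 1) for j in range(d)] for i in range(d)]
    comps = []
    for _ in range(k):
        v = [1.0 / math.sqrt(d)] * d
        for _ in range(iters):  # power iteration toward the dominant eigenvector
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0.0:
                break
            v = [t / norm for t in w]
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        # Deflate: remove the found component before searching for the next one.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    scores = [[sum(x[j] * c[j] for j in range(d)) for c in comps] for x in X]
    return comps, scores
```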
Many studies have addressed methods for predicting machine failure. More recently, much research and many applications have emerged that diagnose the physical condition of machines and their parts and estimate their remaining useful life through various methods. Survival models are also used to predict plant failures from past anomaly cycles. In particular, special machines that reflect the fluid flow and process characteristics of chemical plants are connected to hundreds or thousands of sensors. As a result, many factors must be considered, including process and material data and derived variables. In this paper, the data were first preprocessed with unsupervised time-series anomaly detection to predict abnormalities in these special machines. Next, clustering was used to capture data characteristics, and the clustering results were used to generate additional variables. A training dataset was then built from the history of past equipment abnormalities. Finally, a prediction method based on supervised learning was applied, and updating the model was confirmed to improve the accuracy of equipment failure prediction. By predicting machine abnormalities and extracting key factors, this approach is expected to improve the efficiency of facility operation through flexible scheduling of maintenance and of parts supply and demand.
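Unsupervised time-series anomaly detection can take many forms, and the abstract does not name the method used. The sketch below substitutes one simple, common technique: a rolling z-score. Each reading is compared with the mean and standard deviation of the preceding window, and readings far from recent behaviour are flagged. The window length and threshold are illustrative assumptions.

```python
import statistics


def rolling_zscore_flags(values, window=5, threshold=3.0):
    """Flag values whose z-score relative to the previous `window` values
    exceeds `threshold`. The first `window` values are never flagged."""
    flags = [False] * len(values)
    for i in range(window, len(values)):
        past = values[i - window:i]
        mean = sum(past) / window
        sd = statistics.pstdev(past)
        if sd > 0 and abs(values[i] - mean) / sd > threshold:
            flags[i] = True
    return flags
```

The resulting flags (or cleaned series) could then feed the clustering and supervised-learning stages described above.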
Ensemble verification and prediction of low-level wind shear (LLWS) are important for aircraft landing and aviation management. In this study, we compared the LLWS prediction performance of three approaches at grid points over the Jeju area: the ensemble mean, a multiple regression model, and long short-term memory (LSTM), a member of the recurrent neural network family. The prediction skill of each method was compared using the mean absolute error. In terms of deterministic prediction, the LSTM forecasts were more skillful than the bias-corrected forecasts.
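The verification quantities involved can be shown in a few lines: the ensemble mean of the members at each time, a bias correction, and the mean absolute error. The additive mean-bias correction used here is an assumed, deliberately simple stand-in. It does not reproduce the study's multiple regression model, whose predictors the abstract does not list.

```python
def ensemble_mean(members_by_time):
    """Average the ensemble members at each forecast time."""
    return [sum(ms) / len(ms) for ms in members_by_time]


def mean_bias(obs, fc):
    """Average forecast-minus-observation error over a training period."""
    return sum(f - o for o, f in zip(obs, fc)) / len(obs)


def mae(obs, fc):
    """Mean absolute error."""
    return sum(abs(f - o) for o, f in zip(obs, fc)) / len(obs)


# Usage: estimate the bias on a training period, then subtract it from
# forecasts for the verification period before computing the MAE.
```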
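In this study, the uncertainty of ensemble streamflow prediction techniques was analyzed for the Chungju Dam basin with respect to two sources: the rainfall-runoff model parameters and the input data. The ESP (Ensemble Streamflow Prediction) and BAYES-ESP (Bayesian-ESP) techniques were used for ensemble streamflow prediction, and the ABCD model was used as the rainfall-runoff model. Uncertainty due to model parameters was analyzed with the GLUE (Generalized Likelihood Uncertainty Estimation) technique. Uncertainty due to input data was analyzed by varying the length of the period from which the meteorological scenarios used in the streamflow prediction ensemble were drawn. The ensemble streamflow prediction techniques were influenced more by the model parameters than by the input data, and they proved appropriate to use when at least 20 years of observed meteorological data were available. BAYES-ESP was also found to reduce uncertainty compared with ESP. This study is considered valuable in that it characterizes ensemble streamflow prediction techniques through uncertainty analysis and identifies the sources of their errors.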
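For readers unfamiliar with the ABCD model, the sketch below implements its commonly published monthly water-balance equations (Thomas's four-parameter formulation) in plain Python. The initial storages, parameter values, and units in any example run are illustrative assumptions, not the calibrated values for the Chungju Dam basin. A GLUE analysis would run this model many times with sampled (a, b, c, d) and weight each run by a likelihood measure. That step is not shown here.

```python
import math


def abcd_run(precip, pet, a, b, c, d, s0=0.0, g0=0.0):
    """Run the ABCD monthly water-balance model.

    a in (0, 1]: propensity for runoff before the soil is saturated
    b > 0:       upper limit on evapotranspiration plus soil moisture storage
    c in [0, 1]: fraction of surplus water that recharges groundwater
    d in (0, 1]: groundwater discharge rate
    Returns (streamflow, evapotranspiration, final soil storage, final groundwater).
    """
    s, g = s0, g0
    flows, ets = [], []
    for p, e in zip(precip, pet):
        w = p + s  # available water
        half = (w + b) / (2.0 * a)
        # Evapotranspiration opportunity (smaller root, so y <= w and y <= b).
        y = half - math.sqrt(max(half * half - w * b / a, 0.0))
        s = y * math.exp(-e / b)  # soil moisture carried to the next month
        ets.append(y - s)
        surplus = w - y
        g = (g + c * surplus) / (1.0 + d)  # groundwater storage after recharge and discharge
        flows.append((1.0 - c) * surplus + d * g)  # direct runoff + baseflow
    return flows, ets, s, g
```

The formulation conserves mass: total precipitation minus evapotranspiration minus streamflow equals the change in soil plus groundwater storage. This makes it a convenient check on any implementation.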
In this study, a weighted ensemble method for numerical weather prediction with ensemble models is applied to the PyeongChang area. This post-processing method combines and calibrates forecasts from different numerical models, assigning greater weight to the ensemble models that perform better. Three numerical models were used in the post-processing: the European Centre for Medium-Range Weather Forecasts (ECMWF) ensemble, the Ensemble Prediction System for Global, and the Limited Area Ensemble Prediction System. We compared the outputs of the weighted combination of ensembles with those of the Ensemble Model Output Statistics (EMOS) model applied to each raw ensemble model. The results showed that the weighted ensemble method can significantly improve post-processing performance compared with the raw ensembles of the numerical models.
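The abstract does not give the weighting formula. One simple way to give better-performing models more weight is to weight each model in inverse proportion to its mean squared error over a training period and combine the point forecasts. The sketch below uses that scheme as an assumed illustration. Note that it is a point-forecast combination and is not EMOS, which fits a full predictive distribution.

```python
def inverse_mse_weights(obs, forecasts_by_model, eps=1e-12):
    """Weights proportional to 1 / MSE of each model against `obs` (sum to 1).

    `eps` guards against division by zero for a perfect model.
    """
    mses = [
        sum((f - o) ** 2 for f, o in zip(fc, obs)) / len(obs)
        for fc in forecasts_by_model
    ]
    inv = [1.0 / max(m, eps) for m in mses]
    total = sum(inv)
    return [v / total for v in inv]


def combine(weights, forecasts_by_model):
    """Weighted sum of the models' forecasts at each time step."""
    n_steps = len(forecasts_by_model[0])
    return [
        sum(w * fc[t] for w, fc in zip(weights, forecasts_by_model))
        for t in range(n_steps)
    ]
```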
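A monthly ensemble streamflow prediction system was built for the Han River, Nakdong River, and Seomjin River basins using SSARR, a daily rainfall-runoff model. First, the accuracy of SSARR in simulating monthly mean runoff was evaluated. The model showed a clear tendency to underestimate in the Han River and Nakdong River basins, while in the Seomjin River basin the variance of the simulation errors was large, so accuracy needed to be improved. When the SSARR-simulated streamflow was corrected with an optimal linear correction technique, simulation accuracy at the validation sites in the Han River and Nakdong River basins (excluding the Seomjin River) was greatly improved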