In this study, the magnetocaloric effect and transition temperature of bulk metallic glasses, which are amorphous materials, were predicted by machine learning from compositional features. A total of 174 compositional features were obtained with the Python package ‘Matminer’, and prediction performance was compared as the number of features was reduced to prevent overfitting. After optimizing a random forest, an ensemble model, we analyzed how prediction performance changed with the number of compositional features. The R² score was used as the performance metric for regression. The best performance was obtained with only 90 features for predicting the transition temperature and only 20 features for predicting the magnetocaloric effect. The most important feature for predicting the magnetocaloric effect was the ‘Fe’ compositional ratio. Compositional features were ranked using the feature importance method provided by ‘scikit-learn’. This method was shown to be appropriate by comparing the prediction performance on the Fe-containing dataset with that on the full dataset.
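The feature-reduction procedure described above ranks features by importance and re-scores models on the top k features. The sketch below shows only the bookkeeping for that procedure. It assumes the importance scores have already been computed, for example from scikit-learn's `RandomForestRegressor.feature_importances_`. The helper names are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def top_k_features(names: Sequence[str], importances: Sequence[float], k: int) -> List[str]:
    """Return the k feature names with the largest importance scores (hypothetical helper)."""
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def best_k(scores_by_k: Dict[int, float]) -> int:
    """Pick the feature count whose (e.g. cross-validated) R² score is highest."""
    return max(scores_by_k, key=scores_by_k.get)
```

In use, each candidate k would be scored by retraining the model on `top_k_features(...)` and computing `r2_score` on held-out data. `best_k` then selects the count to keep.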
Growing legume forage crops could increase the productivity and sustainability of sloped croplands in Hamkyongbukdo. In particular, identifying potential cultivation areas for alfalfa in the region could support future decisions on policies and management related to forage crop production. This study aimed to analyze the climate suitability of alfalfa in Hamkyongbukdo under current and future climate conditions using the Fuzzy Union model. The climate suitability predicted by the Fuzzy Union model was compared with the actual alfalfa cultivation area in the northern United States. To examine the uncertainty of projections under future climate conditions, climate data from 11 global climate models were used as inputs for calculating climate suitability in the study region. The area where the climate suitability index exceeded a threshold value (22.6) explained about 44% of the variation in actual alfalfa cultivation area by state in the northern United States. The climate suitability of alfalfa was projected to decrease in most areas of Hamkyongbukdo under future climate scenarios. In Onseong and Gyeongwon Counties, climate suitability was over 88 under current climate conditions. However, it was projected to decrease by about 66% in these areas by the 2090s. Our study showed that the projected impact of climate change on suitable cultivation areas varied widely depending on which climate data were used as inputs to the Fuzzy Union model. Nevertheless, the ensemble of climate suitability projections for alfalfa indicated a considerable decrease due to summer depression in Hamkyongbukdo. Further studies are warranted that incorporate soil conditions into the prediction of suitable cultivation areas, or that assess the climate suitability of other leguminous crops such as hairy vetch.
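The post-processing described above has three parts: projections from 11 climate models, a cell-wise ensemble, and a threshold of 22.6 on the suitability index. The sketch below covers only these steps and does not reproduce the Fuzzy Union model itself. It assumes each model's output is a list of per-cell suitability values on a common grid. The function names are hypothetical.

```python
from statistics import mean


def ensemble_mean(projections):
    """Cell-wise mean suitability across climate models (one list of cells per model)."""
    return [mean(cells) for cells in zip(*projections)]


def suitable_fraction(index, threshold=22.6):
    """Fraction of grid cells whose suitability index exceeds the threshold."""
    return sum(v > threshold for v in index) / len(index)


def percent_change(current, future):
    """Relative change (%) of suitability between two periods."""
    return 100.0 * (future - current) / current
```

For example, a cell at 88 that falls by about 66% would end near 30 (88 × 0.34 ≈ 29.9).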
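The compressive strength of high-performance concrete (HPC) is difficult to predict because of the use of additional cementitious materials, so improved prediction models are needed. Therefore, the objective of this study is to develop an HPC compressive strength prediction model using an ensemble technique that combines bagging and stacking. The key contribution of this paper is a new ensemble technique that integrates the existing ensemble techniques of bagging and stacking. This technique aims to overcome the limitations of single machine learning models and thereby improve prediction performance. Nonlinear regression, support vector machines, artificial neural networks, and Gaussian process regression were used as single machine learning methods, and bagging and stacking were used as ensemble techniques. The proposed model showed higher accuracy than the single machine learning models and the bagging and stacking models. This was confirmed by comparing four representative performance metrics, which verified the effectiveness of the proposed method.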
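The abstract does not give the exact bagging-stacking architecture. The following is therefore only a minimal sketch of the general idea: bootstrap-aggregated base learners whose outputs feed a linear meta-learner trained on held-out data. A 1-D k-nearest-neighbour regressor stands in for the NLR, SVM, ANN, and GPR models. All function names are hypothetical.

```python
import random


def knn_predict(train, x, k):
    """Stand-in base learner: k-nearest-neighbour regression on one feature."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def bagged_predict(train, x, k, n_bags=15, seed=0):
    """Bagging: average the base learner over bootstrap resamples of `train`.

    The fixed seed means every call sees the same bootstrap samples,
    so the bagged model is identical for all query points."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_bags):
        sample = [rng.choice(train) for _ in train]
        total += knn_predict(sample, x, k)
    return total / n_bags


def solve(a, b):
    """Solve a small square system a·w = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_meta(base_preds, targets, ridge=1e-6):
    """Stacking meta-learner: least squares y ≈ w0 + sum_j w_j * pred_j (tiny ridge for stability)."""
    rows = [[1.0] + list(p) for p in base_preds]
    d = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    aty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(d)]
    return solve(ata, aty)


def stacked_predict(train, weights, x, ks=(1, 5)):
    """Bagging + stacking: bagged base learners feed the linear meta-learner."""
    feats = [1.0] + [bagged_predict(train, x, k) for k in ks]
    return sum(w * f for w, f in zip(weights, feats))
```

The meta-learner is fitted on predictions for a holdout split, not on the training rows used by the base learners. This separation is the usual safeguard against the meta-learner simply rewarding overfit base models.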
The prediction of algal blooms is an important field of study in algal bloom management, and chlorophyll-a concentration (Chl-a) is commonly used to represent the status of an algal bloom. In recent years, advanced machine learning algorithms have increasingly been used to predict algal blooms. In this study, XGBoost (XGB), an ensemble machine learning algorithm, was used to develop a model to predict Chl-a in a reservoir. Daily observations of water quality and climate data were used to train and test the model. In the first step of the study, the data were clustered into two groups (low-value and high-value groups) based on the observed values of water temperature (TEMP), total organic carbon concentration (TOC), total nitrogen concentration (TN), and total phosphorus concentration (TP). For each of the four water quality items, two XGB models were developed, each using only the data in one clustered group (Model 1). The results were compared with the predictions of an XGB model developed using the entire dataset before clustering (Model 2). Model performance was evaluated using three indices, including the root mean squared error-observation standard deviation ratio (RSR). Model 1 improved performance for TEMP, TN, and TP, with RSRs of 0.503, 0.477, and 0.493, respectively, compared with an RSR of 0.521 for Model 2. For TOC, on the other hand, Model 2 performed better than Model 1, whose RSR was 0.532. Explainable artificial intelligence (XAI) is an active area of machine learning research. Shapley value analysis, an XAI method, was also used to quantitatively interpret the XGB models developed in this study.
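The two-step procedure first splits each input variable into low and high groups and then scores the models with RSR. It can be sketched with the standard library as follows. The sketch assumes 1-D k-means with k = 2 as the clustering method, which the abstract does not specify. RSR here is RMSE divided by the population standard deviation of the observations, which equals sqrt(SSE / SST). The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def rsr(obs, sim):
    """RMSE / population std of observations (equivalently sqrt(SSE / SST))."""
    rmse = sqrt(mean((o - s) ** 2 for o, s in zip(obs, sim)))
    return rmse / pstdev(obs)


def two_means_1d(values, iters=50):
    """Assumed clustering: 1-D k-means with k = 2.

    Returns the boundary between the low and high groups (midpoint of the
    two centroids); rows at or below it would go to the 'low' model."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not low or not high:
            break
        new_lo, new_hi = mean(low), mean(high)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2
```

In the setup described above, one such boundary would be computed per variable (TEMP, TOC, TN, TP). An XGB model would then be trained on each side of the boundary.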
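As technology advances, enormous amounts of data are being generated. One technical field that consumes large amounts of data is computer vision. Humans, unlike machines, can easily detect and recognize faces or objects even under external conditions that affect vision, such as facial expression, lighting, or viewing angle. Machines find this difficult largely because of the high dimensionality of the associated data. Data dimensionality refers to the total number of variables measured for each observation. This project has three aims: to compare various dimensionality reduction techniques suitable for face recognition systems, to test them on several datasets of face images with varying illumination, backgrounds, and expressions, and to propose an ensemble of these techniques that improves model accuracy and measure its performance. The proposed ensemble model extracts vectors from a combination of two dimensionality reduction techniques, principal component analysis (PCA) and locally linear embedding (LLE). It passes these vectors through a dense convolutional neural network (CNN) to predict faces in the Labeled Faces in the Wild (LFW) dataset. The model achieves a test accuracy of 0.95 and a test F1 score of 0.94. The proposed system also includes a web app developed with Flask. The app captures a live video stream from a webcam and is integrated with the proposed ensemble model so that the system can predict faces.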
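The abstract does not say exactly how the PCA and LLE outputs are combined. One plausible reading is that the two embeddings are concatenated before the CNN. The stdlib sketch below shows only the PCA half, computing the leading component by power iteration, and a hypothetical `fuse` helper. LLE and the CNN are not reproduced. In practice, library classes such as scikit-learn's `PCA` and `LocallyLinearEmbedding` would be used instead.

```python
import random
from math import sqrt


def first_principal_component(rows, iters=200, seed=0):
    """Leading eigenvector of the sample covariance matrix via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]  # positive random start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v


def project(row, means, component):
    """Score of one sample on the component (a 1-D PCA embedding)."""
    return sum((x - m) * c for x, m, c in zip(row, means, component))


def fuse(pca_vec, lle_vec):
    """Hypothetical ensemble input: concatenate embeddings from two reducers."""
    return list(pca_vec) + list(lle_vec)
```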
Algal blooms are an ongoing issue in the management of freshwater systems for drinking water supply, and chlorophyll-a concentration is commonly used to represent the status of an algal bloom. Thus, predicting chlorophyll-a concentration is essential for proper water quality management. However, chlorophyll-a concentration is affected by various water quality and environmental factors, so predicting it is not easy. In recent years, advanced machine learning algorithms have increasingly been used to develop surrogate models that predict chlorophyll-a concentration in freshwater systems such as rivers and reservoirs. This study used the light gradient boosting machine (LightGBM), a gradient boosting decision tree algorithm, to develop an ensemble machine learning model to predict chlorophyll-a concentration. Field water quality data observed at Daecheong Lake, obtained from the real-time water information system in Korea, were used to develop the model. The data include temperature, pH, electric conductivity, dissolved oxygen, total organic carbon, total nitrogen, total phosphorus, and chlorophyll-a. First, a LightGBM model was developed to predict chlorophyll-a concentration using the other seven items as independent input variables. Second, time-lagged values of all the input variables were added as input variables to examine how the time lag of the inputs affects model performance. The time lag (i) ranged from 1 to 50 days. Model performance was evaluated using three indices: the root mean squared error-observation standard deviation ratio (RSR), the Nash-Sutcliffe coefficient of efficiency (NSE), and the mean absolute error (MAE). The model performed best when a dataset with a one-day time lag (i = 1) was added, with RSR, NSE, and MAE of 0.359, 0.871, and 1.510, respectively. Model performance also improved when datasets with time lags of up to about 15 days (i = 15) were added.
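Time-lagged inputs and the three evaluation indices can be sketched with the standard library as follows. The sketch assumes equally spaced daily records stored as one list per variable. The lagged column naming (`name_lag1`, ...) and the function names are hypothetical.

```python
from math import sqrt


def lagged_rows(columns, lag):
    """Rows holding each variable at day t plus its values at t-1 ... t-lag.

    `columns` maps a variable name to equally spaced daily values; the first
    `lag` days are skipped because their history is incomplete."""
    n = len(next(iter(columns.values())))
    rows = []
    for t in range(lag, n):
        row = {}
        for name, values in columns.items():
            row[name] = values[t]
            for i in range(1, lag + 1):
                row[f"{name}_lag{i}"] = values[t - i]
        rows.append(row)
    return rows


def _sse_sst(obs, sim):
    """Sum of squared errors and total sum of squares of the observations."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return sse, sst


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / SST."""
    sse, sst = _sse_sst(obs, sim)
    return 1.0 - sse / sst


def rsr(obs, sim):
    """RMSE / std of observations, i.e. sqrt(SSE / SST) = sqrt(1 - NSE)."""
    sse, sst = _sse_sst(obs, sim)
    return sqrt(sse / sst)


def mae(obs, sim):
    """Mean absolute error."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)
```

Because RSR = sqrt(1 - NSE), the two indices carry the same information. The reported values are consistent with this identity, since 1 - 0.359² ≈ 0.871.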
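Research on generalized monitoring systems that use ambient vibration data to quantify the long-term deterioration of structures is being actively conducted worldwide. In this study, a low-cost edge computing system was built to detect structural anomalies by applying ensemble learning to dynamic characteristics acquired from structures over the long term. The system hardware consists of a Raspberry Pi, a low-cost accelerometer, an inclination sensor, a GPS RTK module, and a LoRa module. Structural anomaly detection by ensemble learning on dynamic characteristics was verified through vibration experiments on a laboratory-scale structural model. Based on these experiments, a distributed-processing algorithm for real-time extraction of dynamic characteristics was deployed on the Raspberry Pi. The system was then housed and installed at an administrative welfare center in Pohang City. Data acquired there verified the field applicability of the developed system.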
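Dynamic characteristics such as natural frequencies are commonly extracted from accelerometer records by spectral peak picking. The minimal stdlib sketch below assumes a single dominant mode and a known sampling rate `fs`. The ensemble anomaly detector and the distributed processing on the Raspberry Pi are not reproduced, and the function names are hypothetical.

```python
import cmath
import math


def dominant_frequency(signal, fs):
    """Frequency (Hz) of the largest DFT magnitude, excluding the DC bin.

    Uses a naive O(n^2) DFT for clarity; a deployed system would use an FFT."""
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]  # remove the static offset
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * fs / n


def frequency_shift(baseline_hz, current_hz):
    """Relative change of a natural frequency; a drop can indicate stiffness loss."""
    return (current_hz - baseline_hz) / baseline_hz
```

Tracked over time, features like `frequency_shift` are the kind of input an anomaly detector could consume. The frequency resolution is `fs / n`, so longer records give finer frequency estimates.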
Increased turbidity in rivers during flood events has various effects on water environmental management, including drinking water supply systems. Thus, predicting turbid water is essential for water environmental management. Recently, various advanced machine learning algorithms have increasingly been used in this field. Ensemble machine learning algorithms such as random forest (RF) and gradient boosting decision tree (GBDT) are among the most popular, along with deep learning algorithms such as recurrent neural networks. In this study, GBDT, an ensemble machine learning algorithm, and the gated recurrent unit (GRU), a recurrent neural network algorithm, were used to develop models to predict turbidity in a river. The observation frequencies of the input data used for the models were 2, 4, 8, 24, 48, 120, and 168 h. The root-mean-square error-observation standard deviation ratio (RSR) ranged from 0.182 to 0.766 for GRU and from 0.400 to 0.683 for GBDT. Both models showed similar prediction accuracy, with an RSR of 0.682 for GRU and 0.683 for GBDT. GRU showed better prediction accuracy when the observation interval was relatively short (i.e., 2, 4, and 8 h). GBDT, in contrast, showed better prediction accuracy when the observation interval was relatively long (i.e., 48, 120, and 168 h). The results suggest that the characteristics of the input data should be considered when developing a model to predict turbidity.
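The abstract does not say how input series at each observation frequency were derived. Two common options are block averaging and subsampling of an hourly record, sketched below with the standard library. Both are assumptions, and the function names are hypothetical.

```python
def resample_mean(hourly, step_h):
    """Average an hourly series over non-overlapping `step_h`-hour blocks.

    A trailing partial block is dropped so every value covers a full interval."""
    return [sum(hourly[i:i + step_h]) / step_h
            for i in range(0, len(hourly) - step_h + 1, step_h)]


def subsample(hourly, step_h):
    """Keep every `step_h`-th hourly value (instantaneous sampling)."""
    return hourly[::step_h]


FREQUENCIES_H = (2, 4, 8, 24, 48, 120, 168)  # observation intervals listed above


def build_inputs(hourly):
    """One resampled series per observation interval."""
    return {h: resample_mean(hourly, h) for h in FREQUENCIES_H}
```

Averaging smooths short turbidity peaks, while subsampling can miss them entirely. The choice therefore interacts with the finding that the better model depends on the observation interval.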
This paper proposes a machine-learning-based outlier detection model that can diagnose whether major engine parts are operating anomalously, through unsupervised analysis of a ship's main engine big data. Engine big data were collected from the ship for more than seven months. Expert knowledge and correlation analysis were then used to select features closely related to main engine operation. For the unsupervised analysis, anomalies were detected with an ensemble model, in which many predictive models are strategically combined to improve performance. The proposed model successfully distinguished anomalous engine states from normal states. To validate the approach, clustering analysis was conducted to identify different patterns among the anomalous points. By examining the distribution of each cluster, we successfully identified these anomaly patterns.
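The abstract does not name the members of the ensemble. The sketch below assumes two simple univariate detectors, z-score and median absolute deviation (MAD). Their outputs are combined by averaging rank-normalized scores, which is one common way to build an unsupervised detector ensemble. The function names are hypothetical, and real engine data would be multivariate.

```python
from statistics import mean, median, pstdev


def zscore_scores(values):
    """Absolute z-score of each point."""
    m, s = mean(values), pstdev(values)
    return [abs(v - m) / s for v in values]


def mad_scores(values):
    """Robust score: distance from the median in units of the median absolute deviation."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return [abs(v - med) / mad for v in values]


def rank_normalize(scores):
    """Map scores to [0, 1] by rank so detectors on different scales can be averaged."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    ranks = [0.0] * len(scores)
    for r, i in enumerate(order):
        ranks[i] = r / (len(scores) - 1)
    return ranks


def ensemble_scores(values):
    """Average the rank-normalized scores of several simple detectors."""
    detectors = (zscore_scores, mad_scores)
    per_detector = [rank_normalize(d(values)) for d in detectors]
    return [mean(col) for col in zip(*per_detector)]
```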
Interest rate spreads indicate the condition of the economy and serve as an indicator of recession. The purpose of this study is to predict Korea's interest rate spreads using US data with long-term continuity. To this end, 27 US economic indicators were used, and the full dataset was reduced to five dimensions through principal component analysis to build the dataset needed for prediction. In the proposed prediction model, three RNN models (BasicRNN, LSTM, and GRU) predict the US interest rate spread. Their predictions are then used in an SVR ensemble model to predict the Korean interest rate spread. The SVR ensemble model predicted Korea's interest rate spread with an RMSE of 0.0658. This was more accurate than a general ensemble model (RMSE 0.0905), and the SVR ensemble also tracked fluctuations better. In addition, dividing the data into periods according to policy changes further improved prediction performance. This study presents a new approach to interest rate prediction that yielded better results. Follow-up studies using refined data that represent the global economic situation could achieve more accurate interest rate predictions. Such data could also help predict economic conditions in Korea and in other countries.
Many methods for predicting machine failure have been studied in the past. Recently, many studies and applications have emerged that use various methods to diagnose the physical condition of machines and their parts and to estimate their remaining life. Survival models are also used to predict plant failures based on past anomaly cycles. In particular, the special machines of chemical plants reflect the plants' fluid flow and process characteristics and are connected to hundreds or thousands of sensors. Many factors must therefore be considered, including process and material data and derived variables. In this paper, the data were first preprocessed through unsupervised time series anomaly detection to predict abnormalities in these special machines. Next, clustering results reflecting the characteristics of the data were used to produce additional variables, and a training dataset was created based on the history of past facility abnormalities. Finally, a prediction method based on a supervised learning algorithm was applied, and model updates were confirmed to improve the accuracy of facility failure prediction. This approach predicts machine abnormalities and extracts key factors. It is therefore expected to improve the efficiency of facility operation by allowing maintenance timing and parts supply to be adjusted flexibly.
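The preprocessing and feature-generation steps can be illustrated with a simple rolling z-score detector and one derived variable. This is an assumed stand-in, because the abstract specifies neither the unsupervised detector nor the clustering-based variables. Both function names are hypothetical.

```python
from statistics import mean, pstdev


def rolling_anomalies(series, window, k=3.0):
    """Flag points more than k standard deviations from the trailing-window mean.

    Returns one boolean per point; the first `window` points are never flagged."""
    flags = [False] * len(series)
    for t in range(window, len(series)):
        past = series[t - window:t]
        s = pstdev(past)
        if s > 0 and abs(series[t] - mean(past)) > k * s:
            flags[t] = True
    return flags


def anomaly_rate(flags, horizon):
    """Hypothetical derived variable: share of flagged points in the last `horizon` samples."""
    recent = flags[-horizon:]
    return sum(recent) / len(recent)
```

Variables like `anomaly_rate`, computed per sensor, could be joined with the facility abnormality history to form a supervised training set of the kind the abstract describes.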
Recently, many researchers have published studies and reports aimed at forecasting air quality, such as particulate matter and odor, more accurately. However, such research mainly focuses on the atmospheric diffusion models traditionally used for air quality prediction in environmental engineering. Although these models have various merits, they are limited in that they use very few spatial attributes, such as geographical attributes. Thus, we propose a new approach to forecasting air quality using a deep-learning-based ensemble model that combines temporal and spatial predictors. The temporal predictor employs an LSTM recurrent neural network, and the spatial predictor is based on a geographically weighted regression model. The ensemble model also uses an LSTM to combine the two models in a stacking structure. The ensemble model can infer the air quality of areas without air quality monitoring stations and can also forecast future air quality. To gather air quality data, we installed IoT sensors measuring PM2.5, PM10, H2S, NH3, and VOC at 8 stations in Jeonju. The numerical results showed that the new model predicted air quality accurately compared with the actual measured data. This implies that spatial attributes should be considered for more accurate air quality prediction.
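The spatial predictor is based on geographically weighted regression (GWR). GWR fits a separate weighted regression at each location, with weights that decay with distance from that location. The minimal single-covariate sketch below assumes a Gaussian kernel with a fixed bandwidth and planar coordinates. The covariate (for example, a land-use attribute) and all function names are hypothetical. The LSTM temporal predictor and the LSTM stacking model are not reproduced.

```python
from math import exp, hypot


def gaussian_weights(stations, target, bandwidth):
    """Gaussian kernel commonly used in GWR: w_i = exp(-0.5 * (d_i / bandwidth)**2)."""
    return [exp(-0.5 * (hypot(x - target[0], y - target[1]) / bandwidth) ** 2)
            for x, y in stations]


def local_fit(covariate, response, weights):
    """Weighted least squares for response ≈ b0 + b1 * covariate at one location."""
    sw = sum(weights)
    xm = sum(w * x for w, x in zip(weights, covariate)) / sw
    ym = sum(w * y for w, y in zip(weights, response)) / sw
    sxy = sum(w * (x - xm) * (y - ym) for w, x, y in zip(weights, covariate, response))
    sxx = sum(w * (x - xm) ** 2 for w, x in zip(weights, covariate))
    b1 = sxy / sxx
    return ym - b1 * xm, b1
```

To estimate air quality at a site without a station, weights are computed from that site's coordinates, `local_fit` is run on the station data, and the fitted line is evaluated at the site's covariate value.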
In this paper, the characteristics of the intrinsic mode functions (IMFs) of ensemble empirical mode decomposition (EEMD) and their orthogonalization were studied. EEMD is often used to analyze nonlinear or nonstationary signals. In the decomposition process, orthogonal IMFs of EEMD were obtained by applying the Gram-Schmidt (G-S) orthogonalization method and were compared with the IMFs of orthogonal EMD (OEMD). Two signals were used for the comparison: an analytical test function and the El Centro seismic wave. For these target signals, the index of orthogonality (IO) and the spectral energy of the IMFs were calculated and compared. The analysis showed that G-S orthogonalization produced IMFs with a high degree of orthogonality as measured by the IO. It also showed that the orthogonal EEMD using white noise decomposed the signal into orthogonal IMFs whose energy was closer to that of the original signal than with conventional OEMD.
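Gram-Schmidt orthogonalization of IMFs and an index of orthogonality can be sketched with the standard library as follows. The IO uses one common form: the sum over all pairs j ≠ k of Σ_t C_j(t) C_k(t), divided by Σ_t X(t)². The paper's exact definition may differ. IMFs are represented as equal-length lists, and the function names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def gram_schmidt(components):
    """Classical Gram-Schmidt: remove from each IMF its projections on earlier ones.

    Vectors are not normalized, so each orthogonalized IMF keeps its own scale."""
    ortho = []
    for c in components:
        v = list(c)
        for u in ortho:
            uu = dot(u, u)
            if uu > 0:
                coef = dot(c, u) / uu
                v = [vi - coef * ui for vi, ui in zip(v, u)]
        ortho.append(v)
    return ortho


def index_of_orthogonality(components, signal):
    """IO = sum over j != k of sum_t C_j(t) C_k(t), divided by sum_t X(t)^2."""
    cross = sum(dot(cj, ck) for j, cj in enumerate(components)
                for k, ck in enumerate(components) if j != k)
    return cross / dot(signal, signal)
```

An IO near zero means the components share almost no energy. Gram-Schmidt drives the IO to zero by construction, so the remaining question the paper examines is how well the orthogonalized IMFs preserve the signal's energy.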
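This paper analyzes ‘Bohemian Rhapsody’, a track from ‘A Night at the Opera’, the fourth album by the band Queen. The song was selected as the subject of study because it uses vocal ensemble effectively. The vocal ensemble technique was analyzed in two aspects, vocal ensemble type and vocal ensemble panning. The vocal ensemble types were transcribed into musical notation after analysis, and the vocal ensemble panning was analyzed using spectrograms. The analysis shows that ‘Bohemian Rhapsody’ uses panning effectively according to the vocal ensemble type, which makes the vocal ensemble stand out. Based on these results, ways of applying these techniques are suggested for music that requires vocal ensembles and for composers and arrangers.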
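The spread of volcanic ash from the 2010 eruption of Eyjafjallajökull halted air traffic across Europe and attracted global social and economic attention. In Korea, too, research has steadily been conducted on two topics: the eruptive activity of nearby volcanoes that could affect the Korean Peninsula, and precursors of a Baekdusan eruption. Volcanic ash dispersion is generally predicted with numerical dispersion models driven by meteorological data. Ensemble analysis is commonly used to reduce the uncertainty of both the meteorological data and the numerical models. In this study, volcanic ash dispersion was simulated with an Eulerian numerical model for dates with similar meteorological fields, and a method for reducing uncertainty through ensemble analysis is presented. Most ensemble methods treat in situ observations as their primary data, but field measurements of volcanic ash are very difficult to obtain. Therefore, the Reliability Ensemble Averaging (REA) method was modified so that the ensemble could be built from the ash dispersion simulation results alone. The error term of the simulated variables for the historical-period scenarios was excluded, and only the term for the mean difference of the simulated variables between scenarios was retained. Compared with the Simple Model Averaging (SMA) method, this approach was confirmed to reduce uncertainty.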
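The modified REA described above can be sketched as an iterative weighted mean whose weights depend only on each scenario's distance from the ensemble mean. SMA corresponds to equal weights. This minimal sketch assumes one scalar output per scenario, for example the ash concentration at one grid cell. The user-chosen tolerance `eps` is a hypothetical stand-in for the natural-variability term of standard REA. This is not the authors' implementation.

```python
def sma(values):
    """Simple model averaging: equal weights."""
    return sum(values) / len(values)


def rea_convergence_only(values, eps, iters=100, tol=1e-10):
    """REA-style mean using only the convergence (distance) criterion.

    Each scenario's reliability is min(1, eps / |x_i - center|), recomputed
    against the current weighted mean until the mean stops changing.
    `eps` is an assumed tolerance, not a value from the study."""
    center = sma(values)
    weights = [1.0] * len(values)
    for _ in range(iters):
        weights = [1.0 if abs(v - center) <= eps else eps / abs(v - center)
                   for v in values]
        new_center = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        converged = abs(new_center - center) < tol
        center = new_center
        if converged:
            break
    return center, weights


def weighted_spread(values, weights, center):
    """Weighted RMS distance from the ensemble mean, used here as an uncertainty measure."""
    num = sum(w * (v - center) ** 2 for w, v in zip(weights, values))
    return (num / sum(weights)) ** 0.5
```

Outlying scenarios receive weights below 1, so they pull the REA mean less than the SMA mean. The weighted spread is then smaller than the equal-weight spread.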
Ensemble classification combines individually trained classifiers to yield more accurate predictions than individual models. Ensemble techniques are very useful for improving the generalization ability of classifiers. The random subspace ensemble technique is a simple but effective method for constructing ensemble classifiers: it randomly draws a subset of the features for each classifier in the ensemble. The instance selection technique selects critical instances while removing irrelevant and noisy instances from the original dataset. The instance selection and random subspace methods are both well known in data mining and have proven very effective in many applications. However, few studies have focused on integrating them. Therefore, this study proposes a new hybrid ensemble model that uses genetic algorithms (GAs) to integrate instance selection with the random subspace technique, in order to improve the performance of a random subspace ensemble model. GAs are used to select optimal (or near-optimal) instances, which are then used as input data for the random subspace ensemble model. The proposed model was applied to both Kaggle credit data and corporate credit data. Its results were compared with those of other models in terms of classification accuracy, levels of diversity, and the average classification rates of the base classifiers in the ensemble. The experimental results demonstrated that the proposed model outperformed the other models, including the single model, the instance selection model, and the original random subspace ensemble model.
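Random subspace construction with majority voting can be sketched as follows. The sketch assumes a nearest-centroid base classifier, because the abstract does not name one. The GA-based instance selection is not reproduced. Its output, the selected training rows, would simply replace `X` and `y`. All function names are hypothetical.

```python
import random
from collections import Counter


def nearest_centroid_fit(X, y, features):
    """Assumed base classifier: per-class centroid on a subset of features."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    return centroids


def nearest_centroid_predict(centroids, features, x):
    """Predict the class whose centroid is closest within the chosen subspace."""
    return min(centroids, key=lambda lab: sum(
        (x[f] - c) ** 2 for f, c in zip(features, centroids[lab])))


def random_subspace_ensemble(X, y, n_models=11, subspace=2, seed=0):
    """Train each member on a random feature subset drawn without replacement."""
    rng = random.Random(seed)
    n_features = len(X[0])
    members = []
    for _ in range(n_models):
        feats = rng.sample(range(n_features), subspace)
        members.append((feats, nearest_centroid_fit(X, y, feats)))
    return members


def ensemble_predict(members, x):
    """Majority vote over the members' predictions."""
    votes = Counter(nearest_centroid_predict(c, f, x) for f, c in members)
    return votes.most_common(1)[0][0]
```

Drawing a different feature subset for each member is what creates diversity in a random subspace ensemble. Instance selection acts on the rows instead, which is why the two can be combined.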