Machine learning-based algorithms have been used to construct species distribution models (SDMs), but their performance depends on the selection of backgrounds. This study attempted to develop a novel method for selecting backgrounds in machine-learning SDMs. Two machine learning-based SDMs (MaxEnt and Random Forest) were applied to an example species (Spodoptera litura), and different background selection methods (random sampling, biased sampling, and ensemble sampling using CLIMEX) were tested with multiple performance metrics (TSS, Kappa, F1-score). The model with ensemble sampling predicted the widest occurrence areas with the highest performance, suggesting that the developed method can be applied to enhance machine-learning SDMs.
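The three performance metrics named above can all be derived from a 2x2 confusion matrix of predicted versus observed presence. The following is a minimal sketch using the standard textbook definitions; the function name and the example counts are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return TSS, Cohen's kappa, and F1 from confusion-matrix counts."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    tss = sensitivity + specificity - 1

    # Kappa compares observed agreement with agreement expected by chance.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)

    f1 = 2 * tp / (2 * tp + fp + fn)
    return {"TSS": tss, "Kappa": kappa, "F1": f1}


# Hypothetical counts, for illustration only.
scores = binary_metrics(tp=40, fp=10, fn=10, tn=40)
```

Because TSS and Kappa depend on the true negatives, both are sensitive to how the background points are drawn, which is one reason background selection affects the reported performance of an SDM.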
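The compressive strength of high-performance concrete (HPC) is difficult to predict because of the use of supplementary cementitious materials, so the development of improved prediction models is essential. The purpose of this study is therefore to develop an HPC compressive strength prediction model using an ensemble technique that combines bagging and stacking. The core contribution of this paper is a new ensemble technique that integrates the existing bagging and stacking techniques and improves predictive performance by addressing the problems of single machine learning models. Nonlinear regression, support vector machine, artificial neural network, and Gaussian process regression were used as the single machine learning methods, and bagging and stacking were used as the ensemble techniques. As a result, the model proposed in this study showed higher accuracy than the single machine learning models and the bagging and stacking models. This was confirmed by comparing four representative performance metrics, which verified the validity of the proposed method.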
This paper proposes a machine learning-based outlier detection model that can diagnose the presence or absence of anomalies in major engine parts through unsupervised learning analysis of big data from a ship's main engine. The engine data were collected for more than seven months, and expert knowledge and correlation analysis were used to select features that are closely related to the operation of the main engine. For the unsupervised learning analysis, an ensemble model, in which many predictive models are strategically combined to increase model performance, was used for anomaly detection. As a result, the proposed model successfully distinguished the anomalous engine status from the normal status. To validate our approach, clustering analysis was conducted to identify the different patterns among the anomalous points. By examining the distribution of each cluster, we could successfully find the patterns of anomalies.
Recently, a number of researchers have produced studies and reports aimed at forecasting air quality, such as particulate matter and odor, more accurately. However, such research mainly focuses on the atmospheric diffusion models that have been used for air quality prediction in environmental engineering. Although these models have various merits, they are limited in that they use very few spatial attributes, such as geographical attributes. Thus, we propose a new approach to forecasting air quality using a deep learning-based ensemble model that combines a temporal predictor and a spatial predictor. The temporal predictor employs an RNN LSTM, and the spatial predictor is based on a geographically weighted regression model. The ensemble model also uses an RNN LSTM, which combines the two models in a stacking structure. The ensemble model is capable of inferring the air quality of areas without an air quality monitoring station, and even of forecasting future air quality. We installed IoT sensors measuring PM2.5, PM10, H2S, NH3, and VOC at 8 stations in Jeonju to gather air quality data. The numerical results showed that the new model predicts air quality very accurately when compared with the measured data. This implies that spatial attributes should be considered for more accurate air quality prediction.
In this paper, the characteristics of the intrinsic mode function (IMF) of ensemble empirical mode decomposition (EEMD), which is often used to analyze non-linear or non-stationary signals, and its orthogonalization were studied. In the decomposition process, the orthogonal IMFs of EEMD were obtained by applying the Gram-Schmidt (G-S) orthogonalization method and were compared with the IMFs of orthogonal EMD (OEMD). Two signals were adopted for the comparative analysis: an analytical test function and the El Centro seismic wave. The decompositions of these target signals were compared by calculating the index of orthogonality (IO) and the spectral energy of the IMFs. As a result of the analysis, an IMF with a high IO was obtained by the G-S orthogonalization (GSO) method, and the orthogonal EEMD using white noise decomposed the signal into orthogonal IMFs with energy closer to that of the original signal than conventional OEMD.
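The orthogonalization step can be illustrated with a small sketch. It assumes the IMFs are already available as equal-length, linearly independent sequences (the EEMD sifting itself is not shown) and uses one common form of the index of orthogonality, the summed absolute cross products of distinct components divided by the energy of their sum. The definition used in the paper may differ; under the form assumed here, values near zero indicate nearly orthogonal components. All names and the two synthetic components are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def gram_schmidt(components):
    """Orthogonalize components in order (modified Gram-Schmidt).

    Assumes the components are linearly independent, so no
    orthogonalized component is the zero vector.
    """
    ortho = []
    for c in components:
        u = list(c)
        for q in ortho:
            coeff = dot(u, q) / dot(q, q)
            u = [ui - coeff * qi for ui, qi in zip(u, q)]
        ortho.append(u)
    return ortho


def index_of_orthogonality(components):
    """Assumed IO form: sum of |<c_j, c_k>| for j != k over signal energy."""
    total = [sum(values) for values in zip(*components)]
    cross = sum(
        abs(dot(a, b))
        for j, a in enumerate(components)
        for k, b in enumerate(components)
        if j != k
    )
    return cross / dot(total, total)


# Two synthetic, deliberately non-orthogonal components (illustration only).
t = [i / 200 for i in range(200)]
c1 = [math.sin(2 * math.pi * 5 * ti) + 0.4 * math.cos(2 * math.pi * ti) for ti in t]
c2 = [math.cos(2 * math.pi * ti) + 0.2 for ti in t]

io_before = index_of_orthogonality([c1, c2])
io_after = index_of_orthogonality(gram_schmidt([c1, c2]))
```

Note that the result of Gram-Schmidt depends on the order in which the components are processed, so the ordering of the IMFs is itself a design choice.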
Ensemble classification involves combining individually trained classifiers to yield more accurate predictions than individual models. Ensemble techniques are very useful for improving the generalization ability of classifiers. The random subspace ensemble technique is a simple but effective method for constructing ensemble classifiers; it involves randomly drawing a subset of the features for each classifier in the ensemble. The instance selection technique involves selecting critical instances while removing irrelevant and noisy instances from the original dataset. The instance selection and random subspace methods are both well known in the field of data mining and have proven to be very effective in many applications. However, few studies have focused on integrating the instance selection and random subspace methods. Therefore, this study proposed a new hybrid ensemble model that integrates instance selection and random subspace techniques using genetic algorithms (GAs) to improve the performance of a random subspace ensemble model. GAs are used to select optimal (or near-optimal) instances, which are used as input data for the random subspace ensemble model. The proposed model was applied to both Kaggle credit data and corporate credit data, and the results were compared with those of other models to investigate performance in terms of classification accuracy, levels of diversity, and average classification rates of base classifiers in the ensemble. The experimental results demonstrated that the proposed model outperformed other models, including the single model, the instance selection model, and the original random subspace ensemble model.
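The random subspace idea described above can be sketched in a few lines. This is a minimal illustration, not the proposed hybrid model: the GA-based instance selection is omitted, a nearest-centroid classifier stands in for the base learner (an assumption), and the toy data and all function names are hypothetical.

```python
import random
from collections import Counter


def fit_centroids(X, y, features):
    """Nearest-centroid base learner restricted to the given feature indices."""
    centroids = {}
    for label in set(y):
        rows = [x for x, target in zip(X, y) if target == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    return centroids


def predict_centroid(centroids, features, x):
    """Predict the label whose centroid is closest in the feature subspace."""
    point = [x[f] for f in features]

    def sq_dist(center):
        return sum((p - c) ** 2 for p, c in zip(point, center))

    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def fit_random_subspace(X, y, n_estimators=5, n_features=2, seed=0):
    """Train each base learner on its own random subset of the features."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_estimators):
        features = rng.sample(range(len(X[0])), n_features)
        ensemble.append((features, fit_centroids(X, y, features)))
    return ensemble


def predict_ensemble(ensemble, x):
    """Combine the base learners by majority vote."""
    votes = Counter(predict_centroid(c, f, x) for f, c in ensemble)
    return votes.most_common(1)[0][0]


# Toy, well-separated data (illustration only).
X = [[0.1, 0.2, 0.0, 0.3], [0.3, 0.1, 0.2, 0.0], [0.0, 0.3, 0.1, 0.2],
     [5.1, 4.9, 5.0, 5.2], [4.8, 5.2, 5.1, 4.9], [5.0, 5.0, 4.9, 5.1]]
y = [0, 0, 0, 1, 1, 1]

ensemble = fit_random_subspace(X, y)
low = predict_ensemble(ensemble, [0.2, 0.1, 0.0, 0.3])
high = predict_ensemble(ensemble, [4.8, 5.1, 5.0, 4.9])
```

In a hybrid scheme of the kind the abstract describes, an instance selection step would filter the rows of `X` and `y` before `fit_random_subspace` is called.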
In this study, we compared prediction performance according to the bias and dispersion of temperature using ensemble machine learning. Ensemble machine learning is a meta-algorithm that combines several base learners into one prediction model in order to improve prediction. Multiple linear regression, ridge regression, LASSO (Least Absolute Shrinkage and Selection Operator; Tibshirani, 1996), and nonnegative ridge and LASSO were used as base learners. Super learner (van der Laan et al., 1997) was used to produce one optimal predictive model. Simulated and real temperature data were used to compare the prediction skill of the machine learning methods. The results showed that prediction performance differed according to the characteristics of bias and dispersion, and that the prediction error improved more for temperature with bias than for temperature with dispersion. In addition, the ensemble machine learning method showed prediction performance similar to that of the base learners and better prediction skill than the ensemble mean.
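In this study, an uncertainty analysis of ensemble streamflow prediction techniques with respect to rainfall-runoff model parameters and input data was performed for the Chungju Dam basin. The ESP (Ensemble Streamflow Prediction) and BAYES-ESP (Bayesian-ESP) techniques were used as the ensemble streamflow prediction techniques, and ABCD was used as the rainfall-runoff model. The GLUE (Generalized Likelihood Uncertainty Estimation) technique was applied to analyze the uncertainty due to model parameters, and the uncertainty due to input data was analyzed according to the period of the meteorological scenarios used in the streamflow prediction ensemble. The results showed that ensemble streamflow prediction was affected more by model parameters than by input data, and that it is appropriate to use it when at least 20 years of observed meteorological data are available. In addition, BAYES-ESP was found to reduce uncertainty compared with ESP. This study is considered valuable in that it identified the characteristics of ensemble streamflow prediction techniques and analyzed the sources of error through uncertainty analysis.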
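As nonstationarity has recently been observed in hydrological data, studies on nonstationary frequency analysis are being actively conducted. Various forms of nonstationary probability distributions have been proposed to account for statistical characteristics that change over time, and various methods for estimating nonstationary parameters are being studied. This study presents a parameter estimation method for the nonstationary Gumbel distribution using ensemble empirical mode decomposition and compares it with the maximum likelihood method, which has mainly been used to estimate nonstationary parameters. For application to domestic data, annual maximum rainfall data showing trends for various durations at Korea Meteorological Administration stations were used. The results showed that both methods selected an appropriate model for data with a linear trend, whereas for data with a quadratic trend only the ensemble empirical mode decomposition method selected a nonstationary Gumbel model reflecting that trend.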
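To make the notion of a nonstationary Gumbel model concrete, the sketch below evaluates a design quantile when the location parameter varies linearly with time. It assumes the form mu(t) = mu0 + mu1 * t with a constant scale parameter, which is only one of the model forms such studies consider; the parameter estimation itself, whether by EEMD or by maximum likelihood, is not shown, and the parameter values are hypothetical.

```python
import math


def gumbel_cdf(x, mu, sigma):
    """Gumbel (maximum) cumulative distribution function."""
    return math.exp(-math.exp(-(x - mu) / sigma))


def nonstationary_gumbel_quantile(t, return_period, mu0, mu1, sigma):
    """Quantile for a given return period with location mu(t) = mu0 + mu1 * t."""
    mu_t = mu0 + mu1 * t                      # assumed linear trend in location
    p = 1.0 - 1.0 / return_period             # non-exceedance probability
    return mu_t - sigma * math.log(-math.log(p))


# Hypothetical parameters, for illustration only.
q_early = nonstationary_gumbel_quantile(t=10, return_period=100,
                                        mu0=100.0, mu1=0.5, sigma=20.0)
q_late = nonstationary_gumbel_quantile(t=40, return_period=100,
                                       mu0=100.0, mu1=0.5, sigma=20.0)
```

With a positive trend coefficient, the quantile for the same return period grows over time. A quadratic trend of the kind mentioned above would replace the linear expression for `mu_t` with a second-order polynomial in `t`.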