Search results - koreascholar

2023.10 Free for subscribing certified institutions and individual members

Development of ensemble background selection method for enhancing the performance of machine learning-based species distribution models

Sunhee Yoon, Wang-Hee Lee

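Proceedings of the Korean Society of Applied Entomology Conference, 2023 Korean Society of Applied Entomology Extraordinary General Meeting and Fall Conference, p.154, Korean Society of Applied Entomology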

Machine learning-based algorithms have been used to construct species distribution models (SDMs), but their performance depends on how background points are selected. This study attempted to develop a novel method for selecting backgrounds in machine-learning SDMs. Two machine learning-based SDMs (MaxEnt and Random Forest) were applied to an example species (Spodoptera litura), and different background selection methods (random sampling, biased sampling, and ensemble sampling using CLIMEX) were tested with multiple performance metrics (TSS, Kappa, F1-score). The model with ensemble sampling predicted the widest occurrence areas with the highest performance, suggesting that the developed method could be applied to enhance machine-learning SDMs.
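
The three metrics named in this abstract can all be derived from a 2x2 confusion matrix. The following is a minimal sketch, not the authors' code: it assumes presence/background predictions have already been thresholded into counts, and the function name `sdm_metrics` and the example counts are hypothetical.

```python
def sdm_metrics(tp, fp, fn, tn):
    """Return (TSS, Cohen's kappa, F1) from confusion-matrix counts.

    tp/fn: presence points predicted present/absent.
    tn/fp: background points predicted absent/present (assumed labelling).
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # True skill statistic: sensitivity + specificity - 1
    tss = sensitivity + specificity - 1.0
    # Cohen's kappa: agreement corrected for chance agreement
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    # F1: harmonic mean of precision and sensitivity (recall)
    precision = tp / (tp + fp)
    f1 = 2.0 * precision * sensitivity / (precision + sensitivity)
    return tss, kappa, f1


# Hypothetical counts, for illustration only
tss, kappa, f1 = sdm_metrics(tp=40, fp=10, fn=10, tn=40)
```

Because background points are not confirmed absences, all three scores depend on which backgrounds are sampled, which is the sensitivity the abstract describes.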

2023.02 KCI-listed. Free for subscribing certified institutions, paid for individual members

배깅 및 스태킹 기반 앙상블 기계학습법을 이용한 고성능 콘크리트 압축강도 예측모델 개발

Development of a High-Performance Concrete Compressive-Strength Prediction Model Using an Ensemble Machine-Learning Method Based on Bagging and Stacking

곽윤지, 고채연, 곽신영, 임승현

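Journal of the Computational Structural Engineering Institute of Korea, Vol. 36, No. 1, pp.9-18, Computational Structural Engineering Institute of Korea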

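The compressive strength of high-performance concrete (HPC) is difficult to predict because of the use of supplementary cementitious materials, so the development of improved prediction models is essential. The aim of this study is therefore to develop an HPC compressive-strength prediction model using an ensemble technique that combines bagging and stacking. The key contribution of this paper is a new ensemble technique that integrates the existing bagging and stacking techniques, with the goal of resolving the problems of single machine-learning models and improving predictive performance. Nonlinear regression, support vector machines, artificial neural networks, and Gaussian process regression were used as the single machine-learning methods, and bagging and stacking were used as the ensemble techniques. The model proposed in this study showed higher accuracy than the single machine-learning models and the bagging and stacking models. This was confirmed by comparing four representative performance metrics, which verified the validity of the proposed method.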

4,000 KRW

2022.08 KCI-listed. Free for subscribing certified institutions, paid for individual members

수질자료의 특성을 고려한 앙상블 머신러닝 모형 구축 및 설명가능한 인공지능을 이용한 모형결과 해석에 대한 연구

Development of ensemble machine learning model considering the characteristics of input variables and the interpretation of model performance using explainable artificial intelligence

박정수

Journal of Korean Society of Water and Wastewater, Vol. 36, No. 4, pp.239-248, Korean Society of Water and Wastewater

The prediction of algal bloom is an important field of study in algal bloom management, and chlorophyll-a concentration (Chl-a) is commonly used to represent the status of algal bloom. In recent years, advanced machine learning algorithms have increasingly been used for the prediction of algal bloom. In this study, XGBoost (XGB), an ensemble machine learning algorithm, was used to develop a model to predict Chl-a in a reservoir. Daily observations of water quality and climate data were used for training and testing the model. In the first step of the study, the input variables were clustered into two groups (low- and high-value groups) based on the observed values of water temperature (TEMP), total organic carbon concentration (TOC), total nitrogen concentration (TN), and total phosphorus concentration (TP). For each of the four water quality items, two XGB models were developed using only the data in each clustered group (Model 1). The results were compared with the prediction of an XGB model developed using the entire data before clustering (Model 2). The model performance was evaluated using three indices, including the root mean squared error-observation standard deviation ratio (RSR). Model 1 improved performance for TEMP, TN, and TP, with RSR values of 0.503, 0.477, and 0.493, respectively, while the RSR of Model 2 was 0.521. On the other hand, Model 2 performed better than Model 1 for TOC, where the RSR of Model 1 was 0.532. Explainable artificial intelligence (XAI) is an ongoing field of research in machine learning. Shapley value analysis, a novel XAI algorithm, was also used for the quantitative interpretation of the performance of the XGB model developed in this study.
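
The RSR index and the low/high grouping step described in this abstract can be sketched as follows. This is an illustrative sketch only: the abstract does not state how the groups were formed, so a fixed threshold on one variable is an assumption here, and the helper names `rsr` and `split_by_threshold` are hypothetical.

```python
import math


def rsr(observed, predicted):
    """RMSE divided by the standard deviation of the observations.

    The 1/n factors cancel, so the ratio of the two root sums is used.
    """
    mean_obs = sum(observed) / len(observed)
    sq_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sq_dev = sum((o - mean_obs) ** 2 for o in observed)
    return math.sqrt(sq_err) / math.sqrt(sq_dev)


def split_by_threshold(rows, key, threshold):
    """Split records into low and high groups on one variable.

    Assumption: a simple threshold stands in for the clustering step.
    """
    low = [r for r in rows if r[key] <= threshold]
    high = [r for r in rows if r[key] > threshold]
    return low, high


# Hypothetical records, for illustration only
rows = [{"TEMP": 5.0}, {"TEMP": 12.0}, {"TEMP": 24.0}]
low, high = split_by_threshold(rows, key="TEMP", threshold=15.0)
```

An RSR of 0 means a perfect fit, and an RSR of 1 means the model is no better than predicting the observed mean, so the values near 0.5 reported above sit between those two reference points.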

4,000 KRW

2021.12 KCI-listed. Free for subscribing certified institutions, paid for individual members

앙상블 머신러닝 모형을 이용한 하천 녹조발생 예측모형의 입력변수 특성에 따른 성능 영향

Effect of input variable characteristics on the performance of an ensemble machine learning model for algal bloom prediction

강병구, 박정수

Journal of Korean Society of Water and Wastewater, Vol. 35, No. 6, pp.417-424, Korean Society of Water and Wastewater

Algal bloom is an ongoing issue in the management of freshwater systems for drinking water supply, and the chlorophyll-a concentration is commonly used to represent the status of algal bloom. Thus, the prediction of chlorophyll-a concentration is essential for the proper management of water quality. However, the chlorophyll-a concentration is affected by various water quality and environmental factors, so predicting it is not an easy task. In recent years, advanced machine learning algorithms have increasingly been used to develop surrogate models that predict the chlorophyll-a concentration in freshwater systems such as rivers or reservoirs. This study used a light gradient boosting machine (LightGBM), a gradient boosting decision tree algorithm, to develop an ensemble machine learning model to predict chlorophyll-a concentration. The field water quality data observed at Daecheong Lake, obtained from the real-time water information system in Korea, were used to develop the model. The data include temperature, pH, electric conductivity, dissolved oxygen, total organic carbon, total nitrogen, total phosphorus, and chlorophyll-a. First, a LightGBM model was developed to predict the chlorophyll-a concentration using the other seven items as independent input variables. Second, the time-lagged values of all the input variables were added as input variables to understand the effect of the time lag of input variables on model performance. The time lag (i) ranges from 1 to 50 days. The model performance was evaluated using three indices: root mean squared error-observation standard deviation ratio (RSR), Nash-Sutcliffe coefficient of efficiency (NSE), and mean absolute error (MAE). The model showed the best performance when a dataset with a one-day time lag (i=1) was added, where RSR, NSE, and MAE were 0.359, 0.871, and 1.510, respectively. An improvement in model performance was observed when a dataset with a time lag of up to about 15 days (i=15) was added.
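
The time-lag step and two of the indices in this abstract can be sketched as follows. This is a sketch under stated assumptions, not the authors' pipeline: it assumes the records are ordered daily rows of input values, it adds only the values from exactly `lag` days earlier (the abstract does not say whether intermediate lags were also included), and the function names are hypothetical.

```python
def add_lagged_features(rows, lag):
    """Append each row's values from `lag` steps earlier as extra inputs.

    The first `lag` rows are dropped because they have no lagged partner.
    """
    lagged = []
    for t in range(lag, len(rows)):
        lagged.append(list(rows[t]) + list(rows[t - lag]))
    return lagged


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations."""
    mean_obs = sum(observed) / len(observed)
    sq_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sq_dev = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sq_err / sq_dev


def mae(observed, predicted):
    """Mean absolute error, in the units of the observed variable."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical two-variable daily rows, for illustration only
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
lagged = add_lagged_features(rows, lag=1)
```

With these definitions NSE equals 1 - RSR^2, and the reported values agree with that identity: 1 - 0.359^2 is approximately 0.871.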

4,000 KRW

2021.02 KCI-listed. Free for subscribing certified institutions, paid for individual members

딥러닝과 앙상블 머신러닝 모형의 하천 탁도 예측 특성 비교 연구

Comparative characteristic of ensemble machine learning and deep learning models for turbidity prediction in a river

박정수

Journal of Korean Society of Water and Wastewater, Vol. 35, No. 1, pp.83-91, Korean Society of Water and Wastewater

The increased turbidity in rivers during flood events has various effects on water environmental management, including drinking water supply systems. Thus, prediction of turbid water is essential for water environmental management. Recently, various advanced machine learning algorithms have been increasingly used in water environmental management. Ensemble machine learning algorithms such as random forest (RF) and gradient boosting decision tree (GBDT) are among the most popular machine learning algorithms used for water environmental management, along with deep learning algorithms such as recurrent neural networks. In this study, GBDT, an ensemble machine learning algorithm, and gated recurrent unit (GRU), a recurrent neural network algorithm, are used to develop models that predict turbidity in a river. The observation frequencies of the input data used for the models were 2, 4, 8, 24, 48, 120, and 168 h. The root-mean-square error-observations standard deviation ratio (RSR) of GRU and GBDT ranges between 0.182~0.766 and 0.400~0.683, respectively. Both models show similar prediction accuracy with RSR of 0.682 for GRU and 0.683 for GBDT. The GRU shows better prediction accuracy when the observation frequency is relatively short (i.e., 2, 4, and 8 h), whereas GBDT shows better prediction accuracy when the observation frequency is relatively long (i.e., 48, 120, and 168 h). The results suggest that the characteristics of input data should be considered to develop an appropriate model to predict turbidity.

4,000 KRW

2020.12 KCI-listed, SCOPUS. Free for subscribing certified institutions, paid for individual members

SUNSPOT AREA PREDICTION BASED ON COMPLEMENTARY ENSEMBLE EMPIRICAL MODE DECOMPOSITION AND EXTREME LEARNING MACHINE

Lingling Peng

Journal of the Korean Astronomical Society, Vol. 53, No. 6, pp.139-147, Korean Astronomical Society

The sunspot area is a critical physical quantity for assessing the solar activity level; forecasts of the sunspot area are of great importance for studies of the solar activity and space weather. We developed an innovative hybrid model prediction method by integrating the complementary ensemble empirical mode decomposition (CEEMD) and extreme learning machine (ELM). The time series is first decomposed into intrinsic mode functions (IMFs) with different frequencies by CEEMD; these IMFs can be divided into three groups, a high-frequency group, a low-frequency group, and a trend group. The ELM forecasting models are established to forecast the three groups separately. The final forecast results are obtained by summing up the forecast values of each group. The proposed hybrid model is applied to the smoothed monthly mean sunspot area archived at NASA's Marshall Space Flight Center (MSFC). We find a mean absolute percentage error (MAPE) and a root mean square error (RMSE) of 1.80% and 9.75, respectively, which indicates that: (1) for the CEEMD-ELM model, the predicted sunspot area is in good agreement with the observed one; (2) the proposed model outperforms previous approaches in terms of prediction accuracy and operational efficiency.
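
The final combination step and the two error measures in this abstract can be sketched as follows. This is a minimal illustration, not the CEEMD-ELM implementation: the decomposition and the ELM models are out of scope, the three group forecasts are assumed to be already available as equal-length lists, and the function names are hypothetical.

```python
import math


def combine_group_forecasts(high_freq, low_freq, trend):
    """Sum the per-group forecasts to obtain the final forecast."""
    return [h + l + t for h, l, t in zip(high_freq, low_freq, trend)]


def mape(observed, predicted):
    """Mean absolute percentage error; observed values must be nonzero."""
    total = sum(abs((o - p) / o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


def rmse(observed, predicted):
    """Root mean square error, in the units of the observed variable."""
    sq_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sq_err / len(observed))


# Hypothetical group forecasts, for illustration only
final = combine_group_forecasts([1.0, 2.0], [10.0, 20.0], [100.0, 200.0])
```

MAPE is scale-free while RMSE keeps the units of the series, which is why the abstract reports one as a percentage and the other as a plain number.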

4,000 KRW

2019.06 KCI-listed. Service ended (access restricted)

앙상블 기계 학습을 이용한 기온 예측

Forecast of Temperature using Ensemble Machine Learning Method

황유선, 김찬수

Journal of Climate Research, Vol. 14, No. 2, pp.129-143, Climate Research Institute, Konkuk University

In this study, we compared prediction performance according to the bias and dispersion of temperature using ensemble machine learning. Ensemble machine learning is a meta-algorithm that combines several base learners into one prediction model in order to improve prediction. Multiple linear regression, ridge regression, LASSO (Least Absolute Shrinkage and Selection Operator; Tibshirani, 1996), and nonnegative ridge and LASSO were used as base learners. Super learner (van der Laan et al., 1997) was used to produce one optimal predictive model. Simulated and real temperature data were used to compare the prediction skill of the machine learning methods. The results showed that prediction performance differed according to the characteristics of bias and dispersion, and that the prediction error improved more for temperature with bias than for temperature with dispersion. In addition, the ensemble machine learning method showed prediction performance similar to that of the base learners and better prediction skill than the ensemble mean.
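
The difference between a weighted ensemble and the plain ensemble mean can be illustrated with two base learners. This is a much simplified stand-in for the super learner, not the method used in the paper: it assumes the two prediction lists are out-of-fold predictions, it fits a single nonnegative weight in [0, 1] by least squares, and the function names are hypothetical.

```python
def convex_weight(y, pred_a, pred_b):
    """Weight w in [0, 1] minimizing the squared error of
    w * pred_a + (1 - w) * pred_b against y."""
    num = sum((yi - b) * (a - b) for yi, a, b in zip(y, pred_a, pred_b))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    if den == 0.0:
        # Identical predictions: any weight gives the same blend
        return 0.5
    # The objective is a one-dimensional convex quadratic, so clipping
    # the unconstrained minimizer gives the constrained minimizer
    return min(1.0, max(0.0, num / den))


def blend(pred_a, pred_b, w):
    """Combine two prediction lists with weight w on the first."""
    return [w * a + (1.0 - w) * b for a, b in zip(pred_a, pred_b)]


# Hypothetical targets and base-learner predictions
y = [1.0, 2.0, 3.0]
pred_a = [1.0, 2.0, 3.0]
pred_b = [2.0, 3.0, 4.0]
w = convex_weight(y, pred_a, pred_b)
weighted = blend(pred_a, pred_b, w)
ensemble_mean = blend(pred_a, pred_b, 0.5)
```

Setting the weight to 0.5 reproduces the ensemble mean, so a fitted weight can only match or reduce the squared error on the data used to fit it, which is one way to read the comparison with the ensemble mean in the abstract.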