Effect of input variable characteristics on the performance of an ensemble machine learning model for algal bloom prediction (KCI-indexed)

  • Language: KOR
  • URL: https://db.koreascholar.com/Article/Detail/411886
Journal of the Korean Society of Water and Wastewater
Korean Society of Water and Wastewater
Abstract

Algal blooms are an ongoing issue in the management of freshwater systems used for drinking water supply, and the chlorophyll-a concentration is commonly used to represent bloom status. Predicting the chlorophyll-a concentration is therefore essential for proper water quality management. However, the chlorophyll-a concentration is affected by various water quality and environmental factors, so predicting it is not an easy task. In recent years, advanced machine learning algorithms have increasingly been used to develop surrogate models that predict the chlorophyll-a concentration in freshwater systems such as rivers and reservoirs. This study used a light gradient boosting machine (LightGBM), a gradient boosting decision tree algorithm, to develop an ensemble machine learning model for predicting chlorophyll-a concentration. Field water quality data observed at Daecheong Lake, obtained from the real-time water information system in Korea, were used to develop the model. The data include temperature, pH, electrical conductivity, dissolved oxygen, total organic carbon, total nitrogen, total phosphorus, and chlorophyll-a. First, a LightGBM model was developed to predict the chlorophyll-a concentration using the other seven items as independent input variables. Second, time-lagged values of all the input variables were added as input variables to examine how the time lag of the inputs affects model performance. The time lag (i) ranged from 1 to 50 days. Model performance was evaluated using three indices: the ratio of root mean squared error to the standard deviation of observations (RSR), the Nash-Sutcliffe coefficient of efficiency (NSE), and the mean absolute error (MAE). The model performed best when the dataset with a one-day time lag (i = 1) was added, with RSR, NSE, and MAE of 0.359, 0.871, and 1.510, respectively. Adding time-lagged data improved model performance for time lags of up to about 15 days (i = 15).
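
The lagging step and the three evaluation indices can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`add_lagged_features`, `mae`, `nse`, `rsr`) are hypothetical, the example appends only a single lag i to each day's inputs (the abstract does not state exactly how the lagged datasets were assembled), and the metrics follow the standard textbook definitions rather than anything confirmed by the paper.

```python
import math


def add_lagged_features(rows, lag):
    """Extend each day's inputs with the inputs observed `lag` days earlier.

    rows: list of feature lists in chronological order, one per day.
    The first `lag` days are dropped because they have no lagged counterpart.
    """
    if lag < 1:
        raise ValueError("lag must be >= 1")
    return [rows[t] + rows[t - lag] for t in range(lag, len(rows))]


def _sums_of_squares(obs, pred):
    """Return (sum of squared errors, sum of squared deviations of obs)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return sse, sst


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - SSE / SST (1.0 is a perfect fit)."""
    sse, sst = _sums_of_squares(obs, pred)
    return 1.0 - sse / sst


def rsr(obs, pred):
    """RMSE divided by the standard deviation of the observations."""
    sse, sst = _sums_of_squares(obs, pred)
    return math.sqrt(sse) / math.sqrt(sst)
```

Under these definitions NSE = 1 − RSR², which is consistent with the reported values: 1 − 0.359² ≈ 0.871.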

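Table of contents
ABSTRACT
1. Introduction
2. Materials and methods
    2.1 Study area
    2.2 LightGBM model
    2.3 Input data
    2.4 Validation and comparison of LightGBM model performance
3. Results and discussion
    3.1 LightGBM prediction results
    3.2 Comparison of model performance by difference (time-lag) value
    3.3 Discussion of the effect of input variables on machine learning model performance
4. Conclusions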
References
Authors
  • Byeong-Koo Kang (강병구), Department of Civil and Environmental Engineering, Hanbat National University
  • Jungsu Park (박정수), Department of Civil and Environmental Engineering, Hanbat National University (corresponding author)