Journal of the Korean Society of Water and Wastewater, Vol. 36, No. 4 (pp. 239-248)

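A study on the development of an ensemble machine learning model considering the characteristics of water quality data and the interpretation of model results using explainable artificial intelligence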

Development of ensemble machine learning model considering the characteristics of input variables and the interpretation of model performance using explainable artificial intelligence
Keywords:
Ensemble machine learning, Explainable artificial intelligence, Machine learning, Water quality management, Water quality prediction

Table of Contents

ABSTRACT
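1. Introduction
2. Materials and Methods
   2.1 Study Area and Data
   2.2 GBDT Model Development
   2.3 Model Development Through Clustering of Input Data
   2.4 Explainable Artificial Intelligence
   2.5 Model Performance Evaluation
3. Results and Discussion
   3.1 Input Data Clustering Results
   3.2 Model Prediction Results
   3.3 Analysis of Model Results Using Explainable Artificial Intelligence
4. Conclusions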
References

Abstract

The prediction of algal blooms is an important field of study in algal bloom management, and chlorophyll-a concentration (Chl-a) is commonly used to represent the status of an algal bloom. In recent years, advanced machine learning algorithms have been increasingly used for the prediction of algal blooms. In this study, XGBoost (XGB), an ensemble machine learning algorithm, was used to develop a model to predict Chl-a in a reservoir. Daily observations of water quality and climate data were used for training and testing the model. In the first step of the study, the input data were clustered into two groups (low- and high-value groups) based on the observed values of water temperature (TEMP), total organic carbon concentration (TOC), total nitrogen concentration (TN), and total phosphorus concentration (TP). For each of the four water quality items, two XGB models were developed, each using only the data in one clustered group (Model 1). The results were compared with the predictions of an XGB model developed using the entire data set without clustering (Model 2). Model performance was evaluated using three indices, including the root mean squared error-observation standard deviation ratio (RSR). Model 1 improved performance for TEMP, TN, and TP, with RSR values of 0.503, 0.477, and 0.493, respectively, compared with 0.521 for Model 2. In contrast, Model 2 outperformed Model 1 for TOC, for which the RSR of Model 1 was 0.532. Explainable artificial intelligence (XAI) is an active field of research in machine learning. Shapley value analysis, a novel XAI algorithm, was also used for the quantitative interpretation of the XGB model developed in this study.
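
The RSR index named in the abstract is conventionally defined as the root mean squared error divided by the standard deviation of the observations, so lower values indicate a better fit. The following is a minimal sketch under that standard definition; the function name `rsr` and the sample values are hypothetical, and the paper's own implementation is not shown in this excerpt.

```python
import numpy as np


def rsr(observed, simulated):
    """RMSE-observation standard deviation ratio (lower is better).

    Assumes the conventional definition:
        RSR = sqrt(sum((obs - sim)^2)) / sqrt(sum((obs - mean(obs))^2))
    """
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    if obs.shape != sim.shape:
        raise ValueError("observed and simulated must have the same shape")
    # Sum of squared deviations of the observations from their mean.
    spread = np.sum((obs - obs.mean()) ** 2)
    if spread == 0.0:
        raise ValueError("observations are constant; RSR is undefined")
    # Sum of squared prediction errors.
    error = np.sum((obs - sim) ** 2)
    return float(np.sqrt(error / spread))


# Hypothetical Chl-a values, for illustration only.
obs_demo = [1.0, 2.0, 3.0, 4.0]
perfect_score = rsr(obs_demo, obs_demo)          # predictions equal observations
mean_only_score = rsr(obs_demo, [2.5] * 4)       # always predicting the observed mean
```

Under this definition, a model that reproduces the observations exactly scores 0, and a model that always predicts the observed mean scores 1, which gives a scale for reading the reported values of roughly 0.48 to 0.53.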