The management of algal blooms is essential for operating water supply systems properly and for ensuring the safety of drinking water. Chlorophyll-a (Chl-a) is a commonly used indicator of algal concentration. In recent years, advanced machine learning models have been increasingly used to predict Chl-a in freshwater systems. Machine learning models perform well in various fields, but model development requires considerable time and effort from experts. Automated machine learning (auto ML) is an emerging field of machine learning research that aims to develop machine learning models while minimizing the time and labor required in the development process. This study developed an auto ML model to predict Chl-a using auto-sklearn, one of the most widely used open-source auto ML frameworks. Its performance was compared with that of two other popular ensemble machine learning models, random forest (RF) and XGBoost (XGB). Model performance was evaluated using three indices: root mean squared error (RMSE), the RMSE-observation standard deviation ratio (RSR), and the Nash-Sutcliffe coefficient of efficiency (NSE). The RSRs of auto ML, RF, and XGB were 0.659, 0.684, and 0.638, respectively. The results show that auto ML outperformed RF and that XGB outperformed auto ML, although the differences in model performance were not significant. Shapley value analysis, an explainable machine learning technique, was used to provide a quantitative interpretation of the predictions of the auto ML model developed in this study. These results suggest that auto ML is potentially applicable to water quality prediction.
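All three evaluation indices can be computed directly from paired observations and predictions. The pure-Python sketch below assumes the common definition RSR = RMSE / (standard deviation of the observations), with both terms normalized by the same n. Under that definition, NSE = 1 - RSR². The function names and example data are illustrative, and the study may have used a slightly different normalization.

```python
import math


def _sums(obs, pred):
    """Return SSE (sum of squared errors) and SST (sum of squared deviations of obs)."""
    if len(obs) != len(pred) or not obs:
        raise ValueError("obs and pred must be non-empty and of equal length")
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return sse, sst


def rmse(obs, pred):
    """Root mean squared error."""
    sse, _ = _sums(obs, pred)
    return math.sqrt(sse / len(obs))


def rsr(obs, pred):
    """RMSE divided by the standard deviation of the observations.

    Assumes both use the same 1/n normalization, so n cancels:
    RSR = sqrt(SSE) / sqrt(SST).
    """
    sse, sst = _sums(obs, pred)
    return math.sqrt(sse / sst)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - SSE/SST (1.0 means a perfect fit)."""
    sse, sst = _sums(obs, pred)
    return 1.0 - sse / sst


# Hypothetical example data, for illustration only.
obs = [2.0, 4.0, 6.0, 8.0]
pred = [2.5, 3.5, 6.5, 7.5]
print(rmse(obs, pred), rsr(obs, pred), nse(obs, pred))
```

Under this definition, the LightGBM results reported in another abstract below are mutually consistent: an RSR of 0.359 implies an NSE of 1 - 0.359² ≈ 0.871, which matches the reported value.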
The prediction of algal blooms is an important field of study in algal bloom management, and the chlorophyll-a concentration (Chl-a) is commonly used to represent the status of algal blooms. In recent years, advanced machine learning algorithms have been increasingly used to predict algal blooms. In this study, XGBoost (XGB), an ensemble machine learning algorithm, was used to develop a model to predict Chl-a in a reservoir. Daily observations of water quality and climate data were used to train and test the model. In the first step, the data were clustered into two groups (low- and high-value groups) based on the observed values of water temperature (TEMP), total organic carbon (TOC), total nitrogen (TN), and total phosphorus (TP). For each of these four water quality items, two XGB models were developed, each using only the data in one clustered group (Model 1). The results were compared with the predictions of an XGB model developed using the entire dataset before clustering (Model 2). Model performance was evaluated using three indices, including the root mean squared error-observation standard deviation ratio (RSR). Model 1 improved performance for TEMP, TN, and TP, with RSRs of 0.503, 0.477, and 0.493, respectively, compared with an RSR of 0.521 for Model 2. In contrast, for TOC, Model 2 performed better than Model 1, whose RSR was 0.532. Explainable artificial intelligence (XAI) is an active field of machine learning research. Shapley value analysis, an XAI technique, was also used to quantitatively interpret the predictions of the XGB models developed in this study.
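Shapley values attribute a single prediction to the input features. Each feature's value is its marginal contribution averaged over all coalitions of the other features. The sketch below computes these values exactly by enumeration, under two stated assumptions:

- A feature absent from a coalition is replaced by a fixed baseline value, for example its training-set mean.
- A hypothetical toy function stands in for a trained XGB model.

For gradient-boosted trees, tree-specific algorithms such as TreeSHAP compute the same quantities far more efficiently.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for a single prediction model(x).

    Assumption: a feature outside a coalition is set to its baseline value.
    Cost grows as 2^n model calls per feature, so this is for small n only.
    """
    n = len(x)

    def value(coalition):
        # Model output with features outside `coalition` replaced by baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical stand-in for a trained model: two main effects and one interaction.
    return 2 * z[0] + 3 * z[1] + z[0] * z[2]


print(shapley_values(toy_model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))
```

The toy function contains the interaction term z[0]·z[2], whose contribution at this point is 3.0. That contribution is split equally between features 0 and 2, giving attributions of [3.5, 6.0, 1.5]. These sum to f(x) - f(baseline) = 11.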
Algal blooms are an ongoing issue in the management of freshwater systems for drinking water supply, and the chlorophyll-a concentration is commonly used to represent the status of algal blooms. Thus, predicting the chlorophyll-a concentration is essential for proper water quality management. However, the chlorophyll-a concentration is affected by various water quality and environmental factors, which makes its prediction difficult. In recent years, many advanced machine learning algorithms have been increasingly used to develop surrogate models that predict the chlorophyll-a concentration in freshwater systems such as rivers and reservoirs. This study used the light gradient boosting machine (LightGBM), a gradient boosting decision tree algorithm, to develop an ensemble machine learning model to predict the chlorophyll-a concentration. Field water quality data observed at Daecheong Lake, obtained from the real-time water information system in Korea, were used to develop the model. The data include temperature, pH, electrical conductivity, dissolved oxygen, total organic carbon, total nitrogen, total phosphorus, and chlorophyll-a. First, a LightGBM model was developed to predict the chlorophyll-a concentration using the other seven items as independent input variables. Second, time-lagged values of all the input variables were added as input variables to examine the effect of input time lags on model performance, with the time lag (i) ranging from 1 to 50 days. Model performance was evaluated using three indices: the root mean squared error-observation standard deviation ratio (RSR), the Nash-Sutcliffe coefficient of efficiency (NSE), and the mean absolute error (MAE). The model performed best when the dataset with a one-day time lag (i = 1) was added, with RSR, NSE, and MAE of 0.359, 0.871, and 1.510, respectively. Model performance also improved when datasets with time lags of up to about 15 days (i ≤ 15) were added.
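One reading of the time-lag experiment is that, for each lag i, every day's input vector is extended with the same inputs observed i days earlier. The sketch below builds such a dataset. The function name and the choice to drop the first i days (which have no lagged values) are assumptions, not details taken from the study.

```python
def add_lagged_inputs(inputs, target, lag):
    """Extend each day's inputs with the same inputs observed `lag` days earlier.

    inputs: daily input vectors, oldest first (e.g., the seven water quality items).
    target: daily chlorophyll-a values aligned with `inputs`.
    Returns (X, y) starting at day `lag`; the first `lag` days are dropped
    because they have no lagged values (an assumption, not from the study).
    """
    if len(inputs) != len(target):
        raise ValueError("inputs and target must have the same length")
    if not 1 <= lag < len(inputs):
        raise ValueError("lag must be at least 1 and shorter than the record")
    X = [list(inputs[t]) + list(inputs[t - lag]) for t in range(lag, len(inputs))]
    y = list(target[lag:])
    return X, y


# Hypothetical two-variable record over four days.
inputs = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
chl_a = [0.1, 0.2, 0.3, 0.4]
X, y = add_lagged_inputs(inputs, chl_a, lag=1)
print(X, y)
```

To mirror the 1- to 50-day sweep described above, loop `lag` over `range(1, 51)` and train one model per resulting dataset.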
Increased turbidity in rivers during flood events has various effects on water environment management, including drinking water supply systems. Thus, predicting turbid water is essential for water environment management. Recently, various advanced machine learning algorithms have been increasingly used in water environment management. Ensemble machine learning algorithms such as random forest (RF) and gradient boosting decision tree (GBDT) are among the most popular machine learning algorithms in this field, along with deep learning algorithms such as recurrent neural networks. In this study, GBDT, an ensemble machine learning algorithm, and the gated recurrent unit (GRU), a recurrent neural network algorithm, were used to develop models to predict turbidity in a river. The observation intervals of the input data used for the models were 2, 4, 8, 24, 48, 120, and 168 h. The root mean square error-observation standard deviation ratio (RSR) ranged from 0.182 to 0.766 for GRU and from 0.400 to 0.683 for GBDT. Both models showed similar prediction accuracy, with RSRs of 0.682 for GRU and 0.683 for GBDT. GRU showed better prediction accuracy when the observation interval was relatively short (i.e., 2, 4, and 8 h), whereas GBDT showed better prediction accuracy when the observation interval was relatively long (i.e., 48, 120, and 168 h). The results suggest that the characteristics of the input data should be considered when developing an appropriate model to predict turbidity.
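Inputs at coarser observation intervals can be derived from a finer record, assumed here to be hourly. The sketch shows two hypothetical options: point sampling and block averaging. The abstract does not say how the 2 to 168 h inputs were obtained, so both the hourly source and the aggregation rule are assumptions.

```python
def point_sample(hourly, interval_h):
    """Keep one value every `interval_h` hours (assumes a gap-free hourly series)."""
    if interval_h < 1:
        raise ValueError("interval_h must be a positive number of hours")
    return hourly[::interval_h]


def block_mean(hourly, interval_h):
    """Average each complete block of `interval_h` hourly values; a trailing partial block is dropped."""
    if interval_h < 1:
        raise ValueError("interval_h must be a positive number of hours")
    n_blocks = len(hourly) // interval_h
    return [
        sum(hourly[b * interval_h:(b + 1) * interval_h]) / interval_h
        for b in range(n_blocks)
    ]


# Hypothetical one-week hourly turbidity record, for illustration only.
hourly = [float(h % 24) for h in range(168)]
for interval in (2, 4, 8, 24, 48, 120, 168):
    print(interval, len(point_sample(hourly, interval)), len(block_mean(hourly, interval)))
```

The two options trade off differently. Point sampling keeps raw values but can miss short turbidity peaks that fall between samples. Block averaging uses every hourly value but smooths those peaks.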
PURPOSES: The purpose of this study is to compare the applicability, explanatory power, and flexibility of traffic accident models estimated using a statistical method and a machine learning method.
METHODS: To compare traffic accident models estimated using a statistical method with those estimated using a machine learning method, data were collected, and traffic accident models were estimated using statistical methods such as the negative binomial regression model and machine learning methods such as the generalized regression neural network (GRNN). The models were then compared in terms of goodness-of-fit measures such as R², root mean square error (RMSE), mean absolute percentage error (MAPE), and accuracy.
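A GRNN, as commonly formulated, predicts a Gaussian-kernel weighted average of the training targets, controlled by a single smoothing parameter sigma. The sketch below assumes Euclidean distances on already-scaled input variables such as AADT, speed limit, and number of lanes. The function name and the scaling step are assumptions; the abstract does not give the study's exact GRNN configuration.

```python
import math


def grnn_predict(train_x, train_y, query, sigma):
    """GRNN prediction as a Gaussian-kernel weighted average of training targets.

    y(q) = sum_i y_i * w_i / sum_i w_i, with w_i = exp(-d_i^2 / (2 * sigma^2))
    and d_i the Euclidean distance between q and training input i.
    Assumes inputs are already scaled to comparable ranges.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    weights = []
    for xi in train_x:
        d2 = sum((a - b) ** 2 for a, b in zip(xi, query))
        weights.append(math.exp(-d2 / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; try a larger sigma")
    return sum(w * y for w, y in zip(weights, train_y)) / total


# Hypothetical scaled site features and accident counts, for illustration only.
train_x = [[0.0], [1.0], [2.0]]
train_y = [1.0, 3.0, 5.0]
for sigma in (0.01, 0.5, 100.0):
    print(sigma, grnn_predict(train_x, train_y, [0.0], sigma))
```

The value of sigma controls how local the prediction is. A small sigma makes the prediction follow the nearest training site, while a large sigma pushes it toward the mean of all training targets. Sigma is therefore usually tuned on validation data.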
RESULTS: The annual average daily traffic (AADT), speed limit, number of lanes, land use, exclusive right-turn lanes, and front signals were significant in both traffic accident models. The GRNN model for total traffic accidents fit the data better (R² = 0.829, RMSE = 2.495, MAPE = 32.158, accuracy = 66.761) than the negative binomial regression model (R² = 0.363, RMSE = 9.033, MAPE = 68.987, accuracy = 8.807). The GRNN model for injury traffic accidents showed similar results.
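MAPE and R² can be computed as in the sketch below, with two caveats:

- MAPE is undefined when an observed count is zero, which can occur with accident counts at low-volume sites.
- Some studies report R² as the squared correlation between observed and predicted values rather than 1 - SSE/SST, so the definition used here is an assumption.

The abstract does not state how "accuracy" was defined, so it is not reproduced.

```python
def mape(obs, pred):
    """Mean absolute percentage error, in percent. Undefined if any observation is zero."""
    if any(o == 0 for o in obs):
        raise ValueError("MAPE is undefined when an observation is zero")
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)


def r_squared(obs, pred):
    """Coefficient of determination 1 - SSE/SST (one of several R^2 conventions)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


# Hypothetical accident counts and model predictions, for illustration only.
obs = [10.0, 20.0, 40.0]
pred = [11.0, 18.0, 40.0]
print(mape(obs, pred), r_squared(obs, pred))
```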
CONCLUSIONS: Traffic accident models estimated with GRNN fit the data better than those estimated with statistical methods such as the negative binomial regression model.
Water resources planning and management are becoming increasingly important for water use and flood control due to population growth, urbanization, and climate change. In particular, estimating and forecasting dam inflow is the most important hydrologic issue for flood control and reliable water supply. Therefore, this study forecast the monthly inflow of the Soyang River Dam using a vector autoregressive moving average (VARMA) model and three machine learning models. The forecasting models were constructed using monthly inflow data from 1974 to 2016, and inflows were then forecast at lead times of 12 and 24 months. The monthly inflows forecast by the models were mostly lower than the observed values, but the timing of peaks and the pattern of variation were well forecast. The VARMA model in particular performed very well. These results indicate that the VARMA model can be used efficiently to forecast hydrologic data and to establish water supply and management plans.
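For a VARMA model with known parameters, multi-step point forecasts follow a simple recursion. The sketch below assumes a VARMA(1,1) specification, y_t = c + A y_{t-1} + e_t + B e_{t-1}, with hypothetical coefficients. The abstract does not give the study's model order, variables, or estimated parameters. In practice the parameters would be estimated from data, for example with the VARMAX class in statsmodels.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def varma11_forecast(c, A, B, y_last, e_last, horizon):
    """Point forecasts 1..horizon steps ahead from a VARMA(1,1) with known parameters.

    Model (assumed): y_t = c + A y_{t-1} + e_t + B e_{t-1}.
    Step 1 uses the last observation y_last and last residual e_last.
    From step 2 on, future shocks have zero expectation, so the MA term
    drops out and the forecast follows the AR recursion.
    """
    forecasts = []
    prev = list(y_last)
    for h in range(1, horizon + 1):
        ar = matvec(A, prev)
        ma = matvec(B, e_last) if h == 1 else [0.0] * len(c)
        nxt = [ci + ai + mi for ci, ai, mi in zip(c, ar, ma)]
        forecasts.append(nxt)
        prev = nxt
    return forecasts


# Hypothetical two-variable parameters (e.g., scaled inflow and a second series).
c = [1.0, 0.0]
A = [[0.5, 0.0], [0.0, 0.5]]
B = [[0.2, 0.0], [0.0, 0.2]]
fc = varma11_forecast(c, A, B, y_last=[4.0, 2.0], e_last=[1.0, -1.0], horizon=24)
print("12-month lead:", fc[11], "24-month lead:", fc[23])
```

With `horizon=24`, entries 11 and 23 of the returned list correspond to the 12- and 24-month lead times. Beyond the first step the MA term drops out, and for a stable A the forecasts decay toward the unconditional mean of the process.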