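This study used wholesale prices of green onions at the Garak Market to compare the price forecasting power of two conventional time-series models, the ARIMA model and Holt-Winters smoothing, with that of random forest, a representative machine learning method. The results of the comparison are as follows. The model with the highest predictive power was the ARIMA model with the cycle set to three years (36 months). In addition, the ARIMA model and Holt-Winters smoothing were more accurate with monthly data than with daily data, showing that overfitting to the training data actually lowered predictive power. In contrast, random forest predicted better with daily data than with monthly data, which reflects the characteristic of machine learning that more training data yields higher predictive power. However, the results also showed that price forecasting with machine learning requires identifying the explanatory variables that affect prices and accumulating high-quality training data. Applying a wider range of explanatory variables together with machine learning and deep learning techniques in future studies is expected to help improve price forecasts for agricultural and livestock products.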
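As a rough illustration of the Holt-Winters smoothing compared above, the following sketch implements the additive variant in plain Python. The additive form, the initialization from the first two seasons, and the smoothing parameters `alpha`, `beta`, and `gamma` are assumptions made for the example; the abstract does not state which variant or settings the study used.

```python
def holt_winters_additive(series, season_len, alpha, beta, gamma, horizon):
    """Additive Holt-Winters smoothing; returns `horizon` forecasts.

    All parameter choices here are illustrative assumptions,
    not the settings used in the study.
    """
    if len(series) < 2 * season_len:
        raise ValueError("need at least two full seasons of data")
    m = season_len
    first, second = series[:m], series[m:2 * m]

    # Simple initial values taken from the first two seasons.
    level = sum(first) / m
    trend = (sum(second) - sum(first)) / (m * m)
    seasonals = [y - level for y in first]

    for t, y in enumerate(series):
        s = seasonals[t % m]
        prev_level = level
        # Update level, trend, and the seasonal term for this position.
        level = alpha * (y - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonals[t % m] = gamma * (y - level) + (1 - gamma) * s

    n = len(series)
    return [level + h * trend + seasonals[(n + h - 1) % m]
            for h in range(1, horizon + 1)]


# Synthetic, perfectly periodic series (not real price data).
forecasts = holt_winters_additive([10, 20, 30, 40] * 3, season_len=4,
                                  alpha=0.5, beta=0.1, gamma=0.3, horizon=4)
```

With monthly prices, `season_len` would typically be 12; the 36-month cycle mentioned in the abstract refers to the ARIMA model, not to this sketch.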
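This study used data collected by the Seoul Education Longitudinal Study (SELS) of the Seoul Education Research and Information Institute, Seoul Metropolitan Office of Education, and selected 2,793 students from the ninth wave (2018), when the respondents were in their third year of high school. To identify predictors of adolescents' school satisfaction, a decision tree analysis was conducted using SPSS 26.0. The results are as follows. First, the variables that were significant in classifying school satisfaction were gender, self-concept, and self-evaluation among individual factors; guardians and school teachers among social relationship factors; and evaluation of the school and school climate among school characteristic/culture factors. Second, among the variables affecting the classification of school satisfaction, evaluation of the school was the most influential. Third, in the group with high school-teacher scores, school climate and self-concept were the important variables for classification, whereas in the group with low school-teacher scores, self-evaluation, school climate, and evaluation of the school were the influential variables. Fourth, school satisfaction rose when the evaluation of the school and the school climate were favorable. These findings are expected to help in seeking ways to improve adolescents' school satisfaction, establishing education policy, and operating programs.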
PURPOSES : The primary purpose of this study is to develop a framework for predicting pedestrian demand and distribution when the Gyeongin Expressway is moved underground and an open space zone is built above it.
METHODS : The current status was analyzed through a survey on the number of people, students, surrounding traffic volume, and future socioeconomic indicators; the rate of change in the floating population and the rate of increase or decrease in pedestrian traffic volume were then calculated to evaluate the effect. In addition, microscopic analysis results were derived by setting pedestrian analysis zones (PAZs). A walking environment index (WEI) was developed to quantitatively evaluate the degree of walking activation from walking-related surrounding environmental factors. Based on this, a walking demand prediction model was developed. Finally, the results were validated by calculating the walking volume through a micro-simulation in and around the open space zone.
RESULTS : The number of crosswalks and schools, transit development indicators, and pedestrian volume increased as the WEI value increased. However, the log form of the distance was observed to be a factor that reduced walking.
CONCLUSIONS : This study attempted to reliably predict the demand for walking on the Gyeongin Expressway by calculating the amount of induced walking and the amount of passing walking. The pedestrian demand can be boosted by improving walking environments.
The management of algal blooms is essential for the proper management of water supply systems and for maintaining the safety of drinking water. Chlorophyll-a (Chl-a) is a commonly used indicator of algal concentration. In recent years, advanced machine learning models have been increasingly used to predict Chl-a in freshwater systems. Machine learning models perform well in various fields, but developing them requires considerable labor and time from experts. Automated machine learning (auto ML) is an emerging field of machine learning research that aims to develop machine learning models while minimizing the time and labor required in the model development process. This study developed an auto ML model to predict Chl-a using auto sklearn, one of the most widely used open source auto ML algorithms. The model performance was compared with that of two other popular ensemble machine learning models, random forest (RF) and XGBoost (XGB). The model performance was evaluated using three indices: root mean squared error, root mean squared error-observation standard deviation ratio (RSR), and Nash-Sutcliffe coefficient of efficiency. The RSR values of auto ML, RF, and XGB were 0.659, 0.684, and 0.638, respectively. The results show that auto ML outperformed RF and that XGB performed better than auto ML, although the differences among the models were not significant. Shapley value analysis, an explainable machine learning algorithm, was used to provide a quantitative interpretation of the predictions of the auto ML model developed in this study. The results of this study indicate the possible applicability of auto ML to the prediction of water quality.
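The three evaluation indices named above can be sketched as follows. This is a minimal illustration using the common textbook definitions, in which RSR is the RMSE divided by the standard deviation of the observations and the Nash-Sutcliffe efficiency (NSE) equals 1 minus RSR squared; the function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def _sums(obs, pred):
    """Sum of squared errors and sum of squared deviations of the observations."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return sse, sst


def rsr(obs, pred):
    """RMSE divided by the standard deviation of the observations."""
    sse, sst = _sums(obs, pred)
    return math.sqrt(sse) / math.sqrt(sst)


def nse(obs, pred):
    """Nash-Sutcliffe coefficient of efficiency."""
    sse, sst = _sums(obs, pred)
    return 1.0 - sse / sst


# Hypothetical observed and predicted Chl-a values (illustration only).
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
scores = (rmse(observed, predicted), rsr(observed, predicted), nse(observed, predicted))
```

Lower RSR and higher NSE both indicate a better fit, so the two indices rank models identically under these definitions.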
PURPOSES : In this study, surface distress (SD), rutting depth (RD), and international roughness index (IRI) prediction models are developed based on the zones of Incheon and road classes using regression analysis. Regression analysis is conducted based on a correlation analysis between the pavement performance and influencing factors.
METHODS : First, Incheon was categorized into zones such as industrial, port, and residential areas, and the roads were categorized into major and sub-major roads. To match the road sections in Incheon with environmental factors, a weather station triangle network for Incheon was developed using Delaunay triangulation based on the positions of the weather stations. The influencing factors of the road sections were matched based on the developed triangular network. Based on the matched influencing factors, a model of the current performance of the road pavement in Incheon was developed by performing multiple regression analysis. Sensitivity analysis was conducted using the developed model to determine the influencing factor that affected each performance factor the most.
RESULTS : For the SD model, frost days, daily temperature range, rainy days, tropical nights, and minimum temperatures are used as independent variables. Meanwhile, the truck ratio, freeze–thaw days, precipitation days, annual temperature range, and average temperatures are used for the RD model. For the IRI model, the maximum temperature, freeze–thaw days, average temperature, annual precipitation, and wet days are used. Results from the sensitivity analysis show that frost days for the SD model, precipitation days and freeze–thaw days for the RD model, and wet days for the IRI model impose the most significant effects.
CONCLUSIONS : We developed a road pavement performance prediction model using multiple regression analysis based on zones in Incheon and road classes. The developed model allows the influencing factors and circumstances to be predicted, thus facilitating road management.
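The abstract above does not say how road sections were matched to weather data once the station triangle network was built. One plausible approach, shown here purely as an assumption, is to interpolate the values of the three stations at the corners of the enclosing triangle using barycentric weights. The function names, coordinates, and station values are hypothetical.

```python
def barycentric_weights(p, a, b, c):
    """Barycentric weights of point p with respect to triangle (a, b, c)."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = p, a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return w1, w2, 1.0 - w1 - w2


def interpolate_in_triangle(p, stations, values):
    """Interpolate station values at p; return None if p lies outside the triangle."""
    weights = barycentric_weights(p, *stations)
    if min(weights) < 0:
        return None
    return sum(w * v for w, v in zip(weights, values))


# Hypothetical station coordinates and annual frost-day counts.
stations = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
frost_days = [10.0, 20.0, 30.0]
road_value = interpolate_in_triangle((1.0, 1.0), stations, frost_days)
```

Building the triangulation itself is left out; a library routine such as `scipy.spatial.Delaunay` could supply the triangles in practice.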
PURPOSES : To efficiently manage pavements, a systematic pavement management system must be established based on regional characteristics. If the future conditions of a pavement section can be predicted based on data obtained at present, a more reasonable road maintenance strategy can be established. Hence, a prediction model of the annual surface distress (SD) change for national highway pavements in Gangwon-do, Korea is developed based on influencing factors.
METHODS : To develop the model, pavement performance data and influencing factors were obtained. Exploratory data analysis was performed on the acquired data, and the data were preprocessed based on the results. The variables used for model development were selected via correlation analysis; they included surface distress, international roughness index, daily temperature range, and heat wave days. Best subset regression was performed, in which candidate models were selected from all possible subsets based on certain criteria. The final model was selected using an algorithm developed for rational model selection. The sensitivity of the annual SD change was analyzed based on the variables of the final model.
RESULTS : The result of the sensitivity analysis shows that the annual SD change is affected by the variables in the following order: surface distress > heat wave days > daily temperature range > international roughness index.
CONCLUSIONS : An annual SD change prediction model is developed by considering the present performance, traffic volume, and climatic conditions. The model can facilitate the establishment of a reasonable road maintenance strategy. The prediction accuracy can be improved by obtaining additional data, such as the construction quality, material properties, and pavement thickness.
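Best subset regression, as mentioned in the METHODS above, fits a model for every combination of candidate variables and keeps the one that scores best on a chosen criterion. The sketch below is a minimal pure-Python illustration that uses adjusted R-squared as the criterion; that choice, the helper names, and the synthetic data are assumptions, since the abstract only refers to "certain criteria".

```python
from itertools import combinations


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X includes the intercept column)."""
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def adjusted_r2(X, y, coef):
    n, p = len(y), len(coef) - 1
    mean_y = sum(y) / n
    sse = sum((yi - sum(c * x for c, x in zip(coef, r))) ** 2 for r, yi in zip(X, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - (sse / (n - p - 1)) / (sst / (n - 1))


def best_subset(data, y, names):
    """Return (score, selected names, coefficients) of the best-scoring subset."""
    best = None
    for k in range(1, len(names) + 1):
        for subset in combinations(range(len(names)), k):
            X = [[1.0] + [row[j] for j in subset] for row in data]
            coef = fit_ols(X, y)
            score = adjusted_r2(X, y, coef)
            if best is None or score > best[0]:
                best = (score, [names[j] for j in subset], coef)
    return best


# Synthetic data: y depends on x0 and x1, while x2 is unrelated.
x0 = [1, 2, 3, 4, 5, 6, 7, 8]
x1 = [1, 0, 1, 0, 1, 0, 1, 0]
x2 = [3, 1, 4, 1, 5, 9, 2, 6]
noise = [0.1, -0.2, 0.05, 0.1, -0.1, 0.2, -0.05, -0.1]
data = [[float(a), float(b), float(c)] for a, b, c in zip(x0, x1, x2)]
y = [2 + 3 * a + 5 * b + e for a, b, e in zip(x0, x1, noise)]
score, selected, coef = best_subset(data, y, ["x0", "x1", "x2"])
```

The number of subsets doubles with each added candidate variable, so exhaustive search of this kind is practical only for a modest number of candidates.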
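Suspended sediment generated by marine construction increases seawater turbidity and reduces light availability, adversely affecting marine organisms, and is therefore an important element of marine environmental impact assessment. However, because official data on the parameters used in the assessment are lacking, the impact is assessed differently depending on the ability of the assessor. This study therefore diagnosed the state of suspended sediment dispersion assessments for a total of 58 projects, including reclamation, dredging, and the installation of outer facilities, that were reviewed by the Sea Area Utilization Impact Assessment Center (해역이용영향평가센터) over three years (2012–2014), and proposed improvements. To this end, four evaluation indicators were applied: the adequacy of the grid system, of the unit load, of the representative grain size, and of the settling velocity. The mean reliability scores were 25 for the grid system, 60 for the unit load, 34 for the representative grain size, and 17 for the settling velocity, indicating that improvements are needed for these evaluation items. Based on the diagnosis and the reliability evaluation, this study proposed improvements for the prediction of suspended sediment dispersion. First, authoritative values for the suspended sediment unit load and for the settling velocity by representative grain size should be provided through guidelines. Second, to improve reliability, practitioners should rigorously check the adequacy of the grid system and verify the results.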
Predicting remaining useful life (RUL) is important for implementing prognostics and health management of industrial systems. Previous studies have contributed to creating RUL prediction models and validating their performance; however, they fall short of deriving reasonable preventive maintenance strategies from such predictive models. This paper proposes a data-driven preventive maintenance method that predicts the RUL of industrial systems and determines the optimal replacement time interval that minimizes preventive maintenance cost. The proposed method comprises: (1) generating RUL prediction models by learning from historical process data with machine learning techniques, including random forest and extreme gradient boosting, and (2) applying the system failure time derived from the RUL prediction models to the Weibull distribution-based minimum-repair block replacement model to find the cost-optimal block replacement time. The paper includes a case study that demonstrates the feasibility of the proposed method using an open dataset in which sensor data are generated and recorded from turbofan engine systems.
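Step (2) above can be illustrated with the classical block replacement model under minimal repair. Assuming Weibull failure times with shape `beta` and scale `eta`, the expected number of failures in an interval of length T is (T / eta) ** beta, so the long-run cost per unit time is (c_replace + c_repair * (T / eta) ** beta) / T. Setting its derivative to zero gives a closed-form optimum when `beta` exceeds 1. The parameter names and values below are hypothetical; the abstract does not report the paper's estimates or cost figures.

```python
def cost_rate(T, beta, eta, c_replace, c_repair):
    """Expected cost per unit time for block replacement at interval T."""
    expected_failures = (T / eta) ** beta  # minimal repairs within one block
    return (c_replace + c_repair * expected_failures) / T


def optimal_interval(beta, eta, c_replace, c_repair):
    """Interval that minimizes cost_rate; requires an increasing failure rate."""
    if beta <= 1:
        raise ValueError("no finite optimum unless beta > 1")
    return eta * (c_replace / (c_repair * (beta - 1))) ** (1.0 / beta)


# Hypothetical Weibull parameters and costs (illustration only).
T_star = optimal_interval(beta=2.0, eta=100.0, c_replace=1000.0, c_repair=250.0)
min_cost = cost_rate(T_star, beta=2.0, eta=100.0, c_replace=1000.0, c_repair=250.0)
```

In a data-driven setting, `beta` and `eta` would be estimated from the failure times implied by the RUL predictions before this step is applied.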
PURPOSES : The surface distress of asphalt pavements is one of the major factors affecting the safety of road users. The aim of this study was to analyze the factors influencing the occurrence of surface distress and statistically predict its annual change to contribute to more reasonable asphalt pavement management using the data periodically collected by the national highway pavement data management system.
METHODS : In this study, the factors expected to influence the surface distress were determined by reviewing the literature. Normality was secured by transforming the variables so that their distributions became closer to the normal distribution. In addition, min-max normalization was performed to minimize the effect of the unit and magnitude of the candidate independent variables on the dependent variable. The final candidate independent variables were determined by analyzing the correlation between the annual surface distress change and each candidate independent variable. A prediction model was then developed by performing data grouping and multiple regression analysis.
RESULTS : An annual surface distress change prediction model was developed using the present surface distress, age, and the number of days below 0 ℃ as the independent variables. The sensitivity analysis showed that the present surface distress affected the annual surface distress change the most. The positive correlation between the dependent variable and each independent variable demonstrated that the prediction model is meaningful in both engineering and statistical terms.
CONCLUSIONS : The surface distress in the future can be predicted by applying the annual surface distress prediction model to the national highway asphalt pavement sections with survey data. In addition, the prediction model can be applied to the national highway pavement condition index (NHPCI) evaluating the national highway asphalt pavement conditions to be used in the prediction of future NHPCI.
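The min-max normalization mentioned in the METHODS above rescales each variable to the range from 0 to 1 so that units and magnitudes do not dominate the regression. A minimal sketch follows; the function name and sample values are hypothetical, and returning zeros for a constant column is a convention chosen for this example.

```python
def min_max_normalize(values):
    """Rescale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical pavement ages in years (illustration only).
ages = [2, 4, 6, 10]
scaled_ages = min_max_normalize(ages)
```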
Algal bloom is an ongoing issue in the management of freshwater systems for drinking water supply, and the chlorophyll-a concentration is commonly used to represent the status of algal bloom. Thus, the prediction of chlorophyll-a concentration is essential for the proper management of water quality. However, the chlorophyll-a concentration is affected by various water quality and environmental factors, so predicting it is not an easy task. In recent years, advanced machine learning algorithms have increasingly been used to develop surrogate models that predict the chlorophyll-a concentration in freshwater systems such as rivers and reservoirs. This study used a light gradient boosting machine (LightGBM), a gradient boosting decision tree algorithm, to develop an ensemble machine learning model to predict chlorophyll-a concentration. The field water quality data observed at Daecheong Lake, obtained from the real-time water information system in Korea, were used for the development of the model. The data include temperature, pH, electric conductivity, dissolved oxygen, total organic carbon, total nitrogen, total phosphorus, and chlorophyll-a. First, a LightGBM model was developed to predict the chlorophyll-a concentration by using the other seven items as independent input variables. Second, the time-lagged values of all the input variables were added as input variables to understand the effect of the time lag of the input variables on model performance. The time lag (i) ranges from 1 to 50 days. The model performance was evaluated using three indices: root mean squared error-observation standard deviation ratio (RSR), Nash-Sutcliffe coefficient of efficiency (NSE), and mean absolute error (MAE). The model showed the best performance when a dataset with a one-day time lag (i=1) was added, where RSR, NSE, and MAE were 0.359, 0.871, and 1.510, respectively. An improvement in model performance was observed when a dataset with a time lag of up to about 15 days (i=15) was added.
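The time-lagged inputs described above can be built by pairing each day's record with the input values observed i days earlier. The sketch below assumes records are stored as dictionaries in chronological order with one record per day; the function name, the `_lag` suffix, and the sample values are hypothetical.

```python
def add_lagged_features(rows, feature_names, lag):
    """Append the values observed `lag` rows earlier as extra input columns.

    rows must be ordered in time with one record per day (an assumption).
    The first `lag` rows are dropped because they have no lagged counterpart.
    """
    if lag < 1:
        raise ValueError("lag must be at least 1")
    out = []
    for t in range(lag, len(rows)):
        record = dict(rows[t])
        for name in feature_names:
            record[f"{name}_lag{lag}"] = rows[t - lag][name]
        out.append(record)
    return out


# Hypothetical daily observations (illustration only).
daily = [
    {"temp": 20.0, "ph": 7.1, "chla": 5.0},
    {"temp": 21.0, "ph": 7.3, "chla": 6.0},
    {"temp": 22.0, "ph": 7.2, "chla": 8.0},
]
lagged = add_lagged_features(daily, ["temp", "ph"], lag=1)
```

Only the input variables are lagged here; the target column `chla` is carried through unchanged.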
The purpose of this study is to compare the short-term price predictive power of ARMA, ARMAX, and VAR forecasting models based on the modified Diebold-Mariano (MDM) test, using monthly consumer price data for frozen mackerel. This study also aims to help policymakers and economic actors make reasonable market choices regarding the monthly consumer price of frozen mackerel. For the analysis, the frozen wholesale prices and fresh consumer prices were used as variables, and the price time series covered December 2013 to July 2021. The unit root test confirmed that the time series variables employed in the models were stationary, so the level variables were used for the analysis. The information criteria and Granger causality tests showed that the wholesale prices and fresh consumer prices from the previous month affected the frozen consumer prices. The models with the highest predictive power were then selected by the RMSE, RMSPE, MAE, MAPE, and Theil's inequality coefficient criteria, and their predictive power was compared by the MDM test to examine which model is superior. As a result of the analysis, ARMAX(1,1) with the frozen wholesale price, ARMAX(1,1) with the fresh consumer price, and the VAR model were selected. Based on the five criteria and the MDM tests, the VAR model was selected as the superior model for predicting the monthly consumer price of frozen mackerel.
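The five accuracy criteria named above can be computed as in the following sketch. It uses common definitions: the percentage measures are expressed in percent, and Theil's inequality coefficient is the RMSE divided by the sum of the root mean squares of the forecasts and the actual values, which keeps it between 0 and 1. The study's exact definitions are not given in the abstract, so these formulas, the function name, and the sample prices are assumptions.

```python
import math


def forecast_accuracy(actual, forecast):
    """Return RMSE, RMSPE, MAE, MAPE, and Theil's inequality coefficient."""
    n = len(actual)
    errors = [a - f for a, f in zip(actual, forecast)]
    rmse = math.sqrt(sum(e ** 2 for e in errors) / n)
    rmspe = 100.0 * math.sqrt(sum((e / a) ** 2 for e, a in zip(errors, actual)) / n)
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    theil = rmse / (math.sqrt(sum(f ** 2 for f in forecast) / n)
                    + math.sqrt(sum(a ** 2 for a in actual) / n))
    return {"RMSE": rmse, "RMSPE": rmspe, "MAE": mae, "MAPE": mape, "Theil": theil}


# Hypothetical actual and forecast prices (illustration only).
metrics = forecast_accuracy([100.0, 200.0], [110.0, 180.0])
```

Lower values are better for all five criteria; the percentage measures require that no actual value is zero.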
This article proposes a machine learning classifier for predicting the production quality of free-machining 303-series stainless steel (STS303) small rolling wire rods according to the operating conditions of the manufacturing process. To develop the classifier, manufacturing data for 37 operating variables were collected from the manufacturing execution system (MES) of Company S, and 12 types of derived variables were generated based on a literature review and interviews with field experts. The research comprised data preprocessing, exploratory data analysis, feature selection, machine learning modeling, and the evaluation of alternative models. In the preprocessing stage, missing values and outliers were removed, and oversampling with SMOTE (synthetic minority oversampling technique) was applied to resolve data imbalance. Features were selected by the variable importance of LASSO (least absolute shrinkage and selection operator) regression, extreme gradient boosting (XGBoost), and random forest models. Finally, logistic regression, support vector machine (SVM), random forest, and XGBoost classifiers were developed to predict whether products made under new operating conditions are adequate or defective. The optimal hyperparameters for each model were searched by the grid search and random search methods based on k-fold cross-validation. In the experiment, XGBoost showed relatively high predictive performance compared to the other models, with an accuracy of 0.9929, specificity of 0.9372, F1-score of 0.9963, and logarithmic loss of 0.0209. The classifier developed in this study is expected to improve productivity by enabling effective management of the manufacturing process for STS303 small rolling wire rods.
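The four evaluation measures reported above can be computed from true labels and predicted probabilities as in the sketch below. It assumes binary labels in which 1 marks the positive class and a decision threshold of 0.5; which product class the study treated as positive is not stated in the abstract, and the function name and sample values are hypothetical.

```python
import math


def classification_metrics(y_true, y_prob, threshold=0.5, eps=1e-15):
    """Accuracy, specificity, F1-score, and logarithmic loss for binary labels."""
    tp = fp = tn = fn = 0
    loss = 0.0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1 and pred == 0:
            fn += 1
        elif y == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
        # Clip probabilities so that log(0) cannot occur.
        p = min(max(p, eps), 1.0 - eps)
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    n = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / n,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "f1": f1,
        "log_loss": loss / n,
    }


# Hypothetical labels and predicted probabilities (illustration only).
result = classification_metrics([1, 1, 1, 0, 0], [0.9, 0.8, 0.4, 0.2, 0.6])
```

Accuracy, specificity, and F1-score depend on the chosen threshold, whereas logarithmic loss evaluates the predicted probabilities directly.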