We aimed to evaluate the effectiveness of ensemble optimal interpolation (EnOI) in improving the analysis of significant wave height (SWH) within wave models using satellite-derived SWH data. Satellite observations revealed higher SWH in mid-latitude regions (30° to 60° in both hemispheres) due to stronger winds, whereas equatorial and coastal areas exhibited lower wave heights, attributed to calmer winds and land interactions. Root mean square error (RMSE) analysis of the control experiment without data assimilation revealed significant discrepancies in high-latitude areas, underscoring the need for enhanced analysis techniques. Data assimilation experiments demonstrated substantial RMSE reductions, particularly in high-latitude regions, underscoring the effectiveness of the technique in enhancing the quality of analysis fields. Sensitivity experiments with varying ensemble sizes showed modest global improvements in analysis fields with larger ensembles. Sensitivity experiments based on different decorrelation length scales demonstrated significant RMSE improvements at larger scales, particularly in the Southern Ocean and Northwest Pacific. However, some areas exhibited slight RMSE increases, suggesting the need for region-specific tuning of assimilation parameters. Reducing the observation error covariance improved analysis quality in certain regions, including the equator, but generally degraded it in others. Rescaling background error covariance (BEC) resulted in overall improvements in analysis fields, though sensitivity to regional variability persisted. These findings underscore the importance of data assimilation, parameter tuning, and BEC rescaling in enhancing the quality and reliability of wave analysis fields, emphasizing the necessity of region-specific adjustments to optimize assimilation performance. These insights are valuable for understanding ocean dynamics, improving navigation, and supporting coastal management practices.
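The EnOI analysis step discussed in this abstract can be illustrated with a minimal sketch. It assumes a single SWH observation located exactly at one grid point, a static ensemble of model states, and a scalar factor `alpha` that rescales the background error covariance. The function name, parameter names, and numbers are hypothetical and are not taken from the study, and no localization by decorrelation length scale is applied.

```python
def enoi_update(background, ensemble, obs_value, obs_index, obs_var, alpha=1.0):
    """Single-observation EnOI update: x_a = x_b + K (y - H x_b).

    background: model SWH at n grid points (list of floats)
    ensemble:   static ensemble members, each a list of n floats
    obs_index:  grid point where the observation sits (H selects it)
    obs_var:    observation error variance R
    alpha:      hypothetical rescaling factor for the ensemble covariance
    """
    m = len(ensemble)
    n = len(background)
    # Ensemble mean and anomalies about the mean.
    mean = [sum(member[i] for member in ensemble) / m for i in range(n)]
    anomalies = [[member[i] - mean[i] for i in range(n)] for member in ensemble]
    # Column of alpha * B at the observed point, i.e. alpha * B H^T.
    bht = [
        alpha * sum(a[i] * a[obs_index] for a in anomalies) / (m - 1)
        for i in range(n)
    ]
    # With one observation, H B H^T + R is a scalar, so the gain is a division.
    gain = [b / (bht[obs_index] + obs_var) for b in bht]
    innovation = obs_value - background[obs_index]
    return [background[i] + gain[i] * innovation for i in range(n)]


# Toy example: points 0 and 1 co-vary in the ensemble, point 2 does not.
analysis = enoi_update(
    background=[2.0, 2.0, 2.0],
    ensemble=[[1.0, 1.0, 5.0], [3.0, 3.0, 5.0]],
    obs_value=3.0,
    obs_index=0,
    obs_var=2.0,
)
```

In this toy case the observation corrects the two correlated points and leaves the uncorrelated one unchanged. Lowering `obs_var` or raising `alpha` pulls the analysis closer to the observation, which is the kind of sensitivity the experiments above explore.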
In this study, the magnetocaloric effect and transition temperature of bulk metallic glass, an amorphous material, were predicted through machine learning based on compositional features. From the Python module ‘Matminer’, 174 compositional features were obtained, and prediction performance was compared while reducing the number of compositional features to prevent overfitting. After optimization using RandomForest, an ensemble model, changes in prediction performance were analyzed according to the number of compositional features. The R² score was used as the performance metric for the regression, and the best prediction performance was obtained using only 90 features for predicting the transition temperature and 20 features for predicting the magnetocaloric effect. The most important feature for predicting the magnetocaloric effect was the ‘Fe’ compositional ratio. The feature importance method provided by ‘scikit-learn’ was applied to rank the compositional features. The feature importance method was found to be appropriate by comparing the prediction performance of the Fe-containing dataset with that of the full dataset.
It would be advantageous to grow legume forage crops in order to increase the productivity and sustainability of sloped croplands in Hamkyongbukdo. In particular, the identification of potential cultivation areas for alfalfa in the given region could aid decision-making on policies and management related to forage crop production in the future. This study aimed to analyze the climate suitability of alfalfa in Hamkyongbukdo under current and future climate conditions using the Fuzzy Union model. The climate suitability predicted by the Fuzzy Union model was compared with the actual alfalfa cultivation area in the northern United States. Climate data obtained from 11 global climate models were used as input data for calculation of climate suitability in the study region to examine the uncertainty of projections under future climate conditions. The area where the climate suitability index was greater than a threshold value (22.6) explained about 44% of the variation in actual alfalfa cultivation areas by state in the northern United States. The climatic suitability of alfalfa was projected to decrease in most areas of Hamkyongbukdo under future climate scenarios. The climatic suitability in Onseong and Gyeongwon County was analyzed to be over 88 in the current climate conditions. However, it was projected to decrease by about 66% in the given areas by the 2090s. Our study illustrated that the impact of climate change on suitable cultivation areas was highly variable when different climate data were used as inputs to the Fuzzy Union model. Still, the ensemble of the climate suitability projections for alfalfa was projected to decrease considerably due to summer depression in Hamkyongbukdo. It would be advantageous to predict suitable cultivation areas by adding soil conditions or to predict the climate suitability of other leguminous crops such as hairy vetch, which merits further studies.
Machine learning-based algorithms have been used for constructing species distribution models (SDMs), but their performance depends on the selection of backgrounds. This study attempted to develop a novel method for selecting backgrounds in machine-learning SDMs. Two machine learning-based SDMs (MaxEnt and Random Forest) were employed with an example species (Spodoptera litura), and different background selection methods (random sampling, biased sampling, and ensemble sampling using CLIMEX) were tested with multiple performance metrics (TSS, Kappa, and F1-score). As a result, the model with ensemble sampling predicted the widest occurrence areas with the highest performance, suggesting the potential application of the developed method for enhancing machine-learning SDMs.
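The three performance metrics named in this abstract can all be computed from a presence/background confusion matrix. The sketch below uses the standard textbook definitions of the true skill statistic (TSS), Cohen's Kappa, and the F1-score. The function name and the counts are hypothetical, and the study may have computed the metrics differently, for example at a specific threshold.

```python
def sdm_metrics(tp, fp, fn, tn):
    """Return TSS, Cohen's Kappa, and F1 from confusion-matrix counts.

    tp/fn: presences predicted present/absent
    fp/tn: backgrounds (or absences) predicted present/absent
    """
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    # TSS = sensitivity + specificity - 1
    tss = sensitivity + specificity - 1.0
    # Kappa compares observed agreement with agreement expected by chance.
    observed = (tp + tn) / total
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total**2
    kappa = (observed - expected) / (1.0 - expected)
    # F1 is the harmonic mean of precision and sensitivity (recall).
    f1 = 2.0 * precision * sensitivity / (precision + sensitivity)
    return {"TSS": tss, "Kappa": kappa, "F1": f1}


# Hypothetical counts for one model and one background selection method.
scores = sdm_metrics(tp=40, fp=10, fn=10, tn=40)
```

Because all three metrics depend on which points count as negatives, they are directly affected by how backgrounds are sampled, which is why the choice of background selection method matters for the comparison.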
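This study analyzes how climate change alters the habitat of 왕은점표범나비, a class II endangered butterfly species that inhabits South Korea. To offset the strengths and weaknesses of individual models, we used an ensemble model, an approach widely adopted in conservation biology and animal ecology, together with climate change scenario data to assess potential habitat changes under current and future climate conditions. The results indicate that the habitat of the species is expected to shrink in the future and that this change is influenced by both temperature and precipitation. In particular, the seasonal variability of precipitation was found to have the greatest influence. These results are expected to improve the understanding of how climate change affects species distributions and to serve as important baseline data for promoting biodiversity in areas such as endangered species management and ecosystem restoration.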
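The compressive strength of high-performance concrete (HPC) is difficult to predict because of the use of supplementary cementitious materials, so improved prediction models are essential. The purpose of this study is therefore to develop an HPC compressive strength prediction model using an ensemble technique that combines bagging and stacking. The key contribution of this paper is a new ensemble technique that integrates the existing bagging and stacking techniques, which addresses the shortcomings of single machine learning models and improves prediction performance. Nonlinear regression, support vector machines, artificial neural networks, and Gaussian process regression were used as the single machine learning methods, and bagging and stacking were used as the ensemble techniques. The model proposed in this study showed higher accuracy than the single machine learning models and the bagging and stacking models. This was confirmed by comparing four representative performance metrics, which verified the validity of the proposed method.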
The prediction of algal bloom is an important field of study in algal bloom management, and chlorophyll-a concentration (Chl-a) is commonly used to represent the status of algal bloom. In recent years, advanced machine learning algorithms have been increasingly used for the prediction of algal bloom. In this study, XGBoost (XGB), an ensemble machine learning algorithm, was used to develop a model to predict Chl-a in a reservoir. Daily observations of water quality and climate data were used for the training and testing of the model. In the first step of the study, the input variables were clustered into two groups (low- and high-value groups) based on the observed values of water temperature (TEMP), total organic carbon concentration (TOC), total nitrogen concentration (TN), and total phosphorus concentration (TP). For each of the four water quality items, two XGB models were developed using only the data in each clustered group (Model 1). The results were compared with the prediction of an XGB model developed using the entire dataset before clustering (Model 2). The model performance was evaluated using three indices, including the root mean squared error-observation standard deviation ratio (RSR). Model 1 improved performance for TEMP, TN, and TP, with RSR values of 0.503, 0.477, and 0.493, respectively, compared with 0.521 for Model 2. On the other hand, Model 2 performed better than Model 1 for TOC, where the RSR of Model 1 was 0.532. Explainable artificial intelligence (XAI) is an ongoing field of research in machine learning. Shapley value analysis, a novel XAI algorithm, was also used for the quantitative interpretation of the XGB model developed in this study.
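The RSR index used to compare Model 1 and Model 2 can be written out as a short sketch. It assumes the common definition in which the root of the summed squared errors is divided by the root of the summed squared deviations of the observations from their mean. The function name and sample values are hypothetical, and the abstract does not state which variant of the formula was used.

```python
import math


def rsr(observed, predicted):
    """RMSE-observation standard deviation ratio (lower is better).

    A value of 0 means a perfect fit; a value of 1 means the model is no
    better than always predicting the mean of the observations.
    """
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    squared_deviation = sum((o - mean_obs) ** 2 for o in observed)
    return math.sqrt(squared_error) / math.sqrt(squared_deviation)


# Hypothetical Chl-a values, not data from the study.
observed_chla = [1.0, 2.0, 3.0]
predicted_chla = [1.0, 2.0, 4.0]
score = rsr(observed_chla, predicted_chla)
```

Since the index is normalized by the spread of the observations, values from models trained on different data subsets, such as the low- and high-value groups, are on a comparable scale.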
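As technology trends grow, enormous amounts of data are being generated. One technology field that consumes large amounts of data is computer vision. Compared with machines, humans can easily detect and recognize faces or objects even under external conditions that affect vision, such as facial expression, lighting, or viewing angle. The difficulty for machines stems from the high dimensionality of the data involved. Data dimensionality refers to the total number of variables measured in every observation. The aim of this project is to compare various dimensionality reduction techniques suitable for face recognition systems, test them on various datasets composed of facial images with varying illumination, complex backgrounds, and expressions, and propose an ensemble model of the techniques that help improve model accuracy and measure its performance. The proposed ensemble model extracts vectors from a mixture of two dimensionality reduction techniques, principal component analysis (PCA) and locally linear embedding (LLE), and passes them through a dense convolutional neural network (CNN) to predict faces from the Labeled Faces in the Wild (LFW) dataset. The model achieves a test accuracy of 0.95 and a test F1 score of 0.94. The proposed system includes a web app developed with Flask that captures a live video stream from a webcam and is integrated with the proposed ensemble model so that the system can predict faces.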
Algal bloom is an ongoing issue in the management of freshwater systems for drinking water supply, and the chlorophyll-a concentration is commonly used to represent the status of algal bloom. Thus, the prediction of chlorophyll-a concentration is essential for the proper management of water quality. However, the chlorophyll-a concentration is affected by various water quality and environmental factors, so predicting it is not an easy task. In recent years, advanced machine learning algorithms have increasingly been used to develop surrogate models that predict the chlorophyll-a concentration in freshwater systems such as rivers or reservoirs. This study used a light gradient boosting machine (LightGBM), a gradient boosting decision tree algorithm, to develop an ensemble machine learning model to predict chlorophyll-a concentration. The field water quality data observed at Daecheong Lake, obtained from the real-time water information system in Korea, were used for the development of the model. The data include temperature, pH, electric conductivity, dissolved oxygen, total organic carbon, total nitrogen, total phosphorus, and chlorophyll-a. First, a LightGBM model was developed to predict the chlorophyll-a concentration by using the other seven items as independent input variables. Second, the time-lagged values of all the input variables were added as input variables to understand the effect of the time lag of input variables on model performance. The time lag (i) ranges from 1 to 50 days. The model performance was evaluated using three indices: the root mean squared error-observation standard deviation ratio (RSR), the Nash-Sutcliffe coefficient of efficiency (NSE), and the mean absolute error (MAE). The model showed the best performance when a dataset with a one-day time lag (i=1) was added, where RSR, NSE, and MAE were 0.359, 0.871, and 1.510, respectively. An improvement in model performance was observed when a dataset with a time lag of up to about 15 days (i=15) was added.
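The construction of time-lagged inputs described in this abstract can be sketched as follows. The sketch assumes one record of input variables per day, ordered in time, and appends the values from `lag` days earlier under suffixed names. The function name, the variable keys, and the numbers are hypothetical. The abstract does not state whether lags were added one at a time or cumulatively, so the single-lag form here is an assumption.

```python
def add_lagged_inputs(daily_rows, lag):
    """Append the input values observed `lag` days earlier to each day.

    daily_rows: list of dicts, one per day in time order, holding the
                input variables only (not the chlorophyll-a target)
    lag:        time lag i in days (1 to 50 in the study)
    The first `lag` days are dropped because they have no lagged values.
    """
    lagged_rows = []
    for day in range(lag, len(daily_rows)):
        features = dict(daily_rows[day])
        for name, value in daily_rows[day - lag].items():
            features[f"{name}_lag{lag}"] = value
        lagged_rows.append(features)
    return lagged_rows


# Three hypothetical days with two of the seven input variables.
rows = [
    {"TN": 1.0, "TP": 0.1},
    {"TN": 2.0, "TP": 0.2},
    {"TN": 3.0, "TP": 0.3},
]
with_one_day_lag = add_lagged_inputs(rows, lag=1)
```

Each added lag doubles the number of input columns in this form and shortens the usable record by `lag` days, a trade-off worth keeping in mind when comparing lags as long as 50 days.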
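To quantify the long-term deterioration of structures, research on generalized monitoring systems that use ambient vibration data is being actively conducted worldwide. In this study, we built a low-cost edge computing system that detects structural anomalies by applying ensemble learning to dynamic characteristics acquired from a structure over the long term. The system hardware consists of a Raspberry Pi, a low-cost accelerometer, a tilt sensor, a GPS RTK module, and a LoRa module. Anomaly detection by ensemble learning on dynamic characteristics was verified through vibration tests on a laboratory-scale structural model, and a distributed processing algorithm for real-time extraction of dynamic characteristics, based on the experiments, was deployed on the Raspberry Pi. The system was housed and installed at an administrative welfare center in Pohang to acquire data, which verified the field applicability of the developed system.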
The increased turbidity in rivers during flood events has various effects on water environmental management, including drinking water supply systems. Thus, prediction of turbid water is essential for water environmental management. Recently, various advanced machine learning algorithms have been increasingly used in water environmental management. Ensemble machine learning algorithms such as random forest (RF) and gradient boosting decision tree (GBDT) are some of the most popular machine learning algorithms used for water environmental management, along with deep learning algorithms such as recurrent neural networks. In this study, GBDT, an ensemble machine learning algorithm, and gated recurrent unit (GRU), a recurrent neural network algorithm, are used for model development to predict turbidity in a river. The observation intervals of the input data used for the model were 2, 4, 8, 24, 48, 120, and 168 h. The root-mean-square error-observations standard deviation ratio (RSR) ranges from 0.182 to 0.766 for GRU and from 0.400 to 0.683 for GBDT. Both models show similar prediction accuracy with RSR of 0.682 for GRU and 0.683 for GBDT. The GRU shows better prediction accuracy when the observation interval is relatively short (i.e., 2, 4, and 8 h), whereas GBDT shows better prediction accuracy when the observation interval is relatively long (i.e., 48, 120, and 168 h). The results suggest that the characteristics of input data should be considered to develop an appropriate model to predict turbidity.
The sunspot area is a critical physical quantity for assessing the solar activity level; forecasts of the sunspot area are of great importance for studies of the solar activity and space weather. We developed an innovative hybrid model prediction method by integrating the complementary ensemble empirical mode decomposition (CEEMD) and extreme learning machine (ELM). The time series is first decomposed into intrinsic mode functions (IMFs) with different frequencies by CEEMD; these IMFs can be divided into three groups, a high-frequency group, a low-frequency group, and a trend group. The ELM forecasting models are established to forecast the three groups separately. The final forecast results are obtained by summing up the forecast values of each group. The proposed hybrid model is applied to the smoothed monthly mean sunspot area archived at NASA's Marshall Space Flight Center (MSFC). We find a mean absolute percentage error (MAPE) and a root mean square error (RMSE) of 1.80% and 9.75, respectively, which indicates that: (1) for the CEEMD-ELM model, the predicted sunspot area is in good agreement with the observed one; (2) the proposed model outperforms previous approaches in terms of prediction accuracy and operational efficiency.
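The final step of the hybrid method, summing the forecasts of the three IMF groups and scoring the result, can be sketched as below. The CEEMD decomposition and the ELM models themselves are not implemented here. The group forecasts are treated as given inputs, and every name and number is hypothetical rather than taken from the study.

```python
import math


def combine_group_forecasts(high_freq, low_freq, trend):
    """Sum the forecasts of the three IMF groups time step by time step."""
    return [h + l + t for h, l, t in zip(high_freq, low_freq, trend)]


def mape(observed, predicted):
    """Mean absolute percentage error in percent (observations must be nonzero)."""
    ratios = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def rmse(observed, predicted):
    """Root mean square error in the units of the observations."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical group forecasts for two months, not values from the study.
forecast = combine_group_forecasts(
    high_freq=[2.0, -3.0],
    low_freq=[13.0, -7.0],
    trend=[95.0, 200.0],
)
observed_area = [100.0, 200.0]
error_percent = mape(observed_area, forecast)
error_rmse = rmse(observed_area, forecast)
```

Forecasting each group separately and summing afterwards relies on the decomposition being additive, so that the group series reconstruct the original series when added together.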