Machine learning-based algorithms have been used to construct species distribution models (SDMs), but their performance depends on how background points are selected. This study attempted to develop a novel method for selecting backgrounds in machine learning SDMs. Two machine learning-based SDMs (MaxEnt and Random Forest) were applied to an example species (Spodoptera litura), and different background selection methods (random sampling, biased sampling, and ensemble sampling using CLIMEX) were tested with multiple performance metrics (TSS, Kappa, and F1-score). The model with ensemble sampling predicted the widest occurrence areas and achieved the highest performance, suggesting that the developed method could be applied to enhance machine learning SDMs.
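The three threshold-dependent metrics named above (TSS, Kappa, and F1-score) can all be derived from a binary confusion matrix. The sketch below is minimal and assumes the continuous SDM output has already been thresholded into predicted presence or absence. For presence/background models such as MaxEnt, background or pseudo-absence points take the place of true absences. The function name `confusion_metrics` and the example counts are hypothetical and do not come from the study.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Return (TSS, Cohen's Kappa, F1-score) from binary confusion-matrix counts.

    tp: presences predicted present      fn: presences predicted absent
    fp: absences/backgrounds predicted present
    tn: absences/backgrounds predicted absent
    """
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    tss = sensitivity + specificity - 1   # True Skill Statistic

    p_obs = (tp + tn) / n                 # observed agreement
    # agreement expected by chance, from the row and column marginals
    p_exp = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (p_obs - p_exp) / (1 - p_exp)

    f1 = 2 * tp / (2 * tp + fp + fn)      # harmonic mean of precision and recall
    return tss, kappa, f1


tss, kappa, f1 = confusion_metrics(tp=40, fn=10, fp=20, tn=30)
print(f"TSS={tss:.3f} Kappa={kappa:.3f} F1={f1:.3f}")  # prints TSS=0.400 Kappa=0.400 F1=0.727
```

The counts, and therefore the metrics, change with the threshold and with which points serve as negatives. This is one reason the choice of background points can shift the measured performance of an SDM.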
This study explored the usefulness and implications of Bayesian hyperparameter optimization in developing species distribution models (SDMs). Several machine learning (ML) algorithms, namely support vector machine (SVM), random forest (RF), boosted regression tree (BRT), XGBoost (XGB), and multilayer perceptron (MLP), were used to predict the occurrence of four benthic macroinvertebrate species. Bayesian optimization successfully tuned the model hyperparameters, with all ML models achieving an area under the curve (AUC) > 0.7. In addition, the explored hyperparameter values generally clustered around the optimal values, suggesting that Bayesian optimization searches efficiently for optimal hyperparameter sets. Tree-based ensemble algorithms (BRT, RF, and XGB) tended to outperform SVM and MLP. The important hyperparameters and their optimal values differed by species and ML model, indicating that hyperparameter tuning is necessary to improve the performance of individual models. For all macroinvertebrate species, SVM and RF required fewer trials to reach an optimal hyperparameter set, reducing computational cost compared with the other ML algorithms. These results suggest that Bayesian optimization is an efficient method for tuning the hyperparameters of machine learning algorithms.
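The core loop of Bayesian optimization can be shown on a single hyperparameter. The sketch below is minimal and rests on several assumptions:

- a Gaussian-process surrogate with a squared-exponential kernel (length scale 0.2);
- the expected-improvement (EI) acquisition function;
- a search over a grid of 101 values scaled to [0, 1].

None of these details, nor the library used, is stated above. `toy_validation_auc` is a hypothetical stand-in for the cross-validated AUC of an SVM, RF, BRT, XGB, or MLP model.

```python
import math
import random


def rbf(a, b, length=0.2):
    """Squared-exponential kernel with unit prior variance (length scale assumed)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(K):
    """Lower-triangular L with K = L L^T (pure Python, fine for small K)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(K[i][i] - s) if i == j else (K[i][j] - s) / L[j][j]
    return L


def forward(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i in range(len(L)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def backward(L, b):
    """Solve L^T x = b for lower-triangular L."""
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def expected_improvement(mean, std, best, xi=0.01):
    """Expected improvement over `best` (maximization) for a Gaussian posterior."""
    z = (mean - best - xi) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best - xi) * cdf + std * pdf


def bayes_opt(objective, n_init=3, n_trials=15, noise=1e-4, seed=0):
    """Maximize `objective` over a grid on [0, 1] using a GP surrogate and EI."""
    rng = random.Random(seed)
    grid = [i / 100 for i in range(101)]
    xs = rng.sample(grid, n_init)            # random initial trials
    ys = [objective(x) for x in xs]
    while len(xs) < n_trials:
        mu = sum(ys) / len(ys)
        yc = [y - mu for y in ys]            # centre scores for the zero-mean GP
        K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
             for i, a in enumerate(xs)]
        L = cholesky(K)                      # fit the surrogate once per trial
        alpha = backward(L, forward(L, yc))  # alpha = K^-1 yc
        best, x_next, ei_next = max(yc), None, -math.inf
        for g in grid:
            if g in xs:
                continue                     # skip values already tried
            k_star = [rbf(g, a) for a in xs]
            mean = sum(k * a for k, a in zip(k_star, alpha))
            v = forward(L, k_star)
            std = math.sqrt(max(1.0 - sum(t * t for t in v), 1e-12))
            ei = expected_improvement(mean, std, best)
            if ei > ei_next:
                x_next, ei_next = g, ei
        xs.append(x_next)
        ys.append(objective(x_next))
    i_best = ys.index(max(ys))
    return xs[i_best], ys[i_best], xs, ys


def toy_validation_auc(c):
    """Hypothetical stand-in for a cross-validated AUC; peaks at c = 0.3."""
    return 0.9 - (c - 0.3) ** 2


best_c, best_auc, tried, scores = bayes_opt(toy_validation_auc)
print(f"best c={best_c:.2f}, AUC={best_auc:.3f} after {len(tried)} trials")
```

Each trial refits the surrogate to every score observed so far. EI then balances exploring uncertain regions against refining near the current best, so later trials tend to concentrate around the optimum. The hand-written Gaussian process here only keeps the sketch dependency-free. A dedicated optimization library would normally be used instead.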
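Effective conservation management of protected areas requires monitoring the establishment of alien species together with efforts to reduce the risk of their spread. This study used alien plant distribution data surveyed in a Forest Genetic Resource Reserve (2,274 ha) in Uljin. Three widely used species distribution models (Bioclim, GLM, and MaxEnt) were applied to simulate the potential occurrence areas of alien plants. The simulation results were then compared to select the most realistic and suitable model, one that reflects local geographic and ecological management characteristics. Like the observed distribution, the predicted occurrence areas tended to follow linear landscape elements such as roads, and they included some logged areas. A statistical comparison of predictive power and accuracy showed that GLM and MaxEnt generally performed well, whereas Bioclim performed poorly. Bioclim predicted the largest area of expected occurrence, followed by GLM and then MaxEnt. A phenomenological review of the simulation results showed that the predictive power of GLM and Bioclim was strongly affected by sample size, whereas MaxEnt was the most consistent model regardless of sample size. Overall, MaxEnt was judged to be the optimal model among those tested for predicting alien plant distributions. The model-selection approach presented here is based on fine-scale species distribution data. It should support the conservation management of forest ecosystem protected areas and research aimed at developing realistic, refined models that reflect local characteristics.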
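Bioclim, the simplest of the three models compared, is an environmental-envelope method. The sketch below computes an envelope score in the spirit of BIOCLIM and assumes continuous environmental predictors sampled at presence records. The mid-rank percentile and the minimum-over-variables rule are a simplification chosen for illustration, not necessarily the variant used in the study. The presence values are hypothetical.

```python
def percentile_rank(values, x):
    """Mid-rank percentile of x among values (ties count half)."""
    below = sum(v < x for v in values)
    ties = sum(v == x for v in values)
    return (below + 0.5 * ties) / len(values)


def bioclim_score(presences, site):
    """Envelope suitability in [0, 1] for one site.

    presences: tuples of environmental values at presence records
    site: tuple of environmental values (same order) at the site to score
    Each variable scores 2 * min(p, 1 - p): 1 at the median of the presence
    values, falling to 0 outside their observed range. The site takes the
    minimum over variables, i.e. the most limiting one.
    """
    scores = []
    for j, x in enumerate(site):
        column = [record[j] for record in presences]
        p = percentile_rank(column, x)
        scores.append(2 * min(p, 1 - p))
    return min(scores)


# Hypothetical presence records with two environmental variables each
presences = [(10, 100), (12, 110), (14, 120), (16, 130), (18, 140)]
for site in [(14, 120), (12, 130), (30, 120)]:
    print(site, round(bioclim_score(presences, site), 2))
```

A presence/absence map can be produced by thresholding this score, for example by treating any score above 0 as inside the envelope. Because the envelope is defined only by the values at presence records, its shape depends directly on how many records are available.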
Species abundance patterns have recently become a key issue in ecology. They matter both for determining the species richness that a given area can support (as in island biogeography) and for elucidating the structural coherence underlying the relationships between species and their abundances in communities. Relative species abundance, or the species abundance distribution (SAD), is considered significant in two ways. Theoretically, it helps reveal how communities originate and become established; practically, it describes ecological states in response to environmental impacts. Conventional SAD models, including the geometric series, log series, lognormal distribution, and broken-stick model, were introduced along with example cases. Theoretical interpretations of species abundance patterns were also outlined, covering the neutral model, power-law analysis, and the application of principles from statistical physics. Finally, future directions for SADs and related species-abundance topics were discussed with respect to mechanisms of community organization and ecological monitoring of responses to disturbance.
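Three of the conventional models named above have closed-form or simple numerical expressions that make their assumptions concrete. The sketch below is minimal and assumes N individuals are shared among S species. The function names and example values are hypothetical, and the lognormal fit is omitted.

- The geometric series uses the niche-preemption fraction k.
- The broken-stick model gives MacArthur's expected ranked abundances.
- Fisher's log-series α is obtained by solving S = α ln(1 + N/α) numerically.

```python
import math


def geometric_series(N, S, k):
    """Expected ranked abundances under the geometric series (niche preemption).

    Each successive species takes a fraction k of the remaining resources;
    abundances are rescaled so that the S species sum to N individuals.
    """
    c = 1 / (1 - (1 - k) ** S)
    return [N * c * k * (1 - k) ** (i - 1) for i in range(1, S + 1)]


def broken_stick(N, S):
    """Expected ranked abundances under MacArthur's broken-stick model."""
    return [N / S * sum(1 / j for j in range(i, S + 1)) for i in range(1, S + 1)]


def fisher_alpha(S, N, lo=1e-6, hi=1e6, iters=200):
    """Solve S = alpha * ln(1 + N / alpha) for Fisher's log-series alpha.

    The right-hand side increases monotonically with alpha, so bisection
    converges provided the root lies in [lo, hi]; a root exists only if S < N.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mid * math.log(1 + N / mid) < S:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print([round(x, 2) for x in geometric_series(70, 3, 0.5)])  # [40.0, 20.0, 10.0]
print([round(x, 2) for x in broken_stick(66, 3)])           # [40.33, 18.33, 7.33]
print(round(fisher_alpha(5 * math.log(201), 1000), 3))      # 5.0
```

With the same S, the geometric series concentrates individuals in the top-ranked species much more strongly than the broken-stick model. This contrast is what ranked-abundance (Whittaker) plots are often used to show.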