Correlation analysis between biological and physicochemical indicators using an unsupervised learning approach for an integrated understanding of freshwater ecosystems
Freshwater ecosystems support biodiversity and provide essential ecosystem services. In Korea, the Water Environment Information System monitors these ecosystems using separate biological and physicochemical indicators. Because complex interactions occur among diverse biological taxa and physicochemical conditions, integrating these heterogeneous monitoring data is crucial for accurately assessing ecosystem health. However, differences in data characteristics between the two types of indicators make integration difficult. Given the scale and heterogeneity of the monitoring data, advanced analytical techniques are needed to detect interactions among variables. This study aimed to identify key correlations among biological and physicochemical indicators by using the Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) algorithm to cluster similar variables and remove noise, followed by Spearman’s rank correlation coefficient and maximal information coefficient (MIC) analyses. HDBSCAN effectively eliminated noise indicators and grouped biological and physicochemical indicators into clusters based on shared characteristics, thereby enhancing the interpretability of the correlation analysis. Spearman analysis showed strong associations among biological indicators, particularly among species with similar ecological traits. MIC analysis further detected nonlinear associations between ecological assessment indices and specific biological species, and these associations likewise reflected shared ecological characteristics. These findings are significant because a comprehensive analysis of existing monitoring data revealed relationships within biological and physicochemical indicators while preserving the original purpose and function of each monitoring network. This study is expected to serve as a foundational resource for freshwater environmental monitoring and for developing effective management strategies.
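The abstract does not describe how the clustering and correlation steps are implemented. The sketch below makes several assumptions. HDBSCAN has already assigned each indicator a cluster label, and -1 marks noise, which is the convention used by the HDBSCAN implementations in scikit-learn and the `hdbscan` package. Spearman's rho is then computed for every pair of indicators that share a cluster. The within-cluster pairing, the function names, and the indicator names and values in the usage example are all hypothetical. MIC is not shown because it requires a dedicated estimator. Spearman's rho is computed here as the Pearson correlation of average ranks, so tied measurements are handled correctly.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")  # undefined for a constant indicator
    return cov / (sx * sy)


def within_cluster_correlations(data, labels):
    """Hypothetical helper: Spearman's rho for each pair of indicators in a cluster.

    data:   indicator name -> list of measurements at the same sites
    labels: indicator name -> cluster label; -1 (HDBSCAN noise) is skipped
    """
    clusters = {}
    for name, label in labels.items():
        if label != -1:
            clusters.setdefault(label, []).append(name)
    results = {}
    for names in clusters.values():
        for a, b in combinations(sorted(names), 2):
            results[(a, b)] = spearman_rho(data[a], data[b])
    return results
```

A hypothetical usage example follows. Five sites are shown, and the labels are taken as given rather than computed.

```python
data = {
    "diatom_index": [1, 2, 3, 4, 5],
    "benthic_index": [2, 4, 6, 8, 10],
    "BOD": [5, 3, 4, 1, 2],
    "TP": [1, 3, 2, 5, 4],
    "noise_var": [3, 1, 4, 1, 5],
}
labels = {"diatom_index": 0, "benthic_index": 0, "BOD": 1, "TP": 1, "noise_var": -1}
print(within_cluster_correlations(data, labels))
```

Because it works on ranks, Spearman's rho captures any monotonic relationship, but it can miss non-monotonic dependence. That limitation is one reason a measure such as MIC would be used alongside it.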