
Toxicity Prediction of Hazardous PCB Isomers Using a Machine Learning Model with Simplified Descriptors (KCI-indexed)

  • Language: Korean
  • URL: https://db.koreascholar.com/Article/Detail/445047
Korean Journal of Ecology and Environment (생태와 환경)
Abstract

This study developed a QSAR regression model using the XGBoost machine learning algorithm to predict the acute aquatic toxicity of highly hazardous PCBs. EC50 values for Daphnia magna were obtained from QSAR Toolbox 4.7. After excluding mixtures, approximately 3,000 input features (molecular descriptors and fingerprints) were generated from official structure data using RDKit and the Morgan algorithm. The dataset was split into training and test sets (7:3) under each of 500,000 random seeds, and the most balanced split was selected using Kolmogorov-Smirnov and Wilcoxon rank-sum tests. Z-score standardization was applied based on the training set, and the XGBoost model was trained using 5-fold cross-validation with grid search optimization. The final model showed high predictive performance (R² = 0.97, RMSE = 0.19). A simplified model using only the top 10 predictive molecular features retained approximately 95% of the original accuracy while improving interpretability and efficiency. The model was then applied to 38 PCB compounds lacking EC50 values; the predicted values showed a distribution statistically similar to that of the measured group, with only minor differences in a few structural fingerprints. These results demonstrate the applicability of XGBoost-based models to reliable toxicity prediction and offer a promising alternative approach for assessing the environmental risk of untested PCBs.
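The seed-based split selection and training-set standardization described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names (`ks_statistic`, `split_by_seed`, `most_balanced_split`, `zscore_from_train`) are hypothetical, the balance criterion here is only the two-sample Kolmogorov-Smirnov statistic (the paper also uses a Wilcoxon rank-sum test), and a small seed count stands in for the 500,000 seeds reported. It uses the Python standard library only; in practice, `scipy.stats.ks_2samp` and `scipy.stats.ranksums` would supply both tests with p-values.

```python
import random
from statistics import mean, pstdev


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step past every value equal to x in both samples (handles ties).
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def split_by_seed(n_samples, seed, test_fraction=0.3):
    """Shuffle indices with a given seed and cut them into train/test (7:3 by default)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = round(n_samples * test_fraction)
    return idx[n_test:], idx[:n_test]


def most_balanced_split(values, n_seeds=200, test_fraction=0.3):
    """Try many seeds; keep the split whose train/test target distributions differ least.

    Assumption: "most balanced" is taken to mean the smallest KS statistic.
    """
    best = None
    for seed in range(n_seeds):
        train_idx, test_idx = split_by_seed(len(values), seed, test_fraction)
        d = ks_statistic([values[i] for i in train_idx],
                         [values[i] for i in test_idx])
        if best is None or d < best[0]:
            best = (d, seed, train_idx, test_idx)
    return best


def zscore_from_train(train, test):
    """Standardize both sets using the mean and (population) std of the training set only."""
    mu, sigma = mean(train), pstdev(train)
    return [(x - mu) / sigma for x in train], [(x - mu) / sigma for x in test]
```

With a list of target values (for example, log-transformed EC50 values, which is an assumption here), `most_balanced_split(values)` returns the KS statistic, the chosen seed, and the train and test indices. Computing the mean and standard deviation from the training set alone keeps test-set information out of the model inputs.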

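Table of Contents
Abstract
Introduction
Materials and Methods
    1. Data collection and preprocessing
    2. Molecular feature (descriptor) calculation and standardization
    3. Dataset splitting and distribution validation
    4. Building the XGBoost-based EC50 prediction model
    5. Distribution comparison and statistical difference testing of the top-contributing molecular features
    6. Retraining the prediction model on key molecular features (feature selection model)
    7. EC50 prediction and reliability assessment for unmeasured compounds (unpredictable group)
Results and Discussion
    1. Dataset characteristics and distribution
    2. Performance evaluation of the XGBoost prediction model
    3. Molecular feature importance analysis results
    4. Performance comparison of the key-feature-based prediction model
    5. Structural characteristics of the unpredictable compounds and interpretation of prediction results
Conclusion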
REFERENCES
Authors
  • Sehyeon Kim (김세현) | Department of Environmental and Energy Engineering, Chonnam National University, Gwangju 61186, Republic of Korea
  • Yonggyun Park (박용균) | Department of Environmental and Energy Engineering, Chonnam National University, Gwangju 61186, Republic of Korea
  • Jinkyung Hong (홍진경) | Department of Environmental and Energy Engineering, Chonnam National University, Gwangju 61186, Republic of Korea
  • Kwonseob Lee (이권섭) | Department of Environmental and Energy Engineering, Chonnam National University, Gwangju 61186, Republic of Korea
  • Seongjun Kim (김성준) | Department of Environmental and Energy Engineering, Chonnam National University, Gwangju 61186, Republic of Korea (corresponding author)