Large Language Models for Data Augmentation in Concrete Engineering: A Machine Learning Study on Biochar Cement Replacement

  • Language: ENG
  • URL: https://db.koreascholar.com/Article/Detail/449580
Available free of charge to all members.
Korean Society of Road Engineers
Abstract

The application of machine learning in concrete technology has expanded rapidly, yet its reliability is often constrained by limited experimental data, heterogeneous testing conditions, and inconsistencies across published studies. This study investigates the integration of machine learning and synthetic data augmentation to predict the compressive strength of concrete incorporating biochar as a partial replacement for cement. An experimental dataset was compiled from peer-reviewed journal articles indexed in Web of Science, focusing on biochar-modified concrete mixtures. Input variables included cement content, fine and coarse aggregate contents, biochar dosage, water-to-binder ratio, superplasticizer content, and curing age, with compressive strength as the target variable. Extreme Gradient Boosting (XGBoost) was adopted because of its strong performance on nonlinear tabular data. Model performance was evaluated using the mean absolute error (MAE), mean squared error (MSE), and coefficient of determination (R²), alongside five-fold cross-validation. Hyperparameter optimization was performed using Optuna. To address data scarcity, a synthetic dataset of 1,000 samples was generated using ChatGPT. The large language model (LLM) approach relied solely on natural language prompts: only the feature definitions and the target variable were provided, without exposing the original data or implementing data generation algorithms. Three modeling strategies were examined. First, a model trained and tested solely on experimental data achieved a testing R² of approximately 0.91. Second, a model trained on synthetic data and evaluated exclusively on experimental data showed reduced generalization, achieving a testing R² of about 0.42, which indicates pronounced domain shift effects. Third, when synthetic and experimental data were combined through data augmentation and jointly modeled, a testing R² of 0.93 was achieved. These results show that LLM-based augmentation improved model performance relative to training on experimental data alone (testing R² of 0.93 versus approximately 0.91), whereas training on synthetic data alone generalized poorly to experimental data.
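
The evaluation protocol described in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code. It shows the three reported metrics (MAE, MSE, R²), a shuffled five-fold split, and how the three training strategies could be assembled. The function names (`k_fold_indices`, `build_strategies`, `evaluate`), the row layout `(features, target)`, and the choice to score every strategy on the same held-out experimental rows are assumptions made for illustration; the abstract does not specify these details.

```python
import random
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def mse(y_true, y_pred):
    """Mean squared error."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def build_strategies(exp_train, exp_test, synthetic):
    """Map each strategy name to its (training rows, testing rows).

    Assumption (not stated in the abstract): all three strategies are
    scored on the same held-out experimental rows.
    """
    return {
        "experimental_only": (exp_train, exp_test),
        "synthetic_only": (synthetic, exp_test),
        "augmented": (exp_train + synthetic, exp_test),
    }


def evaluate(fit, train_rows, test_rows):
    """Fit on train_rows, then score on test_rows.

    Each row is (features, target); `fit` is a hypothetical callable that
    takes the training rows and returns a predict(features) function.
    """
    predict = fit(train_rows)
    y_true = [y for _, y in test_rows]
    y_pred = [predict(x) for x, _ in test_rows]
    return {
        "MAE": mae(y_true, y_pred),
        "MSE": mse(y_true, y_pred),
        "R2": r2(y_true, y_pred),
    }
```

In the setting the abstract describes, `fit` would wrap a gradient-boosted regressor whose hyperparameters are tuned with Optuna. The sketch leaves the model as a pluggable callable so that the three strategies differ only in their training rows.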

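Authors
  • Folorunsho Abidemi Bashiru (Ph.D. student, Department of Construction Convergence Engineering, Kangwon National University)
  • Poudel Ashish (Master's student, Department of Civil and Construction Engineering, Kangwon National University)
  • Isaac Fakoya (Master's student, Department of Civil and Construction Engineering, Kangwon National University)
  • Kim Seungwon (Associate Professor, Department of Future Civil and Construction Engineering, Kangwon National University)
  • Park Cheolwoo (Professor, Department of Future Civil and Construction Engineering, Kangwon National University), corresponding author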