Galaxy evolution studies require measurements of the physical properties of galaxies at different redshifts. In this work, we build supervised machine learning models to predict the redshift and physical properties (gas-phase metallicity, stellar mass, and star formation rate) of star-forming galaxies from broad-band and medium-band photometry covering optical to near-infrared wavelengths, and we evaluate the model performance. Using 55 magnitudes and colors as input features, the optimized model predicts the galaxy redshift with an accuracy of σ(Δz/(1+z)) = 0.008 over the redshift range z < 0.4. The gas-phase metallicity [12 + log(O/H)], stellar mass [log(Mstar)], and star formation rate [log(SFR)] are predicted with accuracies of σNMAD = 0.081, 0.068, and 0.19 dex, respectively. When magnitude errors are included, the scatter in the predicted values increases and the range of predicted values decreases, leading to biased predictions. Near-infrared magnitudes and colors (H, K, and H − K), along with blue optical colors (m425 − m450), are found to play important roles in the parameter prediction. Additionally, the number of input features is critical for good model performance. These results align with the underlying scaling relations between physical parameters of star-forming galaxies, demonstrating the potential of medium-band surveys for studying galaxy scaling relations with large samples of galaxies.
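The abstract above quotes its accuracies as σNMAD without defining the estimator. As a minimal sketch, assuming the commonly used normalized median absolute deviation (1.4826 times the median absolute deviation about the median) and assuming redshift residuals are normalized by 1 + z of the reference redshift, the scatter could be computed as follows. The function names are illustrative and do not come from the paper.

```python
from statistics import median


def sigma_nmad(residuals):
    """Normalized median absolute deviation of a sequence of residuals.

    Assumed definition: 1.4826 * median(|r - median(r)|), which matches the
    standard deviation for Gaussian residuals but is robust to outliers.
    """
    center = median(residuals)
    return 1.4826 * median(abs(r - center) for r in residuals)


def redshift_residuals(z_pred, z_true):
    """Normalized redshift residuals (z_pred - z_true) / (1 + z_true)."""
    return [(zp - zt) / (1.0 + zt) for zp, zt in zip(z_pred, z_true)]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    z_true = [0.10, 0.20, 0.30, 0.35]
    z_pred = [0.11, 0.19, 0.31, 0.35]
    print(sigma_nmad(redshift_residuals(z_pred, z_true)))
```

For the physical properties quoted in dex, the same `sigma_nmad` would be applied to the differences between predicted and reference logarithmic values.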
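Crop evapotranspiration, obtained by multiplying potential evapotranspiration by a crop coefficient, gives the water requirement of a crop and is widely used in water resource management. In particular, the Penman-Monteith equation published by the Food and Agriculture Organization of the United Nations (FAO) in Irrigation and Drainage Paper No. 56 (FAO 56-PM) is the standard method for estimating potential evapotranspiration and requires six meteorological variables: mean temperature, maximum temperature, minimum temperature, relative humidity, wind speed, and solar radiation. However, weather sensors installed near farmland are costly to install and maintain, which leads to data reliability problems such as missing values and outliers and complicates accurate evapotranspiration calculation. This study investigated whether crop evapotranspiration can be estimated without weather sensors by predicting the six required meteorological variables from data of a nearby Korea Meteorological Administration (KMA) station. We used 22 meteorological variables that can be collected through the KMA API as input data. After training nine regression models, we selected the top three by performance and applied hyperparameter tuning to identify the best model. The best-performing model was Extreme Gradient Boosting Regression (XGBR), with coefficients of determination (R2) of 0.98, 0.99, 0.99, 0.91, 0.72, and 0.86 for mean temperature, maximum temperature, minimum temperature, relative humidity, wind speed, and solar radiation, respectively. These results suggest that the XGBR model can accurately predict the inputs required by crop evapotranspiration models from meteorological data, removing the need for expensive weather sensors. This approach may be particularly useful in areas where sensor installation and maintenance are difficult, and it enables the use of standard evapotranspiration models without direct sensor data.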
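To make the role of the six predicted variables concrete, the following is a minimal sketch of the daily FAO 56-PM calculation followed by the crop coefficient step. It is a simplification under stated assumptions: net radiation `rn` is taken as an input (converting measured solar radiation to net radiation needs further FAO-56 steps that are omitted here), actual vapour pressure is derived from mean relative humidity, and atmospheric pressure defaults to a sea-level value. The function names and the sample inputs are hypothetical.

```python
import math


def e_sat(temp_c):
    """Saturation vapour pressure (kPa) at air temperature temp_c (deg C)."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def fao56_pm_eto(t_mean, t_max, t_min, rh_mean, u2, rn, g=0.0, pressure_kpa=101.3):
    """Daily reference evapotranspiration (mm/day) after FAO-56 Penman-Monteith.

    t_* in deg C, rh_mean in %, u2 = wind speed at 2 m (m/s),
    rn and g = net radiation and soil heat flux (MJ m-2 day-1).
    Assumption: rn is supplied directly rather than derived from solar radiation.
    """
    es = (e_sat(t_max) + e_sat(t_min)) / 2.0          # mean saturation vapour pressure
    ea = rh_mean / 100.0 * es                          # actual vapour pressure from mean RH
    delta = 4098.0 * e_sat(t_mean) / (t_mean + 237.3) ** 2  # slope of vapour pressure curve
    gamma = 0.000665 * pressure_kpa                    # psychrometric constant
    numerator = (0.408 * delta * (rn - g)
                 + gamma * 900.0 / (t_mean + 273.0) * u2 * (es - ea))
    denominator = delta + gamma * (1.0 + 0.34 * u2)
    return numerator / denominator


def crop_et(eto, kc):
    """Crop evapotranspiration as crop coefficient times reference ET."""
    return kc * eto


if __name__ == "__main__":
    # Hypothetical daily values, for illustration only.
    eto = fao56_pm_eto(t_mean=20.0, t_max=26.0, t_min=14.0, rh_mean=60.0, u2=2.0, rn=13.0)
    print(eto, crop_et(eto, kc=1.15))
```

In the workflow the abstract describes, the temperature, humidity, wind, and radiation arguments would come from the regression model's predictions rather than from on-site sensors.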
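This study aimed to develop models that predict the fresh weight of garlic with several machine learning methods, using phenotypic growth data (leaf length and leaf number) and meteorological data (growing degree days) as inputs. On the validation data, the random forest model performed best, with a coefficient of determination (R2) of 0.924, a root mean square error (RMSE) of 13.583 g, and a mean absolute error (MAE) of 8.885. On the test data, the CatBoost model performed best, with an R2 of 0.928, an RMSE of 13.486 g, and an MAE of 9.181. The weighted ensemble model, which combines CatBoost, random forest, and LightGBM with weights of 0.5, 0.3, and 0.2, ranked second on both datasets: on the validation data it reached an R2 of 0.922, an RMSE of 13.752 g, and an MAE of 8.877, and on the test data an R2 of 0.923, an RMSE of 13.992 g, and an MAE of 9.437. Taking these results together, we judged the weighted ensemble model to be the best model in terms of stability. Farmers are therefore expected to be able to monitor their crop by predicting garlic fresh weight with machine learning from phenotypic traits and weather data alone, and performance could be improved further by acquiring and validating multi-year data.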
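The weighted ensemble described above can be illustrated with a short sketch. It assumes the weights 0.5, 0.3, and 0.2 are applied to the per-sample predictions of the three already trained models, which is one common reading of a weighted ensemble. The abstract does not spell out the exact scheme, and the prediction values below are hypothetical.

```python
def weighted_ensemble(predictions, weights):
    """Combine per-model prediction lists into one weighted-average prediction list.

    predictions: list of equally long lists, one per model.
    weights: one weight per model; assumed here to sum to 1.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights are expected to sum to 1")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")
    return [
        sum(w * p[i] for w, p in zip(weights, predictions))
        for i in range(n_samples)
    ]


if __name__ == "__main__":
    # Hypothetical fresh-weight predictions (g) from the three models.
    catboost_pred = [52.0, 80.5, 110.2]
    random_forest_pred = [50.1, 82.0, 108.9]
    lightgbm_pred = [53.4, 79.0, 112.5]
    print(weighted_ensemble(
        [catboost_pred, random_forest_pred, lightgbm_pred],
        [0.5, 0.3, 0.2],
    ))
```

Averaging in this way tends to damp the errors of any single model, which is consistent with the abstract's emphasis on stability rather than on the single best score.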
Background: Automated classification systems using Artificial Intelligence (AI) and Machine Learning (ML) can enhance accuracy and efficiency in diagnosing pet skin diseases within veterinary medicine. Objectives: This study created a system that classifies pet skin diseases by evaluating multiple ML models to determine which method is most effective. Design: Comparative experimental study. Methods: Pet skin disease images were obtained from AIHub. Models, including Multi-Layer Perceptron (MLP), Boosted Stacking Ensemble (BSE), H2O AutoML, Random Forest, and Tree-based Pipeline Optimization Tool (TPOT), were trained and their accuracy assessed. Results: The TPOT achieved the highest accuracy (94.50 percent), due to automated pipeline optimization and ensemble learning. H2O AutoML also performed well at 94.25 percent, illustrating the effectiveness of automated selection for intricate imaging tasks. Other models scored lower. Conclusion: These findings highlight the potential of AI-driven solutions for faster and more precise pet skin disease diagnoses. Future investigations should incorporate broader disease varieties, multimodal data, and clinical validations to solidify the practicality of these approaches in veterinary medicine.
This study develops a machine learning-based tool life prediction model using spindle power data collected from real manufacturing environments. The primary objective is to monitor tool wear and predict optimal replacement times, thereby enhancing manufacturing efficiency and product quality in smart factory settings. Accurate tool life prediction is critical for reducing downtime, minimizing costs, and maintaining consistent product standards. Six machine learning models (Random Forest, Decision Tree, Support Vector Regressor, Linear Regression, XGBoost, and LightGBM) were evaluated for their predictive performance. Among these, the Random Forest Regressor demonstrated the highest accuracy, with an R2 value of 0.92, making it the most suitable for tool wear prediction. Linear Regression also provided detailed insights into the relationship between tool usage and spindle power, offering a practical alternative for precise predictions in scenarios with consistent data patterns. The results highlight the potential for real-time monitoring and predictive maintenance, significantly reducing downtime, optimizing tool usage, and improving operational efficiency. Challenges such as data variability, real-world noise, and model generalizability across diverse processes remain areas for future exploration. This work contributes to advancing smart manufacturing by integrating data-driven approaches into operational workflows and enabling sustainable, cost-effective production environments.
Bearing-shaft systems are essential components in various automated manufacturing processes, primarily designed for the efficient rotation of a main shaft by a motor. Accurate fault detection is critical for operating manufacturing processes, yet challenges remain in selecting and optimizing sensors with respect to type, location, and positioning. Sound signals present a viable solution for fault detection, as microphones can capture mechanical sounds from remote locations and have traditionally been employed for monitoring machine health. However, recordings in real industrial environments always contain non-negligible ambient noise, which hampers effective fault detection, and using a high-performance microphone for noise cancellation can be cost-prohibitive and impractical at actual manufacturing sites. To address these challenges, we propose a convolutional neural network (CNN)-based methodology for fault detection that analyzes the mechanical sounds generated by the bearing-shaft system in the form of log-mel spectrograms. To mitigate the impact of environmental noise in recordings made with commercial microphones, we also developed a denoising autoencoder (DAE) that operates without requiring any expert knowledge of the system. The proposed DAE-CNN model demonstrates high fault detection performance both with environmental noise (98.1%) and without it (100%). This indicates that the proposed methodology effectively preserves significant signal features while overcoming the negative influence of ambient noise present in the collected datasets, in both fault detection and fault type classification.
This study analyzes the impact of ESG (Environmental, Social, and Governance) activities on Corporate Financial Performance (CFP) using machine learning techniques. To address the linear limitations of traditional multiple regression analysis, the study employs AutoML (Automated Machine Learning) to capture the nonlinear relationships between ESG activities and CFP. The dataset consists of 635 companies listed on KOSPI and KOSDAQ from 2013 to 2021, with Tobin's Q used as the dependent variable representing CFP. The results show that machine learning models outperformed traditional regression models in predicting firm value. In particular, the Extreme Gradient Boosting (XGBoost) model exhibited the best predictive performance. Among ESG activities, the Social (S) indicator had a positive effect on CFP, suggesting that corporate social responsibility enhances corporate reputation and trust, leading to long-term positive outcomes. In contrast, the Environmental (E) and Governance (G) indicators had negative effects in the short term, likely due to factors such as the initial costs associated with environmental investments or governance improvements. Using the SHAP (Shapley Additive exPlanations) technique to evaluate the importance of each variable, it was found that Return on Assets (ROA), firm size (SIZE), and foreign ownership (FOR) were key factors influencing CFP. ROA and foreign ownership had positive effects on firm value, while major shareholder ownership (MASR) showed a negative impact. This study differentiates itself from previous research by analyzing the nonlinear effects of ESG activities on CFP and presents a more accurate and interpretable prediction model by incorporating machine learning and XAI (Explainable AI) techniques.
This study aimed to improve the accuracy of road pavement design by comparing and analyzing various statistical and machine-learning techniques for predicting asphalt layer thickness, focusing on regional roads in Pakistan. The explanatory variables selected for this study included the annual average daily traffic (AADT), subbase thickness, and subgrade California bearing ratio (CBR) values from six cities in Pakistan. The statistical prediction models used were multiple linear regression (MLR), support vector regression (SVR), random forest, and XGBoost. The performance of each model was evaluated using the mean absolute percentage error (MAPE) and root-mean-square error (RMSE). The analysis results indicated that the AADT was the most influential variable affecting the asphalt layer thickness. Among the models, the MLR demonstrated the best predictive performance. While XGBoost had a relatively strong performance among the machine-learning techniques, the traditional statistical model, MLR, still outperformed it in certain regions. This study emphasized the need for customized pavement designs that reflect the traffic and environmental conditions specific to regional roads in Pakistan. This finding suggests that future research should incorporate additional variables and data for a more in-depth analysis.
Aluminum-based composites are in high demand in industrial fields due to their light weight, high electrical conductivity, and corrosion resistance. Due to its unique advantages for composite fabrication, powder metallurgy is a crucial player in meeting this demand. However, the size and weight fraction of the reinforcement significantly influence the components' quality and performance. Understanding the correlation of these variables is crucial for building high-quality components. This study, therefore, investigated the correlations among various parameters—namely, milling time, reinforcement ratio, and size—that affect the composite’s physical and mechanical properties. An artificial neural network model was developed and showed the ability to correlate the processing parameters with the density, hardness, and tensile strength of Al2024-B4C composites. The predicted index of relative importance suggests that the milling time has the most substantial effect on fabricated components. This practical insight can be directly applied in the fabrication of high-quality Al2024-B4C composites.
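In this paper, we use a 15th-order Bézier curve to design beam shapes that are more flexible than those of previous studies, and we perform optimal design over a wider design space to obtain a beam shape with optimal thermal conductivity. A wider design space increases the computational cost accordingly; we overcame this computational limit by accelerating the optimization with an artificial neural network, which operates efficiently in high-dimensional variable spaces. We further compare the result with the beam shape that has the optimal elastic modulus, and we explain the beam shape using the mathematical similarity between heat conduction and elasticity. Through shape optimization based on artificial intelligence, this study proposes lattice-structure beam shapes that go beyond existing limits. First, the beam shapes of SC (Simple Cubic) and BC (Body Centered Cubic) lattice structures were modeled with Bézier curves, and training data were obtained by randomly setting the coordinates of the Bézier control points. A NN (Neural Network) and a GA (Genetic Algorithm) were then used to generate beam shapes with superior effective thermal conductivity and thereby design the optimal beam shape. This study is expected to help provide appropriate structural solutions for lattice structures under various thermal conditions in the future.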
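As an illustration of the parameterization step, the sketch below evaluates a Bézier curve with de Casteljau's algorithm and draws random control points, mirroring the random sampling used to build training data. It assumes that "15th-order" means polynomial degree 15, that is 16 control points, and that the profile is a 2D curve with coordinates drawn uniformly from a unit box. The sampling ranges and function names are hypothetical, not the authors' settings.

```python
import random


def bezier_point(control_points, t):
    """Evaluate a Bezier curve at parameter t in [0, 1] by de Casteljau's algorithm."""
    points = [tuple(p) for p in control_points]
    # Repeatedly interpolate between neighbours until a single point remains.
    while len(points) > 1:
        points = [
            tuple((1.0 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(points, points[1:])
        ]
    return points[0]


def sample_bezier(control_points, n_samples=50):
    """Return n_samples points evenly spaced in t along the curve."""
    return [bezier_point(control_points, i / (n_samples - 1)) for i in range(n_samples)]


def random_control_points(degree=15, seed=None):
    """Draw degree + 1 random 2D control points in the unit box (assumed range)."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(degree + 1)]


if __name__ == "__main__":
    controls = random_control_points(degree=15, seed=0)
    profile = sample_bezier(controls, n_samples=5)
    print(profile[0], profile[-1])  # the curve starts and ends at the end control points
```

Each random set of control points would correspond to one candidate beam profile whose effective thermal conductivity is then computed to label the training data.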
This study develops a model, based on real-world data, to determine the input rate of the chemical used in the coagulation and flocculation process (i.e., the coagulant) at an industrial water treatment plant. To detect outliers in the collected data, a two-phase algorithm combining standardization and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is applied. Both missing data and outliers are then replaced by linear interpolation. To determine the coagulant rate, several machine learning models are tested alongside linear regression. Among them, the random forest model with min-max scaled data performs best, with MSE, MAPE, R2, and CVRMSE of 1.136, 0.111, 0.912, and 18.704, respectively. This study demonstrates the practical applicability of a machine learning-based chemical input decision model, which can lead to smart management and response systems for clean and safe water treatment plants.
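The four reported metrics can be computed as in the following sketch. It assumes that MAPE is expressed as a fraction (consistent with the reported 0.111) and that CVRMSE is the RMSE divided by the mean observed value, expressed in percent (consistent with 18.704). The abstract does not state these conventions, so both are assumptions. A min-max scaler is included because the best model used min-max scaled data.

```python
import math


def min_max_scale(values):
    """Rescale values linearly to [0, 1]; assumes max(values) > min(values)."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


def regression_metrics(y_true, y_pred):
    """Return MSE, MAPE (fraction), R2, and CVRMSE (percent) as a dict.

    Assumes y_true contains no zeros (for MAPE) and is not constant (for R2).
    """
    n = len(y_true)
    mean_true = sum(y_true) / n
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    mse = sum(squared_errors) / n
    rmse = math.sqrt(mse)
    mape = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n
    ss_total = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - sum(squared_errors) / ss_total
    cvrmse = 100.0 * rmse / mean_true
    return {"MSE": mse, "MAPE": mape, "R2": r2, "CVRMSE": cvrmse}


if __name__ == "__main__":
    # Hypothetical coagulant rates, for illustration only.
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [2.0, 2.0, 3.0, 4.0]
    print(regression_metrics(observed, predicted))
    print(min_max_scale(observed))
```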
Many school buildings are vulnerable to earthquakes because they were built before mandatory seismic design was applied. This study uses machine learning to develop an algorithm that rapidly constructs an optimal reinforcement scheme from simple information for non-ductile reinforced concrete school buildings built according to standard design drawings in the 1980s. We utilize a decision tree (DT) model that conservatively and rapidly predicts the failure type of reinforced concrete columns from simple information, and on this basis we develop a methodology for constructing an optimal reinforcement scheme in terms of the confinement ratio (CR) for ductility enhancement and the stiffness ratio (SR) for stiffness enhancement. By examining how the failure types of columns change with the confinement ratio and the stiffness ratio, we propose a retrofit scheme for school buildings with masonry walls and present the maximum applicable stiffness ratio and the allowable range of stiffness ratio increase for the minimum and maximum values of the confinement ratio. This methodology constructs retrofit schemes faster than existing analysis methods.
The dynamic responses of nuclear power plant structures subjected to earthquake loads should be carefully investigated for safety. Because nuclear power plant structures are usually built of reinforced concrete (RC), the aging deterioration of RC has a considerable effect on their structural behavior. Aging deterioration of RC nuclear power plant structures should therefore be considered for accurate prediction of their seismic responses. In this study, a machine learning model for seismic response prediction of a nuclear power plant structure was developed with aging deterioration taken into account. The OPR-1000 was selected as an example structure for numerical simulation. The OPR-1000 was originally designated as the Korean Standard Nuclear Power Plant (KSNP) and was re-designated as the OPR-1000 in 2005 for foreign sales. A total of 500 artificial ground motions were generated based on the site characteristics of Korea. Elastic modulus, damping ratio, Poisson's ratio, and density were selected to represent material property variation due to aging deterioration. Six machine learning algorithms, namely Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Artificial Neural Networks (ANN), and eXtreme Gradient Boosting (XGBoost), were used to construct the seismic response prediction model. Thirteen intensity measures and four material properties were used as input parameters of the training database. Performance was evaluated using the root mean square error, mean square error, mean absolute error, and coefficient of determination. Hyperparameters were optimized through k-fold cross-validation and grid search. The analysis results show that neural networks provide good prediction performance when aging deterioration is considered.
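The hyperparameter optimization by k-fold cross-validation and grid search can be sketched in a few lines. The example is a deliberate simplification: a one-dimensional k-nearest-neighbor regressor stands in for the six algorithms, the only hyperparameter is the number of neighbors, and the data are synthetic. None of the names or values below come from the study, which used 17 input parameters.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def knn_predict(train_x, train_y, x, n_neighbors):
    """Predict y at x as the mean target of the n_neighbors closest training points."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    nearest = order[:n_neighbors]
    return sum(train_y[i] for i in nearest) / len(nearest)


def cv_mse(xs, ys, n_neighbors, k=5, seed=0):
    """Mean squared error averaged over k held-out folds (requires len(xs) >= k)."""
    fold_scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        errors = [
            (knn_predict(train_x, train_y, xs[i], n_neighbors) - ys[i]) ** 2
            for i in held_out
        ]
        fold_scores.append(sum(errors) / len(errors))
    return sum(fold_scores) / k


def grid_search(xs, ys, grid, k=5):
    """Score every candidate in grid by cross-validation and return the best one."""
    scores = {n: cv_mse(xs, ys, n, k) for n in grid}
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Synthetic linear data, for illustration only.
    xs = [i / 10 for i in range(50)]
    ys = [2 * x for x in xs]
    best, scores = grid_search(xs, ys, grid=[1, 3, 25])
    print(best, scores)
```

With several hyperparameters, the grid would be the Cartesian product of the candidate values, and each combination would be scored in the same way.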
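This study examined whether personality predicts psychopathology, using supervised machine learning methods. To this end, an online survey was conducted with a total of 521 adults nationwide using the second edition of the Korean version of the Singer-Loomis psychological type inventory (K-SLTDI) and the KSCL-95. For the predictive analysis, cluster analysis, classification analysis, and regression-based decoding were performed. As a result, four clusters reflecting the severity of psychopathology were identified. In addition, the K-SLTDI could predict the clusters reflecting hypothesis-based and data-driven severity of psychopathology, and classification was accurate for the full KSCL-95, the three higher-order categories, and the validity scales. The regression-based decoding showed that the SLTDI significantly predicted the clinical level when the full test data were used, and that it individually predicted, at a significant level, positive distortion, depression, anxiety, obsessive-compulsive symptoms, PTSD, psychosis, stress vulnerability, interpersonal sensitivity, and low regulation among the 22 subcategories of the KSCL-95. These findings suggest that personality tests can serve as a screening tool for the severity of psychopathology and can be used to implement prevention and early intervention strategies.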
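The risk of landslides is gradually increasing with the recent rise in abnormal weather events. Because landslides can cause enormous casualties and property damage, assessing this risk in advance is very important. Recent technological advances have made it possible to obtain more accurate and detailed ground displacement and precipitation data using active remote sensing methods. However, studies that use such data to develop landslide prediction models are scarce. This study therefore presents a machine learning model that predicts landslide susceptibility using ground displacement data from interferometric synthetic aperture radar (InSAR) and precipitation information from the hybrid surface rainfall (HSR) estimation technique. Furthermore, using SHAP, an interpretable machine learning method that can overcome the black-box problem of machine learning, we systematically evaluated the importance of the variables that influence landslide susceptibility. A case study of Uljin-gun, Gyeongsangbuk-do, showed that XGBoost gave the best predictive performance and that distance from roads, surface elevation, daily maximum rainfall intensity, 48-hour antecedent cumulative rainfall, hillslope inclination, topographic wetness index, distance from faults, slope gradient, ground displacement, and distance from streams were the main variables affecting landslide prediction. In particular, higher rainfall intensity and larger absolute ground displacement, both obtained through active remote sensing, were associated with a higher probability of landslide occurrence. This study empirically demonstrates the applicability of active remote sensing data to landslide susceptibility research, and by deriving spatiotemporally varying landslide susceptibility from these data, it is expected to be useful for future landslide susceptibility monitoring.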
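This study uses three variables, the distance between pigs (PD), the relative humidity in the pig house (RRH), and the carbon dioxide concentration in the pig house (RCO2), to construct four datasets, and applies them to three machine learning (ML) models, multiple linear regression (MLR), support vector regression (SVR), and random forest regression (RFR), to predict the temperature in the pig house (RT). The experiment was conducted from October 5 to November 19, 2022. A Hik-vision 2D camera was used to record video inside the pig house. The ArcMap program was then used to calculate the PD of the pigs in images extracted from the video. Livestock Environment Management System (LEMS) sensors were used to measure RT, RRH, and RCO2. Correlation analysis among the variables showed a strong positive correlation between RT and PD (r > 0.75). Among the four datasets, the ML models using dataset 3 showed high accuracy, and among the three regression models, the RFR model performed best.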
Existing reinforced concrete (RC) building frames constructed before seismic design was applied have seismically deficient structural details, and buildings with such details exhibit brittle behavior, failing early because of low shear performance. Various reinforcement systems, such as fiber-reinforced polymer (FRP) jacketing systems, are being studied to retrofit these seismically deficient RC frames. Because of the step-by-step modeling and interpretation process, existing seismic performance assessment and reinforcement design of buildings consume an enormous amount of labor and time. To overcome these shortcomings, various machine learning (ML) models were developed using input and output datasets for seismic loads and reinforcement details, generated with the finite element (FE) model developed in previous studies. To assess the seismic performance prediction models developed in this study, the mean squared error (MSE), coefficient of determination (R2), and residuals of each model were compared. Overall, the applied ML models were found to predict the seismic performance of buildings rapidly and effectively under changes in load and reinforcement details, without overfitting. In addition, the best-fit model for each seismic performance class was selected by analyzing the per-class performance of the ML models.
New motor development requires high-speed load testing using dynamo equipment to calculate the efficiency of the motor. Abnormal noise and vibration may occur in test equipment rotating at high speed because of misalignment of the connecting shaft or looseness of the fixation, which may lead to safety accidents. In this study, three single-axis vibration sensors, one each for the X, Y, and Z axes, were attached to the surface of the test motor to measure its vibration. Analog data collected from these sensors were used in classification models for anomaly detection. Since the classification accuracy was only around 93%, commonly used hyperparameter optimization techniques such as grid search, random search, and Bayesian optimization were applied to increase accuracy. In addition, the response surface method based on design of experiments was also used for hyperparameter optimization. However, these methods were found to be limited in how far they could improve accuracy, because raw samples of an analog signal do not reflect the patterns hidden in the signal. Therefore, to extract pattern information from the sampled data, we computed descriptive statistics of the analog data, such as mean, variance, skewness, kurtosis, and percentiles, and applied them to the classification models. Classification models using these descriptive statistics showed a marked performance improvement. The developed model can be used as a monitoring system that detects abnormal conditions during motor testing.
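The feature extraction step can be sketched as follows: the sampled signal is cut into fixed-length windows and each window is summarized by the descriptive statistics named above. The window length, the choice of percentiles (25th, 50th, 75th), the use of population moments, and the synthetic signal are assumptions made for illustration. The abstract does not report these details.

```python
import math


def percentile(sorted_values, q):
    """Percentile q in [0, 100] of pre-sorted values, with linear interpolation."""
    position = (len(sorted_values) - 1) * q / 100.0
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return sorted_values[lower] * (1.0 - fraction) + sorted_values[upper] * fraction


def window_features(samples):
    """Mean, variance, skewness, excess kurtosis, and percentiles of one window."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n  # population variance
    std = math.sqrt(variance)
    if std == 0.0:
        skewness = kurtosis = 0.0  # constant window: shape statistics set to 0 by convention
    else:
        skewness = sum(((x - mean) / std) ** 3 for x in samples) / n
        kurtosis = sum(((x - mean) / std) ** 4 for x in samples) / n - 3.0
    ordered = sorted(samples)
    return {
        "mean": mean,
        "variance": variance,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "p25": percentile(ordered, 25),
        "p50": percentile(ordered, 50),
        "p75": percentile(ordered, 75),
    }


def featurize(signal, window_size):
    """Split a signal into non-overlapping windows and compute features for each."""
    return [
        window_features(signal[start:start + window_size])
        for start in range(0, len(signal) - window_size + 1, window_size)
    ]


if __name__ == "__main__":
    # Synthetic single-axis vibration samples, for illustration only.
    signal = [math.sin(0.3 * i) for i in range(200)]
    rows = featurize(signal, window_size=50)
    print(len(rows), rows[0])
```

One such feature row per axis and per window, concatenated across the X, Y, and Z sensors, would form the input of the classification model in place of the raw samples.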