This study developed a model to predict employee turnover intention using data from the 2022 Korean Labor & Income Panel Study (KLIPS) with 2,471 participants. CopulaGAN and Isolation Forests were employed for data augmentation and variable importance. A logistic regression model using the augmented data achieved an accuracy of 0.80, precision of 0.60, recall of 0.72, and an F1-score of 0.65. Key variables included Job Satisfaction, Wage Satisfaction, Work Hours, Job Stability, and Job-Related Training. The study highlights the potential of these techniques for enhancing turnover prediction and aiding proactive HR strategies.
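As a quick consistency check on the metrics reported in this abstract, the F1-score can be recomputed as the harmonic mean of precision and recall. The sketch below uses only the two figures given above (0.60 and 0.72); the function name `f1_score` is chosen here for illustration and is not taken from the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract above.
recomputed_f1 = f1_score(0.60, 0.72)
print(round(recomputed_f1, 2))
```

The recomputed value rounds to the reported F1-score of 0.65, so the three figures are mutually consistent.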
This study examines career trajectories among women with career breaks, using data from the 2019 National Survey of Women on Career Breaks (n=1,138). The data underwent preprocessing, including outlier detection, feature scaling, and class imbalance correction with SMOTEENN. Three machine learning models were evaluated, with the Random Forest model achieving the best performance. Key predictors included flexible leave policies, social insurance, remote work options, and job security. The findings highlight the importance of supportive organizational policies in retaining female employees. Future research should explore longitudinal impacts and additional variables like organizational culture.
This study investigates using Conditional Tabular Generative Adversarial Networks (CT-GAN) to generate synthetic data for turnover prediction in large employment datasets. The effectiveness of CT-GAN is compared with Adaptive Synthetic Sampling (ADASYN), Synthetic Minority Over-sampling Technique (SMOTE), and Random Oversampling (ROS) using Logistic Regression (LR), Linear Discriminant Analysis (LDA), Random Forest (RF), and Extreme Learning Machines (ELM), evaluated with AUC and F1-scores. Results show that GAN-based techniques, especially CT-GAN, outperform traditional methods in addressing data imbalance, highlighting the need for advanced oversampling methods to improve classification accuracy in imbalanced datasets.
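For readers unfamiliar with the baseline methods named in this abstract, the following is a minimal sketch of the interpolation idea behind SMOTE: a synthetic minority sample is placed on the line segment between an existing minority sample and one of its nearest minority neighbours. It is an illustration under simplifying assumptions (toy two-feature data, Euclidean distance, hypothetical function name `smote_sketch`), not the implementation used in the study.

```python
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Toy minority-class samples (illustrative values only).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_sketch(minority, n_new=6)
print(len(new_points))
```

Because every synthetic point lies between two existing minority samples, this kind of oversampling cannot produce values outside the observed minority range, which is one reason generative approaches are studied as an alternative.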
This study integrates TabTransformer and CTGAN for predicting job satisfaction among South Korean college graduates. TabTransformer handles complex tabular data relationships with self-attention, while CTGAN generates high-quality synthetic samples. The combined approach achieves an accuracy of 0.85, precision of 0.83, recall of 0.82, F1-score of 0.82, and an AUC of 0.88. Cross-validation confirms the model's robustness and generalizability with a mean accuracy of 0.85 and a standard deviation of 0.008. The integration of TabTransformer and CTGAN enhances predictive accuracy and model generalizability, providing valuable insights for employment policy and research.
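The cross-validation summary in this abstract (a mean accuracy and its standard deviation across folds) can be illustrated with a small sketch. Everything below is an assumption made for illustration: the toy labels, the five contiguous folds, the majority-class stand-in classifier, and the use of the sample standard deviation. The abstract does not state the number of folds or which standard deviation was reported.

```python
import statistics


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def majority_class_accuracy(labels, train_idx, test_idx):
    """Stand-in model: predict the most frequent training label."""
    train_labels = [labels[i] for i in train_idx]
    majority = max(set(train_labels), key=train_labels.count)
    return sum(labels[i] == majority for i in test_idx) / len(test_idx)


# Toy binary labels (illustrative only, not data from the study).
labels = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
fold_scores = [
    majority_class_accuracy(labels, train_idx, test_idx)
    for train_idx, test_idx in k_fold_indices(len(labels), k=5)
]
mean_accuracy = statistics.mean(fold_scores)
std_accuracy = statistics.stdev(fold_scores)  # sample standard deviation
print(fold_scores, mean_accuracy, std_accuracy)
```

A small standard deviation across folds, such as the 0.008 reported above, indicates that accuracy varies little from one held-out fold to the next.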
This study explores the use of a Deep Autoencoder model to predict depression among plant and machine operators, utilizing data from the Korean National Health and Nutrition Examination Survey (KNHANES, n=3,852). The Deep Autoencoder model outperformed the Logistic Regression, Naive Bayes, XGBoost, and LightGBM models, achieving an accuracy of 86.5%. Key factors influencing depression included work stress, exposure to hazardous substances, and ergonomic conditions. The findings highlight the potential of the Deep Autoencoder model as a robust tool for early identification and intervention in workplace mental health.
This study examines factors influencing occupational injuries among plant and machine operators using the Semi-supervised MarginBoost algorithm. Data from the 2007-2009 Korean National Health and Nutrition Examination Survey (KNHANES) were analyzed, covering 4,062 employed participants. The MarginBoost model achieved 84.3% accuracy, outperforming other models. Key factors identified included exposure to hazardous substances, ergonomic conditions, and psychosocial stress. The findings emphasize the need for targeted interventions to enhance workplace safety and offer a robust predictive tool for the effective management of occupational health.
This study aims to predict return-to-work outcomes for workers injured in industrial accidents using a TabNet-RUSBoost hybrid model. The study analyzed data from 1,383 workers who had completed recuperation. Key predictors identified include length of recuperation, disability grade, occupation activity, self-efficacy, and socioeconomic status. The model effectively addresses class imbalance and demonstrates superior predictive performance. These findings underscore the importance of a holistic approach, incorporating both medical and psychosocial factors.
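The class-imbalance handling mentioned in this abstract rests on random undersampling, the "RUS" component of RUSBoost, which shrinks the majority class before each boosting round. The sketch below shows only that resampling step, on toy data with hypothetical label names (`returned`, `not_returned`); the boosting loop and the TabNet component are omitted, and nothing here reproduces the study's actual pipeline.

```python
import random
from collections import Counter


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so every class matches the smallest class size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    kept = []
    for cls in counts:
        class_idx = [i for i, y in enumerate(labels) if y == cls]
        kept.extend(rng.sample(class_idx, target))
    kept.sort()  # preserve the original ordering of the retained samples
    return [samples[i] for i in kept], [labels[i] for i in kept]


# Toy imbalanced data: 8 majority-class samples, 2 minority-class samples.
samples = list(range(10))
labels = ["returned"] * 8 + ["not_returned"] * 2
balanced_samples, balanced_labels = random_undersample(samples, labels)
print(Counter(balanced_labels))
```

In RUSBoost a fresh undersample of this kind is typically drawn at every boosting iteration, so different majority-class samples are seen across rounds even though each individual round trains on a balanced subset.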
Handling imbalanced datasets in binary classification, especially in employment big data, is challenging. Traditional methods like oversampling and undersampling have limitations. This paper integrates TabNet and Generative Adversarial Networks (GANs) to address class imbalance. The generator creates synthetic samples for the minority class, and the discriminator, using TabNet, ensures authenticity. Evaluations on benchmark datasets show significant improvements in accuracy, precision, recall, and F1-score for the minority class, outperforming traditional methods. This integration offers a robust solution for imbalanced datasets in employment big data, leading to fairer and more effective predictive models.