This study investigates using Conditional Tabular Generative Adversarial Networks (CT-GAN) to generate synthetic data for turnover prediction in large employment datasets. The effectiveness of CT-GAN is compared with Adaptive Synthetic Sampling (ADASYN), Synthetic Minority Over-sampling Technique (SMOTE), and Random Oversampling (ROS) using Logistic Regression (LR), Linear Discriminant Analysis (LDA), Random Forest (RF), and Extreme Learning Machines (ELM), evaluated with AUC and F1-scores. Results show that GAN-based techniques, especially CT-GAN, outperform traditional methods in addressing data imbalance, highlighting the need for advanced oversampling methods to improve classification accuracy in imbalanced datasets.
Abstract Handling imbalanced datasets in binary classification, especially in employment big data, is challenging. Traditional methods like oversampling and undersampling have limitations. This paper integrates TabNet and Generative Adversarial Networks (GANs) to address class imbalance. The generator creates synthetic samples for the minority class, and the discriminator, using TabNet, ensures authenticity. Evaluations on benchmark datasets show significant improvements in accuracy, precision, recall, and F1-score for the minority class, outperforming traditional methods. This integration offers a robust solution for imbalanced datasets in employment big data, leading to fairer and more effective predictive models.