This study investigates the use of Conditional Tabular Generative Adversarial Networks (CT-GAN) to generate synthetic data for turnover prediction in large employment datasets. The effectiveness of CT-GAN is compared with that of Adaptive Synthetic Sampling (ADASYN), the Synthetic Minority Over-sampling Technique (SMOTE), and Random Oversampling (ROS) using four classifiers, Logistic Regression (LR), Linear Discriminant Analysis (LDA), Random Forest (RF), and Extreme Learning Machine (ELM), evaluated with AUC and F1-scores. The results show that GAN-based techniques, especially CT-GAN, outperform traditional methods in addressing data imbalance, highlighting the need for advanced oversampling methods to improve classification accuracy on imbalanced datasets.
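A minimal sketch of the oversampling comparison described above, assuming a pandas DataFrame with a binary turnover label; the file name, column names, and CT-GAN settings are illustrative assumptions, and the `ctgan` package stands in for the paper's CT-GAN model:

```python
# Sketch only: "employment.csv" and the "turnover" column are hypothetical.
import pandas as pd
from ctgan import CTGAN                      # pip install ctgan
from imblearn.over_sampling import ADASYN, RandomOverSampler, SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("employment.csv")
X, y = df.drop(columns="turnover"), df["turnover"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Classical oversamplers rebalance the training split before fitting.
for name, sampler in [("ROS", RandomOverSampler(random_state=0)),
                      ("SMOTE", SMOTE(random_state=0)),
                      ("ADASYN", ADASYN(random_state=0))]:
    X_res, y_res = sampler.fit_resample(X_tr, y_tr)
    clf = RandomForestClassifier(random_state=0).fit(X_res, y_res)
    print(name,
          roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]),
          f1_score(y_te, clf.predict(X_te)))

# CT-GAN: fit a generative model on minority-class rows, then sample
# synthetic rows to balance the training set.
minority = X_tr.assign(turnover=y_tr).query("turnover == 1")
gan = CTGAN(epochs=300)
gan.fit(minority, discrete_columns=["turnover"])
synthetic = gan.sample(int((y_tr == 0).sum() - (y_tr == 1).sum()))
```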
PURPOSES : The purpose of this study was to evaluate the general performance of asphalt pavements, determine the timing of preventive maintenance, and determine the optimal timing for applying preventive maintenance methods by analyzing pavement management system (PMS) data. METHODS : Using PMS data on asphalt pavement performance on highways, we derived the major damage factors and evaluated them according to service period and traffic level. Among the factors evaluated, we identified those that could be improved by preventive maintenance, calculated their annual variation, and derived the timing for applying the preventive maintenance method through correlation analysis. RESULTS : Among the highway PMS data factors, crack variation was found to affect preventive maintenance, and it increased rapidly after five years in service. Traffic analysis showed that crack variation increased rapidly in the fifth year when AADT exceeded 20,000, in the sixth year when AADT exceeded 10,000, and in the seventh year when AADT was under 10,000. Analysis of crack variation according to pavement type showed that it increased rapidly in the overlay section compared with the general asphalt pavement (AP) section. CONCLUSIONS : Crack variation is the performance factor expected to benefit from preventive maintenance, and the PMS data showed that the initial application time of the preventive maintenance method varied by one year depending on traffic volume.
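A hypothetical sketch of the PMS analysis above: annual crack variation grouped by service year and AADT band. The column names (section_id, aadt, service_year, crack_rate) are assumptions, not the actual highway PMS schema:

```python
# Sketch only: "pms_sections.csv" and its columns are hypothetical.
import pandas as pd

pms = pd.read_csv("pms_sections.csv")
pms["aadt_band"] = pd.cut(pms["aadt"], [0, 10_000, 20_000, float("inf")],
                          labels=["<10k", "10k-20k", ">20k"])
# Year-over-year change in crack rate per pavement section.
pms = pms.sort_values(["section_id", "service_year"])
pms["crack_variation"] = pms.groupby("section_id")["crack_rate"].diff()
# Mean annual crack variation by traffic level and years in service.
print(pms.groupby(["aadt_band", "service_year"], observed=True)
         ["crack_variation"].mean())
```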
The metal bush assembly process inserts and press-fits a metal bush that reduces noise and provides stable compression in the rotating section. In this process, head-diameter defects and placement defects of the metal bush occur due to bush omission, non-pressing, and poor press-fitting. Among these causes, this study aims to prevent defects due to metal bush omission by using signals from sensors attached to the facility. In particular, metal bush omission is predicted through various data mining techniques using the left load cell value, right load cell value, current, and voltage as independent variables. Because metal bush omission defects are rare, defect data are difficult to obtain, resulting in data imbalance. Data imbalance refers to a large difference in the number of observations belonging to each class, which can be a problem when performing classification. To solve this problem, oversampling and composite sampling techniques were applied in this study. In addition, simulated annealing (SA) was applied to optimize the sampling-related parameters and the hyper-parameters of the data mining techniques used for omission prediction. The metal bush omission was predicted using actual data from manufacturing company M, and the classification performance was examined. All applied techniques performed well; in particular, the proposed methods combining Random Forest with SA and MLP with SA showed the best results.
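A minimal sketch of simulated annealing over combined sampling and model hyper-parameters, in the spirit of the approach above; the search space, cooling schedule, and stand-in dataset are illustrative assumptions, not the study's configuration:

```python
# Sketch only: make_classification stands in for the company's sensor data.
import math, random
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def score(params, X, y):
    # For brevity, CV runs on the resampled set; a pipeline would avoid leakage.
    X_res, y_res = SMOTE(sampling_strategy=params["ratio"]).fit_resample(X, y)
    clf = RandomForestClassifier(n_estimators=params["trees"],
                                 max_depth=params["depth"], random_state=0)
    return cross_val_score(clf, X_res, y_res, scoring="f1", cv=3).mean()

def neighbor(p):
    q = dict(p)
    key = random.choice(list(q))
    if key == "ratio": q["ratio"] = min(1.0, max(0.3, q["ratio"] + random.uniform(-0.1, 0.1)))
    if key == "trees": q["trees"] = max(50, q["trees"] + random.choice([-50, 50]))
    if key == "depth": q["depth"] = max(2, q["depth"] + random.choice([-1, 1]))
    return q

def anneal(X, y, t=1.0, cooling=0.9, steps=30):
    cur = {"ratio": 0.5, "trees": 200, "depth": 6}
    cur_f = score(cur, X, y)
    best, best_f = cur, cur_f
    for _ in range(steps):
        cand = neighbor(cur)
        cand_f = score(cand, X, y)
        # Accept better moves always; worse moves with Boltzmann probability.
        if cand_f > cur_f or random.random() < math.exp((cand_f - cur_f) / t):
            cur, cur_f = cand, cand_f
            if cur_f > best_f: best, best_f = cur, cur_f
        t *= cooling
    return best, best_f

X, y = make_classification(n_samples=400, weights=[0.9], random_state=0)
print(anneal(X, y))
```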
Background: Although efforts have been made to differentiate fall risk in older adults using wearable devices and clinical methodologies, these technologies are still in their infancy. We applied a decision tree (DT) algorithm to inertial measurement unit (IMU) sensor data and clinical measurements to generate high-performance classification models of fall risk in older adults.
Objectives: This study aims to develop a classification model of fall risk using IMU data and clinical measurements in older adults.
Methods: Twenty-six older adults were assessed and categorized into high and low fall risk groups. IMU sensor data were obtained from each group during walking, and features were extracted for a DT algorithm using the Gini index (DT1) and the entropy index (DT2), which generated classification models to differentiate the high and low fall risk groups. Model performance was compared and reported in terms of accuracy, sensitivity, and specificity.
Results: Accuracy, sensitivity, and specificity were 77.8%, 80.0%, and 66.7%, respectively, for DT1, and 72.2%, 91.7%, and 33.3%, respectively, for DT2.
Conclusion: Our results suggest that fall risk classification using IMU sensor data obtained during gait has the potential to be developed for practical use. Future research and development should explore different machine learning techniques with larger datasets.
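A minimal sketch of the two decision-tree models described above; the synthetic feature matrix stands in for the study's gait features and clinical measurements, which are not reproduced here:

```python
# Sketch only: make_classification stands in for the 26 participants' features.
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=26, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for name, criterion in [("DT1", "gini"), ("DT2", "entropy")]:
    tree = DecisionTreeClassifier(criterion=criterion, random_state=0)
    y_hat = tree.fit(X_tr, y_tr).predict(X_te)
    tn, fp, fn, tp = confusion_matrix(y_te, y_hat).ravel()
    print(name,
          accuracy_score(y_te, y_hat),   # overall accuracy
          tp / (tp + fn),                # sensitivity (true-positive rate)
          tn / (tn + fp))                # specificity (true-negative rate)
```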
In today's transition to a digital economy, data is regarded as a key input that determines a firm's competitiveness. Data-driven business models account for a large share of the digital economy, as illustrated by the success of large online platforms such as Google, Apple, Facebook, and Amazon. The terms "data economy" and "data-driven economy" capture this characteristic of the digital economy well. However, alongside the positive aspects of data utilization, negative effects such as the construction of entry barriers using data are also being discussed. In the past, platform operators' large-scale collection and commercial use of user data was treated as a matter of data protection law rather than competition law, but recently the debate over data concentration has moved beyond the privacy-policy agenda and attracted the attention of competition authorities worldwide. Notably, on February 6, 2019, the German Federal Cartel Office decided that Facebook's extensive collection of user data, including personal information, and its use for targeted advertising constituted an abuse of market dominance, and on June 23, 2020, the German Federal Court of Justice likewise held that Facebook's data collection and use constituted exploitative abuse. The Korea Fair Trade Commission has also worked to reflect the particularities of the data-driven economy in its enforcement, for example by enacting and amending review guidelines; in the merger case between Baedal Minjok and Yogiyo decided on December 28, 2020, it ordered the divestiture of the Yogiyo stake to resolve competition concerns arising from data concentration and prohibited the sharing of Yogiyo's data until the divestiture was completed. As the digital economy accelerates, data concentration and monopolization, once considered outside the scope of competition law, are now being examined from a competition-law perspective. It is therefore necessary to monitor market developments closely and to respond to data-related competition concerns with reference to foreign competition authorities' antitrust enforcement trends regarding data monopolies.
PURPOSES : This study introduces the use of instantaneous speeds collected by in-vehicle sensing devices to estimate space mean speed and time mean speed.
METHODS : Using probe vehicles' instantaneous speed data and GPS locations at each designated time interval, analysts can directly calculate the link-based space mean speed and time mean speed for a given time-space domain. This study proves the equations mathematically and compares the time mean speed obtained from Wardrop's equation with the section average speed by trips.
RESULTS : This study introduces the concept of link-based average speed using both the time mean and space mean speed definitions. First, it is shown that an unbiased space mean speed for a roadway section can be calculated from high-resolution vehicle trajectory data without continuously tracking each vehicle. Second, this study proves that the average section speed by occupancy is identical to Wardrop's true space mean speed formulation. In addition, the average section speed by trip was nearly identical to the time mean speed.
CONCLUSIONS : Each section of a signalized intersection corridor yielded a different speed distribution, whose parameters varied as measurements were taken closer to the intersection proper. By examining the space mean speed together with the speed distribution, the intersection impact area can be defined more accurately.
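A worked sketch of the two mean-speed definitions discussed above, using hypothetical spot speeds: time mean speed is the arithmetic mean of speeds observed at a point, space mean speed is the harmonic mean, and Wardrop's relation links them through the variance of speeds about the space mean:

```python
# Sketch only: the spot speeds are hypothetical, not the study's probe data.
import numpy as np

v = np.array([42.0, 55.0, 61.0, 48.0, 70.0])   # km/h, spot speeds at a point
v_t = v.mean()                                  # time mean speed
v_s = len(v) / np.sum(1.0 / v)                  # space mean speed (harmonic mean)
sigma_s2 = np.mean((v - v_s) ** 2)              # variance about the space mean
print(v_t, v_s, v_s + sigma_s2 / v_s)           # Wardrop: v_t ~= v_s + sigma^2 / v_s
```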
PURPOSES : Despite the availability of larger traffic datasets and more advanced data collection methods, the problem of missing data is yet to be solved. Imputing missing data to ensure data quality and reliable statistics has always been challenging. Missing data are imputed via several existing methods, such as autoregressive integrated moving average, exponential smoothing, and interpolation; however, these methods are complicated and result in significant errors.
METHODS : A deep-learning method was applied in this study to impute traffic volume data from the South Korean national highway network. The traffic data were used to train a long short-term memory (LSTM) network, a deep-learning method well suited to time series analysis.
RESULTS : Three cases were proposed to estimate the traffic volume. In the first case, which represented general conditions, the mean absolute percentage error (MAPE) was 12.7%. The second case, which was estimated from the opposite traffic flow, exhibited a MAPE of 17%-18%. The third case, which was estimated using adjacent-section data, had a MAPE of 18.2%.
CONCLUSIONS : Deep learning may be a suitable alternative imputation method when sites and data are limited; however, its applicability depends on the specific situation. Furthermore, deep-learning models can be improved using ensemble methods, batch-size tuning, or model-structure optimization.
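A minimal sketch of LSTM-based time series imputation in the spirit of the method above; the synthetic series, window length, layer size, and epoch count are illustrative assumptions, not the study's configuration:

```python
# Sketch only: a synthetic periodic series stands in for hourly traffic volumes.
import numpy as np
import tensorflow as tf

series = np.sin(np.linspace(0, 60, 2000)) + 1.5
window = 24                                   # look-back horizon
X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(window, 1)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X[..., None], y, epochs=5, verbose=0)

# Fill "missing" points with one-step-ahead predictions and score with MAPE.
pred = model.predict(X[..., None], verbose=0).ravel()
mape = np.mean(np.abs((y - pred) / y)) * 100
print(f"MAPE: {mape:.1f}%")
```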
The fourth industrial revolution encourages the manufacturing industry to pursue a new paradigm that meets customers' diverse demands by managing the production process efficiently. However, it is not easy to efficiently manage the variety of tasks across all processes, including materials management, production management, process control, sales management, and inventory management. In particular, setting up an efficient production schedule and maintaining appropriate inventory are crucial for responding to customers' needs. This paper deals with the optimized inventory policy of a steel company that produces granule products under supply contracts with three targeted on-time delivery rates. For efficient inventory management, products are classified into three groups, A, B, and C, and three differentiated production cycles and safety factors are assumed for the groups' targeted on-time delivery rates. To derive the optimized inventory policy, we experimented with eight cases combining safety-stock and data-analysis methods, evaluated in terms of key performance metrics such as mean inventory level and sold-out rate. Through simulation experiments based on real data, we find that the proposed policy reduces the inventory level by about 9% and increases the surplus production capacity rate, which is usually used for producing Group C products, from 43.4% to 46.3% compared with the existing inventory policy.
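A minimal sketch of differentiated safety stock by product group, assuming normally distributed demand during lead time; the service levels, demand statistics, and lead times are hypothetical, not the company's actual figures:

```python
# Sketch only: all group parameters below are illustrative assumptions.
from math import sqrt
from scipy.stats import norm

groups = {  # target on-time rate, daily demand std (units), lead time (days)
    "A": {"service": 0.98, "sigma_d": 40.0, "lead": 5},
    "B": {"service": 0.95, "sigma_d": 25.0, "lead": 7},
    "C": {"service": 0.90, "sigma_d": 10.0, "lead": 10},
}
for g, p in groups.items():
    z = norm.ppf(p["service"])                 # safety factor for target rate
    ss = z * p["sigma_d"] * sqrt(p["lead"])    # safety stock = z * sigma_d * sqrt(L)
    print(f"group {g}: z={z:.2f}, safety stock={ss:.0f} units")
```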
After decades of vigorous development, data mining technology has achieved fruitful theoretical and application results. As a highly applicable subject, data mining has penetrated various fields of the national economy and attracted great attention from academia and industry. A large amount of chart data is stored in electronic chart databases, and its applications are extensive, providing a valuable decision basis for managers in many industries. Establishing a complete data management mechanism based on data mining technology is therefore of great significance. Traditional data-analogy extraction techniques suffer from poor data-association indexing and weak data-association capability, so the extracted data diverge from the target data. Therefore, the application of data mining technology to electronic chart data management is studied. The approach uses rough sets and a similarity function to obtain the basic information of electronic chart data management and to mine association rules from the chart data; it then builds a rule base from a comprehensive evaluation data system, defines evaluation indices for electronic chart data management, and evaluates the similarity of the mining results. Experimental tests show that, compared with the traditional data-analogy extraction technique, the results obtained by the data mining technique have higher similarity to the target data and meet the acquisition requirements of electronic chart data management. This technique is therefore better suited to electronic chart data management applications.
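The paper's exact similarity function is not specified; as an illustrative stand-in, this sketch scores mined records against a target record using cosine similarity over numeric chart attributes:

```python
# Sketch only: the attribute vectors and the choice of cosine similarity
# are hypothetical, not the paper's actual similarity function.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

target = np.array([0.8, 0.2, 0.5])              # hypothetical target attributes
mined = [np.array([0.7, 0.3, 0.5]),             # candidate mined records
         np.array([0.1, 0.9, 0.2])]
print([cosine_similarity(m, target) for m in mined])  # higher = closer to target
```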
Big data analysis methods are useful tools for sorting valuable data and products. Achyranthes Radix root extract (AR) is a well-known herbal medicine in East Asia due to its anti-osteoarthritis, pro-circulatory, and anti-osteoporosis effects. In this study, we investigated the liver- and kidney-protective effects of AR by applying big data analysis to traditional medicine. CDDP (cis-diamminedichloridoplatinum) is an effective cancer cell anti-proliferative agent used in the treatment of diverse types of tumors; however, its clinical use is limited by liver and kidney toxicity. The current study was designed to assess the potential protective effects of AR against CDDP-induced hepato-renal toxicity. For this purpose, male Sprague-Dawley (SD) rats were assigned to four groups, each consisting of four animals. Intravenous injection or oral administration of either saline or AR was performed daily for 14 days, whereas CDDP was injected intraperitoneally on day 3 following AR treatment. Serum biochemistry revealed that CDDP induced clear hepatic and renal damage, while the AR treatment groups showed less damage relative to controls. Next, we examined the pharmacokinetics of AR using 20-hydroxyecdysone (20-HE), the most abundant component of the AR extract. After intravenous administration of AR, the plasma concentration of 20-HE declined rapidly with a terminal half-life (t1/2) of 0.99±0.47 h. The area under the plasma concentration vs. time curve was 24.96±3.5 h·ng/mL. The present study provides valuable tools for further verification studies of the classical herbal literature and its scientific relevance.
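A worked sketch of the two pharmacokinetic quantities reported above, using hypothetical 20-HE concentration-time data rather than the study's raw values: the terminal half-life from log-linear regression of the terminal phase, and the AUC by the linear trapezoidal rule:

```python
# Sketch only: sampling times and concentrations are hypothetical.
import numpy as np

t = np.array([0.25, 0.5, 1.0, 2.0, 4.0])   # h after IV dose
c = np.array([18.0, 12.0, 6.5, 2.0, 0.4])  # ng/mL plasma 20-HE

slope, _ = np.polyfit(t[-3:], np.log(c[-3:]), 1)  # terminal log-linear phase
t_half = np.log(2) / -slope                       # t1/2 = ln(2) / k_e
auc = np.sum((c[1:] + c[:-1]) / 2 * np.diff(t))   # linear trapezoidal rule
print(f"t1/2 = {t_half:.2f} h, AUC(0-4 h) = {auc:.2f} h*ng/mL")
```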
Lately, there have been tremendous shifts in the business technology landscape. Advances in cloud technology and mobile applications have enabled businesses and IT users to interact in entirely new ways. One of the most rapidly growing technologies in this sphere is business intelligence (BI), along with associated concepts such as big data and data mining. BI refers to the collection of systems and products implemented in various business practices, not the information derived from those systems and products. Big data, on the other hand, has come to mean different things to different people: when comparing big data with business intelligence, some use the term to refer to the size of the data, while others use it to refer to specific approaches to analytics. As the volume of data grows, businesses will ask more questions to better understand the data analytics process. As a result, analysis teams must keep up with the rising demands these requirements place on the infrastructure supporting analytics applications; doing so is also a good way to ascertain whether a valuable analysis system has been built. Thus, business intelligence and big data technologies can be adapted to a business's changing requirements, provided they prove highly valuable to the business environment.