PURPOSES : For autonomous vehicles, abnormal situations, such as sudden changes in driving speed and sudden stops, may occur when they leave the operational design domain. This may adversely affect the overall traffic flow by affecting not only autonomous vehicles but also the driving environment of manual vehicles. Therefore, to minimize the traffic problems and adverse effects that may occur in mixed traffic situations involving manual and autonomous vehicles, an autonomous vehicle driving support system based on traffic operation optimization is required. The main purpose of this study was to build a big-data classification system by specifying data classification to support the self-driving of Lv.4 autonomous vehicles and matching it with spatio-temporal data. METHODS : The research methodology is explained through a review of related literature, and a traffic management index and big-data classification system were built. After collecting and mapping the historical ITS traffic information data of an actual Living Lab city, the data were classified using the traffic management indexing method. An AI-based model was used to automatically classify traffic management indices for real-time driving support of Lv.4 autonomous vehicles. RESULTS : By evaluating the AI-based model performance using the test data from the Living Lab city, it was confirmed that the data indexing accuracy was more than 98% for the KNN, Random Forest, LightGBM, and CatBoost algorithms, but not for Logistic Regression. The data were severely unbalanced, and it was necessary to classify very-low-probability nonconformities; therefore, precision is also important. All four algorithms showed similarly good performances in terms of accuracy. CONCLUSIONS : This paper presents a method for efficient data classification by developing a traffic management index to easily fuse and analyze traffic data collected from various institutions and big data collected from autonomous vehicles. 
Additionally, EdgeRSU is presented to support the driving of Lv.4 autonomous vehicles in mixed traffic of autonomous and manual vehicles. Finally, a database was established by classifying the data automatically indexed by the AI-based models, so that large volumes of data can be collected and used quickly in real time.
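The point about accuracy versus precision on severely unbalanced data can be illustrated with a minimal sketch. The labels below are hypothetical and are not the study's data; the function names are illustrative only. With 10 rare cases in 1,000 samples, a model that never flags the rare class still reaches 99% accuracy while its precision and recall are zero.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, where 1 marks the rare class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    # Defined as 0.0 when nothing is predicted positive.
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical imbalanced labels: 10 rare cases among 1,000 samples.
y_true = [1] * 10 + [0] * 990
# A degenerate model that always predicts the majority class.
y_majority = [0] * 1000

print(accuracy(y_true, y_majority), precision(y_true, y_majority), recall(y_true, y_majority))
```

This is why a high accuracy figure alone does not show that rare nonconformities are being detected.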
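This study aims to provide basic data to help improve programs and services by identifying how usage behavior related to healing gardens, and to the programs introduced in them, has changed over time. To this end, we applied text mining and, taking the enforcement of the Arboretum and Garden Act (『수목원정원법』) and the periods before and after COVID-19 as reference points, divided the data into three periods (2014, 2019, and 2023) and examined changes in usage behavior across them as a time series. The results showed that healing gardens and their programs were a positive experience for users. The programs initially centered on agro-healing and horticulture, but over time expanded to diverse activities including forest healing and gardening, and the user base also broadened to include diverse groups. In addition, because healing gardens are used as an introductory element in various nature-based healing fields such as horticultural therapy and forest healing, the term was found to be used interchangeably with those fields. Therefore, a clear conceptual definition of healing gardens is needed, along with programs that consider diverse user groups.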
The use of big data needs to be emphasized in policy formulation by public officials in order to improve the transparency, efficiency, and reliability of government policies. ‘Hye-Ahn’, a government-wide big data platform, was built with this goal, and the number of ‘Hye-Ahn’ subscribers grew significantly from 2,000 at the end of 2016 to 100,000 in August 2018. Additionally, the central and local governments are expanding their big-data-related budgets. In this study, we derived the costs and benefits of ‘Hye-Ahn’ and used them to conduct an economic feasibility analysis. As a result, even when only some quantitative benefits are considered and qualitative benefits are excluded, the net present value, the benefit/cost ratio, and the internal rate of return turned out to be 22,662 million won, 2.3213, and 41.8%, respectively. Since these exceed the respective comparison criteria of 0 won, 1.0, and 5.0%, ‘Hye-Ahn’ can be considered economically feasible. As noted earlier, the number of analyses using ‘Hye-Ahn’ is increasing, so the benefits are expected to increase over time. Finally, the socioeconomic value gained when the results of analyses using ‘Hye-Ahn’ are used in policy is expected to be significant.
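The three feasibility indicators named above (net present value, benefit/cost ratio, and internal rate of return) can be sketched as follows. This is a minimal illustration, not the study's calculation: the cash flows and discount rates are hypothetical, the function names are illustrative, and the bisection search for the IRR assumes a conventional cash-flow pattern in which NPV changes sign once over the search interval.

```python
from typing import Sequence


def npv(rate: float, cash_flows: Sequence[float]) -> float:
    """Net present value; cash_flows[t] is the net flow in year t (year 0 undiscounted)."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def benefit_cost_ratio(rate: float, benefits: Sequence[float], costs: Sequence[float]) -> float:
    """Present value of benefits divided by present value of costs."""
    pv_benefits = sum(b / (1.0 + rate) ** t for t, b in enumerate(benefits))
    pv_costs = sum(c / (1.0 + rate) ** t for t, c in enumerate(costs))
    return pv_benefits / pv_costs


def irr(cash_flows: Sequence[float], lo: float = -0.99, hi: float = 10.0, tol: float = 1e-9) -> float:
    """Internal rate of return: the rate at which NPV is zero, found by bisection."""
    f_lo = npv(lo, cash_flows)
    f_hi = npv(hi, cash_flows)
    if f_lo * f_hi > 0:
        raise ValueError("NPV does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        f_mid = npv(mid, cash_flows)
        if f_lo * f_mid <= 0:
            hi = mid  # the root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # the root lies in [mid, hi]
    return (lo + hi) / 2.0


# Hypothetical project: spend 100 in year 0, receive 110 in year 1.
flows = [-100.0, 110.0]
print(npv(0.05, flows))  # positive at a 5% discount rate
print(benefit_cost_ratio(0.05, [0.0, 110.0], [100.0, 0.0]))
print(irr(flows))  # close to 0.10
```

A project is usually judged feasible when NPV is above 0, the benefit/cost ratio is above 1.0, and the IRR is above the chosen discount rate, which matches the comparison criteria quoted in the abstract.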
Abstract: Handling imbalanced datasets in binary classification, especially in employment big data, is challenging. Traditional methods like oversampling and undersampling have limitations. This paper integrates TabNet and Generative Adversarial Networks (GANs) to address class imbalance. The generator creates synthetic samples for the minority class, and the discriminator, using TabNet, ensures authenticity. Evaluations on benchmark datasets show significant improvements in accuracy, precision, recall, and F1-score for the minority class, outperforming traditional methods. This integration offers a robust solution for imbalanced datasets in employment big data, leading to fairer and more effective predictive models.
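The essential problem of big data customer discrimination is that platforms collect excessive user information and misuse algorithms to implement differentiated pricing strategies against consumers. This relates to the infringement of consumers' legitimate rights and interests. The Chinese government has enacted relevant laws and regulations centered on protecting consumers' personal information and prohibiting differentiated pricing by platform operators. However, as of May 2024, there have been no administrative penalty cases concerning big data customer discrimination, nor any lawsuits won by consumers. The current law therefore has many problems in actual practice. This study analyzed the shortcomings of the current laws and regulations on consumer information protection and big data customer discrimination, and proposed policy measures to effectively regulate big data customer discrimination at the legal level.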
The objective of this study is to analyze the indoor air quality of multi-use facilities using an IoT-based monitoring and control system. This study aims to identify effective management strategies and propose policy improvements. This research focused on 50 multi-use facilities, including daycare centers, medical centers, and libraries. Data on PM10, PM2.5, CO2, temperature, and humidity were collected 24 hours a day from June 2019 to April 2020. The analysis included variations in indoor air quality by season, hour, and day of the week (including both weekdays and weekends). Additionally, ways to utilize IoT monitoring systems using big data were proposed. The reliability analysis of the IoT monitoring network showed an accuracy of 81.0% for PM10 and 76.1% for PM2.5. Indoor air quality varied significantly by season, with higher particulate matter levels in winter and spring, and slightly higher levels on weekends compared to weekdays. A positive correlation was found between outdoor and indoor pollutant levels. Indoor air quality management in multi-use facilities requires season-specific strategies, particularly during the winter and spring. Furthermore, enhanced management is necessary during weekends due to higher pollutant levels.
Until now, research on consumers’ purchasing behavior has primarily focused on psychological aspects or depended on consumer surveys. However, there may be a gap between consumers’ self-reported perceptions and their observable actions. In response, this study aimed to investigate consumer purchasing behavior utilizing a big data approach. To this end, this study investigated the purchasing patterns of fashion items, both online and in retail stores, from a data-driven perspective. We also investigated whether individual consumers switched between online websites and retail establishments for making purchases. Data on 516,474 purchases were obtained from fashion companies. We used association rule analysis and K-means clustering to identify purchase patterns that were influenced by customer loyalty. Furthermore, sequential pattern analysis was applied to investigate the usage patterns of online and offline channels by consumers. The results showed that high-loyalty consumers mainly purchased infrequently bought items in the brand line, as well as high-priced items, and that these purchase patterns were similar both online and in stores. In contrast, the low-loyalty group showed different purchasing behaviors for online versus in-store purchases. In physical environments, the low-loyalty consumers tended to purchase less popular or more expensive items from the brand line, whereas in online environments, their purchases centered around items with relatively high sales volumes. Finally, we found that both high and low loyalty groups exclusively used a single preferred channel, either online or in-store. The findings help companies better understand consumer purchase patterns and build future marketing strategies around items with high brand centrality.
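Association rule analysis, as named above, scores a rule "antecedent → consequent" by its support, confidence, and lift. The following is a minimal sketch of those three measures only; the baskets are hypothetical, the function name is illustrative, and it does not reproduce the study's data or its rule-mining procedure.

```python
from typing import Dict, Iterable, List, Set


def rule_metrics(transactions: List[Set[str]], antecedent: Iterable[str], consequent: Iterable[str]) -> Dict[str, float]:
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)         # baskets containing the antecedent
    n_c = sum(1 for t in transactions if c <= t)         # baskets containing the consequent
    n_ac = sum(1 for t in transactions if (a | c) <= t)  # baskets containing both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# Hypothetical purchase baskets, one set of items per transaction.
baskets = [
    {"t-shirt", "jeans"},
    {"t-shirt"},
    {"jeans"},
    {"t-shirt", "jeans"},
]
print(rule_metrics(baskets, ["t-shirt"], ["jeans"]))
```

A lift above 1 suggests the two items are bought together more often than independence would predict; a lift below 1, as in this toy example, suggests the opposite.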
PURPOSES : This study aimed to predict the number of future COVID-19 confirmed cases more accurately using public and transportation big data and suggested priorities for introducing major policies by region. METHODS : Prediction analysis was performed using a long short-term memory (LSTM) model with excellent prediction accuracy for time-series data. Random forest (RF) classification analysis was used to derive regional priorities and major influencing factors. RESULTS : Based on the daily number of COVID-19 confirmed cases from January 26 to December 12, 2020, as well as the daily number of confirmed cases in Gyeonggi Province, which was expected to occur on December 24 and 25, depending on social distancing, the accuracy of the LSTM artificial neural network was approximately 95.8%. In addition, as a result of deriving the major influencing factors of COVID-19 through random forest classification analysis, according to the number of people, social distancing stages, and masks worn, Bucheon, Yongin, and Pyeongtaek were identified as regions expected to be at high risk in the future. CONCLUSIONS : The results of this study can help predict pandemics such as COVID-19.
As new AI techniques are developed and various types of big data accumulated, new approaches for pest management are also being attempted. Various spatio-temporal scale big data are being accumulated, and attempts are being made to utilize them to classify target objects and analyze their characteristics. Remote sensing data is widely used across various fields, and is being measured, stored, and shared in diverse formats. Hyperspectral imaging and satellite data are ecologically relevant big data, with distinct formats and potential applications. We will introduce real-world AI examples of utilizing hyperspectral image analysis, as well as estimating pest population density using satellite data.
This study utilizes social big data to investigate the factors influencing the awareness, attitude, and behavior toward vegan fashion consumption among global and Korean consumers. Social media posts containing the keyword “vegan fashion” were gathered, and meaningful discourse patterns were identified using semantic network analysis and sentiment analysis. The study revealed that diverse factors guide the purchase of vegan fashion products within global consumer groups, while among Korean consumers, the predominant discourse involved the concepts of veganism and ethics, indicating a heightened awareness of vegan fashion. The research then delved into the factors underpinning awareness (comprehension of animal exploitation, environmental concerns, and alternative materials), attitudes (both positive and negative), and behaviors (exploration, rejection, advocacy, purchase decisions, recommendations, utilization, and disposal). Global consumers placed great significance on product-related information, whereas Korean consumers prioritized ethical integrity and reasonable pricing. In addition, environmental issues stemming from synthetic fibers emerged as a significant factor influencing the awareness, attitude, and behavior regarding vegan fashion consumption. Further, this study confirmed the potential presence of cultural disparities influencing overall awareness, attitude, and behavior concerning the acceptance of vegan fashion, and offers insights into vegan fashion marketing strategies tailored to specific cultures, aiming to provide vegan fashion companies and brands with a deeper understanding of their consumer base.
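The purpose of this study was to provide basic data for developing measures to enhance teachers' convergence education competency by examining social perceptions of that competency through big data. To this end, raw data were collected using the big data provided by Textom, with teacher + convergence education + competency as keywords. After first- and second-round refinement of the collected data, 200 core keywords were selected based on frequency analysis, converted into a 1-mode matrix dataset, and subjected to keyword network analysis. The results are as follows. First, in the frequency analysis, education, artificial intelligence, strengthening, training, and class appeared most frequently. Second, in the whole-network analysis, education, student, training, strengthening, and target ranked high on all centrality measures. Third, the ego network analysis confirmed that diverse discussion centered on teacher, convergence education, and competency. Based on these results, suggestions were made for follow-up research and for measures to enhance teachers' convergence education competency.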
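A 1-mode (keyword-by-keyword) co-occurrence structure and a simple centrality measure of the kind used in keyword network analysis can be sketched as below. This is an illustration under stated assumptions: the documents and keywords are hypothetical, the function names are illustrative, degree centrality stands in for whichever centrality measures the study computed, and the code does not use or imitate Textom.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def cooccurrence(docs: List[List[str]]) -> Counter:
    """Count how many documents contain each unordered keyword pair."""
    counts: Counter = Counter()
    for doc in docs:
        # Sort so each pair has one canonical order; set() ignores repeats within a document.
        for a, b in combinations(sorted(set(doc)), 2):
            counts[(a, b)] += 1
    return counts


def degree_centrality(counts: Dict[Tuple[str, str], int]) -> Dict[str, float]:
    """Share of the other keywords that each keyword co-occurs with at least once."""
    neighbors: Dict[str, set] = {}
    for a, b in counts:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    return {k: (len(v) / (n - 1) if n > 1 else 0.0) for k, v in neighbors.items()}


# Hypothetical keyword lists, one per collected document.
docs = [
    ["education", "training"],
    ["education", "class"],
    ["education", "training"],
]
pairs = cooccurrence(docs)
print(pairs)
print(degree_centrality(pairs))
```

The pair counts are the off-diagonal cells of a symmetric 1-mode matrix; a keyword that co-occurs with every other keyword, like "education" here, has degree centrality 1.0.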
In this study, we propose a novel approach to analyze big data related to patents in the field of smart factories, utilizing the Latent Dirichlet Allocation (LDA) topic modeling method and the generative artificial intelligence technology, ChatGPT. Our method includes extracting valuable insights from a large dataset of associated patents using LDA to identify latent topics and their corresponding patent documents. Additionally, we validate the suitability of the topics generated using generative AI technology and review the results with domain experts. We also employ the big data analysis tool KNIME to preprocess and visualize the patent data, facilitating a better understanding of the global patent landscape and enabling a comparative analysis with the domestic patent environment. To explore quantitative and qualitative comparative advantages, we selected six indicators for a quantitative analysis. Consequently, our approach allows us to explore the distinctive characteristics and investment directions of individual countries in the context of research and development and commercialization, based on a global-scale patent analysis in the field of smart factories. We anticipate that our findings will serve as vital guidance for determining individual countries' directions in research and development investment. Furthermore, we propose a novel utilization of ChatGPT as a tool for validating the suitability of selected topics for policy makers who must choose topics across various scientific and technological domains.
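This study surveys social innovation driven by IoT-based data trading and blockchain technology in German-speaking countries. First, a literature review and a review of prior research were conducted to investigate the use of big data and blockchain technology in German-speaking countries. In addition, case studies were conducted on German data companies such as Datarade and on the government's data economy project (GAIA-X). Through these, we identified the current state of data and blockchain use in Germany and application cases in each industry. In the financial industry, blockchain technology is used to securely store account numbers and purchase details, and in the real estate industry, lease contracts and rent payment confirmations are managed efficiently through blockchain. In particular, we synthesized and analyzed local cases and research findings on the use of blockchain technology in the education sector. Drawing on the security advantages of blockchain, Germany is already innovating in education through recording learners' achievements and assessments, certifying grades, tracking the learning activities of students who fall behind or underperform, preventing cheating, managing assignments through smart contracts, and providing lifelong learning certificates and learning history records. Through this investigation of the education sector, we aim to provide a comprehensive understanding of technological innovation and social change in Germany. Because these results demonstrate the effects of German government-led technological innovation in data trading and blockchain, they will provide important insights that can also be applied to the Korean government's industrial innovation.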
Purpose: Even today, cancer remains a challenge to overcome. The purpose of this study is to understand the current status of lip-oral-pharyngeal cancer in Koreans by identifying its survival rate through long-term big data. Material and Method: This study used the 2023 KOSIS data (Cancer Registration Statistics, Ministry of Health and Welfare) for academic purposes. The 5-year relative survival rates of lip-oral-pharyngeal cancer from 1996 to 2020 were compared and analyzed at 5-year intervals. Results: The 5-year relative survival rate for lip-oral-pharyngeal cancer was 47.4% from 1996 to 2000, 54.5% from 2001 to 2005, 61.1% from 2006 to 2010, 65.5% from 2011 to 2015, and 69.7% from 2016 to 2020. From 1996 to 2005, the 5-year relative survival rate for lip-oral-pharyngeal cancer was higher than the 5-year relative survival rate for all cancers. However, in the most recent 15 years, from 2006 to 2020, the 5-year relative survival rate for lip-oral-pharyngeal cancer was lower than that for all cancers. Conclusions: This long-term big data showed that the 5-year relative survival rate of lip-oral-pharyngeal cancer in Koreans has risen in every successive 5-year period. However, in order to increase the overall survival rate of all human cancers, continuous efforts to improve the survival rate of lip-oral-pharyngeal cancer are needed in the future.
This study analyzes consumer fashion purchase patterns from a big data perspective. Transaction data from 1 million transactions at two Korean fashion brands were collected. To analyze the data, R, Python, the SPADE algorithm, and network analysis were used. Various consumer purchase patterns, including overall purchase patterns, seasonal purchase patterns, and age-specific purchase patterns, were analyzed. Overall pattern analysis found that a continuous purchase pattern was formed around the brands’ popular items such as t-shirts and blouses. Network analysis also showed that t-shirts and blouses were highly centralized items. This suggests that there are items that make consumers loyal to a brand rather than the cachet of the brand name itself. These results help us better understand the process of brand equity construction. Additionally, buying patterns varied by season, and more items were purchased in a single shopping trip during the spring season compared to other seasons. Consumer age also affected purchase patterns; findings showed an increase in purchasing the same item repeatedly as age increased. This likely reflects the difference in purchasing power according to age, and it suggests that the decision-making process for purchasing products simplifies as age increases. These findings offer insight for fashion companies’ establishment of item-specific marketing strategies.
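Although domestic consumers' interest in the nutritional content of foods continues to grow, studies analyzing consumer preferences for foods in relation to nutritional content are lacking. This study performed log analysis of big data collected on the food nutrient database platform, a public information service, and presented the preferences for foods in which consumers are interested from a nutritional perspective. The collection period was three years, from January 2020 to December 2022, during which a total of 2,243,168 food-name search queries were collected; food names were merged and processed into representative food names by item. R was used as the analysis tool, and the search frequency of food names whose nutritional information was sought was analyzed for the whole period and by season. Over the whole period, cooked rice, chicken, and eggs, which Koreans commonly eat, had the highest frequencies. In the seasonal preference analysis, foods without broth and not served hot generally had high frequencies in spring and summer, while warm foods with broth had high frequencies in autumn and winter. In addition, foods sold as seasonal items by restaurants, such as naengmyeon and kongguksu, were also found to show seasonality. These results confirm a pattern in which consumers are interested in the nutritional information of foods they commonly eat and, because they are indirectly related to consumption trends, are expected to serve as basic data for the food service industry when establishing seasonal marketing strategies.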
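The seasonal frequency analysis of search terms can be sketched as a simple grouped count. This is an illustration only: the study used R, the log records below are hypothetical, the function name is illustrative, and the month-to-season mapping (March to May as spring, and so on) is an assumption rather than the study's stated definition.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Assumed mapping of calendar months to seasons.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def seasonal_frequencies(records: Iterable[Tuple[int, str]]) -> Dict[str, Counter]:
    """Count searched food names per season from (month, food_name) log records."""
    by_season: Dict[str, Counter] = defaultdict(Counter)
    for month, food_name in records:
        by_season[SEASON_BY_MONTH[month]][food_name] += 1
    return by_season


# Hypothetical search-log records: (month of search, representative food name).
log = [
    (7, "naengmyeon"),
    (7, "naengmyeon"),
    (8, "kongguksu"),
    (1, "cooked rice"),
]
freq = seasonal_frequencies(log)
print(freq["summer"].most_common(1))
```

Sorting each season's counter with `most_common` gives the per-season ranking of searched foods described in the abstract.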